According to Wolfram MathWorld:
A convenient definition of an outlier is a point which falls more than 1.5 times the interquartile range above the third quartile or below the first quartile.
To illustrate this, I'll use the Titanic passenger dataset, which has an Age column. Plotting Age as a box plot gives:
So there are outliers at the top end of the data range: the open circles in the plot. If you find yourself in an interview for a data job, you might be asked how you would identify and remove outliers from data. This is how you can do it in pandas:
Identify outliers and remove them from the dataset
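Here's a minimal sketch of that approach, assuming a DataFrame called df with an Age column. The handful of ages below are made up so the snippet runs on its own; they are not the Titanic values. The names q1, q3, iqr, lower_fence, upper_fence and is_outlier are my own choices for illustration.

```python
import pandas as pd

# Made-up ages standing in for the real data (assumption: not the Titanic values).
df = pd.DataFrame(
    {"Age": [22, 38, 26, 35, 29, 24, 27, 31, 33, 20, 39, 80, 71, 28, 30]}
)

# First and third quartiles, and the interquartile range between them.
q1 = df["Age"].quantile(0.25)
q3 = df["Age"].quantile(0.75)
iqr = q3 - q1

# Anything beyond 1.5 * IQR outside the quartiles counts as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Boolean mask of outlier rows. Missing ages (NaN) compare as False,
# so they are not flagged and stay in df_filtered.
is_outlier = (df["Age"] < lower_fence) | (df["Age"] > upper_fence)

outliers = df.loc[is_outlier, "Age"]
df_filtered = df[~is_outlier]

print(outliers.tolist())
```

On this made-up sample the fences work out to 11.5 and 51.5, so the two ages flagged are 80 and 71. With the real dataset you would load the passengers into df first (for example with pd.read_csv) and run the same steps.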
For the Titanic data there are no lower outliers; the upper outliers are: 66, 65, 71, 70, 65, 65, 71, 66, 69.0, 80, 70, 70 and 74.
df_filtered is the dataset minus the outliers.