According to Wolfram MathWorld: A convenient definition of an outlier ...

To illustrate this I'll use the Titanic passenger dataset, specifically its Age column. Plotting Age as a box plot shows outliers at the top end of the data range: the open circles in the plot.

If you find yourself in an interview for a data job, you might be asked how you would identify and remove outliers from data. You can do it in pandas in two steps: identify the outliers, then remove them from the dataset.
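A minimal sketch of those two steps follows. It assumes the common box-plot convention that a point is an outlier when it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. The helper name `split_outliers`, the parameter `k`, and the sample ages are made up for illustration; with the real data you would pass the Titanic DataFrame instead.

```python
import pandas as pd


def split_outliers(df, column, k=1.5):
    """Return (outliers, df_filtered) using fences at k * IQR (assumed rule)."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1

    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    # True where the value falls outside either fence.
    # Missing values compare as False, so those rows stay in df_filtered.
    is_outlier = (df[column] < lower_fence) | (df[column] > upper_fence)

    return df[is_outlier], df[~is_outlier]


# Hypothetical sample ages, not the Titanic data.
df = pd.DataFrame({"Age": [22, 38, 26, 35, 35, 54, 2, 27, 14, 80]})

outliers, df_filtered = split_outliers(df, "Age")
print(outliers["Age"].tolist())
print(len(df_filtered))
```

Building a boolean mask first keeps the rule in one place: the same mask selects the outliers and, negated, gives the filtered dataset.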
For the Titanic Age column there are no lower outliers. The upper outliers are: 66, 65, 71, 70, 65, 65, 71, 66, 69.0, 80, 70, 70 and 74. df_filtered is the dataset minus the outliers.
## Author
About 12 years ago I decided I wanted to change career. I only had a vague notion that I'd like to 'work in IT'. Several years later I found data analytics - I had found my new home.