Discovering Data

Cleaning data

4/26/2017


Data set used - mental health in IT, available on Kaggle

Data can be messy. The data set above, for example, has a 'Gender' field containing many variations on male and female: Male, male, M, m, man, F, f, Female and so on. The first thing I wanted to do was set every value to either male or female. One way to do this is with the following code:
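Here is a minimal sketch of one way to do this kind of clean-up with pandas. It is not necessarily the code originally used for this post. The sample DataFrame, the variant sets MALE and FEMALE, and the helper normalise_gender are all assumptions made for illustration; only the 'Gender' column name and the example spellings come from the data set described above.

```python
import pandas as pd

# Hypothetical sample standing in for the 'Gender' column of the Kaggle data set.
df = pd.DataFrame(
    {"Gender": ["Male", "male", "M", "m", "man", "F", "f", "Female", " Woman ", "other"]}
)

# Assumed variant lists; extend them after inspecting df["Gender"].unique().
MALE = {"male", "m", "man"}
FEMALE = {"female", "f", "woman"}


def normalise_gender(value):
    """Map a raw entry to 'male' or 'female'; leave anything unrecognised unchanged."""
    if not isinstance(value, str):
        return value  # e.g. NaN for missing entries
    cleaned = value.strip().lower()  # ignore case and surrounding whitespace
    if cleaned in MALE:
        return "male"
    if cleaned in FEMALE:
        return "female"
    return value


df["Gender"] = df["Gender"].apply(normalise_gender)
print(df["Gender"].value_counts())
```

Values that match neither set are left as they are rather than forced into a category, so they can be reviewed by hand before deciding how to treat them.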

    


