Data set used: mental health in IT, available on Kaggle.
Data can be messy. The data set above, for example, has a 'Gender' field containing many variations on male and female: Male, male, M, m, man, F, f, Female and so on. The first thing I wanted to do was set every value to either male or female. One way to do this is to lower-case each value and look it up in a list of known variants.
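A minimal sketch of that clean-up, not necessarily the exact code used for this data set. The function name `normalize_gender` and the two variant sets are my own assumptions, built only from the example values listed above; the real column will almost certainly need more entries once you inspect its unique values.

```python
# Hypothetical lookup sets based on the variants mentioned in the text.
# Extend them after checking the unique values in the real column.
MALE_VARIANTS = {"male", "m", "man"}
FEMALE_VARIANTS = {"female", "f"}


def normalize_gender(value):
    """Return 'male', 'female', or None when the value is not recognised."""
    if not isinstance(value, str):
        return None  # missing values (e.g. NaN) are not strings
    cleaned = value.strip().lower()  # ignore case and stray whitespace
    if cleaned in MALE_VARIANTS:
        return "male"
    if cleaned in FEMALE_VARIANTS:
        return "female"
    return None  # leave unknown entries for manual review


raw = ["Male", "male", "M", "m", "man", "F", "f", "Female"]
cleaned = [normalize_gender(v) for v in raw]
print(cleaned)
```

With the data loaded into a pandas DataFrame (called `df` here as an assumption), the same function could be applied with `df["Gender"] = df["Gender"].map(normalize_gender)`. Returning `None` for anything unrecognised, rather than guessing, makes it easy to see which values still need a decision.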