Discovering Data

Unusual factors affecting your salary

7/9/2017

Stack Overflow has conducted an annual developer survey since 2011. The raw data can be downloaded from here. The following is a brief analysis of part of the 2017 survey data, specifically of some factors affecting salary. David Robinson, a data scientist at Stack Overflow, found an unusual relationship between salary and the use of tabs or spaces. His R code is available here. I decided to see if I could recreate the results using Python and Pandas. My Python code is available here. The following graph clearly shows the relationship for the 2017 data.
[Graph: tabs v spaces and salary, 2017 survey data]
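As a rough idea of how the comparison can be done in Pandas, here is a minimal sketch rather than the actual analysis code. The column names `TabsSpaces` and `Salary` are assumptions about the survey file, the helper `median_salary_by_group` is hypothetical, and the rows are made up purely so the example runs on its own.

```python
import pandas as pd


def median_salary_by_group(df, group_col, salary_col):
    """Median salary per group, skipping rows with a missing group or salary."""
    cleaned = df.dropna(subset=[group_col, salary_col])
    return cleaned.groupby(group_col)[salary_col].median()


# Made-up rows for illustration only; these are NOT survey results.
# The column names are assumed, check them against the downloaded file.
toy_survey = pd.DataFrame({
    "TabsSpaces": ["Tabs", "Spaces", "Spaces", "Tabs", "Both", None],
    "Salary": [40000.0, 60000.0, 70000.0, 50000.0, 55000.0, 80000.0],
})

tabs_spaces_median = median_salary_by_group(toy_survey, "TabsSpaces", "Salary")
print(tabs_spaces_median)
```

With the real data you would read the survey CSV with `pd.read_csv` and pass in that DataFrame instead of the toy one. The median is used here because it is less sensitive to a few very large salaries than the mean.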
The 2015 survey also includes spaces v tabs data, but the salary and experience data are in a different format, which makes a direct comparison with the 2017 data difficult. It is still possible to plot the data and see whether there is a similar correlation. The next graph shows that there may be a similar relationship.
[Graph: tabs v spaces and salary, 2015 survey data]
It is difficult to explain this relationship; however, correlation does not mean causation. I also looked at the 2017 data for Masters v Degrees. Taking the overall average, developers with a Masters degree have a slightly higher salary than developers with just a degree. However, when you divide the data by experience this is no longer true. It now seems that having a Masters degree has a negative impact on your salary as you become more experienced; see the graph below:
[Graph: salary by experience, Masters v degree, 2017 survey data]
Again, this is not what I expected.
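An overall average and a per-experience breakdown can point in opposite directions when the groups have different mixes of experience. The sketch below shows the mechanics on made-up numbers. The column names `FormalEducation`, `YearsProgram` and `Salary` are assumptions about the survey file, the helper function is hypothetical, and the figures are invented for illustration, not taken from the survey.

```python
import pandas as pd


def salary_overall_and_by_experience(df, education_col, experience_col, salary_col):
    """Return mean salary per education level, overall and split by experience."""
    overall = df.groupby(education_col)[salary_col].mean()
    by_experience = (
        df.groupby([experience_col, education_col])[salary_col]
        .mean()
        .unstack(education_col)  # one column per education level
    )
    return overall, by_experience


# Invented rows: the Master's group is weighted towards the experienced band.
toy_education = pd.DataFrame({
    "FormalEducation": ["Bachelor's", "Bachelor's", "Bachelor's",
                        "Master's", "Master's", "Master's"],
    "YearsProgram": ["0-5", "0-5", "10+", "0-5", "10+", "10+"],
    "Salary": [30000.0, 32000.0, 90000.0, 28000.0, 85000.0, 87000.0],
})

overall_mean, mean_by_experience = salary_overall_and_by_experience(
    toy_education, "FormalEducation", "YearsProgram", "Salary"
)
print(overall_mean)        # Master's is higher overall
print(mean_by_experience)  # Master's is lower within each experience band
```

In this toy data the Master's group looks better paid overall only because more of its members are in the experienced band. Something similar may or may not be happening in the survey data.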

Survivability on the Titanic

7/2/2017

The dataset is available here.
The following Python script shows the percentages of survivors for different groups:

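import pandas as pd

# Load the passenger data and split it by sex
df = pd.read_csv('titanic3.csv')
df_male = df[df['sex'] == 'male']
df_female = df[df['sex'] == 'female']

# Survival rate per ticket class. Selecting 'survived' before taking the
# mean avoids averaging text columns, which fails in recent pandas versions.
df_class_group = df.groupby('pclass')['survived'].mean()
df_class_group_male = df_male.groupby('pclass')['survived'].mean()
df_class_group_female = df_female.groupby('pclass')['survived'].mean()

# Overall, female and male survival rates (as fractions of 1)
print(df['survived'].mean())
print(df_female['survived'].mean())
print(df_male['survived'].mean())

# Survival rates by class: all passengers, then male, then female
print(df_class_group)
print(df_class_group_male)
print(df_class_group_female)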


Only 38% of the passengers survived the sinking, but this is only part of the story. We can dig down further to see how belonging to different groups affected a passenger's chances of survival. If we divide the passengers into male and female, we can see that only 19% of male passengers survived whereas 73% of female passengers survived. We can also divide by class: there were 3 classes of ticket on the Titanic (first, second and third). The survival rates (male and female combined) by class were:

Class    Survival rate
First    62%
Second   43%
Third    26%

If we divide by both gender and class:

Class    Male (%)   Female (%)
First    34.1       96.5
Second   14.6       88.7
Third    15.2       49.1

It is clear that first class female passengers had the best chance of survival. It is also interesting that the class divisions break down for male second and third class passengers: being a second class passenger did not increase a man's chances of survival compared to a third class passenger.
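The class-by-gender table can also be produced in a single step with `pivot_table`, as an alternative to filtering male and female passengers separately. This is a minimal sketch: the column names `pclass`, `sex` and `survived` come from the script above, the helper `survival_rate_table` is hypothetical, and the toy rows are made up so the example runs without the CSV file.

```python
import pandas as pd


def survival_rate_table(df):
    """Percentage of survivors for each combination of class and sex."""
    rates = df.pivot_table(
        index="pclass", columns="sex", values="survived", aggfunc="mean"
    )
    return (rates * 100).round(1)


# Made-up passengers for illustration only; NOT the real dataset.
toy_passengers = pd.DataFrame({
    "pclass": [1, 1, 1, 1, 3, 3, 3, 3],
    "sex": ["female", "female", "male", "male",
            "female", "female", "male", "male"],
    "survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

toy_rates = survival_rate_table(toy_passengers)
print(toy_rates)
```

To apply it to the real data, pass in the DataFrame loaded from `titanic3.csv` instead of the toy one.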

The difference in ticket price:

Class    Average price (1912 money)   Price in today's money
First    £87.50                       £7,000 (about $9,100)
Second   £21.18                       £1,700 (about $2,200)
Third    £13.30                       £1,000 (about $1,400)
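Average prices per class like these can be computed with a `groupby`. The sketch below assumes the dataset has a `fare` column holding the 1912 ticket price in pounds. That column name, the helper `mean_fare_by_class` and the toy rows are all assumptions for illustration. The conversion to today's money is not reproduced here, since the factor used is not stated.

```python
import pandas as pd


def mean_fare_by_class(df):
    """Average ticket price per class; missing fares are skipped by mean()."""
    return df.groupby("pclass")["fare"].mean().round(2)


# Made-up fares for illustration only; NOT the real dataset.
toy_fares = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3],
    "fare": [80.0, 100.0, 20.0, 7.5, None],
})

toy_mean_fares = mean_fare_by_class(toy_fares)
print(toy_mean_fares)
```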
