Suicide rates are increasing:
Some of this increase might be explained by a growing population. For example, say one percent of people kill themselves; if this percentage remains constant, then the actual number of suicides will increase as the population size increases.
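The arithmetic behind this point can be sketched in a few lines of Python. The population figures are made up for illustration and the helper name `expected_count` is hypothetical; only the one percent rate comes from the example above.

```python
def expected_count(rate, population):
    """Number of cases expected if a fixed share of the population is affected."""
    return round(rate * population)

rate = 0.01  # the one percent used in the example above
populations = [1_000_000, 1_100_000, 1_200_000]  # hypothetical, growing population

counts = [expected_count(rate, p) for p in populations]
for population, count in zip(populations, counts):
    # The count rises every step even though the rate never changes.
    print(population, count, count / population)
```

A rising count on its own therefore says nothing about whether the underlying rate has changed; dividing by population is what separates the two.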
The majority of suicides are male:
Different age groups show different trends. Over 80s (dark green), 70 to 79 (purple), 60 to 69 (pink) and under 20 (blue) are all quite flat, whereas 20 to 29 (red), 30 to 39 (green), 40 to 49 (brown) and 50 to 59 (orange) all show increases. In other words, the number of suicides in the age range 20 to 59 is increasing.
Google Trends for 'iPhone slow' clearly shows a periodic spike in interest.
These spikes occur in September 2013, September 2014, September 2015, ....
The release schedule for iPhones is September 2013, September 2014, September 2015, ....
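One way to check that the spikes line up with the release schedule is to find the peak month in each year and compare it with September. This is only a sketch: the interest scores are invented, not real Google Trends data, and `peak_month_by_year` is a hypothetical helper.

```python
# Made-up monthly interest scores keyed by (year, month); not real Google Trends data.
interest = {
    (2013, 8): 40, (2013, 9): 100, (2013, 10): 70,
    (2014, 8): 45, (2014, 9): 95, (2014, 10): 60,
}

def peak_month_by_year(series):
    """Return {year: month with the highest score in that year}."""
    best = {}
    for (year, month), score in series.items():
        if year not in best or score > series[(year, best[year])]:
            best[year] = month
    return best

peaks = peak_month_by_year(interest)
release_month = 9  # September, as in the release schedule above
print({year: month == release_month for year, month in peaks.items()})
```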
Is it a coincidence that people search for 'iPhone slow' around the time of a new iPhone release? It seems not. Apple have now admitted (or partially admitted) that they do slow down older models; according to a BBC report, they claim it was "to prolong the life of the devices". Other people suspect Apple deliberately slowed older models just before the release of each new iPhone to encourage iPhone users to upgrade.
In the Google Trends graph above, searches for 'iPhone slow' rapidly spiked and then gradually fell away each year. In 2017 the pattern changed: the searches didn't fall away, they continued and grew stronger. Perhaps more people were becoming aware of what Apple had been up to.
The data set is available on Kaggle.
The code used to analyse the data is available here.
Data can be, and is, used and abused by people with agendas, for example politicians. Let's say for some reason I want to convince you that CA made a greater sacrifice during the Vietnam War than any other state. The raw data backs up this claim but doesn't take into account CA's population size compared to other states.
Plotting the number of casualties by state gives:
This is a little misleading because CA, TX and so on are large, populous states, so it is not surprising that the greatest number of casualties came from these states. I added state populations from 1967, calculated the casualty rate per capita for each state, then plotted the data again:
This map suggests Missouri had a disproportionately large casualty rate normalised by state population compared to other states while California had a relatively low casualty rate.
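The per capita normalisation described above can be sketched as follows. The states "A", "B" and "C" and all of their numbers are placeholders invented for illustration, not figures from the dataset; the point is only that the ranking by raw count and the ranking by rate can disagree.

```python
# Placeholder casualty counts and populations; invented values, not the real data.
casualties = {"A": 5000, "B": 3000, "C": 1500}
population = {"A": 19_000_000, "B": 10_500_000, "C": 4_500_000}

def per_100k(counts, pops):
    """Casualties per 100,000 residents for every state present in both tables."""
    return {state: counts[state] / pops[state] * 100_000
            for state in counts if state in pops}

rates = per_100k(casualties, population)
highest_count = max(casualties, key=casualties.get)
highest_rate = max(rates, key=rates.get)
print("highest raw count:", highest_count)
print("highest rate per 100,000:", highest_rate)
```

With these placeholder numbers the largest state has the most casualties, while the smallest state has the highest rate.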
Don't let the politicians, media or corporations use data to trick you.
Since 2011 Stack Overflow have conducted a developer survey. The raw data can be downloaded from here. The following is a brief analysis of part of the 2017 survey data, specifically some of the factors affecting salary. David Robinson, a data scientist at Stack Overflow, found an unusual relationship between the use of tabs and spaces and salary. His R code is available here. I decided to see if I could recreate the results using Python and Pandas. My Python code is available here. The following graph clearly shows the relationship for the 2017 data.
The 2015 survey also includes spaces v tabs data, but the salary and experience data are in a different format, making a direct comparison with the 2017 data difficult. It is still possible to plot the data and see if there is a similar correlation. The next graph shows that there may be a similar relationship.
It is difficult to explain this relationship; however, correlation does not mean causation. I also looked at the 2017 data for Masters v Degrees. When you take the overall average for developers with a Masters degree, you find they have a slightly higher salary than developers with just a degree. However, when you divide the data by experience this is no longer true. Now it seems having a Masters degree will negatively impact your salary as you become more experienced; see the graph below:
Again this is not what I expected.
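An overall average and a per-group average can point in opposite directions when the groups are different sizes. The sketch below shows the mechanism with invented records; it is not the survey data and does not claim to explain the survey result. The `mean_salary` helper is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up records (education, experience band, salary in thousands); not survey data.
records = [
    ("Bachelor", "0-5 years", 40), ("Bachelor", "0-5 years", 40),
    ("Bachelor", "0-5 years", 40), ("Bachelor", "10+ years", 80),
    ("Masters", "0-5 years", 38), ("Masters", "10+ years", 76),
    ("Masters", "10+ years", 76), ("Masters", "10+ years", 76),
]

def mean_salary(rows, key):
    """Average salary for each group, where key(row) names the group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {group: mean(salaries) for group, salaries in groups.items()}

overall = mean_salary(records, key=lambda r: r[0])
by_band = mean_salary(records, key=lambda r: (r[0], r[1]))
print(overall)
print(by_band)
```

In this toy data the Masters group has the higher overall average only because more of its members are in the experienced band; within each band the Bachelor group earns more.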
The dataset is available here.
The following Python script shows the percentages of survivors for different groups:
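import pandas as pd

# Load the passenger list
df = pd.read_csv('titanic3.csv')
df_male = df[df['sex'] == 'male']
df_female = df[df['sex'] == 'female']

# Column means per ticket class; numeric_only skips text columns such as names
df_class_group = df.groupby('pclass').mean(numeric_only=True)
df_class_group_male = df_male.groupby('pclass').mean(numeric_only=True)
df_class_group_female = df_female.groupby('pclass').mean(numeric_only=True)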
Only 38% of the passengers survived the sinking, but this is only part of the story. We can dig down further to see how belonging to different groups affected a passenger's chances of survival. If we divide the passengers into male and female, we can see that only 19% of male passengers survived whereas 73% of female passengers survived. We can also divide by class. There were three classes of ticket on the Titanic: first, second and third. The percentage survival rates (male and female) by class were:
If we divide by both gender and class:
It is clear that first class female passengers had the best chance of survival. It is also interesting that the class divisions break down for male second and third class passengers: for male passengers, travelling second class did not increase your chances of survival compared to travelling third class.
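The split by both gender and class can also be done in a single groupby. This is a sketch on a tiny invented sample rather than the real file; it assumes the survival flag is a 0/1 column named `survived`, alongside the `sex` and `pclass` columns used in the script above.

```python
import pandas as pd

# Tiny made-up sample using the assumed column names; not real passenger data.
sample = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "female"],
    "pclass": [1, 3, 1, 3, 3, 3],
    "survived": [1, 0, 1, 0, 0, 1],
})

def survival_rates(frame):
    """Percentage of survivors for each (sex, pclass) pair."""
    return frame.groupby(["sex", "pclass"])["survived"].mean() * 100

rates = survival_rates(sample)
# One row per sex, one column per ticket class.
print(rates.unstack("pclass"))
```

Passing the full DataFrame read from titanic3.csv to `survival_rates` should give the real table, provided the column names match.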
The difference in ticket price:
The following website lists UK parliamentary petitions.
You can download the data in json or csv format.
The top ten petitions over the last few years are:
1 EU Referendum Rules triggering a 2nd EU Referendum
2 Prevent Donald Trump from making a State Visit to the United Kingdom.
3 Give the Meningitis B vaccine to ALL children, not just newborn babies.
4 Block Donald J Trump from UK entry
5 Stop all immigration and close the UK borders until ISIS is defeated.
6 Accept more asylum seekers and increase support for refugee migrants in the UK.
7 Consider a vote of No Confidence in Jeremy Hunt, Health Secretary
8 Donald Trump should make a State Visit to the United Kingdom.
9 Make the production, sale and use of cannabis legal.
10 Stop spending a fixed 0.7 per cent slice of our national wealth on Foreign Aid
Over the last year or so, Brexit, the EU and a possible official visit by Trump to the UK have been big issues; immigration and health are also prominent. There are some right wing petitions: 5, 8 and 10. Number 5 calls for something like Trump's travel ban in the US. There are also some left wing petitions: 2, 4, 6 and 7.
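A ranking like the one above could be produced from the CSV download roughly as follows, assuming "top" means most signatures. The column names `title` and `signatures` are assumptions and may not match the real file; the three rows are placeholders.

```python
import csv
import io

# Stand-in for the downloaded file; column names and rows are assumptions.
CSV_TEXT = """title,signatures
Petition A,120
Petition B,4500
Petition C,980
"""

def top_petitions(csv_file, n=10):
    """Return the n (title, signatures) pairs with the most signatures."""
    rows = [(r["title"], int(r["signatures"])) for r in csv.DictReader(csv_file)]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]

top = top_petitions(io.StringIO(CSV_TEXT), n=2)
for rank, (title, count) in enumerate(top, start=1):
    print(rank, title, count)
```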
If we take all the petition titles and remove the stop words we can generate the following word cloud:
The following post from my code blog explains how to generate word clouds.
The technical process:
[('northern', 147), ('ireland', 102), ('n', 99), ('dup', 88), ('new', 40),
('westminster', 29), ('uk', 26), ('support', 25), ('public', 25), ('national', 21),
('people', 18), ('united', 18), ('deal', 17), ('needs', 17), ('manifesto', 16),
('trade', 16), ('must', 14), ('also', 14), ('best', 14), ('wminster', 13), ('eu', 13),
('across', 13), ('irelands', 13), ('economic', 13), ('identity', 13), ('military', 12),
('would', 12), ('better', 12), ('positive', 12), ('believes', 12)]
[('northern', 51), ('ireland', 33), ('ulster', 23), ('unionist', 22), ('best', 18),
('united', 16), ('party', 16), ('brexit', 13), ('westminster', 13), ('mps', 12),
('local', 12), ('union', 12), ('work', 11), ('manifesto', 11), ('one', 11), ('remain', 9),
('great', 8), ('need', 8), ('many', 8), ('yet', 8), ('would', 8), ('public', 7),
('south', 7), ('tom', 7), ('executive', 7), ('kingdom', 7), ('people', 7), ('election', 7),
('danny', 6), ('better', 6)]
[('northern', 170), ('alliance', 110), ('ireland', 103), ('uk', 75),
('support', 65), ('ensure', 51), ('would', 43), ('change', 40), ('public', 37),
('economic', 35), ('also', 32), ('westminster', 32), ('work', 30), ('government', 30),
('european', 29), ('direction', 27), ('need', 27), ('deal', 26), ('across', 26),
('executive', 26), ('international', 25), ('political', 25), ('brexit', 24),
('must', 24), ('eu', 24), ('significant', 23),
('continue', 23), ('tax', 23), ('human', 22), ('welfare', 22)]
[('brexit', 44), ('eu', 35), ('sinn', 32), ('rights', 30), ('tory', 25), ('health', 23),
('north', 23), ('irish', 19), ('access', 18), ('european', 17), ('vote', 16),
('within', 16), ('party', 15), ('special', 14), ('services', 13), ('ireland', 13),
('status', 13), ('election', 13), ('cuts', 12), ('funding', 12), ('designated', 12),
('dup', 12), ('new', 11), ('unity', 10), ('trade', 9), ('priorities', 9), ('people', 9),
('british', 8), ('good', 8), ('friday', 8)]
[('tuv', 97), ('northern', 95), ('ireland', 81), ('politics', 42), ('would', 41),
('principled', 41), ('talking', 41), ('straight', 41), ('stormont', 35), ('sinn', 34),
('fein', 30), ('people', 27), ('need', 27), ('dup', 22), ('government', 22), ('public', 20),
('believes', 19), ('first', 19), ('uk', 19), ('must', 18), ('irish', 17), ('health', 16),
('brexit', 15), ('eu', 15), ('one', 15), ('money', 15), ('assembly', 15), ('could', 14),
('language', 14), ('mental', 14)]
[('sdlp', 262), ('northern', 154), ('ireland', 104), ('new', 97), ('ensure', 85),
('must', 58), ('support', 58), ('people', 55), ('believes', 51), ('also', 45),
('investment', 39), ('public', 39), ('housing', 39), ('education', 39), ('better', 37),
('government', 37), ('local', 35), ('development', 35), ('services', 32), ('strategy', 32),
('economy', 31), ('across', 31), ('health', 30), ('areas', 30), ('community', 30),
('sector', 29), ('economic', 28), ('work', 27), ('social', 27), ('future', 26)]
Note: the SDLP manifesto for the Westminster election could not be scanned using the PDF to text script, so I scanned their Stormont manifesto from March 2017 instead.
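The (word, count) lists above have the same shape as the output of Python's `collections.Counter.most_common`. A minimal sketch of how such a list might be built from manifesto text once it has been converted from PDF; the stop-word set, the sample sentence and the `top_words` helper are illustrative assumptions, not the code used for this post.

```python
import re
from collections import Counter

# A deliberately small stop-word set for illustration; a real run would use a fuller list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "we", "will", "our"}

def top_words(text, n=30):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    # Lower-case and drop apostrophes, so "Ireland's" becomes "irelands".
    cleaned = text.lower().replace("'", "")
    # Keep runs of letters only; punctuation and digits act as separators.
    words = re.findall(r"[a-z]+", cleaned)
    return Counter(w for w in words if w not in STOP_WORDS).most_common(n)

sample = "We will support Northern Ireland. Northern Ireland's economy and the economy of the UK."
print(top_words(sample, 3))
```

Splitting on punctuation in this way would also explain entries such as 'n' (from "N. Ireland") in the lists above, though that is a guess about how the original text was tokenised.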
If we define 'most interesting' as the person who got the most searches on Google from within Northern Ireland then the local politicians may be surprised/worried to learn that none of them make it into the top five. On May 10, 2017 the ranking was:
1. Jeremy Corbyn
2. Theresa May
3. Paul Nuttall
4. Tim Farron
5. Nicola Sturgeon
Three of the above politicians are leaders of parties that don't even organise in Northern Ireland.
A week ago the list was a little more encouraging for two of the local parties:
1. Theresa May
2. Jeremy Corbyn
3. Tim Farron
4. Arlene Foster
5. Gerry Adams
The dataset is available on GitHub.
If the Yellowstone supervolcano erupted, it would send an umbrella cloud high into the atmosphere, and an area up to 500 miles in diameter would be covered by up to 4 inches of dust. Much of the Midwest would be affected. Sulphur dioxide would also be released in large quantities, causing acid rain and a rapid cooling of surface temperatures across the planet.
The dataset includes answers to the question: How familiar are you with the Yellowstone Supervolcano?
Yesterday Sinn Fein announced their candidate for N. Belfast (BBC story). It was a clever choice and will probably win them more votes. This is another seat where the SDLP have been in terminal decline for more than a decade but just refuse to die.
This means the nationalist vote is split just enough to ensure the DUP hold this seat. The unionist vote will be around 46% to 47%, and the nationalist vote about the same. The unionist and nationalist votes have been converging for a while; I think this will continue, or the nationalist vote will end up slightly higher than the unionist. But the unionist vote is not split, so they will win. The Alliance party have also increased their vote here but remain well behind the two main tribes.