Suicide rates are increasing:
Some of this increase might be explained by a growing population, but when the data is visualised per capita it is clear that population growth alone cannot account for it:
The majority of suicides are male:
Different age groups show different trends. Over 80s (dark green), 70 to 79 (purple), 60 to 69 (pink) and under 20 (blue) are all quite flat, whereas 20 to 29 (red), 30 to 39 (green), 40 to 49 (brown) and 50 to 59 (orange) all show increases.
The data for deaths from stabbing in London was found here.
The number of deaths to April 24 = 47
average age = 30, median age = 24
oldest = 70
youngest = 17
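A minimal sketch of how summary figures like these could be computed with pandas. The 'age' column name and the sample rows are assumptions made for illustration; they are not the real data.

```python
import pandas as pd

# Hypothetical sample rows; the real file is assumed to have one row
# per death and an 'age' column (the column name is an assumption).
df = pd.DataFrame({"age": [17, 19, 24, 24, 35, 70]})

summary = {
    "number of deaths": len(df),
    "average age": df["age"].mean(),
    "median age": df["age"].median(),
    "oldest": df["age"].max(),
    "youngest": df["age"].min(),
}

for label, value in summary.items():
    print(f"{label} = {value}")
```

The median is less sensitive than the average to a few much older victims, which is one reason the two figures can differ noticeably.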
How are stabbings distributed over days of the week?
Word cloud created from the comments in the data:
From the above, areas in London most impacted include: Camden, Peckham, Hackney, Southall and Islington. Note also that it does not just involve young men: one man was stabbed by a woman in her 20s.
Python code to generate some of the above:
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS
# Create a CSV file containing the data in your working directory,
# or pass the full path of the file to read_csv.
df = pd.read_csv('london_knife_crime.csv', parse_dates=['date'])
# Bar chart of stabbings by day of the week.
# dt.weekday_name was removed in pandas 1.0; dt.day_name() replaces it.
df['day_of_week'] = df['date'].dt.day_name()
df['day_of_week'].value_counts().plot(kind='bar', title='day of week of stabbing')
plt.show()
# Word cloud built from the free-text comments.
text = df['comment'].str.cat(sep=' ')
stopwords = set(STOPWORDS)
wordcloud = WordCloud(background_color="green", stopwords=stopwords).generate(text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
Google Trends for 'iPhone slow' clearly shows a periodic spike in interest.
These spikes occur in September 2013, September 2014, September 2015, ....
The release schedule for iPhones is September 2013, September 2014, September 2015, ....
Is it coincidence that people search for 'iPhone slow' around the time of a new iPhone release? It seems not. Apple have now admitted (or partially admitted) that they do slow down older models. According to a BBC report, they claim it was "to prolong the life of the devices". Other people suspect Apple deliberately slowed older models just before the release of each new iPhone to encourage iPhone users to upgrade.
In the Google Trends graph above, searches for 'iPhone slow' spiked rapidly and then gradually fell away each year. In 2017 the pattern changed: the searches didn't fall away, they continued and grew stronger. Perhaps more people were becoming aware of what Apple had been up to.
Google Trends shows some increased interest in wildfires this year:
Some recent headlines include:
"Southern California wildfires trigger mass destruction, hurting families, economy", Fox News
"California wildfires by the numbers: $177M spent, more than 1,000 structures destroyed", CNN
"Christmas wildfires: How climate change puts California at risk all year round", the Independent
Source of Dataset used in this analysis: Kaggle.
Citation for data: Short, Karen C. 2017. Spatial wildfire occurrence data for the United States, 1992-2015 [FPA_FOD_20170508]. 4th Edition. Fort Collins, CO: Forest Service Research Data Archive. https://doi.org/10.2737/RDS-2013-0009.4
2017 seems to have been a bad year for California. Is this part of a trend? The dataset covers the period 1992 to 2015. If we plot the number of wildfires in California for each year in that period we get:
The data does not suggest a strong trend either up or down. Of course, we are missing data for 2016 and 2017, so a trend that only began in the last three years or so would not show up here.
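The yearly counts behind a plot like this can be sketched with a pandas groupby. The column names 'STATE' and 'FIRE_YEAR' and the sample rows are assumptions made for illustration; check them against the file you download.

```python
import pandas as pd

# Hypothetical rows standing in for the wildfire data; the column
# names 'STATE' and 'FIRE_YEAR' are assumptions.
fires = pd.DataFrame({
    "STATE": ["CA", "CA", "TX", "CA", "CA", "OR"],
    "FIRE_YEAR": [1992, 1992, 1992, 1993, 2015, 2015],
})

# Keep only California, then count the rows (fires) in each year.
ca = fires[fires["STATE"] == "CA"]
fires_per_year = ca.groupby("FIRE_YEAR").size()
print(fires_per_year)

# With the full data, a bar chart could be drawn with:
# fires_per_year.plot(kind="bar", title="Wildfires per year in California")
```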
Of course it is not just the number of fires that matters; the size of the fires is also important. How has this changed over the two decades covered by the data?
As with the number of fires, the average fire size does not show a definite trend. The third headline above claims that climate change has made wildfires more common all year round. To test this we can extract data for two years, say 1993 and 2013 (20 years apart), and see if there is any difference in the number of fires per month:
Looking at just these two years, we can say that the distribution of wildfires during the year is different. In 2013 wildfires are slightly more evenly distributed through the year, while in 1993 they are slightly more concentrated in the summer months.
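A minimal sketch of the month-by-month comparison, assuming each fire has a discovery date that has already been parsed into a pandas datetime. The 'discovery_date' column name and the sample dates are hypothetical.

```python
import pandas as pd

# Hypothetical discovery dates; the real data is assumed to hold one
# row per fire with a date that can be parsed like this.
fires = pd.DataFrame({
    "discovery_date": pd.to_datetime([
        "1993-06-10", "1993-07-04", "1993-07-21", "1993-08-15",
        "2013-01-12", "2013-05-30", "2013-07-09", "2013-11-02",
    ])
})
fires["year"] = fires["discovery_date"].dt.year
fires["month"] = fires["discovery_date"].dt.month

# Count fires per month for the two years being compared.
subset = fires[fires["year"].isin([1993, 2013])]
monthly = pd.crosstab(subset["month"], subset["year"])

# Make sure all twelve months appear, even those with no fires.
monthly = monthly.reindex(range(1, 13), fill_value=0)

# Convert counts to each month's share of that year's fires so the
# two years can be compared even if their totals differ.
share = monthly / monthly.sum()
print(share.round(2))
```

Comparing shares rather than raw counts keeps a busy year from looking more spread out simply because it had more fires.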
The majority of wildfires in CA are the result of human activity including arson. Natural fires due to lightning account for less than 15% of all wildfires in CA.
Is it possible to predict if a fire was started maliciously using Machine Learning?
The simple answer is yes. Using a Random Forest algorithm it is possible to get an accuracy of over 92% (for the data in the dataset). The algorithm uses the year, month and day of the week plus the latitude and longitude of the location where the fire started to predict if the fire was the result of arson.
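The general shape of such a model can be sketched with scikit-learn. This is not the code behind the 92% figure: the data below is randomly generated, the label is synthetic, the column names are hypothetical, and the train/test split and settings are assumptions, so the accuracy it prints says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Randomly generated stand-in data using the five features described
# above. Column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "year": rng.integers(1992, 2016, n),
    "month": rng.integers(1, 13, n),
    "day_of_week": rng.integers(0, 7, n),
    "latitude": rng.uniform(32.5, 42.0, n),
    "longitude": rng.uniform(-124.4, -114.1, n),
})

# Synthetic label for illustration only: 1 = arson, 0 = other cause.
# In the real data this would come from the recorded cause of the fire.
data["arson"] = (data["latitude"] < 36).astype(int)

features = ["year", "month", "day_of_week", "latitude", "longitude"]
X = data[features]
y = data["arson"]

# Hold back part of the data so accuracy is measured on fires the
# model has not seen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"accuracy on held-out data: {accuracy:.2f}")
```

Accuracy alone can flatter a model when one class is much more common than the other, so it is worth checking how often arson actually appears in the data before reading too much into a single figure.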
Does AI pose a threat to society?
What is AI?
AI is an approach to problem solving which differs significantly from traditional computing.
Say we have a robot in an empty room and we want it to find the door and leave the room. The traditional computing approach to this problem would require programming the robot with specific instructions such as move forward 5 units, turn right by ninety degrees and so on. This approach will work, but only for one starting position. It also requires precise knowledge of the location of the door and the starting point of the robot. The AI approach is to give the robot the ability to solve the problem by itself, so that the solution works for all starting positions. In this case machine vision might be one possible solution: the robot visually scans the room and the AI attempts to distinguish a door from walls and windows. Once it recognises a door it moves in that direction.
This kind of AI is not self-aware nor does it understand the concept of a door or the concept of leaving a room. It was trained to recognise doors and once the door is identified it will attempt to move in that direction.
Warnings about AI
Bill Gates and Stephen Hawking have both issued warnings about AI becoming too powerful. Also, earlier this year there was a Twitter spat between Elon Musk and Mark Zuckerberg over the risks of AI, followed by a letter to the UN, signed by a number of leading researchers and tech industry leaders, warning of the dangers of weaponised AI.
Narrow v General
While the concerns being raised about AI becoming too powerful are sensible, I think we are still a long way from the self-aware AI of science fiction movies which rises up and builds an army of human-hating killing machines. The AI we have now is narrow intelligence: at best it can perform a task to a level which is as good as or better than a human expert. Much of the best AI we have is built around neural networks and the back-propagation algorithm. In my view we will never get to general AI (self-aware machines) from back-propagation. Geoffrey Hinton, one of the researchers who popularised the back-propagation algorithm, recently said: 'my view is throw it all away and start again'.
This, however, does not mean that things like Machine Learning and Neural Nets will have no negative impacts on us. In the next fifteen to thirty years AI could directly affect millions of workers through the loss of jobs to machines. A recent report from PwC predicts that up to 30% of existing jobs in the UK could be automated out of existence, and other industrialised countries can expect similar effects or worse. One report from the previous US administration put the figure closer to 50% in the US. If around one third to half of the working-age population suddenly finds itself unemployed and possibly unemployable, how will they react?
AI is also being used in areas such as law enforcement and banking. In the future it may be an algorithm, not a person, that decides if you are eligible for a loan or if you are likely to commit a crime.
Artificial stupidity is perhaps more dangerous than artificial intelligence, at least in the near term: badly designed and poorly tested algorithms making decisions that affect people in the real world. Good old-fashioned human ignorance also poses risks: politicians, business people and the military making decisions about the use of AI even though they don't understand the technology. There is also the danger of AI being hacked, and of hackers developing their own AI.
The way forward
My answer to the title question, 'Does AI pose a threat to society?', is yes, it does. But it also offers many potential positives, such as advances in medicine, engineering and business.
In the early 19th century a group of English workers, the Luddites, attempted to stop progress by smashing the machinery that was taking away their livelihood. They failed, and the technology destroyed their ability to earn a living. AI will not be stopped; it will change the world whether people and governments are ready or not.
The OECD measures life satisfaction across a number of countries. One of its findings is a link between education and satisfaction: people with more education are more satisfied. You can download the OECD data and test this for yourself. The graph below shows life satisfaction plotted against years spent in education. There is some indication of a positive correlation, but the correlation coefficient is only 0.38 (Spearman's rank correlation).
There is a lot of scatter. If instead we look at earnings and satisfaction, the correlation coefficient is 0.74.
The correlation between earnings and education, at 0.41, is also stronger than that between satisfaction and education.
So perhaps the apparent correlation between life satisfaction and education is actually due to a stronger correlation between wealth and satisfaction, the link being that the more education you have, the more you are likely to earn. Maybe we are more materialistic than we like to admit.
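Spearman's coefficient is the ordinary (Pearson) correlation computed on ranks rather than raw values, so it can be sketched with pandas alone. The column names and figures below are made up for illustration and are not the OECD data.

```python
import pandas as pd

# Made-up figures for six hypothetical countries; column names are
# assumptions, not the OECD's own labels.
oecd = pd.DataFrame({
    "years_in_education": [14.0, 15.5, 16.0, 17.0, 18.0, 19.0],
    "earnings": [20000, 31000, 28000, 45000, 52000, 48000],
    "life_satisfaction": [5.5, 6.6, 6.0, 7.0, 7.5, 7.2],
})


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranked values."""
    return x.rank().corr(y.rank())


pairs = [
    ("years_in_education", "life_satisfaction"),
    ("earnings", "life_satisfaction"),
    ("years_in_education", "earnings"),
]
for a, b in pairs:
    print(f"{a} v {b}: {spearman(oecd[a], oecd[b]):.2f}")
```

Because it works on ranks, the coefficient only measures whether one quantity tends to rise with the other, not whether the relationship is a straight line.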
This word cloud was generated by scraping Trump's Twitter account for July and August:
Compare to a word cloud from May:
'Great' is a keyword for Trump. Other recurring themes and key words include America(n), job, media, fake, news, Russia(n), election, healthcare/Obamacare and thank. Apart from Fox, he doesn't trust the media, so Twitter is his main way of communicating his message, and that message tends to be more positive than negative.
The data set is available on Kaggle.
The code used to analyse the data is available here.
Plotting the number of casualties by state gives:
This is a little misleading because CA, TX and so on are large, populous states, so it is not surprising that the greatest number of casualties came from these states. I added in state populations from 1967, normalised the casualty counts by state population, then plotted the data again:
This map suggests Missouri had a disproportionately high casualty rate relative to its population compared with other states, while California had a relatively low rate.
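The normalisation step can be sketched as a simple division of two pandas Series indexed by state. The numbers below are round, made-up values chosen to keep the arithmetic easy to follow; they are not the real casualty counts or 1967 populations.

```python
import pandas as pd

# Made-up round numbers for illustration only.
casualties = pd.Series({"CA": 1000, "TX": 600, "MO": 400})
population_1967 = pd.Series({"CA": 20_000_000, "TX": 10_000_000, "MO": 4_000_000})

# pandas aligns the two Series on the state code before dividing,
# giving casualties per 100,000 residents.
rate_per_100k = casualties / population_1967 * 100_000

print(rate_per_100k.sort_values(ascending=False))
```

In this toy example the state with the fewest casualties has the highest rate, which is the kind of reversal that normalising by population can reveal.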
Since 2011 Stack Overflow has conducted an annual developer survey. The raw data can be downloaded from here. The following is a brief analysis of part of the 2017 survey data, specifically of some factors affecting salary. David Robinson, data scientist at Stack Overflow, found an unusual relationship between the use of tabs and spaces and salary. His R code is available here. I decided to see if I could recreate the results using Python and Pandas. My Python code is available here. The following graph clearly shows the relationship for the 2017 data.
The 2015 survey also includes spaces v tabs data, but the salary and experience data is in a different format, making a direct comparison with the 2017 data difficult. It is still possible to plot the data and see if there is a similar correlation. The next graph shows that there may be a similar relationship.
It is difficult to explain this relationship; however, correlation does not mean causation. I also looked at the 2017 data for Masters v Degrees. When you take the overall average for developers with a Masters degree you find they have a slightly higher salary than developers with just a degree. However, when you divide the data by experience this is no longer true. Now it seems having a Masters degree will negatively impact your salary as you become more experienced, see the graph below:
Again, this is not what I expected.
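A minimal sketch of how an overall average and a per-experience average can point in different directions. The column names and salaries are made up, and the numbers are deliberately chosen to produce the reversal; whether the same mechanism explains the survey result is not established here.

```python
import pandas as pd

# Made-up respondents; column names and values are hypothetical.
survey = pd.DataFrame({
    "education": ["Bachelor's"] * 4 + ["Master's"] * 4,
    "experience": ["0-5 years", "0-5 years", "0-5 years", "10+ years",
                   "0-5 years", "10+ years", "10+ years", "10+ years"],
    "salary": [40000, 42000, 44000, 80000,
               40000, 76000, 78000, 74000],
})

# Overall average salary by education level.
overall = survey.groupby("education")["salary"].mean()

# Average salary by education level within each experience band.
by_experience = (
    survey.groupby(["experience", "education"])["salary"].mean().unstack()
)

print(overall)
print(by_experience)
```

Here the Master's group looks better paid overall only because more of its members are in the experienced band, so it is worth checking how experience is distributed across the two groups before interpreting either average.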
The dataset is available here.
The following Python script shows the percentages of survivors for different groups:
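import pandas as pd
df = pd.read_csv('titanic3.csv')
# 'survived' is 1 for survivors and 0 otherwise, so its mean is the survival rate.
df_male = df[df['sex']=='male']
df_female = df[df['sex']=='female']
# Select 'survived' before taking the mean: recent pandas versions
# raise an error if mean() is applied to text columns such as 'name'.
df_class_group = df.groupby('pclass')['survived'].mean()
df_class_group_male = df_male.groupby('pclass')['survived'].mean()
df_class_group_female = df_female.groupby('pclass')['survived'].mean()
print(df['survived'].mean(), df_male['survived'].mean(), df_female['survived'].mean())
print(df_class_group, df_class_group_male, df_class_group_female, sep='\n')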
Only 38% of the passengers survived the sinking, but this is only part of the story. We can dig down further to see how belonging to different groups affected a passenger's chances of survival. If we divide the passengers into male and female we can see that only 19% of male passengers survived, whereas 73% of female passengers survived. We can also divide by class: there were 3 classes of ticket on the Titanic: first, second and third. The percentage survival rates (male and female) by class were:
If we divide by both gender and class:
It is clear that first class female passengers had the best chance of survival. It is also interesting that the class divisions break down for male second and third class passengers: for male passengers, holding a second class ticket did not increase your chances of survival compared with a third class ticket.
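The split by both gender and class can also be produced in a single step by grouping on two columns at once. The sketch below uses a handful of made-up passengers so that it runs on its own; the column names follow those used in the script above.

```python
import pandas as pd

# Made-up passengers for illustration; not rows from titanic3.csv.
passengers = pd.DataFrame({
    "sex": ["female", "female", "female", "male", "male", "male", "male", "female"],
    "pclass": [1, 1, 3, 1, 2, 3, 3, 2],
    "survived": [1, 1, 0, 1, 0, 0, 1, 1],
})

# Mean of the 0/1 'survived' column within each (class, sex) group,
# reshaped so classes are rows and sexes are columns, as a percentage.
rates = (
    passengers.groupby(["pclass", "sex"])["survived"].mean().unstack("sex") * 100
)
print(rates)
```

Replacing the made-up frame with the one read from the CSV file should give the full table without building separate male and female data frames.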
The difference in ticket price: