The dataset, available on Kaggle, goes back more than 150 years. I wanted to see if modern-day police officers are in greater danger of being shot and killed while on duty. However, it looks as though the opposite is true:
Time period    Officers killed by gunfire (% of total officers killed)
pre 1850       50%
1850 - 1899    76.9%
1900 - 1929    71.4%
1930 - 1949    52.6%
1950 - 1969    41.1%
1970 - 1989    48%
1990 - 2008    35.6%
Note that this is the percentage of officer deaths caused by gunfire, not the number of officers who have been shot. Some police officers in the US now wear body armour, which may give them some protection. Two categories have shown an increase: heart attacks, up from 3.2% in the period 1850 - 1899 to 7.4% in the period 1990 - 2008, and fatal vehicle accidents, up from 3.2% to 31.8% over the same periods.
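For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch of the calculation. It is not the code I ran: the record layout (a year and a cause of death per officer), the cause labels and the toy rows are all assumptions made up for illustration, and the real Kaggle file will need its own loading step. Only the period boundaries come from the table above.

```python
from collections import Counter

# Hypothetical toy records (year of death, cause). These are invented for
# illustration and are NOT rows from the real dataset.
records = [
    (1875, "Gunfire"),
    (1880, "Gunfire"),
    (1890, "Heart attack"),
    (1995, "Gunfire"),
    (1999, "Automobile accident"),
    (2004, "Automobile accident"),
    (2006, "Heart attack"),
    (2007, "Gunfire"),
]

# Period boundaries from the table above (inclusive start and end years).
PERIODS = [
    ("pre 1850", 0, 1849),
    ("1850 - 1899", 1850, 1899),
    ("1900 - 1929", 1900, 1929),
    ("1930 - 1949", 1930, 1949),
    ("1950 - 1969", 1950, 1969),
    ("1970 - 1989", 1970, 1989),
    ("1990 - 2008", 1990, 2008),
]


def period_of(year):
    """Return the label of the period containing `year`, or None."""
    for label, start, end in PERIODS:
        if start <= year <= end:
            return label
    return None


def cause_share_by_period(rows, cause):
    """Percentage of deaths in each period attributed to `cause`."""
    totals = Counter()
    hits = Counter()
    for year, row_cause in rows:
        label = period_of(year)
        if label is None:
            continue  # outside the periods covered by the table
        totals[label] += 1
        if row_cause == cause:
            hits[label] += 1
    return {label: 100.0 * hits[label] / totals[label] for label in totals}


shares = cause_share_by_period(records, "Gunfire")
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```

The same function works for any other cause label, which is how the heart attack and vehicle accident figures could be compared across periods.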
Most people, including me, got it wrong, but interestingly my low-tech prediction based on Ohio was more accurate than all the fancy complex models used by the experts. I think organisations such as FiveThirtyEight missed some of the indicators that pointed to a possible Trump win. In my last post I mentioned Ohio polls which clearly indicated a drop in support for Clinton in the last 10 days or so of the campaign. Florida shows a similar pattern. These states were important: without them Trump could never have won.
Clinton (blue) and Trump were very close for most of October, but in the last week or so of the campaign there was a clear separation, with Clinton's support dropping (the trend is less clear than in the Ohio graph).
A number of people are saying that the polls got it wrong, but most national polls showed Clinton with a small lead. This was accurate, as she did win the popular vote. The problem for Clinton was that you don't win the US presidential election by winning the popular vote; you have to win the electoral college vote.
Ohio has voted for the winning presidential candidate in 28 of the last 30 US elections. So while using Ohio alone to predict the result is not very scientific, it is interesting. Using polling data available on Kaggle I plotted the poll results for the last year or so:
This clearly shows that Clinton's vote (in blue) has fallen away in the last week while Trump's has stayed constant. So based on this, Trump is about 24 hours away from being elected president of the USA.
According to Wikipedia:
Benford's law, also called the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. Graphically:
To test this I used a dataset containing the populations of all countries. The distribution of first digits is:
Plotting this data gives:
It is very close to the theoretical graph above. Benford's law is admissible in US courts and has been used to show that financial data submitted by the Greek government to the EU was probably false.
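For readers who want to try the comparison themselves, here is a rough sketch of the check. The theoretical share of leading digit d under Benford's law is log10(1 + 1/d). The function names are my own for this example, and because I am not reproducing the population file here, the sample values are powers of 2 (a sequence known to follow Benford's law closely) standing in for real data.

```python
import math
from collections import Counter


def benford_expected():
    """Theoretical share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading decimal digit of a non-zero integer."""
    return int(str(abs(int(n)))[0])


def first_digit_shares(values):
    """Observed share of each leading digit 1-9 in `values` (zeros skipped)."""
    digits = [first_digit(v) for v in values if int(v) != 0]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts[d] / total for d in range(1, 10)}


# Stand-in data (an assumption for this sketch): the powers of 2 from 2^1 to 2^200.
# Swap in a list of country populations to repeat the test described above.
sample = [2 ** n for n in range(1, 201)]

expected = benford_expected()
observed = first_digit_shares(sample)

for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

A large gap between the two columns is the sort of signal that would prompt a closer look at how the numbers were produced.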
How does it work? Think about what happens when you double numbers:
start with: 1,2,3,4,5,6,7,8,9
then double: 2,4,6,8,10,12,14,16,18
then again: 4,8,12,16,20,24,28,32,36
and so on. If you count up the first digits: 1:8, 2:5, 3:3, 4:3, 5:1 ..... ones are more common than twos, etc. The same conclusion is reached if you just write down the numbers 1,2,3,4,.... and continue until you are too bored to go on, then count the number of ones, twos, threes....
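The doubling argument is easy to check with a few lines of code. This is only a sketch of the tally described above (the function name is mine): it starts from 1 to 9, doubles every number each round, and counts the leading digits of everything seen so far.

```python
from collections import Counter


def doubling_first_digits(rounds):
    """Tally leading digits over `rounds` rows, starting at 1..9 and doubling each row."""
    row = list(range(1, 10))
    counts = Counter()
    for _ in range(rounds):
        counts.update(int(str(n)[0]) for n in row)
        row = [2 * n for n in row]
    return counts


# The three rows written out above.
three_rows = doubling_first_digits(3)
print(sorted(three_rows.items()))

# Many more rounds, to see where the tally settles.
many_rows = doubling_first_digits(50)
total = sum(many_rows.values())
for digit in range(1, 10):
    print(digit, round(many_rows[digit] / total, 3))
```

Running it for more rounds is a quick way to see that the lead of the ones is not an accident of stopping after three rows.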
The code used to generate the above graphic is in my code blog.