Google Trends shows increased interest in wildfires this year:
Some recent headlines include:
"Southern California wildfires trigger mass destruction, hurting families, economy", Fox News
"California wildfires by the numbers: $177M spent, more than 1,000 structures destroyed", CNN
"Christmas wildfires: How climate change puts California at risk all year round", the Independent
Source of the dataset used in this analysis: Kaggle.
Citation for the data: Short, Karen C. 2017. Spatial wildfire occurrence data for the United States, 1992-2015 [FPA_FOD_20170508]. 4th Edition. Fort Collins, CO: Forest Service Research Data Archive. https://doi.org/10.2737/RDS-2013-0009.4
2017 seems to have been a bad year for California. Is this part of a trend? The dataset covers the period 1992 to 2015. If we plot the number of wildfires in California for each year in that period we get:
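A minimal sketch of how the yearly counts could be computed, assuming the Kaggle download is a SQLite file with a `Fires` table that has `FIRE_YEAR` and `STATE` columns (these names are assumptions about the file, not something stated in this post):

```python
import sqlite3


def fires_per_year(conn: sqlite3.Connection, state: str = "CA") -> dict:
    """Return {year: number of fire records} for one state.

    Assumes a table named Fires with FIRE_YEAR and STATE columns.
    """
    query = """
        SELECT FIRE_YEAR, COUNT(*)
        FROM Fires
        WHERE STATE = ?
        GROUP BY FIRE_YEAR
        ORDER BY FIRE_YEAR
    """
    return dict(conn.execute(query, (state,)).fetchall())
```

With a connection opened on the downloaded file via `sqlite3.connect(...)`, `fires_per_year(conn)` gives the values that would go on the y-axis of the plot.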
The data does not suggest a strong trend either up or down. Of course, we are missing data for 2016 and 2017, so a trend may have been developing in the last three years or so.
The number of fires is not the only thing that matters; the size of the fires is also important. How has this changed over the two decades covered by the data?
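The average fire size per year could be pulled out in much the same way. This sketch assumes a `Fires` table with a `FIRE_SIZE` column (believed to be in acres) alongside `FIRE_YEAR` and `STATE`; treat those names and the unit as assumptions to check against the dataset documentation:

```python
import sqlite3


def mean_fire_size_per_year(conn: sqlite3.Connection, state: str = "CA") -> dict:
    """Return {year: mean FIRE_SIZE} for one state.

    Assumes a table named Fires with FIRE_YEAR, STATE and FIRE_SIZE columns.
    """
    query = """
        SELECT FIRE_YEAR, AVG(FIRE_SIZE)
        FROM Fires
        WHERE STATE = ?
        GROUP BY FIRE_YEAR
        ORDER BY FIRE_YEAR
    """
    return dict(conn.execute(query, (state,)).fetchall())
```

A mean is sensitive to a handful of very large fires, so a median or the total area burned per year may be worth plotting next to it.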
As with the number of fires, the average fire size does not show a definite trend. The third headline above claims that climate change has made wildfires more common all year round. To test this, we can extract data for two years, say 1993 and 2013 (20 years apart), and see whether there is any difference in the number of fires per month:
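One way to make the two years comparable is to look at each month's share of that year's fires rather than raw counts. The sketch below assumes the `Fires` table has a `DISCOVERY_DOY` column holding the day of the year on which the fire was discovered; that column name is an assumption, and the choice of shares instead of counts is ours:

```python
import sqlite3
from collections import Counter
from datetime import date, timedelta


def monthly_share(conn: sqlite3.Connection, year: int, state: str = "CA") -> list:
    """Return 12 fractions: the share of the year's fires discovered in each month.

    Assumes a table named Fires with FIRE_YEAR, STATE and DISCOVERY_DOY columns.
    """
    rows = conn.execute(
        "SELECT DISCOVERY_DOY FROM Fires WHERE STATE = ? AND FIRE_YEAR = ?",
        (state, year),
    )
    # Convert day-of-year to a calendar month for the given year.
    counts = Counter(
        (date(year, 1, 1) + timedelta(days=int(doy) - 1)).month for (doy,) in rows
    )
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in range(1, 13)]
```

Calling `monthly_share(conn, 1993)` and `monthly_share(conn, 2013)` gives two 12-element lists that can be plotted side by side.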
Looking at just these two years, the distribution of wildfires through the year does differ. In 2013 wildfires are slightly more evenly distributed through the year, while in 1993 they are slightly more concentrated in the summer months.
The majority of wildfires in CA are the result of human activity, including arson. Natural fires due to lightning account for less than 15% of all wildfires in CA.
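The breakdown by cause could be checked with a sketch like the following. It assumes the `Fires` table has a `STAT_CAUSE_DESCR` column with text labels such as "Lightning" and "Arson"; both the column name and the label spellings are assumptions:

```python
import sqlite3


def cause_shares(conn: sqlite3.Connection, state: str = "CA") -> dict:
    """Return {cause: fraction of fire records} for one state.

    Assumes a table named Fires with STATE and STAT_CAUSE_DESCR columns.
    """
    rows = conn.execute(
        """
        SELECT STAT_CAUSE_DESCR, COUNT(*)
        FROM Fires
        WHERE STATE = ?
        GROUP BY STAT_CAUSE_DESCR
        """,
        (state,),
    ).fetchall()
    total = sum(n for _, n in rows)
    return {cause: n / total for cause, n in rows}
```

The lightning share would then be `cause_shares(conn).get("Lightning", 0.0)`.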
Is it possible to use machine learning to predict whether a fire was started maliciously?
The simple answer is yes. Using a Random Forest algorithm, it is possible to get an accuracy of over 92% (for the data in the dataset). The algorithm uses the year, month, and day of the week, plus the latitude and longitude of the location where the fire started, to predict whether the fire was the result of arson.
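A minimal sketch of such a classifier using scikit-learn is shown here. It is not the exact code behind the 92% figure: the helper names, the 25% hold-out split, the 100 trees, and the day-of-year input used to derive month and weekday are all illustrative assumptions.

```python
from datetime import date, timedelta

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def make_features(year, doy, lat, lon):
    """Build one feature row: year, month, weekday (Monday=0), latitude, longitude."""
    d = date(year, 1, 1) + timedelta(days=doy - 1)
    return [year, d.month, d.weekday(), lat, lon]


def arson_accuracy(X, y, seed=0):
    """Fit a random forest on 75% of the rows and return accuracy on the held-out 25%.

    X: array of feature rows from make_features; y: 1 if arson, else 0.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)
```

Comparing the returned score with the share of the most common class is a useful sanity check, since accuracy alone can look high when the labels are imbalanced.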