Note: stats.grok.se has been shut down and its server is no longer available, so the first line of the script below will no longer work. You can still get Wikipedia page view data - the new site is here. You can download the data in several formats, including CSV and JSON. Once you have the JSON file, you can use the rest of the script below to parse and visualise the data.
Wikipedia makes some of its usage statistics available, and these can be analysed with R. For example, the number of views of the Wikipedia page for Plaid Cymru since the start of April 2015:
This shows two definite peaks in the number of views: the first, around April 2nd/3rd, corresponds to the ITV Leaders' Debate, and the second, around April 16th/17th, corresponds to the BBC Election Debate. The interesting thing about these numbers is that they are much higher than those for other parties such as the SNP or UKIP. I wonder if this is because many people outside of Wales did not know who Plaid Cymru are.
To create the above graph you'll need a couple of packages installed:
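The package list itself is missing from this copy of the post. As a rough guide, getURL() comes from RCurl, while fromJSON() is provided by several packages (rjson, RJSONIO and jsonlite among them), so the choice of jsonlite here is an assumption and not necessarily what was used originally:

```r
# Assumed packages: RCurl supplies getURL(); jsonlite is one of several
# packages that supply fromJSON() (the original choice is not known).
needed <- c("RCurl", "jsonlite")

# Install only the packages that are not already present
to_install <- needed[!(needed %in% rownames(installed.packages()))]
if (length(to_install) > 0) install.packages(to_install)

library(RCurl)
library(jsonlite)
```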
Then just follow these steps:
jsonData <- getURL("http://stats.grok.se/json/en/latest30/Plaid%20Cymru")
parsedData <- fromJSON(jsonData)
plotData <- parsedData$daily_views
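The plotting step is missing from this copy of the post. Here is a minimal sketch of how the remaining steps might look, assuming daily_views parses to a set of date/count pairs and using jsonlite with base graphics. The sample JSON and its counts are made up for illustration so that the example runs without the old server; the data frame name views is also an assumption.

```r
library(jsonlite)

# Made-up sample in the assumed shape of the old stats.grok.se response;
# the counts are illustrative, not real page-view figures.
jsonData <- '{"daily_views": {"2015-04-03": 300, "2015-04-01": 100, "2015-04-02": 250},
              "title": "Plaid Cymru"}'

parsedData <- fromJSON(jsonData)
plotData <- parsedData$daily_views

# Flatten the named list into a data frame with one row per day
views <- data.frame(
  date  = as.Date(names(plotData)),
  views = as.numeric(unlist(plotData))
)

# The days are not guaranteed to arrive in order, so sort by date
views <- views[order(views$date), ]

# Line plot of daily views over time
plot(views$date, views$views, type = "l",
     xlab = "Date", ylab = "Page views",
     main = "Wikipedia page views: Plaid Cymru")
```

Sorting before plotting matters because a line plot joins points in row order, so unsorted dates would produce a zig-zag instead of a time series.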