Why visualise the data? We are visual creatures, and it is often easier to see meaning in diagrams than in lists and tables of numbers. For example, suppose we have the following data, which could be the number of customers in a shop each day over a period of about three months.
The amount of data is small, but it is still difficult to take everything in when it is presented as a table. If instead we plot the data as a line graph, with day number on the x-axis and number of customers on the y-axis, the picture changes:
Straight away we can see that the data divides into the first 40 days and the days after day 40, with much more activity, including some prominent spikes, after day 40. Plotting the data makes the story much easier to see. What we can't say is why there is a difference; nothing in the data we have could answer that question.
A more technical plot is the box and whisker plot. It is less user-friendly than a simple line graph, but it gives more information about the median, the quartiles and the range of the data. The diagram below explains the different features of the above plot. Note that the above plot also contains dots, which indicate outliers in the data. Each box represents a day of the week, starting with Monday. I would never use a box and whisker plot in a report or presentation aimed at people who are not familiar with statistics.
One more possible plot is the waterfall graph:
This plot starts at the beginning of week three. Each bar represents the increase or decrease in the number of customers from the previous day. It is similar to the line graph above but can be used to highlight particular events or days. For example, say that on one day we expect customer numbers to increase but instead see a decline; the waterfall graph can illustrate this clearly:
Other common graphs include the bar graph and the histogram; see here for a tutorial on these.
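The three plots discussed above (line graph, box and whisker plot by weekday, waterfall graph) can be sketched with Matplotlib. This is a minimal sketch under stated assumptions, not the code behind the figures in this post: the daily counts come from a hypothetical `synthetic_counts()` helper that invents a quiet first 40 days followed by a busier period with occasional spikes, day 1 is assumed to be a Monday, and the helper names (`by_weekday`, `box_stats`, `waterfall_bars`, `plot_all`) are illustrative. `box_stats` is included to show what each part of a box represents; it should match Matplotlib's default `boxplot` behaviour, where the whiskers reach the furthest points within 1.5 times the interquartile range (IQR) of the box and anything beyond is drawn as a dot.

```python
import random
import statistics

# Assumption: day 1 of the data is a Monday, so index 0 of each week is Monday.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def synthetic_counts(n_days=90, change_day=40, seed=1):
    """Hypothetical daily customer counts (NOT the data in this post): a quiet
    period up to change_day, then a busier period with occasional spikes."""
    rng = random.Random(seed)
    counts = []
    for day in range(1, n_days + 1):
        if day <= change_day:
            counts.append(rng.randint(20, 30))
        else:
            spike = 40 if rng.random() < 0.08 else 0
            counts.append(rng.randint(30, 60) + spike)
    return counts


def by_weekday(counts):
    """Group the daily counts into seven lists, Monday first."""
    groups = [[] for _ in WEEKDAYS]
    for i, count in enumerate(counts):
        groups[i % 7].append(count)
    return groups


def box_stats(values):
    """Quartiles, whiskers and outliers as a default box plot draws them."""
    # "inclusive" quartiles use linear interpolation, like NumPy's default percentile.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


def waterfall_bars(counts, start=14):
    """(day, bottom, height, change) for each day from index `start` on.
    Each bar spans from the previous day's count to the current day's count."""
    bars = []
    for i in range(max(start, 1), len(counts)):
        prev, cur = counts[i - 1], counts[i]
        bars.append((i + 1, min(prev, cur), abs(cur - prev), cur - prev))
    return bars


def plot_all(counts, change_day=40, start=14, path="customers.png"):
    # Imported here so the helpers above work without Matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # draw to a file; no display needed
    import matplotlib.pyplot as plt

    days = list(range(1, len(counts) + 1))
    fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(8, 10))

    # Line graph: day number on the x-axis, customers on the y-axis.
    ax1.plot(days, counts)
    ax1.axvline(change_day, linestyle="--", color="grey")
    ax1.set(xlabel="Day", ylabel="Customers", title="Customers per day")

    # Box and whisker plot, one box per weekday; dots mark outliers.
    ax2.boxplot(by_weekday(counts))
    ax2.set_xticks(range(1, 8))
    ax2.set_xticklabels(WEEKDAYS)
    ax2.set(ylabel="Customers", title="Customers by weekday")

    # Waterfall: green bars are increases, red bars are decreases.
    bars = waterfall_bars(counts, start)
    ax3.bar([b[0] for b in bars], [b[2] for b in bars],
            bottom=[b[1] for b in bars],
            color=["tab:green" if b[3] >= 0 else "tab:red" for b in bars])
    ax3.set(xlabel="Day", ylabel="Customers", title="Change from previous day")

    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    counts = synthetic_counts()
    print("mean, days 1-40: ", round(statistics.mean(counts[:40]), 1))
    print("mean, days 41-90:", round(statistics.mean(counts[40:]), 1))
    print("Monday box:", box_stats(by_weekday(counts)[0]))
    try:
        plot_all(counts)
    except ImportError:
        print("Matplotlib is not installed; skipping the plots.")
```

To try it on real figures, replace `synthetic_counts()` with your own list of daily counts. The waterfall bars are drawn from the lower of the two days' counts with a positive height, and colour carries the direction of the change, which keeps the bar geometry simple.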
This blog includes scripts, mainly in Python with a few in R, covering NLP, Pandas, Matplotlib and others. See the home page for links to some of the scripts. It also includes some explanations of basic data science terminology.
Archives
October 2017
