Discovering Data

Correlation

12/4/2018

Correlation is a measure of how strongly two variables move together, for example the amount of leg room on a flight and the cost of the ticket. This video explains the concept well:
You can try this yourself. For example, what makes people happy? The OECD measures life satisfaction and publishes the data here.

If we take three variables and measure how each correlates with life satisfaction, we get the following plot.
[Picture: correlation matrix]
The darker the shade of grey, the stronger the correlation. The diagonal from top left to bottom right can be ignored: it compares each field with itself, so the correlation always equals 1.
Abbreviations used:
  • Satis = life satisfaction
  • earn = personal earnings
  • person = available personal time (to pursue hobbies, relax and so on)
  • Edu = time spent in full time education

The plot suggests that life satisfaction correlates most strongly with earnings, then with education, and least with personal time. Displaying data visually like this is often easier to read than the same data in a table.
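To make the numbers behind each cell concrete, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name `pearson` and the sample values are made up for illustration; they are not the OECD figures and not necessarily the measure used for the plot above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and the two spreads (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)

# Made-up illustrative values, NOT the OECD data.
satis = [5.0, 6.0, 6.5, 7.0, 7.5]
earn = [20, 30, 35, 45, 50]

print(round(pearson(satis, earn), 3))   # close to 1: the two rise together
print(round(pearson(satis, satis), 3))  # 1.0: a variable compared with itself
```

The second call is the diagonal of a correlation matrix: any variable compared with itself gives exactly 1, which is why that diagonal carries no information.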
The function to generate the correlation matrix plot is:
Function to visualise a correlation matrix
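The original function is not shown on this page. As a stand-in, here is a minimal sketch of one way to draw such a plot, assuming NumPy and Matplotlib are available. The name `plot_correlation_matrix`, the choice to shade by absolute correlation, and the sample values are all assumptions for illustration; this is not the author's code and the numbers are not the OECD data.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_correlation_matrix(data, labels):
    """Draw a greyscale heat map of pairwise correlations.

    data   : 2-D array-like, one row per observation, one column per variable
    labels : one name per column
    Returns the figure and the correlation matrix.
    """
    # rowvar=False tells NumPy that each column is a variable.
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)

    fig, ax = plt.subplots()
    # "Greys" maps larger values to darker shades. Taking the absolute value
    # (an assumption here) makes strong negative correlations dark as well.
    image = ax.imshow(np.abs(corr), cmap="Greys", vmin=0, vmax=1)
    ax.set_xticks(range(len(labels)))
    ax.set_yticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticklabels(labels)
    fig.colorbar(image, ax=ax, label="absolute correlation")
    return fig, corr

# Made-up illustrative values, NOT the OECD data.
labels = ["Satis", "earn", "person", "Edu"]
sample = [
    [5.0, 20, 14.0, 15.0],
    [6.0, 30, 15.0, 16.0],
    [6.5, 35, 14.5, 17.0],
    [7.0, 45, 15.5, 17.5],
    [7.5, 50, 15.0, 18.0],
]

fig, corr = plot_correlation_matrix(sample, labels)
```

Call `plt.show()` to display the figure interactively, or `fig.savefig("correlation.png")` to write it to a file.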


    Author

    About 12 years ago I decided I wanted to change career. I only had a vague notion that I'd like to 'work in IT'. Several years later I found data analytics - I had found my new home.
