I used a data set on class sizes for primary schools in Scotland. The data is published by the Scottish Government and is available here. The code used:
    import csv

    def read_file(filename):
        # Read the first column of the CSV file, skipping the header row
        numbers = []
        with open(filename) as f:
            reader = csv.reader(f)
            next(reader)
            for row in reader:
                numbers.append(int(row[0]))
        return numbers

    def calculate_mean(numbers):
        s = sum(numbers)
        N = len(numbers)
        mean = s/N
        return mean

    def find_range(numbers):
        lowest = min(numbers)
        highest = max(numbers)
        r = highest - lowest
        return r, lowest, highest

    def find_differences(numbers):
        # Difference of each value from the mean
        mean = calculate_mean(numbers)
        diff = []
        for num in numbers:
            diff.append(num - mean)
        return diff

    def calculate_variance(numbers):
        # Mean of the squared differences (divides by N)
        diff = find_differences(numbers)
        squared_diff = []
        for d in diff:
            squared_diff.append(d**2)
        sum_squared_diff = sum(squared_diff)
        variance = sum_squared_diff/len(numbers)
        return variance

    numbers = read_file('classSize.csv')
    m = calculate_mean(numbers)
    rng, l, h = find_range(numbers)
    variance = calculate_variance(numbers)
    std = variance**0.5

    print('mean = ' + str(round(m, 2)))
    print('range = ' + str(rng))
    print('smallest class size = ' + str(l))
    print('largest class size = ' + str(h))
    print('standard deviation = ' + str(round(std, 2)))

The code is based on examples from 'Doing Math with Python' by Amit Saha, No Starch Press. Although the book is not specifically aimed at people interested in data science or data analytics, it does include chapters on probability, statistics and calculus.

The code generated the following statistics:

mean = 23.55 (to two decimal places)
range = 47
smallest class size = 1
largest class size = 48
standard deviation = 5.38 (to two decimal places)

I used IDLE as a development environment and saved the .csv file and the code in the same directory, which means I don't need a full path for the file name in the code.
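One way to sanity-check the hand-written functions is to compare them with Python's built-in statistics module. The sketch below is only an illustration: the list sample_sizes is made up and is not taken from classSize.csv, and population_std is a hypothetical helper that condenses the variance and square-root steps from the code above. Because that code divides by N, the matching library function is statistics.pstdev (population standard deviation) rather than statistics.stdev, which divides by N - 1.

```python
import statistics

# Made-up class sizes for illustration only; not taken from classSize.csv
sample_sizes = [18, 25, 30, 22, 25]

def population_std(numbers):
    # Same calculation as in the post: squared differences from the mean,
    # summed and divided by N (not N - 1), then square-rooted
    mean = sum(numbers) / len(numbers)
    variance = sum((n - mean) ** 2 for n in numbers) / len(numbers)
    return variance ** 0.5

manual_std = population_std(sample_sizes)
library_std = statistics.pstdev(sample_sizes)  # population version, divides by N
sample_std = statistics.stdev(sample_sizes)    # sample version, divides by N - 1

print('mean = ' + str(statistics.mean(sample_sizes)))
print('manual standard deviation = ' + str(round(manual_std, 2)))
print('statistics.pstdev = ' + str(round(library_std, 2)))
print('statistics.stdev = ' + str(round(sample_std, 2)))
```

The first two standard deviation figures should agree, while statistics.stdev comes out slightly larger. For a data set as large as a national list of class sizes the two versions would probably differ very little, but it is worth knowing which one is being reported.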
