The R Programming course is part of the Data Science specialisation offered on Coursera. Week two introduces functions in R and sets a number of practical problems based on air pollution data. The first problem requires writing a function that calculates the mean level of a given pollutant across a range of monitors, with the data files stored in a given directory. For example, the user could pass the following arguments to the function: ("specdata", "nitrate", 3:24). The function would then look for a directory called "specdata" inside the working directory, read the files in that directory for monitors 003 to 024 inclusive, and use that data to calculate the mean nitrate level.
Below is a partial solution to the problem. I have left out the code that sets the directory, since publishing a complete solution would break Coursera rules, so this version takes only the pollutant and the monitor IDs.
pollutantmean <- function(pollutant = "nitrate", id) {
  # need code to set directory

  # Build zero-padded file names, e.g. 3 -> "003.csv"
  filenames <- paste(sprintf("%03d", id), ".csv", sep = "")

  # Empty data frame with the expected column names
  nm <- c("Date", "sulfate", "nitrate", "ID")
  df <- as.data.frame(matrix(nrow = 0, ncol = 4, dimnames = list(NULL, nm)))

  # Read each monitor file and append its rows
  for (file in filenames) {
    temp_dataset <- read.table(file, header = TRUE, sep = ",")
    df <- rbind(df, temp_dataset)
    rm(temp_dataset)
  }

  # Mean of the requested pollutant, ignoring missing values
  if (pollutant == "nitrate") {
    mean(df$nitrate, na.rm = TRUE)
  } else {
    mean(df$sulfate, na.rm = TRUE)
  }
}
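Two pieces of the function above can be tried on their own without any of the course data: how the monitor IDs become file names, and how na.rm = TRUE affects the mean. The short sketch below uses a made-up three-row data frame (the `readings` values are invented for illustration, not taken from the course files). The last lines show one possible alternative to the if/else branch, selecting the column by name; that is my own suggestion, not part of the assignment.

```r
# Monitor IDs as the user might pass them (illustrative range).
id <- 3:5

# sprintf is vectorised: "%03d" left-pads each integer with zeros to width 3.
filenames <- paste0(sprintf("%03d", id), ".csv")
print(filenames)  # "003.csv" "004.csv" "005.csv"

# Made-up readings standing in for rows read from the monitor files.
readings <- data.frame(nitrate = c(1.5, NA, 2.5), sulfate = c(NA, 4, 6))

# na.rm = TRUE drops missing values before averaging.
nitrate_mean <- mean(readings$nitrate, na.rm = TRUE)  # 2
sulfate_mean <- mean(readings$sulfate, na.rm = TRUE)  # 5

# Selecting the column by name, a possible alternative to the if/else branch.
pollutant <- "sulfate"
selected_mean <- mean(readings[[pollutant]], na.rm = TRUE)  # 5
```

Without na.rm = TRUE, both means would come back as NA, because each column contains a missing value.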