The Coursera R Programming course is part of the Data Science specialisation offered by Coursera. Week two introduces functions in R and sets a number of practical problems on air pollution data. The assignment provides 332 CSV files, one for each of 332 air quality monitors across the US, named 001, 002, ..., 332. Each file has a date, a nitrate level and a sulphate level. The week two assignment has three parts; I provided my solution to part 1 in a previous post. Below is my partial solution to part two, which counts the complete observations (rows with no missing values) in each monitor's file:
complete <- function(id) {
  # Empty two-column data frame that collects one row per monitor
  nm <- c("ID", "nobs")
  df <- as.data.frame(matrix(nrow = 0, ncol = 2, dimnames = list(NULL, nm)))

  # Zero-pad each id to three digits to build the file names, e.g. 1 -> "001.csv"
  filenames <- paste(sprintf("%03d", id), ".csv", sep = "")

  for (file in filenames) {
    tempdata <- read.csv(file)

    # Rows left after na.omit() are the complete observations
    rowcount <- nrow(na.omit(tempdata))

    # Read the monitor ID from the last row of the file
    monitor_id <- tail(tempdata, 1)$ID

    df <- rbind(df, cbind(ID = monitor_id, nobs = rowcount))
  }

  print(df)
}

The function produces the expected results for the test data given in the course. It reads the CSV files from the current working directory; I have left out the code that sets the working directory because publishing complete solutions is against Coursera rules.
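The two building blocks the function relies on, zero-padded file names and counting complete rows, can be tried on their own without the course files. The snippet below is only a sketch: the toy data frame and its values are invented for illustration, and the column names Date, sulfate and nitrate are my assumption (only ID is used by the function above). It also shows complete.cases(), a base R alternative to na.omit() that gives the same count.

```r
# Toy data standing in for one monitor file.
# Values are invented; column names other than ID are assumed.
toy <- data.frame(
  Date    = c("2003-01-01", "2003-01-02", "2003-01-03", "2003-01-04"),
  sulfate = c(NA, 3.2, 4.1, NA),
  nitrate = c(0.5, 1.1, NA, NA),
  ID      = 1L
)

# na.omit() drops every row that contains at least one NA,
# so nrow() of the result is the number of complete observations.
nobs <- nrow(na.omit(toy))

# complete.cases() returns TRUE for rows without any NA;
# summing it gives the same count without building a new data frame.
nobs_alt <- sum(complete.cases(toy))

# sprintf("%03d", ...) pads integer ids to three digits.
filenames <- paste(sprintf("%03d", c(1L, 25L, 332L)), ".csv", sep = "")

print(nobs)       # 1 (only the second row has no missing values)
print(filenames)  # "001.csv" "025.csv" "332.csv"
```

Either counting approach works inside the loop; complete.cases() is mainly useful when you also want to know which rows are complete, not just how many.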