The R function below is a partial solution to assignment 3, part 2. I simplified the data file and renamed the outcome columns to HeartAttack, HeartFailure and Pneumonia. I also ignored the requirement to sort the hospital names when there is a tie for best hospital.
The first problem to solve is how to make the function fail when one or both of the parameters is invalid. The assignment requires the stop() function for this. It can be done as follows:

    best <- function(state, outcome) {
      fullData <- read.csv("outcomeofcaremeasuresshort.csv")
      validStates <- c("AL", "TX")
      if (state %in% validStates) {
        # body of function
      } else {
        stop("invalid state")
      }
    }

I have simplified the code by including only two states in my list of valid states. If any other state, or an invalid code such as 'BB', is entered, stop() raises an error with the message 'invalid state', for example:

    > bt <- best("AZ", "HeartAttack")
    Error in best("AZ", "HeartAttack") : invalid state

I have not included an outcome check, but it can be handled in the same way by adding a nested if.

The main body of the function:

    best <- function(state, outcome) {
      fullData <- read.csv("outcomeofcaremeasuresshort.csv")
      validStates <- c("AL", "TX")
      if (state %in% validStates) {
        # keep only the hospitals in the requested state
        stateData <- fullData[fullData$State == state, ]
        # for the chosen outcome, drop the "Not Available" rows and sort by rate;
        # those entries make read.csv() read the column as text, so convert it
        # to numbers first, otherwise "14.5" would sort before "9.8"
        if (outcome == "HeartAttack") {
          finalData <- stateData[stateData$HeartAttack != "Not Available", ]
          ordered <- finalData[order(as.numeric(as.character(finalData$HeartAttack))), ]
        }
        if (outcome == "HeartFailure") {
          finalData <- stateData[stateData$HeartFailure != "Not Available", ]
          ordered <- finalData[order(as.numeric(as.character(finalData$HeartFailure))), ]
        }
        if (outcome == "Pneumonia") {
          finalData <- stateData[stateData$Pneumonia != "Not Available", ]
          ordered <- finalData[order(as.numeric(as.character(finalData$Pneumonia))), ]
        }
        # the lowest rate is now in the first row; return its first column
        bestHospital <- head(ordered, 1)
        return(bestHospital[1])
      } else {
        stop("invalid state")
      }
    }
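The outcome check mentioned above could look something like the sketch below. It is an illustration under my own assumptions, not the assignment's reference solution: bestFromData() is a hypothetical helper that takes the data frame as an argument so it can be tried on a few made-up rows without the CSV file, and it uses [[outcome]] to pick the matching column instead of the three separate if blocks (this only works because the outcome names match the column names). Like the original code, it assumes the hospital name is in the first column.

```r
# Hypothetical helper: best() logic with a nested outcome check,
# taking the data frame as an argument instead of reading the CSV.
bestFromData <- function(fullData, state, outcome) {
  validStates <- c("AL", "TX")
  validOutcomes <- c("HeartAttack", "HeartFailure", "Pneumonia")
  if (state %in% validStates) {
    if (outcome %in% validOutcomes) {
      stateData <- fullData[fullData$State == state, ]
      # [[outcome]] selects the column named by the outcome string,
      # standing in for the three separate if blocks
      finalData <- stateData[stateData[[outcome]] != "Not Available", ]
      # convert the text rates to numbers so that 9.8 sorts before 14.5
      rates <- as.numeric(as.character(finalData[[outcome]]))
      ordered <- finalData[order(rates), ]
      # assumes the first column holds the hospital name
      return(as.character(ordered[1, 1]))
    } else {
      stop("invalid outcome")
    }
  } else {
    stop("invalid state")
  }
}

# best() itself then only needs to read the file and pass it on
best <- function(state, outcome) {
  bestFromData(read.csv("outcomeofcaremeasuresshort.csv"), state, outcome)
}

# Made-up rows purely for illustration; not taken from the real data file
toyData <- data.frame(
  Hospital = c("Hospital A", "Hospital B", "Hospital C", "Hospital D"),
  State = c("AL", "AL", "AL", "TX"),
  HeartAttack = c("14.5", "9.8", "Not Available", "12.0"),
  HeartFailure = c("11.2", "10.4", "12.9", "Not Available"),
  Pneumonia = c("Not Available", "11.9", "10.1", "9.7"),
  stringsAsFactors = FALSE
)

bestFromData(toyData, "AL", "HeartAttack")  # returns "Hospital B"
```

Because the rates are converted with as.numeric() before ordering, Hospital B (9.8) is returned rather than Hospital A, which would come first if "14.5" and "9.8" were compared as text. Calling bestFromData(toyData, "AL", "Asthma") stops with the message 'invalid outcome'.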
The raw data (for Leeds City Centre) is available here. A network of CCTV cameras is used to count the number of people passing through an area per hour, and the data is provided as CSV files. As an example, I am using the file for September 2014. I wrote and ran the R commands on the command line rather than in RStudio.
The first step is to read the data into memory. Since I saved the data file in my working directory (you can check which directory that is with getwd()), I can simply pass the file name to read.csv():

    myData <- read.csv("monthlydatafeedsept201420141009.csv")

The next step is to extract the data for just one location, in this case Briggate:

    brigData <- myData[myData$LocationName == "Briggate", ]

Then I can narrow this data down to Mondays:

    brigData_Mon <- brigData[brigData$Weekday == "Monday", ]

The data can now be plotted:

    plot(brigData_Mon$Hour, brigData_Mon$InCount)

Figure: Pedestrians per hour, Friday (Sep 2014)
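Rather than repeating the filtering and plotting by hand for each day, a loop can draw one chart per weekday. The sketch below is one way to do it, assuming the same column names as above (Weekday, Hour, InCount); the rows in brigData here are made up for illustration, and par(mfrow = c(2, 4)) simply places the charts on one page in a 2 by 4 grid.

```r
# Made-up Briggate rows for illustration only; the real data has every hour
# of every day in the month.
brigData <- data.frame(
  Weekday = rep(c("Monday", "Tuesday"), each = 3),
  Hour = rep(c(8, 12, 17), times = 2),
  InCount = c(900, 2400, 1500, 850, 2300, 1600),
  stringsAsFactors = FALSE
)

days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

# Put up to eight charts on one page (2 rows x 4 columns)
oldPar <- par(mfrow = c(2, 4))
# Loop over the days that occur in the data, in calendar order
for (day in days[days %in% brigData$Weekday]) {
  # same filter as brigData_Mon, but for the current day
  dayData <- brigData[brigData$Weekday == day, ]
  plot(dayData$Hour, dayData$InCount,
       main = paste("Pedestrians per hour,", day),
       xlab = "Hour", ylab = "InCount")
}
par(oldPar)  # restore the previous layout
```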
I produced similar charts for the other days of the week. R can also produce summary statistics for the data.
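For a single day, summary() returns the minimum, quartiles, median, mean and maximum in one call. A minimal sketch, using a few made-up hourly counts in place of the real brigData_Mon:

```r
# Made-up Monday counts for illustration only
brigData_Mon <- data.frame(
  Hour = c(8, 12, 17, 20),
  InCount = c(900, 2400, 1500, 300)
)

# Min., 1st Qu., Median, Mean, 3rd Qu. and Max. of the hourly counts
s <- summary(brigData_Mon$InCount)
print(s)
```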
Summary data such as the maximum and minimum values for a particular day can be obtained with:

    max(brigData_Mon$InCount)
    min(brigData_Mon$InCount)
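To get the maximum and minimum for every day at once, tapply() can apply max() and min() to InCount grouped by Weekday. This is a sketch on made-up rows rather than the real Briggate data:

```r
# Made-up Briggate rows for illustration only
brigData <- data.frame(
  Weekday = rep(c("Monday", "Tuesday"), each = 3),
  Hour = rep(c(8, 12, 17), times = 2),
  InCount = c(900, 2400, 1500, 850, 2300, 1600),
  stringsAsFactors = FALSE
)

# tapply() splits InCount by Weekday and applies the function to each group,
# giving one value per day instead of one max()/min() call per day
dayMax <- tapply(brigData$InCount, brigData$Weekday, max)
dayMin <- tapply(brigData$InCount, brigData$Weekday, min)
print(dayMax)
print(dayMin)
```

tapply() returns the days in alphabetical order (Friday, Monday, ...) rather than calendar order, because the grouping values are sorted when they are turned into a factor.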