Discovering Data

#100DaysOfDataScience

Day 24 - first submission to DengAI

7/29/2018

import pandas as pd
from sklearn import preprocessing
from sklearn import ensemble as ske
from sklearn.model_selection import train_test_split

# df holds the training features merged with the total_cases labels (loaded earlier)
df = df.dropna()                                  # drop rows with missing values
df['city'] = df['city'].map({'sj': 1, 'iq': 2})   # encode the city as a number

X = df.drop(['total_cases'], axis=1).values
X_scaled = preprocessing.scale(X)                 # scaled copy (note: the split below still uses the unscaled X)
y = df['total_cases'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf_rf = ske.RandomForestClassifier(n_estimators=50)
clf_rf = clf_rf.fit(X_train, y_train)
print(clf_rf.score(X_test, y_test))   # 13.2%

clf_gb = ske.GradientBoostingClassifier(n_estimators=50)
clf_gb = clf_gb.fit(X_train, y_train)
print(clf_gb.score(X_test, y_test))   # 6.9%

test = pd.read_csv('short_test.csv')
test['city'] = test['city'].map({'sj': 1, 'iq': 2})
test = test.dropna()
print(clf_gb.predict(test))


Note that there are a couple of rows containing nulls in the prediction file; I set the number of new cases to zero for both of those rows.
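In case it helps, here is a rough sketch of that zero-fill when assembling the submission. The column names follow the DengAI submission format (city, year, weekofyear, total_cases); the file names and the reindex-then-fill approach are illustrative rather than the exact code I ran.

import pandas as pd

# Sketch only: align the predictions with the full submission template and use 0
# for the rows that were dropped because of nulls. Assumes the rows in
# short_test.csv appear in the same order as in the submission template.
submission = pd.read_csv('submission_format.csv')          # city, year, weekofyear, total_cases
preds = pd.Series(clf_gb.predict(test), index=test.index)  # only the rows that survived dropna()
submission['total_cases'] = preds.reindex(submission.index).fillna(0).astype(int)
submission.to_csv('dengai_submission.csv', index=False)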
That effort returned the grand score of 13% in testing, but it did put me in 1,503rd place out of 4,256, so not too bad, although there is still work to be done. I simplified the input, using just air temperature and precipitation.
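For reference, the simplification boils down to keeping only two feature columns (plus the encoded city) before building X. I am assuming the DengAI column names station_avg_temp_c and precipitation_amt_mm stand in for air temperature and precipitation here; swap in whichever columns you actually use.

# Sketch only: restrict the features to air temperature and precipitation.
# The column names are assumed DengAI fields, not necessarily the exact pair used.
feature_cols = ['city', 'station_avg_temp_c', 'precipitation_amt_mm']
X = df[feature_cols].values
y = df['total_cases'].values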
