DrivenData competition: DengAI

The dataset has data for two cities, San Juan and Iquitos. These cities show different patterns in the number of dengue fever cases during the year.

[Figure: Data for San Juan]
[Figure: Data for Iquitos]

San Juan shows a definite peak in the second half of the year, with a much smaller peak at the beginning of the year, while Iquitos has two more balanced peaks at the beginning and end of the year. So perhaps some seasonal factors produce these differences. Identifying these factors might help to predict the number of cases for each week.
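Code used for above:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn import linear_model
    from sklearn.metrics import mean_absolute_error
    from sklearn.metrics import median_absolute_error

    # Load the training features and the weekly case counts (labels).
    X = pd.read_csv("dengue_features_train.csv")
    y = pd.read_csv("dengue_labels_train.csv")
    print(X.head())
    print(y.head())

    # Split the labels by city. The filters below use the numeric codes 1 and 2;
    # if your copy of the CSV stores the city as a string, adjust them to match
    # the values printed by y.head().
    cases_1 = y[y['city'] == 1]
    cases_2 = y[y['city'] == 2]

    # One joint plot per city: weekly cases against week of year,
    # with the fitted regression line drawn in red.
    e = sns.jointplot(x="weekofyear", y="total_cases", data=cases_1,
                      kind='reg', joint_kws={'line_kws': {'color': 'red'}})
    f = sns.jointplot(x="weekofyear", y="total_cases", data=cases_2,
                      kind='reg', joint_kws={'line_kws': {'color': 'red'}})
    plt.show()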
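The seasonal difference between the two cities can also be put into numbers rather than read off the plots. The following is only a rough sketch, not part of the original analysis: it averages total_cases for each week of the year and city, then reports the week with the highest average. The helper names weekly_profile and peak_week are made up for illustration, the demo table is invented (it is not competition data), and the column names and numeric city codes are assumed to match the label file used above.

```python
import pandas as pd


def weekly_profile(labels: pd.DataFrame) -> pd.DataFrame:
    """Mean total_cases for each week of the year, one column per city."""
    return (
        labels.groupby(["weekofyear", "city"])["total_cases"]
        .mean()
        .unstack("city")
    )


def peak_week(labels: pd.DataFrame) -> pd.Series:
    """Week of the year with the highest mean case count, per city."""
    return weekly_profile(labels).idxmax()


# Tiny invented table (NOT competition data), only to show the result's shape.
demo = pd.DataFrame({
    "city": [1, 1, 1, 1, 2, 2, 2, 2],
    "weekofyear": [1, 1, 40, 40, 1, 1, 40, 40],
    "total_cases": [2, 4, 30, 50, 10, 12, 9, 11],
})

profile = weekly_profile(demo)
print(profile)
print(peak_week(demo))
```

With the real labels, weekly_profile(y) would give one averaged seasonal curve per city, which could be plotted with profile.plot() or compared against the weather features for the same weeks.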