Discovering Data
  • Home
  • Blog

#100DaysOfDataScience

Day 23 - DengAI

7/28/2018

DrivenData competition: DengAI

The dataset has data for two cities. These cities show different patterns in the number of dengue fever cases during the year.

Data for San Juan:
[Picture: total_cases against weekofyear for San Juan]
Data for Iquitos:
[Picture: total_cases against weekofyear for Iquitos]
San Juan shows a definite peak in the second half of the year, with a much smaller peak at the beginning, while Iquitos has two more balanced peaks at the beginning and end of the year. Seasonal factors may be producing these differences, and identifying them might help to predict the number of cases for each week.

Code used for the plots above:

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import linear_model

from sklearn.metrics import mean_absolute_error
from sklearn.metrics import median_absolute_error

X = pd.read_csv("dengue_features_train.csv")
y = pd.read_csv("dengue_labels_train.csv")
print(X.head())
print(y.head())

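# City codes in the DrivenData CSVs are 'sj' (San Juan) and 'iq' (Iquitos);
# switch back to 1 and 2 if your copy of the labels is numerically encoded.
cases_1 = y[y['city'] == 'sj']
cases_2 = y[y['city'] == 'iq']

# Weekly case counts against week of year, with a red regression line, one plot per city
e = sns.jointplot(x="weekofyear", y="total_cases", data=cases_1, kind='reg', joint_kws={'line_kws': {'color': 'red'}})
f = sns.jointplot(x="weekofyear", y="total_cases", data=cases_2, kind='reg', joint_kws={'line_kws': {'color': 'red'}})
plt.show()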
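
To put rough numbers on the peaks described above, one option is to average total_cases for each week of the year and see which week comes out highest in each city. This is only a sketch: the helper names weekly_profile and peak_weeks are my own, it assumes the labels have the city, weekofyear and total_cases columns used in the code above, and the toy rows are made up just to show the shape of the input, not taken from the competition data.

```python
import pandas as pd


def weekly_profile(labels: pd.DataFrame) -> pd.DataFrame:
    """Mean total_cases for each week of the year, one column per city."""
    return (
        labels.groupby(["city", "weekofyear"])["total_cases"]
        .mean()
        .unstack("city")
    )


def peak_weeks(profile: pd.DataFrame) -> pd.Series:
    """Week of the year with the highest mean case count, per city."""
    return profile.idxmax()


# Toy rows, invented for illustration only (not real DengAI values).
toy = pd.DataFrame({
    "city": ["sj", "sj", "sj", "sj", "iq", "iq", "iq", "iq"],
    "year": [2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001],
    "weekofyear": [1, 40, 1, 40, 1, 40, 1, 40],
    "total_cases": [4, 30, 6, 50, 9, 3, 11, 5],
})

profile = weekly_profile(toy)   # rows: weekofyear, columns: city
peaks = peak_weeks(profile)     # index: city, values: peak week
print(profile)
print(peaks)
```

With the real file, the same call would be weekly_profile(pd.read_csv("dengue_labels_train.csv")). Averaging over years smooths out individual outbreak years, so the peak week it reports is a typical seasonal peak rather than the worst single week on record.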