Discovering Data

#100DaysOfDataScience

Day 16 - one hot encoding

7/21/2018

One hot encoding
Say we have a categorical variable in our dataset such as occupation:
Occupation
Teacher
Office worker
Accountant

To pass these into a machine learning algorithm we need to convert them to numeric values. We could just assign a numeric label, for example teacher = 1, office worker = 2 and so on. But some algorithms could interpret this as office worker > teacher, which makes no sense. One way to get around this problem is to use one hot encoding: we create a vector with one position per category and set only the position of the observed category to 1, for example

Teacher   = 100
Office worker = 010
Accountant  = 001
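
The vectors above can be produced with a few lines of code. This is only a minimal sketch in plain Python: the function name one_hot_encode and the fixed category order are my own assumptions for illustration, not part of any library.

```python
def one_hot_encode(values, categories):
    """Return one one-hot vector (a list of 0s and 1s) per value.

    `categories` fixes the order of the positions in each vector.
    Hypothetical helper written for this post, not a library function.
    """
    # Map each category to its position in the vector.
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)  # start with all zeros
        vector[index[value]] = 1        # switch on the matching position
        vectors.append(vector)
    return vectors


categories = ["Teacher", "Office worker", "Accountant"]
encoded = one_hot_encode(categories, categories)

for name, vector in zip(categories, encoded):
    print(name, vector)
```

Each category gets its own position, so no category ends up "greater than" another the way it would with labels 1, 2, 3.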

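One-hot vectors increase the dimensionality of the data, since every unique category adds a column. It is possible to counter this with a dimensionality reduction technique such as PCA. Note that if our categorical variable has many unique values, we may want to use a sparse representation of the encoding, one that stores only the positions of the 1s.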
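
One way to read the sparse remark above: since every one-hot vector contains a single 1, we can store just the position of that 1 instead of the full vector. The sketch below assumes that reading. The helper names one_hot_sparse and to_dense are hypothetical and written only for this example.

```python
def one_hot_sparse(values, categories):
    """Store only the column index of the single 1 for each value."""
    index = {category: i for i, category in enumerate(categories)}
    return [index[value] for value in values]


def to_dense(hot_columns, n_categories):
    """Expand the compact form back into full one-hot vectors."""
    return [
        [1 if column == hot else 0 for column in range(n_categories)]
        for hot in hot_columns
    ]


categories = ["Teacher", "Office worker", "Accountant"]
occupations = ["Accountant", "Teacher", "Teacher", "Office worker"]

hot_columns = one_hot_sparse(occupations, categories)  # one integer per row
dense = to_dense(hot_columns, len(categories))         # full 0/1 vectors

print(hot_columns)
print(dense)
```

With thousands of unique categories the compact form still needs one number per row, while the dense form needs one number per row per category. Libraries follow the same idea: scikit-learn's OneHotEncoder, for example, returns a SciPy sparse matrix by default.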