The math behind Markov chains can be intimidating, but the basic idea is not. Imagine we have a system with two or more states. At each step, the system may either change from state A to state B or remain in state A. For example, a sleeping baby may continue to sleep or may wake up. There is a probability associated with each transition. In the diagram below, when the system is in state A there is a probability of 0.8 that it will change to state B and a probability of 0.2 that it will remain in state A. When it is in state B, there is a probability of 0.3 that it will remain in B and 0.7 that it will return to state A. The probabilities leaving each state add up to 1, and the next state depends only on the current state, not on how the system got there.
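To make the two-state example concrete, here is a minimal sketch that simulates it using only Python's standard library. The names `TRANSITIONS`, `next_state` and `simulate` are my own for illustration, and the seed and step count are arbitrary choices.

```python
import random

# Transition probabilities from the two-state example:
# from A: stay in A with 0.2, move to B with 0.8
# from B: stay in B with 0.3, move to A with 0.7
TRANSITIONS = {
    "A": {"A": 0.2, "B": 0.8},
    "B": {"A": 0.7, "B": 0.3},
}

def next_state(current, rng):
    """Pick the next state using only the current state's probabilities."""
    states = list(TRANSITIONS[current])
    weights = [TRANSITIONS[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

def simulate(start, steps, rng):
    """Return the list of visited states, starting from `start`."""
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path

rng = random.Random(0)  # fixed seed so the run is repeatable
path = simulate("A", 10000, rng)
share_a = path.count("A") / len(path)

print(path[:10])          # the first few states visited
print(round(share_a, 2))  # fraction of time spent in state A
```

Over a long run the fraction of time spent in A should settle near 0.7 / (0.8 + 0.7), or about 0.47, whichever state the chain starts in.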
Markov chains can be used to take words or n-grams and build tweets. Given a dataset of actual tweets, the code starts with a word or n-gram, picks a word or n-gram to follow it based on how often each candidate followed it in the dataset, and continues from there. The following code uses the markovify library to generate some Trump-like tweets. Most of the tweets will not make sense because they are just semi-random collections of words and phrases.
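import markovify

# Read the whole tweet corpus into a single string
with open('trumptweets.txt') as ip_file:
    text = ip_file.read()

# Build a model whose states are pairs of consecutive words
text_model = markovify.Text(text, state_size=2)

# Print five generated tweets (the result can be None if no sentence could be built)
for i in range(5):
    print(text_model.make_short_sentence(280))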
I used a dataset of Trump tweets available on Kaggle and combined it with some up-to-date tweets I scraped from his Twitter account. Some example output:
Give Trump lemons and he thanks me!
Disgrace I spoke with other countries where we just had the worst performing stocks on the cover.
No one has worse judgement than Hillary has bad judgement.
#ICYMI- watch this afternoons rally here:_ Thank you for coming!
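To show roughly what is going on under the hood, here is a simplified from-scratch sketch of a word-level Markov chain generator. It is not markovify's actual implementation. The function names `build_model` and `generate` and the tiny sample text are made up for illustration, and it skips details such as sentence boundaries.

```python
import random
from collections import defaultdict

def build_model(text, state_size=2):
    """Map each run of `state_size` words to the words seen right after it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        # Repeats are kept, so frequent followers get picked more often
        model[state].append(words[i + state_size])
    return model

def generate(model, max_words, rng):
    """Start from a random state and keep sampling a next word."""
    state = rng.choice(list(model))
    output = list(state)
    while len(output) < max_words:
        followers = model.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)

# Hypothetical toy corpus; a real run would use the tweet text instead
sample_text = "the baby sleeps and the baby wakes and the baby sleeps again"
model = build_model(sample_text)
sentence = generate(model, 12, random.Random(1))
print(sentence)
```

Because every follower is stored once per occurrence, choosing uniformly from the list is the same as choosing in proportion to how often each word followed that state in the text. With a larger `state_size` the output reads more naturally but copies longer stretches of the original tweets.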