The following script compares one document against a set of documents and returns a similarity score for each.
Output of the script:
[ 0.48266575 0. 0.01086096 0.13409612 0.17690402]
So the comparison document most closely matches the first document, with a score of about 0.48. The least similar is the second document, with a score of 0.
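To illustrate how scores like these can be produced, here is a minimal sketch that assumes TF-IDF weighting and cosine similarity, written with the standard library only. The sample documents, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `similarity_scores`) and the smoothed IDF formula are all assumptions made for this example, so it will not reproduce the exact numbers shown above.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption for this sketch), always positive.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_scores(query, corpus):
    # IDF is computed over the query plus the corpus, a simplification.
    vectors = tfidf_vectors([query] + corpus)
    return [cosine(vectors[0], v) for v in vectors[1:]]


# Made-up example documents.
corpus = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat and a dog played on the mat",
]
query = "the cat lay on the mat"

scores = similarity_scores(query, corpus)
best = max(range(len(scores)), key=scores.__getitem__)
worst = min(range(len(scores)), key=scores.__getitem__)
print(scores)
print("most similar:", best, "least similar:", worst)
```

Each score falls between 0 and 1. A document that shares no words with the comparison document scores exactly 0, which is how a result like the second entry in the output above can come about.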