
Cosine Similarity

In this method, we treat each word as a vector that encodes which letter appears at each position. We can then find the next most similar word using cosine similarity, which we learned in class: it measures the angle between two vectors in Euclidean space.

Image Source: Cosine similarity - Wikipedia
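
For reference, the standard definition of cosine similarity can be written out as below. The symbols A and B (the two vectors), n (their number of components), and θ (the angle between them) are notation chosen here for illustration, not names taken from our code.

```latex
\cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^{2}} \; \sqrt{\sum_{i=1}^{n} B_i^{2}}}
```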

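The resulting value is bounded between -1 and 1, because cosine is bounded on that interval. The closer the output is to 1, the more closely the two vectors point in the same direction. If the value is exactly 1, the vectors point in exactly the same direction, meaning they are the same up to a positive scale factor. If the value is 0, the vectors are perpendicular and have nothing in common, and a value of -1 would mean they point in opposite directions. When the vectors hold only non-negative values, such as letter counts, the result cannot drop below 0, so 0 is the least similar two vectors can be. Within our algorithms, we loop through every word to determine the word that is most similar to the frequency analysis we performed on the list. The words that it recommends contain the most commonly occurring letters in the most frequent locations.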
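
The loop described above might look roughly like the following minimal sketch. It is not our exact implementation: the one-hot encoding (one slot per position and letter pair), the helper names `encode`, `frequency_vector`, and `best_guess`, and the five-word list are all assumptions made for illustration. It also assumes lowercase words of equal length.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def encode(word):
    """Assumed encoding: a one-hot vector with one slot per (position, letter) pair."""
    vec = [0.0] * (len(word) * len(ALPHABET))
    for pos, ch in enumerate(word):
        vec[pos * len(ALPHABET) + ALPHABET.index(ch)] = 1.0
    return vec

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def frequency_vector(words):
    """Count how often each letter occurs at each position across the word list."""
    vectors = [encode(w) for w in words]
    return [sum(column) for column in zip(*vectors)]

def best_guess(words):
    """Loop through every word and keep the one most similar to the frequency vector."""
    freq = frequency_vector(words)
    return max(words, key=lambda w: cosine_similarity(encode(w), freq))

# Hypothetical word list, used only to demonstrate the idea.
words = ["crane", "crate", "grate", "stare", "plumb"]
print(best_guess(words))  # prints "crate"
```

In this small list, "crate" wins because each of its letters sits in the most common (or tied for most common) slot for its position. Because every entry in these vectors is non-negative, the similarity scores here always fall between 0 and 1.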
