
Frequency Analysis

Letter Frequency

A key part of selecting a starting word is estimating the probability that the solution shares letters with your guess. Intuitively, we should pick words that contain the most common letters, such as s, t, r and the vowels. Following the same line of thought, we performed a letter frequency analysis on the list of possible guesses that the NYT uses, so that we could find its most common letters. We gathered this information using a combination of C++ and Matlab and graphed the results in the image below. Consistent with our intuition, the letters e, a, r, o and l appear most often in the list, so our first guess should include as many of these frequent letters as possible. While this is not frequency in the Fourier-domain sense, it still transforms the data into a different representation, which aligns with the methods covered in class.

letAnalysis.png

Source © Jackson Muller
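
The counting step behind a chart like this can be sketched in a few lines. The example below is a minimal illustration in Python, not the C++ and Matlab code used for the project. The function name `letter_frequencies` and the five-word `sample_words` list are hypothetical placeholders, and the sketch assumes that every occurrence of a letter is counted, including repeats within a word.

```python
from collections import Counter


def letter_frequencies(words):
    """Count how many times each letter appears across all words."""
    counts = Counter()
    for word in words:
        # Counter.update on a string adds one count per character.
        counts.update(word.lower())
    return counts


# Placeholder list for illustration only; this is NOT the NYT guess list.
sample_words = ["crane", "slate", "arose", "later", "alert"]

freqs = letter_frequencies(sample_words)

# Print the five most frequent letters with their counts.
for letter, count in freqs.most_common(5):
    print(letter, count)
```

Running the same function over the full guess list would produce the per-letter totals that a bar chart can then display.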

Position Analysis

Now that we know which letters should ideally be in our starting word, we need to find the best position for each of them, so that they form an allowed guess while still using the most common letters. We extended our C++ and Matlab code to also track the positions at which these letters occur. For each word, we built a vector that holds each letter together with its position, which lets us count how often a letter occurs in each of the 5 possible positions. There is further motivation beyond just finding potential anagrams of the most common letters. Some may think that a word like "soare" is an excellent guess, but an 'o' followed by an 'a' is relatively uncommon. Given that our strategy is to find the exact solution to the game, words like this would not be optimal even though they use frequent letters.
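
One way to check how common a pair such as 'o' followed by 'a' is would be to count adjacent letter pairs across the word list. The following is a minimal sketch in Python under that assumption, not the authors' implementation. The name `adjacent_pair_counts` and the `sample_words` list are hypothetical and chosen only for illustration.

```python
from collections import Counter


def adjacent_pair_counts(words):
    """Count every pair of neighbouring letters across all words."""
    counts = Counter()
    for word in words:
        w = word.lower()
        # zip(w, w[1:]) yields each letter with the letter that follows it.
        for first, second in zip(w, w[1:]):
            counts[first + second] += 1
    return counts


# Placeholder list for illustration only; this is NOT the NYT guess list.
sample_words = ["board", "roast", "crane", "slate", "arose"]

pairs = adjacent_pair_counts(sample_words)

# Look up how often 'o' is immediately followed by 'a' in the sample.
print(pairs["oa"])
```

Comparing the count for a pair against the counts of its individual letters gives a rough sense of whether that ordering is rare.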

With this motivation, we examined the position frequencies of some of the most common letters from the image above. The next image displays these letter positions.

posAnalysis.png

Source © Jackson Muller
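
The per-position tally described in this section can be sketched as follows. This is a minimal Python illustration rather than the project's C++ and Matlab code. The names `position_counts` and `WORD_LENGTH` and the `sample_words` list are hypothetical, and the sketch assumes that words of any other length are simply skipped.

```python
from collections import defaultdict

WORD_LENGTH = 5  # Wordle answers and guesses have five letters


def position_counts(words):
    """Map each letter to a list of counts, one entry per position."""
    counts = defaultdict(lambda: [0] * WORD_LENGTH)
    for word in words:
        w = word.lower()
        if len(w) != WORD_LENGTH:
            continue  # assumption: ignore anything that is not five letters
        for index, letter in enumerate(w):
            counts[letter][index] += 1
    return dict(counts)


# Placeholder list for illustration only; this is NOT the NYT guess list.
sample_words = ["crane", "slate", "arose", "later", "alert"]

positions = position_counts(sample_words)

# Each list reads left to right: first position through fifth position.
print("e:", positions["e"])
print("a:", positions["a"])
```

Each letter's list can be plotted as its own group of five bars, which makes it easy to see where in a word that letter tends to fall.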
