Background

Data Set

Over the lifetime of Wordle, there have been a variety of word lists. After the acquisition by the New York Times (NYT), the original lists of allowed guesses and possible solutions were modified to fit mainstream culture by removing inappropriate words and increasing the probability of uncommon words.

Throughout our project, we used three word lists of varying sizes to test the robustness of our algorithms and to tune them so they can recommend rare words that better narrow down the remaining list. The first two are the official NYT lists of allowed guesses and possible answers, which allow us to tune the algorithms closely to the official game. The third is a list of over 5,000 words that includes many words NYT does not allow, which opens up more words to our algorithms.

Our goal is for these algorithms to assist Wordle players, so we decided to use the official lists for the recommendations and the extensive list only for testing and optimization. These data sets can be found on our code page, where we identify each list individually.

Methodology

In our efforts to find the optimal starting word and a suggested progression of guesses, we implemented a variety of machine learning methods. This page provides the background for each of these methods so that the results of the analysis can be read alongside the methodology.

Shortcomings of our Algorithms

It is important to note the shortcomings of our algorithms, which may become apparent as you play our demo. Throughout our implementation, we decided to maintain Hard mode, which defines the rules and assumptions under which we play Wordle. Hard mode means that the next guess has to keep the correct letters in the right position. Guessing a "random" word that uses different letters to further narrow down the list isn't allowed. While the user can still guess one of these words, the algorithms will only generate recommendations that follow this rule.

Intuitively this makes sense until you arrive at a word that rhymes with many other words. In the image below, you will see a case like this. Because our algorithms follow Hard mode, every remaining candidate matches the known letters and is equally likely to be the answer, so the recommendations cannot separate them and it is up to the user to guess which word they think it is. A human would recognize this situation and make a guess that includes as many different letters as possible to eliminate several of the potential answers at once, but our algorithms will not suggest such a guess.
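The rhyming case can be made concrete with a small sketch. It is not our production code: the function name `is_hard_mode_valid`, the `known_greens` mapping, and the "-ight" word family are hypothetical choices made for illustration, and the check covers only correctly placed letters, matching the description of Hard mode given above.

```python
def is_hard_mode_valid(guess, known_greens):
    """Return True if the guess keeps every confirmed letter in its position.

    known_greens maps a zero-based position to the letter confirmed there.
    Assumption: only correctly placed letters are enforced, as described above.
    """
    return all(guess[pos] == letter for pos, letter in known_greens.items())


# Hypothetical situation: the last four letters are confirmed as "ight".
known_greens = {1: "i", 2: "g", 3: "h", 4: "t"}
candidates = ["light", "might", "night", "right", "sight", "tight", "fight"]

# Every rhyming candidate passes the Hard mode check...
valid = [word for word in candidates if is_hard_mode_valid(word, known_greens)]

# ...so with no further information each one is equally likely.
chance_each = 1 / len(valid)

# A word chosen only to test new letters is rejected under Hard mode.
elimination_guess_allowed = is_hard_mode_valid("storm", known_greens)

print(valid, chance_each, elimination_guess_allowed)
```

In this sketch all seven candidates remain valid, while a letter-testing word such as "storm" fails the check, which is why the recommendations cannot break the tie.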

We chose Hard mode because we wanted to make a helper, not a solver that finishes in the fewest possible guesses. Additionally, recommending guesses outside Hard mode would make the algorithm much more complicated.
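To give a sense of the extra work involved, the sketch below shows one way a recommender without the Hard mode restriction could compare guesses: by counting how many distinct feedback patterns a guess produces across the remaining candidates. This is an illustration under our own assumptions, not something our algorithms do; `feedback`, `distinct_outcomes`, and the example words are hypothetical names and data.

```python
from collections import Counter


def feedback(guess, answer):
    """Return Wordle-style feedback: 'g' green, 'y' yellow, '-' gray."""
    result = ["-"] * len(guess)
    remaining = Counter()  # answer letters not matched by a green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            remaining[a] += 1
    # Second pass: mark yellows, consuming each unmatched answer letter once.
    for i, g in enumerate(guess):
        if result[i] == "-" and remaining[g] > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)


def distinct_outcomes(guess, candidates):
    """Count the different feedback patterns a guess yields over the candidates."""
    return len({feedback(guess, answer) for answer in candidates})


# Hypothetical rhyming family with four confirmed letters.
candidates = ["light", "might", "night", "right", "sight", "tight", "fight"]

hard_mode_split = distinct_outcomes("night", candidates)  # allowed in Hard mode
free_split = distinct_outcomes("storm", candidates)       # not allowed in Hard mode

print(hard_mode_split, free_split)
```

A guess that splits the candidates into more groups rules out more words on average, but finding it means scoring every allowed guess against every remaining candidate at each turn, which is the added complexity we chose to avoid.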