Algorithm to choose random letters for word search game that allows many words to be spelled

Asked 15/2, 2010 at 18:36 Answered 16/2, 2010 at 0:52

java algorithm computer-science linguistics

I'm making a boggle-like word game. The user is given a grid of letters like this:

O V Z W X
S T A C K
Y R F L Q

The user picks out a word using any adjacent chains of letters, like the word "STACK" across the middle line. The letters used are then replaced by the machine e.g. (new letters in lowercase):

O V Z W X
z e x o p
Y R F L Q

Notice you can now spell "OVeRFLoW" by using the new letters. My problem is: What algorithm can I use to pick new letters that maximizes the number of long words the user can spell? I want the game to be fun and involve spelling e.g. 6 letter words sometimes but, if you pick bad letters, games involve the user just spelling 3 letter words and not getting a chance to find larger words.

For example:

You could just randomly pick new letters from the alphabet. This does not work well.
Likewise, I found picking randomly but using the letter frequencies from Scrabble didn't work well. This works better in Scrabble I think as you are less constrained about the order you use the letters in.
I tried having a set of lists, each representing one of the dies from the Boggle game, and each letter would be picked from a random die side (I also wonder whether I can legally use this data in a product). I didn't notice this working well. I imagine the Boggle dice sides were chosen in some sensible manner, but I cannot find how this was done.

Some ideas I've considered:

Make a table of how often letter pairs occur together in the dictionary. For the sake of argument, say E is seen next to A 30% of the time. When picking a new letter, I would randomly pick a letter based on the frequency of this letter occurring next to a randomly chosen adjacent letter on the grid. For example, if the neighboring letter was E, the new letter would be "A" 30% of the time. The should mean there are lots of decent pairs to use scattered around the map. I could maybe improve this by making probability tables of a letter occurring between two other letters.
Somehow do a search for what words can be spelt on the current grid, taking the new letters to be wildcards. I would then replace the wildcards with letters that allowed the biggest words to be spelt. I'm not sure how you would do this efficiently however.

Any other ideas are appreciated. I wonder if there is a common way to solve this problem and what other word games use.

Edit: Thanks for the great answers so far! I forgot to mention, I'm really aiming for low memory/cpu requirements if possible, I'm probably going to use the SOWPODS dictionary (about 250,000) and my grid will be able 6 x 6.

Rabat answered 15/2, 2010 at 18:36 Comment(1)

I like your idea of using letter juxtaposition probabilities. You could expand it further: for any given letter location, figure out the probability of each letter being adjacent to its immediately surrounding letters and average these probabilities into a single one, then pick a random letter using the averaged probabilities as weights. – Gorga 15/2, 2010 at 18:49

Here's a simple method:

Write a fast solver for the game using the same word list that the player will use. Generate say 100 different possible boards at random (using letter frequencies is probably a good idea here, but not essential). For each board calculate all the words that can be generated and score the board based on the number of words found or the count weighted by word length (i.e. the total sum of word lengths of all words found). Then just pick the best scoring board from the 100 possibilities and give that to the player.

Also instead of always picking the highest scoring board (i.e. the easiest board) you could have different score thresholds to make the game more difficult for experts.

Sandlin answered 15/2, 2010 at 18:48 Comment(4)

Thanks. This is probably the most bullet-proof idea in that you could e.g. guarantee (most of the time) that there will always be a certain number of large words to pick. My board will be 6x6 and using a trie takes too much memory so I'm not sure how I could use this efficiently though. – Rabat 15/2, 2010 at 19:38

Using a word prefix list (trie) is going to give the best performance if you have the memory. If you store the trie compressed you can probably create fit a full trie in a few MBs I'd guess. If not, you can still probably get a word prefix list of up to length 5 in memory, then switch to binary (or interpolated) search of the full word list to check for matches longer than 5. Alternatively... count the prefixes up to length 5 and assume that lots of small partial words gives a good chance of a long word without explicitly checking for the long words. – Sandlin 15/2, 2010 at 19:55

If you're daring you could use a DAWG that is stored in an array. There is an excellent video lecture from Stanford on that found here: youtube.com/watch?v=TJ8SkcUSdbU The short story is she managed to store 250,000 words in .32 MB – Yasminyasmine 15/2, 2010 at 20:32

If you can push this solving work to a separate thread, the solver can work ahead while the user is playing the game. This will give your solver more time to work. – Oribelle 16/2, 2010 at 14:29

A minor variation on the letter-pair approach: use the frequency of letter pairs in long words - say 6 letters or longer - since that's your objective. You could also develop a weighting that included all adjacent letters, not just a random one.

Karinkarina answered 15/2, 2010 at 18:49 Comment(1)

Nice about about using the 6 letter long words! I considered using trigrams (only consider the frequency of 3 letter pairs) but your idea sounds closer to what I really want. – Rabat 15/2, 2010 at 19:32

This wordgame I slapped up a while back, which behaves very similarly to what you describe, uses English frequency tables to select letters, but decides first whether to generate a vowel or consonant, allowing me to ensure a given rate of vowels on the board. This seems to work reasonably well.

Ulterior answered 15/2, 2010 at 18:50 Comment(3)

Thanks. What did you use for the vowel/consonant rate? My feelings is, in every local 2x2 grid, you should probably have at least one vowel. Otherwise, you might get 'trapped' groups of consonants in the corners that you cannot use in words. Did you use just use regular letter frequency tables and not e.g. paired letter frequencies? – Rabat 15/2, 2010 at 19:30

@Bobby: because the board mutates after each word, the player can "chip away" at clumps of difficult letters over time - one might think of that as part of the game strategy. The vowel/consonant rate is hardwired to 0.559 - I obtained that value and the letter frequencies by gathering stats on some ebooks I had lying around :) – Ulterior 15/2, 2010 at 19:42

OK, thanks. I've actually tested my game with the falling behavior but I've found players tend to ignore the bottom letters when the letters there aren't very good and they spend all their time at the top. I was thinking about letters falling in from all directions somehow. Or make it a requirement to dispose of old letters. Also, falling letters make it hard to e.g. fix the number of vowels in local grid positions. I might be over thinking this though. :) I would just quite like it if e.g. every grid had at least one long word in it so experts could show off. – Rabat 15/2, 2010 at 19:48

You should look up n-gramming, and Markovian Models.

Your first idea is very losely related to Markovian algorithms. Basically, if you have a large text corpus, say of 1000 words. What you can do is analyse each letter and create a table to know the probability of a certain letter following the current letter.

For example, I know that the letter Q from my 1000 words ( 4000 letters in total ) is used only 40 times. Then I calculate what probable letters follow using my markov hash table.

For example, QU happens 100% of the time so I know that should Q be randomly chosen by your application that I need to make sure that the letter U is also included. Then, the letter "I" is used 50% of the time, and "A" 25% of the times and "O" 25% of the time.

Its actually really complicated to explain and I bet there are other explainations out there which are much better then this.

But the idea is that given a legitmately large text corpus you can create a chain of X letters which are probably consistent with English language and thus should be easy for users to make words out of. You can choose to look forward on a value of n-gram, the highest the number the easier you could make your game. For example, an n-gram of two would probably make it very hard to create words over 6, but an n-gram of 4 would be very easy.

The Wikipedia explains it really badly, so I wouldn't follow that.

Take a look at this Markov generator:

http://www.haykranen.nl/projects/markov/demo/

Hydropic answered 15/2, 2010 at 18:55 Comment(3)

Thanks, sounds interesting. Could you elaborate a bit more about the n-gram of 4 idea? Would I e.g. pick an adjacent chain of 4 letters, say "C-H-A-N", near my random letter location, then ask a table to pick a letter that usually follows the 3 letters "CHAN" e.g. "G" as in "CHANGING"? – Rabat 15/2, 2010 at 19:36

I've always been scared of Markov chains. The main wiki article is confusing but this one is quite good: en.wikipedia.org/wiki/Examples_of_Markov_chains – Rabat 15/2, 2010 at 19:44

n-gramming is just where you break down something in N number of grams. For example, on a 1-gram the word Boggle is 1 gram B O G G L E 2-gram (Commonly called a bigram) It would be B BO OG GG GL LE E 3-gram (Commonly called a trigram) It would be B BO BOG OGG GGl GLE LE E On a 4-gram (Just called an n-gram) It would be B BO BOG BOGG OGGL GGLE GLE LE E You can see how if you use a markov chain with a particular n-gram you can group particular common occuring charachter sequences. Incidentally, as you increase the n-gram you will find the game becomes easier. – Hydropic 15/2, 2010 at 20:34

I do not know about a precanned algorithm for this, but...

There is a dictionary file in UNIX, and I imagine there is something similar available on other platforms (maybe even in the java libraries? - google it). Anyways, use the files the spell checker uses.

After they spell a word an it drops out, you have existing letters and blank spaces.

1) From each existing letter, go right, left, up, down (you will need to understand recursive algorithms). As long as the string you have built so far is found at the start of words or backwards from the end of words in the dictionary file, continue. When you come across a blank space, count the frequency of the letters you need next. Use the most frequent letters.

It will not guarantee a word as you have not checked the corresponding ending or beginning, but I think it would be much easier to implement than an exhaustive search and get pretty good results.

Onto answered 15/2, 2010 at 18:48 Comment(1)

Could you give a short example? I'm not sure how this would work. – Rabat 15/2, 2010 at 19:31

I think this will get you a step closer to your destination: http://en.wikipedia.org/wiki/Levenshtein_distance

Lian answered 15/2, 2010 at 18:48 Comment(0)

You might look at this Java implementation of the Jumble algorithm to find sets of letters that permute to multiple dictionary words:

$ java -jar dist/jumble.jar | sort -nr | head
11 Orang Ronga angor argon goran grano groan nagor orang organ rogan 
10 Elaps Lepas Pales lapse salep saple sepal slape spale speal 
9 ester estre reest reset steer stere stree terse tsere 
9 caret carte cater crate creat creta react recta trace 
9 Easter Eastre asteer easter reseat saeter seater staree teaser 
9 Canari Carian Crania acinar arnica canari carina crania narica 
8 leapt palet patel pelta petal plate pleat tepal 
8 laster lastre rastle relast resalt salter slater stelar 
8 Trias arist astir sitar stair stria tarsi tisar 
8 Trema armet mater metra ramet tamer terma trame 
...

Aftergrowth answered 16/2, 2010 at 0:52 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags