Scrabble lookup coding challenge!
Rup
Posted: Wed Jul 16, 2008 9:27 pm

Yeah, that's pretty neat.

From a performance point of view, though, I think all the string manipulation might hurt: I don't know if the runtime is smart enough to reuse the memory from the previous string when you chop off the first letter, and I'm worried it'll make a new object every time. How about this?
Code:
        public List<String> ScrabbleMatch(String p_strValidLetters)
        {
            char[] letters = p_strValidLetters.ToCharArray();
            char[] buffer = new char[letters.Length + 1];
            List<String> listWords = new List<String>();

            ScrabbleMatchInternal(ref listWords, ref buffer, 0, ref letters);
            return listWords;
        }

        protected void ScrabbleMatchInternal(
            ref List<String> listWords,
            ref char[] buffer,
            int nDepth,
            ref char[] letters)
        {
            if (CompleteWord)
            {
                listWords.Add(new String(buffer, 0, nDepth));
            }

            for (int i = 0; i < letters.Length; ++i)
            {
                if (letters[i] == '\0')
                {
                    // We've already tried this one - keep going
                    continue;
                }

                // Found a letter we haven't tried before
                char ch = letters[i];
                int nOffset = ch - 'a';
                TrieNode trieNodeChild = Nodes[nOffset];
                if (trieNodeChild != null)
                {
                    // We can continue with this letter
                    //
                    // Remove the letter from the list available to the
                    // child recursion, add it to the string so far and
                    // recurse
                    letters[i] = '\0';
                    buffer[nDepth] = ch;
                    trieNodeChild.ScrabbleMatchInternal(
                        ref listWords,
                        ref buffer,
                        nDepth + 1,
                        ref letters);

                    // Put the letter back
                    letters[i] = ch;
                }
                // else no words continue with that letter
            }
        }

Untested since I don't have your other classes, but it compiles OK with a stub TrieNode.

If that's any better I can see a few ways to improve it further. The running time still grows massively with the length of the original letter string and I don't see a way to improve that, but we can shrink the constant factor: rather than removing the used letter from the letters array, we could swap in the one at the back of the array and reduce the maximum index that we scan to (see the sketch below). Of course we'd then need to pass that maximum index around (or compute it every time as letters.Length - nDepth). That would also avoid the complete scan of the letters array at the final depth, when it's known to be empty.
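Something like this, maybe (untested like the above, same stub TrieNode assumed; the inner method becomes):
Code:
        protected void ScrabbleMatchInternal(
            ref List<String> listWords,
            ref char[] buffer,
            int nDepth,
            ref char[] letters)
        {
            if (CompleteWord)
            {
                listWords.Add(new String(buffer, 0, nDepth));
            }

            // Only the first nMax entries are still live at this depth
            int nMax = letters.Length - nDepth;
            for (int i = 0; i < nMax; ++i)
            {
                char ch = letters[i];
                TrieNode trieNodeChild = Nodes[ch - 'a'];
                if (trieNodeChild != null)
                {
                    // Swap the last live letter into slot i rather than
                    // blanking it; the child only scans the first
                    // nMax - 1 slots, so the used letter disappears
                    letters[i] = letters[nMax - 1];
                    buffer[nDepth] = ch;
                    trieNodeChild.ScrabbleMatchInternal(
                        ref listWords,
                        ref buffer,
                        nDepth + 1,
                        ref letters);

                    // Put the letter back
                    letters[i] = ch;
                }
            }
        }

No '\0' sentinel needed any more, and no wasted scan over dead slots.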

Also, rather than passing the three ref arguments, we could create a 'ScrabbleMatchInternal' state class which contains those three as members and put all that code in a Recurse method with a single parameter for the depth.
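As a rough sketch (untested; the class and member names are my own invention):
Code:
        // Bundles the shared recursion state so that only the depth
        // needs to be passed down each call
        class ScrabbleMatchState
        {
            public List<String> Words = new List<String>();
            public char[] Buffer;
            public char[] Letters;
        }

I'd put the Recurse(ScrabbleMatchState state, int nDepth) method on TrieNode rather than on the state class itself, since the recursion has to hop between trie nodes anyway.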

One more idea: if you sort the letters in p_strValidLetters ahead of time and remember the last letter you recursed with at each depth, you can trivially skip duplicate letters at that depth and avoid the word deduplication afterwards. I do also like Weeble's bit mask idea, but I can't see a good way to work that in here too :-/

I should probably try implementing some of these other ideas too but it's late and I can't test / time the complete code anyway. Hope some of this helps!

Edit: I'm thinking in C++, aren't I? You can likely strip all of the 'ref's from my program and it will still work: List and char[] are reference types, so the callee sees the same objects anyway.
colinvella
Posted: Thu Jul 17, 2008 8:26 am

I'm not sure about your ref'd parameters... these essentially translate to pointers to pointers to the objects, which is fine when you want a method to update your parameter with a reference to some new object. The multiple string construction is intentional: it's meant to generate all the letter permutations. From what I understand of your implementation, however, you're keeping track of the changes to the valid letters buffer at every recursion point, so it should be correct... very clever in fact :)

There's lots that can be done to improve the code, and most of the performance may be achieved by minimising object construction. I merely explored the concept of combining letter permutation with simultaneous trie navigation.

So, adopting the convenient textbook approach, I leave these optimisations as an exercise for the reader :P :P
Slyh
Posted: Thu Jul 17, 2008 9:22 am

colinvella wrote:
For the letters "eettaoinshrd" the query took 131ms and generated precisely 2633 words from "ad" to "tsar".

You do have a point that the chosen set was not a good representative. Would you say that the one you gave me is an average set or is it at the other extreme?

Maybe the chosen set is a bit of an extreme: it contains the most frequent letters of the English language (see Wikipedia).
You can test an average set of letters yourself by creating several sets of 10-17 random letters and running your algorithm against them.

The biggest problem with your algorithm is still the number of permutations you have to do. For 17 letters there are 17! = 355,687,428,096,000 possible permutations. No matter how many duplicate permutations you are able to eliminate -- which of course costs time as well -- the number of permutations you have to check against the trie is too high.
For smaller sets of letters your algorithm is good.

colinvella wrote:
The scrabble search is more naturally coded recursively due to the ongoing combo construction. The potential combinatorial explosion of this process is pruned extensively by navigating down the trie.

But you still have to do lots and lots of checks, even if you can discard thousands or millions of permutations at once.
Rup
Posted: Thu Jul 17, 2008 10:10 am

But that's not his algorithm. He starts at the top of the trie, and at each node he reaches he explores all the child trie nodes he can go to given the set of letters he has left at that point. Ignoring the multiple recursion for duplicate letters, the most he'll ever do is a single pass through every node in the trie.
Slyh
Posted: Thu Jul 17, 2008 10:18 am

Ooh. Ehm, OK, sorry. It seems I misunderstood the explanation. (I didn't look at the code.)
So, the algorithm performs well for a set of 17 letters?
colinvella
Posted: Thu Jul 17, 2008 10:32 am

Aptly put, Rup :)
What I was trying to say is that, without the trie navigation, the algorithm would simply degenerate into a combinatorial explosion; but since I only traverse down the trie of valid sub-words, this explosion never occurs and the worst case is simply a complete traversal of the tree (with possible repetitions due to repeated letters).

Anyone willing to suggest a set of 17 letters? I can give it a try... the proof of the pudding is in the eating, as they say.
colinvella
Posted: Thu Jul 17, 2008 10:42 am

OK... I fed the search the 26-letter non-repeating set "abcdefghijklmnopqrstuvwxyz".

Search time: 1528.8ms

Words matched: 50328, ranging from "ab" to "zythums"
Slyh
Posted: Thu Jul 17, 2008 10:59 am

Mmh... maybe I was not that wrong after all. :D

My (slightly) optimized algorithm performs the search in about 31ms with warmed-up code and 48ms with interpreted code for this set of letters.

For "eettaoinshrd" it takes 23ms/35ms to search.


colinvella
Posted: Thu Jul 17, 2008 11:08 am

Slyh, yours seems to scale much better for large sets, but seems a little slower for the smaller ones.

I did find a set that really killed mine: "eettaaooiinnsshhrrdd". The algo didn't terminate even after a minute! My algo suffers when there are repeated letters, as above. However, I did manage to improve it by cleaning up my intermediary lists of sub-words, and the search then terminated at about 5000ms... still slow.

Another disadvantage of my algo is that it doesn't start giving you matches immediately: it builds sub-words as substrings from the right until it recurses back up to the highest level. So there's no benefit in firing it off in a separate thread and letting it trickle words to the main thread.

edit: feeding "abcdefghijklmnopqrstuvwxyz" to the algo with sub-word list cleanup: 1388ms... a bit better, but not much.
Slyh
Posted: Thu Jul 17, 2008 11:59 am

The way my (original) algorithm works ensures that even in the (unrealistic) worst case the number of operations is at most "only" the total number of letters across all words in the word list. For our Scrabble list that is 2.7 million operations (= 2.7 million actual letters).
One operation consists of 3 array accesses, 1 comparison, 1 subtraction and 1 decrement (not including the for-loop). This is not much for a modern machine.

The actual number of operations needed for "abcdefghijklmnopqrstuvwxyz" is 1.6 million. For "xylophone" it's only 381,105 operations.

Most of the time is spent setting up the working copy of the comparison array for each word in the list anyway (the System.arraycopy() in the code).

I optimized my code so that this array is no longer needed: every word in the valid word list now has pre-calculated information associated with it which describes how many times each letter occurs in the word.
This uses a lot more memory but only improved performance by around 5% (warmed-up) to 25% (interpreted).
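In C# for consistency with the code above (my own sketch of the idea, untested; my actual implementation is in Java and differs in detail):
Code:
        // Pre-calculated per-word letter counts: a word is playable
        // if, for every letter, it needs no more copies than the rack
        // holds. Assumes lowercase a-z, like the trie code above.
        static int[] CountLetters(String word)
        {
            int[] counts = new int[26];
            foreach (char ch in word)
                counts[ch - 'a']++;
            return counts;
        }

        static bool CanBuild(int[] wordCounts, int[] rackCounts)
        {
            for (int i = 0; i < 26; ++i)
                if (wordCounts[i] > rackCounts[i])
                    return false;
            return true;
        }

A search is then one CanBuild call per word in the list, against letter counts computed once for the rack.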

Using the original Scrabble word list, a single search currently completes in 20-30ms (for warmed-up code). That holds for huge sets of valid letters as well.

My algorithm suffers as the number of valid words increases: adding each word 8 times to the list increases the run time by a factor of about 7.

Your algorithm, on the other hand, is really fast for small sets of letters and works better with bigger lists of valid words. (At least I would expect so.)

I'm still trying to come up with an algorithm that combines both ideas. Some kind of hash value for each valid word and a tree/graph containing the words, where only 26 (or so) checks have to be done for the current set of letters. Or something like that. :)
colinvella
Posted: Thu Jul 17, 2008 2:44 pm

That's a very good analysis... I didn't give it that much thought :) Now that I've had a chance to compare, your version is very effective and has the benefit of generating words immediately. You could, for example, sort the dictionary by word length and then alphabetically, so that the first results give you the shorter words, like a human would normally find them. I'm afraid I have to concede that your algo is superior, at least for the given application :)

For a single-word search, I don't think there's anything faster than a trie. You need no more than min(K, D) iterations, where K is the length of the word to search for and D is the depth of the tree, corresponding to the longest word in the dictionary. It is hence independent of the dictionary size. For a hash map solution, the hashing function alone would be O(K), but with a trie you can exit early, either because it becomes clear the word cannot exist, or thanks to collapsed branches, if implemented.
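For illustration, a lookup against the TrieNode shape from Rup's code above might look like this (my sketch, untested):
Code:
        // Single-word lookup: one step per letter, exiting early as
        // soon as a required child node is missing
        public bool Contains(String word)
        {
            TrieNode node = this;
            foreach (char ch in word)
            {
                node = node.Nodes[ch - 'a'];
                if (node == null)
                    return false; // no dictionary word starts this way
            }
            return node.CompleteWord;
        }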

Thanks for the debate guys... this was fun science! :)
Weeble
Posted: Fri Jul 18, 2008 6:04 am

I'm pretty sure you can avoid the explosion associated with the trie solution when you have duplicate letters in your input. The trick is to generate the permutations in a way that never creates duplicates: sort your input letters, and then any time you backtrack past a letter and look to add the next one, skip over all of its neighbours that are identical. Combined with the pruning, that should make a big difference. Still, your worst case is going to be worse than the raw brute-force one (where you have enough letters to make anything in the dictionary), because there's always going to be more overhead.
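Plugged into Rup's ScrabbleMatchInternal above, the loop might become something like this (my sketch, untested; it assumes the letters array is sorted ascending before the first call):
Code:
        // Skipping a letter equal to the last one tried at this depth
        // avoids re-exploring an identical subtree, so no duplicate
        // words are generated and no deduplication pass is needed
        char chPrev = '\0';
        for (int i = 0; i < letters.Length; ++i)
        {
            char ch = letters[i];
            if (ch == '\0' || ch == chPrev)
            {
                // Slot already used, or same letter as the one just tried
                continue;
            }

            TrieNode trieNodeChild = Nodes[ch - 'a'];
            if (trieNodeChild != null)
            {
                letters[i] = '\0';
                buffer[nDepth] = ch;
                trieNodeChild.ScrabbleMatchInternal(
                    ref listWords, ref buffer, nDepth + 1, ref letters);
                letters[i] = ch;
            }
            chPrev = ch;
        }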
colinvella
Posted: Fri Jul 18, 2008 11:34 am

I think that would work and should be relatively easy to implement. I'll see if I can try it out...

OK... for "eettaoinshrdeettaoinshrd" the search takes 78ms,

and for "eettaoinshrd", 15.6ms.

A definite improvement :)

[edit] revised timings after I realised I could remove the duplicate-list cleanup code :)
taniyan
Posted: Sun Aug 23, 2009 9:30 am

What is a good dictionary to use while playing Scrabble? I've begun playing Scrabble recently (I love it) and I was wondering what a good dictionary to use would be. Would a regular dictionary work, or would a "Scrabble Dictionary" be better? Hope someone can help!


Poo Bear
Posted: Mon Aug 24, 2009 8:35 pm

There are some links on the appropriate Wikipedia page that discuss "official" dictionaries, but those are for tournaments:
http://en.wikipedia.org/wiki/Official_Scrabble_Players_Dictionary


Amazon do an official Scrabble dictionary too:
http://www.amazon.co.uk/Collins-Official-Scrabble-Dictionary/dp/0007259085

There are also many websites with online word checkers/suggesters; see Google.