check for word in dictionary
I'm building a Scrabble-like word game and loading a text file containing a list of 80,000 words.
It seems to load only part of the list rather than the whole thing: when I try to access a value past index 1000 I get an error. Is there a better way to do this?
var wordCSV : TextAsset;
var wordArray = wordCSV.text.Split(";"[0]);
var wordList = new HashSet.< String >(wordArray);
if (wordList.Contains(wordString))
{
Debug.Log (wordString + " is a WORD!!");
}
else
{
Debug.Log (wordString + " not in dictionary");
}
I'm working on something similar!
Simple answer: I got it working with 100k words by loading them into a List and performing a Contains(queryWord) call. But this is slowwww (it scans the list linearly), so if you need speed, see the next answer. What's the error you're getting?
Here's the complicated way. Load your text file into a list. Then add each word to a Trie data structure (an n-ary tree where every node is a letter). This incurs a pre-processing hit, but gives nearly instantaneous look-ups when the player tries to make a match.
Hope this helps!
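For anyone wondering what the Trie described above might look like, here is a minimal sketch in plain C#. The names (WordTrie, Add, Contains) are made up for illustration and are not from any Unity or .NET API; it assumes the words have already been split into individual strings.

```csharp
using System.Collections.Generic;

// Hypothetical illustration only: WordTrie, Add and Contains are made-up names.
public class WordTrie
{
    private class Node
    {
        // One child per letter that can follow this prefix.
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        // True if some word ends exactly at this node.
        public bool IsWord;
    }

    private readonly Node root = new Node();

    // Walk one node per letter, creating missing nodes, then mark the word end.
    public void Add(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            Node next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsWord = true;
    }

    // Follow the letters; fail as soon as a letter has no matching child.
    public bool Contains(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out node))
                return false;
        }
        return node.IsWord;
    }
}
```

Usage would be something like calling Add for every word at load time, then Contains(playerWord) during play. The IsWord flag is what keeps a prefix such as "ca" from matching just because "cat" is in the list.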
It's nothing to do with the HashSet: you've got that right and it's the best method. It must be something to do with the inbound data not parsing properly, I'd have thought.
BTW HashSets have a 2GB memory limit (like every other .NET object).
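If the inbound data is the suspect, a more forgiving parse may help rule that out. This is only a sketch in plain C#, assuming the words are separated by semicolons and/or line breaks; WordListParser and Parse are hypothetical names, and the case-insensitive comparer is my assumption about what a word game wants.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the class and method names are for illustration only.
public static class WordListParser
{
    public static HashSet<string> Parse(string text)
    {
        // Assumption: words are separated by ';' and/or line breaks.
        var separators = new[] { ';', '\n', '\r' };
        // Assumption: lookups should ignore case.
        var words = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        foreach (string raw in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // Strip stray spaces so " cherry " and "cherry" are the same entry.
            string word = raw.Trim();
            if (word.Length > 0)
                words.Add(word);
        }
        return words;
    }
}
```

In Unity you would pass in the text of your TextAsset (for example wordCSV.text) and keep the returned set around. Comparing the set's Count with the number of words you expect is a quick way to spot a file that did not parse the way you thought.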
Thanks, it wasn't the HashSet. There was a problem with the text file!
@whydoidoit This is my code. I want to check whether a word exists in my text file. I have linked the file to the TextAsset variable. The code above does not work in MonoDevelop. I am doing all this for Unity.
using UnityEngine;
using System.Collections;
public class textdisplay1 : DestrroyOnClick {
public TextAsset asset;
public int searchWord (string a) {
return 1;
}
}
Answer by Jon Petersen · Apr 09, 2013 at 03:54 AM
I use something similar to the following... I haven't noticed any slowdown at all, and my dictionary is close to 200k words.
import System.Collections.Generic;
import System.Linq;
var myDictionary : Dictionary.<String, String>;
var wordToCheck : String;
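function Awake()
{
// Load Resources/dictionary (a text file with one word per line).
var MytextAsset = Resources.Load("dictionary", typeof(TextAsset)) as TextAsset;
// Note: splitting on "\n" leaves a trailing "\r" on each word if the file uses Windows line endings,
// and ToDictionary throws if the file contains duplicate words.
myDictionary = MytextAsset.text.Split("\n"[0]).ToDictionary(function(w){return w;});
}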
function checkWord(word : String)
{
return myDictionary.ContainsKey(word);
}
function Update()
{
// Update runs every frame, so this logs the result once per frame.
if(checkWord(wordToCheck))
Debug.Log(wordToCheck + " is a valid word");
else
Debug.Log(wordToCheck + " is NOT a valid word");
}
Dead right, a Dictionary or HashSet is the right way to go. Note that if you only need lookups, the HashSet uses less memory (and the same algorithm).
Answer by Dracorat · Apr 09, 2013 at 02:32 AM
If your dictionary is sorted (and it should be) you shouldn't be loading the whole thing into memory. You should access the file and move the file pointer halfway in. Check whether your word comes before or after that point. If before, move the file pointer halfway back and check again. If after, move halfway forward and check again. Keep halving like this until your word is found directly in the file. AT MOST, this will require ONLY 16 such searches in a file of 50,000 words to find the right one, and it will execute blazingly fast with a minimal memory footprint.
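A rough sketch of that halving search in plain C#, for what it's worth. It assumes the file has one word per line, uses single-byte (ASCII) characters, and is sorted in ordinal order; SortedFileSearch and its methods are hypothetical names. Because lines differ in length, each probe jumps to a byte offset and then skips forward to the start of the next line.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sketch: assumes one ASCII word per line, sorted in ordinal order.
public static class SortedFileSearch
{
    // Reads from the current position up to and including the next '\n' (or end of stream).
    private static string ReadLine(Stream s)
    {
        var sb = new StringBuilder();
        int b;
        while ((b = s.ReadByte()) != -1 && b != '\n')
        {
            if (b != '\r') sb.Append((char)b); // ignore Windows carriage returns
        }
        return sb.ToString();
    }

    public static bool Contains(Stream s, string word)
    {
        long lo = 0;         // always the start of a line
        long hi = s.Length;  // a matching line, if any, starts in [lo, hi)
        while (lo < hi)
        {
            long mid = lo + (hi - lo) / 2;
            long lineStart = lo;
            if (mid > lo)
            {
                // Jump into the middle, then skip to the first line starting at or after mid.
                s.Seek(mid - 1, SeekOrigin.Begin);
                ReadLine(s);
                lineStart = s.Position;
            }
            if (lineStart >= hi)
            {
                hi = mid; // no line starts in [mid, hi), so keep only the first half
                continue;
            }
            s.Seek(lineStart, SeekOrigin.Begin);
            string line = ReadLine(s);
            int cmp = string.CompareOrdinal(line, word);
            if (cmp == 0) return true;
            if (cmp < 0) lo = s.Position; // word sorts after this line
            else hi = lineStart;          // word sorts before this line
        }
        return false;
    }
}
```

You would call it with a seekable stream, for example a FileStream opened on the word list. Keep in mind that the text of a Unity TextAsset is already in memory, so this approach only saves memory if the list is read from a plain file instead.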
Binary search. I don't think that's faster than a Trie stored in memory. You're still accessing data from disk, which is relatively slow. But I'd be happy to be wrong because that's quite easy to implement! At least you save memory.
A Trie wouldn't be bad either (we answered at the same time). But it requires a lot more memory comparatively.
Agreed. BUT
Binary search operates at O(log(n)); Trie lookup operates at O(m), where m is the length of the lookup term.
So log(80k) vs. like 15 or 20... Much faster. Plus it's in memory.
All depends on if memory or speed is more important. And a Trie is harder to implement. You win there.... :P
200k words, which are the absolute reason for the game, should definitely all be loaded into memory. I mean, how much can it take? 2-4MB only, right?
A HashSet or Dictionary lookup is an O(1) operation on average; nothing is faster than that (apart from a better-implemented O(1)).
I know this is a dead post, but I didn't find anything similar in another post, and I have some questions as I am new to C#. I saw the answer but I don't understand it exactly. How am I supposed to initialize the text file with the words in C#, and how do you compare to see if the word is in it? I tried dictionary.Contains(string); is that the same as checkWord(wordToCheck)? Also, I don't get how I should put the words in the text asset: one word, then the next, and so on? Thanks for the help, and sorry for any inconvenience. @mthicke2