Comparing words in dictionaries
So I've got a text file with most of the English words in it, and I'm putting it into a dictionary (called 'words') like so:
words = textFile.text.Split('\n').Select(s=>s.Trim()).Distinct().ToDictionary(s=>s,s=>true);
What I'm trying to do is take a Chosen Word and look in the dictionary for a word LIKE the Chosen Word. I capitalize 'LIKE' because I'm not looking to check whether the Chosen Word is in the text file; what I want to do is find a word that is similar.
For example, if the Chosen Word was 'Cake' then the Dictionary might return 'Case'. Basically it should check that the first few letters are similar, and perhaps that it shares some other common letters, BUT it should not be the exact same word.
I should admit, I'm not that knowledgeable about dictionaries, so hopefully this isn't a dumb question and feel free to point me towards any relevant resources.
So, in conclusion, I want to figure out how to give the dictionary a random word and for it to return a similar (BUT not identical) word.
Answer by whydoidoit · Feb 28, 2014 at 05:57 AM
OK, for that you are not going to want to use a dictionary exactly the way you are. Unfortunately, I fear a general "similar word" search can't be done in better than O(N) time, i.e. without checking every word in the list, unless you add a restriction such as "the first letter must be the same" or "the words must sound alike".
So you could do a "sounds like" match using Soundex, which might be what you are after.
Let's say you do that and write a Soundex(someString) function - what you would then do is:
Create a Dictionary of Lists of words with the same soundex value
When looking up, calculate the Soundex of the word you are searching for - then iterate that list from the dictionary ignoring the word itself and choose a random value.
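For reference, here is a minimal sketch of what such a Soundex(someString) function could look like. It follows the common American Soundex rules (keep the first letter, code the following consonants as digits, pad to four characters); the class name SoundexSketch is just a placeholder, and it assumes plain A-Z input, silently skipping anything else.

```csharp
using System;
using System.Text;

public static class SoundexSketch
{
    // Digit for each letter A..Z; '0' marks letters that get no digit (vowels, H, W, Y).
    private const string Codes = "01230120022455012623010202";

    public static string Soundex(string word)
    {
        if (string.IsNullOrEmpty(word)) return "";

        var result = new StringBuilder();
        char lastCode = '0';

        foreach (char raw in word.ToUpperInvariant())
        {
            if (raw < 'A' || raw > 'Z') continue;   // assumption: ignore non A-Z characters

            char code = Codes[raw - 'A'];

            if (result.Length == 0)
            {
                // The first letter is kept as-is, but its code still suppresses a repeat.
                result.Append(raw);
                lastCode = code;
                continue;
            }

            // H and W do not separate two letters with the same code.
            if (raw == 'H' || raw == 'W') continue;

            // Append a digit only if it differs from the previous letter's code.
            if (code != '0' && code != lastCode) result.Append(code);
            lastCode = code;

            if (result.Length == 4) break;
        }

        // Pad short codes with zeros so every non-empty result is 4 characters long.
        return result.Length == 0 ? "" : result.ToString().PadRight(4, '0');
    }
}
```

With this sketch, "Cake" and "Case" both come out as C200, so they would land in the same list.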
Wow, never heard of Soundex before but that's a brilliant solution, thanks. Still trying to wrap my head around it, but it sounds like it'll work.
Yeah, it's used in data deduplication to catch names that people have typed wrong.
You would end up with something like this as a Linq function:
Dictionary<string, List<string>> lookup = someStringArray.GroupBy(s=>Soundex(s)).ToDictionary(s=>s.Key, s=>s.ToList());
And here's an implementation of Soundex in C#: http://seesharpdeveloper.blogspot.sg/2013/07/soundex-algorithm-in-c.html
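Putting the pieces together, the lookup and the "pick a random similar word" step might be sketched like this. The names SimilarWords, BuildLookup and PickSimilar are made up for illustration, and the key function is passed in as a parameter (keyOf) so the block stands alone; in practice you would pass your Soundex function there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimilarWords
{
    // Group every word under its key (e.g. its Soundex code).
    public static Dictionary<string, List<string>> BuildLookup(
        IEnumerable<string> words, Func<string, string> keyOf)
    {
        return words.GroupBy(keyOf).ToDictionary(g => g.Key, g => g.ToList());
    }

    // Return a random word that shares the chosen word's key but is not the
    // chosen word itself, or null if no such word exists.
    public static string PickSimilar(
        Dictionary<string, List<string>> lookup, string chosen,
        Func<string, string> keyOf, System.Random rng)
    {
        List<string> bucket;
        if (!lookup.TryGetValue(keyOf(chosen), out bucket)) return null;

        var candidates = bucket
            .Where(w => !string.Equals(w, chosen, StringComparison.OrdinalIgnoreCase))
            .ToList();

        return candidates.Count == 0 ? null : candidates[rng.Next(candidates.Count)];
    }
}
```

System.Random is spelled out in full because a script with "using UnityEngine;" would otherwise hit an ambiguity with UnityEngine.Random. The dictionary lookup itself is fast; only the one matching list gets scanned.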
If I was to limit it by saying the first two letters of the word must be the same as the word replacing it, how would I start off on that?
Just make your soundex function use the first character and then soundex the substring of everything but the first character.
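As a rough sketch of that idea, generalized to the first two letters asked about above: keep the prefix verbatim and append the Soundex of the rest, then use the result as the dictionary key. PrefixKey.Make and its parameters are hypothetical names, and the soundex argument stands in for whatever Soundex function you already have.

```csharp
using System;

public static class PrefixKey
{
    // Key = literal prefix + "|" + sound code of the remainder, so two words only
    // share a key if they start with the same letters AND the rest sounds alike.
    public static string Make(string word, int prefixLength, Func<string, string> soundex)
    {
        if (string.IsNullOrEmpty(word)) return "";

        // Words shorter than the prefix are used whole; the remainder is then empty.
        int n = Math.Min(prefixLength, word.Length);
        string prefix = word.Substring(0, n).ToLowerInvariant();

        return prefix + "|" + soundex(word.Substring(n));
    }
}
```

Grouping by s => PrefixKey.Make(s, 2, Soundex) instead of s => Soundex(s) would then restrict matches to words with the same first two letters.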