Finding phrases within strings
Hello! I am working on a project that does a lot with strings - something I don't have much experience with. I currently have a chunk of code for a function that looks like this:
void checkWordBanks(string submittedSentence)
{
    string[] words = submittedSentence.Split(' ');
    foreach (string str in words)
    {
        if (badWords.Contains(str))
        {
            Debug.Log("A bad word was found");
        }
        if (goodWords.Contains(str))
        {
            Debug.Log("A good word was found");
        }
    }
}
Basically, the player types something into the game, and when they hit enter, this function runs with the player's text as its input. This function works great! It takes the sentence(s) the player types in, splits them into an array of individual words, then compares that array to two separate arrays, one containing a list of good words and one containing a list of bad words. It does just what I want. The issue is that it only works for individual words, and I also need it to check for specific phrases that the player may type in.
I have a good idea of what I want it to do, but I can't seem to get it written correctly. I want to make a separate array of phrases that the game knows (just like the arrays of good and bad words the game knows). I want to check whether the player's string contains any phrase from that array, then add that phrase to the "words" array before comparing the words array to the good and bad word arrays. I just don't know how to go about that! How can I access the phrase that the player typed once it is found in the array?
I know my issue is very specific, but any help is greatly appreciated!
Answer by karlhulme · Oct 30, 2013 at 08:38 AM
Hi Sussy,
If I've understood you correctly, you're after something like this:
// requires "using System.Collections.Generic;" at the top of the file for List<string>
string[] phrases = new string[] { "phrase number one", "phrase number two" };
List<string> foundPhrases = new List<string>();

public string CheckPhraseBanks(string submittedSentence)
{
    // consider each phrase in turn
    foreach (string phrase in phrases)
    {
        // use IndexOf instead of Contains so we know where the phrase starts
        int index = submittedSentence.IndexOf(phrase);
        if (index > -1)
        {
            // remember the phrase we found
            foundPhrases.Add(phrase);
            // suck the phrase out of the submittedSentence
            submittedSentence = submittedSentence.Remove(index, phrase.Length);
        }
    }
    // return whatever is left of the submitted sentence
    // so you can still check for individual words
    return submittedSentence;
}

// declared void (as in the question) because nothing is returned here
public void CheckWordBanks(string submittedSentence)
{
    // strip out known phrases first; they are recorded in foundPhrases
    submittedSentence = CheckPhraseBanks(submittedSentence);
    // continue original processing here
}
The idea of this code is to check the submitted sentence for any phrases, and in doing so, remove those phrases from the string. This means that you can then search for individual words knowing that phrases are already removed. There's a big caveat with this approach though. If you have phrases that overlap, for example 'eat toast' and 'now eat toast', it becomes important to check them in the right order. You may be able to get away with ordering the phrases by length and checking the longest first.
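To illustrate the "longest first" idea, here is a minimal sketch, not a drop-in replacement. The class name PhraseFinder, the method FindPhrases, and the out parameter are hypothetical names chosen for this example, and the case-insensitive comparison is an assumption about what the game wants. It sorts the phrase bank by length, removes every occurrence it finds, and hands back what is left for the single-word check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhraseFinder
{
    // Hypothetical helper: returns the phrases found in 'sentence' and puts
    // the leftover text (phrases removed) into 'remainder'.
    public static List<string> FindPhrases(
        string sentence, IEnumerable<string> phrases, out string remainder)
    {
        var found = new List<string>();
        remainder = sentence;

        // Longest phrases first, so "now eat toast" is tried before "eat toast".
        // Empty entries are skipped; they would match everywhere and never shrink the text.
        var ordered = phrases
            .Where(p => !string.IsNullOrEmpty(p))
            .OrderByDescending(p => p.Length);

        foreach (string phrase in ordered)
        {
            // Assumption: player input should match regardless of capitalisation.
            int index = remainder.IndexOf(phrase, StringComparison.OrdinalIgnoreCase);
            while (index > -1)
            {
                found.Add(phrase);
                remainder = remainder.Remove(index, phrase.Length);
                index = remainder.IndexOf(phrase, StringComparison.OrdinalIgnoreCase);
            }
        }
        return found;
    }
}
```

Calling FindPhrases("I now eat toast daily", new[] { "eat toast", "now eat toast" }, out rest) should report only "now eat toast". Note that removing a phrase leaves its surrounding spaces behind, so a later Split(' ') can produce empty entries; those are harmless when the entries are only compared against the word banks.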
Alternatively, you may want to look into the System.Text.RegularExpressions namespace. This provides a more succinct way of searching text for particular expressions and may result in a more elegant approach. However, see @fafase's note below regarding readability and debugging issues with RegEx.
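For anyone curious what the regular expression route might look like, here is a rough sketch under a few assumptions: the class name RegexPhraseFinder and method FindPhrases are made up for this example, matching is case-insensitive, and every phrase starts and ends with a letter or digit (the \b word-boundary anchors behave differently around punctuation). It builds one pattern from the whole phrase bank using Regex.Escape and Regex.Matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexPhraseFinder
{
    // Hypothetical helper: returns every known phrase that appears in 'sentence'
    // as whole words, in the order the player typed them.
    public static List<string> FindPhrases(string sentence, IEnumerable<string> phrases)
    {
        // Escape each phrase so characters such as '.' or '?' are matched literally.
        // Longer phrases go first because alternation tries options left to right.
        string[] escaped = phrases
            .Where(p => !string.IsNullOrEmpty(p))
            .OrderByDescending(p => p.Length)
            .Select(p => Regex.Escape(p))
            .ToArray();

        var found = new List<string>();
        if (escaped.Length == 0)
        {
            return found;
        }

        // \b ... \b only matches at word boundaries, so "dog" will not match "doghouse".
        string pattern = @"\b(?:" + string.Join("|", escaped) + @")\b";

        foreach (Match match in Regex.Matches(sentence, pattern, RegexOptions.IgnoreCase))
        {
            // match.Value is the text exactly as the player typed it.
            found.Add(match.Value);
        }
        return found;
    }
}
```

Matches never overlap, so once "now eat toast" has been matched, the "eat toast" inside it is not reported a second time.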
RegEx are real good...when you can read them...and debugging is a pain for newbies. Since he claims not to be confident with string, I would leave regex aside for now.
Your answer is fine, and I feel the caveat you mentioned could be avoided.
submittedSentence = submittedSentence.Remove(index, phrase.Length);
This line replaces the existing string with a new one. Why not store the new one in a new variable, so that you keep the original and can still use it for other searches?
I understand what you're saying about regex. I did add/remove it a few times actually. In the end I decided that it's easy to ignore stuff you don't understand (as long as it's marked as optional in some way) but it's impossible to get awareness of stuff that's not mentioned. Maybe someone else will have a string-related problem, find this, and see that regex may be a better fit for their problem?
That 'Remove' line was intended to ensure that you don't find overlapping phrases, like the toast-based example. It's possible he's very happy with overlapping phrases, if so, he just has to delete code - which I figure is easier than if I'd given him code that accepts overlapping phrases and then he doesn't want them and has to work out how to do it.
Most importantly I'm guessing the upvote came from you. If so, thanks very much!
Answer by GetColor · Nov 03, 2013 at 02:37 AM
Thank you so much for the replies! I think I may have confused some people about what I wanted to do, as some of the information posted went over my head. I found a solution by using the info here and doing some more research on the string functions themselves.
The issue was that I had these string arrays (a good list of words and a bad list of words) taken line by line from a txt file. The player typed something in, which was saved as a string, and that string was split on spaces so that each individual word became a separate item in an array. Then I compared the player's word array to the two other arrays to see if bad or good words were typed. This made it impossible for the game to recognize multiple-word phrases, which was needed. (I understood why; I was just trying to find a workaround.)
What I did instead was compare each entry in the two word arrays against the submitted player string as a whole, without converting it to an array of words. This solved my issue. The only minor issue with this method is that, say dog is a "good" word. If someone types doghouse, it will trigger as if just dog was entered. I feel like the solution to this would be to simply add a space character after each entry in the word banks. So instead of "dog" it would say "dog ". I haven't tried this yet, as I just thought of it while typing this response!
If you'd like to see my code, it looks like this:
string words = submittedSentence;
foreach (string str in badWords)
{
    if (words.Contains(str))
    {
        Debug.Log("a bad word was found!");
    }
}
foreach (string str in goodWords)
{
    if (words.Contains(str))
    {
        Debug.Log("a good word was found!");
    }
}
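On the "doghouse" false positive: appending a space to each entry would miss "dog" at the very end of the input or right before punctuation, and "dog " would still match inside "bulldog ". One possible alternative, sketched here without regex, is to check the characters on either side of each hit. The class name WordBankMatcher and method ContainsWholeTerm are hypothetical, and treating any non-letter, non-digit character as a boundary is an assumption about how the word banks are written.

```csharp
using System;

public static class WordBankMatcher
{
    // Hypothetical helper: true if 'term' (a word or a multi-word phrase) occurs in
    // 'text' with no letter or digit directly before or after it.
    public static bool ContainsWholeTerm(string text, string term)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(term))
        {
            return false;
        }

        int index = text.IndexOf(term, StringComparison.OrdinalIgnoreCase);
        while (index > -1)
        {
            int end = index + term.Length;

            // A hit counts only if it starts at the beginning of the text or after a
            // non-alphanumeric character, and ends at the end of the text or before one.
            bool startOk = index == 0 || !char.IsLetterOrDigit(text[index - 1]);
            bool endOk = end == text.Length || !char.IsLetterOrDigit(text[end]);
            if (startOk && endOk)
            {
                return true;
            }

            // This hit was part of a longer word; keep searching after it.
            index = text.IndexOf(term, index + 1, StringComparison.OrdinalIgnoreCase);
        }
        return false;
    }
}
```

In the loops above, words.Contains(str) could then become WordBankMatcher.ContainsWholeTerm(words, str), which still handles multi-word phrases because the term is searched for in the whole string rather than in a split array.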