Question by entity476 · Feb 10, 2015 at 11:22 PM · string, replace, regular expressions

SOLVED Exclude set of keywords from string (using Regex)

Hello community!

I am currently using this piece of code

 import System.Text.RegularExpressions;

 var theString: String = "word1 and word3 in the word6 blah blah";
 var editString : String = Regex.Replace(theString, "( and )+", " ");
 editString = Regex.Replace(editString, "( in )+", " ");
 editString = Regex.Replace(editString, "( the )+", " ");
     
 editString = Regex.Replace(editString, "( )+", " ");

in order to exclude some common words from a string, which I then split at the white spaces to get an array of the words. Despite the different syntaxes I tried (and the research I've done), I couldn't figure out how to combine the above "Replace" lines (at least the first three) into one. Is it possible? As the code sample indicates, I am using Unity Javascript and the System.Text.RegularExpressions namespace. Suggestions to optimize the method in general are of course welcome.

EDIT: Just noticed that the ( keyword )+ method will replace only the first match, so please let me know how I would be able to replace all the matches in the string.
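Editor's note: one likely explanation for this observation is that the spaces are part of each match. A regex replace normally handles every non-overlapping match, but when two excluded words are adjacent they share a single space, and the first match consumes it. The sketch below is in Python rather than UnityScript, and the sample string is made up for illustration; it shows the effect and how a zero-width word boundary ("\b") avoids it.

```python
import re

text = "a the the b"  # hypothetical input with two adjacent stop words

# The first match " the " consumes the space after the first "the",
# so the second "the" has no leading space left to match against.
space_delimited = re.sub(r"( the )+", " ", text)

# \b is a zero-width word boundary, so adjacent matches do not compete
# for the same space character.
boundary_based = re.sub(r"\bthe\b", "", text)
boundary_based = re.sub(r" +", " ", boundary_based)  # collapse leftover spaces

print(space_delimited)  # one "the" survives
print(boundary_based)   # both are removed
```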

EDIT2: My actual goal is to create a keyword search method, where from a string input, which represents several words, I get every single keyword in a different string (so let's say an array of the substring keywords), excluding some predefined terms. I don't necessarily want to use regular expressions, but I thought it would be the most straightforward way to do it. I am now thinking that I might create the array with ALL the keyword substrings first and then edit this array to remove unwanted inputs...I'll give it a try, but if a regex could do the job, I'd be pleased to learn something new! :)
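Editor's note: the split-first idea described in EDIT2 can be sketched without any regular expression. This is a minimal Python illustration, not the poster's UnityScript; the names STOP_WORDS and extract_keywords are hypothetical, and the stop-word list simply reuses the three words from the question.

```python
# Hypothetical stop-word list, taken from the three words in the question.
STOP_WORDS = {"and", "in", "the"}

def extract_keywords(text, stop_words=STOP_WORDS):
    """Split on whitespace and drop any word found in stop_words."""
    # str.split() with no argument splits on runs of whitespace,
    # so repeated spaces do not produce empty entries.
    return [word for word in text.split() if word.lower() not in stop_words]

keywords = extract_keywords("word1 and word3 in the word6 blah blah")
print(keywords)
```

Using a set for the stop words keeps each membership check cheap, and lowercasing the word before the check makes the filter case-insensitive.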

FINAL EDIT: Note that UnityScript will not accept a single backslash in the pattern string, so it needs a second backslash. Similarly, for example, "\\n" would be needed instead of "\n" to represent the line break character in a pattern.
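Editor's note: the doubled backslash exists because the pattern passes through two layers. The string literal is processed first, and only then does the regex engine see the result. The following Python sketch (not UnityScript, whose compiler rejects the unknown escape outright, as reported in the comments) illustrates the same two layers.

```python
import re

escaped = "\\W"  # doubled backslash in an ordinary string literal
raw = r"\W"      # raw string: the literal leaves backslashes untouched

# Both spell the same two-character pattern: a backslash followed by W.
same_pattern = escaped == raw
pattern_length = len(escaped)

# "\\n" is a backslash plus n (two characters); "\n" is one newline character.
two_chars = len("\\n")
one_char = len("\n")

# The regex engine then reads backslash-W as "any non-word character".
result = re.sub("\\W+", " ", "word1, word3; word6")
print(result)
```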

1 Reply

Best Answer

Answer by Alanisaac · Feb 11, 2015 at 01:39 AM

I think I've got this right, but I'm by no means a regex expert, and I also am not sure on what exactly you want.

I do know that you can combine different regex alternatives in a group using the alternation operator, "|". I also think you might want to use something other than just the space character to represent word boundaries. "\W" matches any non-word character. On top of that, one of the words might start or end the string, which is where the start and end anchors ("^" and "$") come in. And what if the words appear in sequence ("...in the...")? You might want to also match a single space character in your sequence. "+" gets you one or more of the preceding expression, which will be useful for all of these elements. Take that all together, and you get something like:

 (\W|^)+(and|in|the| )+(\W|$)+

Check out this link, where I tested the Regex out.

In implementation, the backslash "\" character is actually the escape character. So you'll need to escape it by using a second "\" character, such as the following:

 var editKeyword : String = Regex.Replace(newKeyword, "(\\W|^)+(and|in|the| )+(\\W|$)+", " ");

Edit: Revised to include the final scripted version.
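Editor's note: an alternative to matching the surrounding non-word characters is the zero-width word boundary "\b", which consumes nothing and so leaves neighbouring words untouched. The sketch below is Python, not the answer's UnityScript; the helper names build_stop_word_pattern and strip_stop_words are hypothetical. .NET's Regex class also supports "\b", where it would presumably be written "\\b" inside a UnityScript string literal for the escaping reason given above.

```python
import re

def build_stop_word_pattern(stop_words):
    # Escape each word so regex metacharacters are matched literally,
    # then join with "|" and wrap the group in word boundaries.
    alternatives = "|".join(re.escape(word) for word in stop_words)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def strip_stop_words(text, pattern):
    # Replace each stop word with a space, then collapse runs of whitespace.
    without_stop_words = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", without_stop_words).strip()

pattern = build_stop_word_pattern(["and", "in", "the"])
cleaned = strip_stop_words("word1 and word3 in the word6 blah blah", pattern)
keywords = cleaned.split(" ")
print(keywords)
```

Because the boundaries are zero-width, a word such as "band" or "sand" is left alone even though it contains "and".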

entity476 · Feb 11, 2015 at 09:59 AM

Thank you alfalfasprout for your quick and analytic reply! I am also sorry for my late response, but it was getting late here and I went to sleep. In defense of my post, I have tried several expressions that I found that are supposed to do this task, but my actual problem is probably that I don't know how to use them in the Replace function. As with other expressions I tried, if I write

 var editKeyword : String = Regex.Replace(newKeyword, '(\W|^)+(and|in|the|)+(\W|$)+', " ");

the compiler complains about "unexpected char: 'W'.". You're absolutely right though that I should mention my exact goal: from one string, which represents several words (separated by white space), I'd like to get an array of the words it contains, excluding predefined words (which are common words and not specific terms). It's a keyword search after all. Regarding the common words in sequence, this is the reason I replace each matched word with a white space, so that the following Replace call can still find substrings surrounded by white spaces, but this method I use is rubbish and I only posted it in order to explain myself better. In conclusion, any thoughts on the unexpected character error? By the way, the site you posted is very neat! Thanks again.

Alanisaac · Feb 11, 2015 at 11:00 AM

Not sure if this will also work in UnityScript vs JavaScript, but since it's a Regex, try writing it between forward slashes.

 var regex = /(\W|^)+(and|in|the|)+(\W|$)+/;
 var editKeyword : String = Regex.Replace(newKeyword, regex, " ");


entity476 · Feb 11, 2015 at 11:27 AM

Thanks alfalfasprout for helping me out! I cannot use the

var regex = /(\W|^)+(and|in|the|)+(\W|$)+/;

line as is (unexpected token :/, unexpected character ), but if I quote the regex expression, it returns the same error as before: unexpected char: W. I also tried to find out how to create a new regex pattern, but I couldn't figure it out from the autocomplete suggestions when I type "new Regex." in MonoDevelop.

entity476 · Feb 11, 2015 at 11:40 AM

Using a double backslash before W ("\\W") does not produce errors, but I messed up my code trying to fix something else and I haven't run it yet. I'll report back as soon as I can.

entity476 · Feb 11, 2015 at 11:49 AM

Well, unfortunately, the \\ would not help either. There are no errors, but nothing is replaced... :(
