I want to parse a sentence with a structure like the following:
The house is green but my favorite colors are blue red and yellow
I determine the color of the house with a regular expression like this:
the house\s+(\w\s*)+(?=(cyan|green|red|blue))
The problem is that this expression returns the following match:
The house is green but my favorite colors are blue
That is, it matches up to the last color from my list that appears in the string: it runs until 'red', even though the first color in the sentence is 'green'.
What should I do? What I'm looking for is to just take the first color mentioned in the list and stop looking, i.e. to find that the house is green, and nothing else.
Q1: How can I scan the string only until the first occurrence of exactly one of the listed alternatives? In other words, how do I make the alternation (cyan|green|blue|red) behave like an XOR that stops at the first hit? Important: I want to use only regular expressions, without any host language such as .NET, Java, Perl, etc.
Q2: Is there an alternative to regular expressions that I have missed? In other words, is the approach I took the right one?
Thanks in advance, everyone.
It's returning the last match because your (\w\s*)+ is greedy; it matches as much as it can (i.e. all the way up to just before the 'red').
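A quick way to see the greediness in Python (the language used later in this thread). This sketch assumes the question's pattern with the stray spaces removed, and adds re.IGNORECASE (an assumption, not part of the question) so that "the house" can match "The house":

```python
import re

s = 'The house is green but my favorite colors are blue red and yellow'

# The question's pattern without the stray spaces; IGNORECASE is an assumption
# added so that "the house" matches "The house".
greedy = re.compile(r'the house\s+(\w\s*)+(?=(cyan|green|red|blue))', re.IGNORECASE)

m = greedy.search(s)
# (\w\s*)+ first runs to the end of the string, then backtracks only until the
# lookahead succeeds, which happens just before the LAST colour, "red".
print(repr(m.group(0)))
print(m.group(2))  # the colour seen by the lookahead
```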
You could change it to non-greedy by using +? instead of +:
the house\s+(\w\s*)+?(?=(cyan|green|red|blue))
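The same check with the lazy quantifier, again a sketch that assumes re.IGNORECASE so the lowercase "the house" matches the capitalised sentence:

```python
import re

s = 'The house is green but my favorite colors are blue red and yellow'

# Same pattern, but with the lazy quantifier +?; IGNORECASE is again assumed.
lazy = re.compile(r'the house\s+(\w\s*)+?(?=(cyan|green|red|blue))', re.IGNORECASE)

m = lazy.search(s)
# +? adds one character at a time and stops at the first position where the
# lookahead sees a colour, i.e. just before "green".
print(repr(m.group(0)))  # 'The house is '
print(m.group(2))        # green
```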
But I think you can do better than that.
Why (\w\s*)+? You're potentially matching just a single letter at a time! Why not match whole words instead with (\w+\s+)+?
Also, why not just match up to the first colour?
the\s+house\s+(\w+\s+)+?(cyan|green|red|blue)
Then capturing group 2 (the second set of brackets) will contain the first occurrence of cyan, green, red, or blue (i.e. your colour list). Note the +?, which makes the word pattern non-greedy, so it stops at the first 'cyan', 'green', 'red' or 'blue' instead of gobbling them up.
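A minimal check of the word-based pattern in Python. As before, re.IGNORECASE is an added assumption; without it the lowercase "the" would not match "The" in the sample sentence:

```python
import re

s = 'The house is green but my favorite colors are blue red and yellow'

# Whole words, matched lazily, followed directly by the colour group.
# IGNORECASE is an assumption so that "the" matches "The".
pattern = re.compile(r'the\s+house\s+(\w+\s+)+?(cyan|green|red|blue)', re.IGNORECASE)

m = pattern.search(s)
print(repr(m.group(1)))  # 'is ', the last word consumed before the colour
print(m.group(2))        # green
```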
You could even just do
house.*?\b(cyan|green|red|blue)
Here the .*? is non-greedy and simply consumes everything up to the first colour. The \b is a "word boundary"; it makes sure the regex doesn't match the 'red' in 'desired', for example.
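To illustrate what the \b buys you, here is a small comparison with and without it. The sentence "The house I desired is blue" is a hypothetical test string, not from the question:

```python
import re

pattern = r'house.*?\b(cyan|green|red|blue)'
no_boundary = r'house.*?(cyan|green|red|blue)'

s = 'The house is green but my favorite colors are blue red and yellow'
tricky = 'The house I desired is blue'  # hypothetical test sentence

print(re.search(pattern, s).group(1))           # green
print(re.search(no_boundary, tricky).group(1))  # red (taken from "desired")
print(re.search(pattern, tricky).group(1))      # blue
```

Only a leading \b is used here, so a word such as "blueberry" would still match "blue"; putting another \b after the group would reject that too.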
This is how I would do it in Python; I'm not sure whether other languages have an equivalent of re.search.
"What I'm looking for is to just take the first color mentioned in the list and stop looking"
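import re
s = 'The house is green but my favorite colors are blue red and yellow'
# search() scans the whole string and returns the first colour it finds
print(re.search('(cyan|green|red|blue)', s).group(1))
# or, if you had to use match(), which only matches at the start of the string
print(re.match('The house is (cyan|green|red|blue)', s).group(1))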
Note the lack of spaces in (cyan|green|red|blue): spaces inside a pattern are matched literally.
It prints:
green
green
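To see why the spaces matter, here is a short sketch comparing the spaced alternation with and without Python's re.VERBOSE flag, which makes the engine ignore unescaped whitespace in the pattern:

```python
import re

s = 'The house is green but my favorite colors are blue red and yellow'

# Without VERBOSE the spaces are literal, so the alternatives are really
# "cyan ", " green ", " red " and " blue"; the spaces end up in the match.
spaced = re.search('(cyan | green | red | blue)', s)
print(repr(spaced.group(1)))  # ' green '

# re.VERBOSE ignores unescaped whitespace, so this behaves like (cyan|green|red|blue).
verbose = re.search('(cyan | green | red | blue)', s, re.VERBOSE)
print(verbose.group(1))       # green
```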