Why does a regular expression not match without the "beginning of line" boundary matcher?



There is something I don't understand about Java's regular expressions. I have the following string (and I need the "To Date" value):

From Date :01/11/2011 To Date :30/11/2011;;;;;;;;;;;;;

I think the following regular expression would match in Perl:

to\\s+date\\s*?:\\s*?([0-9]{2}[\\./][0-9]{2}[\\./][0-9]{2,4})

In Java, this pattern doesn't match. But it does if I add .+ at the beginning and at the end, so this pattern works in Java:

Pattern p = Pattern.compile(".+to\\s+date\\s*?:\\s*?([0-9]{2}[\\./][0-9]{2}[\\./][0-9]{2,4}).+", Pattern.CASE_INSENSITIVE);

What I don't understand: I would expect the first pattern not to match in Java if I added a ^ (beginning of line) at the start and a $ (end of line) at the end, because then the pattern would have to match the whole line. But without those anchors, the first pattern should match. Why should the pattern care about text outside the part it describes if I don't set anchors at the start and end? This isn't logical to me. In my opinion the first pattern should behave like the contains method of the String class, and I think that is how it works in Perl.
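A minimal sketch of the distinction behind this question, assuming the code calls Matcher.matches() (the question doesn't show which method is used). matches() behaves as if the pattern were anchored at both ends of the input, while find() searches anywhere, which is the contains-like behavior the question expects and what Perl's =~ does for an unanchored pattern. The class and method names (MatchesVsFind, wholeInput, anchoredSearch, search) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    // The pattern from the question, written as a Java string literal.
    static final String BARE = "to\\s+date\\s*?:\\s*?([0-9]{2}[\\./][0-9]{2}[\\./][0-9]{2,4})";
    static final Pattern P = Pattern.compile(BARE, Pattern.CASE_INSENSITIVE);
    // The same pattern with explicit anchors: \A = start of input, \z = end of input.
    static final Pattern ANCHORED = Pattern.compile("\\A(?:" + BARE + ")\\z", Pattern.CASE_INSENSITIVE);

    // matches() succeeds only if the pattern consumes the entire input.
    static boolean wholeInput(String s) {
        return P.matcher(s).matches();
    }

    // Searching with explicit start/end anchors behaves like matches().
    static boolean anchoredSearch(String s) {
        return ANCHORED.matcher(s).find();
    }

    // find() looks for the pattern anywhere in the input, similar to String.contains().
    static String search(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "From Date :01/11/2011 To Date :30/11/2011;;;;;;;;;;;;;";
        System.out.println(wholeInput(line));     // false
        System.out.println(anchoredSearch(line)); // false
        System.out.println(search(line));         // 30/11/2011
    }
}
```

In other words, Java's matches() adds implicit anchors; no ^ or $ is needed for it to require a whole-input match.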

2012-04-04 07:08
by Bevor
You can test Java regexp at regexplanet. I couldn't even get your pattern to work there with the .+'s in it - Alan Escreet 2012-04-04 07:30



In Java, matches() succeeds only if the pattern matches the entire input string. Your input probably also contains line breaks, which .+ does not match by default.

Try this instead:

Pattern p = Pattern.compile(".+to\\s+date\\s*?:\\s*?([0-9]{2}[\\./][0-9]{2}[\\./][0-9]{2,4}).+", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("... \n From Date :01/11/2011 To Date :30/11/2011;;;;;;;;;;;;; \n ...");

System.out.println(m.matches()); // prints false

if(m.find()) {
  System.out.println(m.group(1)); // prints 30/11/2011
}
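If you specifically want to keep matches() with the .+-wrapped pattern, one option not mentioned above is the Pattern.DOTALL flag, which makes . match line terminators as well. A sketch assuming the same multi-line sample input as the snippet above; DotAllDemo and wholeMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    static final String INPUT =
        "... \n From Date :01/11/2011 To Date :30/11/2011;;;;;;;;;;;;; \n ...";
    static final String WRAPPED =
        ".+to\\s+date\\s*?:\\s*?([0-9]{2}[\\./][0-9]{2}[\\./][0-9]{2,4}).+";

    // Returns group 1 if the whole input matches, otherwise null.
    static String wholeMatch(int flags) {
        Matcher m = Pattern.compile(WRAPPED, flags).matcher(INPUT);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Without DOTALL, '.' stops at '\n', so .+ cannot span the line breaks.
        System.out.println(wholeMatch(Pattern.CASE_INSENSITIVE));
        // With DOTALL, '.' also matches line terminators, so matches() succeeds.
        System.out.println(wholeMatch(Pattern.CASE_INSENSITIVE | Pattern.DOTALL));
    }
}
```

This prints null and then 30/11/2011. find() is still the simpler fix when you only want to locate the date.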

And when using find(), you can drop the .+'s from the pattern:

Pattern.compile("to\\s+date\\s*?:\\s*?([0-9]{2}[./][0-9]{2}[./][0-9]{2,4})", Pattern.CASE_INSENSITIVE);

(no need to escape the . inside a character class, btw)
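A self-contained sketch of the find()-based approach, also checking that [./] and [\\./] accept the same characters. ToDateExtractor and extract are hypothetical names, and the sample strings other than the question's own line are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToDateExtractor {
    // An unescaped '.' inside [...] is a literal dot, so [./] means "dot or slash".
    static final Pattern TO_DATE = Pattern.compile(
        "to\\s+date\\s*?:\\s*?([0-9]{2}[./][0-9]{2}[./][0-9]{2,4})",
        Pattern.CASE_INSENSITIVE);
    // The same pattern with the dot escaped, as in the question.
    static final Pattern TO_DATE_ESCAPED = Pattern.compile(
        "to\\s+date\\s*?:\\s*?([0-9]{2}[\\./][0-9]{2}[\\./][0-9]{2,4})",
        Pattern.CASE_INSENSITIVE);

    // Returns the first "To Date" value found anywhere in the text, or null if there is none.
    static String extract(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "From Date :01/11/2011 To Date :30/11/2011;;;;;;;;;;;;;",
            "... \n From Date :01/11/2011 To Date :30.11.2011;;; \n ...",
            "To Date :30-11-2011" // '-' is not in [./], so no match
        };
        for (String s : samples) {
            // Both variants give the same result for every sample.
            System.out.println(extract(TO_DATE, s) + " " + extract(TO_DATE_ESCAPED, s));
        }
    }
}
```

Because find() only needs the pattern to occur somewhere, line breaks before or after the date no longer matter.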

2012-04-04 07:14
by Bart Kiers
OK, find() is the trick. Btw, you are right: I can omit the escape characters inside the character class; I didn't know that - Bevor 2012-04-04 07:32



I think this answer from a different question also answers your question: Why do regular expressions in Java and Perl act differently?

2012-04-04 07:19
by Sandro