There is something I don't understand in Java's regular expressions. I have the following string (and I need the "to Date"):
From Date :01/11/2011 To Date :30/11/2011;;;;;;;;;;;;;
I think that the following regular expression (in Perl) would have matched.
to\\s+date\\s*?:\\s*?([0-9]{2}[\\./][0-9]{2}[\\./][0-9]{2,4})
In Java, this pattern doesn't match. But it does if I add a .+ in front and at the end.
So this pattern works in Java:
Pattern p = Pattern.compile(".+to\\s+date\\s*?:\\s*?([0-9]{2}[\\./][0-9]{2}[\\./][0-9]{2,4}).+", Pattern.CASE_INSENSITIVE);
What I don't understand: it would be clear to me that the first pattern would not match in Java if I added a ^ (beginning of the line) in front and a $ (end of the line) at the end. That would mean that the pattern has to match the whole line. But without those anchors, the first pattern should actually match: why does the pattern care about string data that is outside its scope if I don't set delimiters in front and at the end? This is not logical to me. In my opinion, the first pattern should behave similarly to the "contains" method of the String class. And I think that is how it works in Perl.
In Java, matches() validates the entire string. Your input probably has line breaks in it (which don't get matched by .+ by default).
Try this instead:
Pattern p = Pattern.compile(".+to\\s+date\\s*?:\\s*?([0-9]{2}[\\./][0-9]{2}[\\./][0-9]{2,4}).+", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("... \n From Date :01/11/2011 To Date :30/11/2011;;;;;;;;;;;;; \n ...");
System.out.println(m.matches()); // prints false
if (m.find()) {
    System.out.println(m.group(1)); // prints 30/11/2011
}
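To see the difference in isolation, here is a minimal sketch comparing matches() and find() on multi-line input. The class name MatchVsFind, its helper methods, and the sample input are made up for illustration. It also assumes you might want matches() to succeed anyway, in which case Pattern.DOTALL lets . match line terminators too:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class MatchVsFind {
    // Same idea as the pattern above, wrapped in .+ on both sides.
    static final String REGEX =
            ".+to\\s+date\\s*:\\s*([0-9]{2}[./][0-9]{2}[./][0-9]{2,4}).+";

    // matches() only succeeds if the WHOLE input is consumed by the pattern.
    static boolean matchesWhole(String input, int flags) {
        return Pattern.compile(REGEX, flags).matcher(input).matches();
    }

    // find() looks for the pattern anywhere in the input.
    static String findDate(String input) {
        Matcher m = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed sample input containing line breaks.
        String input = "header\nFrom Date :01/11/2011 To Date :30/11/2011;;;\nfooter";

        // By default . does not match line terminators, so the whole input cannot match.
        System.out.println(matchesWhole(input, Pattern.CASE_INSENSITIVE)); // false

        // With DOTALL, . also matches \n, so the whole input matches.
        System.out.println(matchesWhole(input, Pattern.CASE_INSENSITIVE | Pattern.DOTALL)); // true

        // find() succeeds either way because it only needs a matching substring.
        System.out.println(findDate(input)); // 30/11/2011
    }
}
```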
And when using find(), you can drop the .+'s from the pattern:
Pattern.compile("to\\s+date\\s*?:\\s*?([0-9]{2}[./][0-9]{2}[./][0-9]{2,4})", Pattern.CASE_INSENSITIVE);
(no need to escape the . inside a character class, btw)
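Put together, the "contains"-like behaviour the question asks for could look roughly like this. It is only a sketch: the class name ToDateExtractor and the method extractToDate are hypothetical, and returning null when nothing is found is just one possible design choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class ToDateExtractor {
    // Compiled once and reused; no leading or trailing .+ is needed with find().
    private static final Pattern TO_DATE = Pattern.compile(
            "to\\s+date\\s*:\\s*([0-9]{2}[./][0-9]{2}[./][0-9]{2,4})",
            Pattern.CASE_INSENSITIVE);

    // Returns the captured "to date", or null if the input has none.
    static String extractToDate(String input) {
        Matcher m = TO_DATE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The line from the question.
        String line = "From Date :01/11/2011 To Date :30/11/2011;;;;;;;;;;;;;";
        System.out.println(extractToDate(line)); // 30/11/2011
    }
}
```

Because find() scans for a matching substring, whatever comes before or after the date (including other lines) does not affect the result.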
I think this answer from a different question also answers your question: Why do regular expressions in Java and Perl act differently?