The Java Class Library has two classes for more advanced work with regular expressions: java.util.regex.Pattern and java.util.regex.Matcher. A Pattern represents a compiled regular expression, and a Matcher provides many useful methods for applying that pattern to a particular string.
Matching a regex
Suppose we have a text stored in a string variable:
String text = "We use Java to write modern applications";
We need to check whether the text contains "Java" or "java" using a regular expression. With the Pattern and Matcher classes, this takes three steps.
1. Create a Pattern object by passing a regex string to the compile method.
Pattern pattern = Pattern.compile(".*[Jj]ava.*"); // regex to match "java" or "Java" in a text
2. Create a Matcher for a given string by invoking the matcher method of the Pattern.
Matcher matcher = pattern.matcher(text); // it will match the passed text
3. Invoke the matches method of the Matcher to match the string.
boolean matches = matcher.matches(); // true
The matches method of a Matcher works like the method of the same name of a String.
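The three steps can be put together in one small program. This is only a sketch: the class name JavaMentionCheck and the helper mentionsJava are illustrative names that do not come from the lesson.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavaMentionCheck {

    // Step 1: compile the regex into a Pattern.
    private static final Pattern JAVA_PATTERN = Pattern.compile(".*[Jj]ava.*");

    static boolean mentionsJava(String text) {
        Matcher matcher = JAVA_PATTERN.matcher(text); // Step 2: create a Matcher for the input
        return matcher.matches();                      // Step 3: match the whole string
    }

    public static void main(String[] args) {
        String text = "We use Java to write modern applications";
        System.out.println(mentionsJava(text));                    // true
        System.out.println(text.matches(".*[Jj]ava.*"));           // true, same result via String
        System.out.println(mentionsJava("We use JAVA every day")); // false, the case differs
    }
}
```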
Advantages of Pattern and Matcher classes
At first, it may seem that there is no reason to use Pattern and Matcher instead of simply passing a regex string to a method of a String. In fact, there are two main reasons.
- Performance. The matches method of a String internally invokes the matches method of a Matcher, but it also invokes Pattern.compile(...) on every call, which is bad for performance. If a pattern is used multiple times, compiling it once is more efficient than recompiling it on every call.
- Rich API. A Matcher has many useful methods for processing strings, and a Pattern can be conveniently configured when it is compiled, for example, to enable case-insensitive matching.
So, if you are going to reuse a regex many times or write something fairly complicated, it is better in practice to use Pattern and Matcher instead of the methods of a String.
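To illustrate the performance point, here is a minimal sketch that compiles the regex once and reuses it for many strings. The class PatternReuse and the method countMentions are hypothetical names chosen for this example, and List.of assumes Java 9 or later.

```java
import java.util.List;
import java.util.regex.Pattern;

class PatternReuse {

    // Compiled once and shared; Pattern instances are immutable and safe to reuse.
    private static final Pattern JAVA_PATTERN = Pattern.compile(".*[Jj]ava.*");

    // Counts the lines that mention "Java" or "java" without recompiling the regex.
    static long countMentions(List<String> lines) {
        return lines.stream()
                .filter(line -> JAVA_PATTERN.matcher(line).matches())
                .count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "We use Java to write modern applications",
                "Kotlin runs on the JVM",
                "I like java and coffee");
        System.out.println(countMentions(lines)); // 2
    }
}
```

Calling line.matches(".*[Jj]ava.*") inside the filter would give the same count, but the regex would be compiled again for every line.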
Patterns and Modes
As you know, a Pattern is used to create a Matcher. But if we do not need to reuse a regex, we can invoke the static matches method of the Pattern class in a single line.
Pattern.matches(".*[Jj]ava.*", "We use Java to write modern applications"); // true
This is similar to invoking the matches method of a String, and it has the same performance problem: the regex is compiled on every call.
Consider the previous example again. It cannot match words like "JAVA" because it does not ignore case. Fortunately, there is a special mode, Pattern.CASE_INSENSITIVE, that can be set when compiling a Pattern. It allows you to match regexes ignoring case.
Pattern pattern = Pattern.compile(".*java.*", Pattern.CASE_INSENSITIVE);
String text = "We use Java to write modern applications";
Matcher matcher = pattern.matcher(text);
System.out.println(matcher.matches()); // true
Another mode is Pattern.DOTALL, which makes the dot . match all characters, including line breaks.
Note: to use the case-insensitive mode without compiling a Pattern with flags (for example, in Pattern.matches or in the matches method of a String), just add (?i) to the start of a regex. To make the dot character match line breaks, add (?s). You can enable both modes together with (?is).
Pattern.matches("(?is).*java.*", "\n\nJAVA\n\n"); // true
There are also other modes, but we will not consider them here. See the documentation of the Pattern class for details.
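Mode constants are bit masks, so several of them can be passed to Pattern.compile at once by combining them with the bitwise OR operator. The sketch below compares that form with the embedded (?is) form. The class and method names are made up for the example.

```java
import java.util.regex.Pattern;

class ModesDemo {

    // Two modes combined with bitwise OR when compiling.
    private static final Pattern WITH_FLAGS =
            Pattern.compile(".*java.*", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // The same two modes written as an embedded flag expression.
    private static final Pattern WITH_INLINE_FLAGS = Pattern.compile("(?is).*java.*");

    static boolean matchesWithFlags(String text) {
        return WITH_FLAGS.matcher(text).matches();
    }

    static boolean matchesWithInlineFlags(String text) {
        return WITH_INLINE_FLAGS.matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "\n\nJAVA\n\n";
        System.out.println(matchesWithFlags(text));       // true
        System.out.println(matchesWithInlineFlags(text)); // true
        // Without DOTALL the dot stops at line breaks, so the whole string cannot match.
        System.out.println(Pattern.compile(".*java.*", Pattern.CASE_INSENSITIVE)
                .matcher(text).matches());                // false
    }
}
```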
The matches and find methods
An instance of Matcher has a lot of useful methods. In this lesson, we will consider only a few of them.
The matches method returns true only if the whole string matches the pattern; otherwise, it returns false. We have already seen this method above. It works in the same way as the matches method of a String.
A Matcher also has the find method, which is similar to matches but tries to find a substring that matches the pattern instead of matching the whole string. Look at the following example to understand the difference between these methods.
String text = "Regex is a powerful tool for programmers";
Pattern pattern = Pattern.compile("tool");
Matcher matcher = pattern.matcher(text);
System.out.println(matcher.matches()); // false, the whole string does not match the pattern
System.out.println(matcher.find()); // true, there is a substring that matches the pattern
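Since find looks for a matching substring, it can also be called repeatedly: each successful call continues after the previous match. The following sketch collects every match with the group method of Matcher. The class FindAllDemo, the helper findAll, and the regex [Pp]\w+ are assumptions made for the example, not part of the lesson.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {

    // Collects every substring matched by the regex, in order of appearance.
    static List<String> findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {        // each call resumes after the previous match
            found.add(matcher.group()); // the text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Regex is a powerful tool for programmers";
        // Words that start with "p" or "P"
        System.out.println(findAll("[Pp]\\w+", text)); // [powerful, programmers]
    }
}
```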
We can make the find method behave like matches by adding special characters to a regular expression. To specify that find should match from the beginning of a string, add the ^ character to the start of the regex. To specify that it should match up to the end of a string, add the $ character to the end of the regex.
Pattern pattern = Pattern.compile("^tool$");
Matcher matcher = pattern.matcher(text);
System.out.println(matcher.matches()); // false
System.out.println(matcher.find()); // false
By default, both matches and find look at all characters of a string. It is possible, however, to restrict matching to a part of the string by invoking the region method with the start (inclusive) and end (exclusive) indexes.
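In the current Java API, the Matcher method that sets these bounds is region(int start, int end). Here is a minimal sketch of how it might be used. The class RegionDemo and the helper matchesInRegion are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionDemo {

    // Checks whether the characters in [start, end) match the regex as a whole.
    static boolean matchesInRegion(String regex, String text, int start, int end) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        matcher.region(start, end); // start is inclusive, end is exclusive
        return matcher.matches();
    }

    public static void main(String[] args) {
        String text = "Regex is a powerful tool for programmers";
        int start = text.indexOf("tool");
        int end = start + "tool".length();
        System.out.println(matchesInRegion("tool", text, start, end)); // true
        System.out.println(matchesInRegion("tool", text, 0, end));     // false
    }
}
```

With a region set, matches has to cover the region rather than the whole string, which is why the first call succeeds and the second does not.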
Conclusion
There are several ways to process regexes: calling the methods of a String, or using Pattern and Matcher. The second way is more efficient when a regex is reused, and it also provides a set of useful methods for configuring and processing regexes. There are two important methods, matches and find, which differ in an important way. The matches method matches the whole string, just like the method of the same name of a String, while the find method tries to find a substring that matches the regex.