Java: Regular expressions

Patterns and Matcher

The Java Class Library has two classes for advanced work with regular expressions: java.util.regex.Pattern and java.util.regex.Matcher. A Pattern represents a compiled regular expression, and a Matcher applies that pattern to a particular string and offers many useful methods for handling the result.

Matching a regex

Suppose we have some text stored in a string variable:

String text = "We use Java to write modern applications";

We need to check whether the text contains "Java" or "java" using a regular expression. Doing that with the Pattern and Matcher classes takes three steps.

1. Create a Pattern object by passing a regex string to the compile method.

Pattern pattern = Pattern.compile(".*[Jj]ava.*"); // regex to match "java" or "Java" in a text

2. Create a Matcher for the given string by invoking the matcher method of the Pattern.

Matcher matcher = pattern.matcher(text); // it will match the passed text

3. Invoke the matches method of the matcher to match the string.

boolean matches = matcher.matches(); // true

The matches method of a Matcher works like the matches method of a String: it returns true only if the whole string matches the regex.
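The three steps can be put together in one small program. This is only a sketch: the class name MatchingDemo and the helper mentionsJava are made up for illustration and are not part of the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchingDemo {

    // Step 1: compile the regex once; the resulting Pattern can be reused.
    private static final Pattern JAVA_WORD = Pattern.compile(".*[Jj]ava.*");

    // Hypothetical helper: true if the text contains "Java" or "java".
    static boolean mentionsJava(String text) {
        Matcher matcher = JAVA_WORD.matcher(text); // Step 2: create a Matcher for the text
        return matcher.matches();                  // Step 3: match the whole string
    }

    public static void main(String[] args) {
        String text = "We use Java to write modern applications";
        System.out.println(mentionsJava(text));               // true
        System.out.println(text.matches(".*[Jj]ava.*"));      // true, same result via String
        System.out.println(mentionsJava("We use JAVA here")); // false, "JAVA" is not covered
    }
}
```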

Advantages of Pattern and Matcher classes

At this point it may seem that there is no reason to use Pattern and Matcher instead of passing a regex string to the methods of a String. In fact, there are two main reasons.

  • Performance. The matches method of a String internally invokes the matches method of a Matcher, but it also invokes Pattern.compile(...) on every call, which is bad for performance. If a pattern is used multiple times, compiling it once is more efficient than recompiling it on each call.
  • Rich API. A Matcher has a lot of useful methods for processing strings, and a Pattern can be conveniently configured when it is compiled, for example by enabling case-insensitive matching.

So, if you are going to reuse a regex many times or need something more complicated than a single check, it is better in practice to use Pattern and Matcher than the regex methods of a String.
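A minimal sketch of the reuse scenario, assuming Java 9 or later for List.of: the pattern is compiled once as a constant and applied to many strings. The class ReuseDemo and the helper countMentions are illustrative names only.

```java
import java.util.List;
import java.util.regex.Pattern;

class ReuseDemo {

    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern JAVA_WORD = Pattern.compile(".*[Jj]ava.*");

    // Hypothetical helper: counts the lines that mention "Java" or "java".
    static long countMentions(List<String> lines) {
        return lines.stream()
                .filter(line -> JAVA_WORD.matcher(line).matches()) // new Matcher per line, no recompilation
                .count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "We use Java to write modern applications",
                "Kotlin runs on the JVM",
                "I like java");
        System.out.println(countMentions(lines)); // 2
    }
}
```

Note that a Matcher, unlike a Pattern, holds matching state and is not safe to share between threads, so the sketch creates a new one for every line.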

Patterns and Modes

As you know, a Pattern is used to create a Matcher. But if we do not need to reuse a regex, we can invoke the static matches method of the Pattern class in a single line.

Pattern.matches(".*[Jj]ava.*", "We use Java to write modern applications"); // true

This is similar to invoking the matches method of a String and has the same performance problem: the regex is compiled on every call.

Consider the previous example again. The pattern cannot match words like "JAVA" because it is case-sensitive. Fortunately, there is a special mode, Pattern.CASE_INSENSITIVE, that can be set when compiling a Pattern. It makes the regex match regardless of letter case.

Pattern pattern = Pattern.compile(".*java.*", Pattern.CASE_INSENSITIVE);

String text = "We use Java to write modern applications";

Matcher matcher = pattern.matcher(text);

System.out.println(matcher.matches()); // true

Another mode is Pattern.DOTALL, which makes the dot . match every character, including line breaks.

Note: to use the case-insensitive mode where you cannot pass flags (for example, with Pattern.matches or the matches method of a String), add (?i) to the start of the regex. To make the dot character match newlines, add (?s). You can enable both modes together with (?is).

Pattern.matches("(?is).*java.*", "\n\nJAVA\n\n"); // true

There are also other modes, but we will not consider them here. See the documentation of the Pattern class for details.
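Mode flags are int constants, so several of them can be passed to compile at once by combining them with the bitwise OR operator. The sketch below compares that form with the embedded (?is) form; the class ModesDemo and its helper methods are illustrative names only.

```java
import java.util.regex.Pattern;

class ModesDemo {

    // Two modes combined with bitwise OR when compiling.
    private static final Pattern WITH_FLAGS =
            Pattern.compile(".*java.*", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // The same two modes written as embedded flags inside the regex itself.
    private static final Pattern WITH_EMBEDDED = Pattern.compile("(?is).*java.*");

    static boolean matchesWithFlags(String text) {
        return WITH_FLAGS.matcher(text).matches();
    }

    static boolean matchesWithEmbedded(String text) {
        return WITH_EMBEDDED.matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "\n\nJAVA\n\n";
        System.out.println(matchesWithFlags(text));            // true
        System.out.println(matchesWithEmbedded(text));         // true
        System.out.println(Pattern.matches(".*java.*", text)); // false: no modes enabled
    }
}
```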

The matches and find methods

An instance of Matcher has a lot of useful methods. In this lesson, we will consider only a few of them.

The matches method returns true only if the whole string matches the pattern; otherwise, it returns false. We have already seen this method above. It works in the same way as the matches method of a String.

A Matcher also has the find method. It is similar to matches, but instead of matching the whole string it tries to find a substring that matches the pattern. The following example shows the difference between these methods.

String text = "Regex is a powerful tool for programmers";

Pattern pattern = Pattern.compile("tool");
Matcher matcher = pattern.matcher(text);

System.out.println(matcher.matches()); // false, the whole string does not match the pattern
System.out.println(matcher.find()); // true, there is a substring that matches the pattern
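Each successful call to find continues the search after the previous match, so it is commonly called in a loop to visit every occurrence. A minimal sketch, where the class FindDemo and the helper findAll are hypothetical names and the sample text is made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {

    // Hypothetical helper: collects every substring of text that matches the regex.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {        // each call resumes after the previous match
            found.add(matcher.group()); // the text matched by the last successful find
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Java 8, java 11 and JAVA 17";
        System.out.println(findAll("[Jj]ava \\d+", text)); // [Java 8, java 11]
    }
}
```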

We can make the find method behave like matches by adding anchors to the regular expression. To require that a match starts at the beginning of the string, add the ^ character to the start of the regex. To require that a match ends at the end of the string, add the $ character to the end of the regex.

Pattern pattern = Pattern.compile("^tool$");
Matcher matcher = pattern.matcher(text);

System.out.println(matcher.matches()); // false
System.out.println(matcher.find());   // false 
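The anchors can also be used one at a time. The sketch below reuses the sentence from the earlier example; the class AnchorsDemo and the helper found are illustrative names only.

```java
import java.util.regex.Pattern;

class AnchorsDemo {

    // Hypothetical helper: true if some substring of text matches the regex.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Regex is a powerful tool for programmers";
        System.out.println(found("^Regex", text));       // true: the text starts with "Regex"
        System.out.println(found("programmers$", text)); // true: the text ends with "programmers"
        System.out.println(found("^tool", text));        // false: "tool" is not at the start
        System.out.println(found("^.*tool.*$", text));   // true: anchored on both sides, like matches
    }
}
```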

By default, both matches and find look at all characters of a string. But it is possible to limit the part of the string to be examined by invoking the region method of the Matcher with the start (inclusive) and end (exclusive) indexes.
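In the JDK, the Matcher method that restricts matching to part of the input is region(int start, int end). A minimal sketch of how it changes the result of matches; the class RegionDemo and the helper matchesInRegion are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionDemo {

    // Hypothetical helper: checks whether the characters in [start, end) match the regex entirely.
    static boolean matchesInRegion(String regex, String text, int start, int end) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        matcher.region(start, end); // start is inclusive, end is exclusive
        return matcher.matches();   // now checks only the region, not the whole string
    }

    public static void main(String[] args) {
        String text = "Regex is a powerful tool for programmers";
        int start = text.indexOf("tool");  // 20
        int end = start + "tool".length(); // 24
        System.out.println(matchesInRegion("tool", text, start, end));       // true
        System.out.println(matchesInRegion("tool", text, 0, text.length())); // false
    }
}
```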

Conclusion

There are two ways to process regexes: calling the regex methods of a String, or using Pattern and Matcher. The second way is more efficient when a regex is reused, and it also provides a set of useful methods for configuring and processing regexes. The two key methods, matches and find, have an important difference: matches checks the whole string against the regex, just like the method of the same name of a String, while find tries to match a substring.
