Regex in Groovy

05 / Nov / 2015 by Aseem Bansal

Groovy Language enhancements that help with Regex

Slashy strings are the first thing that comes to my mind when talking about regex in Groovy. They are a simpler way to write regex patterns as Strings than Java string literals, because backslashes do not need to be escaped.

If we have to find all numbers in a String, we can do something like this in Java:

[code]
import java.util.regex.*;

public class Main {

    public static void main(String[] args) {
        String regex = "\\d+";
        Matcher digitMatcher = Pattern.compile(regex).matcher("a0 b12 c13");
        while (digitMatcher.find()) {
            System.out.println(digitMatcher.group());
        }
    }

}
[/code]

Here the regex is held in a String literal:

[code]
String regex = "\\d+";
[/code]

In a bigger regex this can quickly become hard to read, because every backslash has to be doubled. In Groovy we can simplify it to the below:

[code]
String regex = /\d+/
assert (/a/).class == String
[/code]

This seems trivial here, but it has a real benefit. Has someone on Stack Overflow given you a regex and told you that you need to escape the backslashes to make it work in Java? Not anymore. It also helps readability.
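As a quick check of the claim above, here is a minimal sketch comparing a slashy string with its escaped equivalents. The variable names are only illustrative. It also shows the one character that still needs escaping inside a slashy string, the forward slash.

```groovy
// The same regex written three ways; all three are the identical String.
def slashy = /\d+\.\d+/
def singleQuoted = '\\d+\\.\\d+'
def doubleQuoted = "\\d+\\.\\d+"

assert slashy == singleQuoted
assert slashy == doubleQuoted

// A forward slash must be escaped as \/ inside a slashy string.
def dayAndMonth = /\d+\/\d+/
assert dayAndMonth == '\\d+/\\d+'
assert "28/02" ==~ dayAndMonth
```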

There are multiple ways to compile a pattern. Applying the ~ operator to a slashy string gives us a compiled Pattern:

[code]
import java.util.regex.Pattern

def pattern = ~/a/
assert pattern.class == Pattern
[/code]
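To make the reuse case concrete, here is a small sketch in which one compiled Pattern is applied to several inputs. The sample inputs and variable names are made up for illustration.

```groovy
import java.util.regex.Pattern

// Compile once with the pattern operator, then reuse for several inputs.
Pattern digits = ~/\d+/

def inputs = ["a0 b12", "no numbers here", "c13"]   // hypothetical sample data

// For each input, take the first number if there is one, otherwise null.
def firstNumbers = inputs.collect { line ->
    def m = digits.matcher(line)
    m.find() ? m.group() : null
}

assert firstNumbers == ['0', null, '13']
```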

The ~ operator is a good fit when a Pattern is compiled once and reused later. If the pattern is only needed at the point of use, we can use either of the operators below:

[code]
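import java.util.regex.Matcher

// Find operator: creates a Matcher for the String on the left
def matcher = ("a0 b12 c13" =~ /\d+/)
assert matcher.class == Matcher

// Match operator: true only if the whole String matches the regex
def result = ("a0 b12 c13" ==~ /\d+/)
assert result.class == Boolean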
[/code]

Notice that the String being searched must be on the left side of the operator and the regex on the right side.

The first form is useful when we need the Matcher itself, for example to call find or any other Matcher method. The second form is useful when we want the result of matches, that is, whether the whole String matches the regex. Some documentation gives the impression that the first form is the same as matcher.find(). It is not, as the assert above shows. It creates a Matcher, and it only looks like a find because Groovy truth for a Matcher evaluates to whether a match can be found. It can be used like a find in a condition, but bear the difference in mind.
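The difference between the two operators can be sketched as follows. The sample strings are only illustrative.

```groovy
import java.util.regex.Matcher

def text = "a0 b12 c13"

// =~ yields a Matcher, not a boolean.
def m = (text =~ /\d+/)
assert m instanceof Matcher

// In a boolean context, Groovy truth runs a find on the Matcher.
assert (text =~ /\d+/)       // some part of the text matches
assert !("abc" =~ /\d+/)     // no digits anywhere

// ==~ requires the whole String to match.
assert !(text ==~ /\d+/)
assert "12345" ==~ /\d+/
```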

Groovy GDK enhancements that help with Regex

You might ask: why should I keep that in mind? Because you can then use the Groovy enhancements to the Matcher class. For example, to find how many numbers there are in a String, I can do the below, which gives me 3.

[code]
// count is exposed as a property (GDK getCount) on Matcher
("a0 b12 c13" =~ /\d+/).count
[/code]

Or, to get the third number in the String, I can index into the Matcher as below, which gives "13".

[code]
("a0 b12 c13" =~ /\d+/)[2]
[/code]

All this works because we keep in mind that the find operator actually creates a Matcher.
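When the regex contains capture groups, indexing into the Matcher returns a list with the full match followed by each group, rather than a single String. A minimal sketch, using sample dates chosen only for illustration:

```groovy
def m = ("28-02-1992 15-06-1982" =~ /(\d+)-(\d+)-(\d+)/)

// Two dates are found in the String.
assert m.count == 2

// Each match is a list: [full match, group 1, group 2, group 3].
assert m[0] == ['28-02-1992', '28', '02', '1992']

// Year (group 3) of the second match.
assert m[1][3] == '1982'
```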

But the part that I found most useful is the set of methods added to CharSequence, and thus to String. Iterating over a String while running a find becomes much easier with them.

For example, finding all dates in a String and getting the day, month and year of each can be as simple as below:

[code]
// The closure receives the full match followed by each capture group
"28-02-1992 15-06-1982".findAll(/(\d+)-(\d+)-(\d+)/) { full, day, month, year ->
    println "$full, $day, $month, $year"
}
[/code]

Need to find all numbers? No need to iterate over a Matcher, calling its find method and adding each result to a list. Just use findAll:

[code]
"a0 b11 c13".findAll(/\d+/)
[/code]

which gives us a list of Strings: ['0', '11', '13']
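Since findAll returns the matches as Strings, a conversion step is needed if actual numbers are wanted. Two ways this could be done, sketched with illustrative variable names:

```groovy
def asStrings = "a0 b11 c13".findAll(/\d+/)
assert asStrings == ['0', '11', '13']

// Convert afterwards with the spread operator...
def asInts = asStrings*.toInteger()
assert asInts == [0, 11, 13]

// ...or convert inside the closure passed to findAll.
def viaClosure = "a0 b11 c13".findAll(/\d+/) { it.toInteger() }
assert viaClosure == [0, 11, 13]
```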

Go ahead and search for the regex-related methods on CharSequence in the Groovy GDK documentation. You might find that working with regex is much easier in Groovy.

If you are working with regex, take a look at the Stack Overflow regex FAQ. It contains answers to a lot of common questions related to regex. Looking for a place where regex examples for Groovy are present? Check this post.
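As a starting point, here is a short sketch of two other GDK methods on CharSequence that take a regex: find and the closure form of replaceAll. The sample text and variable names are only for illustration, and the GDK offers more such methods than shown here.

```groovy
def text = "a0 b11 c13"

// find returns the first match, or null if there is none.
assert text.find(/\d+/) == '0'
assert text.find(/x\d+/) == null

// With capture groups, the closure receives the full match and the groups.
def firstLetter = text.find(/([a-z])(\d+)/) { full, letter, number -> letter }
assert firstLetter == 'a'

// replaceAll with a closure computes each replacement from the match.
def padded = text.replaceAll(/\d+/) { it.padLeft(3, '0') }
assert padded == 'a000 b011 c013'
```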

Hope this helps you in avoiding Java-like code and makes your regex groovier.
