Problem: Sufficiently long regular expressions start to look like line noise and become unmaintainable.
Solution: Use Perl regular expressions with the /x modifier.
The perlre manpage reads:
/x
Extend your pattern’s legibility by permitting whitespace and comments.
These are usually written as “the /x modifier”…
The /x modifier itself needs a little more explanation. It tells the
regular expression parser to ignore whitespace that is neither
backslashed nor within a character class. You can use this to break up
your regular expression into (slightly) more readable parts.
The # character is also treated as a metacharacter introducing a
comment, just as in ordinary Perl code. This also means that if you
want real whitespace or # characters in the pattern (outside of a
character class, where they are unaffected by /x), that you’ll either
have to escape them or encode them using octal or hex escapes. Taken
together, these features go a long way towards making Perl’s regular
expressions more readable. Note that you have to be careful not to
include the pattern delimiter in the comment--perl has no way of
knowing you did not intend to close the pattern early. See the
C-comment deletion code in the perlop manpage.
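The escaping rules quoted above can be illustrated with a small sketch. The "issue #42" input format and the names $issue_re and issue_number are made up for this illustration; they are not from the manpage. A real space is kept by placing it in a character class, a real # is kept by backslashing it, and the comments avoid the / delimiter as well as the $ and @ characters, which Perl would try to interpolate even inside a regex comment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pattern for strings such as "issue #42".
my $issue_re = qr/
    ^ issue     # the literal word; the blank before it is ignored
    [ ]         # a real space, protected inside a character class
    \#          # a real hash sign, backslashed so it starts no comment
    (\d+)       # capture the issue number
    $           # with no other text
/x;

# Return the captured number, or undef when the text does not match.
sub issue_number {
    my ($text) = @_;
    return $text =~ $issue_re ? $1 : undef;
}

print issue_number("issue #42"), "\n";    # prints 42
```

If a comment really needs to mention a slash, one option is to pick a different delimiter for the pattern, for example m{ ... }x, provided the comments then avoid unbalanced braces.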
For example, here is a regex for model numbers such as X11, Xa, Xb or Xc
(X followed by two digits, or X followed by one letter: a, b or c):
— start example —
#!/usr/bin/perl
use strict;
use warnings;

#my $string = "Xa";
my $string = "Xbad";
if ( $string =~ /
    ^X          # model number must start with capital X
    (?:         # and be followed by either
        \d\d    #   exactly two digits, or
      |
        [abc]   #   one of three letters: a, b or c
    )
    $           # with no other text
/x ) {
    print "$string is a valid model number\n";
} else {
    print "$string is not a valid model number\n";
}
— end example —
is more readable and maintainable than
/^X(?:\d\d|[abc])$/
The difference is more pronounced as your regex gets longer.
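As a rough sketch of how this scales, a longer pattern can be assembled from small qr//x pieces, each with its own comment. The "X11-2024-03" tag format and the names $model, $year, $month, $tag_re and parse_tag are assumptions invented for this illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical format: model, year and month joined by dashes, e.g. "X11-2024-03".
my $model = qr/ X (?: \d\d | [abc] ) /x;    # X plus two digits, or X plus a, b or c
my $year  = qr/ (?: 19 | 20 ) \d\d /x;      # four digits starting with 19 or 20
my $month = qr/ 0[1-9] | 1[0-2] /x;         # 01 through 12

my $tag_re = qr/
    ^
    ($model)    # capture 1: model number
    -
    ($year)     # capture 2: year
    -
    ($month)    # capture 3: month
    $           # with no other text
/x;

# Return (model, year, month), or an empty list when the tag does not match.
sub parse_tag {
    my ($tag) = @_;
    return $tag =~ $tag_re ? ($1, $2, $3) : ();
}

my @parts = parse_tag("Xb-2019-12");
print "@parts\n";    # prints Xb 2019 12
```

Each interpolated qr// object keeps its own grouping and its own /x flag, so the alternation inside $month cannot leak into the surrounding pattern.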
This is a VERY useful feature when you have to maintain a complicated regex.