Turning perl regular expressions from line noise into documented code

Problem: Sufficiently long regular expressions start to look like line noise and become unmaintainable.

Solution: Use Perl regular expressions with the /x modifier.

The perlre manpage reads:

/x
Extend your pattern’s legibility by permitting whitespace and comments.
These are usually written as “the /x modifier”…

The /x modifier itself needs a little more explanation. It tells the
regular expression parser to ignore whitespace that is neither
backslashed nor within a character class. You can use this to break up
your regular expression into (slightly) more readable parts.
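A minimal sketch of the whitespace rule described above. The pattern and the test string are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "abc123";

# Without /x, the space is part of the pattern, so "abc123" does not match.
my $plain = ( $text =~ /abc 123/ ) ? 1 : 0;

# With /x, unescaped whitespace outside a character class is ignored,
# so this pattern behaves like /abc123/ and matches.
my $extended = ( $text =~ /abc 123/x ) ? 1 : 0;

print "without /x: $plain\n";
print "with /x:    $extended\n";
```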

The # character is also treated as a metacharacter introducing a
comment, just as in ordinary Perl code. This also means that if you
want real whitespace or # characters in the pattern (outside of a
character class, where they are unaffected by /x), that you’ll either
have to escape them or encode them using octal or hex escapes. Taken
together, these features go a long way towards making Perl’s regular
expressions more readable. Note that you have to be careful not to
include the pattern delimiter in the comment–perl has no way of
knowing you did not intend to close the pattern early. See the
C-comment deletion code in the perlop manpage.
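The escaping rules and the delimiter caveat can be sketched like this, assuming a made-up input line `# item 42`. Using `m{...}` instead of `/.../` is one common way to sidestep the delimiter problem; the perlre text itself only warns about it. Note also that `$` and `@` inside a comment are still interpolated, so they are kept out of the comments here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "# item 42";

# m{...} is used instead of /.../ so that a slash in a comment,
# like the one on the capture line, cannot close the pattern early.
my ($number) = $line =~ m{
    ^ \#      # a literal hash sign, escaped with a backslash
    [ ]       # a literal space inside a character class, where /x does not apply
    item
    \x20      # another literal space, written as a hex escape
    (\d+)     # the item number (digits only, no / or sign)
    $
}x;

print defined $number ? "number: $number\n" : "no match\n";
```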

For example, here is a regex for model numbers such as X11, Xa, Xb or Xc (an X followed by either two digits or one of the letters a, b or c):

— start example —

#!/usr/bin/perl
use strict;
use warnings;

#my $string = "Xa";
my $string = "Xbad";

if ( $string =~ /

    ^X          # model number must start with capital X

    (           # and be followed by either
      \d\d      #   exactly two digits, or
      |
      a|b|c     #   one of three letters: a, b or c
    )

    $           # with no other text

    /x ) {

    print "$string is a valid model number\n";
} else {
    print "$string is not a valid model number\n";
}

— end example —

is more readable and maintainable than

/^X(\d\d|a|b|c)$/

The difference is more pronounced as your regex gets longer.
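To illustrate how the gap widens, here is a sketch of a longer pattern written both ways, for a hypothetical timestamp format like `2009-03-14 15:09:26` (with a space or a `T` between date and time). `qr//` is used so each compiled pattern can be stored in a variable and compared:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input format: "YYYY-MM-DD hh:mm:ss", with a space or a T
# between the date and the time.

# Compact form:
my $compact = qr/^(\d{4})-(\d\d)-(\d\d)[ T](\d\d):(\d\d):(\d\d)$/;

# The same pattern, documented with /x:
my $documented = qr/
    ^
    (\d{4}) - (\d\d) - (\d\d)    # date: year, month, day
    [ T]                         # separator: a space or a literal T
    (\d\d) : (\d\d) : (\d\d)     # time: hour, minute, second
    $
/x;

for my $ts ("2009-03-14 15:09:26", "2009-03-14T15:09:26", "2009-3-14 15:09") {
    my @fields = $ts =~ $documented;
    print @fields ? "$ts => @fields\n" : "$ts => no match\n";
}
```

Both variables hold the same pattern; only the second can be read and edited without counting parentheses.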

This is a VERY useful feature whenever you need a complicated regex.
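One detail worth checking when moving between the compact and the /x form: alternation `|` has the lowest precedence of any Perl regex operator, so an ungrouped `|` splits the entire pattern, anchors included. A small sketch comparing the two shapes, with test strings chosen for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Ungrouped: | splits the whole pattern into two alternatives,
# ^X(\d\d) and (a|b|c)$, so each anchor applies to only one side.
my $ungrouped = qr/^X(\d\d)|(a|b|c)$/;

# Grouped: the alternation is confined to what follows the X,
# and both anchors apply to the whole string.
my $grouped = qr/
    ^X
    ( \d\d | a | b | c )   # two digits, or one of the letters a, b, c
    $
/x;

for my $s ("X11", "Xa", "X12 and more", "no c") {
    printf "%-14s ungrouped=%d grouped=%d\n",
        $s, ( $s =~ $ungrouped ? 1 : 0 ), ( $s =~ $grouped ? 1 : 0 );
}
```

Both patterns accept X11 and Xa; only the ungrouped one also accepts "X12 and more" and "no c".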
