The $<digit> and $+ variables
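When expressions enclosed in parentheses appear inside a regular expression and the match succeeds, the substrings they match are assigned to the special variables $1, $2, $3, and so on. The groups are numbered in the order of their opening parentheses. The numbering does not stop at $9: a pattern with more groups also sets $10, $11, etc.
$+ is the special variable that holds the substring matched by the highest-numbered parenthesized expression that took part in the last successful match.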
For instance:
my $string = "There are several words in this string, a date 2013-01-27 and a number 123";
if ($string =~ m/^.* several ([^ ]*) .* (2013[^ ]*) (.*)$/) {
    print "The word after 'several' is: " . $1 . "\n";
    print "The date is: " . $2 . "\n";
    print "The trailing text after the date is: " . $3 . "\n";
    print "The string matching the last expression is: " . $+ . "\n";
}
This sample code produces the following output:
The word after 'several' is: words
The date is: 2013-01-27
The trailing text after the date is: and a number 123
The string matching the last expression is: and a number 123
The $`, $& and $' variables
The $& variable holds the string matched by the last successful pattern match.
The $` (dollar-backtick) variable holds the part of the string that precedes the match (prematch).
The $' (dollar-apostrophe) variable holds the part of the string that follows the match (postmatch).
The sample code below:
my $string = "There are several words in this string, a date 2013-01-27 and a number 123";
if ($string =~ m/2013[^ ]*/) {
    print "Prematch: " . $` . "\n";
    print "Match: " . $& . "\n";
    print "Postmatch: " . $' . "\n";
}
generates the following output:
Prematch: There are several words in this string, a date 
Match: 2013-01-27
Postmatch:  and a number 123
The ${^MATCH}, ${^PREMATCH} and ${^POSTMATCH} variables
In Perl versions before 5.20, the use of the variables $`, $& and $' anywhere in a script causes a noticeable performance penalty for every regular expression match in it. This is due to the way the Perl interpreter was implemented: on each successful match it has to copy the target string. Perl 5.20 introduced a copy-on-write mechanism that removes this penalty.
On older versions, the problem can be avoided by replacing those variables with ${^MATCH}, ${^PREMATCH} and ${^POSTMATCH}, and by adding the "/p" modifier to the regular expression. The previous example could be rewritten as:
my $string = "There are several words in this string, a date 2013-01-27 and a number 123";
if ($string =~ m/2013[^ ]*/p) {
    print "Prematch: " . ${^PREMATCH} . "\n";
    print "Match: " . ${^MATCH} . "\n";
    print "Postmatch: " . ${^POSTMATCH} . "\n";
}
The @- and @+ variables
The @- and @+ arrays hold the offsets of the start and the end of each matching substring in the last successful match. @- holds the start positions. @+ holds the positions just past the last character of each match.
$-[0] and $+[0] are the offsets of the substring matching the full regular expression. The following entries hold the offsets for the first, second, etc. subexpressions inside the regular expression.
For instance, the following sample code:
my $string = "There are several words in this string, a date 2013-01-27 and a number 123";
if ($string =~ m/several ([^ ]*) .* (2013[^ ]*) (.*)/) {
    print "The word after 'several' is: " . $1 . "\n";
    print "The date is: " . $2 . "\n";
    print "The trailing text after the date is: " . $3 . "\n";
    print '$-[0]: ' . $-[0] . ', $+[0]: ' . $+[0] . "\n";
    print "substring(" . $-[0] . "," . $+[0] . "): " . substr($string, $-[0], $+[0] - $-[0]) . "\n";
    print '$-[1]: ' . $-[1] . ', $+[1]: ' . $+[1] . "\n";
    print "substring(" . $-[1] . "," . $+[1] . "): " . substr($string, $-[1], $+[1] - $-[1]) . "\n";
    print '$-[2]: ' . $-[2] . ', $+[2]: ' . $+[2] . "\n";
    print "substring(" . $-[2] . "," . $+[2] . "): " . substr($string, $-[2], $+[2] - $-[2]) . "\n";
}
produces the following output:
The word after 'several' is: words
The date is: 2013-01-27
The trailing text after the date is: and a number 123
$-[0]: 10, $+[0]: 74
substring(10,74): several words in this string, a date 2013-01-27 and a number 123
$-[1]: 18, $+[1]: 23
substring(18,23): words
$-[2]: 47, $+[2]: 57
substring(47,57): 2013-01-27
The %+ and %- variables
In a regular expression, it is possible to assign a name to a subexpression enclosed in parentheses (…). This is done using the syntax (?<name>…).
The %+ hash maps the name of each named subexpression to the substring it matched in the last successful match.
Example:
my $string = "There are several words in this string, a date 2013-01-27 and a number 123";
if ($string =~ m/^.* several (?<word>[^ ]*) .* (?<date>2013[^ ]*) (?<trailing>.*)$/) {
    print "The word after 'several' is: " . $+{word} . "\n";
    print "The date is: " . $+{date} . "\n";
    print "The trailing text after the date is: " . $+{trailing} . "\n";
}
The %- hash works like %+, but each of its values is a reference to an array. That array holds the substrings matched by every subexpression sharing the same name, in the order in which the subexpressions appear in the pattern.
Example:
my $string = "There are several words in this string, a date 2013-01-27 and a number 123";
if ($string =~ m/^.* several (?<data>[^ ]*) .* (?<data>2013[^ ]*) (?<data>.*)$/) {
    print "The word after 'several' is: " . $-{data}[0] . "\n";
    print "The date is: " . $-{data}[1] . "\n";
    print "The trailing text after the date is: " . $-{data}[2] . "\n";
}
The $^N variable
This special variable holds the substring matched by the most recently closed subexpression in parentheses (the one whose closing parenthesis was passed last). It is mainly used inside a regular expression, in code blocks like (?{ $variable = $^N }), to assign that substring to a variable.
Example:
my $string = "There are several words in this string, a date 2013-01-27 and a number 123";
my $word;
if ($string =~ m/^.* several ([^ ]*)(?{$word = $^N}) .* (?<data>2013[^ ]*) (?<data>.*)$/) {
    print "The word after 'several' is: " . $word . "\n";
    print "The date is: " . $-{data}[0] . "\n";
    print "The trailing text after the date is: " . $-{data}[1] . "\n";
}
In the example above, the substring matched by the first subexpression, “([^ ]*)”, is assigned to the variable “$word”.
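$^N and $+ are easy to confuse when groups are nested. $+ refers to the highest-numbered group that matched. $^N refers to the group whose closing parenthesis came last, which for nested groups is the outer one. The toy pattern below is chosen only to make that difference visible.

```perl
use strict;
use warnings;

my ($last_closed, $highest);
if ("abc" =~ m/((a)(b))c/) {
    # Groups: $1 = "ab" (outer), $2 = "a", $3 = "b".
    $last_closed = $^N;    # outer group closes last: "ab"
    $highest     = $+;     # highest-numbered group: $3, "b"
}
print "\$^N: $last_closed, \$+: $highest\n";    # prints "$^N: ab, $+: b"
```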