Oct 21 2013

The $<digit> and $+ variables

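When expressions enclosed in parentheses appear inside a regular expression, the substrings they match are assigned to the special variables $1, $2, $3, and so on, numbered by the position of each opening parenthesis. The numbering does not stop at $9: a pattern with more groups also sets $10, $11, etc.

$+ is the special variable that holds the substring matched by the last parenthesized expression that took part in the match, that is, the highest-numbered group that actually captured something.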
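
A minimal sketch of both kinds of variable, assuming two hypothetical input strings chosen only for illustration. The second match shows where $+ is handy: with an alternation only one group captures anything, and $+ returns it without having to test $1 and $2.

```perl
use strict;
use warnings;

# Hypothetical input, used only for illustration.
my $date = "2013-10-21";

my ($year, $month, $day, $last);
if ($date =~ /(\d+)-(\d+)-(\d+)/) {
    ($year, $month, $day) = ($1, $2, $3);
    $last = $+;    # highest group that matched, the same as $3 here
    print "year=$year month=$month day=$day last=$last\n";
}

# With an alternation only one of the two groups takes part in the match.
my $label = "Revision: 42";
my $number;
if ($label =~ /Version: (\d+)|Revision: (\d+)/) {
    $number = $+;  # $1 is undefined here, $+ holds the text of group 2
    print "number=$number\n";
}
```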


The $`, $& and $' variables

The $& variable holds the string matched by the last successful regular expression match.

The $` (dollar-backtick) variable holds the substring preceding the matched string (prematch).

The $' (dollar-apostrophe) variable holds the substring after the matched string (postmatch).
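
A minimal sketch of the three variables, assuming a hypothetical sample string; the values are copied into ordinary variables right after the match so that a later match cannot overwrite them.

```perl
use strict;
use warnings;

my $text = "The quick brown fox";   # hypothetical sample string

my ($pre, $match, $post);
if ($text =~ /quick/) {
    # Copy the values immediately: the next successful match resets them.
    ($pre, $match, $post) = ($`, $&, $');
    print "prematch:  [$pre]\n";
    print "match:     [$match]\n";
    print "postmatch: [$post]\n";
}
```

The brackets in the printed lines make the space at the end of the prematch and at the start of the postmatch visible.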


The ${^MATCH}, ${^PREMATCH} and ${^POSTMATCH} variables

Due to the way the Perl interpreter is implemented, using the variables $`, $& or $' anywhere in a script causes a noticeable performance degradation in all the regular expressions in it. This applies to Perl versions before 5.20; later versions use a copy-on-write mechanism that removes the penalty.

This problem can be avoided by replacing those variables with ${^PREMATCH} (for $`), ${^MATCH} (for $&) and ${^POSTMATCH} (for $'), and adding the “/p” modifier to the regular expression. These variables are available from Perl 5.10 onwards.
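
A minimal sketch of the /p form, assuming a hypothetical sample string. Only matches that carry the /p modifier are guaranteed to set these variables.

```perl
use strict;
use warnings;

my $text = "The quick brown fox";   # hypothetical sample string

my ($pre, $match, $post);
if ($text =~ /quick/p) {            # /p asks Perl to keep the matched string
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    print "prematch:  [$pre]\n";
    print "match:     [$match]\n";
    print "postmatch: [$post]\n";
}
```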

The @- and @+ variables

The @- and @+ arrays hold the offsets of the beginning and end of each of the matching substrings in a regular expression. The end offset points just past the last matched character.

$-[0] and $+[0] are the offsets of the substring matching the full regular expression. The next entries hold the offsets for the first, second, etc. subexpressions inside the regular expression.
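
A minimal sketch that walks both arrays after a match, assuming a hypothetical sample string. The @spans array is introduced here only to keep the results; substr() recovers each substring from its offsets.

```perl
use strict;
use warnings;

my $text = "name=perl";   # hypothetical sample string
my @spans;                # one [start, end, substring] entry per group

if ($text =~ /(\w+)=(\w+)/) {
    # $#+ is the index of the last group in the pattern.
    for my $i (0 .. $#+) {
        my ($start, $end) = ($-[$i], $+[$i]);
        # The end offset is exclusive, so the length is $end - $start.
        my $piece = substr($text, $start, $end - $start);
        push @spans, [$start, $end, $piece];
        print "group $i: start=$start end=$end text=$piece\n";
    }
}
```

Entry 0 covers the whole match (offsets 0 to 9), entry 1 covers “name” (0 to 4) and entry 2 covers “perl” (5 to 9).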


The %+ and %- variables

In a regular expression, it is possible to assign a name to a subexpression enclosed in parentheses (…). This is done using the syntax (?<name>…).

The %+ hash variable gives access to each of the matching substrings of named subexpressions.
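
A minimal sketch of named groups and %+, assuming a hypothetical date string and group names (year, month, day) chosen for illustration.

```perl
use strict;
use warnings;

my $date = "2013-10-21";   # hypothetical input

my %parts;
if ($date =~ /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/) {
    # Copy the hash: %+ always refers to the last successful match.
    %parts = %+;
    print "year=$+{year} month=$+{month} day=$+{day}\n";
}
```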


The %- variable works like %+, but its values are array references. Each array holds the matching substrings of all the subexpressions that share the same name, in the order in which they appear in the pattern.
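
A minimal sketch of the difference, assuming a hypothetical pattern in which two groups share the name “val”. $+{val} returns only the leftmost group that matched, while $-{val} gives access to both.

```perl
use strict;
use warnings;

my $text = "a=1 b=2";   # hypothetical input

my ($leftmost, @values);
if ($text =~ /\w=(?<val>\d) \w=(?<val>\d)/) {
    $leftmost = $+{val};        # leftmost group named "val" that matched
    @values   = @{ $-{val} };   # every group named "val", in pattern order
    print "leftmost: $leftmost\n";
    print "all: @values\n";
}
```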


The $^N variable

This special variable holds the substring matched by the most recently closed subexpression in parentheses. It is mainly used inside a regular expression, in code blocks like (?{ $variable = $^N }), to assign the string to a variable.
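
A minimal sketch of this usage, assuming a hypothetical input string; the subexpression ([^ ]*) and the variable $word are taken from the explanation that follows.

```perl
use strict;
use warnings;

my $text = "hello world";   # hypothetical input

my $word;
# The code block runs right after the group closes, so $^N holds its text.
if ($text =~ /^([^ ]*)(?{ $word = $^N })/) {
    print "word=$word\n";
}
```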


In the example above, the result of evaluating the first subexpression “([^ ]*)” is assigned to the variable “$word”.

Index of posts related to Perl programming

Posted at 2:23 pm
Oct 21 2013

The $_ variable

$_ is the default variable in Perl when a variable is required and no other variable has been explicitly specified:

  • In a foreach loop with no explicit loop variable, and in a while loop whose only condition reads a line from a filehandle
  • When a regular expression is applied without the =~ operator
  • In many built-in functions that expect at least one input argument, such as print, chomp or length
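
The cases above can be seen together in a short sketch, assuming a hypothetical list of strings. Every statement in the loop body works on $_ without naming it.

```perl
use strict;
use warnings;

my @words = ("alpha\n", "beta\n");   # hypothetical data
my @lengths;

foreach (@words) {           # no loop variable: each element is aliased to $_
    chomp;                   # same as chomp($_)
    next unless /a$/;        # same as $_ =~ /a$/
    push @lengths, length;   # same as length($_)
    print;                   # same as print $_
    print "\n";
}
```

Because $_ is an alias inside foreach, the chomp call also removes the newlines from the elements of @words themselves.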

Continue reading »

Posted at 9:21 am
Oct 20 2013

A computer program can execute several tasks simultaneously using the technique known as “multithreading”. With multithreading, one thread can run a CPU-intensive process while another performs a disk I/O operation, a third runs a query against a database, which might reside on a different computer, and a fourth downloads content from the network. In this way, the use of the available resources is optimized and the total execution time is reduced.

This post reviews the basic concepts of multithreading, with some sample Perl scripts that implement this technique.

Continue reading »

Posted at 3:52 pm
Jun 27 2013

A web site whose pages are dynamically generated can be slow and appear unresponsive if generating the content involves heavy database queries or other CPU-intensive tasks.

This post explains a procedure to save the pre-processed pages in a disk cache and configure the web server to deliver the cached pages on subsequent requests, avoiding the overhead of regenerating the same content each time a new request is received.

Continue reading »

Posted at 11:59 am
Jan 14 2013

Sometimes we need to process large text files (for instance, the log files of a web server).

But for some purposes, such as the generation of statistical reports, there is no need to process the whole file. A representative sample is enough to generate a meaningful result, reducing the processing time and resources involved.

For the result to be meaningful, the sample normally has to be fully random. This post presents a couple of methods to achieve this.
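
As an illustration of the idea, and not necessarily one of the methods the post goes on to describe, here is a minimal sketch of reservoir sampling, which picks a uniform random sample of lines in a single pass without knowing the file length in advance. The function name sample_lines and the in-memory test data are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return a uniform random sample of $k lines from $fh.
sub sample_lines {
    my ($fh, $k) = @_;
    my @sample;
    my $seen = 0;
    while (my $line = <$fh>) {
        $seen++;
        if (@sample < $k) {
            push @sample, $line;          # fill the reservoir first
        } else {
            my $j = int(rand($seen));     # uniform in 0 .. $seen - 1
            $sample[$j] = $line if $j < $k;
        }
    }
    return @sample;
}

# An in-memory "file" of 1000 lines stands in for a large log file.
my $data = join '', map { "line $_\n" } 1 .. 1000;
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
my @picked = sample_lines($fh, 10);
close $fh;
print @picked;
```

Each line ends up in the sample with the same probability, and memory use depends only on the sample size, not on the size of the file.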

Continue reading »

Posted at 8:43 pm
Nov 27 2012

The “alarm” function available in Perl schedules a SIGALRM signal to be delivered after a specified number of seconds; the subroutine installed as the handler in $SIG{ALRM} is then executed.

Using this function, it is possible to implement a function that asks the user for some input and times out if no answer is received within a given time, letting the script continue.

The function can be implemented using this code
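
One way such a function might look, as a minimal sketch rather than the post's own listing. The name read_with_timeout is hypothetical, and a pipe stands in for the terminal so that the example runs without user interaction.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line from $fh, giving up after $timeout seconds.
# Returns the line without its newline, or undef on timeout or end of file.
sub read_with_timeout {
    my ($fh, $timeout) = @_;
    my $answer;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;          # schedule SIGALRM
        $answer = <$fh>;         # blocks until a line arrives or the alarm fires
        alarm 0;                 # cancel the pending alarm
        1;
    };
    alarm 0;                     # make sure no alarm is left pending
    unless ($ok) {
        die $@ unless $@ eq "timeout\n";   # propagate unexpected errors
        return undef;
    }
    chomp $answer if defined $answer;
    return $answer;
}

# Demonstration with a pipe in place of the keyboard.
pipe(my $reader, my $writer) or die "pipe failed: $!";
syswrite($writer, "yes\n");
my $first  = read_with_timeout($reader, 2);   # a line is already waiting
my $second = read_with_timeout($reader, 1);   # nothing arrives: times out
print "first: $first\n";
print "second: ", (defined $second ? $second : "(timed out)"), "\n";
```

In a real script the filehandle would be \*STDIN, with a prompt printed before the call.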

Continue reading »

Posted at 9:15 pm
Oct 31 2012

In our previous post we explained how to process a file in XML format using the XML::Simple module from CPAN.

However, that module works by reading the whole file into memory. This is not suitable if the file to be processed is large and the available RAM is limited.

Instead, we can use the XML::Parser::PerlSAX module (SAX stands for “Simple API for XML”). With this module, the file is read as a data stream, and events such as “start of element”, “end of element”, etc. are generated. The programmer only needs to provide an event handler package implementing methods to process these standard events.

Continue reading »

Posted at 8:05 pm