May 27, 2014
 
Article Perl

The flow of a Perl script (or of a program in any other language, for that matter) usually starts by reading information from one or more sources: files on disk, databases, network connections, keyboard input, and so on. Next, it processes this information. Finally, it delivers the result by writing it to a file, storing it in a database, sending it over a network connection, or displaying it on screen.

In some cases, this flow can be split into several sub-tasks, each of which follows the same pattern of reading, processing and writing data.

In these cases, the performance of the script can be greatly improved if the sub-tasks are executed in parallel. While some sub-tasks are waiting to receive information from the network, others can use the CPU to process data, and others can wait for a disk I/O operation to finish.

This post explains how to write a Perl script that runs several subprocesses concurrently, making better use of the available resources.

The fork() function

The functionality we are looking for is provided by the fork() function. This function creates a child process that runs independently of the parent process. The child inherits a copy of the parent's environment, including the variables that have been defined and their values, open file handles, database connections, and so on.

Example:
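A minimal sketch of this pattern, assuming a Unix-like system where fork() is natively available. The variable names and messages are illustrative only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # unbuffered output, so parent and child messages are not delayed

my $pid = fork();
die "fork failed: $!" unless defined $pid;    # undef means no child was created

if ($pid) {
    # Parent: fork() returned the PID of the new child.
    print "Parent $$: spawned child $pid\n";
}
else {
    # Child: fork() returned 0.
    print "Child $$: doing some work\n";
    exit 0;    # the child stops here and never runs the parent's code below
}

# Only the parent reaches this point.
my $finished = wait();       # block until the child exits
my $status   = $? >> 8;      # exit code of the child that was reaped
print "Parent $$: child $finished exited with status $status\n";
```

The explicit `exit` in the child branch matters: without it, the child would fall through and execute the parent's code as well.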

The call to fork() returns the numeric process ID of the child to the parent process, and returns zero to the child process. If the child could not be created, it returns undef. The script uses this return value, with the “if ($pid)” statement, to separate the code executed by the parent from the code executed by the child.

This mechanism can be used to spawn several children from the same parent process. Each of them will be identified by a unique value of $pid.

Finally, the parent process can call wait(), once per child, to suspend its execution until all the children have finished and exited.
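The following sketch spawns several children and then waits for all of them. The number of children, the `sleep` standing in for real work, and the use of the task number as exit code are assumptions made for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $num_children = 3;    # illustrative value
my %children;            # PID => task number

for my $task (1 .. $num_children) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do the work for this task, then exit.
        sleep 1;         # stand-in for real work
        exit $task;      # report the task number as the exit code
    }

    # Parent: remember which task this child is handling.
    $children{$pid} = $task;
}

# Parent: wait() returns one finished child per call, and -1 when none remain.
my %exit_status;         # task number => exit code
while ((my $pid = wait()) != -1) {
    my $task = $children{$pid};
    $exit_status{$task} = $? >> 8;
    print "Child $pid (task $task) exited with status $exit_status{$task}\n";
}
```

Because the three children sleep concurrently, the whole script takes about one second instead of three.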

Waiting for the child

Normally, the parent process should not exit until all the children it has spawned have exited. This is done with calls to wait(), as shown in the previous example. wait() suspends the execution of the parent process until one of its children exits, and returns the PID of that child. If there are no child processes to wait for, wait() returns -1.

There is also a built-in function waitpid(), to be used when the parent needs to wait for a specific child. It takes as its first argument the process ID of the child to wait for, and as its second argument either zero for a blocking call, or WNOHANG to make the call non-blocking. The WNOHANG constant is exported by the POSIX module.
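A short sketch of both forms of the call for a single child. The one-second sleep and the exit code 7 are arbitrary values chosen for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # exports WNOHANG

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    sleep 1;     # stand-in for real work
    exit 7;      # arbitrary exit code
}

# Non-blocking call: returns 0 while this child is still running.
my $early = waitpid($pid, WNOHANG);
print "Non-blocking waitpid returned $early\n";

# Blocking call: suspends the parent until this specific child exits.
my $reaped = waitpid($pid, 0);
my $status = $? >> 8;
print "Child $reaped exited with status $status\n";
```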

If the first argument in the call to waitpid() is -1 instead of the PID of a child process, it waits for any child process to finish. That is, a call to waitpid(-1, 0) is equivalent to a call to wait().

waitpid() can also be called with a PID of -1 and the WNOHANG flag. In this case, the call returns immediately with one of the following values:

  • zero if no child has finished yet
  • the PID of one of the child processes that have exited and not yet been waited for
  • -1 if there are no remaining child processes to be waited for
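The three return values above can be handled in a polling loop, which leaves the parent free to do other work between checks. This is a sketch under assumptions: the delays, the 200 ms polling interval and the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # exports WNOHANG

my %running;                # PID => delay, for children still alive
for my $delay (1, 2) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        sleep $delay;       # stand-in for real work
        exit 0;
    }
    $running{$pid} = $delay;
}

my @reaped;                 # PIDs of children that have been collected
my $polls = 0;              # number of times no child was ready

while (1) {
    my $pid = waitpid(-1, WNOHANG);

    last if $pid == -1;     # no children left to wait for

    if ($pid == 0) {
        # Children exist, but none has finished yet.
        $polls++;
        select(undef, undef, undef, 0.2);   # pause 200 ms; other work could go here
        next;
    }

    # A child has exited: record it.
    push @reaped, $pid;
    delete $running{$pid};
    print "Reaped child $pid\n";
}

print "All children finished after $polls idle polls\n";
```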

Passing data between parent and child processes

fork() creates a copy of the environment of the parent process. The variables that were defined in the parent are therefore also defined in the child, with the same values. However, changes made to those variables in the child are not visible to the parent, and vice versa.
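A small sketch makes this isolation visible. The variable name and values are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $counter = 10;           # defined before the fork, so both processes get a copy

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: modifies its own private copy.
    $counter += 5;
    print "Child sees counter = $counter\n";
    exit 0;
}

# Parent: wait for the child, then check its own copy.
waitpid($pid, 0);
print "Parent still sees counter = $counter\n";
```

The child's copy becomes 15, while the parent's copy is unchanged.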

To exchange data between the two processes, a communication channel can be established using “pipes”. The open() built-in function can be called with the special arguments “-|” or “|-” to create the pipes.

When open() is called with the argument “-|”, it creates a child process in the same way as fork() does, and its return value follows the same convention: the child's PID in the parent, and zero in the child. In addition, it gives the parent an I/O handle connected to the child’s STDOUT. In this way, whatever the child process writes to STDOUT can be read by the parent process.

Example 1. Communication child -> parent:
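A minimal sketch of the child -> parent direction. The handle name `$from_child` and the message text are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Forks a child; the parent gets a handle that reads from the child's STDOUT.
my $pid = open(my $from_child, "-|");
die "cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: everything printed to STDOUT goes into the pipe.
    print "result line $_\n" for 1 .. 3;
    exit 0;
}

# Parent: read until the child closes its end of the pipe.
my @lines = <$from_child>;
close($from_child);         # also waits for the child and sets $?
chomp @lines;

print "Parent received: $_\n" for @lines;
```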

If open() is called with “|-” as argument, the pipe is opened as a unidirectional communication channel parent -> child. Whatever the parent writes to the pipe handle can be read by the child from STDIN.

Example 2. Communication parent -> child:
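A minimal sketch of the parent -> child direction. The handle name `$to_child` is illustrative, and using the exit code to report how many lines the child read is an assumption made only to keep the example self-checking:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Forks a child; the parent gets a handle that writes to the child's STDIN.
my $pid = open(my $to_child, "|-");
die "cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: read whatever the parent sends, until end of file.
    my $count = 0;
    while (my $line = <STDIN>) {
        $count++;
    }
    exit $count;            # report the number of lines through the exit code
}

# Parent: send some lines to the child.
print {$to_child} "task $_\n" for 1 .. 3;

# Closing the handle signals end of file to the child and waits for it to exit.
close($to_child);
my $lines_seen = $? >> 8;

print "Child reported reading $lines_seen lines\n";
```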

Bidirectional communication with pipes

If required, bidirectional communication parent <-> child can also be achieved by explicitly calling the built-in pipe() function twice, to create two unidirectional pipes. In this case, fork() is called directly to create the child process.
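A sketch of this approach, in which the parent sends words and the child answers with their upper-case form. The handle names and the upper-casing protocol are illustrative assumptions. Each process closes the pipe ends it does not use, and autoflush is enabled on the writers so that line-by-line exchanges do not stall in a buffer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;             # provides the autoflush() method on handles

# One pipe per direction: pipe(READER, WRITER).
pipe(my $child_reader,  my $parent_writer) or die "pipe failed: $!";
pipe(my $parent_reader, my $child_writer)  or die "pipe failed: $!";
$parent_writer->autoflush(1);
$child_writer->autoflush(1);

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: close the parent's ends, then answer each request.
    close($parent_reader);
    close($parent_writer);
    while (my $line = <$child_reader>) {
        chomp $line;
        print {$child_writer} uc($line), "\n";
    }
    exit 0;
}

# Parent: close the child's ends, then exchange messages.
close($child_reader);
close($child_writer);

my @replies;
for my $word (qw(alpha beta)) {
    print {$parent_writer} "$word\n";       # request
    my $reply = <$parent_reader>;           # response
    chomp $reply;
    push @replies, $reply;
}

close($parent_writer);      # end of file for the child, so its loop ends
waitpid($pid, 0);
close($parent_reader);

print "Replies: @replies\n";
```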

 Posted by at 10:45 am
