
The process flow of a Perl script (or of a program in any other language, actually) usually starts by reading information from one or several sources: files on disk, databases, network connections, input from the keyboard… Next, it processes this information, and finally it delivers the result by writing it to a file, storing it in a database, sending it through a network connection or displaying it on screen.
In some cases, this flow can be split into several sub-tasks, each of which follows the same pattern of reading, processing and writing data.
In these cases, the performance of the script can be greatly improved by executing these sub-tasks in parallel. That way, while some sub-tasks are waiting to receive information from the network, others can be using the CPU to process data, and still others can be waiting for a disk I/O operation to finish.
This post explains how to write a Perl script that runs several subprocesses concurrently, making better use of the available resources.
The fork() function
The functionality we are looking for is provided by the fork() function. This function creates a child process that runs independently of the parent process. The child inherits a copy of the parent's environment, including the variables that have been defined and their values, open file handles, database connections, etc.
Example:
```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pid = fork;
die "Error in fork: $!" unless defined $pid;

if ($pid) {
    # Code executed by the parent process
} else {
    # Code executed by the child process
    exit;
}

# Wait until all the children have exited
1 while (wait() != -1);

# Continue executing the parent process
```
In the parent process, fork returns the process ID (PID) of the new child; in the child process, it returns 0. If the fork fails, it returns undef, which is why the script checks that $pid is defined. The “if ($pid)” statement then uses the return value to separate the code executed by the parent from the code executed by the child.
This mechanism can be used to spawn several children from the same parent process. Each of them will be identified by a unique value of $pid.
Finally, the parent process can call wait() to suspend its execution until a child exits. Calling it in a loop until it returns -1, as in the example, makes the parent wait until all the children have finished.
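To illustrate spawning several children from one parent, here is a minimal sketch. The task names in @tasks are hypothetical, and `sleep 1` stands in for each child's real reading, processing and writing. The parent records which PID runs which task, then reaps every child with wait() until it returns -1. Because the three children run concurrently, the whole script should take roughly one second rather than three.

```perl
use strict;
use warnings;

# Hypothetical list of independent sub-tasks; in a real script each one
# might read a file, query a database or fetch data from the network
my @tasks = ('task-a', 'task-b', 'task-c');

my %task_of_pid;    # maps each child PID to the task it is running

for my $task (@tasks) {
    my $pid = fork;
    die "Error in fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child process: do the work for this task only, then exit so the
        # child never continues into the parent's loop
        sleep 1;    # stands in for real reading/processing/writing
        exit 0;
    }

    # Parent process: remember which child runs which task
    $task_of_pid{$pid} = $task;
}

# Reap every child; wait() returns -1 once no children are left
my @finished;
while ((my $pid = wait()) != -1) {
    push @finished, $task_of_pid{$pid};
}

print "Finished: @finished\n";
```

The explicit `exit` in the child branch is essential: without it, each child would go back to the top of the loop and fork children of its own.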
Waiting for the child
Normally, the parent process should wait until all the children it spawned have exited before exiting itself. This is done with wait(), as shown in the previous example: wait() suspends the execution of the parent until one of its children exits, and returns the PID of that child. If there are no child processes left to wait for, wait() returns -1.
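Besides the PID, wait() also stores the child's exit status in the special variable $?, with the value passed to exit() in its high byte. The sketch below, using arbitrary exit codes 0, 1 and 2 chosen for illustration, shows how a parent might collect the exit code of every child.

```perl
use strict;
use warnings;

# Spawn three children that exit with different (arbitrary) status codes
for my $code (0, 1, 2) {
    my $pid = fork;
    die "Error in fork: $!" unless defined $pid;
    exit $code if $pid == 0;    # child: exit immediately with its code
}

# wait() returns the PID of a finished child and stores its status in $?;
# the code passed to exit() is in the high byte, hence $? >> 8
my %status_of;
while ((my $pid = wait()) != -1) {
    $status_of{$pid} = $? >> 8;
}

my @codes = sort { $a <=> $b } values %status_of;
print "Exit codes: @codes\n";
```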
The built-in waitpid() function is used when the parent needs to wait for a specific child. Its first argument is the process ID of the child to wait for, and its second argument is either 0 for a blocking call, or WNOHANG (provided by the POSIX module) to make the call non-blocking:
```perl
# Include the module where WNOHANG is defined
use POSIX ":sys_wait_h";

# Is the child still running?
if (waitpid($pid, WNOHANG) == 0) {
    # The child is still running. Do some stuff
    # ...

    # Wait until the child exits
    waitpid($pid, 0);
}
```
If the first argument in the call to waitpid is -1 instead of the PID of a child process, it waits for any child process to finish. That is, a call to waitpid(-1, 0) is equivalent to a call to wait().
waitpid can also be called with a PID of -1 and the WNOHANG flag. In this case, the call returns immediately with one of these values:
- 0 if there are still child processes running, but none of them has exited yet
- the PID of a child process that has exited and has not been waited for yet
- -1 if there are no remaining child processes to wait for
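These three return values make it possible to poll the children without blocking. The following sketch starts three children that sleep for 1, 2 and 3 seconds (stand-ins for real work) and then loops on waitpid(-1, WNOHANG), pausing briefly whenever it returns 0. The 0.1-second polling interval is an arbitrary choice for illustration.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # provides WNOHANG

# Start three children that finish at different times
my %running;
for my $seconds (1, 2, 3) {
    my $pid = fork;
    die "Error in fork: $!" unless defined $pid;
    if ($pid == 0) {
        sleep $seconds;    # stands in for real work
        exit 0;
    }
    $running{$pid} = $seconds;
}

# Poll without blocking: the parent is free to do other work between checks
my @finish_order;
while (1) {
    my $pid = waitpid(-1, WNOHANG);
    last if $pid == -1;    # no child processes left
    if ($pid == 0) {       # children exist, but none has exited yet
        # ... other work could be done here ...
        select(undef, undef, undef, 0.1);    # short pause before polling again
        next;
    }
    # $pid has just exited: record it and stop tracking it
    push @finish_order, $running{$pid};
    delete $running{$pid};
}

print "Children finished in order: @finish_order\n";
```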
Passing data between parent and child processes
fork() creates a copy of the environment of the parent process. In this way, the variables that were defined in the parent process are also defined in the child process, with the same values assigned to them. But changes made to those variables in the child are not visible to the parent, and vice versa.
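A short sketch makes the separation of the two copies visible. The variable name $counter and its values are arbitrary. The child changes its copy, while the parent, after waiting for the child, still sees the original value.

```perl
use strict;
use warnings;

my $counter = 10;

my $pid = fork;
die "Error in fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: modifies its own copy of $counter only
    $counter = 99;
    print "Child sees counter = $counter\n";
    exit 0;
}

# Parent: wait for the child, then check its own copy of the variable
waitpid($pid, 0);
print "Parent sees counter = $counter\n";    # still 10
```

Running it prints "Child sees counter = 99" followed by "Parent sees counter = 10".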
To exchange data between the two processes, a communication channel can be established using pipes. The built-in open() function can be called with the special arguments “-|” or “|-” to create them:
When open() is called with the argument “-|”, it creates a subprocess just as fork() does: it returns the child's PID to the parent, 0 to the child, and undef if the fork fails. In addition, the handle it opens in the parent is connected to the child’s STDOUT, so whatever the child writes to STDOUT can be read by the parent.
Example 1. Communication child -> parent:
```perl
use strict;
use warnings;

my $message;

# open() with "-|" forks: it returns the child's PID to the parent,
# 0 to the child, and undef if the fork failed
my $pid = open(FROM_CHILD, "-|");
die "Cannot fork: $!" unless defined $pid;

if ($pid) {
    # Parent process: read the message sent by the child
    $message = <FROM_CHILD>;
    # Closing the handle waits until the child process exits
    close(FROM_CHILD);
} else {
    # Child process: what it writes to STDOUT is sent to the parent
    print STDOUT "Hi, I am the child process";
    exit;
}

print "My child told me: " . $message . "\n";
```
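Combining the example above with the multiple-children pattern gives a simple way for parallel sub-tasks to return results to the parent. In this sketch, which is one possible arrangement rather than the only one, each child squares one number from the hypothetical list @numbers and writes the result to its STDOUT. The parent keeps one read handle per child and collects the results once all the children are running.

```perl
use strict;
use warnings;

# Hypothetical inputs: each child squares one number and reports back
my @numbers = (2, 3, 4);

# Start one child per number; keep the read handle for each
my %handle_of;
for my $n (@numbers) {
    my $pid = open(my $from_child, "-|");
    die "Cannot fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: compute the result and write it to STDOUT (the pipe)
        print STDOUT $n * $n, "\n";
        exit 0;
    }
    $handle_of{$n} = $from_child;    # parent: remember the handle
}

# All children are now running concurrently; collect their results
my %square_of;
for my $n (@numbers) {
    my $fh = $handle_of{$n};
    chomp(my $result = <$fh>);
    $square_of{$n} = $result;
    close($fh);    # also waits for that child to exit
}

print "$_ squared is $square_of{$_}\n" for @numbers;
```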
If open() is called with “|-” instead, the pipe is opened as a unidirectional communication channel parent -> child: whatever the parent writes to the handle can be read by the child from its STDIN.
Example 2. Communication parent -> child:
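```perl
use strict;
use warnings;

# open() with "|-" forks: the parent's handle is connected to the
# child's STDIN. It returns the child's PID to the parent, 0 to the
# child, and undef if the fork failed
my $pid = open(TO_CHILD, "|-");
die "Cannot fork: $!" unless defined $pid;

if ($pid) {
    # The code below is executed in the parent process
    # The parent writes a message to the pipe, to be read by the child
    print TO_CHILD "Hi, this is a message from the parent process\n";
    # Closing the handle flushes the message, sends EOF to the child
    # and waits for the child to finish (a plain wait() here, with the
    # message still buffered, would block forever)
    close(TO_CHILD);
} else {
    # The code below is executed in the child process
    # The child reads a message from STDIN
    my $message = <STDIN>;
    chomp $message;
    print STDOUT "My parent sent a message: $message\n";
    exit;
}
```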
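A child reading from STDIN can also process a whole stream of items until the parent closes the pipe. In the sketch below, the lines in @lines are hypothetical work items. The child upper-cases each one and, purely for illustration, reports how many it handled through its exit status, which the parent reads from $? after close().

```perl
use strict;
use warnings;

# Hypothetical work items sent by the parent to the child
my @lines = ("first line", "second line", "third line");

my $pid = open(my $to_child, "|-");
die "Cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: process every line until the parent closes the pipe (EOF)
    my $count = 0;
    while (my $line = <STDIN>) {
        chomp $line;
        print STDOUT uc($line), "\n";
        $count++;
    }
    # Report how many lines were processed through the exit status
    exit $count;
}

# Parent: send one item per line
print $to_child "$_\n" for @lines;

# close() flushes the pipe, sends EOF to the child and waits for it.
# It returns false here because the child's exit status is non-zero,
# which is expected; the status itself is available in $?
close($to_child);
my $processed = $? >> 8;
print "The child processed $processed lines\n";
```

Exit statuses are limited to 0-255, so this trick only suits small counts; larger results are better sent back through a pipe.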
Bidirectional communication with pipes
If required, bidirectional parent <-> child communication can be achieved by explicitly calling the built-in pipe() function to create two unidirectional pipes, one for each direction, and then creating the child process with fork().
```perl
use strict;
use warnings;
use IO::Handle;

# Create a pipe for the communication parent -> child
pipe(FROM_PARENT, TO_CHILD) or die "pipe: $!";
# Create a second pipe for the communication child -> parent
pipe(FROM_CHILD, TO_PARENT) or die "pipe: $!";

# Set the "autoflush" option in both pipes, to make the information available
# for reading immediately after it has been written (disable buffering)
TO_CHILD->autoflush(1);
TO_PARENT->autoflush(1);

my $line;
my $pid = fork;
die "Error in fork: $!" unless defined $pid;

if ($pid) {
    # This code is executed in the parent process
    # Close the sides of the pipes not used in the parent
    close FROM_PARENT;
    close TO_PARENT;

    # Write a message to the child
    print TO_CHILD "Hi, I am the parent with process ID $$\n";

    # Read a message from the child
    chomp($line = <FROM_CHILD>);
    print "The parent process with PID $$ received a message: '$line'\n";

    close FROM_CHILD;
    close TO_CHILD;

    # Wait until the child exits
    waitpid($pid, 0);
} else {
    # This code is executed in the child process
    # Close the sides of the pipes not used by the child
    close FROM_CHILD;
    close TO_CHILD;

    # Read a message from the parent
    chomp($line = <FROM_PARENT>);
    print "The child process with PID $$ received a message: '$line'\n";

    # Write a message to the parent
    print TO_PARENT "Hi, this is the child process with PID $$\n";

    close FROM_PARENT;
    close TO_PARENT;
    exit;
}
```
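The same two-pipe setup can carry a longer conversation. In this sketch, written with lexical filehandles instead of barewords, the parent sends the numbers 1 to 3 one at a time and the child answers each request with the number doubled (an arbitrary operation chosen for illustration). When the parent closes its write end, the child's next read returns EOF, so the child leaves its loop and exits.

```perl
use strict;
use warnings;
use IO::Handle;

# Two unidirectional pipes, one per direction
pipe(my $from_parent, my $to_child)  or die "pipe: $!";
pipe(my $from_child,  my $to_parent) or die "pipe: $!";
$to_child->autoflush(1);
$to_parent->autoflush(1);

my $pid = fork;
die "Error in fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: close the unused ends, then answer each request until EOF
    close $to_child;
    close $from_child;
    while (my $request = <$from_parent>) {
        chomp $request;
        print {$to_parent} $request * 2, "\n";    # reply: the number doubled
    }
    exit 0;
}

# Parent: close the unused ends
close $from_parent;
close $to_parent;

# Send requests one at a time and read each reply
my @replies;
for my $n (1 .. 3) {
    print {$to_child} "$n\n";
    chomp(my $reply = <$from_child>);
    push @replies, $reply;
}

# Closing the write end makes the child's next read return EOF
close $to_child;
close $from_child;
waitpid($pid, 0);

print "Replies: @replies\n";
```

Closing the unused pipe ends in both processes matters here: if the child kept its own copy of $to_child open, it would never see EOF on $from_parent and the final waitpid() would block.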