
On a production server, disk usage tends to grow as historical data accumulates in log files, databases, and so on.
The system administrator normally establishes procedures to compress files and back up old data, with the aim of keeping a reasonable amount of disk space free (a minimum of 10% free space is a common target).
But sometimes disk usage may grow unexpectedly for other reasons. For example, a mail server failure can cause outgoing messages to accumulate in the output queue, or a recurring error in a service can cause a large number of error messages to be written to the log files.
In this post we will see how to configure an alert that periodically checks disk usage and sends an email if any of the monitored filesystems has a disk usage above a configured threshold.
Unix commands to analyze disk usage
df
On a Unix/Linux system, the “df” (disk free) command outputs the status of each filesystem mounted on the server. It is usually given the “-h” switch to print sizes in a human-readable form.
```
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
rootfs           29G   25G  2.0G  93% /
udev             10M  188K  9.9M   2% /dev
tmpfs           150M  116K  150M   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           557M     0  557M   0% /run/shm
none            457G  408G   49G  90% /c
/dev/sdb1       101G   77G   20G  80% /data
```
In the example above, the “/” (root) filesystem is 93% full, and the “/c” filesystem is at 90% of its capacity. The system administrator should find out what is taking up so much space, and take the necessary measures to bring those filesystems back to at least 10% free space.
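The same check can also be scripted without any CPAN module, by parsing the output of df. The following is a minimal sketch that assumes GNU or POSIX df. The -P flag requests the POSIX output format, which keeps each filesystem on a single line even when the device name is long. The helper name parse_df is ours, chosen for illustration only.

```perl
#!/usr/bin/perl -w
use strict;

# Parse the output of "df -P" into a hash reference: mount point => percent used.
# Hypothetical helper; assumes the POSIX column layout
# (Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on).
sub parse_df {
    my ($text) = @_;
    $text = '' unless defined $text;
    my %usage;
    foreach my $line ( split /\n/, $text ){
        next if $line =~ /^Filesystem/;                    # skip the header line
        my @cols = split ' ', $line;
        next unless @cols >= 6;
        my ($percent) = $cols[4] =~ /^(\d+)%$/ or next;    # e.g. "93%" -> 93
        # Rejoin the remaining columns in case the mount point contains spaces.
        $usage{ join ' ', @cols[5 .. $#cols] } = $percent;
    }
    return \%usage;
}

# Example run against the live system (prints nothing if df is unavailable).
my $rh_usage = parse_df( scalar `df -P 2>/dev/null` );
foreach my $mount ( sort keys %$rh_usage ){
    printf "%-20s %3d%% used\n", $mount, $rh_usage->{$mount};
}
```

Parsing text output is more fragile than calling a library, which is one reason the rest of this post uses Filesys::Df.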
du
The “du” (disk usage) command outputs the space occupied by individual files or whole directory trees. It is normally used with the “-s” (summary) and “-h” (human-readable) switches to produce output that is easier to read. For instance:
```
$ sudo du -sh /var/*
8.1M    /var/backups
384M    /var/cache
12K     /var/games
2.4G    /var/lib
4.0K    /var/local
0       /var/lock
47M     /var/log
424K    /var/mail
4.0K    /var/opt
0       /var/run
204K    /var/spool
4.0K    /var/tmp
1.3G    /var/www
```
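When hunting for the directory responsible for a full filesystem, it helps to sort du's output by size. The following is a minimal Perl sketch. It assumes `du -sk`, which reports sizes in kilobytes; these are easier to compare numerically than the -h suffixes. The helper name largest_entries and the script name du_top.pl are hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

# Turn "du -sk" output ("<size in KB><whitespace><path>" per line) into a list
# of [size_kb, path] pairs, sorted from largest to smallest.
# Hypothetical helper for illustration.
sub largest_entries {
    my ($text) = @_;
    $text = '' unless defined $text;
    my @entries;
    foreach my $line ( split /\n/, $text ){
        my ($kb, $path) = $line =~ /^(\d+)\s+(.+)$/ or next;
        push @entries, [ $kb, $path ];
    }
    my @sorted = sort { $b->[0] <=> $a->[0] } @entries;
    return @sorted;
}

# Usage: perl du_top.pl /var   (prints the five largest entries under /var)
if ( @ARGV ){
    my @paths = glob("$ARGV[0]/*");
    # List-form open runs du without a shell, so unusual file names are passed as-is.
    if ( @paths && open(my $fh, '-|', 'du', '-sk', @paths) ){
        my @top = largest_entries( do { local $/; <$fh> } );
        close $fh;    # du exits non-zero if some entries were unreadable
        my $last = $#top < 4 ? $#top : 4;
        printf "%10d KB  %s\n", @$_ for @top[0 .. $last];
    }
}
```

As with the `sudo du` example above, running it as a user without read access to every subdirectory gives incomplete totals.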
Perl modules to retrieve the disk usage
Filesys::Df, Filesys::DfPortable
When it comes to writing a system administration script, it is generally much easier to do it in Perl than in a shell scripting language such as sh, csh or bash. CPAN offers modules, such as “Filesys::Df”, that greatly simplify the development of such a script.
Note: Windows system administrators can also find the “Filesys::DfPortable” module on CPAN, which provides on Windows the same functionality that “Filesys::Df” provides on Unix/Linux.
A basic Perl script can be written to inspect the information returned by this module:
```perl
#!/usr/bin/perl -w
# Script disk_usage_monitor.pl v0.1
use strict;
use Filesys::Df;
use Data::Dumper;

my @a_filesystems = qw( / /c /data );

foreach my $filesystem ( @a_filesystems ){
    # df() returns a hash reference with block and i-node statistics
    my $rh_df = df($filesystem);
    print "'" . $filesystem . "' used disk space: " . Dumper($rh_df);
}
```
And the result we get is:
```
$ perl disk_usage_monitor.pl
'/' used disk space: $VAR1 = {
          'user_files' => 1884160,
          'bfree' => 3591560,
          'blocks' => 29657164,
          'user_fused' => 685299,
          'files' => 1884160,
          'bavail' => 2085052,
          'user_favail' => 1198861,
          'su_bavail' => 3591560,
          'fper' => 36,
          'su_favail' => 1198861,
          'per' => 93,
          'favail' => 1198861,
          'fused' => 685299,
          'su_blocks' => 29657164,
          'user_blocks' => 28150656,
          'ffree' => 1198861,
          'used' => 26065604,
          'su_files' => 1884160,
          'user_used' => 26065604,
          'user_bavail' => 2085052
        };
...
```
We can see that the “df” function returns plenty of information about the filesystem. The fields most relevant to our monitoring script are:
- “per” – percent used disk space
- “fper” – percent used i-nodes
The meaning of the other fields is described in the module's reference documentation on CPAN.
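To make “per” and “fper” concrete, the values for '/' in the output above can be reproduced with plain arithmetic. This sketch makes two assumptions:

- “per” is the used blocks as a share of the blocks available to ordinary users (used + bavail, which equals user_blocks in the sample).
- “fper” is the used i-nodes as a share of all i-nodes.

Both values are assumed to be rounded to the nearest integer. That matches the numbers shown, but the module's documentation remains the authority on the exact formulas.

```perl
#!/usr/bin/perl -w
use strict;

# Percentage rounded to the nearest integer; returns 0 for an empty total.
sub percent {
    my ($part, $total) = @_;
    return 0 unless $total;
    return int( $part * 100 / $total + 0.5 );
}

# Values copied from the Data::Dumper output for '/' above.
my %df = (
    used   => 26065604,
    bavail => 2085052,
    fused  => 685299,
    files  => 1884160,
);

my $per  = percent( $df{used},  $df{used} + $df{bavail} );   # assumed formula for 'per'
my $fper = percent( $df{fused}, $df{files} );                # assumed formula for 'fper'
printf "per = %d%%, fper = %d%%, free = %d%%\n", $per, $fper, 100 - $per;
```

This prints “per = 93%, fper = 36%, free = 7%”, agreeing with the module's output and with the 93% reported by df -h.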
Filesys::DiskUsage
The Filesys::DiskUsage module from CPAN implements a “du” function that returns the total number of bytes used by files and directory trees.
Using this function, we can write a script to monitor disk usage at a finer granularity:
```perl
#!/usr/bin/perl -w
use strict;
use Filesys::DiskUsage qw/du/;

my @a_directories = qw( /etc /lib );

foreach my $directory ( @a_directories ){
    # du() returns the total size, in bytes, of the files under $directory
    my $space = du($directory);
    print "Directory: " . $directory . ", used disk: " . $space . " bytes\n";
}
```
The user running this script must be granted read access to the directories and subdirectories being monitored. If the script is executed by a user without these privileges, we get the following result:
```
$ perl disk_usage_monitor.pl
could not open /etc/ssl/private (Permission denied)
could not open /etc/cups/ssl (Permission denied)
could not open /etc/gconf/gconf.xml.system (Permission denied)
Directory: /etc, used disk: 4759997 bytes
Directory: /lib, used disk: 111718863 bytes
```
Even though du() still returns a total number of bytes, the space used by subdirectories that could not be read is not included in that total.
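If Filesys::DiskUsage is not available, a similar total can be computed with the core File::Find module. This is a minimal sketch, not a drop-in replacement:

- It sums apparent file sizes.
- It does not follow symbolic links.
- Like du() above, it leaves out anything it cannot read.

The function name dir_bytes and the script name dir_bytes.pl are hypothetical.

```perl
#!/usr/bin/perl -w
use strict;
use File::Find;

# Sum the apparent sizes of all regular files below the given directories.
# lstat() is used so symbolic links are not followed, and entries that
# cannot be read are skipped, so the total may be an underestimate.
sub dir_bytes {
    my @dirs  = @_;
    my $total = 0;
    find( sub {
        my @st = lstat($_) or return;   # stat info without following symlinks
        $total += $st[7] if -f _;       # regular files only; $st[7] is the size
    }, @dirs );
    return $total;
}

# Usage: perl dir_bytes.pl /etc /lib
foreach my $dir ( @ARGV ){
    printf "Directory: %s, used disk: %s bytes\n", $dir, dir_bytes($dir);
}
```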
Disk usage monitor script
To finish the script, we add a parameter, “min_free_percent”, that sets the minimum percentage of free disk space; an alert is triggered when the actual free space falls below it.
In the loop that checks the filesystems, we add the code that compares the free space against this value and prints a warning:
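```perl
#!/usr/bin/perl -w
# Script disk_usage_monitor.pl
use strict;
use Filesys::Df;

my @a_filesystems    = qw( / /c /data );
my $min_free_percent = 10;    # alert when free space falls below this percentage

foreach my $filesystem ( @a_filesystems ){
    my $rh_df = df($filesystem);
    # df() returns undef if the filesystem cannot be queried
    unless ( $rh_df ){
        warn "Could not read disk usage of '$filesystem'\n";
        next;
    }
    my $used = $rh_df->{per} || 0;    # percent of disk space used
    my $free = 100 - $used;
    if ( $free < $min_free_percent ){
        printf("WARNING: filesystem '%s' has %d%% free, below %d%%.\n",
               $filesystem, $free, $min_free_percent );
    }
}
```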
When the script is run, it outputs:
```
$ perl disk_usage_monitor.pl
WARNING: filesystem '/' has 7% free, below 10%.
```
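The script above only looks at “per”. A filesystem can also run out of i-nodes while still having free blocks, for example when millions of small files pile up in a mail queue. For that reason, the “fper” value listed earlier is worth checking too.

The following is a minimal sketch that assumes the same threshold applies to both values. check_usage is a hypothetical helper that takes the hash reference returned by df(). Keeping the check in a function also lets the logic be tested without touching real filesystems.

```perl
#!/usr/bin/perl -w
use strict;

# Return a list of warning messages for one filesystem, given a hash
# reference shaped like the one returned by Filesys::Df's df().
# Hypothetical helper for illustration.
sub check_usage {
    my ($filesystem, $rh_df, $min_free_percent) = @_;
    my @warnings;
    my %checks = ( 'disk space' => 'per', 'i-nodes' => 'fper' );
    foreach my $what ( sort keys %checks ){
        my $used = $rh_df->{ $checks{$what} };
        next unless defined $used;        # skip values that are missing
        my $free = 100 - $used;
        push @warnings, sprintf("WARNING: filesystem '%s' has %d%% free %s, below %d%%.",
                                $filesystem, $free, $what, $min_free_percent)
            if $free < $min_free_percent;
    }
    return @warnings;
}

# Sample values for '/' taken from the df() output shown earlier.
# With Filesys::Df loaded, check_usage('/', df('/'), 10) would be used instead.
my %sample = ( per => 93, fper => 36 );
print "$_\n" for check_usage('/', \%sample, 10);
```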
Automating the execution of the script
Once the script works, we need to automate its periodic execution. On a Linux system, we do this by adding an entry to the crontab. For instance, to run it daily at 08:01:
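```
# minute hour day-of-month month day-of-week command
# Use the script's full path (/path/to is a placeholder): cron runs jobs with a minimal PATH.
01 8 * * * nice -n 5 perl /path/to/disk_usage_monitor.pl
```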
For this to work as expected, the crond daemon must be running on the server.
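In addition, crond emails the script's output to the user who owns the crontab. Because the script prints nothing when every filesystem is within the threshold, an email is sent only when there is a warning. The mail system must be properly configured to forward these emails to the intended recipient; alternatively, the MAILTO variable in the crontab can set the destination address.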