Oct 31, 2012
 

There are many Perl modules on CPAN that implement different approaches to processing XML files.

The XML::Simple module reviewed in this post is probably best suited to the simplest cases, such as small configuration files that do not use advanced features of the XML format.

Example file in XML format

XML is a way of representing a hierarchical tree of elements in a text file.

Each element is optionally assigned:

  • a set of attributes ((key, value) pairs)
  • either a value or a set of subelements.

The example below shows an XML document with a root element named “customers” that has two subelements named “client”:

<?xml version="1.0"?>
<customers version="3.5" timestamp="2002-05-13 15:33:45">
  <client identifier="62520">
    <name>John</name>
    <surname>Williams</surname>
    <address>
      <street>17 Liberty Ave.</street>
      <locality>Birmingham</locality>
      <province>Birmingham</province>
      <zip>82649</zip>
    </address>
    <email>john.williams@expensive-mail.org</email>
    <age>42</age>
  </client>
  <client identifier="62521">
    <name>Helen</name>
    <surname>Hightower</surname>
    <address>
      <street>2 Flying Saucer</street>
      <locality>Southampton</locality>
      <province>Southampton</province>
      <zip>28001</zip>
    </address>
    <email>elerovw@cyb.org</email>
    <age>37</age>
  </client>
</customers>

The “client” subelements, in turn, have subelements of their own: “name”, “address”, etc.

The “customers” element also has the attributes “version” and “timestamp”, and each “client” element has an “identifier” attribute.

Reading an XML document with XML::Simple

The XML::Simple module from CPAN reads a whole XML document into a hashref.

Example:

#!/usr/bin/perl

use strict;
use warnings;

use XML::Simple;

# Read the XML document with XMLin, setting the 'forcearray' option
# to make all subelements be referenced by means of arrayrefs
my $data = XMLin('./customers.xml', forcearray=>1);

This code assigns a hashref to the “$data” variable.
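
To illustrate what the 'forcearray' option changes, here is a minimal sketch that parses the same text with and without it. The one-client document and the variable names ($xml, $plain, $forced) are made up for this example. It also passes KeyAttr => [], an XML::Simple option not used elsewhere in this post, to switch off the module's default folding of arrays into hashes.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Simple;

# Hypothetical one-client document; XMLin also accepts XML passed as a string
my $xml = <<'XML';
<customers version="3.5">
  <client identifier="62520">
    <name>John</name>
  </client>
</customers>
XML

# KeyAttr => [] disables folding of arrays into hashes (an assumption of this sketch)
my $plain  = XMLin($xml, KeyAttr => []);
my $forced = XMLin($xml, KeyAttr => [], ForceArray => 1);

# Without ForceArray, a subelement that appears only once is not wrapped in an array
print ref($plain->{client}), "\n";            # HASH
print $plain->{client}{name}, "\n";

# With ForceArray, every subelement is reached through an arrayref
print ref($forced->{client}), "\n";           # ARRAY
print $forced->{client}[0]{name}[0], "\n";
```

With 'forcearray' set, the code that reads the data does not need to check whether an element happened to appear once or several times.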

Each key in the hash identifies:

  • an attribute, if the value is a plain scalar
  • an array of subelements, if the value is an arrayref

Each subelement in the array can be:

  • a simple value
  • a hashref of subelements of the next level in the hierarchy
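
When the layout of the document is known in advance, individual values can be read directly instead of walking the whole tree. The sketch below uses a hand-written structure in the shape described above, not real XMLin output, so that it runs without the XML file. The helper name client_summary is made up for this example.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hand-written stand-in for the structure described above
# (attributes as scalars, subelements as arrayrefs)
my $data = {
  version   => '3.5',
  timestamp => '2002-05-13 15:33:45',
  client    => [
    {
      identifier => '62520',
      name       => ['John'],
      surname    => ['Williams'],
      address    => [ { street => ['17 Liberty Ave.'], locality => ['Birmingham'] } ],
    },
    {
      identifier => '62521',
      name       => ['Helen'],
      surname    => ['Hightower'],
      address    => [ { street => ['2 Flying Saucer'], locality => ['Southampton'] } ],
    },
  ],
};

# Hypothetical helper: build a one-line summary of a client hashref
sub client_summary {
  my ($client) = @_;
  # Single subelements still sit in an array, hence the [0] indexes
  my $city = $client->{address}[0]{locality}[0];
  return "$client->{identifier}: $client->{name}[0] $client->{surname}[0], $city";
}

# Attributes of the root element are plain scalars
print "Version: $data->{version}\n";

# Repeated subelements are the entries of the arrayref
print client_summary($_), "\n" for @{ $data->{client} };
```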

The code below can be used to traverse the tree of hashrefs and arrayrefs:

print_element("CUSTOMERS", $data, "");

# Print an element, its attributes and, recursively, its subelements
sub print_element {
  my $name = shift;
  my $element = shift;
  my $indent = shift;
  print $indent . "Element: " . $name . "\n";
  $indent .= "    ";
  foreach my $key (keys %$element) {
    if (ref $element->{$key} eq "ARRAY") {
      # An arrayref holds the subelements named $key
      foreach my $subelement (@{$element->{$key}}) {
        if (ref $subelement eq "HASH") {
          print_element($key, $subelement, $indent);
        } else {
          print $indent . "Element: " . $key . ", Value: " . $subelement . "\n";
        }
      }
    } else {
      # A plain scalar is an attribute of the current element
      print $indent . "Attribute: " . $key . ". Value: " . $element->{$key} . "\n";
    }
  }
}

 

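The subroutine “print_element” is recursive: it calls itself on each subelement of the element being processed. If this code is executed to read the sample XML file above, the output looks like this (Perl does not guarantee the order of hash keys, so the order of the lines within each element may vary):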

Element: CUSTOMERS
    Attribute: timestamp. Value: 2002-05-13 15:33:45
    Attribute: version. Value: 3.5
    Element: client
        Element: email, Value: john.williams@expensive-mail.org
        Attribute: identifier. Value: 62520
        Element: surname, Value: Williams
        Element: age, Value: 42
        Element: address
            Element: province, Value: Birmingham
            Element: street, Value: 17 Liberty Ave.
            Element: zip, Value: 82649
            Element: locality, Value: Birmingham
        Element: name, Value: John
    Element: client
        Element: email, Value: elerovw@cyb.org
        Attribute: identifier. Value: 62521
        Element: surname, Value: Hightower
        Element: age, Value: 37
        Element: address
            Element: province, Value: Southampton
            Element: street, Value: 2 Flying Saucer
            Element: zip, Value: 28001
            Element: locality, Value: Southampton
        Element: name, Value: Helen
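
If a reproducible order is needed, for instance to compare the output of two runs, the keys can be sorted before they are visited. The following is a sketch of such a variant, not part of XML::Simple: the name element_lines and the small sample structure are made up for this example, and the lines are returned rather than printed so that they are easier to check.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Variant of print_element: returns the lines and visits the keys in sorted order
sub element_lines {
  my ($name, $element, $indent) = @_;
  my @lines = ("${indent}Element: $name");
  $indent .= "    ";
  foreach my $key (sort keys %$element) {
    my $value = $element->{$key};
    if (ref($value) eq 'ARRAY') {
      foreach my $child (@$value) {
        if (ref($child) eq 'HASH') {
          # Nested element: recurse one level deeper
          push @lines, element_lines($key, $child, $indent);
        } else {
          push @lines, "${indent}Element: $key, Value: $child";
        }
      }
    } else {
      push @lines, "${indent}Attribute: $key. Value: $value";
    }
  }
  return @lines;
}

# Hand-written sample in the same shape as the data discussed above
my $sample = {
  version => '3.5',
  client  => [ { identifier => '62520', name => ['John'], age => ['42'] } ],
};

print "$_\n" for element_lines('CUSTOMERS', $sample, '');
```

Sorting mixes attributes and subelements alphabetically; whether that is acceptable depends on what the output is used for.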

Writing an XML document with XML::Simple

The XML::Simple module also provides a function named XMLout. It takes a hashref with the same structure as the one returned by XMLin (here, “$clients”) and converts it into XML format:

print XMLout($clients);
print "\n";
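
By default XMLout names the root element “opt”, which is probably not what is wanted when writing the customers document back. The sketch below uses the RootName option to set the root element name, and KeyAttr => [] to switch off array folding. The data is a hand-written stand-in in the shape XMLin produces with 'forcearray' set, not the result of reading the file.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Simple;

# Hand-written stand-in: scalars become attributes, arrayrefs become subelements
my $clients = {
  version => '3.5',
  client  => [ { identifier => '62520', name => ['John'], age => ['42'] } ],
};

# RootName sets the name of the root element (the default is 'opt')
my $xml = XMLout($clients, RootName => 'customers', KeyAttr => []);
print $xml;
```

The string returned by XMLout can be printed, as here, or written to a file with the usual Perl file functions.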


Posted at 9:14 am
