Apr 24, 2012
 

(Read this article in Spanish)

The LWP::UserAgent module provides a simple but flexible way to read the content of a given URL in Perl.

The example below shows how to read the content of the URL http://www.example.com:
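A minimal sketch of such a script follows. The 10-second timeout and the error handling are choices made for this illustration and may differ from the original example. `get()` returns an `HTTP::Response` object, whose headers and body can be read separately:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $url = 'http://www.example.com';

# The user agent object plays the role of the "browser".
# timeout => 10 is an arbitrary value chosen for this sketch.
my $ua = LWP::UserAgent->new(timeout => 10);

# Send a GET request; the result is an HTTP::Response object.
my $response = $ua->get($url);

if ($response->is_success) {
    # HTTP headers, as "Name: value" lines
    print "--- Headers ---\n", $response->headers->as_string;
    # Main content, decoded according to its Content-Type/charset
    print "--- Content ---\n", $response->decoded_content;
}
else {
    # Network failures are also reported as a response (a locally
    # generated 500), so status_line always has something to show.
    print STDERR "Could not retrieve $url: ", $response->status_line, "\n";
}
```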

Retrieving the URL this way gives us separate access to the HTTP headers and to the main content. In addition, if an error occurs, we can check the response code.

Specify user and password to retrieve protected pages

LWP::UserAgent also allows you to specify a username and password to retrieve the content of protected pages. This is done by adding the following statement:
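A sketch of the `credentials()` call in context. The host and port, the realm name `Restricted Area`, the `/protected/` path, and the username and password are all placeholders; the realm must match the one the server announces in its `WWW-Authenticate` header:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(timeout => 10);

# credentials($netloc, $realm, $username, $password)
# $netloc is "host:port". All values here are hypothetical placeholders.
$ua->credentials('www.example.com:80', 'Restricted Area', 'myuser', 'mypassword');

# Requests to that host/realm will now be answered with these credentials.
my $response = $ua->get('http://www.example.com/protected/');
print $response->is_success
    ? $response->decoded_content
    : $response->status_line . "\n";
```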

Identify the request as coming from a given “user agent”

In the header of every request sent to the server, browsers include a “user agent” string that tells the server the type and version of the browser, along with other characteristics the server can use to adapt the response it sends back.

Using the “agent()” method, the request sent by the Perl script can be “disguised” so that the server thinks it is coming from a different type of browser. For instance:
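A sketch that replaces LWP's default `libwww-perl` identifier with a browser-like string. The Firefox string used here is only an illustrative value:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(timeout => 10);

# Illustrative browser-like User-Agent string (not from the original post).
$ua->agent('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0');

# Every request sent through $ua now carries this User-Agent header.
my $response = $ua->get('http://www.example.com');
print $response->status_line, "\n";
```

If the string passed to `agent()` ends with a space, LWP appends its own default identifier to it.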

 

Other possibilities

We may also need to specify a timeout, send cookies, or otherwise customize the HTTP request being sent. The LWP::UserAgent and HTTP::Request modules provide methods for this.
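A sketch combining these options, assuming in-memory cookie storage with `HTTP::Cookies`; the 15-second timeout and the `Accept-Language` header are arbitrary values chosen for illustration:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Cookies;

my $ua = LWP::UserAgent->new;

# Time out if no activity is seen on the connection for 15 seconds
# (LWP's default is 180).
$ua->timeout(15);

# Keep cookies in memory between requests. To persist them, use
# HTTP::Cookies->new(file => $path, autosave => 1) instead.
$ua->cookie_jar(HTTP::Cookies->new);

# Build the request explicitly so headers can be added or changed.
my $request = HTTP::Request->new(GET => 'http://www.example.com');
$request->header('Accept-Language' => 'en');

my $response = $ua->request($request);
print $response->status_line, "\n";
```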

 Posted by at 4:35 pm

  One Response to “How to retrieve the content of an URL in Perl”
