Mar 17, 2013
 
Article PHP

An XML sitemap is a document that lists the URLs of a web site, together with information about their relative importance and how often they are modified. Search engines such as Google and Yahoo take this information into account when deciding how often to crawl and index the site.

This post explains how to automate the generation of the sitemap and its submission to Google and Bing.

Format of an XML sitemap

An XML sitemap has a root element “<urlset>”, with a set of “<url>” elements under it. Each of these “<url>” elements must include a “<loc>” subelement, and can optionally include “<lastmod>”, “<changefreq>” and “<priority>” elements.
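The structure described above can be produced with a few lines of PHP. The following is only a minimal sketch: it assumes the XMLWriter extension is available, and the function name build_sitemap() and the pages in $pages are hypothetical, chosen for illustration.

```php
<?php
// Build a sitemap document from an array of page descriptions.
// Only 'loc' is mandatory; the other keys are optional, as in the format above.
function build_sitemap(array $pages): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');

    // Root element, with the namespace of the sitemap protocol.
    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach ($pages as $page) {
        $xml->startElement('url');
        // writeElement() escapes special characters such as "&" in the URL.
        $xml->writeElement('loc', $page['loc']);
        foreach (['lastmod', 'changefreq', 'priority'] as $optional) {
            if (isset($page[$optional])) {
                $xml->writeElement($optional, (string) $page[$optional]);
            }
        }
        $xml->endElement(); // </url>
    }

    $xml->endElement(); // </urlset>
    $xml->endDocument();

    return $xml->outputMemory();
}

// Hypothetical pages, for illustration only.
$pages = [
    [
        'loc'        => 'http://www.example.com/',
        'lastmod'    => '2013-03-17',
        'changefreq' => 'daily',
        'priority'   => '1.0',
    ],
    // A page with only the mandatory <loc> element.
    ['loc' => 'http://www.example.com/about.html'],
];

echo build_sitemap($pages);
```

Where the list of pages comes from (a database query, a directory scan, a CMS API) depends on the site, so that part is left out of the sketch.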

Sitemap generation

Every website is structured in a different way, so there is no single sitemap generation procedure that is valid for all of them. Furthermore, part of the information in the sitemap, such as the change frequency or the priority of a page, must be specified somehow.

Nevertheless, there are many tools that help in generating the sitemap. Some of them are listed on this page:

https://code.google.com/p/sitemap-generators/wiki/SitemapGenerators

Limits in the size of a sitemap – sitemap index

Google imposes an upper limit on the size of a single sitemap. The file cannot have more than 50,000 URLs, and the size of the uncompressed file on disk cannot be greater than 50 MByte.

If a web site exceeds these limits, it is possible to generate several sitemaps, and a “sitemap index” that lists them. For example, a sitemap index for “www.example.com” would list two sitemaps, “sitemap1.xml” and “sitemap2.xml”.

The sitemaps can be compressed with gzip to save bandwidth.
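A sketch of how the splitting and the index could be done in PHP follows. It checks only the 50,000 URL limit, not the file size limit. The names split_into_sitemaps() and build_sitemap_index() are hypothetical, and the “.gz” file names assume the sitemaps have been compressed with gzip.

```php
<?php
// Limit stated above: at most 50,000 URLs per sitemap file.
const MAX_URLS_PER_SITEMAP = 50000;

// Split a flat list of URLs into groups that each fit in one sitemap.
function split_into_sitemaps(array $urls, int $limit = MAX_URLS_PER_SITEMAP): array
{
    return array_chunk($urls, $limit);
}

// Build the sitemap index document that lists the individual sitemaps.
function build_sitemap_index(array $sitemapUrls): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');

    $xml->startElement('sitemapindex');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach ($sitemapUrls as $url) {
        $xml->startElement('sitemap');
        $xml->writeElement('loc', $url);
        $xml->endElement(); // </sitemap>
    }

    $xml->endElement(); // </sitemapindex>
    $xml->endDocument();

    return $xml->outputMemory();
}

// Hypothetical index for www.example.com with two compressed sitemaps.
echo build_sitemap_index([
    'http://www.example.com/sitemap1.xml.gz',
    'http://www.example.com/sitemap2.xml.gz',
]);
```

Each group returned by split_into_sitemaps() would be written to its own sitemap file. If the zlib extension is installed, the XML of each file can be passed through gzencode() before it is saved.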

Submitting the sitemap to Google

There are several ways to send the sitemap to Google:

  • from Webmaster Tools;
  • by including a reference to it in “robots.txt”;
  • by issuing an HTTP GET request to the URL:

www.google.com/webmasters/tools/ping?sitemap=sitemap_url

Where sitemap_url is the URL of the sitemap, URL-encoded. For instance, if the sitemap is located at:

The HTTP GET request to make is:

In PHP, we can use urlencode() to encode the URL, and file_get_contents() to make the HTTP request:
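A minimal sketch of that approach is shown here. The “http://” scheme on the ping address, the function names build_ping_url() and submit_sitemap(), and the sitemap location are assumptions made for illustration. The endpoint is passed as a parameter so that the same code can be pointed at another search engine that accepts the same kind of request.

```php
<?php
// Ping address given above; the "http://" scheme is an assumption.
const GOOGLE_PING_ENDPOINT = 'http://www.google.com/webmasters/tools/ping';

// Build the full ping URL, URL-encoding the address of the sitemap.
function build_ping_url(string $endpoint, string $sitemapUrl): string
{
    return $endpoint . '?sitemap=' . urlencode($sitemapUrl);
}

// Issue the HTTP GET request. Returns false if the request failed.
function submit_sitemap(string $endpoint, string $sitemapUrl): bool
{
    // file_get_contents() can fetch URLs only if allow_url_fopen is enabled.
    $response = @file_get_contents(build_ping_url($endpoint, $sitemapUrl));

    return $response !== false;
}

// Hypothetical sitemap location, for illustration only.
$sitemapUrl = 'http://www.example.com/sitemap.xml';

echo build_ping_url(GOOGLE_PING_ENDPOINT, $sitemapUrl), "\n";
// prints: http://www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.xml

// To actually send the request:
// $ok = submit_sitemap(GOOGLE_PING_ENDPOINT, $sitemapUrl);
```

Search engines change or retire endpoints of this kind over time, so it is worth confirming that the address is still served before scheduling the call, for example from a cron job that runs after the sitemap is regenerated.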

Submitting the sitemap to Bing

Bing uses the same mechanism as Google for the submission of the sitemap. Only the URL used for the HTTP request must be changed:

