Mar 07, 2015
 

Google Webmaster Tools (GWT) offers a wealth of information about the performance of a website. The GWT frontend lets you manually export this information as several reports in CSV format.

However, full integration with an external metrics and monitoring system (for instance, to generate alerts when certain conditions are met) requires a way to download this information automatically.

This post reviews GWTdata, a free and open source utility that implements this functionality.

1. Downloading the application

GWTdata is available on GitHub. To download it, point your browser to:

https://github.com/eyecatchup/php-webmaster-tools-downloads

The page contains a link to download a zip file, “php-webmaster-tools-downloads-master.zip”. Extract from it the file “gwtdata.php”, which contains the implementation of the GWTdata class.

2. Sample report download script

To download a report from GWT, we will write a script “download.php” in the same directory where “gwtdata.php” was extracted, with the following content:
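The listing below is a sketch of such a script, laid out so that the line numbers mentioned in this section match. The class name GWTdata and the methods LogIn, SetTables and DownloadCSV are assumed to follow the usage shown in the project's README, so check them against the copy of “gwtdata.php” you downloaded. The credentials and the site URL are placeholders.

```php
<?php
if (is_file(__DIR__ . '/gwtdata.php')) { include __DIR__ . '/gwtdata.php'; }
try {
    $email = "username@gmail.com";
    $password = "******";

    # The site must be registered in the GWT account (placeholder URL).
    $website = "http://www.example.com/";

    # Reports to download. Names covered in this post: TOP_PAGES,
    # TOP_QUERIES, CONTENT_KEYWORDS, INTERNAL_LINKS, EXTERNAL_LINKS,
    # LATEST_BACKLINKS.
    $tables = array("TOP_PAGES", "TOP_QUERIES");

    # Assumption: class and method names as in the project's README.
    if (!class_exists('GWTdata')) {
        throw new Exception("gwtdata.php was not found next to this script");
    }
    $gdata = new GWTdata();
    if ($gdata->LogIn($email, $password) === true) {
        $gdata->SetTables($tables);
        $gdata->DownloadCSV($website);
    }
} catch (Exception $e) {
    # Report the problem instead of failing with a fatal error.
    echo $e->getMessage(), "\n";
}
```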

You will need to enter a valid GWT username/password in lines 4 and 5, and the web site being analyzed in line 8.

In addition, the $tables array in line 13 specifies the set of reports to download. Currently, the GWTdata class does not allow more than two reports in this array, and it fails silently if this limit is exceeded.
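One way to work within the two-report limit is to split the full list of reports into batches of two and run one download pass per batch. The sketch below only builds and prints the batches; the variable names are illustrative, and each batch would take the place of the $tables array in a separate pass.

```php
<?php
// All report names mentioned in this post.
$allTables = array(
    "TOP_PAGES", "TOP_QUERIES", "CONTENT_KEYWORDS",
    "INTERNAL_LINKS", "EXTERNAL_LINKS", "LATEST_BACKLINKS",
);

// Split the list into batches of at most two names each.
$batches = array_chunk($allTables, 2);

foreach ($batches as $i => $batch) {
    // In a real script, $batch would be used as the $tables array
    // for one download pass.
    printf("pass %d: %s\n", $i + 1, implode(", ", $batch));
}
```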

3. Enable GWT access to “less secure” applications

The first time the report download script is executed, it may simply exit with the message “Login incorrect”, even if the correct username and password were entered in lines 4 and 5. This can happen when the account does not allow access to “less secure applications”. In that case, Google will have sent an email message warning about the failed access attempt.

The message explains how to fix this issue by going to the account’s security configuration page at the URL:

https://www.google.com/settings/security/lesssecureapps


and checking the “Turn on” option.

4. Downloadable reports

4.1. TOP_PAGES

The downloaded file is a CSV report named TOP_PAGES-website-YYYYmmDD-HHMMSS.csv, listing the pages of the site that received the most visits.

Example:

TOP_PAGES-blog.openalfa.com-20150212-191957.csv
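Because the file names follow a fixed pattern, a monitoring script can recover the report type, the site and the download time from the name alone. The following is a minimal sketch; the function name parseReportFilename is hypothetical, and the pattern assumes names exactly like the one shown above.

```php
<?php
/**
 * Split a report file name such as
 * "TOP_PAGES-blog.openalfa.com-20150212-191957.csv" into its parts.
 * Returns null when the name does not match the expected pattern.
 */
function parseReportFilename(string $filename): ?array
{
    $pattern = '/^([A-Z_]+)-(.+)-(\d{8})-(\d{6})\.csv$/';
    if (!preg_match($pattern, $filename, $m)) {
        return null;
    }
    // The time zone of the timestamp is not stated; PHP's default is used.
    $when = DateTime::createFromFormat('Ymd-His', $m[3] . '-' . $m[4]);
    if ($when === false) {
        return null;
    }
    return array('report' => $m[1], 'site' => $m[2], 'downloaded' => $when);
}

$info = parseReportFilename('TOP_PAGES-blog.openalfa.com-20150212-191957.csv');
echo $info['report'], ' ', $info['site'], ' ',
     $info['downloaded']->format('Y-m-d H:i:s'), "\n";
```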

4.2. TOP_QUERIES

This report, also in CSV format, contains the search queries entered by Google users that produced results pages (SERPs) with links to URLs of the site, and the number of times those links were clicked.

Example:

TOP_QUERIES-blog.openalfa.com-20150212-191957.csv
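To feed a report into a monitoring system, the CSV file first has to be read into a data structure. The sketch below assumes that the first line of the file holds the column names; the function name loadReport, the column names “Query” and “Clicks” and the sample rows are made up for illustration and are not taken from a real report.

```php
<?php
/**
 * Read a CSV report into a list of associative arrays keyed by the
 * header row. Assumes the first line of the file holds the column names.
 */
function loadReport(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    $rows = array();
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        // Skip blank lines and rows whose width differs from the header.
        if ($header === false || count($fields) !== count($header)) {
            continue;
        }
        $rows[] = array_combine($header, $fields);
    }
    fclose($handle);
    return $rows;
}

// Made-up sample content, only to exercise the function.
$path = tempnam(sys_get_temp_dir(), 'gwt');
file_put_contents($path, "Query,Clicks\nexample query,12\nanother query,3\n");
$rows = loadReport($path);
unlink($path);

foreach ($rows as $row) {
    echo $row['Query'], ': ', (int) $row['Clicks'], "\n";
}
```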

4.3. CONTENT_KEYWORDS

This report contains a set of records with the following fields:

  • Keyword – A keyword that appears in the site’s content
  • Occurrences – The number of times the keyword appears in different pages of the site
  • Variants encountered – Alternative spellings and synonyms of the keyword
  • Top URLs – The URLs of the pages where the keyword is most relevant. The value is an array enclosed in square brackets “[” and “]”, with the URLs separated by colon characters “:”

Example:

CONTENT_KEYWORDS-blog.openalfa.com-20150212-152943.csv

Note: The sample entry above is a single line, but it has been presented here as several lines for legibility.

In the sample entry, the “blog” keyword, appearing either as “blog” or as “blogs”, was found 2448 times in the site’s content indexed by Google.
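The “Top URLs” field needs a little extra parsing before it can be used. The sketch below strips the square brackets and splits on colons; the function name parseTopUrls is hypothetical. Since it is not certain whether the URLs are relative paths or absolute addresses, a colon followed by “//” is treated as part of a scheme such as “http://” rather than as a separator. A URL with an explicit port number would still be split incorrectly.

```php
<?php
/**
 * Turn a "Top URLs" value such as "[/a:/b]" into an array of URLs.
 * A colon followed by "//" is kept as part of a URL scheme.
 */
function parseTopUrls(string $value): array
{
    $inner = trim($value);
    // Remove the enclosing square brackets, if present.
    if (substr($inner, 0, 1) === '[' && substr($inner, -1) === ']') {
        $inner = substr($inner, 1, -1);
    }
    if ($inner === '') {
        return array();
    }
    // Split on every colon that is not followed by "//".
    return array_map('trim', preg_split('/:(?!\/\/)/', $inner));
}

// Illustrative value, not taken from a real report.
print_r(parseTopUrls('[/first-page/:/second-page/]'));
```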

4.4. INTERNAL_LINKS

This report lists the site’s pages that are linked from other pages of the site, and the total number of such links each page receives.

Example:

INTERNAL_LINKS-blog.openalfa.com-20150213-154247.csv

4.5. EXTERNAL_LINKS

This is a report of the domains with links to pages of the site, the number of inbound links found on them, and the number of pages linked.

Example:

EXTERNAL_LINKS-blog.openalfa.com-20150213-154247.csv

In this sample report, the first record says that there are 18 links to blog.openalfa.com found in pages in the domain blogspot.com, pointing to 7 different pages in blog.openalfa.com.
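A record like this can be turned into a simple metric, for example the average number of inbound links per linked page. The sketch below uses the values of the first record described above; the column order (domain, links, linked pages) is an assumption based on that description.

```php
<?php
// One data row with the values discussed above. The column order
// (domain, links, linked pages) is assumed from the description.
$line = 'blogspot.com,18,7';

list($domain, $links, $linkedPages) = str_getcsv($line, ',', '"', '\\');
$links = (int) $links;
$linkedPages = (int) $linkedPages;

// Average number of inbound links per linked page for this domain.
$linksPerPage = $linkedPages > 0 ? $links / $linkedPages : 0.0;

printf("%s: %d links to %d pages (%.2f per page)\n",
    $domain, $links, $linkedPages, $linksPerPage);
```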

4.6. LATEST_BACKLINKS

This report is a list of the most recently found links to the site’s pages from other domains, with the discovery date.

Example:

LATEST_BACKLINKS-blog.openalfa.com-20150213-154247.csv

5. Downloading the crawl errors report

The current version of the GWTdata class cannot download the report of errors encountered by Googlebot while crawling the site (CRAWL_ERRORS). The author has moved this functionality to a separate script that can be downloaded from:

https://github.com/eyecatchup/GWT_CrawlErrors-php

Posted at 8:53 am
