Mar 07, 2015
 
Article PHP

Google Webmaster Tools (GWT) offers a wealth of information about the performance of a website. The GWT front end lets you export this information manually as several reports in CSV format.

However, full integration with an external metrics and monitoring system (for instance, to generate alerts when certain conditions are met) requires a way to download this information automatically.

This post reviews a free and open source utility named GWTData that implements this functionality.

1. Downloading the application

GWTData is available on GitHub. To download it, point your browser to the page:

https://github.com/eyecatchup/php-webmaster-tools-downloads

The page contains a link for the download of a zip file “php-webmaster-tools-downloads-master.zip”. The file “gwtdata.php”, with the implementation of the GWTdata class, can be extracted from the zip package.

2. Sample report download script

To download a report from GWT, we will write a script “download.php” in the same directory where the “gwtdata.php” file has been extracted, with the following content:

<?php
    include 'gwtdata.php';
    try {
        $email = "username@gmail.com";
        $password = "******";

        # don't forget trailing slash!
        $website = "http://www.domain.com/";

        # Valid values are "TOP_PAGES", "TOP_QUERIES", "CONTENT_KEYWORDS", "INTERNAL_LINKS",
        # "EXTERNAL_LINKS", "SOCIAL_ACTIVITY", and "LATEST_BACKLINKS".
        $tables = array("TOP_QUERIES");

        $gdata = new GWTdata();
        if($gdata->LogIn($email, $password) === true)
        {
            $gdata->SetTables($tables);
            $gdata->DownloadCSV($website);
        } else {
            echo "Login incorrect\n";
        }
    } catch (Exception $e) {
        die($e->getMessage());
    }

You will need to set a valid GWT username and password in $email and $password, and the web site being analyzed in $website.

The $tables array specifies the set of reports to download. Currently, the GWTData class does not allow more than two reports in this array, and fails silently if this limit is exceeded.
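One way to work around the two-report limit is to split the list of reports into batches of two and run one download per batch. The following is a minimal sketch: the helper name downloadInBatches is hypothetical and not part of gwtdata.php, and it assumes that SetTables() and DownloadCSV() can be called more than once on the same logged-in object, which has not been verified here.

```php
<?php
/**
 * Split the report names into batches and pass each batch to a callback.
 * downloadInBatches() is a hypothetical helper, not part of gwtdata.php.
 * Returns the number of batches processed.
 */
function downloadInBatches(array $tables, callable $download, int $batchSize = 2): int
{
    $batches = array_chunk($tables, $batchSize);
    foreach ($batches as $batch) {
        $download($batch);
    }
    return count($batches);
}

// Usage sketch, assuming $gdata is a logged-in GWTdata object and that
// SetTables()/DownloadCSV() may be called repeatedly on it:
//
// downloadInBatches(
//     array("TOP_PAGES", "TOP_QUERIES", "CONTENT_KEYWORDS"),
//     function (array $batch) use ($gdata, $website) {
//         $gdata->SetTables($batch);
//         $gdata->DownloadCSV($website);
//     }
// );
```

Passing the download step as a callback keeps the batching logic independent of the GWTdata class, so it can be tried out without a Google account.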

3. Enable GWT access to “less secure” applications

The first time the report download script is executed, it may simply exit with the message “Login incorrect”, even if the correct username and password were entered in $email and $password. This may be because the account does not allow access to “less secure applications”. If this is the case, Google will have sent an email message warning about the failed access attempt:

Hi Openalfa, 

We recently blocked a sign-in attempt to your Google Account [openalfa@gmail.com]. 

Sign-in attempt details
Date & Time: Thursday 12 February, 8:04 pm CET 
Location: Madrid, Spain 

If this wasn't you
Please review your Account Activity page at https://security.google.com/settings/security/activity to
see if anything looks suspicious. Whoever tried to sign in to your account knows your password;
we recommend that you change it right away. 

If this was you
You can switch to an app made by Google such as Gmail to access your account (recommended) or change
your settings at https://www.google.com/settings/security/lesssecureapps so that your account is no
longer protected by modern security standards. 

To learn more, see https://support.google.com/accounts/answer/6010255.

The message explains how to fix this issue: go to the account’s security configuration page at the URL

https://www.google.com/settings/security/lesssecureapps

and select the “Turn on” option.

4. Downloadable reports

4.1. TOP_PAGES

The downloaded file is a report in CSV format named TOP_PAGES-website-YYYYmmDD-HHMMSS.csv, containing the list of pages of the web site that received the most visits.

Example:

TOP_PAGES-blog.openalfa.com-20150212-191957.csv

Page,Impressions,Change,Clicks,Change,CTR,Change,Avg. position,Change
http://blog.openalfa.com/como-implementar-wsdl-soap-en-php,4954,40%,821,38%,17%,-0.2,6.0,-0.2
http://blog.openalfa.com/como-renombrar-una-base-de-datos-en-mysql,2515,44%,776,35%,31%,-2.0,3.3,
http://blog.openalfa.com/restricciones-de-clave-externa-en-mysql,5911,42%,709,29%,12%,-1.0,7.6,-0.3
http://blog.openalfa.com/como-cambiar-de-nombre-un-fichero-en-java,2606,36%,685,22%,26%,-3.0,2.8,-0.4
http://blog.openalfa.com/como-leer-y-escribir-ficheros-json-en-java,3651,25%,666,30%,18%,0.6,4.7,-0.2
...
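Because the file name encodes the report type, the site and the download time, a monitoring script can recover these values from the name alone, for example to pick the most recent file. The sketch below is one possible way to do it; parseReportFilename is a hypothetical helper, and the time zone of the timestamp is not documented, so UTC is assumed here.

```php
<?php
/**
 * Parse a name such as TOP_PAGES-blog.openalfa.com-20150212-191957.csv.
 * Returns an array with 'report', 'site' and 'timestamp', or null if the
 * name does not follow the REPORT-website-YYYYmmDD-HHMMSS.csv pattern.
 * Hypothetical helper, not part of gwtdata.php.
 */
function parseReportFilename(string $filename): ?array
{
    $pattern = '/^([A-Z_]+)-(.+)-(\d{8})-(\d{6})\.csv$/';
    if (!preg_match($pattern, basename($filename), $m)) {
        return null;
    }
    // Assumption: the timestamp is interpreted as UTC.
    $timestamp = DateTimeImmutable::createFromFormat(
        'Ymd His',
        $m[3] . ' ' . $m[4],
        new DateTimeZone('UTC')
    );
    if ($timestamp === false) {
        return null;
    }
    return array('report' => $m[1], 'site' => $m[2], 'timestamp' => $timestamp);
}
```

The greedy match on the site part means that host names containing hyphens are still parsed, since the date and time groups must match at the end of the name.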

4.2. TOP_QUERIES

This report, also in CSV format, contains the search queries entered by users on Google that produced result pages (SERPs) with links to URLs of the site, and the number of times those links were clicked.

Example:

TOP_QUERIES-blog.openalfa.com-20150212-191957.csv

Query,Impressions,Change,Clicks,Change,CTR,Change,Avg. position,Change
captura de pantalla sony xperia sp,165,-9%,111,5%,67%,9.0,1.1,0.2
captura de pantalla xperia sp,175,-9%,92,-29%,53%,-10,1.2,-0.2
renombrar base de datos mysql,116,13%,83,4%,72%,-6.0,1.0,
xsd,1265,23%,69,28%,5%,0.2,5.4,
...
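The TOP_PAGES and TOP_QUERIES reports repeat the column name “Change” four times, so the header cannot be used directly as array keys. The sketch below renames each “Change” column after the column that precedes it, and then flags the queries whose clicks dropped by at least a given percentage, as a monitoring system might do to raise an alert. All function names are hypothetical, the threshold is arbitrary, and an empty “Change” field is treated as “no data”, which is an assumption.

```php
<?php
/** Rename each "Change" column after the column that precedes it. */
function uniqueHeaders(array $headers): array
{
    $result = array();
    $previous = '';
    foreach ($headers as $name) {
        if ($name === 'Change') {
            $result[] = $previous . ' change';
        } else {
            $result[] = $name;
            $previous = $name;
        }
    }
    return $result;
}

/** Parse the report text into a list of associative arrays. */
function parseReport(string $csv): array
{
    $lines = preg_split('/\R/', trim($csv));
    $headers = uniqueHeaders(str_getcsv(array_shift($lines), ',', '"', '\\'));
    $rows = array();
    foreach ($lines as $line) {
        $fields = str_getcsv($line, ',', '"', '\\');
        if (count($fields) !== count($headers)) {
            continue; // skip empty or malformed lines
        }
        $rows[] = array_combine($headers, $fields);
    }
    return $rows;
}

/** Return the queries whose clicks fell by at least $thresholdPercent. */
function queriesWithClickDrop(array $rows, float $thresholdPercent): array
{
    $flagged = array();
    foreach ($rows as $row) {
        $raw = $row['Clicks change'];
        if ($raw === '') {
            continue; // assumption: an empty field means no data
        }
        if ((float) rtrim($raw, '%') <= -$thresholdPercent) {
            $flagged[] = $row['Query'];
        }
    }
    return $flagged;
}
```

With the sample report above and a threshold of 20, only the query “captura de pantalla xperia sp” (clicks down 29%) would be flagged.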

4.3. CONTENT_KEYWORDS

This report contains a set of records with the following fields:

  • Keyword – A keyword that appears in the site’s content
  • Occurrences – The number of times the keyword appears in different pages of the site
  • Variants encountered – Alternative spellings and synonyms of the keyword
  • Top URLs – The URLs of the pages where the keyword is most relevant. The value is an array enclosed in square brackets “[” and “]”, with the URLs separated by colon characters “:”

Example:

CONTENT_KEYWORDS-blog.openalfa.com-20150212-152943.csv

 Keyword,Occurrences,Variants encountered,Top URLs
 blog,2448,"blog, blogs",[http://blog.openalfa.com/como-configurar-un-feed-rss-en-un-blog-wordpress:
 http://blog.openalfa.com/como-implementar-suscripciones-por-correo-en-wordpress-con-jetpack:
 http://blog.openalfa.com/como-instalar-y-configurar-wordpress:
 http://blog.openalfa.com/introduccion-a-jetpack-de-wordpress:
 http://blog.openalfa.com/como-incluir-un-video-en-un-blog-de-wordpress:
 http://blog.openalfa.com/como-recuperar-un-sitio-wordpress-que-ha-sido-atacado:
 http://blog.openalfa.com/como-hacer-el-seguimiento-de-un-sitio-web-con-subdominios-en-analytics:
 http://blog.openalfa.com/como-instalar-un-blog-multi-idioma-con-wordpress]

Note: The sample entry above is a single line, but it has been presented here as several lines for legibility.

In the sample entry, the “blog” keyword, appearing either as “blog” or as “blogs”, was found 2448 times in the site’s content indexed by Google.
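Since the “Top URLs” field uses a colon as separator and every URL also contains a colon after its scheme, splitting on every “:” would break the URLs apart. A possible approach, sketched below, is to split only on colons that are immediately followed by the start of a new URL. The helper name parseTopUrls is hypothetical, and the sketch assumes that every entry starts with http:// or https://.

```php
<?php
/**
 * Split a "Top URLs" field such as "[http://a/x:http://a/y]" into URLs.
 * Hypothetical helper; assumes every entry starts with http:// or https://.
 */
function parseTopUrls(string $field): array
{
    $inner = trim($field);
    // Remove the enclosing square brackets, if present.
    if (substr($inner, 0, 1) === '[' && substr($inner, -1) === ']') {
        $inner = (string) substr($inner, 1, -1);
    }
    if ($inner === '') {
        return array();
    }
    // Split on a colon only when it is followed by the next URL's scheme.
    $urls = preg_split('/:\s*(?=https?:\/\/)/', $inner);
    return array_map('trim', $urls);
}
```

The lookahead `(?=https?:\/\/)` leaves the scheme of the following URL intact, so each element of the result is a complete URL.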

4.4. INTERNAL_LINKS

This report contains the site’s pages that are linked from other pages of the site, and the total number of such links that each page receives.

Example:

INTERNAL_LINKS-blog.openalfa.com-20150213-154247.csv

Target pages,Links
http://blog.openalfa.com/,203
/informacion,201
/indice-de-articulos-sobre-programacion-en-php,52
/indice-de-articulos-sobre-programacion-en-java,22
/como-enviar-emails-desde-un-script-php,14
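In the sample above, the first target is an absolute URL while the others are paths. If the consumer of the report needs uniform URLs, the paths can be prefixed with the site address. This is a minimal sketch with a hypothetical helper name, and it assumes that the paths are relative to the root of the site passed in $website.

```php
<?php
/**
 * Turn a target from the INTERNAL_LINKS report into an absolute URL.
 * Hypothetical helper; assumes paths are relative to the site root.
 */
function toAbsoluteUrl(string $target, string $siteRoot): string
{
    // Targets that already carry a scheme are returned unchanged.
    if (preg_match('/^https?:\/\//i', $target)) {
        return $target;
    }
    return rtrim($siteRoot, '/') . '/' . ltrim($target, '/');
}
```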

4.5. EXTERNAL_LINKS

This is a report of the domains with links to pages of the site, the number of inbound links found on them, and the number of pages linked.

Example:

EXTERNAL_LINKS-blog.openalfa.com-20150213-154247.csv

Domains,Links,Linked pages
blogspot.com,18,7
wordpress.com,17,5
tumblr.com,11,1
...

In this sample report, the first record says that there are 18 links to blog.openalfa.com found in pages in the domain blogspot.com, pointing to 7 different pages in blog.openalfa.com.

4.6. LATEST_BACKLINKS

This report is a list of the most recently found links to the site’s pages from other domains, with the discovery date.

Example:

LATEST_BACKLINKS-blog.openalfa.com-20150213-154247.csv

Links,First discovered
"http://forumsmysql.helloresource.com/read.php?71,603681,603681,quote=1",2015-02-08
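Note that the URL in the sample record contains commas and is therefore quoted, so the line cannot be split with a plain explode(","). The sketch below uses PHP's str_getcsv(), which honors the quotes, and returns the links discovered on or after a given date. The helper name backlinksSince is hypothetical, and the dates are assumed to always use the YYYY-MM-DD format shown in the sample.

```php
<?php
/**
 * Return the URLs of the backlinks first discovered on or after $sinceDate.
 * Hypothetical helper; assumes dates are always formatted as YYYY-MM-DD.
 */
function backlinksSince(string $csv, string $sinceDate): array
{
    $lines = preg_split('/\R/', trim($csv));
    array_shift($lines); // drop the "Links,First discovered" header
    $links = array();
    foreach ($lines as $line) {
        $fields = str_getcsv($line, ',', '"', '\\');
        if (count($fields) !== 2) {
            continue; // skip empty or malformed lines
        }
        list($url, $discovered) = $fields;
        // Dates in YYYY-MM-DD format sort correctly as plain strings.
        if (strcmp($discovered, $sinceDate) >= 0) {
            $links[] = $url;
        }
    }
    return $links;
}
```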

5. Downloading the crawl errors report

The current version of the GWTdata class does not allow the download of the report on errors encountered by Googlebot while crawling the site (CRAWL_ERRORS). The author has moved this functionality to a separate script that can be downloaded from:

https://github.com/eyecatchup/GWT_CrawlErrors-php
