Blocking the Google Web Accelerator

Online marketing information can change quickly. This article is 9 years and 104 days old, and the facts and opinions contained in it may be out of date.

Thanks to Justin – the mad modrewriter of macwoms.com for the scoop. A lot of this is over my head personally, but I know Justin is a smart chap and wouldn’t lead ya astray.

Specific Information for Allowing or Blocking the *Google Web Accelerator.

What is the Google Web Accelerator?
Discussion at threadwatch and webmasterworld.com.

The Google Web Accelerator is an install that is in its beta testing stage for Win XP or Win 2000 SP3+, running IE 5.5+ or Firefox 1.0+.

Without going into too much detail, the basic design of the Google Web Accelerator is to prefetch and/or cache web pages and then send them to your
browser in a compressed format. Though the process may slightly improve the load speed of dial-up connections, the product is designed for
broadband connections. For more information on how the product works, see Google Web Accelerator. To learn how this affects your privacy,
see Google Web Accelerator Privacy.

If you are a webmaster who is trying to keep logs accurate, or who does not wish to have pages prefetched, the following information should
help you understand how the headers sent by the Google Web Accelerator work and what adjustments you may need to make to ensure server log
accuracy.

The Basics of Blocking the Google Web Accelerator on a Per Site or Directory Basis.

1. How the headers work:

This is an example of a server log file:

00.000.00.00 - - [03/May/2005:22:28:25 -0500] "GET /page.html HTTP/1.1"
200 2850 "http://www.other-site.com" "Mozilla/5.0 (Windows; U; Windows
NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1"
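To make the fields in that entry easier to see, here is a minimal Python sketch that splits the line into named parts. It assumes the log uses Apache's "combined" format, which the sample appears to follow; the regular expression and the function name parse_log_line are illustrative and not part of the original article. Note that nothing in the entry records custom request headers, which matters later on.

```python
import re

# Apache "combined" log format: host, identity, user, time, request line,
# status, response size, referer, user agent.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line: str) -> dict:
    """Return the named fields of one combined-format log line."""
    match = COMBINED.match(line)
    if match is None:
        raise ValueError("not a combined-format log line")
    return match.groupdict()


# The sample entry from above, joined back into a single line.
line = (
    '00.000.00.00 - - [03/May/2005:22:28:25 -0500] "GET /page.html HTTP/1.1" '
    '200 2850 "http://www.other-site.com" "Mozilla/5.0 (Windows; U; Windows '
    'NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1"'
)
fields = parse_log_line(line)
print(fields["host"], fields["request"], fields["status"], fields["size"])
```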

This is an example of the exchange that took place to generate the above log:

Initial Request for a web page:

Host: www.anysite.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)
Gecko/20041001 Firefox/0.10.1
Accept: text/html,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.other-site.com/

Your Server Response to the web page request:

HTTP/1.1 200 OK
Date: Sun, 8 May 2005 11:55:45 GMT
Server: Apache/1.3.31
Cache-Control: max-age=90000
Expires: Mon, 9 May 2005 11:55:45 GMT
Last-Modified: Sat, 19 Jun 2004 15:25:10 GMT
ETag: "7b80d9-891-40d52ad7"
Accept-Ranges: bytes
Content-Length: 2850
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: text/html

Google indicates (see item #5) that when the Accelerator sends a prefetch request, it appends an extra header, X-moz: prefetch, to the request
headers shown in the initial request (top). Because the header is added only to prefetch requests, not to every request, prefetching can be
blocked on a site-by-site or directory-by-directory basis while still allowing full navigation of the site.

How? There is a custom header appended to the page request string, so the new request to prefetch a page will look something like this:

Appended Google Web Accelerator Request:

Host: www.anysite.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)
Gecko/20041001 Firefox/0.10.1
Accept: text/html,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.other-site.com/
X-moz: prefetch

For a ‘regular’ click on a link (an HTTPS page, a URL with a query string, or any request where no prefetch is involved), the request headers
are still those of the original example, without the ‘prefetch’ header appended:

Host: www.anysite.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)
Gecko/20041001 Firefox/0.10.1
Accept: text/html,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.other-site.com/

Therefore, even though the header will not appear in your logs (a log entry is a summary of the connection, not the full set of request
headers), the prefetch can be blocked with the following lines in .htaccess:

RewriteEngine On
# Match only requests that carry the X-moz: prefetch header
RewriteCond %{HTTP:X-moz} ^prefetch [NC]
# Leave the URL unchanged ("-") and answer with 403 Forbidden
RewriteRule .* - [F,L]

This ruleset checks the full set of request headers for X-moz: prefetch, and if it is present, access is forbidden. The user experience does
not change (other than the loss of prefetching), because the Google Web Accelerator prefetches a page from the preceding page, before the user
actually requests it. When the user does click through, the page is not in the cache, so it is requested with ‘regular’ (non-appended)
headers. The condition does not match, and the content is served as usual.
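The decision described above can be sketched in a few lines of Python. This only models the logic (403 for a request carrying X-moz: prefetch, normal service otherwise); it is not how Apache evaluates rewrite rules, and the function name status_for is hypothetical.

```python
def status_for(headers: dict) -> int:
    """Return 403 for prefetch requests and 200 for everything else."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    if normalised.get("x-moz", "").strip().lower().startswith("prefetch"):
        return 403
    return 200


# The Accelerator prefetches the page while the user is on the previous page...
prefetch_request = {"Host": "www.anysite.com", "X-moz": "prefetch"}
# ...and the later, real click arrives without the extra header.
user_click = {"Host": "www.anysite.com"}

print(status_for(prefetch_request))  # 403
print(status_for(user_click))        # 200
```

Because the refusal applies only to the speculative request, the visitor's own click is answered normally.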

Also, since the requests from the proxy are HTTP compliant, caching can be blocked by using the following .htaccess code (this requires
mod_headers):

Header unset Cache-Control
Header append Cache-Control "private, no-cache, must-revalidate"
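As a rough illustration of what those two directives do to a response, the Python sketch below drops any existing Cache-Control header and then adds the restrictive one. The function name force_private_caching is hypothetical, and the sample headers are taken from the server response shown earlier.

```python
def force_private_caching(headers):
    """Replace any Cache-Control header with one that forbids shared caching."""
    # "unset": drop every existing Cache-Control header, whatever its case.
    kept = [(name, value) for name, value in headers
            if name.lower() != "cache-control"]
    # "append": add the restrictive value back in.
    kept.append(("Cache-Control", "private, no-cache, must-revalidate"))
    return kept


response_headers = [
    ("Server", "Apache/1.3.31"),
    ("Cache-Control", "max-age=90000"),
    ("Content-Length", "2850"),
]
for name, value in force_private_caching(response_headers):
    print(f"{name}: {value}")
```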

Ensure your logs and geo-targeting information are correct

There is another interesting aspect of the implementation of the Google Web Accelerator. Because it acts as a proxy server, the proxy is
actually sending the request for the page, not the end user (or viewer). This could have a negative impact on logging and geo-targeting by
websites, so the team at Google implemented a solution that allows webmasters to continue business as usual even when those viewing a web
page are using the product.

How? They append a second header to the request: an X-Forwarded-For header. What’s the catch, and how can this affect server logs? The catch
is that if your server does not have mod_extract_forwarded installed and configured to accept incoming headers for IP addresses, the only
address that will show in your log files is that of the proxy the user is using.

To ensure logging of your visitors is correct and any geo-targeting is included, you should do the following:

  1. Check with your host that mod_extract_forwarded is installed and available. (If it is not available, explain the situation and ask
    them to load the module.)
  2. Ask your host if the AddAcceptForwarder directive is active.

If the module is loaded but AddAcceptForwarder is not currently active, you are in luck… You do not need to talk to your host any more,
because you have the ability to set the AddAcceptForwarder directive on a per site or directory basis through the use of .htaccess.

Here’s how to do it:

  1. Make sure you have AllowOverride set to Options:
    AllowOverride Options
  2. Determine where you would like to accept the X-Forwarded-For headers from. (This can be set to all, a specific IP or range, or a specific URL.)
    AddAcceptForwarder all
    or
    AddAcceptForwarder 000.000.000.00
  • If set to accept all, to avoid IP ‘spoofing’ you may need to set the
    AllowForwarderCaching directive to off for sensitive locations:
    AllowForwarderCaching off
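The reason for limiting which forwarders you accept can be shown with a short Python sketch. It is not the module's actual code: it only models the general idea that an X-Forwarded-For value should be believed when the connecting address is a proxy you trust. The function name client_ip and the addresses (taken from the ranges reserved for documentation) are assumptions made for illustration.

```python
import ipaddress
from typing import Optional, Sequence


def client_ip(peer_ip: str, forwarded_for: Optional[str],
              trusted_proxies: Sequence[str]) -> str:
    """Pick the address to log for one request."""
    peer = ipaddress.ip_address(peer_ip)
    networks = [ipaddress.ip_network(entry) for entry in trusted_proxies]
    if forwarded_for and any(peer in network for network in networks):
        # The header may list several hops; the last entry is the one
        # added by the proxy that connected to us.
        candidate = forwarded_for.split(",")[-1].strip()
        try:
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            pass  # malformed header: fall back to the peer address
    # Unknown peers could put anything in the header, so ignore it.
    return peer_ip


trusted = ["192.0.2.0/24"]  # hypothetical proxy range
print(client_ip("192.0.2.10", "203.0.113.7", trusted))    # 203.0.113.7
print(client_ip("198.51.100.5", "203.0.113.7", trusted))  # 198.51.100.5
```

Accepting the header from every address would let the second request above claim to be 203.0.113.7, which is the ‘spoofing’ risk mentioned in the list.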

These are the basics of blocking unwanted prefetching and ensuring your logs and geo-targeting go undisturbed. As with any potential
security issue, it is highly recommended that you consult a professional and fully understand your options before implementing any solution.

If you would like to enlist professional mod-rewrite services, please visit macwoms.com and fill out the contact form.

*Google is a trademark of Google, Inc.


  • More information about Todd Malicoat aka stuntdubl.
