Blocking the Google Web Accelerator
Thanks to Justin - the mad modrewriter of macwoms.com for the scoop. A lot of this is over my head personally, but I know Justin is a smart chap and wouldn’t lead ya astray.
Specific Information for Allowing or Blocking the *Google Web
Accelerator.
What is the Google Web Accelerator?
Discussion at threadwatch and webmasterworld.com.
The Google Web Accelerator is an install that is in its beta testing
stage for Win XP or Win 2000 SP3+, running IE 5.5+ or Firefox 1.0+.
Without too much detail, the basic design of the Google Web Accelerator
is to prefetch and/or cache web pages and then send them to your
browser in a compressed format. Though the process may slightly
improve the load speed of dial-up connections, the product is designed for
broadband connections. For more information on how the product works,
see Google Web Accelerator. To know how this affects your privacy you
can see Google Web Accelerator Privacy.
If you are a webmaster who wants to keep your logs accurate, or who does
not wish to have your pages prefetched, the following information should
help you understand how the headers sent by the Google Web
Accelerator work and what adjustments you may need to make to ensure
server log accuracy.
The Basics of Blocking the Google Web Accelerator on a Per Site or
Directory Basis.
1. How the headers work:
This is an example of a server log file:
00.000.00.00 - - [03/May/2005:22:28:25 -0500] "GET /page.html HTTP/1.1"
200 2850 "http://www.other-site.com" "Mozilla/5.0 (Windows; U; Windows
NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1"
This is an example of the exchange that took place to generate the
above log:
Initial Request for a web page:
Host: www.anysite.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)
Gecko/20041001 Firefox/0.10.1
Accept: text/html,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.other-site.com/
Your Server Response to the web page request:
HTTP/1.1 200 OK
Date: Sun, 8 May 2005 11:55:45 GMT
Server: Apache/1.3.31
Cache-Control: max-age=90000
Expires: Mon, 9 May 2005 11:55:45 GMT
Last-Modified: Sat, 19 Jun 2004 15:25:10 GMT
ETag: "7b80d9-891-40d52ad7"
Accept-Ranges: bytes
Content-Length: 2850
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: text/html
Google indicates (see item #5) that when they send the initial request
(top) as a prefetch, they append an X-moz: prefetch header to it. Because
the header is added only to prefetch requests, not to all requests,
prefetching can be blocked on a site-by-site or directory-by-directory
basis while still allowing full navigation of the site.
How? There is a custom header appended to the page request string, so
the new request to prefetch a page will look something like this:
Appended Google Web Accelerator Request:
Host: www.anysite.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)
Gecko/20041001 Firefox/0.10.1
Accept: text/html,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.other-site.com/
X-moz: prefetch
For a ‘regular’ click on a link (HTTPS, URL with a query string, or no
prefetch present) their header string will still be the original
example, without the ‘prefetch’ string appended:
Host: www.anysite.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)
Gecko/20041001 Firefox/0.10.1
Accept: text/html,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.other-site.com/
Therefore, even though it will not appear in logs (because logs are a
summary of the connection, not the full connection string), the prefetch
can be blocked with the following lines of code:
RewriteEngine On
# Match requests whose X-moz header starts with "prefetch"
RewriteCond %{HTTP:X-moz} ^prefetch [NC]
# Serve no content; answer 403 Forbidden instead
RewriteRule .* - [F,L]
This ruleset checks the original, full request headers for the
X-moz: prefetch string, and if it is present, access is forbidden. The
user experience is not changed (other than no prefetch) because the
Google Web Accelerator prefetches the page from the preceding page
before the actual request is made. When the user then requests the
page, it is not in the cache, so the request is made with a ‘regular’
or ‘non-appended’ header, the ruleset does not match, and the content
is served as usual.
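To limit the block to one directory rather than the whole site, the same header check can be paired with a narrower pattern. This is only a minimal sketch: it assumes the rules sit in the .htaccess file at the site root, that mod_rewrite is available, and that members/ is a hypothetical directory you want to keep from being prefetched. It reads the header through mod_rewrite's %{HTTP:...} lookup.

```apache
RewriteEngine On
# Only act on requests that carry the X-moz: prefetch header
RewriteCond %{HTTP:X-moz} ^prefetch [NC]
# members/ is a placeholder directory name; the rest of the site stays prefetchable
RewriteRule ^members/ - [F,L]
```

Dropping a .htaccess file with a catch-all pattern into that directory is another way to get the same scope.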
Also, since the requests from the proxy are HTTP compliant, caching can
be blocked by using the following .htaccess code:
Header unset Cache-Control
Header append Cache-Control "private, no-cache, must-revalidate"
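The two Header lines above apply to every response served from that directory, images and stylesheets included. If you only want to keep the pages themselves out of the proxy cache, something like the following may be closer to what you need. It is a sketch, not a tested recipe: it assumes mod_headers is loaded and that your pages end in .htm or .html.

```apache
<IfModule mod_headers.c>
    # Only HTML files get the header; other file types stay cacheable
    <FilesMatch "\.html?$">
        # "set" replaces any Cache-Control value already present on the response
        Header set Cache-Control "private, no-cache, must-revalidate"
    </FilesMatch>
</IfModule>
```

The IfModule wrapper simply skips the block when mod_headers is missing, rather than causing a server error.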
Ensure your logs and geo-targeting information are correct
There is another interesting aspect of the implementation of the Google
Web Accelerator. As a proxy server, Google is actually sending the
request for the page, not the end user (or viewer). This could have a
negative impact on logging and geo-targeting by websites, so the team
at Google implemented a solution that lets webmasters continue
business as usual even when those viewing a web page are using the
product.
How? They append a second header to the request: X-Forwarded-For.
What’s the catch, and how can this affect server logs? If your server
does not have mod_extract_forwarded installed and configured to accept
incoming headers for IP addresses, the only address that will show in
log files is that of the proxy the user is using.
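Whether or not the module is available, the forwarded address can at least be written to the log next to the proxy address. The sketch below extends the usual combined log format with two extra quoted fields, one for X-Forwarded-For and one for X-moz. The format name combined_gwa and the log path are made up for illustration, and because LogFormat and CustomLog are not allowed in .htaccess, this would have to go in the server or virtual host configuration.

```apache
# Combined format plus the X-Forwarded-For and X-moz request headers.
# A header that is absent from the request is logged as "-".
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{X-Forwarded-For}i\" \"%{X-moz}i\"" combined_gwa
# logs/gwa_access_log is a placeholder path
CustomLog logs/gwa_access_log combined_gwa
```

Keep in mind that %h will still show the proxy, and that any client can send its own X-Forwarded-For value, so treat the extra field as a hint rather than proof of where a visitor came from.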
To ensure logging of your visitors is correct and any geo-targeting is
included, you should do the following:
1. Confirm with your host that mod_extract_forwarded is installed and
available. (If it is not available, explain the situation and ask
them to load the module.)
2. Ask your host if the AddAcceptForwarder directive is active.
If the module is loaded, but AddAcceptForwarder is not currently
active, you are in luck… You do not need to talk to your host any
more, because you have the ability to set the AddAcceptForwarder
directive on a per site or directory basis through the use of
.htaccess.
Here’s how to do it:
1. Make sure you have AllowOverride set to Options
AllowOverride Options
2. Determine where you would like to accept the X-Forwarded-For headers
from. (This can be set to all, Specific IP or Range, or Specific URL)
AddAcceptForwarder all
[OR]
AddAcceptForwarder 000.000.000.00
3. If set to accept all, to avoid IP ‘spoofing’ you may need to set the
AllowForwarderCaching directive to off for sensitive locations.
AllowForwarderCaching off
These are the basics of blocking any unwanted prefetching and of
ensuring your logs and geo-targeting go undisturbed. As with any
potential security issue, it is highly recommended you consult a
professional, and fully understand your options, before implementing
any solution.
If you would like to enlist professional mod-rewrite services, please
visit macwoms.com and fill out the contact form.
*Google is a trademark of Google, Inc.
Related threads -
- Web Accelerator Security Threats - Threadwatch
- Web Accelerator Pulled - Graywolf
- Slashdot on GWA
- Google Blames WA Problems on Publishers
- Privacy is an Illusion - DG
- WebAccelerator Support from Google