Monthly Archives for November 2010

 

How To Stop Google Preview From Being Counted In SiteCatalyst – Updated

UPDATE: I have a much better way to block the Google Web Preview bot from being tracked as a visitor in SiteCatalyst. The original solution I posted here required a block of code at the very top of the s_code file, with your account ID passed into a function call. When the s_code fired, the function would first check the visitor’s user agent to see if it was the Google Web Preview bot, and if so, swap out the account ID for a blank value. The SiteCatalyst code would still fire, but when Omniture received the image request it would be discarded because of the missing account ID.

The more I thought about this, the more I figured there had to be a better way. There is no reason to execute all the s_code JavaScript and fire off the beacon call when I don’t want that visitor (Google) to be tracked. So after a little brainstorming I came up with a new and improved way to do this. Now when the user agent is determined to be the Google Web Preview bot, the SiteCatalyst code is prevented from even firing (how it should have been originally). Even better, this can now be done by simply adding a tiny bit of code to the plug-ins section of your s_code file, right next to all your other plug-in code. That’s it. No other changes need to happen. No code at the top of the page, no adding calls to functions in the account variable. Just cut, paste, and done.

Here is the code. Just add this right next to all of your other plugins.

/*
 *  Block the Google Web Preview Bot from firing SiteCatalyst code
 */
if(s.u.toLowerCase().indexOf('google web preview')!=-1){s.t=function(){}}
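
For readability, the one-liner above can be expanded roughly as in the sketch below. The helper name blockPreviewBot and the stand-in object fakeS are hypothetical and exist only so the snippet runs on its own. Like the one-liner, it assumes that s.u holds the user agent string and that s.t is the function that sends the page view. The user agent value is just an example string containing the bot marker.

```javascript
// Hypothetical helper: replaces s.t with a no-op when the user agent
// identifies the Google Web Preview bot. Returns true if tracking was blocked.
function blockPreviewBot(s) {
  var ua = String(s.u || '').toLowerCase();
  if (ua.indexOf('google web preview') !== -1) {
    s.t = function () {}; // the tracking call does nothing from here on
    return true;
  }
  return false;
}

// Stand-in for the SiteCatalyst object (assumption, for illustration only).
var fakeS = {
  u: 'Mozilla/5.0 (en-us) AppleWebKit/525.13 (KHTML, like Gecko; Google Web Preview) Version/3.1 Safari/525.13',
  sent: 0,
  t: function () { this.sent += 1; }
};

var blocked = blockPreviewBot(fakeS);
fakeS.t(); // now a no-op, so the counter is not incremented
```

In a real s_code file you would keep the one-liner as is; the expanded form is only meant to show what it does.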

If you are using the original version from below, make sure you remove it. And as with any code, make sure you fully test it before deploying to a live site.
~kevin

Google Instant Preview, designed to show you a visual preview of your search results, rolled out in early November 2010. You now have the ability to click a small magnifying glass icon next to each search result to get a snapshot of what the page looks like.
Google Instant Preview

Seems like a pretty helpful feature, but how do they do it? Well, it would appear that Google has a new spider that crawls the web and takes snapshots of each page in its results. In order to get an accurate look at what the page looks like, this new bot needs to be able to execute JavaScript. Here is the problem: since it is executing JavaScript, it is also firing off the SiteCatalyst code, so it is being counted as another visitor and is registering page views.

How can you tell if this new Google Web Preview bot hit your site? If you are capturing User Agent you can see it show up in that report:
User Agent Report

NOTE: If you are not capturing user agent and would like to, a super simple way would be to use the SiteCatalyst Dynamic Variable functionality and include s.eVarX="D=User-Agent"; in your s_code.js file. Just replace the X with the number of the eVar you would like to use (an s.prop would work too) and you are all set.

Another way to see if your report suite is being affected by spider traffic from the Google Web Preview bot would be to check out a Browser report (Visitor Profile > Technology > Browsers), filter it to only show visitors using Safari 3.1, and then trend it.
Browser Report
We can see that this report suite has recorded about 15,000 additional visitors over the last week, all attributed to Safari 3.1. Checking the User Agent we saw earlier, the Google Web Preview bot is registering itself as Safari 3.1.

Now that we can see that the Google Web Preview bot is having an effect on our traffic, how do we get rid of it? We could block that bot in our robots.txt file, but I like having that additional functionality available for my visitors in the Google search results. I just don’t want it to execute my SiteCatalyst code. Well, here is how to do it.

I call this my bot detection code (real catchy title, right?). I currently have it set to look only for the Google Web Preview bot, but it could easily be modified to exclude other bots that can execute JavaScript. Here is how you implement it. At the top of your s_code file you will have an s_account variable that contains your report suite id. It will look something like this:

var s_account="dead"

To implement the bot detection code you will want to change that line to include the function call. It should look like this:

var s_account=botCheck("dead")

Pretty simple so far, right? We just added the function call and included our report suite id in it. Next we have a block of code that needs to be added to the plug-ins section of the s_code file:

function botCheck(b){var c=navigator.userAgent.toLowerCase(),a="";a+=c.indexOf("google web preview")!=-1?"":b;return a};

And that’s all there is to it. So how does it work, you ask? It removes your report suite id if it is the Google Web Preview bot that is accessing the page. The SiteCatalyst code will still fire off, but it will not include the report suite id, so the request will be discarded by SiteCatalyst and will not affect your metrics.
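
Spelled out, the function is easier to adapt to other bots. The sketch below is an expanded variant, not the original code: the optional userAgent and patterns parameters and the BOT_PATTERNS list are additions made so the function can be exercised outside a browser and extended with more bots. Only the 'google web preview' pattern comes from this post.

```javascript
// Lowercase substrings that identify bots to exclude. Only the first entry
// comes from this post; add others as you find them in your reports.
var BOT_PATTERNS = ['google web preview'];

// Returns the report suite id, or an empty string when the user agent
// matches one of the bot patterns. userAgent and patterns are optional
// (hypothetical parameters for testing); userAgent falls back to
// navigator.userAgent when running in a browser.
function botCheck(reportSuiteId, userAgent, patterns) {
  var ua = userAgent;
  if (ua === undefined && typeof navigator !== 'undefined') {
    ua = navigator.userAgent;
  }
  ua = String(ua || '').toLowerCase();
  var list = patterns || BOT_PATTERNS;
  for (var i = 0; i < list.length; i++) {
    if (ua.indexOf(list[i]) !== -1) {
      return ''; // blank id, so the request gets discarded
    }
  }
  return reportSuiteId;
}

// Example calls with explicit user agent strings.
var forBot = botCheck('dead', 'Mozilla/5.0 (KHTML, like Gecko; Google Web Preview) Safari/525.13');
var forUser = botCheck('dead', 'Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0');
```

With this variant the s_account line stays the same, since calling botCheck("dead") with a single argument falls back to the browser's own user agent.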

Want to see it in action? I thought you’d never ask! Check out the page http://webanalyticsland.com/test.php. On this page I have a basic SiteCatalyst implementation, one line of code that displays your user agent, and then I print the results of the SiteCatalyst debugger right to the screen. Opening this page in a standard Firefox browser we can see that the SiteCatalyst code has fired off properly, it has displayed the correct user agent and the report suite id is contained within the image request string.
Test 1
So far so good. Using the User Agent Switcher plug-in for Firefox, we can switch our user agent to the one that we found in the SiteCatalyst report to mimic the Google Web Preview bot.
Test 2
We can now see that when we use that bot’s user agent string, the report suite id is missing from the image request call. Any action that happens now will not be recorded in my report suite, and when SiteCatalyst receives this request it will be discarded. I’ve had this running for a few days now and have not found any issues, but since this is a pretty new chunk of code be sure to test it out before using it on your production site.

Enjoy!

Pro Tip: Keep a Solution Design Document For Your SiteCatalyst Implementation

A Solution Design Document is a complete blueprint of your web analytics implementation. It outlines where every variable is set and why. It can really be a lifesaver. You should start keeping a solution design document as soon as you begin your implementation. Every time you add a prop, an eVar, an event or any other variable, make sure you add it in there. Every time you make a change to your implementation, make note of it in there. It’s an easy place to go to see which variables are being used for what, where they are being set, and which ones are still available.

“But do I really need to keep track of everything to that level of detail? Just checking the Admin console has been working well for us.” Well, have you ever opened a report and found information there that shouldn’t be there?
Bad Report
How did that get in there? It’s being set somewhere in our implementation, but where? What page sets that variable? What function could set it? A quick check of our solution design document and we can see every place where that particular prop is set, or at least where it should be set. If one of your developers set a variable in a non-traditional way, it can be a real nightmare trying to figure out what was done. Without the document you could waste a lot of time hunting through code, searching for where a random variable is being set.

Here is an example spreadsheet that you can use as a basis to start your Solution Design Document:
Sample Solution Design Document

If you have a large team whose members all have their hands in your web analytics implementation, it may be beneficial to place the solution design document in a location where everyone can access it and add to it, such as Google Docs or some other shared folder or public drive.

Enjoy!