Blogs Media Lab

Building the Walker’s mobile site, part 2 — google analytics without javascript

As I mentioned in my last post on our mobile site, one of the key features for our site was making sure that we don’t use any javascript unless absolutely necessary. If you use Google Analytics  (GA) as your stats package, this poses a problem, since the supported way to run GA is via a […]

ga_mobileAs I mentioned in my last post on our mobile site, one of the key features for our site was making sure that we don’t use any javascript unless absolutely necessary. If you use Google Analytics  (GA) as your stats package, this poses a problem, since the supported way to run GA is via a chunk of javascript at the bottom of every page. And to make matters worse, the ga.js file is not gzipped, so you’re loading 9K which would otherwise be about 4k, on a platform where every byte counts. By contrast, if you could just serve the tracking gif, it is 47 bytes. And no javascript that might not run on B-grade or below devices.

A few weeks ago, Google announced support for analytics inside mobile apps and some cursory support for mobile sites:

Google Analytics now tracks mobile websites and mobile apps so you can better measure your mobile marketing efforts. If you’re optimizing content for mobile users and have created a mobile website, Google Analytics can track traffic to your mobile website from all web-enabled devices, whether or not the device runs JavaScript. This is made possible by adding a server side code snippet to your mobile website which will become available to all accounts in the coming weeks (download snippet instructions). We will be supporting PHP, Perl, JSP and ASPX sites in this release. Of course, you can still track visits to your regular website coming from high-end, Javascript enabled phones.

And that is the extent of the documentation you will find anywhere on Google on how to run analytics without javascript. The code included is handy if you happen to run one of their platforms, but the Walker’s mobile site runs on the python side of AppEngine, so their code doesn’t do us much good. Thankfully, since they provide us with the source, we can without too much trouble, translate the php or perl into python and make it AppEngine friendly.

How it works

Regular Google Analytics works by serving some javascript and a small 1px x 1px gif file to your site from Google. The gif lets Google learn many things from the HTTP request your browser makes, such as your browser, OS, where you came from, your rough geo location, etc. The javascript lets them learn all kinds of nifty things about your screen, flash versions, event that fire, etc. And Google tracks you through a site by setting some cookies on that gif they serve you.

To use GA without javascript, we can still do most of that, and we do it by generating our own gif file and passing some information back to Google through our server. That is, we generate a gif, assign and track our own cookie, and then gather that information as you move through the site, and use a HTTP request with the appropriate query strings and pass it back to Google, which they then compile and treat as regular old analytics.

The Code

To make this work in appeinge, we create a  URL in our webapp that we’ll serve the gif from. I’m using “/ga/”:

[python]
def main():
application = webapp.WSGIApplication(
[('/', home.MainHandler),
# edited out extra lines here
('/ga/', ga.GaHandler),
],
debug=False)
wsgiref.handlers.CGIHandler().run(application)
[/python]

And here’s the big handler for /ga/. I based it mostly off the php and some of the perl (click to expand the full code):

[code lang="python" collapse="true"]
from google.appengine.ext import webapp
from google.appengine.api import urlfetch
import re, hashlib, random, time, datetime, cgi, urllib, uuid

# google analytics stuff
VERSION = "4.4sh"
COOKIE_NAME = "__utmmobile"

# The path the cookie will be available to, edit this to use a different cookie path.
COOKIE_PATH = "/"

# Two years in seconds.
COOKIE_USER_PERSISTENCE = 63072000

GIF_DATA = [
chr(0x47), chr(0x49), chr(0x46), chr(0x38), chr(0x39), chr(0x61),
chr(0x01), chr(0x00), chr(0x01), chr(0x00), chr(0x80), chr(0xff),
chr(0x00), chr(0xff), chr(0xff), chr(0xff), chr(0x00), chr(0x00),
chr(0x00), chr(0x2c), chr(0x00), chr(0x00), chr(0x00), chr(0x00),
chr(0x01), chr(0x00), chr(0x01), chr(0x00), chr(0x00), chr(0x02),
chr(0x02), chr(0x44), chr(0x01), chr(0x00), chr(0x3b)
]

class GaHandler(webapp.RequestHandler):
def getIP(self,remoteAddress):
if remoteAddress == '' or remoteAddress == None:
return ''

#Capture the first three octects of the IP address and replace the forth
#with 0, e.g. 124.455.3.123 becomes 124.455.3.0
res = re.findall(r'\d+\.\d+\.\d+\.', remoteAddress)
if res:
return res[0] + "0"
else:
return ""

def getVisitorId(self, guid, account, userAgent, cookie):
#If there is a value in the cookie, don't change it.
if type(cookie).__name__ != 'NoneType': # or len(cookie)!=0:
return cookie

message = ""

if type(guid).__name__ != 'NoneType': # or len(guid)!=0:
#Create the visitor id using the guid.
message = guid + account
else:
#otherwise this is a new user, create a new random id.
message = userAgent + uuid.uuid1(self.getRandomNumber()).__str__()

m = hashlib.md5()
m.update(message)
md5String = m.hexdigest()

return str("0x" + md5String[0:16])

def getRandomNumber(self):
return random.randrange(0, 0x7fffffff)

def sendRequestToGoogleAnalytics(self,utmUrl):
'''
Make a tracking request to Google Analytics from this server.
Copies the headers from the original request to the new one.
If request containg utmdebug parameter, exceptions encountered
communicating with Google Analytics are thown.
'''
headers = {
"user_agent": self.request.headers.get('user_agent'),
"Accepts-Language": self.request.headers.get('http_accept_language'),
}
if len(self.request.get("utmdebug"))!=0:
data = urlfetch.fetch(utmUrl, headers=headers)
else:
try:
data = urlfetch.fetch(utmUrl, headers=headers)
except:
pass

def get(self):
'''
Track a page view, updates all the cookies and campaign tracker,
makes a server side request to Google Analytics and writes the transparent
gif byte data to the response.
'''
timeStamp = time.time()

domainName = self.request.headers.get('host')
domainName = domainName.partition(':')[0]

if len(domainName) == 0:
domainName = "m.walkerart.org";

#Get the referrer from the utmr parameter, this is the referrer to the
#page that contains the tracking pixel, not the referrer for tracking
#pixel.
documentReferer = self.request.get("utmr")

if len(documentReferer) == 0 or documentReferer != "0":
documentReferer = "-"
else:
documentReferer = urllib.unquote_plus(documentReferer)

documentPath = self.request.get("utmp")
if len(documentPath)==0:
documentPath = ""
else:
documentPath = urllib.unquote_plus(documentPath)

account = self.request.get("utmac")
userAgent = self.request.headers.get("user_agent")
if len(userAgent)==0:
userAgent = ""

#Try and get visitor cookie from the request.
cookie = self.request.cookies.get(COOKIE_NAME)

visitorId = str(self.getVisitorId(self.request.headers.get("HTTP_X_DCMGUID"), account, userAgent, cookie))

#Always try and add the cookie to the response.
d = datetime.datetime.fromtimestamp(timeStamp + COOKIE_USER_PERSISTENCE)
expireDate = d.strftime('%a,%d-%b-%Y %H:%M:%S GMT')

self.response.headers.add_header('Set-Cookie', COOKIE_NAME+'='+visitorId +'; path='+COOKIE_PATH+'; expires='+expireDate+';' )
utmGifLocation = "http://www.google-analytics.com/__utm.gif"

myIP = self.getIP(self.request.remote_addr)

#Construct the gif hit url.
utmUrl = utmGifLocation + "?" + "utmwv=" + VERSION + \
"&utmn=" + str(self.getRandomNumber()) + \
"&utmhn=" + urllib.pathname2url(domainName) + \
"&utmr=" + urllib.pathname2url(documentReferer) + \
"&utmp=" + urllib.pathname2url(documentPath) + \
"&utmac=" + account + \
"&utmcc=__utma%3D999.999.999.999.999.1%3B" + \
"&utmvid=" + str(visitorId) + \
"&utmip=" + str(myIP)

# we dont send requests when we're developing
if domainName != 'localhost':
self.sendRequestToGoogleAnalytics(utmUrl)

#If the debug parameter is on, add a header to the response that contains
#the url that was used to contact Google Analytics.
if len(self.request.get("utmdebug")) != 0:
self.response.headers.add_header("X-GA-MOBILE-URL" , utmUrl)

#Finally write the gif data to the response.
self.response.headers.add_header('Content-Type', 'image/gif' )
self.response.headers.add_header('Cache-Control', 'private, no-cache, no-cache=Set-Cookie, proxy-revalidate' )
self.response.headers.add_header('Pragma', 'no-cache' )
self.response.headers.add_header('Expires', 'Wed, 17 Sep 1975 21:32:10 GMT' )
self.response.out.write(''.join(GIF_DATA))

[/code]

So now we know what to do with our requests at /ga/ when we get them, we just need to make the proper requests to that URL in the first place. So we need to generate the URL we’re going to have the visitor’s browser request in the first place. With normal django, we would be able to use template_context to automatically insert it into the page’s template values. But, since AppEngine doesn’t use that, we have our own helper functions to do that, which I showed some of in my last post. Here’s the updated helper functions, with the GoogleAnalyticsGetImageUrl function included:

[code lang="python"]
import settings

def googleAnalyticsGetImageUrl(request):
url = ""
url += '/ga/' + "?"
url += "utmac=" + settings.GA_ACCOUNT
url += "&utmn=" + str(random.randrange(0, 0x7fffffff))

referer = request.referrer
query = urllib.urlencode(request.GET) #$_SERVER["QUERY_STRING"];
path = request.path #$_SERVER["REQUEST_URI"];

if len(referer) == 0:
referer = "-"

url += "&utmr=" + urllib.pathname2url(referer)

if len(path)!=0:
url += "&utmp=" + urllib.pathname2url(path)

url += "&guid=ON";

return {'gaImgUrl':url}

def getTempalteValues(request):
myDict = {}
myDict.update(ua_test(request))
myDict.update(googleAnalyticsGetImageUrl(request))
return myDict

[/code]

Assuming we use getTemplateValues to set up our inital template_values dict, we should have a variable named ‘gaImgUrl’ in our page. To use it, all we need to do is put this at the bottom of every page on the site:

[code lang="html"]
<img src="{{ gaImgUrl }}" alt="analytics" />
[/code]

My settings file contains the GA_ACCOUNT variable, but replaces the standard GA-XXXXXX-X setup with MO-XXXXXX-X. I’m assuming the MO- tells google that it’s a mobile so accept the proxied requests.

One thing to keep in mind with this technique is that you cannot cache your rendered templates. The image you server will necessarily have a different query string every time, and if you cached it, you would ruin your analytics. Instead, you should cache nearly everything from your view functions, except the gaImgUrl variable.

New Media kills in the Walker’s pumpkin carving contest

Every year, the Walker has a staff halloween party, which includes a departmental pumpkin carving contest. And this isn’t just a carve a grocery store pumpkin contest, it’s a creative, conceptual, re-imagine an artist or artwork pumpkin contest. Invariably, our carpentry shop and registration departments usually blow everyone else out of the water. Those of […]

Every year, the Walker has a staff halloween party, which includes a departmental pumpkin carving contest. And this isn’t just a carve a grocery store pumpkin contest, it’s a creative, conceptual, re-imagine an artist or artwork pumpkin contest. Invariably, our carpentry shop and registration departments usually blow everyone else out of the water. Those of us that are a little less hands-on with the art work tend to be outclassed every year (exhibits 1, 2, and 3). New Media Initiatives never wins.

But not this year.

This year, we had a plan.

Actually, we came up with the plan after our no-show defeat last year, but we smartly held onto it for this year (thank you, iCal). On the day of the contest, we replaced every image of artwork on the Walker website with an image of a pumpkin.

walker homepage with pumpkins

And the rest of the pages (click to embiggen):

Calendar

Calendar

Collections and Resources

Collections and Resources

Artists-in-Residence

Artists-in-Residence

Visual Arts

Visual Arts

Design Blog

Design Blog



We ended up winning in the “Funniest Pumpkin” category.

Because we serve all of our media from a single server using lighttpd, and our files are all uniformly named, we were able to implement a simple rule set in lighty to replace the images. Instead of the requested file, each image was re-directed to a simple perl script that would grab a random jpg from our pool of pumpkin images, and send it’s contents instead. Part of the plan was that we would only serve these images to people visiting our site from inside our internal network. The rest of the world would see our website just as always. In our department, we all unplugged our ethernet cables and ran off of our firewall’d WiFi, which effectively put us outside the network, seeing nothing different on the site. We had a hard time holding back evil cackles as people came to us wondering how our site was hacked, and watching it slowly dawn on them that this was our pumpkin.

The images we used were all the creative commons licensed flickr images of pumpkins I could find. There were 54 of them. Here they are, for credit:

Building the Walker’s mobile website with Google AppEngine, part 1

Over the summer, our department made a small but significant policy change. We decided to take a cue from Google’s 20% time philosophy and spend one day a week working on a Walker-related project of our choosing. Essentially, we wanted to embark on quicker, more nimble projects that hold more interest for our team. The […]

mwalker-iphoneOver the summer, our department made a small but significant policy change. We decided to take a cue from Google’s 20% time philosophy and spend one day a week working on a Walker-related project of our choosing. Essentially, we wanted to embark on quicker, more nimble projects that hold more interest for our team. The project I decided to experiment with was making a mobile website for the Walker, m.walkerart.org.

Reviewing our current site to inform the mobile site

The web framework we use for most of our site has the ability, with some small changes, to load different versions of a page based on a visitor’s User Agent (what browser they’re using). This would mean we could detect if a visitor was running IE on a Desktop or Mobile Safari on an iPhone, and serve each of them two different versions of a page. This is how a lot of mobile sites are done.

This is not the approach we went with for our mobile site, because it violates two of the primary rules (in my mind) of making a mobile website:

  1. Make it simple.
  2. Give people the stuff they’re looking for on their phones right away.

Our site is complicated: we have pages for different disciplines, a calendar with years of archives, and many specialty sites. Rule #1, violated. To address #2, I took a look at our web analytics to figure out what most people come to our site looking for. This won’t surprise anyone, but it’s hours, admission, directions, and what’s happening today at the Walker:

Top Walker Pages as noted by Google Analytics

Top Walker Pages as noted by Google Analytics

So it seems pretty clear those things should be up front. One of the other things you might want to access on a mobile is Art on Call. While Art on Call is designed primarily around dial-in access, there is also a website, but it isn’t optimized for the small screen of a smartphone. We have WiFi in most spaces within our building, so accessing Art on Call via an web interface and streaming audio via HTTP rather than POTS is a distinct possibility that I wanted to enable.

Prototyping with Google AppEngine

I decided to develop a quick prototype using Google AppEngine, thinking I’d end up using GAE in the end, too. Because this was a 20% time project, I had the freedom to do it using the technology I was interested in. AppEngine has the advantage of being something that isn’t hosted on our server, so there was no need to configure any complicated server stuff. In my mind, AppEngine is perfect for a mobile site because of the low bandwidth requirements for a mobile site. Google doesn’t provide a ton for free, but if your pages are only 20K each, you can fit quite a bit within the quotas they do give provide. AppEngine’s primary language is also python, a big plus, since python is the best programming language.

In about two days I built a proof of concept mobile site that displayed the big-ticket pages (hours, admission,etc.) and had a simple interface for Art on Call. Using iUi as a front-end framework was really, really useful here, because it meant that the amount of HTML/CSS/JS I had to code was super minimal, and I didn’t have to design anything.

I showed the prototype to Robin and she enthusiastically gave me the green light to work on it full-time.

Designing a mobile website

A strategy I saw when looking at mobile sites was to actually have two mobile sites: one for the A-grade phones (iPhone, Nokia S60, Android, Pre) and one for the B- and C-grade phones (Blackberry and Windows Mobile). The main difference between the two is the use of javascript and some more advanced layout. Depending on what version of Blackberry you look at, they have a pretty lousy HTML/CSS implementation, and horrendous or no javascript support.

To work around this, our mobile site does not use any javascript on most pages and tries to keep the HTML/CSS pretty simple. We don’t do any fancy animations to load between pages like iUi or jQtouch do: even on an iPhone those animations are slow. If you make your pages small enough, they should load fast enough and a transition is not necessary.

Designing mobile pages is fun. The size and interface methods for the device force you to re-think how to people interact and what’s important. They’re also fun because they’re blissfully simple. Each page on our mobile site is usually just a headline, image, paragraph or two, and some links. Laying out and styling that content is not rocket surgery.

Initially, when I did my design mockups in Photoshop, I wanted to use a numpad on the site for Art on Call, much like the iPhone features for making a phone call. There’s no easy input for doing this, but I thought it wouldn’t be too hard to create one with a little javascript (for those that had it). Unfortunately, due to the way touchscreen phones handle click/touch events in the browser, there’s a delay between when you touch and when the click event fires in javascript. This meant that it was possible to touch/type the number much faster than the javascript events fired. No go.

Instead, the latest versions of WebKit provide with a HTML5 input field with a type of “number”. On iPhone OS 3.1 and better, it will bring up the keypad already switched to the numeric keys. It does not do this on iPhone OS prior to 3.1. I’m not sure how Android and Pre handle it.

Mocked up Art on Call code input.

Mocked up Art on Call code input.

Implimented Art on Call code input.

Implimented Art on Call code input.


Comparing smartphones

Here’s a few screenshots of the site on various phones:

Palm Pre

Palm Pre

Android 1.5

Android 1.5

Blackberry 9630

Blackberry 9630



Not pictured is Windows Mobile, because it looks really bad.

A future post may cover getting all of these emulators up and running, because it’s not as straight easy as it should be. Working with the blackberry emulator is especially painful.

How our mobile site works

The basic methodology for our mobile site is to pull the data, via either RSS or XML from our main website, parse it, cache it, and re-template it for mobile visitors. Nearly all of the pages on our site are available via XML if you know how to look. Parsing XML into usable data is a computationally expensive task, so caching is very important. Thankfully, AppEngine provides easy access to memcache, so we can memcache the XML fetches and the parsing as much as possible. Here’s our simple but effective URL parse/cache helper function:

[python]
from google.appengine.api import urlfetch
from xml.dom import minidom
from google.appengine.api import memcache

def parse(url,timeout=3600):
memKey = hash(url)
r = memcache.get(‘fetch_%s’ % memKey)
if r == None:
r = urlfetch.fetch(url)
memcache.add(key="fetch_%s" % memKey, value=r, time=timeout)
if r.status_code == 200:
dom = memcache.get(‘dom_%s’ % memKey)
if dom == None:
dom = minidom.parseString(r.content)
memcache.add(key="dom_%s" % memKey, value=dom, time=timeout)
return dom
else:
return False
[/python]

Google AppEngine does not impose much of a structure for your web app. Similar to Django’s urls.py, you link regular expressions for URLS to class-based handlers. You can’t pass any variables beyond what’s in the URL or the WebOb to the request handler. Each handler will call a different method, depending if it’s a GET, POST, DELETE, http request. If you’re coming from the django world like me, this is not much of a big deal at first, but it gets tedious pretty fast. If I had it to do over again, I’d probably use app-engine-patch from the outset, and thus be able to use all the normal django goodies like middleware, template context, and way more configurable urls.

Within each handler, we also cache the generated data where possible. That is, after our get handler has run, we cache all the values that we pass to our template that won’t change over time. Here’s an example of the classes that handle the visit section of our mobile site:

[python]
from google.appengine.ext import webapp
from google.appengine.ext.webapp import template
from google.appengine.api import memcache
from xml.dom import minidom
from google.appengine.api import memcache
from utils import feeds, parse, template_context, text
import settings

class VisitDetailHandler(webapp.RequestHandler):
def get(self):
url = self.request.get("s") + "?style=xml"
template_values = template_context.getTempalteValues(self.request)
path = settings.TEMPLATE_DIR + ‘info.html’
memKey = hash(url)

r = memcache.get(‘visit_%s’ % memKey)
if r and not settings.DEBUG:
template_values.update(r)
self.response.out.write(template.render(path, template_values))
else:
dom = parse.parse(url)
records = dom.getElementsByTagName("record")
contents = []
for rec in records:
title = text.clean_utf8(rec.getElementsByTagName(‘title’)[0].childNodes[0].nodeValue)
body = text.clean_utf8(rec.getElementsByTagName(‘body’)[0].childNodes[0].nodeValue)
contents.append({‘title’:title,’body’:body})

back = {‘href’:’/visit/#top’, ‘text’:’Visiting’}
cacheableTemplateValues = { "contents": contents,’back’:back }
memcache.add(key=’visit_%s’ % memKey, value={ "contents": contents,’back’:back }, time=7200)
template_values.update(cacheableTemplateValues)
self.response.out.write(template.render(path, template_values))
[/python]

Dealing with parsing XML via the standard DOM methods is a great way to test your tolerance for pain. I would use libxml and xpath, AppEngine doesn’t provide those libraries in their python environment.

Because the only part of Django’s template system that AppEngine uses is the template language, and nothing else, we have to roll our own helper functions for context. Meaning, if we want to pass a bunch variables by default to our templates, something easy in django, we have to do it a little differently with GAE. I set up a function called getTemplateValues, which we pass the WebOb request, and it ferrets out and organizes info we need for the templates, passing it back as a dict.

[python]
def ua_test(request):
uastring = request.headers.get(‘user_agent’)
uaDict = {}
if "Mobile" in uastring and "Safari" in uastring:
uaDict['isIphone'] = True
if ‘BlackBerry’ in uastring:
uaDict['isBlackBerry'] = True
return uaDict

def getTempalteValues(request):
myDict = {}
myDict.update(ua_test(request))
myDict.update(googleAnalyticsGetImageUrl(request))
return myDict
[/python]

In my next post, I’ll talk about how to track visitors on a mobile site using google analytics, without using javascript.