Blogs Media Lab Nate Solas

In college I pursued a double major of Theater and Computer Science which, while adding a bit of time to my education, gave me a taste of left-brain / right-brain synergy that I've sought in my work ever since. I've used my time at the Walker to a) fall in love with the museum sector and b) explore how the web can become a core component in fulfilling an organization's mission statement. Specialties: All things web and cultural heritage, creative thinking and problem solving, team building and hard work with a purpose. [Python, Java, PHP, perl, Postgres, MySQL, Linux, OpenSource everything]

Getting Mobile in the Garden

This summer marks a major milestone for the Minneapolis Sculpture Garden: 25 years as one of the country’s premier public sculpture parks. The New Media Initiatives department’s contribution to the celebration comes in the form of a brand new website for the garden, a fully responsive “web app” that has been an exciting challenge to build.

[Images: opening screen of the web app; map view of the garden]

The new site is a radical shift from the static, research-focused 2004 version, and instead becomes an on-demand interpretive tool for exploration in the garden, including an interactive, GPS-capable map, audio tour, video interviews, and short snippets called “fun facts.” One of the most exciting features is the new 25th Anniversary Audio Tour called Community Voices. Last summer we recorded interviews in the garden with community members, first-time visitors, and some local celebrities, and it’s all come together in this tour to present a fantastic audio snapshot of just how special the garden is to people.

[Images: detail view of Spoonbridge and Cherry; interpretive media for Spoonbridge]

The site provides light, casual information “snacking,” with prompts to dive deeper if time and interest allow. It gives visitors a familiar device (their own!) to enhance their visit at their own convenience.

Of course, we didn’t neglect our out-of-state or desktop visitors, but the site’s focus remains on getting people to the garden. For those unable to experience it physically (or for those frigid winter months), the new website provides a browsable interface to familiar favorites and up-to-date acquisitions and commissions.

Behind the scenes

MSG Web Data

Our proof of concept for the site was lean and mean, built quickly using open source tools (leaflet.js) and open data (OpenStreetMap). We didn’t have latitude/longitude positioning info for our public works of art, but as it turned out some kind soul had already added a significant number of our sculptures to OpenStreetMap! We set about adding the rest and knocked together a new “meta API” for the garden that would unify data streams from OSM, our Collections, Calendar, and existing media assets in Art on Call.

Fuzzy GPS

Next we began the process of verifying the data. We’d created custom map tiles for the garden so we could maintain the designed look and feel Eric was going for (look for a future post on the design process for this site), but it involved some compromises to make the paths line up visually. The New Media team spent a few hours walking the garden in the early spring, noting sculpture GPS anomalies and misplaced paths, and trying to avoid having anyone appear to be inside the hedges. No two devices gave the exact same GPS coordinates, so we ended up averaging the difference and calling it close enough.
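One way to read "averaging the difference" is as a simple mean of what each device reported while standing at the same sculpture. Here is a minimal Python sketch of that idea; the coordinates and the `average_position` helper are made up for illustration, and a plain arithmetic mean is only reasonable because the garden is so small:

```python
from statistics import fmean

def average_position(readings):
    """Return the mean (lat, lon) of several device readings for one spot."""
    if not readings:
        raise ValueError("need at least one reading")
    lats = [lat for lat, _ in readings]
    lons = [lon for _, lon in readings]
    return fmean(lats), fmean(lons)

# Hypothetical readings from three phones held at the same sculpture.
readings = [
    (44.96900, -93.28900),
    (44.96904, -93.28896),
    (44.96902, -93.28898),
]
lat, lon = average_position(readings)
print(f"{lat:.5f}, {lon:.5f}")
```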

Native-ish

It’s not a web app. It’s an app you install from the web.

As we discovered while building the mobile side of the new Collections site, a properly tuned webpage can start to feel a lot like a native application. We’re using swipe gestures to move between information “slides” and pinch and zoom for the map, and we followed most of the tips in the forecast.io blog post (above) to further enhance the experience. We’ll never be quite as snappy as a properly native app, but we feel the cross-platform benefits of the web fully outweigh that downside. (Not to mention our in-house expertise is web-based, not app-based.)

Need for Speed

This was the make-or-break component of the mobile site: if it didn’t “feel” fast, no one would use it. We spent untold hours implementing just-in-time loading of assets so the initial site would be tiny, while still having the images we need just before they’re supposed to be on screen. We tuned the cache parameters so anyone who’s visited the site in the past will have the components they need when they return, but we can also push out timely updates in a lightweight manner. We optimized images and spread the map tiles around our Content Delivery Network to prevent a single-domain bottleneck.

Finally, and perhaps foolishly, we wrote a safety fallback that tries to estimate a user’s bandwidth as they load the welcome image: by timing the download of a known-size file, we can make a quick decision if they are on a painfully slow 3G network or something better. In the case of the slow connection we dynamically begin serving half-size images in an effort to improve the site’s performance. We’ll be monitoring usage statistics closely to see if/when this situation occurs and for what devices. Which brings me to…
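The arithmetic behind that fallback is simple enough to sketch. The real check runs in the browser; this Python version only illustrates the decision, and the threshold, file size, and function names are assumptions rather than the values the site uses:

```python
import time

SLOW_KBPS = 400  # hypothetical cutoff, not the value used on the site

def estimate_kbps(size_bytes, elapsed_seconds):
    """Convert a timed download of a known-size file into kilobits per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (size_bytes * 8) / 1000 / elapsed_seconds

def choose_image_variant(size_bytes, elapsed_seconds, slow_kbps=SLOW_KBPS):
    """Pick 'half' for slow connections and 'full' otherwise."""
    if estimate_kbps(size_bytes, elapsed_seconds) < slow_kbps:
        return "half"
    return "full"

def timed_download(fetch):
    """Time a callable that downloads the file and returns its bytes."""
    start = time.monotonic()
    data = fetch()
    return len(data), time.monotonic() - start

# A 60 KB welcome image that took 2 seconds works out to 240 kbps.
print(choose_image_variant(60_000, 2.0))
# The same image in 0.2 seconds works out to 2400 kbps.
print(choose_image_variant(60_000, 0.2))
```

A single timing sample is noisy (latency, caching, a busy radio), which is part of why this is a safety fallback rather than something to lean on.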

Analytics

[Image: heatmap screenshot from a test walk through the garden]

I hope I’m right when I say that anyone who’s heard me speak about museums and digital knows how adamant I am about measuring results and not just guessing if something is “working.” This site is no exception, with the added bonus of location tracking! We’re anonymizing user sessions and then pinging our server with location data so we can begin to build an aggregate “heatmap” of popular spots in the garden. Above is a screenshot of my first test walk through the garden.
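The aggregation step can be as simple as snapping each ping to a grid cell and counting. A rough Python sketch, assuming a made-up ping format of (session, lat, lon) and a hypothetical cell size; the session ID is dropped so only location survives in the aggregate:

```python
from collections import Counter

CELL_DEGREES = 0.0001  # hypothetical grid size, roughly 11 m of latitude

def cell_for(lat, lon, cell=CELL_DEGREES):
    """Snap a coordinate to the integer index of its nearest grid cell."""
    return (round(lat / cell), round(lon / cell))

def build_heatmap(pings, cell=CELL_DEGREES):
    """Count pings per grid cell, ignoring which session sent them."""
    return Counter(cell_for(lat, lon, cell) for _session, lat, lon in pings)

# Invented pings: two near the same spot, one farther away.
pings = [
    ("a", 44.96901, -93.28902),
    ("b", 44.96903, -93.28899),
    ("a", 44.96950, -93.28850),
]
for cell, count in build_heatmap(pings).most_common():
    print(cell, count)
```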

We’re logging as many bits of information as we can about the usage of the new site in hopes of refining it, measuring success, and informing our future mobile interpretation efforts.

Enjoy!

Please visit the new Minneapolis Sculpture Garden website and let us know what you think!

 

Out with the Dialog Table, in with the Touch Wall

If you’ve explored our galleries, you’ve probably noticed the Dialog Table tucked into the Best Buy Info Lounge just off one of our main arteries. It’s a poster child of technical gotchas: custom hardware and software, cameras, projectors, finicky lighting requirements… Despite all the potential trouble embedded in the installation, it’s actually been remarkably solid apart from a few high-profile failures. At last tally, only the CPUs, capture cards, and one graphics board are original; the rest has been slowly replaced over the years as pieces have broken. (The graphics and video capture cards have drivers that aren’t upgradable at this point, so I’ve been trolling eBay to acquire various bits of antique hardware.)

It’s been a tank. A gorgeous, ahead-of-its-time, and mostly misunderstood tank. I’m both sad and excited to see it go.

I am, however, unequivocally excited about the replacement: two 65″ touch walls from Ideum. This change alone will alleviate one of the biggest human interface mismatches with the old table: it wasn’t a touch surface, and everyone tried to use it that way.

Early meeting with demo software

We’re moving very quickly with our first round of work on the walls, trying to get something up as soon as possible and iterating from there. The immediate intention is to pursue a large-scale “big swipe” viewer of highlights from our collection. Trying to convey the multidisciplinary aspect of the Walker’s collection is always a challenge, but the Presenter wall gives us a great canvas with the option for video and audio.

The huge screen is an attention magnet

With the recently announced alpha release of Gestureworks Core with Python bindings, I’m also excited for the possibilities of what’s next for the walls. The open source Python library at kivy.org looks like a fantastic fit for rapidly developing multi-touch apps, with the possible benefit of pushing out Android / iOS versions as well. At the recent National Digital Forum conference in New Zealand I was inspired by a demo from Tim Wray showing some of his innovative work in presenting collections on a tablet. We don’t have a comprehensive body of tags around our work at this point, but this demo seems to provide a compelling case for gathering that data. Imagine being able to create a set of objects on the fly showing “Violent scenes in nature” just from the paired tags “nature” and “violent”. Or “Blue paintings from Europe” using the tag “blue” and basic object metadata. Somehow the plain text description imposed on simple tag data makes the set of objects more interesting (to me, anyway). I’m starting to think that collection search is moving into the “solved” category, but truly browsing a collection online… We’re not there.

Touch screens, and multitouch in particular, seem destined for eventual greatness in the galleries, but as always the trick is to make the technical aspect of the experience disappear. I hope by starting very simply with obvious interactions we can avoid the temptation to make this about the screens, and instead make it about the works we’ll be showing.

Optimizing page load time

We launched the new walkerart.org late on December 1, and it’s been a great ride. The month leading up to launch (and especially the final week, starting Thanksgiving Day, when I was physically moving servers and virtualizing old machines) was incredibly intense and really brought the best out of our awesome team. I would be remiss if I didn’t start this post by thanking Eric & Chris for their long hours and commitment to the site, Robin for guiding when needed and deflecting everything else so we could do what we do, and Andrew and Emmet for whispering into Eric’s ear and steering the front-end towards the visual delight we ended up with. And obviously Paul and everyone writing for the site, because without content it’s all just bling and glitz.

Gushy thanks out of the way, the launch gave us a chance to notice the site was a little … slow. Ok, a lot, depending on your device and connection, etc. Not the universally fast experience we were hoping for. The previous Walker site packed all the overhead into the page rendering, so with the HTML cached the rest would load in under a second, easy. The new site is heavy even if the HTML is cached. Just plain old heavy: custom fonts, tons of images popping and rotating, javascript widgets willy-nilly, third-party API calls…

Here’s the dirty truth of the homepage when we kicked it out the door December 1:

12/1: 2.6 MB over 164 requests. Load times are pretty subjective depending on a lot of things, but we had good evidence of the page taking at least 4 seconds from click to being usable, and MUCH longer in some cases. Everyone was clearly willing to cut us some slack with a shiny new site, but once the honeymoon is over we need to be usable every day, and that means fast. This issue pretty quickly moved to the top of our priority list the Monday after launch, December 5.

The first thing to tackle was the size: 2.6 MB is just way too big. Eric noticed our default image scaling routine was somehow not compressing jpgs (I know, duh), so that was an easy first step and made a huge difference in download size.

12/5: 1.9 MB.

On the 6th we discovered (again, duh) lossless jpeg and png compression and immediately applied it to all the static assets on the site, but not yet to the dynamically-generated versions. Down to 1.8 MB. We also set up a fake Content Delivery Network (CDN) to help “parallelize” our image downloads. Modern browsers allow six simultaneous connections to a single domain, so by hosting all our images at www.walkerart.org we were essentially trying to send all our content through one tiny straw. Chris was able to modify our image generator code to spread requests across three new domains: cdn0.walkerart.org, cdn1, etc. This bypasses the geography and fat pipe of a real CDN, but does give the end user a few more straws to suck content through.
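The important detail when spreading requests like this is that a given image must always map to the same host, otherwise the browser cache treats it as a new file on every page. A minimal Python sketch of one way to do that; the hashing scheme and the example path are assumptions, not a description of our actual image generator:

```python
import zlib

CDN_HOSTS = ["cdn0.walkerart.org", "cdn1.walkerart.org", "cdn2.walkerart.org"]

def cdn_url(path, hosts=CDN_HOSTS):
    """Map an image path to one of the shard hosts, always the same one."""
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return f"http://{hosts[index]}{path}"

# Hypothetical image path; repeated calls return the identical URL.
print(cdn_url("/images/example_600x400.jpg"))
```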

Requests per Domain

www.walkerart.org 26
cdn1.walkerart.org 24
cdn0.walkerart.org 24
cdn2.walkerart.org 21
f.fontdeck.com 4
other 7

 

By the 8th we were ready to push out global image optimization and blow away the cache of too-big images we’d generated. I’m kind of astounded I’d never done this on previous sites, considering what an easy change it was and what a difference it made. We’re using jpegoptim and optipng, and it’s fantastic: probably 30% lossless saving on already compressed jpegs and pngs. No-brainer.
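For anyone wanting to try the same thing, here is a small Python wrapper sketch around the two tools. The flags shown are my choice for illustration (`--strip-all` also removes metadata such as EXIF, and `-o2` is a moderate optimization level), so check them against what you need before running this over a whole cache directory:

```python
import shutil
import subprocess
from pathlib import Path

def optimizer_command(path):
    """Return the lossless optimizer command for a file, or None if unsupported."""
    suffix = Path(path).suffix.lower()
    if suffix in (".jpg", ".jpeg"):
        # Rewrites the file in place; --strip-all drops comments and EXIF data.
        return ["jpegoptim", "--strip-all", str(path)]
    if suffix == ".png":
        return ["optipng", "-o2", str(path)]
    return None

def optimize(path):
    """Run the optimizer in place if the tool is installed; return True if it ran."""
    command = optimizer_command(path)
    if command is None or shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=True)
    return True

print(optimizer_command("photo.jpg"))
print(optimizer_command("logo.png"))
```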

12/8: 1.4 MB, almost half of what we launched with.

Next we needed to reduce the number of requests. We pushed into the second weekend with a big effort to optimize the Javascript and CSS. Earlier attempts using minify had blown up and were abandoned. Eric and Chris really stepped up to find a load order that worked and a safe way to combine and compress files without corrupting the contents. Most of the work was done Friday, but we opted to wait for Monday to push it out.

Meanwhile, I spent the weekend pulling work from the client’s browser back to the server where we could cache it site-wide. This doesn’t really impact bytes transferred, but it does remove a remote API call, which could take anywhere from a fraction of a second (unnoticeable) to several seconds in a worst-case scenario (un-usable). This primarily meant writing code to regularly call and cache all of our various Twitter feeds and the main weather widget. These are now served in the cached HTML, where they add a negligible amount to the load time instead of 200+ ms on average. It all adds up!
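The pattern is a plain time-based cache in front of the slow call. This Python sketch refreshes lazily on the first request after expiry, which is only one way to do it (a scheduled job that refreshes in the background is another); the class name, TTL, and stand-in fetch function are all hypothetical:

```python
import time

class TimedCache:
    """Cache the result of a slow call and refresh it after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires = None  # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            try:
                self._value = self._fetch()
                self._expires = now + self._ttl
            except Exception:
                # If the remote API fails, keep serving the stale copy if we have one.
                if self._expires is None:
                    raise
        return self._value

def fetch_weather():
    # Stand-in for a slow third-party API call.
    return {"temp_f": 28, "summary": "Snow"}

weather = TimedCache(fetch_weather, ttl_seconds=600)
print(weather.get())  # first call hits the "API"
print(weather.get())  # second call is served from the cache
```

Serving the stale copy on failure means a third-party outage degrades to slightly old data instead of a broken page.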

 

CSS Sprite for Header and Footer nav images (it has a transparent background, so it’s supposed to look like that):

 

 

So Monday, 12/12, we pushed out our first big changes to address the number of requests. Eric had combined most of the static pngs into a CSS Sprite, the javascript and CSS were reduced to fewer files, and the third party APIs were no longer called in the browser. Really getting there, now.

12/12: 1.37 MB, and 125 requests

Happily (as I was writing this) Eric just pushed out the last (for now) CSS sprite, giving us these final numbers:

12/13: 1.37 MB, and 110 requests! (down to 53% and 67% of our launch numbers, respectively)

This isn’t over, but it’s gotten to the point of markedly diminishing returns. We’re fast enough to be pretty competitive and no longer embarrassing on an iPad, but there are a few more things to pick off around the edges. We’re heavier and slower than most of our museum peers, but lighter and faster than a lot of similar news sites. Which one are we? Depends which stats I want to compare. :)

We used the following tools to help diagnose and prioritize page loading time issues:
http://tools.pingdom.com/fpt/
https://developers.google.com/pagespeed/
http://developer.yahoo.com/yslow/

 

 

 

 

Museums and the Web 2011 recap

I shared a ride to the airport with some colleagues who had very different takeaways from the conference than I did, so it’s clear there wasn’t a universal message. Everyone picks and chooses the ideas that might apply to what they’re working on.  Here’s what stood out to me:

[Image: Open Graph Protocol]

Cast wider nets: organize, filter, present.

Just as we’re getting good at putting our content online and connected internally, we’re starting to realize that’s not good enough. We need to connect more dots for our visitors: show related content not just from our institution, and not just from other institutions in the sector, but the entire web. We’re still a trusted source dealing with authoritative information, but we’re now expected to use that authority to interpret and present more than just our own content.

Part of this includes opening up our content in return so that we can be part of someone else’s related content. This includes OpenGraph markup (Facebook, etc), simple machine-readable versions, and above all, sorting out our licensing and making it easy to understand what can be shared and how!

Standardize access, not content.

There was some of the usual hand-wringing over metadata formats and authorities, but also some new ideas on skirting that hurdle rather than jumping it. While everyone agrees we need to continue to work towards clean, linked, open data using shared authorities, there are a number of steps we can take right now that can potentially have a great impact.

Namely, what if we standardize the access to the data, rather than the data itself? Rather than building another API (although we’re still going to), we can provide similar and simpler functionality right now. (In an afternoon, if my impassioned rant is to be believed!  :)  Details to follow.)

Stop inventing. Iterate.

A great demo (early beta here: http://trope.com/miami/) was given in the unfortunate timeslot of 8am on Saturday morning. The Art in Public Places project by Miami-Dade County is, to quote @minxmertzmomo: “a great example of doing the obvious thing excellently”. There is a tendency to try to solve our shared problems in a unique way with a special and clever twist (guilty!), when instead we should be choosing best practices from working solutions and applying them in an uncomplicated way. To reach higher we need to stand on others’ shoulders instead of building our own stepladders.

[Image: Tate Collection Online]

Don’t finish building the wrong site.

James Davis from the Tate presented a great paper describing the process they’ve taken to launch the new (also beta) version of their Collections site: http://beta.tate.org.uk/art/explorer. The paper is a lovely narrative exploring the issues we face when development takes years and we must constantly remind ourselves to not finish building what we started building, but instead what it’s become along the way.

Summary

For me the conference provided a great summary of the latest innovations and thinking of museums online, and affirmed for me many of the choices and directions we’re taking in our current relaunch project. It was fantastic to see old friends and make new ones, and hopefully set the stage for future collaborations. I’ve also got a growing list of stuff to steal (er, shoulders to stand on.. :). Fantastic stuff all around!

 

User testing using paper prototypes

A few years ago I was trying to explain the concept of “fail early, fail often” to someone, and failing.  (see what I did there?  ;-)  They didn’t understand why you just wouldn’t take longer to build it right the first time.

Now that we’re deep in the process of redesigning our website, I am starting to see the real danger in that sort of thinking.  Despite all our best intentions, we’ve fallen into a trap of thrashing back and forth around certain ideas – unable to agree, unwilling to move forward until we “solve it”, and essentially stuck in the same cycle illustrated in this cartoon.

Click for the whole cartoon (scroll down a bit)

To try to help break the recent impasse on site navigation, we’re doing some simple user testing using paper prototypes of several ideas.  These are meant to be rough sketches to essentially pass/fail the “do they get it?” test, but they’re also giving us a ton of valuable little hints into how people see and understand both our website and our navigation.

An example of some paper prototypes for the navigation. (Don't worry, it's just a rough idea and one of many!)

Our basic process so far is to ask people (non-staff) for first impressions of the top nav: does it make sense?  Do they think they know what they’ll get under each button?  Then we show the flyouts and see if it’s what they expected.  Anything missing?  Anything that doesn’t meet their expectations?  Finally we ask a few targeted “task” questions, like “where would you look if you wanted information about a work of art you saw in the galleries?”

Even this simple round of testing has revealed some clearly wrong assumptions on our part.  By fixing these things now (failing early) and iterating quickly, we can do more prototypes and get more feedback (failing often).  I’ll try to post updates as we proceed.

PS — Anyone else doing paper prototypes like this?  I think we all know we’re “supposed” to do quick user testing, but honestly this is the first time in years we’ve actually done something like it.

Behind-the-scenes of ArtsConnectEd: Art Finder

On September 1, 2009 the new ArtsConnectEd became available at ArtsConnectEd.org.  The new site provides access to more than 100,000 museum resources, including audio, video, images, and information about works of art, all of which can be saved and presented with the more powerful Art Collector.

This project was at least three years in the making, with the last two of those being the technical work of research, design, and development.  In this series of posts I’d like to present some of the decisions we struggled with and the process we went through in developing the new site.  I’ll start with the Art Finder, followed by a post on the Art Collector and presentations, and finish with a post about some of the more technical aspects including the data and harvesting technologies we’re using.

Art Finder

The Art Finder is the guts of the site, a portal into our thousands and thousands of objects, text records, and more.  I don’t think it’s an exaggeration to say designing and building this component was the biggest challenge we faced in the entire process.  We’ve redesigned the interface many times, often significantly, and are still not certain it’s right.  We’ve changed the underlying technology from a SQL / Lucene hybrid to a straight-up Solr search engine.  We’ve debated (endlessly) what fields to include, and what subset of our data to present in those fields.  We’ve gone back and forth over tab titles, and even whether to use tabs.  A rocky road, to say the least.

The big idea

What if we could start with everything and narrow it down from there?  Offer the user the entire collection and let them whittle away at it until they found what they wanted?

It’s all browse.  Keyword is just another filter.

To me this is the big breakthrough of the ArtsConnectEd interface.  We don’t hide the content behind a search box, or only show filters after you try a keyword.  We don’t have a separate page for “Advanced Search”, but we offer the same power through filters.  There is still a keyword field for those who know exactly what they’re looking for, but we get to use our metadata in a more powerful way than simple text.  That is, since we know the difference between the word “painting” appearing in the description and something that is a painting, we can present that to the user through filters.
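To make the "keyword is just another filter" idea concrete, here is a toy in-memory Python sketch. The real site does this with a search engine rather than list comprehensions, and the records, field names, and helper functions below are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Record:
    title: str
    artist: str
    medium: str
    description: str

def by_field(name, value):
    """Filter on structured metadata: the record *is* a painting."""
    return lambda record: getattr(record, name) == value

def by_keyword(word):
    """Filter on free text: the word merely appears somewhere in the record."""
    needle = word.lower()
    return lambda r: needle in f"{r.title} {r.artist} {r.medium} {r.description}".lower()

def browse(records, filters=()):
    """Start with everything and keep only records that pass every active filter."""
    return [r for r in records if all(f(r) for f in filters)]

# Invented records.
records = [
    Record("Untitled A", "Artist One", "Paintings", "Oil on canvas."),
    Record("Untitled B", "Artist Two", "Prints", "Lithograph after a painting by the artist."),
    Record("Untitled C", "Artist Three", "Sculpture", "Painted steel."),
]

print(len(browse(records)))  # no filters: the whole collection
print([r.title for r in browse(records, [by_keyword("painting")])])
print([r.title for r in browse(records, [by_keyword("painting"), by_field("medium", "Paintings")])])
```

The keyword alone matches the print whose description mentions a painting; adding the Medium filter narrows the set to the work that actually is one.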

How we got here

We wanted many ways for the user to explore the collection, with the idea we might hopefully mimic some of the serendipity of exploring a gallery.  The tech committee felt early on that we’d need, in addition to a robust search, some way to freely browse.  Our initial attempt was to split the Art Finder into a Browse interface (left) and a Search interface (right).

[Images: Browse wireframe (left) and Search wireframe (right)]

After forcing users to choose a content type to browse (Object, Text, etc), we exposed facets (fields) to allow filtering, e.g. by Medium or Style.  These facets were hidden by default in the Search interface, where instead you started with a keyword and content type as tabs — but could then click to reveal the same browse filters!  The more we played with these two ideas, the more we realized they were essentially the same thing, the only difference being a confusing first step and then having to learn two interfaces.  The real power of the site was in combining them, committing fully to Browse, and adding the keyword search as a filter.

Lastly, as we harvested more of our collections we realized pushing filters to the front offered a better way to drill down when many of our records are not text-heavy and thus less findable via keyword search.  In many ways browse leveled the playing field of our objects between those with healthy wall labels and those with more sparse metadata.

What works

(In my humble opinion!)  A good browse has to do a few things:

  • Be fast. Studies have shown that slow search (or browse) results derail a user’s chain of thought and make it difficult to complete tasks.  We went one step further and did away with the “Go” button for everything but keyword: making a change to a pulldown automatically updates your result set.  (It’s not instant, but it’s fast enough that the action feels connected to the results.)
  • Reduce complex fields to an intuitive subset. We have a huge range of unique strings for the Medium field, but we’ve broadly grouped them to present a reasonable-sized pulldown.  Likewise for the Culture pulldown.  (We manually reduce the terms for Medium, and have an automated Bayesian filter for the Culture field.)
  • Have good breadcrumbs. Users need to know what options are in effect and be able to backtrack easily.
  • Avoid dead ends. With many interfaces it’s entirely too easy to browse yourself into an empty set.  By showing numbers next to our filter choices, we can help users avoid these “dead ends”.
  • Expose variety. Type “Jasper Johns” in the artist field, and check out the Medium pulldown: it shows the bulk of his work is in Prints, but we also have a few sculptures, some mixed media, etc.  A nice way to see the variety of an artist’s work at-a-glance.
  • Autocomplete complicated fields. If a search box is targeted to a field (like our Artist box), it needs to autocomplete.  Leaving a field like this open to free text is asking for frustration as people get 0 results for “Claes Oldenberg“. (Auto-suggest “did you mean” should also work!)
  • Have lots of sort options. One of my favorite features of the new Art Finder is the ability to sort by size.  Super cool.  (check out the Scale tab in the detail view for more fun!)
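The "avoid dead ends" and "expose variety" points both come down to showing a count next to every filter choice. A search engine's faceting normally does this for you; the Python sketch below just shows the idea on a handful of invented records, with a hypothetical `facet_counts` helper:

```python
from collections import Counter

def facet_counts(records, facet, active=None):
    """Count values of one facet among records matching the other active filters."""
    active = active or {}
    matching = [
        r for r in records
        if all(r.get(k) == v for k, v in active.items() if k != facet)
    ]
    return Counter(r[facet] for r in matching if r.get(facet))

# Invented records for illustration.
works = [
    {"artist": "Artist A", "medium": "Prints"},
    {"artist": "Artist A", "medium": "Prints"},
    {"artist": "Artist A", "medium": "Prints"},
    {"artist": "Artist A", "medium": "Sculpture"},
    {"artist": "Artist B", "medium": "Paintings"},
]

counts = facet_counts(works, "medium", {"artist": "Artist A"})
for medium in ("Prints", "Sculpture", "Paintings"):
    # A zero here is a dead end the interface can warn about or hide.
    print(medium, counts[medium])
```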

I’m biased after this project, but I’m fairly convinced combining faceted browsing with keyword search is absolutely the way to go for collection search.  It gives the best of both worlds, powerful but still intuitive.

What could be better

… but is it really intuitive?  People seem to still be looking for a big inviting search box to start with.  The interface is crowded, and the number of options looks intimidating.  We’ve ended up avoiding using the words “Search” and “Browse” because they were loaded and causing confusion.  We’ve tried many versions of the tab bar to try to clarify which filters apply globally (e.g. Institution) and which only affect that tab (Works of Art have an Artist, for instance), but I don’t believe we’ve solved it.

I think the two components of the interface that give us the most trouble and confusion are actually the “Has Image” checkbox and the “Reset All” button.  These are consistently missed by people in testing, and we have tried almost everything we can think of.  Oh, and the back button.  The back button is “broken” in dynamic search like this.

Also, while I really like the look of the tiles in the results panel, we’ve had to heavily overload the rollover data to show fields we can sort by since there’s no more room in the tiles.  We also intended to create alternative result formats, such as text bars, etc, which could show highlights on matching keywords, but this item was pushed back for other features.

We’ve defaulted to sorting alphabetically by title when a user first reaches the page, and I’m no longer sure this is best.  As we’ve populated the collections in ArtsConnectEd we’ve ended up with a bunch of works that have numbers for titles, making the alpha sort less obvious.

You tell me!  Give the site a spin and post a comment – what works, and what could be better?

Resources:

  • Designing for Faceted Search (http://www.uie.com/articles/faceted_search/)
  • Faceted Search: Designing Your Content, Navigation, and User Interface (http://www.uie.com/events/virtual_seminars/facets/FacetedSearchVS35Handout.pdf)
  • Faceted Search (http://en.wikipedia.org/wiki/Faceted_search)
  • Best Practices for Designing Faceted Search Filters (http://www.uxmatters.com/mt/archives/2009/09/best-practices-for-designing-faceted-search-filters.php)
  • V&A Collections (beta) (http://www.vam.ac.uk/cis-online/search/?q=blue&commit=Search&category%5B%5D=5&narrow=1&offset=0&slug=0)
    • Their facets aren’t as up front as I’d like (you have to start with a keyword), but they’re done really well once they show up.
    • You can also cheat and leave keyword blank to get a full browse and go right to the facets…  Maybe start here?
  • MOMA Collections (http://www.moma.org/collection/search.php)
    • Nice presentation of facets, but I wish two things: show me a number next to all constraints, not just artists, and let me add a keyword.  (I got a dead end looking for on-view film from the 20s or 2000s)  I also like that it’s a true browse – leaving everything at “All” seems to give me the whole collection.

Some thoughts on preserving Internet Art

We’re in the process of retiring our last production server running NT and ColdFusion (whew!), and this means we needed to get a few old projects ported to our newer Linux machines.  The main site, http://aen.walkerart.org/, is marginally database-driven: that is, it pulls random links and projects from a database to make the pages different each time you load.  The admin at the time was nice enough to include MDB dump files from the Microsoft Access(!) project database, and the free mdbtools software was able to extract the schema and generate import scripts.  Most of this page works as-is, but I had to tweak the schema by hand.

After the database was ported to MySQL, it was time to convert the ColdFusion to PHP.  (Note: the pages still say .cfm so we don’t break links or search engines, but it’s PHP running on the server.)  Luckily the scripts weren’t doing anything terribly complicated, mostly just selects and loops with some “randomness” thrown in.  I added a quick database-abstraction file to handle connections and errors and sanitize input, and things were up and running quickly.
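
For readers who want to picture what a “quick database-abstraction file” does, here is a minimal sketch in Python.  It is not the PHP that runs the site: sqlite3 stands in for MySQL so the example is self-contained, and the links table, its columns, and the function names are all hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a connection in one place so every page shares the same setup."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows can be read by column name
    return conn


def random_links(conn, category, limit=3):
    """Return up to `limit` random links from a hypothetical `links` table.

    The ? placeholders keep user input out of the SQL text, which is the
    "sanitize input" part of the abstraction.
    """
    try:
        rows = conn.execute(
            "SELECT title, url FROM links "
            "WHERE category = ? ORDER BY RANDOM() LIMIT ?",
            (category, int(limit)),
        ).fetchall()
    except sqlite3.Error as exc:
        # Wrap driver errors so calling pages only handle one error type.
        raise RuntimeError(f"link query failed: {exc}") from exc
    return [dict(row) for row in rows]
```

Each page then calls random_links() instead of building SQL strings itself, so connection handling, error handling, and input handling live in a single file.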

… sort of.  The site is essentially a repository of links to other projects, and was launched in February 2000.  As you might imagine there’s been some serious link rot, and I’m at a bit of a loss on how to approach a solution.  Steve Dietz, former New Media curator here at the Walker, has an article discussing this very issue here (ironically mentioning another Walker-commissioned project that’s suffered link rot.  Hmm.).

One strategy Dietz suggests is to update the links by hand as the net evolves.  This seems resource-heavy, even if a link-validating bot could automate the checking — someone would have to curate new links and update the database.  I’m not sure we can make that happen.

It also occurred to me to build a proxy using the Wayback Machine to try to give the user a view of the internet in early 2000.  There’s no API for pulling pages, but archive.org allows you to build a URL to get the copy of a page closest to a specific date, so it seems possible.  But this is tricky for other reasons: what if the site actually still exists?  Should we go to the live copy or the copy from 2000?  Do we need to pull the header on the URL and only go to archive.org if it returns an error (anything from 404 to 500)?  And what if the domain is now owned by a squatter who returns a 200 page of ads?  Also, archive.org respects robots.txt, so a few of our links have apparently never been archived and are gone forever.  Rough.
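
To make the proxy idea a little more concrete, here is a minimal sketch in Python of the URL-building and decision step.  It makes several assumptions: the archive URL pattern is web.archive.org/web/<timestamp>/<original URL>, the target date is the site’s February 2000 launch, and the policy is “use the live page if it answers with a 2xx or 3xx status, otherwise fall back to the archive.”  The function names are hypothetical, the status code would come from a separate header request that is not shown, and the squatter case (a 200 page of ads) is not handled at all.

```python
from datetime import date
from typing import Optional

WAYBACK_PREFIX = "https://web.archive.org/web"
TARGET_DATE = date(2000, 2, 1)  # assumption: the site's launch month


def wayback_url(original_url: str, when: date = TARGET_DATE) -> str:
    """Build a URL asking archive.org for the capture closest to `when`."""
    timestamp = when.strftime("%Y%m%d") + "000000"  # YYYYMMDDhhmmss
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def choose_target(original_url: str,
                  live_status: Optional[int],
                  when: date = TARGET_DATE) -> str:
    """Pick the live page or the archived copy.

    `live_status` is the HTTP status from a header request, or None if the
    request failed outright (DNS error, timeout, and so on).
    """
    if live_status is not None and 200 <= live_status < 400:
        return original_url  # the site still answers: send the visitor there
    return wayback_url(original_url, when)  # error or no answer: use the archive


if __name__ == "__main__":
    print(choose_target("http://example.com/project/", 404))
```

Flipping the policy to “always show the 2000 copy” would just mean calling wayback_url() directly, which is probably closer to the spirit of preserving the original experience.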

In the end, the easy part was pulling the code to a new language and server – it works pretty much exactly like it did before, broken links and all.  The hard part is figuring out what to do with the rest of the web…  I do think I’ll try to build that archive.org proxy someday, but for now the fact it’s running on stable hardware is good enough.

Thoughts?  Anyone already built that proxy and want to share?

Build a bridging firewall (cheap!)

New Media has a number of development servers located in-house where we get stuff done before releasing it out into the wild.  Until last week these were protected by an aging OpenBSD firewall running packet filter and all was well until midweek when the motherboard failed.  Not having a spare on hand, I was scrambling for a solution.

Linksys wireless router

Being familiar with the dd-wrt project, I was pretty sure I could build a firewall out of a Linksys router.  We went with the WRT54GL, currently as cheap as $50 on Amazon.  (We bought local so we’d have it sooner, and it was a bit more).

The first step after flashing the firmware with the latest dd-wrt build (v24-sp2) was to take off the antennas and turn off the radio.  The last thing I want is for the firewall to be broadcasting an SSID and allowing wireless associations.  This actually requires a startup script on the router, with a line to remove the wireless module so it won’t try to reenable itself:

wl radio off
wl down
rmmod wl

Good start.  Next I needed to bridge the WAN port with the LAN ports, which ended up being a struggle until I found the easy options in the dd-wrt GUI.  First, set the LAN to use a static IP and make sure you can connect via another machine to configure it.  You’ll also need to enable SSH access and remote configuration – but be sure to lock this down once the firewall is running!

Once you have the LAN configured, you need to set the WAN connection type to “disabled”.  This will give you a checkbox to bridge the LAN and WAN:  “Assign WAN port to switch”.  Lastly, under Advanced Routing set the Operating Mode to “Router” so it stops trying to do NAT.  Apply these settings, and you’ll basically have an expensive dumb switch – all traffic shows up on every port, and there’s no logic at all.  We’re halfway there.

Being unfamiliar with iptables (we use OpenBSD and pf for firewalls around here), I was under the impression that iptables rules would just work in a bridging environment.  This is not the case out of the box: bridged packets don’t reach iptables at all!  The best I could do was block everything (manual restart needed), or otherwise blow up the configuration (manual restart needed) as I tried to mess with the bridge.  This was an incredibly frustrating learning curve, as everything I could find made it sound like this was the way to configure a firewall in Linux, but it just wasn’t working.

Note to keep you sane: don’t do any of this testing in the startup scripts or you’ll brick your router, guaranteed.  Do it all from the command line with a known-good startup.  That way it’s a simple (but annoying) power cycle to get things back up.

The trick, it turns out, is a kernel module called ebtables.  Luckily, this is included in the dd-wrt build, but it’s not turned on by default!  Add this to your startup script:

insmod ebtables
insmod ebtable_filter
insmod ebt_ip.o

And, ta-da, all your iptables rules will start impacting packets!  Now it’s just a matter of configuring the firewall rules.  We’re using something like this:  (vlan0 represents the LAN ports, and vlan1 is the WAN port)

# drop everything by default:
iptables -P FORWARD DROP
# clear the old rules:
iptables -F FORWARD
# forward stuff that's established already
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
# let connections out:
iptables -A FORWARD -i vlan0 -m state --state NEW -j ACCEPT

# firewall access rules
iptables -F INPUT
# WAC ips can get to fw:
iptables -A INPUT -p tcp -d 1.2.3.4 -s 4.3.2.1/24 -j ACCEPT
# drop everything else!
iptables -A INPUT -p tcp -d 1.2.3.4 -j DROP

# ... snipped all the actual access rules and packet flood protection ...

The only trick here is the last few lines, which limit access to the firewall machine itself.  We can’t use the FORWARD rules since these packets are destined for the firewall itself and are not forwarded, but we do need to limit access via the INPUT chain.  In this example the firewall has IP 1.2.3.4 and the network I want to access it from is 4.3.2.x.  That way I can leave the firewall’s remote access turned on and limit it to our network.  (Because there’s no terminal access, you can’t make it a truly transparent bridge or you’d never be able to change the config!)

I admit I’m a bit nervous posting some of this in case there’s a glaring security hole, but it seems good to me.  Anyone see anything they’d like to warn me about before we get hacked?

And there you have it!  For the cost of a cheap router and some time (not much, since you can just follow these notes!) you have a full-featured bridging firewall running on dedicated hardware.  With a little extra work it would be easy to get VPN running and much more…  I’m hoping for years of service from this little guy!

(Hat tip to another DIY firewall solution that I’d really like to try someday.)

Do one thing in April…

… blog about it in May!

Museums and the Web 2009 wrapped up with a challenge to all the inspired delegates: use the energy and ideas generated here to get one thing done in April.  (The idea being that many small steps build momentum, and it’s too easy to ignore the small upgrades we should constantly be pushing out.)

Yesterday I pushed out a few small upgrades to our aging collection site:

You can now limit your search to objects that are On View

What works by Dan Flavin can you come see right now?

OpenSearch capable

Can’t get enough of our collection?  Add it to your browser’s built-in search box!  When you’re on the Collection site, you should be able to pull down your browser’s search field and add “Walker Art Center”.

Developers (Piotr!): you can now use the Walker collection in your Yahoo Pipes tool without having to scrape the results!  Not an API (yet), but a good step.  Check out the XML for ideas.

Bring it all together:

You’re a busy person.  You’d love to come see Chuck Close’s Big Self-Portrait, and you know the Walker’s got it in their collection, but you see it’s not on view.  You don’t have time to check our website every day, so how will you ever know when it goes on display?  Easy:  build a search that finds Big Self-Portrait, then turn on the “On View” flag.  The object disappears (not on view), but you can subscribe to the OpenSearch RSS feed for this query (click the rss icon).  Now, when Big Self-Portrait is available to see in the galleries, the object will show up in your RSS reader!  (note: I picked this painting randomly.  I make no guarantee about seeing it in the galleries any time soon.  :)
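
If you’d rather have a script than an RSS reader do the watching, here is a minimal sketch in Python of the same idea.  It assumes the feed is plain RSS 2.0 with one item per matching object; the sample feed text, titles, and function names are made up for illustration, and fetching the real feed URL is left out.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical feed contents: what the saved query might return on two days.
FEED_NOT_ON_VIEW = """<rss version="2.0">
  <channel>
    <title>Collection search (On View)</title>
  </channel>
</rss>"""

FEED_ON_VIEW = """<rss version="2.0">
  <channel>
    <title>Collection search (On View)</title>
    <item>
      <title>Big Self-Portrait</title>
      <link>http://example.org/object/1</link>
    </item>
  </channel>
</rss>"""


def titles_in_feed(rss_text: str) -> List[str]:
    """Return the title of every item in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title", default="").strip()
            for item in root.iter("item")]


def is_on_view(rss_text: str, wanted_title: str) -> bool:
    """True once the object shows up in the 'On View' search feed."""
    wanted = wanted_title.strip().lower()
    return any(title.lower() == wanted for title in titles_in_feed(rss_text))


if __name__ == "__main__":
    for label, feed in (("before", FEED_NOT_ON_VIEW), ("after", FEED_ON_VIEW)):
        print(label, is_on_view(feed, "Big Self-Portrait"))
```

The empty feed is the interesting case: no items means “not on view yet,” and the first item to appear is your cue to visit.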

So, baby steps.  Get one thing done that opens more doors.

#didonethinginapril (I tag Andrew at the MIA to get one thing done in May!)

MW2009 – Technology Strategies

Charlie Moad (developer at IMA) kicks off the session with a discussion of cloud computing and its advantages and disadvantages.  One of his most compelling arguments in a non-technical sense is the incredible energy efficiency of these large data centers: their cooling systems and power use are at levels we can’t approach in our co-located server rack. Google is approaching a 1.1:1 ratio of total facility power (cooling included) to the power its servers actually use. They’ve recently documented their cooling and datacenter practices here.

Other advantages Charlie mentioned for using Cloud computing:

  • Scalability
  • Pay as you go. This is the big benefit. You use what you need when you need it, which also helps efficiency.
  • No hardware to administer. No downtime. This makes sysadmins very happy.

Some disadvantages are:

  • Security. (Not sure on this… don’t recall Amazon or Google having any big issues with security. This is in our hands: doing our jobs and setting proper permissions.)
  • Portability. AWS and Google App Engine (GAE) are proprietary systems. GAE has more issues in this realm than AWS.

One other thing to note about Google App Engine that Charlie didn’t mention is that GAE is a spec, and from what I’ve heard from various Python people, Google very much wants it to be implemented by others. There is already an open source implementation of App Engine called AppScale. And Joyent has an implementation called ReasonablySmart.

IMA is using Amazon Web Services (AWS) for hosting ArtBabble. A simple breakdown of their usage:

  • EC2 instances for transcoding video
  • S3 and CloudFront for storing video and media files (images/js/etc)
  • Wowza streaming server running on EC2 for streaming video

Cloud computing structure for ArtBabble

Charlie had a nice slide I don’t remember being in the paper: a diagram of where these services sit in the cloud (storage vs service) and what the end user’s browser is actually talking to at any time. It sounds like changing the number of Wowza instances is still a manual process, but I imagine it could be automated.

The stats are impressive: 40,000 video views since launch 9 days ago, and 3,500 registered users.  They’re cleverly using Google / Yahoo sign-ins to create OpenID accounts, without telling people it involves OpenID.  Uptake is much higher when the technology is hidden in this process…  Also impressive is the cost, or lack thereof: they’re able to run ArtBabble for the same cost as their internal website.

Charlie closes by mentioning a few recent advances in Amazon’s hosting that allow essentially pre-paying for a year’s service at a much-discounted rate.

I think I’m not the only webmaster in the audience who is thinking “we have to move our sites into the cloud,” but is also concerned about finding the time to do so.  This paper and presentation have gone a long way towards answering some questions I haven’t been able to research fully.

Justin Heideman also contributed to this post.
