MW2008 – Search

This session has been great for me, as this is very much where my head is at right now with ArtsConnectEd… My live notes follow:

Brian Kelly chairs a session on Search, announcing that with the smaller size both speakers are willing to make this a bit more workshop-like. Terry Makewell starts by introducing his project: 9 partners making up the National Museums Online Learning Project. He goes over some of the goals of the project, and the current state of things, and the realization that some sort of federated search was needed to span the partners’ collections.

How to do the federated search? A multi-institution project meant different technical teams, different technologies, and in some cases limited resources. See the paper for more details, but the two technologies they considered most carefully were OAI-PMH and OpenSearch.

OAI, the path we’re going down with ArtsConnectEd, harvests metadata into a central repository and runs the searches there. OpenSearch sends each search out to every institution, collects the responses, re-orders them at the aggregating site, and returns the combined result.
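To make the "central repository" half of that contrast concrete, here is a minimal sketch of the harvest-then-search-locally model. It is not the OAI-PMH protocol itself: the `harvested` records, the partner names, and both function names are made up for illustration, and the index is a plain in-memory dictionary.

```python
from collections import defaultdict


def build_index(harvested):
    """Build an inverted index: lowercase word -> set of (partner, record id)."""
    index = defaultdict(set)
    for partner, records in harvested.items():
        for record_id, text in records.items():
            for word in text.lower().split():
                index[word].add((partner, record_id))
    return index


def search_local(index, query):
    """Return records containing every query word, sorted for stable output."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical records standing in for metadata harvested from two partners.
harvested = {
    "museum_a": {"a1": "Bronze horse figurine", "a2": "Silver coin"},
    "museum_b": {"b1": "Horse harness in bronze"},
}
index = build_index(harvested)
print(search_local(index, "bronze horse"))
```

The point of the model is that no partner server is touched at query time; the cost moves to harvesting and field mapping up front.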

OpenSearch fit the project requirements and timeline most efficiently, so that was their choice. He discusses their prototyping effort: scraping search results to generate the RSS for OpenSearch. They now have a single page with a configuration file they can drop on each partner’s website and it will “just work”. Potential caveats: what if the search result page changes? Also, the federated search can only be as fast as the response from the slowest partner.
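The "slowest partner" caveat is easier to see in code. This is a rough sketch of a fan-out search with a deadline, not the project's implementation: the partner functions are stubs standing in for real HTTP calls, and `federated_search`, the partner names, and the 0.2 second timeout are all assumptions for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from itertools import zip_longest


def federated_search(partners, query, timeout=0.5):
    """Query every partner in parallel; skip any that miss the deadline."""
    pool = ThreadPoolExecutor(max_workers=len(partners))
    futures = {pool.submit(fn, query): name for name, fn in partners.items()}
    done, _not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on stragglers

    per_partner, skipped = [], []
    for future, name in futures.items():
        if future in done and future.exception() is None:
            per_partner.append([(name, hit) for hit in future.result()])
        else:
            skipped.append(name)

    # Round-robin interleave so no single partner dominates the top results.
    merged = [hit for row in zip_longest(*per_partner) for hit in row if hit is not None]
    return merged, skipped


# Stub partners (hypothetical); a real one would fetch and parse an RSS response.
def fast_partner(query):
    return [f"{query} vase", f"{query} bowl"]


def slow_partner(query):
    time.sleep(1.0)  # simulates a partner that answers too late
    return [f"{query} (late result)"]


def other_partner(query):
    return [f"{query} print"]


partners = {"museum_a": fast_partner, "museum_b": slow_partner, "museum_c": other_partner}
merged, skipped = federated_search(partners, "blue", timeout=0.2)
print(merged)
print(skipped)
```

Without the deadline, the whole search would take as long as `slow_partner`; with it, the user gets the fast partners' results and a list of who was left out.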

He shows the working prototype, and I’m excited to see they’ve got thumbnails where available – their scraper must be fairly robust for each partner.
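For a sense of what "scraping search results to generate the RSS" might involve, here is a small sketch using only the Python standard library. Everything specific in it is an assumption: the sample HTML, the class names in `CONFIG`, the example domain, and the choice to carry thumbnails as RSS enclosures. The talk did not say how the real scraper or its configuration file works, and a full OpenSearch response would also carry elements such as the total result count, which are omitted here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Hypothetical per-partner configuration: which classes mark up a result.
CONFIG = {"item_class": "result", "title_class": "title", "thumb_class": "thumb"}

# Made-up search result page for one partner.
SAMPLE_HTML = """
<ul>
  <li class="result"><a class="title" href="/object/1">Blue vase</a><img class="thumb" src="/img/1.jpg"></li>
  <li class="result"><a class="title" href="/object/2">Silver coin</a></li>
</ul>
"""


class ResultScraper(HTMLParser):
    """Collect title, link and optional thumbnail for each result item."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.items = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.config["item_class"] in classes:
            self.items.append({"title": "", "link": "", "thumbnail": None})
        elif self.items and self.config["title_class"] in classes:
            self.items[-1]["link"] = attrs.get("href", "")
            self._in_title = True
        elif self.items and self.config["thumb_class"] in classes:
            self.items[-1]["thumbnail"] = attrs.get("src")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.items[-1]["title"] += data


def to_rss(items, base_url, feed_title):
    """Serialize scraped items as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = "Scraped search results"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"].strip()
        ET.SubElement(node, "link").text = urljoin(base_url, item["link"])
        if item["thumbnail"]:
            # Assumption: thumbnails are JPEGs of unknown size.
            ET.SubElement(node, "enclosure", url=urljoin(base_url, item["thumbnail"]),
                          type="image/jpeg", length="0")
    return ET.tostring(rss, encoding="unicode")


scraper = ResultScraper(CONFIG)
scraper.feed(SAMPLE_HTML)
rss_xml = to_rss(scraper.items, "http://museum.example", "Example Museum search")
print(rss_xml)
```

The fragility is visible right away: rename one class in the partner's template and the scraper silently returns nothing, which is exactly the "what if the search result page changes?" caveat.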

Lessons learned: federated search doesn’t have to be expensive or complicated, and it can work with small and large museums equally well. Their method pushes the work offsite, requiring minimal or no effort on the museum’s part.

(Note to self: end slide show with a kitten and you’re in.)

Q&A – Scalability issues come up; they know these are coming. Asked if they considered Google Co-Op: yes, but they quickly found that Google was unable to deeply crawl many of the partners’ collections due to dynamic URLs. Lots of Twitter traffic in this session too.

Very interesting debate for me to hear on OAI vs. OpenSearch. Many institutions are moving towards OAI, but the scope of implementing it is a barrier for most. My feeling that OAI gives more searchable fields is somewhat refuted by the idea that the average user has no interest in or knowledge of these fields (culture, era, etc)…

(Mike Ellis shows off by building a co-op search during the session.)

Johan Møhlenfeldt Jensen from the Museum of Copenhagen, Denmark, speaks next on his paper. I was distracted at the beginning, so I’m trying to catch up.

The example he’s showing now exposes some fields for filtering, rather than just keywords. Interesting. Another example shows map-based searching, which he says is immensely popular. It’s easy to build for photographic collections since the address is known, but often much harder for other sorts of objects.

Interesting discussion on “advanced search” – he says studies show it’s minimally used; Google has changed everything. People just want a single field. Hmm… Are we wasting time and overbuilding if we have anything more advanced than a single field? This is the question I’m banging against as I listen to these speakers.

He asks “is the best the enemy of the good?” Good question. Do we wait forever getting it right? Clearly, no, but how far do we go?

They both have good input on the question I ask about overbuilding: move the advanced search behind the scenes and make it more semantic. Still need the metadata, but don’t ask users to know about it. Also need a way to drill down after search: start with simple search, and then apply filters.

Very good comment on positioning: where and at what point in the process do you expose filters and result counts?
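A quick sketch of that "simple search first, filters after" flow, to show where the metadata still earns its keep. The records are invented, and the function names and the choice of `culture` and `era` as filter fields are just assumptions for the example.

```python
from collections import Counter

# Made-up records; a real collection would supply these from its metadata.
RECORDS = [
    {"title": "Blue vase", "culture": "Chinese", "era": "Ming"},
    {"title": "Blue bowl", "culture": "Chinese", "era": "Qing"},
    {"title": "Blue print", "culture": "American", "era": "20th century"},
    {"title": "Red vase", "culture": "Greek", "era": "Classical"},
]


def keyword_search(records, query):
    """Single-field search: every query word must appear in the title."""
    words = query.lower().split()
    return [r for r in records if all(w in r["title"].lower().split() for w in words)]


def facet_counts(results, fields=("culture", "era")):
    """Count how many results fall under each value of each filter field."""
    return {field: Counter(r[field] for r in results) for field in fields}


def apply_filters(results, **filters):
    """Narrow an existing result set by exact field values."""
    return [r for r in results if all(r.get(k) == v for k, v in filters.items())]


results = keyword_search(RECORDS, "blue")
counts = facet_counts(results)
narrowed = apply_filters(results, culture="Chinese")

print(len(results), "results")
print(dict(counts["culture"]))
print([r["title"] for r in narrowed])
```

The user only ever types into one box; the fields show up afterwards as counts they can click on, which is one answer to the positioning question.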

Brian summarizes the importance of getting static URIs for resources: then Google will “just work”…

(Note to self: implement OpenSearch for the Walker and ACE)

  • Mike L says:

    Glad to see NMOL doing federated search. I remember it scaring people when it was first brought up as an idea. I’d have recommended a (meta)data warehouse approach if I was still part of that project though – slow responses will not help usability, but the distributed/peer to peer nature of it does fit the requirements. Maybe give the user a choice to cancel parts of a search that are responding slowly (via ajaxy stuff).

  • Terry M says:

    Thanks for the positive post Nate. We really think we are getting somewhere with this. We initially looked at the google co-op possibility but we just weren’t getting back the information we required.

    Mike – pity you’re not still on the project since we could have done with your advice a few times recently. There will be partner time-out limits and caching involved which should help the usability of the final product you’ll be glad to hear. When implemented within WebQuests it will only search over the partners involved in that particular WebQuest. Creative Journeys may well have a ‘remove museum from search’ option as well.

  • Mike says:

    I should probably fess up – I had already created the http://www.museumcollections.org.uk stuff, pre-session… but the point is that we need to be able to out-perform the Google Coop in order to make any of this stuff worthwhile…!

    ps. I *could* have built it mid-session if I’d wanted to.. * cough * ;-)

  • Mike says:

    Just to add to the comment stream: 1) great to see OpenSearch being implemented; 2) getting *anything* done with a consortium of 9 museums is a frightening and gargantuan effort for which the V&A should be applauded; 3) that’s why I would have done it with a Google Enterprise; 4) please, please, make your final site and results available with a public API.

  • Re. ‘what if the search results page changes?’ – Maybe a microformats type of approach would work here. Ask contributors to use standardised class names in the HTML of their search results, then look for those classes when scraping the page.

    OpenSearch supports output of results as RSS or Atom too. That gives you some of the advantages of OAI, in that you can add your Dublin Core metadata to each item in the feed. We’re trying this now at the National Maritime Museum – publishing RSS feeds extended with terms from the PNDS application profile.

  • Nate Solas says:

    @Jim – That’s a really intriguing idea, to add DC metadata to the OpenSearch result. Having done the work of mapping our collection data to DC, this would be easy and really expand the usefulness of the approach.

    My only concern is that with intensive searching and browsing, each search comes back to our servers, whereas with OAI it would be running locally. I haven’t thought that through all the way yet; it might not even be a problem.

    I’m planning to set up a test install of the Delphi toolkit from Berkeley tomorrow to see what it thinks of our metadata. That might be one further argument for an OAI approach, so the semantics could be teased out of several collections at once.

  • Mia says:

    Good write-up!

    I’m in the middle of implementing an OAI repository so this is all really relevant (and my ramblings aren’t yet coherent). The repository is a requirement from an external funder, but I’m hoping to use it to get lots of cool stuff in place at the same time. I want it to also serve OpenSearch results, and to act as an index to every instance of our objects in various online projects. We don’t currently have a single catalogue search and the OAI repository is a chance to implement that in a human- and machine-queryable way. It also ties in with our push to create a common basic schema for all our object records, with extensions for objects with more metadata or related media and authority records. It’s a bit daunting but the possibilities are very cool.

    I’ve used OAI before (for Exploring 20th Century London) and it worked well for getting data from a range of collections and content management systems into a single repository that could then be used as the source for the actual website. It did mean an overhead in mapping fields, but we needed to do that anyway for the final website. Having a repository instead of using a federated search also meant we didn’t have to worry about slow response times from partner sites – especially smaller museums who already have a lower visibility in the project because they contribute fewer records.

    On another note, some of the discussions at MW2008 have really made me wonder about well-implemented single search fields vs advanced search. Advanced search might only be used by a small number of specialist researchers (and our research shows they always want that ability) but having well-defined fields means we can quickly build interfaces or mashups over those streams.

    FWIW, I once ended a paper with http://www.laughinglibrarian.com/images/CeilingLibrarian.jpg but it was probably too obscure a cat reference and everyone just looked at me funny. Kittens FTW.

  • Nate Solas says:

    @Mia – I agree about OAI keeping things fast for search since it’s all local. Also I think there’s a lot more “semantic” (or text-mined fakery) data that can be pulled out if all the metadata is in a central spot – relationships can be spotted that would be lost in a federated OpenSearch model. Still, a lot to be said for the ease of use in the latter.

    How are you describing the metadata in OAI? We’re using CDWAlite for our works of art, and a qualified DC for everything else, but it feels a bit like we’re cheating on some of the fields…

  • Mia says:

    @Nate – I’m hoping to push our metadata through a few schemas, because we have to send records to a site with a really limited/general DC simple schema (sorta exposed through their search at http://www.peoplesnetwork.gov.uk/discover/showAdvancedSearchPage.do) but also want to express object records through our core and various project schemas in qualified DC as well as providing GeoRSS or whatever where the data supports it.

    I should be able to give you a link in a few weeks, though the details will be a work in progress for a while longer. I’ll see if I can share our core schema too, though it’s particular to our collections and digitisation projects.

  • Nate Solas says:

    Looks like Google’s been reading this blog and is going to start trying to crawl the “Deep Web” – pages behind forms, or in this case: collection items behind search pages. This could start to solve a lot of issues museums are wrestling with…

  • Giv Parvaneh says:

    Hi Nate,

    This is a great post and there are some really interesting discussions here. I wanted to put in my 2 cents in regards to the federated search issue. I recently wrote an article on how one might attempt to efficiently go about doing a search across museums, “We have OpenSearch, now what?”. Some folks here may find it interesting/useful.