New Media Initiatives Blog

Technology at the Walker Art Center

Part of: blogs.walkerart.org

 
by Nate Solas at 10:03 am 2008-04-16

Some great conversation happening in the comments of my writeup of the Search session at MW2008, and it made me remember something I wanted to bring up at the conference but forgot. Namely, the concept of “master metadata”, or the idea that there’s one authoritative version of the metadata describing an object.

This came up for me in the session the MFA and MIT did on sharing their data for a new subsite: they mentioned the data was being “augmented” on the final site, and that someday they’d be interested in getting this extra information back into their main repository.

The problem’s immediately obvious: with all of the proposed sharing and opening up of our data, presumably to allow others to weigh in on it and add their voice, there are often situations where institutions would like to have some of this new data. For instance, we’re building a new version of ArtsConnectEd and intend to allow museum educators to variously tag, comment on, and draw relationships between objects. This will almost certainly be “good data”, stuff that would be valuable to integrate in our internal collection database.

The question is, how? Once your data is available for sharing, and someone actually builds something good with it and enhances it, is there a way to get that new data back into the source? Is there / should there be a way to tag metadata as “original source” or “augmented”? Should we be asking anyone harvesting our data to push back their changes for us to audit and possibly include?

Anyone solved this? Seb, are you getting info back from Flickr Commons you can then add to your internal database? Phil / Jenna, any thoughts on how to get that extra data back?
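One way to picture the “original source” versus “augmented” idea is to have every piece of metadata carry a label saying where it came from, so the additions can be pulled out later for audit. What follows is only a minimal sketch under assumptions: the `Statement` record, its field names, and the `augmentations_for_audit` helper are all hypothetical and not taken from any existing collection system.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    ORIGINAL = "original source"  # came from the institution's own record
    AUGMENTED = "augmented"       # added downstream by a site that harvested the data


@dataclass(frozen=True)
class Statement:
    """One labelled piece of metadata about one object (hypothetical shape)."""
    object_id: str    # assumed identifier for a collection object
    field_name: str   # e.g. "title", "tag", "comment"
    value: str
    source: Source
    contributor: str  # who asserted this value


def augmentations_for_audit(statements, object_id):
    """Return only the downstream additions for one object, for staff review."""
    return [
        s for s in statements
        if s.object_id == object_id and s.source is Source.AUGMENTED
    ]


# Made-up sample data, for illustration only.
statements = [
    Statement("obj-1", "title", "Untitled", Source.ORIGINAL, "museum"),
    Statement("obj-1", "tag", "landscape", Source.AUGMENTED, "educator-42"),
    Statement("obj-2", "tag", "portrait", Source.AUGMENTED, "educator-7"),
]

for s in augmentations_for_audit(statements, "obj-1"):
    print(s.field_name, s.value, s.contributor)
```

With labels like these, the source institution could ask a harvesting site for just the augmented statements about its own objects and decide which ones, if any, to bring home.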

 

8 Comments

  1. Hi,

    You are on the right track, as far as I can tell. We need to plan for the handling of non-authoritative “categories” of metadata. I think we all agree on the potential for tags, comments, and other user-contributed data to add value - bolstering folksonomy, adding to scholarship, helping to create meaning, etc. But, I have not done the work of planning for the handling of this. This time constraint also keeps us from doing the important work of creating our own thesaurus, lexicon, or semantic framework - the kind of foundational requirement for using delphi you allude to.

    I’ll admit to hoping we will work together to build these blocks. I feel that such things are finally emerging as useful results of so many ongoing projects. Also, we need to pay close attention to the methods and tools that the big players, like Flickr, Yahoo!, Google, etc., are using.

    more later, but very cool thread, thanx,

    - Phil

    Comment by Phil Getchell — 4/16/2008 @ 10:41 am

  2. You probably need three labels for tags - institutional, user-generated, and machine-generated (for things like OpenCalais).

    There’s some work in the UK on this, I’ll try and track down a reference. Also, I’m sure I read that London’s Transport Museum are going to incorporate user-generated content back into their collections management system but I can’t find it again.

    Comment by Mia — 4/16/2008 @ 1:19 pm

  3. Hey Nate

    I’m not sure why we need to be concerned with the ‘right’ metadata. Surely ‘right’ is just a matter of who the user is - and now that we’ve moved defiantly away from JUST catering for scholarly users there are a lot of different categories of ‘right’.

    That’s been one of the joys of our implementation of OpenCalais - suddenly we have a whole lot of different metadata that IS ‘right’ but just not in the ways any of our museum staff - registrars, curators or even the web team - would have thought about our collection. This is because OC is applying a financial markets eye to our content - but this has great value in terms of allowing the browsing of our collection by company name for example.

    Likewise, our Commons on Flickr experience (http://www.flickr.com/photos/powerhouse_museum/) has been very much that Flickr users add more tags and different tags to the same content that has been available (and taggable) on our own site for a long time. And yes, we can import them back with the Flickr API.

    Our own tagging experience is one of peaceful coexistence . . . . user terms can coexist with formal museum classifications and enhance discovery and cross-search.

    Mia - on the LTM example I will be quite surprised if they get UGC at any significant volume. They have set the barriers to participation quite high with requirements for email addresses and user details etc. Remember that even tagging on the PHM collection is low volume - 15 million views in 2007, 5 thousand tags . . . - and that is without requiring ANY form of login.

    Seb

    Comment by Seb Chan — 4/19/2008 @ 2:35 am

    @Seb - great comment. I agree in the sense that we shouldn’t be judging the user-generated content for “rightness”, but my real concern was that the UGC is going to be built over a set of harvested data. The primary collection lives in a remote repository, and I know at some point they’re going to come to me and say “hey, can we bring some of the new data back into our main system?”

    Still, I think you’ve given me the answer with the Flickr example: I should just write an API that allows the originating museum to query the system for any UGC. Now to find the time… :)

    Comment by Nate Solas — 4/19/2008 @ 5:36 am

  5. Ok, I’ve found the reference to London’s Transport Museum taking UGC directly back into their collections management system in a post on the Museums Computer Group list: http://tinyurl.com/5emc65

    And I’ve blogged a bit about it at http://openobjects.blogspot.com/2008/04/museum-and-claytons-audience.html

    Comment by Mia — 4/19/2008 @ 2:19 pm

    Hi Nate, there has been some discussion about the Walker Art teens website on my blog. Do you have any more information about it somewhere here? I am particularly interested in the development and subsequent design of it.

    Comment by Lynda Kelly — 5/5/2008 @ 2:16 pm

  7. Hi - Sorry I’m a bit late to this party, but, I’ve been wondering about this idea too… Our experience in The Commons has been really interesting. I think the initial guess was that tags added by the Flickr Community would be able to be mined/harvested for instant value, but it turns out that the real gold is hidden in people’s conversations about what they see in the photographs.

    Conversations are very difficult for a computer to parse, but easy to read. I was chuffed to hear how the Library of Congress team built a little report that dumped all the comments chronologically into an Excel doc so they could be read. Turns out the LC has updated around 200 of their catalogues as a result of these conversations. So, rather than “defying” the Master Metadata, it is simply augmented as the photograph’s context is broadened. I love the idea that an institution will hold “The Golden Copy” of an object (and its metadata), but that there can be a network of “augmentation channels” that the institution may pick from, or simply link to on the web.

    That’s not to say that tags are useless though, but their real strength is in the Flickr context, nestled amongst all the other words in the natural language “lexicon” that has grown over the last four years. Now I find myself wondering how we might be able to make that available, the lexicon I mean. I would love to be able to compare this natural language lexicon against all the controlled vocabularies out there. I know which one I’d find easier to navigate! ;)

    Comment by George Oates — 6/6/2008 @ 3:29 pm

  8. Hi all:

    We’re having to deal with these issues increasingly for nzresearch.org.nz/ and other metadata aggregation websites that the National Library of New Zealand is working on.

    Most of our experience is with machine-augmented metadata. For example, we keep the (OAI-PMH) harvested DC “master” metadata for a record, plus our internally-generated “cleaned up” form of that metadata that we derive at harvest time, plus our “admin” metadata, which you might think of as “meta-metadata”, or metadata about the quality of the master metadata (used to feed back to repository admins and to create national metadata quality reports).

    Anyway, there’s a dependency here, because the two sets of derived metadata are based on the harvested “master” metadata and therefore have to be updated whenever the harvested metadata changes (in other words, when we harvest a new version of a record, we can delete all the metadata and start over).

    This is relatively simple because we know all the metadata is either harvested (or “master”) metadata or directly derived from the harvested metadata. However, our new prototypes throw user-contributed and admin-contributed metadata into the mix, and suddenly you’re in a situation where you want to update a record because the master metadata changes and you have to try and figure out exactly what the provenance of all the metadata is for a record so that you only replace the metadata that needs to be replaced. I’m still experimenting with different ways to do this.

    I hope that made sense.
    Gordon

    Comment by Gordon Paynter — 6/17/2008 @ 9:57 pm
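The re-harvest problem in the last comment can be sketched in a few lines if every entry on a record carries a provenance label. This is only an illustration under assumptions, not how nzresearch.org.nz or any other system actually does it: the four labels, the `Entry` shape, and the whitespace-trimming `derive` step are all hypothetical stand-ins.

```python
from dataclasses import dataclass

# Provenance labels, loosely following the categories in the comments above.
HARVESTED = "harvested"  # the "master" metadata from the source repository
DERIVED = "derived"      # cleaned-up or quality metadata computed from it
USER = "user"            # contributed by site users
ADMIN = "admin"          # contributed by hand by administrators

# Only these kinds are safe to discard when a new version is harvested.
REPLACEABLE = {HARVESTED, DERIVED}


@dataclass(frozen=True)
class Entry:
    field_name: str
    value: str
    provenance: str


def derive(harvested):
    """Stand-in for the harvest-time clean-up step (assumed rule: trim whitespace)."""
    return [Entry(e.field_name, e.value.strip(), DERIVED) for e in harvested]


def reharvest(record, new_fields):
    """Replace harvested and derived entries, keeping contributed ones untouched."""
    kept = [e for e in record if e.provenance not in REPLACEABLE]
    harvested = [Entry(name, value, HARVESTED) for name, value in new_fields.items()]
    return harvested + derive(harvested) + kept


# Made-up sample record, for illustration only.
record = [
    Entry("title", " Old title ", HARVESTED),
    Entry("title", "Old title", DERIVED),
    Entry("tag", "thesis", USER),
]

updated = reharvest(record, {"title": " New title "})
for e in updated:
    print(e.provenance, e.field_name, repr(e.value))
```

The design choice here is that provenance is recorded per entry rather than per record, so an update only has to filter on the label instead of reconstructing where each value came from after the fact.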
