Today we bring you a guest post reflecting on the experience of being a first timer at Repository Fringe. Our blogger is Richard – and we’ll let him introduce himself…
My name is Richard Wincewicz and I work at EDINA as a software engineer. My background is synthetic chemistry but three years ago I got involved in the Islandora project (http://islandora.ca) based on the east coast of Canada. A year ago I moved back to the UK and started my current position at EDINA.
This was my first Repository Fringe and I was surprised at how comfortable it felt. I’ve not been in the repository field for that long but I’ve got to know some people and this was a great opportunity to catch up with them. Being relatively new also meant that there were plenty of people there that I’d not met before and so I spent a lot of time making new connections with people. The sessions were fairly informal and plenty of time was allowed between them to let people engage and share their thoughts and ideas. Even so there were occasions where Nicola had to heard a number of stragglers (me included) into the next session because the impromptu discussions were so engrossing that we’d lost track of time.
When I signed up I’d indicated that I wanted to take part in the developer challenge. At first I looked at the topic of ‘preservation’ and thought, “That’s a broad topic, I’m sure I can come up with a useful idea over the next couple of weeks.” On my way home on the evening before my entry had to be submitted I finally came up with an idea that was potentially useful and feasible given that I only had a night to get it done (as well as sleep and eat).
Over the previous couple of days I had heard a few people mention the lack of metadata provided when given content to store, alongside the lack of willingness of providers to change. My idea was to create a web service that would take any file that you wanted to throw at it and provide as much metadata as it could glean from the file in a useable form. There are plenty of tools around that will extract the embedded metadata in a file, the Apache Tika project (http://tika.apache.org/) being one of the more comprehensive ones, and my application was basically a front end for this.
The added value that I provided was to return the metadata in Dublin Core. This meant that this web service could be integrated into a repository workflow with very little effort. My plan was to expand the number of metadata schemas available to make it easier for the repository to incorporate the output directly but I sadly ran out of time. One thing that became clear while testing my code was that often the quality of the embedded metadata was poor. After discussing my project with Chris Gutteridge I decided that mining the document for relevant information would give richer metadata but require a lot more time to produce anything even remotely functional.
In the end I spent around 3 hours on my entry but I was proud that I had something that not only worked but didn’t fail horribly when I demoed it live to a roomful of people.
I enjoyed my first Repository Fringe immensely. I got a huge amount out of it both in terms of learning and networking. I plan to attend next year and hopefully find a couple more hours to work on my developer challenge entry.
Leave a Reply