Rocio von Jungenfeld is introducing our final three presentations…
Robin Burgess’ Repository Song
I presented at Repository Fringe in 2011, giving my paper on take up and embedding for the Glasgow School of Art’s new repository, and on embedding their previous system in a better one. Back then it was the story so far and where we were… the conclusion was that we finally had a name… RADAR.
Two years on we will update you on where we are at the moment! They said never work with children and animals… watch out for miaowing from my cat! The song is called RADAR: The Repository Song.
As I can’t capture Robin’s song properly here there will be video…
It’s better than Filemaker… and easy to use… It’s built on EPrints. It gives a much better view of GSA’s research capability. And they can use the repository from home. The impacts we’ve been seeing are mainly to improve take up and use of research outputs… hopefully. Recommendations? Make sure you engage staff resources and work with their needs.
RADAR is a repository and thank you for welcoming us to the repository community!
Robin Rice – Edinburgh DataShare and RDM
I know some people are interested in Edinburgh’s DataShare and I thought I’d share some “warts and all” issues we’ve faced with putting data into repositories. DataShare is a free at the point of use data repository that started in 2008, back when no one was really thinking about reuse of academic research data in that way. We worked with repository communities and library communities. We worked with Oxford and Southampton on this. We have gone on to create our own Research Data Management Policy. It means people should deposit their data. We don’t say it has to be open access as a term – but it’s clear that people should make their data available – we are about encouraging sharing.
We had the policy in 2011, and now we have a steering group who help guide us and make sure it’s fit for purpose for academics. So this is a picture of the University RDM Roadmap. It’s the brickwork for everything we do. DataShare has to do with the data stewardship part of the roadmap. We have been using the DMP Online tool as part of this too. So the group have given us a challenge to look at tough test cases. So for instance Dr Nuno Feirrara came to us through MANTRA and wanted to encourage good practice and get students to share data in Clinical Psychology. But they have NHS data and supervisors, and a lot of the fieldwork is considered sensitive. Not in a legal sense, but the NHS people may not want a study out there in case it leads to a scandal. So whilst Nuno grapples with the politics we could use him as a usability test case – he wanted a simple process for his students to work through. So we’ve got a lot of usability results out of it. So the next two releases will be fixing that, hiding unneeded fields etc.
Another use case we had: Dr Bert Remijsen was sort of a perfect user – we did assisted deposit for him. He’s from Linguistics and English Language. His expectation was that he could upload a zip, have it magically unpacked, and explain the content. To get round that we did it for him. But we would like to make it that simple… He had already deposited in a Max Planck repository for lost languages too. So is duplication good or not? The user was happy with the download stats and referred colleagues to them.
And we’ve had users from Informatics – so Prof. Simon King from the Centre for Speech Technology Research – recording and annotating videos. They want the data. They have huge video files. They have specialist software to deposit. We’re still at the talking stage with them. There is an ongoing deposit… They want user registration and their own licenses in the headers of the files (a system they devised 10 years back). So there are special non-standard terms and conditions. How do we cope with that? But there are no checks on some of the licenses – and some would say requiring users to register violates academic freedom.
And another use case has been the Roslin Institute – they have lots of ‘omics data. We felt there were specialist repositories for this stuff… apparently not the case. They have loads of data created by machine: big files, getting bigger, very easy to generate. Should the repository be the place to share things? Or should they go it alone and figure out a storage solution? That’s still an open conversation. And they are interested in a push-pull relationship with the CRIS.
And finally the Fish4Knowledge project from Prof Bob Fisher. It’s an EU project in the Institute of Perception, Action and Behaviour (Informatics). But when that project ends do they just wipe the data? It’s observational data. The professor came up with a way of automatically detecting fish in video. He’s suggested keeping a 5% sample of the video data, but that would be 3TB. And that would swamp our database – a great challenge for us around big data.
And we’ve also done some work with the ECA. They have their own digital asset management needs, so to what extent should there be rich display to the users – or is that the responsibility of another service?
So issues arising from pilots include: usability and user education; encouraging users to document and future-proof; relationships between IRs and subject repositories, etc.
Q1 – Chris Adie) Many of those use cases are very specific. But at the same time one thing that comes across when managing data is that we need more use cases described to learn from across the sector. So to what extent could you write up or describe those use cases, showing how to manage different types of data, for the sector?
A1) I think that’s a good idea. I’d like to solve all the problems first.
A1 – Stuart) We keep thinking people must have done this already so a central collection would be great!
Stuart Lewis – ResourceSync and SWORD
Somehow I’ve managed to both start and end the day! So you all seem to know about SWORD (by show of hands) so I’ll focus on ResourceSync. JISC has been supporting this work. ResourceSync has been developed by the same people who developed OAI-PMH, but this time working with NISO/OAI with Sloan funding and building on the OAI-PMH experience.
It’s basically about ways to synchronise resources on the web – and those might be files, images, whatever. As well as OAI-PMH has been adopted and embedded in our community, it has its issues. So ResourceSync allows us to look for changes or updates needed in the repository – so if you archive a site you might just want to update rather than overwrite the data (e.g. add new blog posts rather than overwriting the whole history of posts). It’s an interoperability protocol – like SWORD – not a piece of software! Much like HTTP. You have to have a client and a server.
There are several different layers to the protocol: discovery; capability description – how does it do that?; baseline sync – grab everything; changelists – a way to gather only the latest updates; dumps – basically zips… ways to archive the repository quickly and efficiently.
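To make the changelist layer a little more concrete, here is a minimal sketch (not from the talk) of a client reading a ResourceSync-style change list and applying it to a local mirror. The sample XML is modelled on the Sitemap-based format the specification uses; the example.com URLs, the `mirror` dictionary and the helper names `parse_change_list` and `apply_changes` are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Sitemaps and by the ResourceSync extension elements.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Illustrative change list; the example.com resources are hypothetical.
CHANGE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-03T00:00:00Z"/>
  <url>
    <loc>http://example.com/res2.pdf</loc>
    <lastmod>2013-01-03T13:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>http://example.com/res3.tiff</loc>
    <lastmod>2013-01-03T18:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>"""


def parse_change_list(xml_text):
    """Return (loc, lastmod, change) tuples for each <url> entry."""
    root = ET.fromstring(xml_text)
    changes = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        changes.append((loc, lastmod, change))
    return changes


def apply_changes(mirror, changes):
    """Update a {loc: lastmod} mirror in place instead of re-harvesting."""
    for loc, lastmod, change in changes:
        if change in ("created", "updated"):
            mirror[loc] = lastmod  # a real client would fetch the resource here
        elif change == "deleted":
            mirror.pop(loc, None)
    return mirror


# Hypothetical local state before the sync.
mirror = {"http://example.com/res3.tiff": "2013-01-01T09:00:00Z"}
apply_changes(mirror, parse_change_list(CHANGE_LIST))
print(mirror)
```

A baseline sync would instead walk a full resource list and fetch everything; the change list lets the client touch only what is new, updated or deleted.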
So that’s ResourceSync. It’s about getting things out of repositories for reuse and discovery.
Now, who has heard of Sitemaps? Your repository will support them out of the box! They let Google etc. understand your website rather than crawling it. So we aren’t reinventing the wheel… we are making use of sitemaps, adding information about changes, about relationships, about what needs to be synced.
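As a rough illustration of the "sitemap plus extra information" idea (again not from the talk), the sketch below builds a sitemap-style resource list with one ResourceSync metadata element added. The function name `build_resource_list`, the item URLs and the timestamps are assumptions for the example, and a real server would publish more detail than this.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"   # plain Sitemap namespace
RS = "http://www.openarchives.org/rs/terms/"         # ResourceSync extension

# Serialise Sitemap elements unprefixed and ResourceSync elements as rs:...
ET.register_namespace("", SM)
ET.register_namespace("rs", RS)


def build_resource_list(resources, at):
    """Build a sitemap <urlset> with an rs:md element marking it as a resource list.

    resources: iterable of (loc, lastmod) pairs; at: snapshot time of the list.
    """
    urlset = ET.Element(f"{{{SM}}}urlset")
    ET.SubElement(urlset, f"{{{RS}}}md", {"capability": "resourcelist", "at": at})
    for loc, lastmod in resources:
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = loc
        ET.SubElement(url, f"{{{SM}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical repository items.
items = [
    ("http://example.com/item/1/paper.pdf", "2013-01-02T13:00:00Z"),
    ("http://example.com/item/2/data.csv", "2013-01-02T14:00:00Z"),
]
xml_text = build_resource_list(items, at="2013-01-03T09:00:00Z")
print(xml_text)
```

Because the result is still an ordinary sitemap, a crawler that knows nothing about ResourceSync can read the `loc` and `lastmod` entries and ignore the extra element.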
To sum this up… if you take nothing away… SWORD is putting stuff in repositories. ResourceSync is about what’s in the repository and what’s changed.
And if you feel brave, arXiv have a ResourceSync feed…
And here’s a 30 second demo. So one nice thing about the trials JISC has funded is that they are real repositories. So I will run 3 lines of code. We see the changes and can sync them. Then run another line of code to deposit all those changes! And we can see them update in realtime! Live mirroring of arXiv into DSpace.
Q1 – Pat MacSweeney) When are DSpace and EPrints going to have this?
A1) Well part of this work has been about how to do this in a fairly generic way. The DSpace trial has highlighted that DSpace timestamps an “item” but not “parts of the item”. We can see it’s updated but not which part has changed. And that’s highlighted that issue which needs resolving to make ResourceSync work. So there is a version for DSpace.
Q2) Is it so dependent on URLs that you can’t connect it to a desktop app? You can imagine that being useful…
A2) Call it a URI then. So that should be possible.