We are kicking off Day Two with short presentations:
Hydra – Chris Awre, University of Hull
Firstly, thank you to my colleague Tom Cramer at Stanford for some of these slides. Hydra started out in 2008 as a project between the University of Hull, the University of Virginia, Stanford University, Fedora Commons/DuraSpace and MediaShelf LLC, as we had identified a common need. The time-frame was 2008 to 2011 but it is now running indefinitely.
We had several fundamental assumptions. Firstly, that no single system can provide the full range of repository-based solutions for a given institution's needs, yet sustainable solutions require a common repository infrastructure. Secondly, that no single institution can resource the development of a full range of solutions on its own, yet each needs to tailor its solutions to its own local needs and circumstances.
So Hydra is a repository system which you can take, run and use, selecting what you need in the knowledge that all elements share a common infrastructure. But Hydra is also a community, and that community is key to sustainability because it encourages lots of input from lots of places. It is also a technical framework that can be applied to other solutions. And Hydra is open source.
The software we use is a Fedora repository with Solr for indexing. It uses Blacklight, adapted to repository content, as the interface. Everything is built in Ruby, as it is flexible and has excellent testing tools, and Ruby Gems are used as well.
Fedora can be complex in enabling its flexibility – so how can the system be enabled through simpler interfaces and interactions? Well the concept of Hydra is that there are many views onto a single body of materials (Hydra, one body, many heads). We now have well over 20 institutions using Hydra. Many are in the US but there are others around the world. Hull is by no means a large university and we have really benefited from being part of this project. LSE, Glasgow Caledonian and Oxford are also using Hydra.
Hydra allows you to manage ETDs, Books, Articles, Images, Audio-visual content, Research data, Maps and GIS and Documents. You can include any of those as a single body of content so that you are not building new systems for each. And that idea of different views allows you to filter through that single body of data.
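To make the "one body, many heads" idea concrete, here is a minimal Python sketch of several views filtering one shared body of content. It is only an illustration: the object fields, the `hull:` identifiers and the `view` function are hypothetical, and in Hydra itself this filtering is done by Blacklight over a Solr index of Fedora objects, not by code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryObject:
    """One item in the shared body of content (hypothetical fields)."""
    pid: str
    content_type: str  # e.g. "etd", "article", "dataset"
    title: str


# A single shared body of content, rather than one system per content type.
BODY = [
    RepositoryObject("hull:1", "etd", "A thesis"),
    RepositoryObject("hull:2", "article", "A journal article"),
    RepositoryObject("hull:3", "dataset", "Survey responses"),
    RepositoryObject("hull:4", "article", "Another article"),
]


def view(body, *content_types):
    """Return one 'head': a filtered view onto the same body of objects."""
    wanted = set(content_types)
    return [obj for obj in body if obj.content_type in wanted]


publications = view(BODY, "etd", "article")
research_data = view(BODY, "dataset")
print([o.pid for o in publications])   # ['hull:1', 'hull:2', 'hull:4']
print([o.pid for o in research_data])  # ['hull:3']
```

Each new "head" is just another filter over the same objects, so adding a view for, say, maps does not mean building a new system.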
We have four key capabilities:
- Support for any kind of record or metadata (as per Fedora)
- Object specific behaviors – whether books, images, music, etc.
- Tailored views
- Easy to enhance
Hydra@Hull includes many types of content – we have datasets, committee papers, student handbooks, articles, etc. We try not to overstretch ourselves but it's great to be able to accommodate others' needs. We are using version 6 with the Bootstrap interface (a Twitter interface toolkit now usable for other sites).
We have seven strategic Hydra priorities at the moment:
- Develop solution bundles
- Develop turnkey applications – make it even easier to set up and install
- Grow the Hydra vendor ecosystem – support matters and we have already started to see vendors come on board
- Codify a scalable training framework to fuel community growth – a session in Dublin recently, more coming up in Virginia soon
- Develop a documentation framework
- Ensure the technical framework allows for further enhancement and development – we will be “Gemifying” in September
- Reinforce and develop the Hydra community.
Q1) Can you say more about Bootstrap?
A1) It is a CSS library developed at Twitter. You can download it and either use the entire library (as we are) or take elements of it and apply them.
Q1) Mainly for responsiveness?
A1) Well yes, everything works on mobile. But it’s also about the freshness and flexibility of the design.
Q2 – Les) Can we have your OAI-PMH endpoint?
Q3) Repository owners have problems recruiting developers and keeping them engaged, Hydra doesn’t seem to have that issue, can you say why you think that is?
A3) I think it's partly the choice of Ruby. None of our developers had used it before, but all got up to speed rapidly; they enjoy that environment and they enjoy sharing ideas with each other. One reason it's potentially successful in the US in particular may be that US libraries take technical development really seriously. That can seem to be a struggle in the UK, and we will need to address that if repository development is seen as important.
Q4) Can you say a bit about how you have adapted for other purposes?
A4) A lot of our REF records are being output from the CRIS using Ruby scripts. In terms of data management, we are using Blacklight to enable searching and analysis of datasets.
Andrew Dorward and Pablo de Castro – UK RepositoryNet+
We have been involved over the last few years in UK RepNet, a project to build out the socio-technical infrastructure for shared repository services in the UK. The two-year project came to an end two days ago, so this is an ideal opportunity to reflect on what we have done. We have a round table later in which we will explore some of the issues, especially around CRISs. But just now we will share outcomes and lessons learned.
We have worked on various services and elements – some have gone from ideas through to services during the project, and some have been explored during the project. Our website gathers what we've done in one place. The project has two more years of funding from JISC, so service elements such as IRUS-UK, RoMEO, Juliet and the Repository Junction Broker will continue under the management of JISC. The website includes much more of our work and findings as well.
In terms of outcomes, we used an ITIL framework for transitioning projects into services, to bring all stakeholders together in a coherent framework. Underneath RepNet sits a series of components addressing aggregation and search; benchmarking and reporting; a registry of repositories; deposit tools; and metadata quality (tools for enhancing it). Then there is a gap analysis of where gaps in metadata could be filled through new initiatives or services.
We started the project with a market analysis – looking at where repository managers felt they were and where they would like to be. We looked at prototype projects and services and then moved those through to services. So one thing we created was a mapping of the CRIS / IR landscape. We found a mixture of usage and obviously over the last two years some of these have changed. We want to explore CRIS further later on.
Another key outcome of the project was stakeholder engagement activity. It is a complex diagram as there are many stakeholders. We have HEIs, service providers and vendors. We have JISC, we have RCUK and Wellcome, and we have component providers (e.g. EDINA, University of Nottingham, Mimas, etc.), and we have OpenAIRE/OpenAIRE+, COAR and euroCRIS. We also have ARMA, UKCoRR and RSP.
The STARS initiative was one of the main outcomes of RepNet in stakeholder engagement and exploring how the landscape analysis could apply to a single institution – in this case St Andrews. We explored running services on DSpace IR and/or on PURE CRIS.
Q1) I think IRUS has been one of the big successes of the last few years. But two things on there I'm curious about are the Repository Junction and metadata enhancement. When I think about that I think about the REF framework – put data in there and CrossRef comes back with matching metadata, and that's been very useful for tidying up my metadata.
A1) On IRUS: those who were in the Repository of the Future session yesterday will have heard Balviar mention that 35 institutions are signed up, but we'd like to get to 150. When we worked with St Andrews, IRUS was a fast install. There have been huge numbers of downloads; extrapolating across the UK, there will be huge traffic across the network. We are hugely excited by IRUS. That's analytics, but adding in bibliometrics and altmetrics makes it even more exciting.
Pablo: On the RJ Broker – the broker allows mediated full-text deposit, and as there is increasing demand to do that it will be really useful. We went through some implementation issues yesterday and will have more space to talk about that later on. Every repository platform and version has to be supported in order to use the push mechanism and SWORD across all repositories. It has two aspects: the core is working, and the additional work is being implemented.
Balviar: On the RJ Broker… it is in the test phase right now, being tested with Nature Publishing Group, EuPMC, Imperial and Oxford. In terms of what it can do, it supports one-to-many deposits – e.g. for multi-author papers. Both of those publishers are really on board.
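The one-to-many idea can be sketched as matching a single deposit's author affiliations against a registry of repositories and returning each target once. This is a hypothetical illustration, not the RJ Broker's actual matching logic or interface: the `Deposit` shape, the `REGISTRY` entries, the example.org endpoints and the `route` function are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Deposit:
    """A single incoming item, e.g. a multi-author paper (hypothetical shape)."""
    title: str
    affiliations: list  # one affiliation string per author


# Hypothetical registry of institutional repositories and their deposit endpoints.
REGISTRY = {
    "Imperial College London": "https://imperial.example.org/sword",
    "University of Oxford": "https://oxford.example.org/sword",
}


def route(deposit, registry):
    """Return each matching repository endpoint once, in first-seen order."""
    targets = []
    for affiliation in deposit.affiliations:
        endpoint = registry.get(affiliation)
        if endpoint is not None and endpoint not in targets:
            targets.append(endpoint)
    return targets


paper = Deposit(
    "A multi-author paper",
    ["Imperial College London", "University of Oxford",
     "Imperial College London", "Unregistered Institute"],
)
print(route(paper, REGISTRY))
# ['https://imperial.example.org/sword', 'https://oxford.example.org/sword']
```

Each returned endpoint would then receive its own copy of the deposit, for example over SWORD; that transfer step is deliberately left out of the sketch.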
Pablo: Finally I’d like to highlight the paper by William Nixon on APC funding – related to your question on the REF, including funding as metadata. This was published in the UKSG journal Serials earlier this week and we will talk more about that too.
Angus Whyte – The role of repositories in supporting RDM: lessons from the DCC engagements
I want to share some experiences, really about the role of repository managers in the wider institution. This follows on from other presentations in that it is about interoperability.
For those who are not aware, the Digital Curation Centre has been around for almost ten years. In latter years we have had a much greater focus on Research Data Management. Since 2011 we've had HEFCE funding to help institutions engage with and embed Research Data Management. Our work includes institutional engagement in both research-intensive and teaching-led institutions.
We have also had a background role in the JISC Managing Research Data programme, which has funded 25 infrastructure projects from 2009-2013. We have supported events and provided tools for the sector.
So we have a view of the development process of RDM. This process certainly isn't linear. Our role has focused on the earlier stages – helping institutions to develop policies and to carry out advocacy with research groups. We have tools that build on work carried out elsewhere: CARDIO (Collaborative Assessment of Research Data Infrastructure and Objectives) and DAF (Data Asset Framework). CARDIO is based on work by the Data Library and other institutions in 2006.
From our work with institutions, and from speaking with JISC, we came up with this view of the services that we have seen emerging over the last few years. This is a very high-level view, but it captures everything from early stages of technology through to establishing data catalogues with metadata assets. We can probably see repository roles sitting at the bottom (guidance and support) and on the right side of this diagram.
In terms of emerging services, there have been a number of excellent surveys published recently in the US and the UK (Cox and Pinfield 2013 as well as Corrall, Kennan and Afzal 2013). These give a good view of planned RDM services: very interesting views of what libraries in particular are planning to deliver in the next 2 to 3 years, and of how those plans are prioritised. There is a mixture of advice and liaison, and technical services, planned. There are interesting points in those priorities – there is still a lot to do to help libraries develop policy. And data citation advice comes low down the list: it is a priority for funders, but perhaps libraries see their role slightly differently here. What sits with the library, and what sits with the institution?
When we engage with institutions, repository managers get involved in very different things. If we compare Oxford Brookes with Edinburgh University – very different institutions – we see repository managers taking lead roles in steering groups to develop policy, to develop online guidance and to support data management planning. Oxford Brookes have been driven by EPSRC expectations and they are aware that the infrastructure isn't what they'd like it to be. They have done a lot in the last few years: there is data in the IR and they have a helpdesk, all without specific RDM staff. Edinburgh is a contrast. Edinburgh have been active in this area for many years, and Robin Rice has had a very active role in the steering group there. Theirs was one of the first UK data repositories, and the Data Library has been pivotal in RDM developments. They have actively involved social science librarians to help build awareness and activity, and they have led on RDM policy and training materials – particularly MANTRA, of course.
So, to sum up. In our experience repository managers are very active in kickstarting “softer” capabilities. Only a few universities have dedicated RDM staff as yet; the role tends to be carved out of existing academic liaison roles (as the surveys mentioned also indicate). It's fairly obvious that repositories already deal with computing services, research support and records managers, but what I hope we can discuss later is how these relationships work for research data, particularly in day-to-day work: continuity of process and data, and where new workflows come in.
Q1 – Andrew Dorward) You talked about data catalogues – is that a repository or registry of data repositories? Is there a common way of benchmarking metadata for different disciplines where data varies?
A1) Good question. Data catalogues are basically catalogues of what research data has been produced by the institution. Not everything will be recorded or deposited in the repository, if there is one. But generally we think institutions want a record of what has been deposited, whether with them or elsewhere. So that catalogue has to be a lowest common denominator to work across disciplines and contexts. Southampton have thought this through well with a three-level approach, allowing people to make some choices rather than shoehorning data into a format that is inappropriate for them.
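A lowest-common-denominator catalogue entry might look something like the Python sketch below: a small core of fields required for every dataset whatever the discipline, a pointer to wherever the data is actually held (in-house or elsewhere), and an open bag of optional discipline-specific extras. The field names and functions are assumptions for illustration only; this is not Southampton's three-level scheme, whose details were not given in the talk.

```python
from dataclasses import dataclass, field

# Hypothetical core fields every catalogue entry must carry, whatever the discipline.
CORE_FIELDS = ("title", "creator", "year", "location")


@dataclass
class CatalogueRecord:
    title: str
    creator: str
    year: int
    location: str  # URL or identifier of wherever the data is held
    extra: dict = field(default_factory=dict)  # optional discipline-specific metadata


def missing_core(raw):
    """List the core fields that are absent or empty in a submitted record."""
    return [name for name in CORE_FIELDS if raw.get(name) in (None, "")]


def to_record(raw):
    """Build a catalogue record, keeping any non-core fields as extras."""
    missing = missing_core(raw)
    if missing:
        raise ValueError(f"missing core fields: {missing}")
    extra = {k: v for k, v in raw.items() if k not in CORE_FIELDS}
    return CatalogueRecord(raw["title"], raw["creator"], int(raw["year"]),
                           raw["location"], extra)


rec = to_record({
    "title": "Household survey 2012",
    "creator": "A. Researcher",
    "year": "2013",
    "location": "https://data.example.org/survey-2012",
    "sampling_method": "stratified",
})
print(rec.year, rec.extra)  # 2013 {'sampling_method': 'stratified'}
```

Keeping the required core small is what lets one catalogue span disciplines, while the extras avoid forcing every dataset into a single schema.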
Q2) Question about the survey and whether you were shocked by the results. There are few institutions with dedicated RDM staff. Priorities for training and advice… will they hire people for those roles?
A2) People are trying to carve those roles out of existing roles, then hiring short-term (1-2 year) project managers to lead that work. Whether institutions make the EPSRC 2015 deadline or the 2014 REF will be interesting. Institutions have to build RDM into their planning to ensure they can get things going.
Q2) In terms of roles… is anyone here in the education space rather than repository management space?
A2) We know of a few but you could put them in a couple of taxis! Those institutions that have funded those roles have done so as they see it as a competitive advantage.
Comment – Kevin Ashley) Most of those surveys questioned libraries on RDM but not institutions. Libraries are important stakeholders in RDM but not the be all and end all. If one wants to understand what universities are doing with research data you have to ask universities not just libraries.
A3) Yes, I think that's reflected in the priorities.
Q4) In terms of the trust you mentioned… is that trust in the repository within the institution, or for the end users accessing the data?
A4) Firstly, researchers have to trust the repository, and funders require researchers to deposit data in appropriate jurisdictions. There is a gap at the moment in guidance on what is a good place to ask researchers to deposit their data, in terms of trust standards, seals of approval and ISO 16363. That standard has been established, but few repositories have yet been certified against it. How do you deal with Databib, which lists hundreds of repositories but with no guarantee of longevity? We probably need an RJ Broker for data… but there is more work to do to get there first.
And now for coffee followed by Round Tables and the parallel judging of the Developer Challenge!