Live Blog – Short Presentations (1)

OJS – Angela Laurens and Theo Andrew

OJS is Open Journal Systems, developed by the Public Knowledge Project (PKP), a group of North American universities. It is used worldwide, with around 7000 journals. Our first OJS journal was Concept, the Journal of Contemporary Community Education Practice Theory. We have various journals running on the system; some are student-led, some are academic- or researcher-led, but all are peer reviewed.

We held yesterday’s OJS workshop because there is a growing UK community. Since the Finch review there has been a lot more interest in the system. We and St Andrews have been using OJS since around 2009/10, but in the last year we’ve had many more inquiries. So we wanted a forum for the UK OJS community.

One key theme was resources. The software is free but running it has a real cost. Pittsburgh estimated 3.5 FTE of staff time for them; UoE reckoned at least 0.5 FTE. Either way, it’s significant time. According to the Finch Report we can expect to pay an average of £1750 to publish an article. UoE regularly receives invoices from Elsevier for $5000, so 8 articles cost the same as running OJS for us. It’s really good value for money and brings control back to universities and to academics themselves.

Another key theme for us was the learning curve. It can be steep, which means training up front, but after that journals are relatively self-supporting.

Managing expectations is important here. What does a free service include or exclude? Is it just the systems, or is it design, support, training, layout, policy? Who manages submissions? Institutions providing a service have documentation in place to standardise the service and to manage those expectations. Our keynote speaker Vanessa Gabler explained the substantial documentation that helps ensure expectations are managed. But quality matters too. Pittsburgh have a committee to approve new journals to ensure quality is maintained.

A few more key themes:

Licensing matters. CC-BY is recommended, as NC is too restrictive. Avoid heavy customisation – it is too difficult to maintain and manage. Similarly, a single installation is better than multiple installations. And there is real opportunity to engage students – as guest editors of journals etc. And Kevin Ashley spoke about the beneficial impact of having preservation as part of the routine process of publishing, as OJS enables.

Q&A

Q1) Will there be follow up in terms of meetings etc?

A1 – Angela) We hope so, yes. There was real appetite for that, for sharing expertise and experience, and for building a toolkit we can all use to avoid reinventing the wheel.

VIVO, Repositories and FigShare – Graham Triggs, Symplectic

I work on integrating our tools with repositories. VIVO is one system for integrating systems – it is a network of systems across the world. The whole system is based on Linked Open Data, all triple-based information, and data captured at the university. It’s not just publications but ontologies, events, professional activities, the people themselves, all linked together. That means there are SPARQL endpoints that can be queried and investigated, giving new ways to discover research and expertise across multiple universities. These systems work by harvesting different sources – from CSVs, from PubMed, from Scopus. Your repository is only a partial view of your research. You need to add all that other stuff in.
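To make the SPARQL point a little more concrete, here is a minimal Python sketch of how a query against a VIVO-style endpoint might be put together. The endpoint URL is hypothetical, and the query assumes people are typed as foaf:Person with an rdfs:label, which a given installation may or may not do. The sketch only builds the request URL; it does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: substitute the SPARQL endpoint of a real VIVO site.
ENDPOINT = "https://vivo.example.edu/sparql"


def people_query(limit=10):
    """Return a SPARQL query listing people and their display names.

    Assumes people are typed foaf:Person and labelled with rdfs:label.
    """
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?person ?name WHERE {\n"
        "  ?person a foaf:Person ;\n"
        "          rdfs:label ?name .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a GET request with a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, people_query(5))
```

The same query text could be sent to several universities' endpoints in turn, which is where the cross-institution discovery comes from.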

But this stuff is tricky: how do you disambiguate an individual across multiple data sources, and work out what all their research and resources are? One option is to use a CRIS like Symplectic Elements or PURE. Those research management systems make it easier for you to disambiguate and link to authors with precise connections. We’ve already talked at this conference about how you can integrate CRIS and repositories, taking metadata with you. But interestingly the CRIS also gives you information about what’s in the repository in a way that is exportable through an API. So if you harvest from a CRIS and put into VIVO you can actually take all of that information, with the links, into VIVO. For instance Duke University in America allows links right through into national networks, into the repository, from person or research to the article itself.

You can also pull data sets in from FigShare – and embed materials into other systems.

The STARS Shared Initiative – Pablo de Castro, UK RepositoryNet+ project and Jackie Proven, Repository Support Officer at St Andrews

STARS stands for St Andrews and RepNet SDLC; it is about delivering repository services in an advanced CRIS system set-up.

The UK RepositoryNet+ project is about building a socio-technical infrastructure for UK Repository Services. On our brand new website you can see the wide range of components that are part of this: IRUS-UK; RoMEO; JULIET, and the Repository Junction Broker. And we have the different strands that the RepNet is covering in terms of services. Aggregation, benchmarking, registries, preservation, etc.

The RepNet was originally conceived to build services on the UK Repository network. But as time has gone by there have been huge changes in terms of repositories and policy. We have found there are no longer so many stand alone IRs, it is now a much more complex and mixed system, with many CRIS systems, particularly at the research intensive institutions. So we have IR-only institutions, those with IR+CRIS, CRIS-only systems, IR+Symplectic, IR+RMS. There is quite a discussion to be had here – we have a round table tomorrow with an opportunity to discuss more.

Another finding from our survey of IR managers is that there is a wide range of support services presently available to IRs for hosting or maintaining repositories. Research@St Andrews, for instance, has a complex set-up: CRIS+IR and an external provider (in this case SDLC). So, over to Jackie.

You will see from this diagram how the repository fits together – how data comes into our repository and how those systems fit together. We have a PURE CRIS system – there is a portal as a front end that showcases our research. When full text goes into PURE it goes into our DSpace repository – and the link to that full text is included in the portal. That process is influenced and affected by the set-up in PURE.

Pablo: Our work programme for STARS included many refinements and enhancements, from enabling SWORD to use of IRUS. So we started with IRUS, a project for gathering COUNTER-compliant data from repositories. It is on 35 repositories so far and aiming for wider coverage.

Jackie: Installation of IRUS was really easy for us – particularly as Claire Knowles did that for us. We now have monthly stats that we can compare to our existing Google Analytics stats. There are some discrepancies and that has led to some really interesting discussions – we’ll talk more about that in our Round Table tomorrow.

We tested OpenAIRE. The CRIS is very tied up with the REF system, so we tested OpenAIRE in a test collection in the portal of EU-funded publications and added the relevant fields – there are some issues with validation to investigate.

Pablo: We tested RJ Broker in the data workflow. There were two options: to push or to pull the data with the CRIS.

Conclusions here. This was a really interesting opportunity to test the landscape we had been looking at.

Jackie: For us it has also been a real opportunity to build bridges with stakeholders and we want to keep those lines of communication open.

Q&A

Q1 – Balviar Notay at JISC) The RepNet project is a JISC funded project which ended yesterday but we are looking at delivering the components as shared services from this point onwards. Just to say that IRUS is not just article level data. If you are not already being harvested by IRUS then please get in touch. They are harvesting from over 30 repositories. We’ve seen over 3 million uses of materials in the repository, 2.5 million of articles. Thank you, really useful to hear your experience.

A1 – Pablo) Yes, IRUS is journal as well as article level data in a COUNTER-compliant way.

A1 – Jackie) For us it also links into our other systems, so it is great to have that comparison of stats.

It’s not open if no-one can find it – Chris Gutteridge, University of Southampton

I was working on repositories, now in open data, but I retain a fondness for repositories. I want to talk to you about data.ac.uk. It is not being run like Data.gov – they run on a shoestring and we run on a lot less than that (although we accidentally have some funding now!). We want to be a hub and ensure that data has a sensible generic domain for future proofing.

The big initiative we’ve been working on is equipment.data, the national portal for UK HE research equipment. So if you want a laser or a DNA sequencer, for instance, you can see where the nearest one is. It works through basic open data principles. The next stage for us is to export to our Southampton repository, so that we can use our equipment IDs in the papers. That way we can tie the equipment to the articles published – a great way to show the value of equipment.

So if we look at the Roslin Institute in Edinburgh. They have open data about their equipment. They only have ten or fifteen items of equipment. The old system was very slow and manual; it’s expensive to maintain those types of relationships. We are making this sustainable (not cheap! sustainable). Getting info from the website isn’t scalable. But if you go to the website you can see in the HTML a single line that shows where the equipment data is – and that is harvestable. It makes it clear who the institute are and where they are, followed by a series of assertions about their equipment data and where that data can be found. And I can, from that, autodiscover a CSV daily, automatically, and compile it into a database you can search here. And it works. And from equipment.data you can trace back to the institutions. It’s simple, it’s automated. And neater still, you could do the same thing for any type of organisation or aggregation of organisations – e.g. an Irish network. All the code is on GitHub – please steal the code.
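A rough Python sketch of that autodiscovery idea, not the actual equipment.data code: it looks for a single link tag in a page's HTML and parses a harvested CSV into rows. The rel value "openorg", the example URLs and the CSV column names are all assumptions made for illustration, and the in-between step of reading the institution's profile to locate the CSV is left out. Sample strings stand in for the fetched homepage and CSV, so nothing here touches the network.

```python
import csv
import io
from html.parser import HTMLParser

# Assumed convention: the homepage carries one <link rel="openorg" href="...">
# pointing at the organisation's machine-readable profile.
PROFILE_REL = "openorg"


class ProfileLinkFinder(HTMLParser):
    """Collect href values of <link> tags whose rel matches PROFILE_REL."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == PROFILE_REL and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def find_profile_links(html):
    """Return every profile link advertised in the given HTML."""
    finder = ProfileLinkFinder()
    finder.feed(html)
    return finder.hrefs


def parse_equipment_csv(text):
    """Parse harvested CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Stand-ins for a fetched homepage and a fetched CSV (hypothetical content).
sample_html = (
    '<html><head><link rel="openorg" '
    'href="https://www.example.ac.uk/profile.ttl"></head></html>'
)
sample_csv = (
    "id,name,contact\n"
    "E001,DNA sequencer,lab@example.ac.uk\n"
    "E002,Laser,optics@example.ac.uk\n"
)

links = find_profile_links(sample_html)
equipment = parse_equipment_csv(sample_csv)
```

A daily job would fetch each institution's homepage, follow the link, download the CSV it leads to and load the rows into the searchable database.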

Q&A

Q1 – Les Carr) Any chance of grants on the web being rolled in so that RCUK can make use of it?

A1) There’s a website called research gateway with data from grants. I’m keen to link up our definitions to theirs. So yes, but there are some delays with data providers there. And the other real value… the advantage of a data.ac.uk URI is that it survives any reorganisations, which will be really helpful for future proofing.

Parsimonious Preservation at the UK National Archives – Tim Gollins, Head of Digital Preservation

What do we do with preservation? We look after things for a long time! There is a concept that you should be able to describe things in two lines. Your challenge is what is the two line description for digital preservation?

So I want to talk to you about the threats to materials we want to preserve. Everyone talks about formats, about media obsolescence… what happens if you bring in your zip disks and floppy discs… you can’t read them. So… Rule 1 is get it off removable media!

So, just to say, my perspective is about the National Archives; you may have different data etc. But maybe you are costing yourself more than you need. So… file formats… the long tail. The National Archives’ own data admin… had over 1.2 million emails in our repository. Then 400+K documents. There are 130k Excel sheets. But there are 800 formats. So should I worry about the 800th format, in which we will have almost no data? There is an economic issue here: what’s worth saving?
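The long tail argument can be sketched in a few lines of Python: count files per format, sort by volume, and see how few formats cover almost everything. The three large counts below echo the figures quoted in the talk; the 100 rare formats with 10 files each are invented purely to give the tail a shape, and the function names are made up for this sketch.

```python
from collections import Counter


def format_coverage(counts):
    """Return (format, count, cumulative share) rows, most common first."""
    total = sum(counts.values())
    rows, running = [], 0
    for fmt, n in Counter(counts).most_common():
        running += n
        rows.append((fmt, n, running / total))
    return rows


def formats_needed(counts, target=0.99):
    """How many of the most common formats cover `target` of all files?"""
    for rank, (_, _, share) in enumerate(format_coverage(counts), start=1):
        if share >= target:
            return rank
    return len(counts)


# Head of the distribution from the talk; the tail is invented for illustration.
inventory = {"email": 1_200_000, "document": 400_000, "spreadsheet": 130_000}
inventory.update({f"rare-{i}": 10 for i in range(100)})

top = format_coverage(inventory)[:3]
needed = formats_needed(inventory, target=0.99)
```

With this made-up tail, the three biggest formats already cover more than 99% of the files, which is the economic point: most of the effort can go where most of the data is.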

And should I worry about that list in the next thirty years? Well, I don’t think most of those will be obsolete any time soon. There are millions of .doc or .xls files and someone will want to read them. They will ensure that material stays readable. It’s a very similar graph to the one shared by the British Library recently. So the National Archives doesn’t do file format conversions; we put things into a repository. And if people want to use them we give them to the customer – they can read these. When they can’t read them, we’ll worry about them then. How long will your repository survive?

From Pat in the crowd: Until the end of the REF! [cue much laughter]

Is your system 10 years old? 15 years old? Very few systems are operational that long. They get upgraded, replaced, improved, they get changed. So will any of the threats your data comes up against actually be a problem?

What’s the two line thing for digital preservation… ?

  1. “Know what you’ve got” – have a catalogue, know what’s there.
  2. “Keep the bits safe” – so that you can actually hand that record onwards reliably so that’s what has been preserved.
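Those two lines map quite naturally onto a catalogue plus fixity checks. Here is a minimal Python sketch under that reading; it is an illustration of the idea, not a description of the National Archives' own systems, and the function names and sample file are made up. It records a size and SHA-256 checksum for every file, then later reports anything missing or changed.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_catalogue(root):
    """'Know what you've got': map each relative path to size and checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): {"bytes": p.stat().st_size, "sha256": sha256_of(p)}
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, catalogue):
    """'Keep the bits safe': list files that are missing or no longer match."""
    root = Path(root)
    problems = []
    for rel, entry in catalogue.items():
        target = root / rel
        if not target.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(target) != entry["sha256"]:
            problems.append((rel, "changed"))
    return problems


# Demonstration on a throwaway directory with one made-up file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "minutes.doc").write_bytes(b"original bits")
    catalogue = build_catalogue(tmp)
    clean = verify(tmp, catalogue)
    (Path(tmp) / "minutes.doc").write_bytes(b"flipped bits")
    damaged = verify(tmp, catalogue)
```

Note that nothing here converts or migrates formats; it only records what exists and checks that the bits have not changed, which is the parsimonious part.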

Q&A

Q1 – Jacqui Taylor) In the NHS there are systems much older than 15 years

A1) Good point. They have a major preservation problem for that reason. I’m not aware of anyone addressing it.

Q1) I am!

A1) Great!

Q2 – Kevin) The idea of counting formats is a good way of cutting the problem. Sometimes a rare item is more important to preserve, and you put the effort in then.

A2) That is the point. You can invest relatively cheaply in seeing what’s there, and then you have the resource to identify what needs that expensive specialist curation and preservation. That stuff at the end of the long tail. But if you don’t know what you’ve got you can’t curate effectively – you attempt to curate everything to the same level.

I am Digital Education Manager and Service Manager at EDINA, a role I share with my colleague Lorna Campbell. I was previously Social Media Officer for EDINA working across all projects and services. I am interested in the opportunities within teaching and learning for film, video, sound and all forms of multimedia, as well as social media, crowdsourcing and related new technologies.

Repository Fringe 2013 is organised by:

The Digital Curation Centre

EDINA

The University of Edinburgh