Repository Fringe 2013: Reports from the Blogosphere

As a Friday treat we thought we would share with you all of the reports on Repository Fringe 2013 that have been appearing across the blogosphere. These are brilliant personal records of the event and the workshops. They include some fantastic reflections, links to additional materials, and an opportunity to experience the event from someone else’s perspective.

We’ve decided to order these by category so take a wee browse and enjoy:

The Workshops
The Round Table Sessions

Planes, Trains and Automobiles

Developer Challenge

Pecha Kuchas and Presentations

Reflections and Feedback on Repository Fringe 2013

  • Gaz Johnson has provided a useful summary of Repository Fringe 2013, including some really useful feedback and suggestions for future events.
  • Richard Wincewicz has blogged about his first experience of the event – and of taking part in the Developer Challenge – in his guest post: My first Repository Fringe.
  • Lynette Summers of Cardiff Metropolitan University has written a great summary of the event for the Wales Higher Education Libraries Forum (WHELF) blog: Repository Fringe 2013.
  • Chris Awre has provided his reflections across the whole event on the Hull Information Management Blog: Edinburgh? Fringe? Must be a repository conference.

Still to come…  

If you have written a post on RepoFringe we would be more than happy to add it here and to our forthcoming summary post. Please just leave us a comment here or email repofringe@gmail.com.
Posted in Announcements, Guest Blogs

Taming the beast – working with Hydra together

Another guest blog post for you from Chris Awre, University of Hull:

It seems at first slightly self-rewarding to be able to use a conference blog to highlight further some of the points I made in my conference presentation.  I’m sure it is.  Nevertheless, I’d like to do so in the context of the workshop and roundtable I led on ‘Getting to the Repository of the Future’ and the conference as a whole, which I hope places any platform plug in its proper place.

To re-cap, Hydra is a project initiated in 2008 to create a flexible framework that could be applied to a variety of repository needs and solutions.  Hydra today is a repository solution that can be implemented (based around Fedora), a technical framework that can be adapted to meet specific local needs, and, of equal if not greater importance, a community of partners and users who wish to work together to create the full range of solutions that may be required.  There are, as of August 2013, 19 current partners, with a number of others considering joining: in Europe the University of Hull is joined by LSE and the Royal Library of Denmark as partners, with use also taking place at Glasgow Caledonian, Oxford, Trinity College Dublin (for the Digital Repository of Ireland), and the Theatre Museum of Barcelona.  Not a large number as it stands, perhaps, but each exploiting what Hydra can provide to meet varied needs, sharing experiences and ideas, and demonstrating how a flexible platform can be adapted.

I’d like to pick out three main themes:

  • What is Hydra?

In the Repository of the Future workshop one of the main points raised was about clarifying the purpose of a repository.  This allows it to be situated in a broader institutional context without necessarily competing with other systems.  In doing so, it suggests that repositories should focus their activity rather than suffer mission creep and dilute their core offering.  I was conscious of this as I described Hydra in my presentation as being able to manage any digital content the University of Hull wished us to.  Contradiction?  On one level, yes, and I am all too well aware of the need to clarify what the repository is actually doing so as to strengthen the message we give out: I am defining our repository service more succinctly for this reason, for the University and for library staff.  But that doesn’t mean the repository infrastructure shouldn’t be capable of managing different types of content so that when a use case arises the repository can offer that capability to address it.  Clarifying our repository’s purpose is thus emphasising that it is a service capable of managing structured digital content of all sorts, with foci around specific known collections.  Other Hydra partners have focused their developments on more specific use cases (e.g., Avalon for multimedia, or Sufia for self-deposit), albeit recognising that Hydra provides them with the wherewithal to expand this if they need to.  And if we can share the capability between us as part of a community, then we can expand functionality and purpose as we need to.

  • Repository as infrastructure

I mentioned repository infrastructure in the last paragraph.  A challenge I threw out to the workshop, and throw out here, is to go into an institutional IT department and ask if the repository is infrastructure or an application.  These are treated very differently, with the former often given more weight than the latter, I would argue.  I would also suggest (and I’d welcome feedback on this) that repositories are more often considered an application.  However, if we are to take the management of digital collections seriously then they need to be treated as infrastructure, and the purpose of a repository built up from there.  A lot of the thinking behind Hydra is based on the repository underpinning other services through making content available in flexible ways to allow it to be used as appropriate.  Someone at the workshop referred to a repository as a ‘lake of content’.  Whatever the scope and purpose of a repository, managing that lake is an infrastructural role akin to managing the water supply network as opposed to focusing on the bathroom fittings.

  • Technical support

Key to Hydra’s evolution has been the dedication of many software developers to contribute from the various institutions they are employed by – a classic open source model in many ways.  I was asked following my presentation how Hydra had been successful in getting such commitment.  One part of the answer was the one I gave, that the software choice, Ruby on Rails, had proved very amenable to agile development and frequent input, and that the developers liked using it.  Another is the further point I made, that the US libraries can sustain such projects as Hydra because they recognise the value of technology to their libraries, and are prepared in many cases to back that up with specific staffing resource.  Certainly this is most evident at the larger institutions, but it goes beyond this as well: not for nothing is there a technically-oriented national Digital Library Federation through which digital library initiatives can be showcased and shared, and the developer-focused Code4Lib community.  Developer staffing within libraries in the UK is there in some cases, but is not widespread.  If we consider repositories as being part of a library’s future, do we need the technical commitment to ensure they can do the job they need to?  At Hull we rely on IT department staffing, as many do.  Perfectly adequate for managing an application, but is it an indication of real commitment?  Where it is not feasible to have local technical staff, is there a model that supports dedicated developer input as part of a collaboration?  Of course, even with dedicated technical resource it may not be feasible to do everything alone – hence the Hydra model of doing things together that partners the size of Stanford and Virginia continue to value.

<setting out stall>

Of course, at the University of Hull we view Hydra as being a route down which we can get to the repository of the future.  It provides us with the infrastructure we need to establish our repository’s purpose, but adapt and grow from this as the University requires.  It also allows us to say ‘yes’ when we are asked about the ability to manage different content, even if there may be associated staffing resource issues that need resolving.  We think this will stand us in good stead moving forward.  Hydra won’t necessarily be right for others as a technology, but I hope that the community aspects of working together technically can be adapted to suit regardless of technical platform.  If interested in pursuing more about Hydra as a technical solution, though, let me know 😉

</setting out stall>

Posted in Guest Blogs

My Summary of Thursday 1st August

Today we bring you a rather belated live blog (entirely your Repository Fringe blog editor’s fault) from guest blogger Valerie McCutcheon, Research Information Manager at the University.  She is part of a team that provides support for managing a wide range of activity including datasets, publications, and other research outputs. Valerie blogs regularly at Cerif 4 Datasets.

Here is my brief summary of Repository Fringe Thursday 1st August.

Met some old friends and some great new ones and look forward to exploring several areas further including:

  • Open data – Are there good case studies out there from UK higher education institute based researchers that might illustrate to our researchers some potential benefits to investing time in making data openly available?  Had a good chat with Jacqui Taylor over lunch and hope we can follow this up.
  • Open Journal Systems, OpenAIRE compliance, and Repository Junction Broker – all sound like things we ought to progress and are on our list – need to see if we can find more time to investigate
  • Concerns over quality of data put out there e.g. by Gateway to Research – I will follow up with Chris
  • I wondered if ResourceSync might be a useful option, or at least a concept to emulate, to address synchronisation of data with RCUK outputs systems – I will follow this up with Stuart Lewis

Overall the day exceeded my expectations and I got a lot out of it – thank you!

 

Posted in Guest Blogs, LiveBlog

My First Repository Fringe

Today we bring you a guest post reflecting on the experience of being a first timer at Repository Fringe. Our blogger is Richard – and we’ll let him introduce himself…

My name is Richard Wincewicz and I work at EDINA as a software engineer. My background is synthetic chemistry but three years ago I got involved in the Islandora project (http://islandora.ca) based on the east coast of Canada. A year ago I moved back to the UK and started my current position at EDINA.

First impressions

This was my first Repository Fringe and I was surprised at how comfortable it felt. I’ve not been in the repository field for that long but I’ve got to know some people and this was a great opportunity to catch up with them. Being relatively new also meant that there were plenty of people there that I’d not met before and so I spent a lot of time making new connections with people. The sessions were fairly informal and plenty of time was allowed between them to let people engage and share their thoughts and ideas. Even so there were occasions where Nicola had to herd a number of stragglers (me included) into the next session because the impromptu discussions were so engrossing that we’d lost track of time.

Developer challenge

When I signed up I’d indicated that I wanted to take part in the developer challenge. At first I looked at the topic of ‘preservation’ and thought, “That’s a broad topic, I’m sure I can come up with a useful idea over the next couple of weeks.” On my way home on the evening before my entry had to be submitted I finally came up with an idea that was potentially useful and feasible given that I only had a night to get it done (as well as sleep and eat).

Over the previous couple of days I had heard a few people mention the lack of metadata provided when given content to store, alongside the lack of willingness of providers to change. My idea was to create a web service that would take any file that you wanted to throw at it and provide as much metadata as it could glean from the file in a useable form. There are plenty of tools around that will extract the embedded metadata in a file, the Apache Tika project (http://tika.apache.org/) being one of the more comprehensive ones, and my application was basically a front end for this.

The added value that I provided was to return the metadata in Dublin Core. This meant that this web service could be integrated into a repository workflow with very little effort. My plan was to expand the number of metadata schemas available to make it easier for the repository to incorporate the output directly but I sadly ran out of time. One thing that became clear while testing my code was that often the quality of the embedded metadata was poor. After discussing my project with Chris Gutteridge I decided that mining the document for relevant information would give richer metadata but require a lot more time to produce anything even remotely functional.
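To give a flavour of the mapping step described above, here is a minimal sketch in Python. It is not the code from the challenge entry: the input is assumed to be a plain dictionary of embedded metadata of the kind an extractor such as Apache Tika might report, and the key names in KEY_TO_DC, the function name to_dublin_core and the wrapping metadata element are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from embedded-metadata keys to Dublin Core elements.
# Real extractors report many more keys, with varying names; this table is
# an assumption for illustration only.
KEY_TO_DC = {
    "title": "title",
    "author": "creator",
    "creator": "creator",
    "created": "date",
    "content-type": "format",
    "language": "language",
    "keywords": "subject",
    "description": "description",
}


def to_dublin_core(raw):
    """Return a Dublin Core XML string built from a dict of raw metadata."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for key, value in raw.items():
        element = KEY_TO_DC.get(key.strip().lower())
        text = str(value).strip()
        if element is None or not text:
            continue  # skip unmapped keys and empty values
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up example input; unmapped ("Pages") and empty ("Keywords")
    # entries are dropped rather than guessed at.
    extracted = {
        "Title": "Annual report",
        "Author": "A. Writer",
        "Pages": 12,
        "Keywords": "",
    }
    print(to_dublin_core(extracted))
```

Keeping the mapping in a single table means that supporting a further metadata schema would mostly be a matter of adding another table, though poor embedded metadata going in will still mean poor Dublin Core coming out.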

In the end I spent around 3 hours on my entry but I was proud that I had something that not only worked but didn’t fail horribly when I demoed it live to a roomful of people.

Summary

I enjoyed my first Repository Fringe immensely. I got a huge amount out of it both in terms of learning and networking. I plan to attend next year and hopefully find a couple more hours to work on my developer challenge entry.

Posted in Guest Blogs

(Open) Heaven is a Place on Earth – or How to Get to Utopia Without Really Trying

Today we bring you a guest post from Gareth J Johnson (@llordllama) on the Open Access and Academia Round Table led by Gareth and Dominic Tate on Thursday 1st August. Gareth is a former repository manager and is currently working towards a PhD that is examining issues of culture, influence and power related to open scholarship within UK academia at Nottingham Trent University.

In all the hubbub and hullabaloo of the Repository Fringe about wondrous technological solutions and efforts to bring us to the dawn of a new age of openness in scholarship, we thought it would be worth spending some time asking the question “So just what would the utopian end point of open access within academia be?”  It was, I think you can agree, a fairly large question to tackle and one that I don’t think we’ll claim we made conclusive headway in during the 90 minutes.  However, as an exercise in attempting to get everyone in the room to step back for a moment from concerns about the REF and having to meet senior institutional management’s expectations, and to consider what the end point of open access would ideally be, I think it was a reasonable success.

Participants in the Open Access and Academia Round Table session

Brief introductions from those present revealed a constituency comprising mostly repository workers, with a smattering of more technical staff and a publisher or two, which would likely bias the results of the discussions in a certain direction.  We started with the precept that the current OA situation in academia couldn’t be perfect, given people’s interest in attending the session and conference.  And at this point asked the first key question:

  • What could or would the utopian open access situation in academia look like?
  • What is involved?
  • Who is involved?

Early suggestions included ensuring that OA was simply built into academics’ natural practice and ensuring that openness in scholarship wasn’t siloed into simply research papers but embraced data, education, sculptures and other expressions of scholarship too.  At this point everyone was broken into small groups to discuss these issues.

Participants in the Open Access and Academia Round Table session

A broad range of ideas came back from the groups, some of which it was noted are potentially mutually interdependent or diametrically opposed.  But as we’d said at the start, this was a utopian view where not everything could or would be achieved.  Aspirations ranged from the holistic to the specific, with desires for open licences and no embargo periods, through transparency for Gold OA pricing, to XML over PDF as the standard format.  Interestingly given the current UK situation and prevalence of Gold OA, there was some considerable desire for transformation of the scholarly dissemination environment too, with calls for open peer review or for institutions to take over the management of the same.

Overall, though, the discourse from the groups seemed to suggest a sense that OA should become the norm for academia, that it should be so regular and normal as to almost be engaged with without comment.  Embedded and invisible in this way, calls for research community engagement would likely find near total compliance.

We asked the groups next to consider which of these activities were in their eyes the most significant, and then to go back and think more about the second of our key questions in relation to it.

  • How could or would this be achieved?
  • What needs to change?
  • What needs to stay the same?
  • What needs to happen next?

For the record none of the groups picked the same key task (interesting…) which meant we had 5 separate areas of OA utopia to be worked on as follows:

  1. Changing the culture
  2. OA achieving invisibility
  3. Institutional management of peer review
  4. Transparent pricing for all OA
  5. Embedding OA in the research process

What follows are some of the main points that came out:

Culture

  • Considering the needs of different stakeholders – students and academics – in scholarly discourse and access.
  • Same funding and policies for open access for all disciplines.
  • Students to have a more embedded understanding of where their research information comes from – thus not just library led training session on information skills.
  • Institutions to do their own publication of monographs and journals, thus everyone would have the same level of understanding.
  • OA league tables rather than traditional measures of excellence for recruiting new students
  • VCs need to be on board with OA as it should impact on every decision made at the university.
  • Changing internal promotion for academics through greater recognition of OA outputs.
  • HEFCE to fund universities equally based on OA and not funding universities based on traditional institutional outputs.
Participants in the Open Access and Academia Round Table

Invisibility

  • Repository Junction Broker making harvesting easier from a centralised point.
  • Capturing multiple versions (à la Jack Kerouac’s On the Road, which existed in multiple forms) as some scholars want to interact with and use earlier/later versions of research.
  • Open peer review could support OA.  Noted the Nature experiment where this was tried and hadn’t really been taken up by the academic community.
  • Issues about who does this peer review and whether the age/generation of academics impacts on this – would Generation Z academics fit more naturally into an open scholarly culture than Gen X and Baby Boomers?
  • Are there penalties for failure to comply with the RCUK mandate?

Discussion

Suggestion from the other groups that future generations might not want to reach the same goals for OA as the current movement members – might this mean a shift in a hitherto unexpected direction?  Whether students should be consulted about how and where OA should go was another thought.

Managing Peer Review

  • Collaborative approach to be taken within disciplines with champions nominated by institutions to take this forward.
  • Academics and editors need to continue to maintain their involvement in peer review, to retain rigour and quality
  • The way that research outputs are evaluated needs to be revised and changed, seeing that research has an impact beyond academia and that simple citation counts are insufficient judging criteria of excellence alone.  Thus embracing of impact evidence, alt-metrics and a rethink of the whole qualitative vs quantitative value of research outputs is needed.

Discussion

One discussion point was that Social Sciences/Humanities work less with citation counts, and that this would help them to be viewed in an “as valuable as” science way by senior faculty and external auditors.  There was also discussion of how the time it takes for non-STEM subjects to become recognised as having achieved impact is much longer; a counterpoint is that some pharmaceutical research also only becomes recognised as significant many years after it has been done.

Issues around the ability of scholarly dissemination to transform and evolve through the auspices of OA were examined as well.  In particular a point was raised that methods and routes of communication have evolved considerably in the past couple of decades and yet dissemination of scholarship has not kept pace, a point near to this author’s heart in his own current researches.  It was suggested that researchers still function within a print mentality in a digital world, and that this is perpetuated by the way impact is currently calculated.

Dominic Tate with participants in the Open Access and Academia Round Table

Transparent Pricing

  • A transparent pricing system for OA is very urgent.  The past is an opaque system whereby the library spent money but academics were unaware of the levels.  Right now is a golden (no pun intended) opportunity to make the cost of scholarly dissemination transparent, but it seems there continues to be a danger that we will continue to operate in an opaque system.
  • Universities should make public how much funding they have for open access, and exactly what the APCs are for each of the publishers that are available for people to publish with.
  • JISC Collections should do this as well, with their memberships to say “This is exactly how much the average APC will be”.
  • A nationally coordinated database with this information in, so it would be possible to see who has the best deals with publishers in this respect.  It was suggested that this is a role for libraries, which would be the advisory point within the university for publishing in OA, or publishing full stop.
  • Universities should recommend average APCs for each discipline, so it would be possible for authors to see where they were paying over the odds.  And where academics went over the limit, they would have to make a case justifying this – thus allowing other options to be presented to them as well, rather than simply publishing irrespective of cost.
  • Fears over burning through APC funds at some institutions too quickly.  If universities came in under-costs, then this APC funding could be released back to the research funding streams.

Discussion

Some universities present admitted they were going to be as open as they could be about their funding received and expended, although the level of granularity would vary.  It was suggested that the RCUK would not be as transparent as individual universities would be in terms of funding levels.  The problem though was that the value quoted and the value paid for APCs could vary due to the time between submission and invoice, and fluctuations in publisher policy and currency exchange.  It was highlighted that some publishing work-flows drawn up by some universities locked academics, as a matter of policy, into publishing via more expensive gold routes rather than cheaper gold or green options.  However, other universities countered this with a policy that went down the route of achieving funder policy satisfaction without necessarily taking the “easy” APC route.

Participants in the Open Access and Academia Round Table.

Embedding OA

  • OA should pervade academic thinking at all stages – from teaching 1st Year Undergrads, or information literacy training.  Awareness raising and education about scholarly dissemination should be embraced at every opportunity.
  • Getting the finance and research office staff educated about deals and options is key.  Researchers might only make the odd publication, but these people deal with the wider institutional publishing framework.  If they know about the options they might be able to help steer academics down an OA route that might otherwise be missed.
  • Better terminology – green and gold were cited as terms that need to be got rid of.  Not helped by publishers who, it was suggested, espoused different views or terms of what OA is, which added to the confusion of academics.
  • FOI, it was suggested, helps this process, as researchers might have to consider clarity in their work in advance of potential openness down the line.
  • Getting independent researchers, those based outside of universities, aware and involved in OA was important.  Currently we lack a critical mass of OA contents, but in time this could be reached and help them to become aware of it.
  • Call for academia to lobby for clarity from publishers and funders, and to attempt to use our influence on academics to help standardise the way they talk about OA.

The Round Table whiteboard with discussion topics and key points listed.

By the end had we reached utopia?  Sadly no, if anything I think we’d underscored the long way there is to go to achieve the final evolution of open scholarly discourse.  There are a lot of issues, but at least within the room there was a collegiality and positivity about working towards perhaps achieving some of the goals.  Were we to revisit this workshop next fringe – it would be indeed interesting post-REF to see what steps towards achieving some of these hopes, dreams and ideas had actually been made within the UK.

My thanks to our wonderful delegates for their thoughts, the Fringe organisers for giving us the space to run this session, and last but most certainly not least my co-workshop chair Dominic Tate, without whom none of this would have been possible.

Posted in Guest Blogs

Getting to the Repository of the Future – reflections

A week after the Getting to the Repository of the Future workshop, it is useful to reflect on what thoughts emerged from the event that we can take forward.  The workshop itself was very helpfully blogged by resident RepoFringe bloggers Rocio and Nancy, which captures many of the points raised.  There was also a follow-on round table discussion held the day after, from which additional ideas and suggestions emerged.  All contributions are being written up into a document to inform Jisc in their planning, but will also be openly reflected back to inform conversations back home within institutions and elsewhere.

By way of continuing the discussion online, I reflect here my own initial thoughts and conclusions from the discussion.  Feedback very welcome.

  • Repositories will become capable of dealing with content types according to their needs

Repositories have been established to manage many different types of material, with probably the largest focus being around research articles.  Nonetheless, with digital content collections of all sorts growing and needing better management, can repositories cope with this?  Discussion suggested that we have a technology available to us that can be used for a variety of use cases, and so can usefully be exploited in this way.  In doing so, though, it was recognised that we need to better understand what it means to manage different types of material so this exploitation can take place effectively and add to the value of the content.  As to type of repository, it should be recognised where materials benefit from being managed through specific repositories rather than a local repository, e.g., managing software code through GitHub or BitBucket, or holding datasets in specific data centres.  Overall message emerging: understand more how to deal with different types of content, be realistic about where they are best managed as part of this.

  • Repositories will move beyond being a store of PDFs to enable re-use to a greater extent

It was one very specific comment at the workshop that highlighted that many repositories are simply a store of PDF files (there was also a debate about whether repositories holding metadata are real repositories, but that’s another discussion).  PDF files can be re-usable if generated in the right way (i.e., are not just page images), but are never ideal.  Part of the added value that repositories can bring is facilitating re-use, and enabling the benefits that come from this.  To do this we need to move to a position where we can effectively either store non-PDF versions instead or alongside, or identify ways of storing non-PDF files by default.  The view expressed was that if we don’t address this we risk our repositories becoming silos of content with limited use.

  • Repositories will benefit greatly from linked data, but we need persistent identifiers to be better established and standardised

There is a chicken and egg aspect to this, as there is with a lot of linked data activity.  Content is exposed as linked data, but is not then consumed as much as might be anticipated, in part because the linked data doesn’t use recognised standards, and in particular standard identifiers, in its expression.  These weren’t used because there wasn’t enough activity within the community to inform a standard to use, or there are a number of different standards but a lack of an authoritative one.  One example is a standard list of organisational identifiers: there are a few in existence, but a need to bring these together, a task that Jisc is currently investigating.  Repositories could make use of linked data if the standards existed, but where is the impetus to create them?  An opposing view to this is that the standards pretty much do exist, and it is more a matter of raising awareness of the options and opportunities in how these can be effectively used within repositories, e.g., ORCID, which is now starting to gain traction, or the Library of Congress subject headings.  Whichever view you take, linked data screams ‘potential’, and there was little doubt that it will become part of the repository landscape in a far greater way than it does today.

  • Repositories will focus on holding material and preserving it, leaving all other functions to services built around the repository / Repositories will become invisibly integrated within user-facing services

At first sight this theme appears to suggest that we reduce a repository, which seems to contradict the benefits that the previous statements suggest.  Discussion at the workshop, though, saw this more as getting repositories to play to their strengths; we need somewhere to store and preserve digital ‘stuff’, using a digital repository as the equivalent to print repositories.  Of course it can be held in a way that allows it to be exploited through other services, but should we not focus on what a repository does really well rather than become application managers as well?  Discuss.  In taking this line, we enable content to be made available from the repository (a ‘lake of content’ as expressed by one workshop attendee) wherever it is needed; do users need to know where it came from?  Issues of perceived value clearly raise their head here given the battles to establish repositories in the first place, and moving in the suggested direction will certainly require attention to this with budget-holders.  But for users this was felt to make sense.  One approach suggested was to consider repository as infrastructure rather than application, as this may change views of the support required.

  • Repositories will be challenged by other systems offering similar capability / Repositories will develop ways of demonstrating their impact

This theme was a natural follow-on to the previous one.  The debate about CRIS’s storing content, or VLEs for that matter, seems high on the agenda in affected institutions, and will no doubt continue.  This suggests a need for clarity in the role of each system, and an understanding of their respective benefit and impact for the institution in how they work together.  We cannot take repositories for granted, though the general perception at the workshop was that they have huge value (biased audience I know, but one with experience) and we need to continue identifying how we demonstrate that to best serve our institutional needs.

So, a full afternoon.  No blinding flashes of inspiration, perhaps, but some useful staging posts against which we can plot the future course of repositories in the next 2, 5, 10, etc years.  Repositories will only be what they are then because of what we choose to do now.

My main general takeaways from the workshop:

  • The role and need for a repository as a place to manage digital ‘stuff’ seems well accepted and here to stay

    but

  • There is a need for re-stating and defining the clarity of purpose for our individual repositories, and taking ownership/leadership in how they develop
  • No specific gaps were perceived – we know what we wish to achieve with repositories, we just need a way of doing it

    but

  • We need to clarify the barriers getting in the way and look at ways of overcoming them

What are your thoughts?  Or, indeed, what processes would work best to address these points (both institutionally and across the community)?

Posted in Guest Blogs

Repository Fringe Day Three: What happened, what the best hacks were (and who won the lego, again!)

This is our rather belated summary of Day Three (Friday 2nd August) of the Repository Fringe. It was a very busy final day of talks, pecha kuchas and the Developer Challenge and it has taken us the long weekend to catch up!

We started the day with a set of Short Presentations, each acting as an introduction or follow up to Round Tables in the programme. Chris Awre reminded us that a Hydra has one body and many heads (it’s the one bit of Greek mythology you can count on any repository person to know!) and asserted the importance of having an engaged and active community to support technological tools. Then Andrew Dorward and Pablo de Castro continued the community theme, talking about the UK RepositoryNet+ progress in building out the socio-technical infrastructure for shared repository services. Finally Angus Whyte reviewed the current status and aspirations for the DCC’s work in supporting repositories and librarians (and others) to support and embed research data management (RDM). Angus also highlighted recent research into the priorities of libraries – and how RDM and repositories fit into that – which sparked some lively questions around who drives strategy around repositories and RDM.

Scott Renton’s sketch of Pablo de Castro during the short presentation on UK RepositoryNet+

After coffee we moved into Round Tables and the all-important Developer Challenge judging process.

Pablo and Andrew followed up their short presentation with a round table on Shared Repository Services (UK RepositoryNet+ continues). The session was videoed and is split into two sessions on YouTube: Part 1 and Part 2.

Sarah Jones and Angus Whyte of the Digital Curation Centre led a round table discussion around the question “How can other stakeholders support repositories on research data?”, a very timely issue given the increasing expectation and research funder mandates around the sharing of publicly funded research.

And in our parallel strand developers and our judges gathered to view and discuss each of our Challenge entries. This proved to be a really productive session for exchanging ideas and expertise. Once all five hacks had been shown and questions and chat about each had taken place it was time for a judgely huddle to form and make the difficult decisions over prizes…

In the Developer Challenge area three hacks are tweaked and progressed.

After lunch we came back together again for a series of short presentations. Paul Walk talked about the RIOXX Metadata Application Profile and Vocabularies for OA (V4OA). Paul particularly focused on RIOXX, a pragmatic project addressing the challenge of improving the quality of metadata in institutional repositories which has resulted in guidance for repository managers and a supporting application profile, although RIOXX will be deprecated in favour of FundRef going forward.

Chris Keene was our next speaker who channeled The Vapours to bring us a talk entitled “I’m turning enterprisey (I really think so)” which outlined how Sussex Research Online had been adapted to be both the University of Sussex’s institutional repository and a suitable REF system – and what this has meant for the role of the repository and how it can develop: “managing a repository in 2013 feels really different”.

Without even a pause we moved into another fantastic set of Pecha Kuchas for which, yes, there was more lego at stake!

Scott Renton’s Sketch of Peter Murray Rust’s Pecha Kucha

First up – after a little emergency remixing of speakers – was Peter Murray-Rust or, more accurately, his marvellous #animalgarden plush collective, who looked at repositories for scientific data. The comic book slide format was a big hit with the audience so a viewing of the video is highly recommended!

Next Matt Taylor talked about the embarrassing but common problem of small data sets and his work on RedFeather, a solution that is effectively a repository in a single server-side script. Matt’s unique way of presenting this work again makes this a must-see video.

Our third Pecha Kucha in this session came from Sebastian Paulcha who took us through the implementation of Durham E-Theses with a significant nod towards a theme of this week: the repository of the future.

A wee break for questions gave us a pause to introduce Professor Les Carr. The new title was bestowed during Repository Fringe and we can’t think of more appropriate timing as Les is one of the originators of this marvellous unconference!

Professor Les Carr, sketched by Scott Renton.

Prof Les’ talk wove together unexpected elements from CERN to Robert Maxwell to tell the story of Open Access and developing attitudes towards openness and commercialisation. He ended his energised whistle-stop tour with the Finch Report-inspired provocation:

Whose side are you on?

Our final Pecha Kucha came from Scott Renton. Scott gave a “light bit of advertising” for the image collections at the University of Edinburgh, outlining how these were digitised, managed, stored, and made discoverable. Scott’s slides included his own sketches which you will also see scattered throughout the Repository Fringe 2013 Flickr group (see also the image of Les above) as Scott has been our dedicated RepoFringe live-sketcher!

The Pecha Kucha session concluded with questions and, of course, voting. As we broke for voting and a coffee break one lovely participant came forward to ask about our networking/business card contest. Unfortunately she hadn’t been able to attend day one of the event but had baked her own fortune cookies! We will feature pictures of these shortly in a special post on all of the inventive networking ideas we saw across the event. In the meantime huge kudos for an exceptional idea and baking success!

After coffee it was time for some prizes. Nicola Osborne announced the Pecha Kucha prize winner, a landslide victory for Matt Taylor. This means that two Lego Calendars are heading to Southampton – thankfully they are highly interoperable so Matt and Pat should be able to create their own hacks – 14 day weeks perhaps – to celebrate their respective PK wins.

Matt Taylor: It’s not the size of your dataset, it’s how you use it that counts.
(sketch by Scott Renton)

With the PK prize handed out, Paul Walk took over MCing duties for our Developer Challenge Show and Tell. He opened by thanking all of those who had taken part in the challenge and our judges, who this year were Bea Alex, Stuart Lewis, Padmini Ray-Murray and Paul Walk. He announced our overall winner, Russell Boyatt, whose hack addressed the preservation of MOOCs. Russell then gave a great presentation of his idea, which you can view here.

Prizes were also awarded to two runners up: Peter Murray-Rust and Cesare Bellini’s Images in Scientific Publications and Chris Gutteridge’s Images with Creative Commons Licences. Both hacks were riffs on the same essential idea, an indication of the highly collaborative nature of the Challenge. Finally Patrick McSweeney presented his Preservation Toolkit, and Richard Wincewicz presented his Metadata Creator hack.

Paul then led a discussion of the Developer Challenge. Notably this year three of our participants had specifically asked not to receive a prize: they were in the Challenge to collaborate, to try things. And they want others to take part too, to see new faces. Chris Gutteridge’s presentation had included the call to arms:

I want more young people snapping at my heels for these contests! More young people now!

The discussion raised lots of interesting concerns, and alternative possibilities and formats for future Developer Challenges. Take a look at the video to get a sense of these, and see our Developer Challenge blog post for more on this year’s Challenge entries and judging process.

We would really appreciate your comments, additions and perspectives to continue this debate. Please leave a comment here, on the Developer Challenge liveblog, note your comments when you fill in the Repository Fringe 2013 feedback form or email us.

The finale of the day, and RepositoryFringe 2013, was our Closing Keynote from Mark Hahnel. Peter Burnhill, Director of EDINA, introduced Mark with Robin Rice’s prescient prediction – ahead of the 2011 Repository Fringe – that “FigShare: could be the new data sharing killer app!”.

Mark Hahnel’s Keynote, sketched by Scott Renton

Mark’s talk was “PK-style” to reflect his Friday afternoon slot. Peppered with questions and asides, his presentation provided both a fun update on FigShare (and its newest features) and a broader call to arms for repositories to be more visual and more focused on the user: to really motivate people to share their data through stats, through control, through meeting their needs.

Mark also suggested that Repository Fringe 2014 could be quite a different beast:

Next year repositories will look very different. RDM plans say they have to. Funders say they have to.

The talk included a special slide just for Peter Murray-Rust. To view that, or any other part of the video of both Mark’s keynote and the lively Q&A, head on over to YouTube.

The event came to a close with a brief thank you and wrap up from Kevin Ashley, Director of the Digital Curation Centre. He reinforced that there will be a Repository Fringe next year and that the organising team really want your feedback to help ensure the event stays relevant and helps the community to meet, share ideas and share experiences in an enjoyable way.

Thank You!

As Chair of this year’s event, I’d like to add my own thanks here on the blog. Thank you to all of you that came or followed the event, particularly our speakers, our developer challenge participants and judges, our bloggers and twitterers and sketcher, and, of course, our fantastic sponsors and supporters without whom the event simply wouldn’t be possible. Finally a very special thank you to all of the lovely organising team behind Repository Fringe 2013.

What to expect next?

Over the next few weeks we will be following up Repository Fringe 2013 with a round up of blog posts on others’ blogs, guest posts here on the Repository Fringe blog, highlights in pictures and video, and some analysis of the #rfringe13 tweets. If there is anything you’d like to see added or updated just let us know and we’ll do our best!

Whilst we get those follow up posts and materials ready please complete our Feedback Form – let us know what you thought, what you’d like to see more of or less of next year, how we can change the shape of the Developer Challenge, whatever you’d like us to know: https://www.survey.ed.ac.uk/rfringe13/

 

Posted in Uncategorized

Reflecting Back on the Fringe

I’ve written elsewhere about my hopes and expectations prior to attending the event as a whole, and you can read about the sessions in full on the Fringe Blog, should you wish to get a taste of the whole event.  Since there have been so many wonderful posts on the individual sessions I’m not going to duplicate the efforts, rather I’ll just share some key points I learned.

  • There is still a lot of very active development going on and around repositories – that hasn’t been totally subsumed by the REF and CRISes.
  • There is a real feeling of positivity engendered by people working in this sector.  They have very tough jobs, but they all seem to relish it.
  • A sung paper is a thing of joy and delight – more unusual presentations next time please (for the Gen Y and Z people at least!)
  • The fear of cocking up a REF submission is paramount for many repository managers.  The REF has given them a greater institutional value and prominence, but greater risks come with greater reward.
  • Symplectic isn’t a CRIS.  Better not tell my old bosses that, they’d be most upset.
  • SWORD works a treat to populate a repository from external sources.
  • Metadata is either a complete waste of time or the most critical element.  Honestly, I’m still not sure which way to jump on that one.
  • Few digital systems last longer than 15 years (except in the NHS) so planning for sustainability beyond that is a futile activity.
  • Nicola’s team at Edinburgh puts on an excellent conference and makes it look effortless.  Thanks and well done!

And my favourite quote from the whole event

  • Speaker “So, how long is your repository going to last?”
  • Audience member “Probably until the end of the REF.”

Not many months now to see how true that one is – will repositories suddenly dip off the radar in November or will the REF2020 help keep their light shining brightly?

 

Posted in Guest Blogs

Developer Challenge: The Results

The winners of the developer challenge were announced during the Show & Tell Session just before the closing keynote.

The top prize went to Russell Boyatt for his Preserving a MOOC toolkit. This idea fits very well with the preservation theme of this developer challenge. As Universities are putting more resources into deploying MOOCs, it is very appropriate that capturing the social interaction generated by students and their tutors should become a priority in order to enable future analysis, feedback and validation of MOOCs. This hack was therefore very timely and inspired our judges. It provided them with a take home message – let’s do more to save MOOC interaction data and we must do it now!

There were two runners-up:

  • The Image Liberation Team, made up of Peter Murray-Rust and Cesare Bellini, whose hack overlays the license type on top of an image.
  • EPrints plug-in for image copyright by Chris Gutteridge, which adds a license and copyright to an image in EPrints.

There were an additional two entries:

  • The Preservation Toolkit from Patrick McSweeney which provides a webservice for file format conversion.
  • The Metadata Creator from Richard Wincewicz which extracts the metadata embedded in PDF files.

These last four hacks are all about improving metadata, its quality and ease of capture. This gives a strong signal as to what is a major concern for repositories and their users.

These were all very interesting and exciting hacks! It was a challenge in itself for the judges to reach a decision and award the prizes. They had to take into account the novelty, relevance, potential of the idea and balance it with the production of code during Repository Fringe. Not an easy task! Thanks again to our four judges, Paul Walk, Bea Alex, Stuart Lewis and Padmini Ray-Murray, for their excellent job!

What struck me most during the 24 hours of the challenge is that most developers were happy to enter a hack but didn’t want to win!  Maybe it was the lack of time to dedicate to the coding due to the Repository Fringe sessions running in parallel, ‘it’s not fully working yet‘. Maybe it was that some of them had won previous challenges, ‘been there, done it and got the T-shirt‘. Maybe it was the lack of a new generation of coders to compete with, ‘where is the new blood?‘. Maybe prizes are not the main motivation.

The feeling was that the challenge should come from the questions to be answered rather than the competition with other developers. There was a demand for a different type of event where developers could work together to solve problems that would be set as goals. This would provide a chance for developers to collaborate, learn from each other and code solutions to important and current issues. The opportunity to learn and demonstrate their skills seems more valuable to the developers than prize money. It is more important to have fun, meet other people and build a developer community. Back to basics! I couldn’t agree more.

 

Posted in Developer Challenge, Guest Blogs

LiveBlog: Closing Keynote

Peter Burnhill, Director of EDINA is introducing our closing keynote, something of a Repository Fringe frequent flyer. But he is also announcing that this year is the 30th birthday of the University of Edinburgh Data Library. There was a need for social scientists to store data and work with it. That has come a long way since. And we now face questions like curation, access, etc. Back to my first duty here… I had an email from Robin Rice in 2011 “we like FigShare” and wrote to the organising list “FigShare: could be the new data sharing killer app!” a bit of an understatement there. So, let’s find out what’s happened in the last two days. So, over to Mark!

Mark Hahnel – FigShare

So I am doing this PK-style as it’s Friday afternoon and we have people on stilts going past! Here we have people from institutions, from libraries. I’m not. We have different ideas so I want your ideas and feedback!

So I’m going to talk about open and closed… We’ll see where we get.

So FigShare lets you upload your research. You can manage your research in the cloud. This has evolved since 2011. We can’t ignore why not all data can be open… So we have a private side now. Our core goal is still being Discoverable, Sharable (social media), Citable (DOI). Discoverable is tricky!

We are hosted on Amazon Web Services, we are an ORCID launch partner (only one with non article data I think), we are on COPE (Committee on Publication Ethics), we are getting DOIs from DataCite AND we are backed up in LOCKSS.

We wanted dissemination of content on the internet – it’s a solved issue. Instead of going backwards… Let’s see how we go forward by copying this stuff. In common these services like Flickr, SoundCloud etc. visualise content in the browser – you don’t have to download to use.

So live demo number 1. So we have a poster here. Content on the left. Author there. Simple metadata, DOI, and social media shares. We’ve just added embedding – upload content to FigShare and use on your own site. So datasets are custom built in the browser – want to see your 2GB file before you download. You shouldn’t even be downloading, should all be on the web and will be. And we have author profiles. With stats including sharing stats. That is motivating. That rewards sharing. Think about who is involved in research. We try to do the other side of incentives action here too! Metrics are good. So is doing something cool with it. So for instance here is a blogpost with a CSV and a graph. So we have a PNG of the data… You can’t interact. But the CSV lets you create new interactive charts. And we also added in ways to filter data.

We are also looking at incentivising to give back – doing research like an instant T test. Moving towards the idea of interactive research. But this is something that allows you to make research more interactive.

Q – Pat McSweeney) is this live or forthcoming?

It’s live but manually done. A use case for groups that use FigShare the most, that need special interaction for journals.

We are a commercial company but you can upload data for free. We work with publishers. We visualise content really well. So this is additional materials for PLoS, these are all just here on FigShare – there’s a video? Play it! It’s how the internet works! Don’t download! We do this for publishers. Another thing we created for a publisher is that you click open a graph, you get a Dataset. A researcher asked for it, we built it!

So, back off the internet…

So discoverable. What does that mean? Google finds us but… Well is it hearsay? So DataCite started tracking our DOIs. For three months we were 8 out of the top ten, then 7 out of the top ten, then 9 out of the top ten for traffic. So hey, we are discoverable!

But the future of repositories… Who cares?

So who takes ownership of this problem now – funders, stakeholders, or academics? I think it’s institutions and more specifically librarians. Librarians are badass. They have taken ownership. They lead change, they try new things.

But the funders? Funders are really reacting to the fact that they want their data – it may be about what researchers want to reuse but really it’s about the impact of their spending. But they are owning that problem. NSF requires sharing with other researchers, similarly humanities. The EU are also talking about this – but not owning the problem, just declaring it really.

So looking across funders… Some have policies… Some stipulations… Wellcome Trust withhold 10% of cash if you do not share data. That will make a difference. But what do you do with that data?

What about academics? Well they share data! I generated 9GB a year – probably in middle of the curve in terms of scale – in my PhD. So globally 3PB/year ish. But how much of my PhD is available? A few KB of data. My PhD is under embargo until later in the year, but it will be there.

I felt there were moral and ethical obligations. Sharing detailed research data is associated with increased citation. Simplicity matters, visualisation is cool. I thought it was about an ego trip, academics have to disambiguate themselves…

Now two years after leaving I was asked to come back in and print Excel files for my data for a publication… I generated this without a research data plan. Two years after I left my boss thinks I still work for her. She will hand the next guy working for her… What does he do, copy them back in?

So there is so much more here. It is not just open or closed, it is about control. It’s the Cory Doctorow thing, the further you are from a problem, the more data you’ll give up, the Facebook issue. You do want control, it matters.

So what motivates academics? Being easy, being useful, and what do funders want – we will jump through hoops for them.

So back to the web… My profile has new different stuff but you’ll see sharing folders – group projects and discussions, ways to reshare that data. Nudge your sharing. But you need the file uploaded now to share two years later. You can share otherwise closed things with colleagues, regardless of institution.

Btw on this slide we have our designer’s idea of an institutional library – looks a lot like a prison.

So back to those libraries. How much data does an institution generate? Very few know this, how do you assess. Right now we are doing stuff for PLoS we let them browse all their stuff. They can see what they produced. And this aggregation is great for SEO too. Makes it easy to Google then find the research article from there. So from this aggregation we can filter top most viewed, to particular titles. Essentially this is a repository of research outputs, we take all formats. You can imagine that this could be there for any institution. And this has an API.

Institutions also want stats. See where traffic is from. Not just location but institutional IP ranges. So we can show where that item has impact, where viewers come from. But, at the same time populating repositories is hard. But we have data from Nature from PLoS. We can hand that data back to your repositories. We can find the association with the institution.

So it’s about control. It’s Research Data Management as well as Research Output Dissemination all in one.

So we have launched FigShare for institutions. We have heard concerns about metadata standards and how much metadata we have, so Henry Winlaker used our API to build a way to add more metadata to fit institutional needs. So if you share responsibility… Well what’s the point of the institutional repository? I would say that I think IRs are about to move fast. They have to, it was idealistic but now it’s mandated! Next year repositories will look very different. RDM plans say they have to. Funders say they have to.

This community is amazing! ResourceSync is great, I want to use it! PMR’s Dev challenge idea is great. We are commercial but we can work together!

Do we need to go back further? People use Dropbox, drag files in. We have a desktop app too. But maybe whenever you save a file maybe you need to upload it then. So at projects.ac there is a project. A filesystem that nudges you to add metadata and do things as you are rewarded to do them. You can star things, it does version control. Digital Science created this. It’s kind of like it can do so much more. So releasing it to see what’s needed. What’s really cool… You can download this now… If you press save now it saves it to FigShare. That sync would be ideal. Trying it out now. I work in the same office but there is no reason why these shouldn’t all be connected up to IRs to FigShare to all of these things…

And this is a slide specially for Peter Murray-Rust…

I know that openness is brilliant! But it’s also great to work with publishers. More files were made available for free, for academics, that’s great. Everything publicly available will ONLY be by CC0 and CC-BY. SHARE ALL THE DATA.

Q&A

Q1 – Paul) what is the business model?

A1) For PLoS it’s about visualisations and data. They pay us to do that. They have a business model for that. And FigShare for Institutions is coming, that’s also part of the model.

Q2 – Peter MR) I trust you completely but I do not trust Elsevier or Google… Etc. So you have to build organisational DNA to prevent you becoming evil. If you left or died what would happen to FigShare, you see the point?

A2) I see that. But this is aimed at this costs us money. We sell to institutions but there are economies of scale. Two institutions have built their own data repositories and they cost £1million and £2million. That’s a lot of money.

Q2) Mendeley have a copy of all the published scientific data these days. FigShare will have massive value of data in it, huge worth, institutions may want to know what staff are doing, to spy on them. You have something of vast power, vast potential value. The time is now to create governance structure to address that.

Peter Burnhill) there are some fundamental trust issues

Mark Hahnel) you can trust the internet to an extent. Make stuff available and it proliferates but you can reuse, you can sell it on etc.

Peter Burnhill) next year we need a discussion of ethics

Q3 – Kevin Ashley) FigShare for institutions. Can you say anything about the background consultation around that? A contract is very different to free stuff

A3) Sure, legally we have a lot of responsibility. We’ve been working with universities, individual ones, to see what the needs are. We spoke to lots of people. Mainly in London but to see we didn’t tread on toes, we didn’t risk their research leaking out. We spoke to institutions more globally. Digital Science is a good thing, this is where they come in.

Peter Burnhill) I am a member of the CLOCKSS board. There is a contract between all publishers that CLOCKSS ingests everything they make available and it says that if a failure to deliver happens – for whatever reason – then CLOCKSS have the right to make that data available via platforms (one here at EDINA, one at Stanford) so in terms of assurance that what comes in goes out, joining CLOCKSS does that. The agreement is supra government. You give up that right there that it will remain available.

Mark: absolutely. And all data is available via the API if you want to.

Final Wrap Up – Kevin Ashley

Thank you to Mark for a great final session. So, at an event like this we come here to share ideas, we come to share experience, we look for answers, we come to meet people and to make new connections. We come to learn. We may come with one or many objectives. We at the DCC certainly have been able to. Many of you are new here.

I have learnt lots of stuff. A few things stuck. A whole room of experts can’t put an object into an EPrints repository, there’s a lesson there somewhere about interfaces. And the other interesting idea I picked up from Les Carr. Maintaining open access and having a business plan for what we do. So the DCC how to set up RDM licenses are free but limited edition leather bound copies to come – great idea Les!

I hope all of you did one or several of those things. Then share, tell us, this is an unconference! We want to keep making this event better every year. We see the event as being about you, about facilitating you to meet and connect.

There will be a Repository Fringe next year. One reason for that is that we have fantastic sponsors. All of whom put into this event. And hopefully we can extend that further next year. But thank you also to session chairs, the speakers, and to the organising committee here. I know how much work goes into this. And a great deal happens and happens smoothly because of that work.

Two people to thank specifically. Florance Kennedy of the DCC and our chair Nicola Osborne!

Posted in LiveBlog
Repository Fringe 2013 is organised by:

The Digital Curation Centre

EDINA

The University of Edinburgh