Paul Walk begins by thanking all of the Repository Fringe 2013 Developer Challenge sponsors, and the judges for the Developer Challenge.
I want to thank our entrants for the Developer Challenge. We saw ideas being exchanged even over the last few days, which is great. Also interesting: many of our entrants did not want to win a prize. Three of our developers said that! It gives you a sense of what doesn’t motivate people! I didn’t think that people would be content with me issuing them. We discussed the entrants this morning and took a majority vote.
The winning entry was from Russell Boyatt! Congratulations! There were four judges and we have two runners up! The first runner up prize is shared by Peter Murray-Rust and Cesare Bellini. And our other runner up is Chris Gutteridge.
As you will see from the presentations the winner was very timely, very much about preservation, and a real call to action for something they need to do. Of the two runners up, one project had a very strong idea and had implemented it with executable code; the other built on top of it and integrated it into a repository. It was fantastic to have those feed into each other. This morning we had a session where all teams exchanged ideas, asked questions and offered feedback, and it was really collaborative.
Russell Boyatt – Preserving a MOOC
I am interested in MOOCs, Massive Open Online Courses, and there are a growing number of providers – Coursera, FutureLearn, etc. What happens in these is that a *large* number of students take part in the course. And there is a huge amount of social media activity around these courses, a huge amount of activity outside of these platforms. Wouldn’t it be great if we could capture and preserve that?
Why do that? Well it’s an institutional archive, you want to see what all that expenditure represents, you want to reuse, to learn from it, to see how it relates to on campus activity. Students want to look back at that data, see what they have done. And there is future research: MOOCs are new and we will have questions in five years where we will need to look back over that work, so we have to collect it now. Also our current repository is assembling content for the MOOC. So the idea of the MOOC preservation toolkit I have started to build is that it will reach into these platforms – particularly Moodle, EdX and OpenMOOC as they are open source – to gather material, alongside gathering social media interactions: the tweets, the discussions, everything. And then look at our own repository, package that up together and push it into a repository using SWORD.
For social media we can use the Twitter streaming API – and I’ve helpfully been pointed at the Southampton EPrints Twitter tool. And I’ve been involved in a blog preservation project which solves that gathering issue. So we can preserve that content. We can ensure that learning resources developed for a MOOC can be captured and stored in a repository. And we can use this internally and externally to the university. And the MOOC materials could be a route to collecting learning resources in a form suitable for an OER. And then we have a representation of that activity over time.
So I have started building a MOOC Preservation Toolkit. I will then be able to extract discussions from a MOOC, pull them out as XML and push that to a repository with SWORD – I will try and get that working this weekend.
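The extract, package and push steps described above could look roughly like the sketch below. This is not Russell's toolkit: the function names, the post dictionary keys ("author", "text") and the example collection URL are all assumptions made for illustration. It wraps extracted discussion posts in an Atom entry and prepares a SWORD v2 style POST without sending it; authentication is left out.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def discussions_to_atom(course_title, posts):
    """Wrap extracted discussion posts in one Atom entry, returned as bytes.

    `posts` is a list of dicts with the hypothetical keys "author" and "text".
    """
    ET.register_namespace("", ATOM_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = course_title
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "text"})
    content.text = "\n".join(f'{p["author"]}: {p["text"]}' for p in posts)
    return ET.tostring(entry, encoding="utf-8")


def build_sword_deposit(collection_url, atom_bytes, in_progress=True):
    """Prepare (but do not send) a POST of an Atom entry to a SWORD collection."""
    req = urllib.request.Request(collection_url, data=atom_bytes, method="POST")
    req.add_header("Content-Type", "application/atom+xml;type=entry")
    # In-Progress tells the server whether more content will follow.
    req.add_header("In-Progress", "true" if in_progress else "false")
    return req


if __name__ == "__main__":
    atom = discussions_to_atom(
        "Example MOOC, week 1",
        [{"author": "student1", "text": "Is the reading list open access?"}],
    )
    # Hypothetical collection URL; a real one comes from the service document.
    request = build_sword_deposit("http://repository.example.org/sword/collection", atom)
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would perform the actual deposit.
```

Keeping the request building separate from the sending makes it possible to check the package and headers before anything touches a live repository.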
I haven’t done this as a tool, it might be useful. This is a call to action. There is huge activity around MOOCs right now. There is huge content in repositories being used in MOOCs and we want to know how that is being used and interacted with.
Q1) Activity data
A1) Yes, thinking about that, need to do more thinking.
Peter Murray-Rust and Cesare Bellini – Images in Scientific Publication
If you use an image – here’s one on FigShare – you can claim it as your own, but how can you prove it’s yours? Springer stole an image of mine! So this bothers me a lot! How do we make images proof against copy fraud? How could we stamp that image in an unremovable way that would be immediately obvious to humans? I have blogged it already – read it there. So you take an image, you take a CC-BY mark. You overlay one on top of another. No-one can remove the CC-BY without destroying the image. That would be hugely useful to have on ALL images. Cesare is implementing this on a server. And thanks to Chris for developing this further. Mark from Ubiquity Press likes this, and I’m going to approach all the major open access publishers.
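The overlay idea above can be illustrated at the pixel level. The talk does not say how the real stamping is done, so this is only a minimal sketch under assumptions: images are plain greyscale grids (lists of rows of 0-255 integers), the licence mark is tiled across the whole picture, and `overlay_mark` and its `opacity` parameter are hypothetical names.

```python
def overlay_mark(image, mark, opacity=0.3):
    """Blend a tiled licence mark over every pixel of a greyscale image.

    `image` and `mark` are lists of rows of integers in the range 0-255.
    The mark repeats in both directions so that no region is left unstamped.
    """
    mark_h, mark_w = len(mark), len(mark[0])
    stamped = []
    for y, row in enumerate(image):
        new_row = []
        for x, pixel in enumerate(row):
            m = mark[y % mark_h][x % mark_w]
            # Standard alpha blend: keep most of the picture, mix in the mark.
            new_row.append(round((1 - opacity) * pixel + opacity * m))
        stamped.append(new_row)
    return stamped


if __name__ == "__main__":
    picture = [[100, 100, 100, 100] for _ in range(2)]
    cc_mark = [[0, 200]]  # stand-in for a rendered "CC-BY" glyph
    for row in overlay_mark(picture, cc_mark, opacity=0.5):
        print(row)
```

Because every pixel is mixed with the mark, cropping out one corner does not remove it, which is the property the talk is after; how visible the stamp should be is a trade-off set by the opacity.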
Q1) What about a water mark for video or visualisations?
A1) That might be worthwhile.
A1 – Chris G) And should be doable by reusing VLC.
Chris Gutteridge – Images with Creative Commons Licenses
I want more young people snapping at my heels for these contests! More young people now!
Anyway what I have done is I have stolen Peter’s idea but made some other tweaks. So I take an image. I say who took the image, and the Creative Commons data is added to the image – the CC BY etc. on top of the image, as well as a proper attribution on the side to show whose image it is. And because we have the data on license and attribution to hand while we are munging the image, we can embed it too. JPEG and PDF files have a space for metadata in the file format. JPEGs have an attribution field. So we have added the license to the EXIF for the image.
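To show what putting the attribution and license into the EXIF can mean at the byte level, here is a minimal sketch using only the Python standard library. It is not Chris's code. It assumes ASCII-only text, uses the standard Exif/TIFF tags Artist (0x013B) and Copyright (0x8298), and simply inserts a new APP1 segment straight after the JPEG start marker; a real tool would merge with any Exif block the file already has. The function names are hypothetical.

```python
import struct

ARTIST_TAG = 0x013B     # Exif/TIFF "Artist"
COPYRIGHT_TAG = 0x8298  # Exif/TIFF "Copyright"
ASCII_TYPE = 2


def build_exif_segment(artist, licence):
    """Return a JPEG APP1 segment holding Artist and Copyright as ASCII tags."""
    values = [(ARTIST_TAG, artist.encode("ascii") + b"\x00"),
              (COPYRIGHT_TAG, licence.encode("ascii") + b"\x00")]
    # TIFF header: big-endian byte order, magic number 42, first IFD at offset 8.
    header = b"MM" + struct.pack(">HI", 42, 8)
    ifd_size = 2 + 12 * len(values) + 4
    data_offset = len(header) + ifd_size
    entries, data = b"", b""
    for tag, raw in values:
        if len(raw) <= 4:
            field = raw.ljust(4, b"\x00")  # short values are stored inline
        else:
            field = struct.pack(">I", data_offset + len(data))
            data += raw
            if len(data) % 2:
                data += b"\x00"  # keep the next value on an even offset
        entries += struct.pack(">HHI", tag, ASCII_TYPE, len(raw)) + field
    ifd = struct.pack(">H", len(values)) + entries + struct.pack(">I", 0)
    payload = b"Exif\x00\x00" + header + ifd + data
    # The segment length counts its own two bytes but not the marker.
    return b"\xff\xe1" + struct.pack(">H", len(payload) + 2) + payload


def add_licence_to_jpeg(jpeg_bytes, artist, licence):
    """Insert the Exif segment directly after the JPEG start-of-image marker."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker")
    return jpeg_bytes[:2] + build_exif_segment(artist, licence) + jpeg_bytes[2:]
```

Calling `add_licence_to_jpeg(data, "A. Photographer", "CC BY")` on the bytes of a JPEG returns new bytes with both fields embedded, so the license travels with the file even if the visible stamp is cropped.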
Q1) When will it be on the EPrints training course?!
Patrick McSweeney – Preservation Toolkit
So yesterday I learned that format preservation isn’t really an issue, which upended this a little! But this is what I did… I made use of some EPrints functionality built by Dave Tarrant. The idea is that you take a file and convert the file format. Then you can open it up. So the Word document becomes easy to open as a different file. But I’ve swapped one preservation risk for another. BUT you can convert your document to native raw HTML requiring no special tools at all. It’s kind of cool. The HTML is a bit ropey. The purpose of the service was twofold. EdShare, my baby, does this stuff server side. You could also use the tool to create a zip file of images of your document, PDF, PPT etc.
Richard Wincewicz – Metadata Creator
I thought of this idea on the way home last night and spent three hours coding, so don’t get your expectations up too much! Yesterday there was much talk of lacking metadata but I thought, well, there is loads of metadata in a document if you know how to find it! So this solution would allow you to upload a document and do text mining etc. to pull out the metadata. All good, then I spoke to Chris Gutteridge who showed me something built years back… but in any case… here is what I built!
So, here is a PDF: it pulls out the metadata and outputs XML. You would use this as a web service basically, so you wouldn’t see all these fields in their raw form here, you’d see a user interface. In cases where you have lots of files with no data this kind of metadata gives you a starting point.
Q1 – Balviar) Do you know something about the NZ preservation metadata extractor?
A1) Yes, it’s built into Apache Tika, which is an umbrella for lots of libraries. This is essentially a front end for that.
Q2 – Chris) The great idea here is the generic API that gives you structured information. So that old hack shouldn’t put you off… this is a neat idea.
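Since the tool is described as a front end for Apache Tika, a thin client can be sketched against a Tika server. Richard's own code is not shown in the session, so everything here is an assumption for illustration: that a Tika server is running locally on its default port 9998, that its `/meta` endpoint is used, and the helper names. Field names in Tika's output vary by document type and version, so the wanted keys are passed in rather than hard-coded.

```python
import json
import urllib.request


def build_tika_meta_request(document_bytes, server="http://localhost:9998"):
    """Prepare a PUT of the raw document to the Tika server's /meta endpoint."""
    req = urllib.request.Request(
        server.rstrip("/") + "/meta", data=document_bytes, method="PUT"
    )
    req.add_header("Accept", "application/json")  # ask for JSON output
    return req


def extract_metadata(document_bytes, server="http://localhost:9998"):
    """Send the document and return the extracted metadata as a dict."""
    request = build_tika_meta_request(document_bytes, server)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def pick_fields(metadata, wanted):
    """Keep only the fields a repository record might start from."""
    return {key: metadata[key] for key in wanted if key in metadata}


if __name__ == "__main__":
    # Needs a running Tika server; "paper.pdf" is a placeholder file name.
    with open("paper.pdf", "rb") as handle:
        found = extract_metadata(handle.read())
    print(pick_fields(found, ["dc:title", "dc:creator", "Content-Type"]))
```

The structured dictionary that comes back is the generic part: a user interface or a repository import script can sit on top of it without knowing which library did the extraction.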
Paul Walk: I noted that four of five of the entries were about metadata, addressing the lack of metadata. Maybe we need to have a bigger crack at that metadata travelling with the object idea. Chris and Peter’s idea is an extreme version but the idea of putting it in the EXIF is nice. And this is a new emphasis if not a new idea.
Peter Burnhill: I recall when people were focusing on metadata and we had the catalogue record… the argument went that metadata should be separate, but if embedded you have it in there, and you can then extract it if you want to. Also for images you often want to know where associated objects are. Some documentation should be intrinsic to the object. In extreme cases one wants to find something related to enhance the object in creation.
Paul Walk: Now for something not in the plan… Chris talked about the lack of new young developers showing up the old fart that he is now. I wanted to put it as a question. Is it something we as a community can do something about? The challenge has been an important part of the Repository Fringe and other events. It has really helped build relationships. So many of our developers didn’t want a prize that perhaps “challenge” is no longer the right word here… I saw collaboration, cooperation and maybe next year we reframe it.
Pat: The Developer Challenge can be a challenge still, just not a competition.
Paul Walk: I think that’s a really good idea. A solution to
Peter M-R: Hackathons are the modern approach. My idea stemmed from the Hack for Ac…
Dave Tarrant: Hacks are old ideas too. What we lost this year is training for developers. Dev8D didn’t run this year. That training is important. Having that critical mass, exchanging ideas, that’s what’s so important. We have to think about how to do that again. So that vision and hacks get realised. So many developments from challenges are part of day to day work. We should embrace that.
Chris Gutteridge: The reason I talk about young developers… I’ve seen giant prizes drive people purely for the cash. It can defeat the point. You really want to go back to your work and show that your work is great. For us, we have won a lot of these; we have been inspired and validated, but we want new people helped up and validated.
Paul: That was the motivation for the prizes. I have examples of developers who were taken more seriously because of that prize.
Russell: That feedback in the session this morning was brilliant. Chris’ comment has changed what I do for the better. Do that. Collaborate more. That’s the value.
Claire: That has to be every day, perhaps not at these events. So like plugins etc. Is that a good idea, is it cool, has it been done? How do you keep up?!
Dave: You couldn’t have a better place to reach that community frankly. The friendships are built and last beyond the challenge.
Peter B: Last year when we had OR2012 we had concern that the big show was coming to town… we blended Repository Fringe in again. I’m delighted that we’ve come out the other side. There is a commitment to that. You alluded that there is an intrinsic value to this sort of “mixed collar” event. So if there is a particular problem, say with those coming through compsci schools, we may have an obligation to work that out for next year.
Paul: And finally a big thank you to Muriel Mewissen and Nicola Osborne who did most of the heavy lifting to organise the Developer Challenge.