Repositories for Scientific Research Data – Peter Murray-Rust
We spend huge sums, millions, looking at phylogenetic trees. Huge amounts of data are gathered but most is thrown away. Little work has been done to solve this, but ANDS in Australia has done some great work and we have a model in crystallography: you have to have your data to be published. So you have data as part of publications. Computational systems have a similar problem. There is a lack of awareness of DataCite, and really of the whole concept of sharing data. Progress? Well, Obama is on board…
So back to that tree of life. We need repositories for particular domains. I like FigShare. I like CKAN. Anything open. But compared with IRs there are very few domain repositories. I want to index absolutely all species, places and dates in biosis, thanks to JISC for supporting this.
And hat tip to OKFN here too.
But we need to make repositories for scientific data. Jisc have clout to make domain repositories happen!
Matt Taylor – Small Dataset Support
I want to address people at universities with an embarrassing problem. I speak of course of those with small datasets. We inadvertently emphasise huge datasets. But small datasets matter. I know a lot of you are thinking “why are you talking to me about this?”. Perhaps you have a friend or coworker who is a bit quiet, who has unconventional metadata needs. These unloved academics need a supportive hand. Not out of pity but as genuine recognition of their problem.
RedFeather is a Jisc RI project designed to afford a repository-like experience but in miniature. It can be trivially installed on any computer via a simple PHP script. It has a simplified interface and workflow, for those inexperienced with and intimidated by depositing their materials. There is support for audio, visual, PDF, documents, etc., and social media tools let you spread enticing rumours about your research. And RDF and JSON allow you to spread machine-readable versions of your work to the world. And you can customise, reskin, and make it work for you. It is used by data.ac.uk, for instance. RedFeather allows those poorly endowed with data to maximise its use.
It’s not the size of your dataset that counts, it’s how you use it!
Sebastian Palucha – Implementing Durham E-Theses
We started with an out-of-the-box EPrints but we wanted to make it as simple as possible to use and to highly customise it with LaTeX. We added Google Analytics for full text. And we wanted to be interoperable with EThOS. The British Library does digitisation services for us so we have changed the model a bit. We store EThOS persistent IDs. There were some UTF-8 issues. Rather than update EPrints we used an XML fix. And a student question raised Creative Commons licensing as an issue for us to work out. So we have made it clear to the user how to use it. We use Google Custom Search to look across repositories.
We have a retrospective digitisation project on the go at the moment, with lots of materials to work out. And we need to ensure we comply with the EU cookie law. And then there is the repository vs real life: users want to do bulk upload. Some try to send encrypted PDFs. And now we need to look to sustainability as well.
We have plans to review our processes, to connect our CRIS, and to engage with the repository of the future as a concept as well.
Q1 – Kevin) For Peter and Matt: a similar problem from different ends of the size spectrum. Very different solutions. Discuss!
A1 – Matt) My motivation was how you can make a repository-like system that is super simple. I do teaching and learning repositories on the whole. They have much lower size and detail requirements than many other repositories. Often when working with EPrints I need a simplified solution. That was the idea of RedFeather.
A1 – Peter) If we can get people using this stuff on their own machines then that’s great. The people who can distribute are Apple and Google etc. We need to lead or they will. It’s right to put it on our own systems… Get it on the iPhone etc., that’s right, but will it happen? But when we talk long tail data… I think you need a few repositories for specialist domains in order to get it to the community.
Q2) So your idea is for those without a repository?
A2) The idea of RedFeather is that it is for those without a repository.
Open Access: Hegemonic and Subaltern – Les Carr
This came out at 45 mins when I practised it. This should be interesting!
When we think libraries we think stacks etc. Universities are huge and old, conservative and sustainable – they predate the states they are located in. Very unlike many research contexts. Ten years ago the web escaped from CERN through physics departments, research, banking. People adopted stuff for open physics collaboration.
This wasn’t the first time someone tried this… In the early twentieth century there was a phone-based, library-card-supported idea that was basically Google but offline! HG Wells had the idea of microfiche allowing a collection of all research in the world. Vannevar Bush tried to do the same with hypertext. We eventually got to one, escaped from CERN, inflated to the world with ideas like openness. Not worrying about identity, IP, theft, not issues for academia…
And ten years later… Open Access comes along, defined in Hungary. The idea of opening the door to knowledge, science, data, educational resources, government data, etc. But it doesn’t suit everyone. Not everyone is a physicist! We have commercial interests and we have the academy! There are genuine interests and tensions.
Robert Maxwell was one of the first to think about commercial benefit from open materials. Damaging stuff. We publish facts in journals, we need rules, evaluation. The web changes our practice and our use of science. If we shackle that stuff because of commercial interests it spoils everything.
Our mission statement at the University of Southampton does talk about benefitting the world. But we have come to the wrong conclusion with Finch etc. We have to come to this aspect of saying: whose side are you on?
Scott Renton – Images at UoE
This is light advertising! I work with special collections at the library. I will talk about what we do and what’s coming! We have CRC collections, largely prints. Photography is our first stage, and we have a huge fancy camera. We grab the images and feed them into LUNA, which uses very high JPEG2000 compression. The workflow is complex. We use the DAMS collection management system. All images go into the collection. There is so much born-digital stuff that you can’t apply it in the way you’d want. Metadata is provided by photographers and we have cataloguing in LUNA; we are looking to crowdsource that.
Discovery and publicity of images really matters. One way to do this is via OAI. We used this with the Europeana project, in the MEMO project. We also have a Flickr presence connected up via the API. And there is a BookReader Object interface to display scans, with low-res images linked to the high-res version. We are part of LUNA Commons and would like a UK LUNA Commons too.
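As a rough illustration of what OAI-based discovery looks like from a harvester's side, here is a minimal sketch that builds a ListRecords request URL. It assumes "OAI" here means the OAI-PMH harvesting protocol; the endpoint address and the function name are hypothetical and are not taken from the talk. The `verb`, `metadataPrefix` and `set` parameters and the `oai_dc` prefix are standard OAI-PMH.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; not the real UoE address.
BASE_URL = "https://example.org/oai/request"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix: metadata format to request (oai_dc is simple Dublin Core).
    set_spec: optional set name, to restrict the harvest to one collection.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # A harvester would fetch this URL and parse the XML response.
    print(list_records_url(BASE_URL))
```

A harvester such as an aggregator would request this URL on a schedule and follow the resumption tokens in the response to page through the full collection.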
Next up will be ordering systems. Digital preservation: we embed metadata into TIFFs. Our DSpace collections have Skylight as an interface. We want some sort of e-commerce, and we have built a checkout system to sit with the images. The next version of LUNA will be more scalable, faster, and web-based, which will be good.
Better visualisations would be great, more mass digitisation, and interoperability with Voyager.
All links are in the end slide here – take a look.
Q1 – Robin Rice) Commercialisation came up in both your talks, and was also raised in the DCC Round Table…
A1 – Les) I have no problem with publishers making piles of money out of universities. Nor do I have a problem with never sites making money. But I don’t like barriers. Adding value is great. Artificially creating scarcity is not OK. My big beef is not being unable to afford the literature, it’s being unable to data mine it. So the Cory Doctorow model: free, but with enhanced versions for money too.
Q2 – Robin Rice) You mentioned high-res image sales, is there a business model?
A2 – Scott) We had to grapple with Creative Commons issues for Flickr. We share low-res images there. It shifts regularly where these business models are at… No simple solution.