We are just getting started at Repository Fringe 2013. Nicola Osborne, Chair of this year’s event, has given a brief welcome and introduced Stuart Lewis to give the main welcome to the event.

Welcome from Stuart Lewis, Deputy Director of Library & University Collections and Head of Research and Learning Services, University of Edinburgh

What a difference a year makes. Last year we hosted Open Repositories 2012, a huge event, but what’s happened since? The Finch report had just come out, now huge amounts of money have been put forward for Gold open access. Even at Edinburgh University we have been given over £1 million by RCUK (from wider pots of funding for the sector) for Gold open access. But even more has been going into Green open access with prompts from the Wellcome Trust etc. And publishing is beginning to come up, we had a great Open Journal Systems workshop yesterday and really interesting discussions of the university as publisher.

Thinking back further our own repository is ten years old now. Others are a similar vintage. That gives space for reflection, many things have worked well but many think there is far greater room to make the best use of them. We had a great session yesterday on the Repository of the Future and lots of interesting ideas there.

We are here because, ultimately, we know that things could be better and we want to improve them. It is an unconference, have fun, learn, share and my hope is that we will not only do great stuff here but also go home and get things done and make a difference there. If for no other reason than next year we’ll all have new things to share at Repository Fringe 2014.

Welcome also to Jacqui Taylor, who we are very privileged to have here today. She is Chief Executive of a company called FlyingBinary but has had a very diverse professional background, from engineering to setting up BACS to her current work with the UK Cabinet Office.

Opening Keynote: Curating the Future – Jacqui Taylor, Co-Founder and CEO of FlyingBinary 

I am really delighted to be here today. We are a web science company founded in 2009. As Stuart alluded to I did help establish the BACS system and I still work in the banking industry. But we are a web science company and we are an Industry Partner to the DTC and work with

We use the science of analytics to share what we do. I am a Tech City Mentor, and I’m based in Europe’s largest start up space. We are also an Open Data supporting business, and we were delighted when we were named as one of those.

So I have three themes I want to explore, Nicola told me that this was an unconference so I’ve lots of questions for you.

One major theme for us is Privacy. The way we look at this has changed radically over the last 20 years. We do work on the ethics and trust issues around sharing. This is linked but not the same. As technologists with our web platforms we have our own agenda. But the distance between consumer and service creates a changed perspective. For instance people wouldn’t think twice about sharing medical details with a doctor – or someone who looks like one in a white coat in a hospital – but they are very concerned about sharing that data online.

And Privacy Over Time… this is an interesting… we work mainly in the Y Generation, the Millennials, the 1982 generation. In the post war period everyone knew everyone – my parents really couldn’t do anything without being observed. But the Baby Boomers have lived in a different world, privacy has become important. These Gen Xers are very much an auditory generation. But Gen Y have been born into a prosperous society, it’s no surprise that they are implicitly more open and privacy is different. Collaboration feels natural to them. We also try to get across that this generation is very visual. For us the science of visualisation and how we communicate and maximise the message is coming from that place. And then we have Gen Z. Gen Y is our web generation, they expect to consume online. Gen Z are more entrepreneurial about the web. It’s also socially, about communities being able to. Gen Z’s view of privacy is different again. Privacy versus trust is totally different. They decide using their own network to establish trust. We are starting to see Gen Z coming into Tech City already. Gen Z are very much a kinesthetic generation. So the iPad for instance is brilliant for this generation. If you see the YouTube video “the magazine doesn’t work” a child is happy with a tablet but handed a magazine they try to flip and interact and it doesn’t work. So as repository managers we really have to think about that kinesthetic aspect…

Is data quality key? No! There is not a one-size-fits-all approach for us. It’s about data purpose – what do we want to do with it. What we deploy first may not be what we want next time, we can change it. We can manipulate data to obscure quality issues if that creates a useful data set. So for instance I work in the NHS, working nationally on their data sets, on the services to be delivered. Hospital data sets are key, they set national standards. But if you look at Trust level you find pregnant men! So you have to be able to be pragmatic with data and understand the quality you are dealing with. If you can’t improve the data or its provenance you have to think about whether that limits what you can do. So in the NHS using raw data isn’t right, not just for privacy but also because it is not consistent across trusts.
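[Blogger's aside: a minimal sketch of the "pregnant men" kind of check, to make the point about knowing the quality of what you are dealing with concrete. The records and field names below are made up for illustration, they are not NHS fields and this is not the speaker's method. The idea is simply to set aside records whose values cannot both be right before deciding whether the rest is fit for purpose.]

```python
# Hypothetical records; the field names are illustrative assumptions.
records = [
    {"id": 1, "sex": "F", "pregnant": True},
    {"id": 2, "sex": "M", "pregnant": True},   # internally inconsistent
    {"id": 3, "sex": "M", "pregnant": False},
]


def implausible(record):
    """Return True when a record combines values that cannot both be right."""
    return record["sex"] == "M" and record["pregnant"]


def split_by_plausibility(rows):
    """Separate usable rows from rows that need review, without deleting any."""
    usable = [r for r in rows if not implausible(r)]
    flagged = [r for r in rows if implausible(r)]
    return usable, flagged


usable, flagged = split_by_plausibility(records)
print(len(usable), "usable;", len(flagged), "flagged for review")
```

[Flagging rather than dropping keeps the decision about purpose with whoever is using the data.]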

Aggregation can be an answer – aggregating data sets can solve some of the issues of a single set of data. We’ve done work with the Guardian. And aggregating data means you provide it, you can be the authoritative source. That’s really interesting.

But we can also use the techniques of web science approaches, such as analysis, to determine quality etc. Twitter for instance is seen as a great source but it has a real time lag, it’s hard to determine trust and provenance properly.

Heterogeneous data is an issue. If you just share data, how can you determine trust? If you publish, share, and work with data that’s a great way to establish trust. Often data released in Brazil might be checked against the Guardian data platform to see what has been said about it. It has become a trusted source about data. That’s very different to a few years back.

So here we have the Data Never Sleeps slide. This is a trusted source, this visualisation, but this isn’t reliable. We have to put trusted data out there because without it this sort of thing becomes a trusted source!

I talked about our Guardian work. We did a project with Stanford and them called the 99% project. We wanted to think about open data, how could we make that accessible and meaningful for real people. We are the 1% in the know but how could we engage the 99%, what would it take? What does that mean?

So here we have Sub-Saharan Africa Key Data – we looked at the data available for this area, looked for the story, looked for the evidence. We actually decided to (optionally – via a button) exclude South Africa as it swayed the stats so much. This was for a G20 summit. From visualising the data we really understood that this is a continent, we have to really think about the evidence – and how evidence can drive policy.

This piece of work looked at Global Military Spending across the continent. This was on arms trading data. This is kinesthetic stuff. You can show this as a big front page but then you can dig. Sharing it allowed others to comment, correct, critique, to add to the debate.

And this recent Open Data User Group project – the group was formed a year ago to look at the roadmap for open data. We are now very much supply led – here is the data (8000 data sets) – and we asked who the data is for, what the benefits of using that data would be, and what the barriers to using that data would be. Most people who have requested data this way are individuals (rather than organisations). There was a perception that quality was the big barrier but this doesn’t seem to be the case. And we also asked whether or not this data request was private – like an open access mystery shopper. Most of those data providers have the data but weren’t sharing it. The barrier in some cases was that it was paid for – and that’s hard to get round. But licensing, especially downstream licensing, is so complex – you can use it but you can’t do anything with it. So we’ve been working to debunk the myths. Just seeing the process has been useful for many of those making these requests.

At the Open Data Summit an Open Data Charter was signed by the G8. They have agreed to open their data stores by default. They have agreed to a quality and quantity principle. To release early and see if it’s right for purpose. And to make data open whenever possible. And we will bring providers and users together. And a key thing here is that they will release it for innovation. So we have user groups we are setting up. We work with Southampton, Oxford and Cambridge, and I know that academia and research needs to be in here. We are working with the UK Cabinet Office to get data out there, with a refreshed platform – the National Information Infrastructure. We want research data in that mix. It’s being led by Business, Innovation and Skills. And we want people to give us feedback, to look at our beta platform, in addition to being able to be part of the Open Data User Group – and we will tweet links to that application – we also want that feedback from this community. This community is key. I see the repository and research part of this being right at the front of what’s taking place. Let’s make sure that we have people with the knowledge and domain specialism to bring here.

When I talk about open data people see risks. But the G8 have signed that charter to show their commitment. We don’t have a way to stop accidental or deliberate misuse of the data but we want to get the right parties engaged to get to that place. We want to make sure there is quantity, quality but also with knowledge of the impact and how we manage that – anonymising or pseudonymising data for research perhaps.
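[Blogger's aside: a rough sketch of what pseudonymising data for research could look like, assuming a keyed hash (HMAC-SHA256 from the Python standard library) stands in for the direct identifier. The key, the field names and the 16-character truncation are all assumptions for illustration, not anything described in the talk. Pseudonymised data can still be re-identified by whoever holds the key or by linking other fields, so this is not the same as anonymising.]

```python
import hashlib
import hmac

# Assumption: a secret key held only by the data owner, never released.
SECRET_KEY = b"replace-with-a-secret-held-by-the-data-owner"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated only for readability


# Hypothetical rows; "patient_id" and "visits" are illustrative names.
rows = [
    {"patient_id": "A123", "visits": 3},
    {"patient_id": "B456", "visits": 1},
    {"patient_id": "A123", "visits": 2},
]

# Release the pseudonym in place of the identifier.
released = [
    {"pseudonym": pseudonymise(r["patient_id"]), "visits": r["visits"]}
    for r in rows
]

for row in released:
    print(row)
```

[The same person gets the same pseudonym each time, so records can still be linked for analysis without the original identifier being published.]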

And there are social considerations for us – and for you and in repositories – we grapple with those risks of the social impact. In Cambridge the genetic researchers grapple with those social impact considerations. I would ask what is the repository approach to the social impact of what we do and what that means.

Our approach is to work together no matter what discipline. We tend to take a “shared services” approach. We’ll have data scientists, web scientists, CFO, operations people, we’ll blend the skills for a piece of work. We tend to call those SWAT teams, federating people to the problem in very agile and collaborative ways. It’s very true that in a lot of contexts it’s about the community first. Community is the forefront of work in Tech City. We are doing some work with flooding at the moment, that’s so much about how that community can engage, what motivates them, and how to build trust. All that trust and privacy around the data is where we come in. We build the tools and platforms to solve the problem in an interoperable way – often tailoring generic stuff for specific contexts.

And another aspect I wanted to draw on was the Data Protection Act. It has been part of the barriers in some cases. We almost didn’t get a banking service live four years ago because lawyers wouldn’t sign the contract. The law is offline law. In our online web enabled society it’s not quite right. It can be used as a barrier. As we build out our web world, the law has to be built out. So consider the legal context of the work, and also I would ask that you consider where the law is a barrier. I have a group of what I call “agile lawyers” who are really considering where the law needs to move. I would encourage you to look at the legal implications of your research and bring that to the debate.

Does technology matter? Well what we do has to be technology agnostic. From a cloud engineering point of view we are looking at private cloud deployment with some open data. Really we should be considering public cloud. In some parts of the world community cloud is taking off, and we should be doing that too.

I talked about the 99% project, when we were deploying to the cloud. We didn’t know if people would care. We saw a million people interacting very quickly. Within 2 hours of visualising Wikileaks data 34 million people had interacted. If it is compelling, if it tells a story, people really want to engage.

So in the future… as fibre is laid. As mobile first is the internet access. We’ll get another 5 billion voices online. We will hear more views and perspectives. We’ll get more languages, more tonality, more diversity. Right now 20% of the world is online, predominantly English speaking, but things are about to change a lot.

I want to share some UK Cabinet Office work we’ve been doing. A wee video to show you and hopefully inspiring. It’s about how 58 different nations have been collaborating and cooperating.

[We are now watching the video, link to follow]

So the Open Government Partnership is working with 58 countries moving forward with that open agenda. The question for you is how can you help that move forward? And I would welcome your participation.


Q1 – Les Carr) I’m very interested in what you said about supply side versus demand side production of data. We have thought that in repositories and open access we have concentrated on the supply side. As we get into scientific data and all the issues around that I’d be very interested in following through some of those thoughts about facilitating how the demand questions will have an impact. In our sector impact, the use of material, is really important. Do you have any lessons you think are transferable from Government Open Data?

A1) Very definitely. When we started the work with Government Open Data there were all these myths about what people needed and wanted. But no one had checked or asked. We started by asking the open data community – we had 5000 people interacting. It reached out beyond that community. We did that in a very controlled way but to understand “you know what we’re doing, what do you want from that” is so important. Not making financial arguments but making it about what could be done. And doing that with research, and declaring data via the NII – what is open and what is not – will be very interesting. We have the chance to get the community to really articulate what they need and why. We are trying to pull that philosophy into NII and research is definitely part of that. It needs to connect back to you all. We only potentially have a single member able to do something, but how do they reach back and get input from the whole research community? Understanding demand should be entirely interchangeable.

Q2 – Peter Burnhill) A couple of comments and then a question. To follow up on the demand side. I think we were aware when we met last year at OR2012. To my mind in the academic world we’ve had a small group of data libraries who have been used to the demand for secondary analysis, people with questions looking for data. I think that process and thinking has to be brought into this. Within the academic world we are interested in the demand. But we have a genuine need to serve the purposes of academics, social scientists etc. And care for what is released and how, to maintain appropriate access. There is opportunity to do this. I think there may be a real opportunity for a joint conference here. I think your intent was to say that visualisation matters, presentation matters, and we need to have some critique on that. The science of visualisation has been about summarising data sufficiently. We have to bring that in to know how to present that properly. My question or criticism comes with your assumption that Gen X or Gen Y or Gen Z has substance, but that’s come under a huge amount of criticism. I would suggest you look at Lynn Connaway’s work that really critiques that. It’s not necessarily an age thing, it comes in there but it’s how people transfer and their perspective. Whether they live in internet world or whether they just pop in to shop/travel etc. I would ask you to relook at some of those presumptions.

A2) I agree with you entirely. That 99% two year project. We have analytics around that. The headline story there about Afghanistan Wikileaks data – that the incidents are all the road to Basra. People interact for 8 to 10 minutes. If the user was in the Western world their next move was to look at the Afghan incidents. And we could build that stuff in. That visual stuff is in Gen Y, the kinesthetic stuff is in Gen Z. Now that is a spectrum and you can sit in various places. Gen Y is now the majority of the workforce, and that’s why data journalism has grown – I see this as science. It’s easy to make a data visualisation that means nothing. I want this to be seen as a tool to communicate at large on a platform for data. But it is a meaningful tool. And the NII is about making sure the data can be interacted with, not just by academics or web scientists. Open access will need to facilitate that generation of people that will explore and reuse data. So this tool is one we have used. But the origins are back from where you talked about.

Q2 – Peter Burnhill) But with lots of cartographers… we have that concept of garbage in, garbage out – you get pretty pictures from GIS but do they have real meaning? Pictures are fine but are they science? You need to push on the science and the issue of ensuring there is meaning. The Ecological Fallacy I would ask you to look at.

A2) We were very careful that this tool had a button to download the data and to reupload combinations of data. There are real data conversations here, critiquing the data, really interacting. That’s where I think you all, as a group, really come in. There are opportunities with the open charter to think and reexamine what you do, see what opportunities there are.


