What next for repositories and for UKCoRR?

Since 18th December 2014 the HE sector has been in thrall to the REF results, with those that did well clamouring about it and those that did less well cherry-picking the data. And clamouring about it. For the UKCoRR membership, however, REF 2014 is perhaps little more than a sideshow as we have long since been looking forward to the *next* REF when repositories, we are told, will really come of age. We built them expecting them to come, and while some did, many more stayed away, but from April the 1st 2016 even the most recalcitrant academic will need to be escorted to the repository gates the moment their paper is accepted for publication, or within 3 months at any rate. No, sorry Professor, it’s not an April Fool…

In addition to this primary requirement there are other fundamental, related issues, most notably APC management and Research Data Management. With little more than a year to go, the Big Question is whether repository managers, as HEFCE’s foot-soldiers, have the infrastructure, resources and expertise to achieve full green Open Access in the UK – which is surely the implicit goal – and how various stakeholders – UKCoRR, Jisc, Publishers, Universities – are collaborating and responding to the considerable challenge ahead.

The UKCoRR membership now stands at over 300 members representing well over 100 institutions and organisations, the majority of which are using either EPrints or DSpace, sometimes with a CRIS (PURE, Symplectic, Converis) though often without, and with a long tail of other software platforms. There are also different types of repository, just as there are different types of institution, with some managing, for example, teaching and learning resources, e-theses or, increasingly, research data; some have sought to manage different content with a single platform (e.g. Hydra) while others have opted for multiple, specialised repository instances. Some research repositories – historically a minority – are full text only, whereas the majority have tended to also include bibliographic metadata, a pragmatic approach that reflects the historic difficulties of encouraging academics to self-archive their work. Both EPrints and DSpace are Open Source, of course, and some Universities run and develop in-house while others favour software as a service, outsourcing to EPrints Services for example. Each of these approaches requires specific resources and expertise.

On the UKCoRR members’ list and various other software-specific mailing lists, as well as at various real-life events, I cannot be the only one who has noticed a pervading uncertainty amongst those who manage and develop these suddenly crucial University systems. This is hardly surprising given the HEFCE requirements and the range of technology, whether in place or in development: RIOXX, CASRAI, OA Monitor, Publications Router, CORE, IRUS-UK, to mention a few.

One idea that has recently emerged from the committee is that we should, as an organisation, seek to define some sort of guidance, perhaps even a “repository specification”, to help our members and their organisations to ensure that their infrastructure and advocacy are fit for purpose. There is already a wide range of relevant projects out there, notably the Jisc Pathfinder projects* – http://openaccess.jiscinvolve.org/wp/pathfinder-projects/ – and OAWAL (Open Access Workflows for Academic Librarians) at the University of Huddersfield – https://library3.hud.ac.uk/blogs/oawal/ – so perhaps this is unnecessary. Please let us know what you think.

* An example of a Jisc Pathfinder project exploring this area is HHuLOA OA, which has sought to create a baseline of current OA activity within institutions as a way of identifying areas that require attention. Chris Awre of the project has recently disseminated a Google spreadsheet, encouraging other institutions to add their own information alongside that of the project partners (Hull, Huddersfield and Lincoln). The spreadsheet is openly shared under a CC-BY licence at the link below:

https://docs.google.com/spreadsheets/d/1MN7Qw_wlU2LMGnlcmjzgufJZuhzL6F_ay4lhcQEGKv8/edit#gid=0

See here for a blog post on the baseline – https://library3.hud.ac.uk/blogs/hhuloa/2015/02/05/open-access-baseline-activity-tool/

6th International Open Access Week (20 – 26 October 2014)

In 2008 I was a second year doctoral student at Simmons College, Boston, MA, USA when my mentors Robin Peek and Peter Suber asked me to prepare an event at the College to celebrate the first International Open Access Day (October 14th 2008). Back then the open access movement (OA) was less well established than today and OA advocates, like myself, often faced considerable resistance from scholars. Seven years on and now that, rather than a single day, OA week is an annual and international event, I am thrilled to see that the movement has gained such momentum and that events are organised all over the world.

Almost a month ago I emailed the UKCoRR listserve membership asking about their plans for this year’s OA week. I received plenty of replies, so a big thank you to those who responded!

The main focus of this year’s events is no surprise, with the new HEFCE Open Access Policy and its implications for the post-2014 Research Excellence Framework being presented at the vast majority, targeting both compliance and deposit requirements. Other topics include general presentations on open access and the various routes to OA (e.g. green vs gold), ORCID iDs and copyright. Some institutions have also arranged subject-specific presentations, e.g. for the humanities and sciences, on how open access specifically relates to these fields. Jisc will celebrate the launch of a wonderful project, the Open Access Button, which enables users to record and ‘map’ outputs where access is restricted by a paywall, and also includes technology to source an OA version of the article (e.g. from a repository). In addition, events this year explore Research Data Management (RDM) practice, since more and more funders are mandating not only the open accessibility of research outputs, but also of the data that accompanies these outputs.

There is a wide variety of events, with UK HEIs running face-to-face events, online webinars, formal presentations and informal discussions, some addressed solely to internal delegates and others to both internal and external delegates.

Here is a list of events that were announced on the UKCoRR listserve (in alphabetical order):

I am sure that other institutions are also planning to run their own events and UKCoRR would love to know more about them. Feel free to add your event further down in the “Comments” and please include links to presentations.

Happy Open Access Week 2014!
May this year our repositories grow in full-text deposits and flourish!
Enjoy!

 

UKCoRR Response to HEFCE’s Open Access Policy for the Post-2014 REF

Following their consultation in the summer of last year[1], HEFCE have released their policy on open access in the post-2014 REF process. This is the third open access policy from a major UK funder in as many years and there are a lot of reasons to be cheerful. HEFCE’s policy as published this morning is a genuine, cost-effective route to widening access to the UK’s research outputs.

Firstly, I would like to commend HEFCE for their acknowledgement of the work done by the UK repository community, both institutionally based and subject based, something that has been disappointingly lacking in the other policies of its type. The UK has a (still-) growing and passionate repository community who are doing great work which has been misunderstood and poorly valued by the Finch Report in particular. The new policy from HEFCE is a chance to stand up and demonstrate the value of our services to the academic community and to the other research funders as well.

I would also like to acknowledge the commitment of HEFCE to work with the repository community to ensure that all of our systems are ready to comply with this policy by the time it takes effect, regardless of their shape and set-up. There are a number of issues and process questions that the policy leaves unresolved at the moment, and we would urge HEFCE to make these clear as soon as possible. These issues have the potential to cause resourcing problems for HEIs in terms of tracking and ensuring compliance. We look forward to discussing these process issues further with HEFCE.

HEFCE’s policy also takes a pragmatic view on the issues of licensing and exceptions.  There is a strong awareness of the complex nature of academic publishing throughout the policy.  There is also a real sense that HEFCE is trying to take into account every part of the UK academic community in a way that accepts the distinct needs of the different disciplines.  We also welcome the commitment implicit in HEFCE’s policy for researchers continuing to choose the “most appropriate” venue of publication for their outputs even if it means their options for compliance are reduced.

The suggestion of the Creative Commons Attribution-Non-Commercial-No-Derivatives (CC-BY-NC-ND) licence, the fact that requirements to allow text-mining are now missing from the policy, and the extensive list of allowed exceptions[2] make the policy practical but do not go as far as many would have liked. However, they will allow institutions to meet the requirements comfortably. We need to remember that these are minimal requirements; we are always free to strive for more and HEFCE have stated that they will acknowledge the efforts of those who do[3].

This is a policy rooted in the belief that the route to open access is a long-term one and will only be achieved incrementally. HEFCE’s policy, coupled with the policies of bodies such as RCUK, the Wellcome Trust, Horizon 2020 and others, is part of the continuum of open access, and unless the underlying business models that drive this sector change we won’t ever get true or ‘libre’ open access as it is just not financially practical. We in UKCoRR have the skills, knowledge and passion to make this work and I look forward to working with HEFCE, Jisc and our researchers to do just that.


[1] UKCoRR’s Response to the HEFCE consultation has been published on this site along with our responses to other similar consultation documents.

[2] A full list of the permitted exceptions in their categories has been extracted from the HEFCE policy document by UKCoRR for the use of their members and others.

EPrints for Research Data – Workshop at University of Leeds (15/10/13)

Though the workshop was focussed specifically on EPrints software, many of the issues apply equally to other repository platforms.

Also see the event blog post at http://blog.library.leeds.ac.uk/blog/roadmap/post/184

Now that we’ve got Open Access to peer reviewed research output all sewn up (well, almost!) the next challenge is associated research data and on Tuesday 15th October the University of Leeds hosted a workshop exploring EPrints in this context. There is a storify of tweets from the event at https://storify.com/mrnick/eprints-and-research-data-collaboration-workshop

After an introduction from Bo Middleton there were short presentations from several institutions that had begun to explore the issue. Rachel Proudfoot began by describing Leeds repository requirements: why and how we chose EPrints (slides). The work originally derived from the Jisc Managing Research Data pilot project RoaDMap – http://library.leeds.ac.uk/roadmap-project – and Rachel emphasised that there were no real exemplars out there. As a starting point they considered the technical review of platform strengths and weaknesses from the KAPTUR project – http://www.research.ucreative.ac.uk/1239/ – and compared requirements against DataFlow – http://www.dcc.ac.uk/resources/external/dataflow – and CKAN – http://ckan.org/ – “the world’s leading open-source data portal platform”.

Michael Whitton was up next describing how Southampton are using an existing IR for data (slides) and demonstrated the customised EPrints interface with the deposit process for a dataset discrete from the standard workflow, a “minimalist” approach to metadata and the option to easily link a dataset to a journal article.

Tom Ensom from the University of Essex described ReCollect: a research data plugin for EPrints (slides); the plug-in provides “expanded metadata profile for describing research data (based on DataCite, INSPIRE and DDI standards) and a redesigned data catalogue for presenting complex collections” and is available from the Bazaar – http://bazaar.eprints.org/280/

Valerie McCutcheon, not present in person, had pre-recorded a presentation on EPrints as a data registry (recorded presentation) describing how they are working to support researchers at the University of Glasgow in response to new requirements from funders (e.g. RCUK) on how underlying research materials (i.e. data, samples, models) can be accessed – see http://www.gla.ac.uk/services/datamanagement/rdm-at-gu/ for more information on policy and support at UoG. Andrew Bell gave an overview of EPrints Services (slides) and how they might liaise with the community to prioritise development. Balviar Notay of Jisc concluded the morning with a review of the repository landscape and service transition from the RepNet project, including the SHERPA services RoMEO and JULIET, RJ Broker, “OpenMirror”, OpenDOAR and IRUS-UK, and emphasised that Jisc are considering support for linking datasets and other outputs.

Questions that arose in the Q & A discussion included workflow implications for big data sets (i.e. multi-terabyte), as the focus so far seems to be on traditional workflow/metadata input, and minting DOIs to facilitate citation (a la figshare?).

A series of breakout groups proposed for the afternoon were introduced before lunch:

1. Access control requirements – due to confidentiality issues, commercial sensitivity etc. there is a requirement to provide some level of managed access to data sets.

2. Metadata requirements – exploration of metadata fields for a data registry (i.e. fields that could be applied to any data set).

3. EPrints gap analysis – brainstorm around RDM requirements with a view to informing an EPrints gap analysis.

4. Use cases – scenarios to inform more detailed requirements for EPrints e.g. data / user journeys during the research data lifecycle.

5. Discipline Specific Views onto Data held in EPrints (OR “Time-Signatures vs. Dynamic Viscosity”) – The multidisciplinary nature of research data at institutions poses particular challenges. In particular, how can we hope to store all the necessary discipline / project specific metadata that might be produced by research projects across large research intensive organisations? There may be scope to build a customised layer on top of a data repository to optimise how data is presented and navigated. Is this feasible? Is it desirable?

________________________________________________________________

In the event groups 3 and 4 were amalgamated and discussions were captured in Google docs which are linked below with a short summary:

1. Access control requirements – Capture document: http://bit.ly/165rt9

In discussions on jiscmail, several institutions have expressed an interest in more granular control of access to EPrints content; some access scenarios are supported ‘out of the box’ through EPrints embargo and request button features. However, these may not be sufficient for all access scenarios: for example, time limited access.
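
To make the scenarios above concrete, here is a minimal sketch of how open, embargoed, request-only and time-limited access might be combined in one decision. It does not use EPrints code or configuration; the `Dataset` and `AccessGrant` classes, their field names and the rule that an unexpired grant overrides an embargo are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AccessGrant:
    """Hypothetical per-user grant that stops working after `expires`."""
    user: str
    expires: date


@dataclass
class Dataset:
    """Hypothetical record; field names are illustrative, not EPrints fields."""
    title: str
    embargo_until: Optional[date] = None  # None means no embargo
    restricted: bool = False              # True means access by request only
    grants: tuple = ()                    # AccessGrant entries for named users


def can_download(dataset: Dataset, user: Optional[str], today: date) -> bool:
    """Return True if `user` may download `dataset` on `today`."""
    # An unexpired grant for this user always wins (time-limited access).
    for grant in dataset.grants:
        if user is not None and grant.user == user and today <= grant.expires:
            return True
    # Request-only data is never open without a grant.
    if dataset.restricted:
        return False
    # Otherwise the dataset is open once any embargo date has been reached.
    return dataset.embargo_until is None or today >= dataset.embargo_until
```

Keeping the expiry date on the grant rather than on the dataset means the same dataset can be open to one collaborator for a month and closed to everyone else, which is the case the embargo and request button features do not obviously cover.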

There were differences in opinion about the pros and cons of offering ‘Registered access’ to data. Although we can encourage maximum openness as best practice (for data without commercial or ethical requirements for restriction), research data deposit is new in several subject disciplines and some level of control may be the price we pay to populate data repositories during a period of cultural change.

Licence and re-use conditions should also be considered. Some commentators questioned whether the CC0 licence is appropriate for data. Others highlighted that incompatible licences with different re-use conditions will make it difficult or impossible to combine data sets; where feasible, metadata and research data should be openly available with as few restrictions as possible to avoid licence clashes.

2. Metadata requirements – Capture document: http://bit.ly/GN4jcK

The capture document represents a ‘master’ spreadsheet; community ownership is encouraged, as is ongoing discussion of core fields and field names.
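
As an illustration of what a check against a set of core fields might look like, the sketch below tests a dataset record for a handful of required fields. The field list is an assumption: it borrows the five properties DataCite lists as mandatory (identifier, creator, title, publisher, publication year) and is not the list agreed in the group's spreadsheet. The record and its DOI are placeholders.

```python
# Core fields assumed for illustration, borrowed from DataCite's mandatory
# properties; the workshop spreadsheet may well define a different set.
CORE_FIELDS = ("identifier", "creator", "title", "publisher", "publication_year")


def missing_core_fields(record: dict) -> list:
    """Return the core fields that are absent or empty in `record`."""
    return [field for field in CORE_FIELDS if not record.get(field)]


record = {
    "identifier": "10.1234/example",  # placeholder DOI, not a real dataset
    "creator": ["Researcher, A."],
    "title": "Example dataset",
    "publisher": "",                  # deliberately left empty
    "publication_year": 2013,
}

print(missing_core_fields(record))  # prints ['publisher']
```

A check like this could run at deposit time so that a record missing a core field is flagged before it reaches the registry.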

3. EPrints gap analysis – combined with group 4 below

4. Use cases – Capture document: http://bit.ly/19F2M4Q

Use case scenarios include: submit data, find data, pull data, enrich with additional metadata, export to other systems, provide data in alternative formats, visualise data, relationships between objects, provide details of reuse, usage statistics.

It was also noted that it is important to consider use cases that are out of scope. Such “anti” use cases might include large datasets, confidential data and “live” data that is continually changing.

What is missing from EPrints?

  • Grant code and auto-completion of other metadata fields from interaction with other systems; systems to interoperate with include CRIS (PURE, Symplectic), DMPOnline – https://dmponline.dcc.ac.uk/
  • During data import there should be some way to flag up if any confidential or sensitive data is being imported
  • Support for pseudonymisation for researchers that need to be identifiable, but that might need to keep their identity more private
  • Allow the user to modify access controls to data and metadata
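
The pseudonymisation item in the list above could be approached in several ways. One common technique, sketched here as an assumption and not as anything EPrints provides, is a keyed hash: the same researcher always gets the same pseudonym, but the pseudonym cannot be traced back without the secret key held by the repository. The `pseudonym` function, the "researcher-" prefix and the key handling are hypothetical.

```python
import hashlib
import hmac


def pseudonym(researcher_id: str, secret_key: bytes, length: int = 12) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, researcher_id.encode("utf-8"), hashlib.sha256)
    # Truncate the hex digest for readability; a longer value lowers the
    # chance of two researchers sharing a pseudonym.
    return "researcher-" + digest.hexdigest()[:length]


# Example use with a made-up identifier and key.
key = b"replace-with-a-secret-held-by-the-repository"
print(pseudonym("a.researcher@example.ac.uk", key))
```

Because the mapping depends on the key, repository staff who hold it can still work out which depositor a pseudonym belongs to by recomputing it, while public records show only the pseudonym.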

Big Issues were identified as:

  • Security, confidentiality issues (Access control, Anonymisation/Pseudonymisation)
  • Desire to have fewer systems or systems that better interoperate to reduce the input requirements
  • Development roadmap for EPrints (What is coming and when is it likely to be?)
  • Community collaboration/development process to get EPrints to do what we want it to do

5. Discipline Specific Views onto Data held in EPrints (OR “Time-Signatures vs. Dynamic Viscosity”) – Capture document: http://bit.ly/16IvDme

Discovery function is provided for by current metadata fields but how do we provide more detailed discovery or even navigation within a dataset? How can EPrints be configured to provide more disciplinary / subject specific metadata needed for data reuse (reuse metadata) and should we do this?

There is a wide range of potential users: scientists, maybe storing their data on their own systems (why would they want to use the repository?), or arts researchers needing somewhere to store their datasets. They have very different ways of documenting their data and searching for data.

If EPrints can’t provide this functionality, can we envisage a separate discovery layer to the architecture?

What next?

There was discussion of the potential to bring EPrints/plug-in developers and less technical repository and research practitioners together for some sort of “hack day” or mash-up event; the central message was to keep talking and collaborating across institutions and with EPrints services: