What next for repositories and for UKCoRR?

Since 18th December 2014 the HE sector has been in thrall to the REF results, with those that did well clamouring about it and those that did less well cherry-picking the data. And clamouring about it. For the UKCoRR membership, however, REF 2014 is perhaps little more than a sideshow as we have long since been looking forward to the *next* REF when repositories, we are told, will really come of age. We built them expecting them to come, and while some did, many more stayed away, but from April the 1st 2016 even the most recalcitrant academic will need to be escorted to the repository gates the moment their paper is accepted for publication, or within 3 months at any rate. No, sorry Professor, it’s not an April Fool…
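To make the deposit window concrete, here is a minimal Python sketch of the date arithmetic. It assumes that "within 3 months" means three calendar months from the acceptance date, inclusive, and that the rule only applies to papers accepted on or after 1st April 2016; the function names are hypothetical and the authoritative definition is whatever the HEFCE policy document says.

```python
from datetime import date
import calendar

# Start date mentioned in the post; treated here as the first day the rule applies.
POLICY_START = date(2016, 4, 1)


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole calendar months, clamping to month end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def deposit_deadline(accepted: date) -> date:
    """Assumed deadline: three calendar months after acceptance."""
    return add_months(accepted, 3)


def deposit_in_time(accepted: date, deposited: date) -> bool:
    """True if the rule does not yet apply, or the deposit met the assumed window."""
    if accepted < POLICY_START:
        return True
    return deposited <= deposit_deadline(accepted)
```

A repository report could run a check like this over acceptance and deposit dates to flag outputs at risk, though real compliance also depends on the exceptions the policy allows.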

In addition to this primary requirement, there are other fundamental, related issues, most notably APC management and Research Data Management. With little more than a year to go, the Big Question is whether repository managers, as HEFCE’s foot-soldiers, have the infrastructure, resources and expertise to achieve full green Open Access in the UK – which is surely the implicit goal – and how the various stakeholders – UKCoRR, Jisc, Publishers, Universities – are collaborating and responding to the considerable challenge ahead.

The UKCoRR membership now stands at over 300 members representing well over 100 institutions and organisations, the majority of which are using either EPrints or DSpace, sometimes with a CRIS (PURE, Symplectic, Converis) though often without, and with a long tail of other software platforms. There are also different types of repository, just as there are different types of institution: some manage teaching and learning resources, for example, or e-theses or, increasingly, research data; some have sought to manage different content with a single platform (e.g. Hydra) while others have opted for multiple, specialised repository instances. Some research repositories – historically a minority – are full text only whereas the majority have tended to also include bibliographic metadata, a pragmatic approach that reflects the historic difficulties of encouraging academics to self-archive their work. Both EPrints and DSpace are Open Source, of course, and some Universities run and develop them in-house while others favour software as a service, outsourcing to EPrints Services for example. Each of these approaches requires specific resources and expertise.

On the UKCoRR members’ list and various software-specific mailing lists, as well as at various real-life events, I cannot be the only one who has noticed a pervading uncertainty amongst those who manage and develop these suddenly crucial University systems. This is hardly surprising given the HEFCE requirements and the range of technology, whether in place or in development: RIOXX, CASRAI, OA Monitor, Publications Router, CORE and IRUS-UK, to mention a few.

One idea that has recently emerged from the committee is that we should, as an organisation, seek to define some sort of guidance, perhaps even a “repository specification”, to help our members and their organisations to ensure that their infrastructure and advocacy are fit for purpose. There is already a wide range of relevant projects out there, notably the Jisc Pathfinder projects* – http://openaccess.jiscinvolve.org/wp/pathfinder-projects/ – and OAWAL (Open Access Workflows for Academic Librarians) at the University of Huddersfield – https://library3.hud.ac.uk/blogs/oawal/ – so perhaps this is unnecessary. Please let us know what you think.

* An example of a Jisc pathfinder project exploring this area is HHuLOA OA which has sought to create a baseline of current OA activity within institutions as a way of identifying areas that require attention. Chris Awre of the project has recently disseminated a Google spreadsheet, encouraging other institutions to add their own information, in addition to the project partners – Hull, Huddersfield and Lincoln – and which is openly shared under a CC-BY licence at the link below:


See here for a blog post on the baseline – https://library3.hud.ac.uk/blogs/hhuloa/2015/02/05/open-access-baseline-activity-tool/

6th International Open Access Week (20 – 26 October 2014)

In 2008 I was a second year doctoral student at Simmons College, Boston, MA, USA when my mentors Robin Peek and Peter Suber asked me to prepare an event at the College to celebrate the first International Open Access Day (October 14th 2008). Back then the open access movement (OA) was less well established than today and OA advocates, like myself, often faced considerable resistance from scholars. Seven years on and now that, rather than a single day, OA week is an annual and international event, I am thrilled to see that the movement has gained such momentum and that events are organised all over the world.

Almost a month ago I emailed the UKCoRR listserve membership asking about their plans for this year’s OA week. I received plenty of replies, so a big thank you to those who responded!

The main focus of this year’s events is no surprise, with the new HEFCE Open Access Policy and its implications for the post-2014 Research Excellence Framework being presented at the vast majority, targeting both compliance and deposit requirements. Other topics include general presentations on open access and the various routes to OA (e.g. green vs gold), ORCID iDs and copyright. Some institutions have also arranged subject-specific presentations, e.g. for the humanities and sciences, on how open access specifically relates to these fields. Jisc will celebrate the launch of a wonderful project, the Open Access Button, which enables users to record and ‘map’ outputs where access is restricted by a paywall, and also includes technology to source an OA version of the article (e.g. from a repository). In addition, events this year explore Research Data Management (RDM) practice, since more and more funders are mandating not only the open accessibility of research outputs, but also of the data that accompanies these outputs.

There is a wide variety of events, with UK HEIs running face-to-face events, online webinars, formal presentations and informal discussions, some addressed solely to internal delegates and others to both internal and external delegates.

Here is a list of events that were announced on the UKCoRR listserve (in alphabetical order):

I am sure that other institutions are also planning to run their own events and UKCoRR would love to know more about them. Feel free to add your event further down in the “Comments” and please include links to presentations.

Happy Open Access Week 2014!
May this year our repositories grow in full-text deposits and flourish!


UKCoRR Response to HEFCE’s Open Access Policy for the Post-2014 REF

Following their consultation in the summer of last year[1], HEFCE have released their policy on open access in the post-2014 REF process. This is the third open access policy from a major UK funder in as many years and there are a lot of reasons to be cheerful. HEFCE’s policy as published this morning is a genuine, cost-effective route to widening access to the UK’s research outputs.

Firstly I would like to commend HEFCE for their acknowledgement of the work done by the UK repository community, both institutionally based and subject based, something that has been disappointingly lacking in the other policies of its type. The UK has a (still-) growing and passionate repository community who are doing great work which has been misunderstood and poorly valued by the Finch Report in particular. The new policy from HEFCE is a chance to stand up and demonstrate the value of our services to the academic community and to the other research funders as well.

I would also like to acknowledge the commitment of HEFCE to work with the repository community to ensure that all of our systems are ready to comply with this policy by the time it takes effect, regardless of their shape and set-up. There are a number of issues and process questions that the policy leaves unresolved at the moment, and we would urge HEFCE to clarify these as soon as possible. These issues have the potential to cause resourcing problems for HEIs in terms of tracking and ensuring compliance. We look forward to discussing these process issues further with HEFCE.

HEFCE’s policy also takes a pragmatic view on the issues of licensing and exceptions.  There is a strong awareness of the complex nature of academic publishing throughout the policy.  There is also a real sense that HEFCE is trying to take into account every part of the UK academic community in a way that accepts the distinct needs of the different disciplines.  We also welcome the commitment implicit in HEFCE’s policy for researchers continuing to choose the “most appropriate” venue of publication for their outputs even if it means their options for compliance are reduced.

The suggestion of the Creative Commons Attribution-Non-Commercial-No-Derivatives (CC-BY-NC-ND) licence, the fact that requirements to allow text-mining are now missing from the policy, and the extensive list of allowed exceptions[2] make the policy practical but mean it does not go as far as many would have liked. However, they will allow institutions to meet the requirements comfortably. We need to remember that these are minimal requirements; we are always free to strive for more and HEFCE have stated that they will acknowledge the efforts of those who do[3].

This is a policy rooted in the belief that the route to open access is a long-term one and will only be achieved incrementally. HEFCE’s policy, coupled with the policies of bodies such as RCUK, the Wellcome Trust, Horizon 2020 and others, is part of the continuum of open access, and unless the underlying business models that drive this sector change we won’t ever get true or ‘libre’ open access as it is just not financially practical. We in UKCoRR have the skills, knowledge and passion to make this work and I look forward to working with HEFCE, Jisc and our researchers to do just that.

[1] UKCoRR’s Response to the HEFCE consultation has been published on this site along with our responses to other similar consultation documents.

[2] A full list of the permitted exceptions in their categories has been extracted from the HEFCE policy document by UKCoRR for the use of their members and others.

EPrints for Research Data – Workshop at University of Leeds (15/10/13)

Though the workshop was focussed specifically on EPrints software, many of the issues apply equally to other repository platforms.

Also see the event blog post at http://blog.library.leeds.ac.uk/blog/roadmap/post/184

Now that we’ve got Open Access to peer reviewed research output all sewn up (well, almost!) the next challenge is associated research data and on Tuesday 15th October the University of Leeds hosted a workshop exploring EPrints in this context. There is a storify of tweets from the event at https://storify.com/mrnick/eprints-and-research-data-collaboration-workshop

After an introduction from Bo Middleton there were short presentations from several institutions that had begun to explore the issue and Rachel Proudfoot began by describing Leeds repository requirements: why and how we chose EPrints (slides). Originally derived from the Jisc Managing Research Data pilot project RoaDMap – http://library.leeds.ac.uk/roadmap-project – Rachel emphasised that there were no real exemplars out there and as a starting point they considered the technical review of platform strengths and weaknesses from the KAPTUR project – http://www.research.ucreative.ac.uk/1239/ – and compared requirements against DataFlow – http://www.dcc.ac.uk/resources/external/dataflow – and CKAN – http://ckan.org/ – “the world’s leading open-source data portal platform”.

Michael Whitton was up next describing how Southampton are using an existing IR for data (slides) and demonstrated the customised EPrints interface with the deposit process for a dataset discrete from the standard workflow, a “minimalist” approach to metadata and the option to easily link a dataset to a journal article:

Tom Ensom from the University of Essex described ReCollect: a research data plugin for EPrints (slides); the plug-in provides “expanded metadata profile for describing research data (based on DataCite, INSPIRE and DDI standards) and a redesigned data catalogue for presenting complex collections” and is available from the Bazaar – http://bazaar.eprints.org/280/

Valerie McCutcheon, not present in person, had pre-recorded a presentation on EPrints as a data registry (recorded presentation) describing how they are working to support researchers at the University of Glasgow in response to new requirements from funders (e.g. RCUK) on how underlying research materials (i.e. data, samples, models) can be accessed – see http://www.gla.ac.uk/services/datamanagement/rdm-at-gu/ for more information on policy and support at UoG. Andrew Bell gave an overview of EPrints Services (slides) and how they might liaise with the community to prioritise development, and Balviar Notay of Jisc concluded the morning with a review of the repository landscape and service transition from the RepNet project, including the SHERPA services RoMEO and JULIET, RJ Broker, “OpenMirror”, OpenDOAR and IRUS-UK, and emphasising that Jisc are considering support for linking datasets and other outputs.

Questions that arose in the Q & A discussion included workflow implications for big data sets (e.g. multi-terabyte), as the focus so far seems to be on traditional workflow/metadata input, and minting DOIs to facilitate citation (à la figshare?)

A series of breakout groups proposed for the afternoon were introduced before lunch:

1. Access control requirements – due to confidentiality issues, commercial sensitivity etc. there is a requirement to provide some level of managed access to data sets.

2. Metadata requirements – exploration of metadata fields for a data registry (i.e. fields that could be applied to any data set).

3. EPrints gap analysis – brainstorm around RDM requirements with a view to informing an EPrints gap analysis.

4. Use cases – scenarios to inform more detailed requirements for EPrints e.g. data / user journeys during the research data lifecycle.

5. Discipline Specific Views onto Data held in EPrints (OR “Time-Signatures vs. Dynamic Viscosity”) – The multidisciplinary nature of research data at institutions poses particular challenges. In particular, how can we hope to store all the necessary discipline / project specific metadata that might be produced by research projects across large research intensive organisations? There may be scope to build a customised layer on top of a data repository to optimise how data is presented and navigated. Is this feasible? Is it desirable?


In the event, groups 3 and 4 were amalgamated and discussions were captured in Google docs, which are linked below with a short summary:

1. Access control requirements – Capture document: http://bit.ly/165rt9

In discussions on jiscmail, several institutions have expressed an interest in more granular control of access to EPrints content; some access scenarios are supported ‘out of the box’ through EPrints embargo and request button features. However, these may not be sufficient for all access scenarios: for example, time limited access.
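To illustrate what a "time limited access" scenario might involve, here is a minimal Python sketch of an access decision that combines an embargo date with per-user, time-limited grants. This is not the EPrints API or data model; the class, field and function names are all hypothetical and serve only to show the kind of rule the group had in mind.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional, Tuple


@dataclass
class DatasetAccess:
    """Hypothetical access settings for one deposited dataset."""
    embargo_until: Optional[date] = None  # None means openly available
    # user id -> (first day, last day) of a time-limited grant
    grants: Dict[str, Tuple[date, date]] = field(default_factory=dict)


def can_download(record: DatasetAccess, user: Optional[str], today: date) -> bool:
    """Open or post-embargo data is available to all; otherwise a valid grant is needed."""
    if record.embargo_until is None or today >= record.embargo_until:
        return True
    if user is None:
        return False  # anonymous users cannot hold a grant
    window = record.grants.get(user)
    return window is not None and window[0] <= today <= window[1]
```

The point of the sketch is that time-limited access needs both a start and an end date per user, which is more than a single embargo date on the record can express.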

There were differences in opinion about the pros and cons of offering ‘Registered access’ to data. Although we can encourage maximum openness as best practice (for data without commercial or ethical requirements for restriction), research data deposit is new in several subject disciplines and some level of control may be the price we pay to populate data repositories during a period of cultural change.

Licence and re-use conditions should also be considered. Some commentators questioned whether the CC0 licence is appropriate for data. Others highlighted that incompatible licences with different re-use conditions will make it difficult or impossible to combine data sets; where feasible, metadata and research data should be openly available with as few restrictions as possible to avoid licence clashes.

2. Metadata requirements – Capture document: http://bit.ly/GN4jcK

The capture document represents a ‘master’ spreadsheet; community ownership is encouraged, as is ongoing discussion of core fields and field names.

3. EPrints gap analysis – combined with group 4 below

4. Use cases – Capture document: http://bit.ly/19F2M4Q

Use case scenarios include: submit data, find data, pull data, enrich with additional metadata, export to other systems, provide data in alternative formats, visualise data, relationships between objects, provide details of reuse, usage statistics.

It was also noted that it is important to consider use cases that are out of scope. Such “anti” use cases might include large datasets, confidential data and “live” data that is continually changing.

What is missing from EPrints?

  • Grant code and auto-completion of other metadata fields from interaction with other systems; systems to interoperate with include CRIS (PURE, Symplectic), DMPOnline – https://dmponline.dcc.ac.uk/
  • During data import there should be some way to flag up if any confidential or sensitive data is being imported
  • Support for pseudonymisation for researchers that need to be identifiable, but that might need to keep their identity more private
  • Allow the user to modify access controls to data and metadata

Big Issues were identified as:

  • Security, confidentiality issues (Access control, Anonymisation/Pseudonymisation)
  • Desire to have fewer systems or systems that better interoperate to reduce the input requirements
  • Development roadmap for EPrints (What is coming and when is it likely to be?)
  • Community collaboration/development process to get EPrints to do what we want it to do

5. Discipline Specific Views onto Data held in EPrints (OR “Time-Signatures vs. Dynamic Viscosity”) – Capture document: http://bit.ly/16IvDme

Discovery function is provided for by current metadata fields but how do we provide more detailed discovery or even navigation within a dataset? How can EPrints be configured to provide more disciplinary / subject specific metadata needed for data reuse (reuse metadata) and should we do this?

There is a wide range of potential users – scientists, maybe storing their data on their own systems (why would they want to use the repository?) or arts researchers needing somewhere to store their datasets – and they have very different ways of documenting their data and searching for data.

If EPrints can’t provide this functionality, can we envisage a separate discovery layer in the architecture?

What next?

There was discussion of the potential to bring EPrints/plug-in developers and less technical repository and research practitioners together for some sort of “hack day” or mash-up event; the central message was to keep talking and collaborating across institutions and with EPrints services:


The Institutional Web Site and the Institutional Repository: Addressing Challenges of Integration – workshop at IWMW13

(Warning: includes more than a little musing on CRIS vs repositories!)

Last week I attended what is possibly the very last Institutional Web Managers’ Workshop (IWMW), and my very first, at the University of Bath. From Wednesday 26th – Friday 28th June, web developers, commercial software vendors and independent consultants came together for an excellent programme that took in everything from Open Access and Open Education, which I know a little about, to Responsive Web Design, about which I now know a little more than I did. The conference asked “What next?” which, in the current climate (TM), carries a multiplicity of implicit questions for all of us working in HE and digital technology, but is rather more explicit and personal for tens of colleagues at UKOLN, which has been decimated by cuts. I wish those who have been made redundant, many of whom I have come to know through their work in the repository space, all the best for the future.

I had been invited by Brian Kelly to deliver a workshop with UKCoRR colleague Stephanie Taylor (who is also one of those being made redundant from UKOLN) on “The Institutional Web Site and the Institutional Repository: Addressing Challenges of Integration”. Ahead of the workshop we disseminated a brief survey and I am grateful to UKCoRR, who plied me with case studies; I have put together this resource (using Xerte Online Toolkits, the excellent Open Source software from the University of Nottingham) which hopefully people will find useful.

The final plenary before the parallel sessions was from Amber Thomas, previously a programme manager at Jisc and now of the University of Warwick, talking about “Turning our attention to supporting research”. As always, Amber’s talk was insightful – in fact, hearing it as I did while thinking about my own imminent workshop, possibly seminal! As I mentioned to Amber immediately afterwards, I was struck by the gap in sophistication between what needs to happen and, frankly, what many of us are actually achieving (or, more fairly, what we are able to achieve within the strictures of our organisations and software). Amber’s slides are embedded below and you can follow the link to a recording of her talk from http://iwmw.ukoln.ac.uk/iwmw2013/video/ – Wed 26 June (afternoon: 13:55 – 15:45) – Supporting Key Institutional Drivers (direct link).

During her talk, Amber referred to the same article as was the starting point for us, “Where are university websites hiding all their research?” (Guardian, 10th January 2013):

“When scouring through university websites in search of their latest developments and projects for the launch of our new research round-up, Research in brief (RIBS), it became increasingly apparent this information was not always easily accessible – to those outside the realm of academia at least.”

Amber went on to emphasise that through activities including academic blogging, open lab notebooks, collaborative texts, crowdsourcing, citizen science, open access and public datasets a “more participatory and public scholarly discourse is emerging” and that “institutional webpages are not enough”. I can’t do Amber’s talk justice here and it’s well worth taking the time to view but the central message I came away with was that the life of research is outside the university website and we need to effectively aggregate and make sense of that activity:

Our workshop, by contrast, was starting from the premise that while institutional webpages might not be enough, they are certainly a start and with the help of UKCoRR I have assembled case studies from the Open University, the University of Glasgow, Leeds Metropolitan University, Northampton University, London School of Economics, University of Sussex and the University of the West of England which you can read more about here.

A question on the initial survey was “Do you know what repository/research management software your institution uses?” and by way of introduction to the workshop I posed the same question. Most – though not all – did know which software their institutions were running and had more or less involvement in reusing data from those systems. I had deliberately avoided the use of the term “CRIS”, assuming that it might not be familiar to academics or web developers – in any case I would argue that terminology can be problematic and that “CRIS” generally refers to specific commercial systems, with comparable infrastructure described simply as a Research Management System (RMS), perhaps based on an existing repository appropriately linked with HR and finance systems, for example.

I have suggested previously, though possibly not in public, that the dichotomy between repositories and CRIS is at best unhelpful, at worst specious and the great thing about bringing different people together is that it suddenly throws into relief the functional and semantic obsessions that I think tend to develop in any specialised community…so, UKCoRR, what do you make of this tweet during the workshop from the CMS vendor @TERMINALFOUR:

But EPrints is a repository and PURE is a CRIS I hear you cry! Yes and perhaps I should have been clearer on that point but I’m not sure the lady that tweeted it would really care and nor should she!

Research management at UK institutions increasingly comprises a combination of software ranging from DSpace and EPrints (and their ever more sophisticated ecology of plug-ins) to commercial CRIS like PURE and Converis, which may be linked to a repository or, increasingly, subsume the functionality of a repository, managing full text and other digital assets and supporting interoperability standards like SWORD and OAI-PMH. The other popular commercial software is Symplectic Elements, which perhaps shouldn’t be regarded as a fully fledged CRIS (though it can certainly become part of an integrated research management infrastructure) and needs to be linked to a repository in order to manage full text (disclaimer: this is the system we are currently implementing at Leeds Met and with which I am most familiar).
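For web developers less familiar with OAI-PMH, the following Python sketch shows roughly what reusing repository metadata looks like: building a `ListRecords` request and reading titles and creators from a Dublin Core (`oai_dc`) record. The endpoint URL and the sample record are invented for illustration, and no network request is made; the verb, the `metadataPrefix` and `from` parameters, and the two XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; check your own repository's documentation for the real one.
BASE_URL = "https://repository.example.ac.uk/cgi/oai2"

NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample of the Dublin Core payload found inside one harvested record.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An invented example article</dc:title>
  <dc:creator>Author, A.</dc:creator>
  <dc:creator>Writer, B.</dc:creator>
  <dc:date>2013-06-28</dc:date>
</oai_dc:dc>"""


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL, optionally limited by date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def titles_and_creators(xml_text):
    """Extract the title and creator values from one oai_dc record."""
    root = ET.fromstring(xml_text)
    return {
        "title": [e.text for e in root.findall("dc:title", NS)],
        "creator": [e.text for e in root.findall("dc:creator", NS)],
    }
```

A web team could feed the extracted values into staff profile or departmental pages, which is essentially what several of the case studies gathered for the workshop describe, whatever the underlying software.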

To my mind, the alleged dichotomy between these various systems fails to take proper account of the functional components of what an integrated research management infrastructure should comprise and even perhaps, to some extent, reflects the entrenched infighting between the Green and Gold Open Access camps (apologies to James Toon for capitalisation!)

It could be broken down further but in essence I think this functionality comprises 4 components:

  • Research grant and award management
  • Easy / effective workflow to capture institutional research outputs (including data)
  • Dissemination/discoverability of those research outputs (and data!) on the open web
  • Support for Open Access (both Green and Gold)

I am aware of the potential strengths of the CRIS model, especially in the context of research grant and award management, though I don’t think it affects the general argument that disparate institutional systems can be integrated.

The main difference is arguably one of scope which, in turn, is reflected in different data models: repositories tend to be based on Dublin Core metadata whereas specialised CRIS software, by comparison, supports CERIF (Common European Research Information Format) to facilitate a more sophisticated model of relationships between data, and greater flexibility when sharing local data with another institution or funding body. There is a great deal of variety across institutional research management infrastructure, however, and a specific implementation will depend on individual requirements and the underlying software.
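The difference in data models can be sketched in a few lines of Python. The first structure is a flat, Dublin Core-style record in which everything is a string attached to one output; the second is a much simplified entity-and-relationship model, loosely in the spirit of CERIF but not its actual schema. All names, identifiers and values here are invented.

```python
from dataclasses import dataclass
from typing import Dict, List

# Flat, Dublin Core-style record: the funding link is just free text.
flat_record: Dict[str, List[str]] = {
    "title": ["An invented example article"],
    "creator": ["Author, A."],
    "relation": ["Funded by Example Project"],
}


@dataclass(frozen=True)
class Entity:
    """A person, project or publication held as a first-class object."""
    kind: str
    id: str
    name: str


@dataclass(frozen=True)
class Link:
    """A typed relationship between two entities."""
    source: str
    target: str
    role: str


entities = {
    "pub1": Entity("publication", "pub1", "An invented example article"),
    "per1": Entity("person", "per1", "Author, A."),
    "prj1": Entity("project", "prj1", "Example Project"),
}

links = [
    Link("pub1", "per1", "authored_by"),
    Link("pub1", "prj1", "funded_by"),
    Link("prj1", "per1", "led_by"),
]


def related(entity_id: str, role: str) -> List[str]:
    """Names of the entities linked from entity_id with the given role."""
    return [
        entities[link.target].name
        for link in links
        if link.source == entity_id and link.role == role
    ]
```

In the flat record a question such as "who leads the project that funded this paper?" requires parsing free text, whereas the relational version answers it by following two links, which is the kind of flexibility the CRIS model is meant to offer.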

Another key difference tends to be the service which manages the respective systems, with repositories traditionally being located in the library and tending to emphasise archival, preservation and dissemination of research outputs. A CRIS, on the other hand, is likely to be overseen by the university research office with a focus on projects, proposals and other funding information. This distinction, of course, is a generalisation and the boundary is increasingly variable, especially with the renewed emphasis on Open Access resulting from the Finch report and the new RCUK policy that came into force on 1st April 2013 meaning that Open Access is likely to become more relevant for research administrators. Accordingly it is ever more important for research administrators to liaise with academic libraries, in terms of policy, expertise, software and systems.

And if anything, for me, this was the point of iwmw13. I would not typically have the opportunity to attend a conference that is arguably largely outside the narrow remit of my day job. I learnt plenty and made some valuable contacts. Like our data, people tend to be in functional silos, and it’s impossible to know what serendipity might arise until we mix it up a bit.

Aside from technical and system considerations there is an overriding need for effective communication channels amongst the full range of stakeholders at a given institution in order to more effectively integrate the disparate systems of institutional research management infrastructure.

IWMW has been running since 1997. 16 years. I hope that the community finds a way to run another.