Repository Professionals: The Next Generation

The internet is basically a teleportation device for information [citation needed], and, like the original Star Trek series, whose technology may have aspired to be futuristic but is very firmly rooted in a 1960s aesthetic, repository systems are still using technologies and protocols from the early days of the web (COAR 2016).

Spock and Kirk 1968
© Public domain. Image source: Wikimedia Commons

In April 2016 the Confederation of Open Access Repositories (COAR) launched a working group focussed on Next Generation Repositories, and as the 9th International Open Access Week rolls around it’s another chance to take stock of the repository landscape and its mission to boldly promote open access; the recent discussion around this is captured by Richard Poynder and Kathleen Shearer of COAR.

Equally important as the technology, if not more so, are those on the bridge and in the engine room, who increasingly need a professional skill set whose breadth and depth rivals anything required by Starfleet: from traditional librarianship to web science, from a hundred and one technical protocols to an arcane realm of policy edicts from universities, research funders and government. We even have our own Borg in the form of the commercial publishing industry, ever more efficient at assimilating the infrastructure and co-opting the language of open access. As a case in point, the publishing giant Elsevier, which acquired Mendeley in 2013 as well as [Atira] Pure (CRIS software) in 2012 and, more recently, SSRN in 2016, now runs a Mendeley Certification Program for Librarians as it seeks to lock researchers and their librarians, Facebook-like, into its ecosystem. A particularly jarring example of corporate hubris, even by its standards.

For this year’s Open Access Week, then, we want to know: what do you think UKCoRR’s role should be in nurturing the next generation of repository professionals?

As UKCoRR member Jennifer Bayjoo argued recently in her paper Getting New Professionals into Open Access at the Northern Collaboration Conference, OA and repositories are still not a priority in many CILIP-accredited professional library and information management qualifications. CILIP assesses courses against its Professional Knowledge and Skills Base, which contains just one reference to Open Access, buried in point 7.3 ‘Selection of materials and resources’ (and which is only accessible to paid-up members of CILIP, in stark contrast to Elsevier’s ‘freemium’ model for Mendeley).

It is also instructive to consider the types of job that have been posted to the UKCoRR list, which increasingly focus on a broader range of skills than the traditional ‘Repository Manager’, with a growing emphasis on research data management, for example. Of the 16 roles posted to the list in 2016, only 2 explicitly mention the word ‘repository’ and just 1 ‘librarian’:

Research Repository Data Administrator
Research Publications Officer
Research Data Management Advisor
Research Data Support Manager
Copyright and Scholarly Communications Manager
Research and Scholarly Communications Consultant
Open Access & Research Data Advisor
Manager of the Institutional Repository
REF and Systems Manager
Research Data Adviser
Research Publications Manager
Research support librarian
Research Publications Officer
Research Data Officer
Research Publications Assistant
Open Access Officer

The most common perspective on the value of UKCoRR seems to be our supportive community, which is largely self-sustaining via the email list. Do we need to do anything beyond this?

What is our role in liaising with other organisations like Jisc, CILIP or ARMA?

Might you be willing to share your expertise via an informal mentorship scheme for example?

With these issues in mind, we have put together a very short survey and would like your help to identify the skills and knowledge that future Open Access professionals should have.

As Captain Jean-Luc Picard might have said to send his (much more modern) Starship Enterprise to warp speed: “Engage!”

Technology words for repository managers

Posted on behalf of Nancy Pontika, UKCoRR External Liaison Officer and Open Access Aggregation Officer for CORE 

The role of the repository manager is constantly evolving. The repository manager of today needs to be aware of, and able to interpret, not only their institution’s open access policies but also the national and international ones that emerge from public funding agencies. The proliferation of these policies introduces technical requirements for repositories – the use of current research information systems (CRIS) and the installation of various plug-ins, for example – and repository managers often serve as an intermediary between their institution’s IT department and their supervisors or library directors, communicating messages and requests between the two. A couple of months ago on the UKCoRR members list we had a discussion around the specific technological terms that repository managers hear regularly.

Even though I am not currently a repository manager, I couldn’t help but sympathise. For the past year I have been working for CORE – a global repository harvesting service – as the only non-developer in a team of four (wonderful!) developers. As a result, there are times when I feel lost in our discussions, so I decided to put together a list of often-used technical terms that relate to repositories. As a first step, a Google Spreadsheet was created with some basic terms. The UKCoRR list members were then asked to add more of these jargon words and to rate each term based on how often they tend to hear it. (I am going to keep the list there for future reference; do not hesitate to save a local copy if you find it useful.) In the end, with the help of two CORE developers – Samuel Pearce and Matteo Cancellieri – we tried to provide brief definitions and give simple examples where possible.

The following table contains a list of these terms and their definitions. Since this is not an exhaustive list (and it was never meant to be) feel free to add other terms in the comments area of this blog post.


Web technologies
Apache Apache is a web server. When your browser, such as Internet Explorer or Google Chrome, requests a website, Apache is the software that returns the webpage to your browser.
Tomcat Tomcat is a web application server. Tomcat works like a web server but serves more complex pages and operations. For example, your online banking system uses a web application server, while this blog uses a web server.
Java A programming language that usually runs in Web Application Servers (such as Tomcat). CORE uses Java for running the harvesting of the repositories.
PHP An Open Source scripting language particularly suited for website development. It is commonly installed with Apache, which allows web pages to be more complex without having to run a separate web application server such as Tomcat. For example, CORE uses PHP in its web pages.
robots.txt A text file that specifies how a web server regulates access by automatic content downloaders. For example, CORE follows the rules in the robots.txt file. The rules may limit the number of requests per second made to your web server or restrict access to certain places on your website, such as a login page.
SSH (Secure Shell) A protocol that allows one computer to connect to another and send commands by typing in text rather than clicking buttons.
MySQL MySQL is an Open Source Database Management System owned by the Oracle Corporation.
Perl A programming language usually used for scripting and text processing.
JavaScript A programming language that usually runs in your browser to allow web pages to be more dynamic and reactive. Web forms may use JavaScript to ensure they are filled in correctly before submitting them.
Crawler A crawler is a program that automatically visits web pages and processes them. A common example is Google, which crawls websites, extracts content and makes it available via its search engine.
Cron jobs Programs that are set to run at specific times. For example, they are used for periodic tasks such as running automatic updates every day at midnight or extracting and processing the text from your full-text outputs in your repository to make them searchable.
dev site A website used for testing. This allows developers to test and process information without the risk of breaking the “live” production website.
Git A version control system, like Subversion (SVN). It enables tracking changes in code.
SVN/Subversion “SubVersioN” – a version control system, like Git. It enables tracking changes in code.
clone A command in Git that copies code from a remote server to a local machine.
UNIX An operating system analogous to DOS, Windows and Mac OS. Nowadays, Unix refers to a group of operating systems that adhere to the Unix specification. Examples of Unix-based operating systems are Linux and Mac OS.
LINUX An operating system based on Unix. The Linux code is open source, allowing anyone to modify and distribute the software and source code, creating different variants of ‘Linux’. The most popular versions of Linux are Ubuntu, Red Hat, Debian and Fedora.
HTTP proxy An HTTP Proxy is a gateway for users on a network to access the internet. This allows large organisations to track internet usage and also limits the amount of downloaded data by storing it within the proxy. The next time the same website is requested, the local copy is sent to the user rather than re-downloading it.
External resolver An external resolver service (such as the DOI® System or HDL.NET®) allows a digital object, such as a research output, to have a unique global identifier.
Mirrors A Mirror is a copy of another website. An organisation may mirror a website to reduce traffic and hits to the source website.
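To make a couple of the terms above concrete, here is a short, illustrative Python snippet showing how a well-behaved crawler checks a site’s robots.txt before fetching anything. The repository URL and the rules are placeholders, not a real site or CORE’s actual code.

```python
# Check whether a crawler may fetch a page, using the rules in a
# site's robots.txt (the URL below is a made-up example).
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Normally we would call rp.set_url(...) and rp.read() to fetch the
# live file over HTTP; here we parse an example robots.txt directly.
rp.parse([
    "User-agent: *",          # these rules apply to all crawlers
    "Disallow: /cgi/login",   # keep crawlers away from the login page
    "Crawl-delay: 5",         # at most one request every five seconds
])

# A harvester would make checks like these before each request:
print(rp.can_fetch("*", "http://repository.example.ac.uk/cgi/login"))    # False
print(rp.can_fetch("*", "http://repository.example.ac.uk/id/eprint/1"))  # True
print(rp.crawl_delay("*"))                                               # 5
```

The same `RobotFileParser` class is part of the Python standard library, so repository staff can use it to test how their own robots.txt will be read by harvesters.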
Metadata Protocols
OAI-PMH OAI-PMH, (Open Archives Initiative Protocol for Metadata Harvesting) is a standard for exposing metadata in a structured way – particularly for computers to understand.
SWORD SWORD (Simple Web-service Offering Repository Deposit) is a protocol that simplifies and standardises the way content is deposited into repositories.
Data access
API An API (Application Program Interface) is a set of rules that defines how parts of software or two separate programs interact with each other. For example, the CORE API allows developers to use CORE’s data from within their own applications.
Widget A small application with limited functionality that runs within a larger application or program. The CORE Similarity Widget retrieves similar articles based on metadata and runs within the larger application of a repository.
Plugin Similar to a widget, a Plugin adds extra functionality to software. This may add new features or change the way an existing feature works.
Text mining The process by which high-quality data is extracted from text using a computer.
Data Dumps One or more files that contain a large set of data.
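As a small illustration of OAI-PMH from the table above: a request is just an HTTP GET with a ‘verb’ parameter, and the repository replies with XML. The sketch below builds such a request and parses a minimal, hand-written response; the base URL and the record identifier are hypothetical.

```python
# Sketch of how an OAI-PMH harvester asks a repository for records.
# The base URL is a placeholder; real repositories expose an endpoint
# whose exact path depends on the repository software.
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://repository.example.ac.uk/oai"  # hypothetical

# An OAI-PMH request is an ordinary HTTP GET with a 'verb' parameter:
params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
request_url = BASE_URL + "?" + urlencode(params)

# The repository replies with XML; a minimal, truncated example:
response = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Pull out the record identifiers from the structured metadata:
ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
root = ET.fromstring(response)
identifiers = [el.text for el in root.findall(".//oai:identifier", ns)]
print(request_url)
print(identifiers)  # ['oai:example:1']
```

This is exactly the kind of exchange a harvesting service carries out repeatedly, following the repository’s robots.txt rules and resumption tokens for large result sets.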


CRIS and retirement of repositories?

Recently there was a discussion on the UKCoRR mailing list around whether institutions implementing CRIS (Current Research Information System) like Pure (formerly Atira, now owned by Elsevier) might retire their repositories in favour of a single system.

Other CRIS systems include Converis (now part of Thomson Reuters) and Symplectic Elements (part of Digital Science’s portfolio). N.B. Elements is more a publications management system than a full-blown CRIS and, unlike Pure, does not manage files in its own right; it needs to be integrated with a traditional repository, so retirement is not an option.

As an addendum, it is worth noting that all three systems are owned by well-known commercial organisations operating in academic publishing and scholarly dissemination*, diametrically opposed to the Open Source credentials of the EPrints and DSpace repositories, still by far the most popular systems for managing open access research in the UK and globally, whether in conjunction with a commercial CRIS or as part of a homebrew institutional one.

* see the related discussion from Stevan Harnad, with his typical candour:
Elsevier’s PURE: self-interest and exploitation

The responses are presented below, summarised by Dimity Flanagan from the London School of Economics; sources have been anonymised:

Part One: Retiring

– Retiring the repository for the launch of Pure next year

Part Two: Future unclear

– In the process of purchasing a CRIS. At this stage it is unknown what will happen to the EPrints repository. Will it become a second repository for non-REF-eligible items? The institution is planning to expand its research activities, hence the CRIS purchase. The librarian predicts the repository (which is quite small) will eventually be phased out.
– Has the research portal and eprints. Considering retiring eprints as they are largely duplicate systems.
– Likely to retire the EPrints repository and use the PURE Portal as the publications repository – purely because the technical support is not available to integrate the two systems. Believes the Portal can fulfil the university’s needs. As the Research Division will use Pure, it will be easier to work with them.

Part Three: Pro repository (for now)

– Has a Pure portal with metadata but has integrated the system with the DSpace repository which is where full text can be accessed. This is kept under review – the connection does cause technical problems but want to be sure that the Portal can offer everything.
– Our take has always been that the repository is for much more than articles, hence it has inherent value in itself as a means for managing the structured digital collections of the University.
– Do not have a commercial CRIS. The research system, finance system and repository are linked and cover all functionality.
– Our decision has been to continue our EPrints repository in parallel with PURE, and at the time of writing I cannot foresee us retiring it. The arguments surrounding this decision are horses for courses for each institution, but I think it is important to be clear that having PURE’s front-end product (known as the Advanced Portal) is not the same as having an EPrints or DSpace repository, nor does it offer comparable functionality. E.g.:

1. Maximising discovery potential of research content (The Pure Portal is not properly supported by Google Scholar, for example – but also suffers other discovery impediments)
2. Exposing content easily via OAI-PMH (PURE offers clunky/limited OAI-PMH support)
3. Undertaking digital preservation or curation activities and/or ensuring persistent access to research outputs (PURE offers zero features in this area, inc. no linkage with Arkivum…)
4. Using mainstream repository protocols such as SWORD or participating in many mainstream repository developments (PURE offers limited or zero support….)
5. Ensuring compliance with metadata applications profiles, such as OpenAIRE, RIOXX v2, etc. (Limited compliance with prevailing metadata application profiles…)
– Kept both (but not connected), the logic being that it would be good to have a service to meet the broad university requirement for capturing research information (the CRIS) AS WELL AS having a set of services based around a repository infrastructure (Dspace in our case) that could be implemented as required for focused purposes.


The EC FP7 Post-Grant Open Access Pilot: An Attempt to Implement Fair Gold Open Access

Guest post by Pablo de Castro, Open Access Project Officer, LIBER

A new Gold OA funding initiative was launched earlier this year by the European Commission: the FP7 Post-Grant Open Access Pilot. This initiative, which is being implemented under the OpenAIRE2020 project, will use its EUR 4m budget to fund OA publishing fees for publications arising from over 8,000 completed FP7 projects covering the whole European Union and neighbouring countries – a large subset of which feature British institutions among their partners.

This post-grant Pilot aims to test the need for additional funding once the FP7 projects finish and their grants are over. Publications will very often arise after the project end-date, so this is an instrument for supporting the Open Access publishing of research articles and monographs for which the projects and their researchers may no longer have any available budget.

The biggest challenges the project team have identified thus far lie in the sheer geographic coverage of the initiative and in the radically different approaches to Gold Open Access that co-exist across Europe. The scale issues will be addressed by relying on the comprehensive network of professionals, institutions and best practices in Open Access implementation and dissemination that OpenAIRE provides. The Pilot partner network – which includes the Jisc in the UK, SURF in the Netherlands, the University of Göttingen in Germany and the University of Athens in Greece, under the coordination of LIBER in The Hague – will also contribute to meeting the need to reach out to eligible researchers and projects across the whole Continent.

The second big challenge involves the need for this Pilot to operate under the three main scenarios for Gold OA implementation identified by the network of institutions taking part in the project kick-off stage, as described in the slide below, which was presented and discussed last June at the Gold Open Access Pilot workshop at the LIBER Annual Conference in London.

Gold OA funding landscape

In order to deal with this fragmented funding landscape, the Pilot will try to promote a specific brand of fair Gold OA that aligns with the highest possible number of funders’ policies across the Continent. There is a well-acknowledged divide in the way Gold OA is being presently implemented in Europe: while UK funders like RCUK or the Wellcome Trust/COAF will fund APCs in both fully Open Access and hybrid journals applying no funding cap to their eligibility criteria, European funders like the German DFG, the Norwegian Research Council, the Austrian FWF and the Dutch NWO will all either rule out or restrict funding for hybrid journals and often establish funding caps.

The eligibility criteria for this FP7 Post-Grant OA Pilot therefore exclude funding for hybrid journals and set a €2,000 funding cap for research articles and a €6,000 one for monographs and edited volumes. Besides this, the policy requires a CC-BY licence, a text-minable file version besides the standard PDF one, and deposit of the publication in an OpenAIRE-compliant repository. This APC-funding policy is coupled with an alternative APC-equivalent funding mechanism that also aims to fund APC-free OA journals, as a means of addressing the wider scope of Gold Open Access, which does not just involve the APC-based business model.
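The headline funding rules above can be boiled down to a simple decision: reject hybrid journals outright, then pay the requested fee up to the relevant cap. The sketch below is purely illustrative – the function and field names are our own invention, not part of any official Pilot tooling, and it ignores the other criteria (CC-BY licence, text-minable version, repository deposit).

```python
# Illustrative sketch of the Pilot's headline eligibility rules:
# no hybrid journals, and caps of EUR 2,000 for research articles
# and EUR 6,000 for monographs and edited volumes.
CAPS_EUR = {"article": 2000, "monograph": 6000}

def fundable_amount(output_type, requested_fee_eur, is_hybrid_journal):
    """Return the amount the Pilot would fund in EUR, or 0 if ineligible."""
    if is_hybrid_journal:
        return 0                              # hybrid journals are excluded
    cap = CAPS_EUR[output_type]
    return min(requested_fee_eur, cap)        # fund up to the cap

print(fundable_amount("article", 1500, False))    # 1500 (within the cap)
print(fundable_amount("article", 2500, False))    # 2000 (capped)
print(fundable_amount("monograph", 5000, True))   # 0 (hybrid, so rejected)
```

The last case is exactly the scenario described below: applications for articles in hybrid journals are systematically turned down regardless of the amount requested.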

Thorough discussions of this policy, both within the project partner group and with an extensive selection of external reviewers, highlighted the severe risk of underspending that ruling out funding for hybrid titles would pose, and in fact the results of the first two months of Pilot operation show this as a looming threat to the initiative. However, the project has only just started and the summertime season means there is less activity; data in the next report, due at the end of September, will surely look more encouraging. There is in any case a clear need for institutions to support the dissemination of the initiative to their eligible researchers and projects and to help them navigate the potentially obscure policy requirements: despite the strong recommendation in the policy guidelines that authors should check with their institutional Library or Research Office before submitting a funding request, plenty of funding applications are arriving for articles accepted at hybrid journals, which are systematically turned down, much to the authors’ occasional dismay. The Pilot coordination – which may be reached here (email) – is keen to support the outreach work at institutions by delivering lists of eligible FP7 projects where useful or by providing any additional assistance – a support service that the Jisc will also be able to provide as OpenAIRE National Open Access Desk (NOAD) for the UK.

Should this initiative manage to safely clear the perilous strait between the Scyllean Green Open Access that OpenAIRE is understandably very keen on and the Charybdian threat of the often ill-informed dismissal of any kind of Gold OA as commercial-publisher-friendly and prone to double-dipping, this would be very good news for the Open Access community. Its success would mean a way to implement a reasonably well-aligned policy that could eventually drive down APC costs by promoting a cultural change in publishing habits. The opportunity UKCoRR have kindly offered to further disseminate this funding initiative to institutions – where the mechanisms are often already in place to provide researchers with the support they require – is greatly appreciated.

Results of the Sherpa FACT accuracy testing – 95% accurate

UKCoRR welcomes the results of a recent exercise – undertaken by UK librarians, repository managers and Sherpa Services – which has shown that the results produced by SHERPA/FACT (Funders & Authors Compliance Tool) have an accuracy rate of over 95%.

The FACT service was developed to help researchers get a simple answer to the question “does this journal have an open access publishing policy compliant with my funder’s open access mandate?”. The FACT service – which draws its information from the SHERPA/ROMEO and SHERPA/JULIET databases – seeks to provide a yes/no answer to this question, as well as providing information about how an author can comply with a funder policy.

There had been some discussion at the SHERPA/FACT Advisory Board – raised by UKCoRR members and their institutions – as to whether the information provided by FACT was accurate.

To address this issue, an exercise was undertaken by members of UKCoRR to manually check a statistically significant number of journal/funder combinations and then compare the information this group had found with the information provided by FACT. Where the independent reviewers arrived at a different conclusion from that provided by FACT, that journal/funder combination was subjected to detailed and exhaustive investigation to arrive at an evidenced answer.

At the end of this exercise, it was found that the FACT service provides correct information in over 95% of cases.

The study clearly highlighted the difficulties that even highly experienced repository staff have in deciphering publisher OA policies. Indeed, the initial testing undertaken by UKCoRR members suggested that FACT was only accurate on 57% of occasions. When these journal/funder combinations were investigated further, however, close examination of the often complex conditions and the interactions between different statements and policies showed that FACT was correct in almost all of the cases.

The SHERPA/FACT team and the SHERPA/FACT Advisory Group would like to extend their thanks to the UKCoRR members who took part in this checking process, both for their time commitment and for their extensive knowledge of this area of work. This exercise has shown that the SHERPA/FACT service can be relied upon as a source of advice for UK researchers. UKCoRR also encourages its members to continue to communicate with SHERPA/FACT where discrepancies are found, to continue to improve the quality of the information SHERPA/FACT relies upon.

Issues of interpretation, and the interaction between the various policies, have proved to be key to the discrepancy between our manually checked results and Sherpa’s findings. There is further work to be done here, and UKCoRR looks forward to continuing to work with the SHERPA/FACT Advisory Group to develop increased clarity in this area.

To see the full data and study methodology, visit Figshare.

The study was commissioned by the SHERPA/FACT Advisory Board – which includes representatives from UKCoRR, Jisc, Wellcome Trust, Research Councils UK (RCUK), CRC Nottingham, Association of Learned & Professional Society Publishers (ALPSP), Higher Education Funding Council for England (HEFCE), Publishers Association and SCONUL.

A blog post from Jisc on this project is available as well as a press release on the project.