Tag Archives: CORE

What next for repositories and for UKCoRR?

Since 18th December 2014 the HE sector has been in thrall to the REF results, with those that did well clamouring about it and those that did less well cherry-picking the data. And clamouring about it. For the UKCoRR membership, however, REF 2014 is perhaps little more than a sideshow as we have long since been looking forward to the *next* REF when repositories, we are told, will really come of age. We built them expecting them to come, and while some did, many more stayed away, but from April the 1st 2016 even the most recalcitrant academic will need to be escorted to the repository gates the moment their paper is accepted for publication, or within 3 months at any rate. No, sorry Professor, it’s not an April Fool…

In addition to this primary requirement, there are other fundamental, related issues, most notably APC management and Research Data Management. With little more than a year to go, the Big Question is whether repository managers, as HEFCE’s foot-soldiers, have the infrastructure, resources and expertise to achieve full green Open Access in the UK – which is surely the implicit goal – and how the various stakeholders (UKCoRR, Jisc, publishers, universities) are collaborating and responding to the considerable challenge ahead.

The UKCoRR membership now stands at over 300 members representing well over 100 institutions and organisations, the majority of which are using either EPrints or DSpace, sometimes with a CRIS (PURE, Symplectic, Converis) though often without, and with a long tail of other software platforms. There are also as many types of repository as there are types of institution, with some managing teaching and learning resources, for example, or e-theses or, increasingly, research data; some have sought to manage different content with a single platform (e.g. Hydra) while others have opted for multiple, specialised repository instances. Some research repositories – historically a minority – are full text only, whereas the majority have tended to include bibliographic metadata as well, a pragmatic approach that reflects the historic difficulty of encouraging academics to self-archive their work. Both EPrints and DSpace are Open Source, of course, and some Universities run and develop them in-house while others favour software as a service, outsourcing to EPrints Services for example. Each of these approaches requires specific resources and expertise.

On the UKCoRR members’ list and various software-specific mailing lists, as well as at various real-life events, I cannot be the only one who has noticed a pervading uncertainty amongst those who manage and develop these suddenly crucial University systems. This is hardly surprising given the HEFCE requirements and the range of technology, whether in place or in development: RIOXX, CASRAI, OA Monitor, Publications Router, CORE and IRUS-UK, to mention a few.

One idea that has recently emerged from the committee is that we should, as an organisation, seek to define some sort of guidance, perhaps even a “repository specification”, to help our members and their organisations ensure that their infrastructure and advocacy are fit for purpose. There is already a wide range of relevant projects out there, notably the Jisc Pathfinder projects* – http://openaccess.jiscinvolve.org/wp/pathfinder-projects/ – and OAWAL (Open Access Workflows for Academic Librarians) at the University of Huddersfield – https://library3.hud.ac.uk/blogs/oawal/ – so perhaps this is unnecessary. Please let us know what you think.

* An example of a Jisc pathfinder project exploring this area is HHuLOA OA which has sought to create a baseline of current OA activity within institutions as a way of identifying areas that require attention. Chris Awre of the project has recently disseminated a Google spreadsheet, encouraging other institutions to add their own information, in addition to the project partners – Hull, Huddersfield and Lincoln – and which is openly shared under a CC-BY licence at the link below:


See here for a blog post on the baseline – https://library3.hud.ac.uk/blogs/hhuloa/2015/02/05/open-access-baseline-activity-tool/

Counting on IRUS

As we all know, repositories are an established component of the rapidly evolving scholarly web infrastructure in the UK and globally, and whatever the impact of Finch and the potential shift from Green to Gold, they are likely to remain a primary source of authoritative full-text versions of research outputs and, increasingly, associated data-sets as well as a variety of other scholarly outputs including electronic theses and Open Educational Resources (OER). Institutional repositories are increasingly well integrated within University websites and research management infrastructure, and the emphasis on Search Engine Optimisation (SEO) means that we can only expect in-bound traffic to increase.

The repository landscape, however, is fragmented: 1813 repositories are currently registered globally with OpenDOAR, utilising dozens of different software platforms, with a total of 206 in the UK including 154 institutional repositories and 47 disciplinary repositories. As explored in my Pecha Kucha at OR2012, it is far from easy to provide accurate, dynamic, article-level usage data consistently across the various software platforms; in addition to the functionality of the underlying software, this can depend on how that software has been implemented and on the technical ability of supporting staff. EPrints, for example, by far the most popular software, has the excellent IRStats plug-in but it is not implemented consistently across EPrints installations. Many repository managers also utilise Google Analytics, which can be a powerful tool but requires a degree of technical intervention and active management.

There is therefore an urgent need for a standardised method of aggregating usage data across repositories which is where the IRUS-UK project comes in. IRUS (Institutional Repository Usage Statistics) follows on from the PIRUS2 project, which demonstrated how COUNTER-compliant article-level usage statistics could be collected and consolidated from Publishers and Institutional Repositories.

To participate in IRUS, repositories will need to install ‘tracker code’ which pings the IRUS server with a defined OpenURL string every time an item is downloaded from the repository. Personally I have been interested to learn a little about how IRUS will eliminate search engine spiders and robots by screening “user-agents” defined in the COUNTER official list (available from here as an XML file and a TXT file).
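As a rough illustration of the two ideas above (a download "ping" carrying an OpenURL-style query string, and screening by user-agent), here is a minimal Python sketch. It is not the IRUS tracker code: the endpoint is a placeholder, the parameter keys other than the standard OpenURL `url_ver` value are assumptions about what such a string might carry, and the robot patterns are illustrative stand-ins rather than entries from the COUNTER list.

```python
import re
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint: a placeholder, not the real IRUS server address.
TRACKER_ENDPOINT = "https://tracker.example.org/counter/"

# Illustrative patterns only: the real exclusion list is the one COUNTER publishes.
ROBOT_PATTERNS = [r"bot", r"crawl", r"spider", r"slurp"]


def is_robot(user_agent: str) -> bool:
    """Return True if the user-agent matches any exclusion pattern (case-insensitive)."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in ROBOT_PATTERNS)


def build_ping_url(item_id: str, file_url: str, user_agent: str, client_ip: str) -> str:
    """Build an OpenURL-style query string describing a single download event."""
    params = {
        "url_ver": "Z39.88-2004",  # OpenURL 1.0 version identifier
        "rft.artnum": item_id,     # assumed key: identifier of the repository item
        "svc_dat": file_url,       # assumed key: URL of the file that was downloaded
        "req_dat": user_agent,     # assumed key: user-agent of the requester
        "req_id": client_ip,       # assumed key: address of the requester
    }
    return TRACKER_ENDPOINT + "?" + urlencode(params)


def should_count(ping_url: str) -> bool:
    """Receiving side: decide whether a ping counts as genuine usage."""
    query = parse_qs(urlsplit(ping_url).query)
    user_agent = query.get("req_dat", [""])[0]
    return not is_robot(user_agent)
```

In this sketch the repository only reports the event; the decision about whether a download counts is made on the receiving side, which is what allows one exclusion list to be applied uniformly across every participating repository.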

There are currently plug-ins available for EPrints 3.2/3.3 and DSpace 1.8.x. For other software, ‘tracker’ installation will need additional work which will vary according to the software; the specification and requirements will be defined in the PIRUS and IRUS-UK Codes of Practice, which have not yet been released. When implemented in your repository, COUNTER-compliant usage statistics will be available from IRUS via standard COUNTER reports (SUSHI and/or Excel spreadsheets/CSV/TSV files) as well as via an API to enable retrieval and display of data in repository records.

There is also the potential to implement IRUS in third-party aggregation services like CORE and CiteSeer which both cache copies of full-text, thereby enabling item-level data to be consolidated from different sources.

Thanks to Paul Needham for this information; IRUS have also agreed to come and speak to the UKCoRR membership at the next members meeting at Teesside in November (full programme soon.) For more information or to register your interest in the meantime email irus@mimas.ac.uk

N.B. Neil Stewart has recently blogged on early participation with IRUS-UK at http://cityopenaccess.wordpress.com/2012/08/29/city-research-online-irus-uk/

The train home…

I’m on the train, on my way back to Leeds from the 7th International Open Repositories Conference at the University of Edinburgh and though I’m disappointed not to be able to stay longer and for the céilidh this evening, I’m still able to participate remotely in the conference via Twitter and various blogs, albeit on a rather slow 3G connection via my phone… which rather illustrates two of the themes of Cameron Neylon’s opening keynote yesterday: connectivity and low friction. And also, to some extent, his third theme of demand-side filters, in that I can tweet a link to this post tagged #or2012 and know that I am sharing with the colleagues I’ve met over the past few days.

(N.B. Cameron Neylon’s keynote is now available on YouTube)

I had volunteered to be a member of the blogging team for the conference, answering a call from @OpenRepos2012, but in the end only managed to post one attempt at a live blog from the RSP Workshop on Monday, “Building a National Network”. I’m afraid I can’t quite type or think fast enough for live blogging (though I did tweet a lot!) so apologies and kudos to Nicola for her detailed live blogs from various sessions and, in the spirit of Open, I’ll use verbatim / adapt excerpts from http://or2012.ed.ac.uk/category/liveblog/ to help jog my memory, fill in some of the gaps and report on the sessions that I attended with no further attribution (I hope this is OK, let me know if not, preferably not through your lawyer.)

I enjoyed Cameron Neylon’s keynote “Network Enabled Research” http://or2012.ed.ac.uk/2012/07/10/opening-plenary-cameron-neylon-network-enabled-research-liveblog/ though did notice one or two voices on Twitter sighing that it wasn’t terribly cutting edge and that we’d perhaps heard most of it before. Maybe so (for the record I think this is unfair) but Cameron himself acknowledged that he was preaching to the choir, and more interesting to me are the vast swathes of heathens not yet (formally) converted to the Church of Open, to whom Cameron’s ideas and those of the conference as a whole were, and continue to be, amplified through Twitter and other social media. I myself have over 600 followers on Twitter, which is peanuts to some of the big Twitter hitters, and though I wouldn’t blame some of them for muting my conference output there is still a considerable amplification outside a specialised community to the global public, i.e. the customers of Open. And they want outcomes; not research outputs per se but meaningful outcomes from publicly funded research.

Another excuse for not blogging more during the conference itself was that I was somewhat preoccupied with my own Pecha Kucha that I delivered in the afternoon session on Tuesday and though I received a lot of positive (possibly polite) feedback I am by no means a conference veteran and was glad to get through my 20 slides without too much fuss, though I did wander off with the mic still pinned to my shirt, fortunately called back before I got to the loo (à la Frank Drebin in The Naked Gun). My PK was on “Open Metrics for Open Repositories” and the slides and associated paper are available at http://www.slideshare.net/MrNick/open-metrics-for-open-repositories-at-or2012 and http://opus.bath.ac.uk/30226/ respectively. I’ve learned a great deal more about metrics than I knew before the conference and will certainly be following up on IRUS-UK, for example, and one or two posters and relevant Pecha Kucha presentations. COUNTER compliance is certainly important and something that I think UKCoRR should be advocating; I believe it is all the more important since the Finch report.

I was particularly interested to learn about UK RepositoryNet+, based at EDINA, which is aiming to create a socio-technical infrastructure to manage the human interaction that helps make good data happen, and ultimately to justify the investment that JISC has made into open access and repository infrastructure by mediating between open access and research information management and differentiating between evolving models of open access and between various technical standards. Wave 1 is focussing on deposit tools (SWORD, RJ Broker), benchmarking, aggregation (RepUK, CORE, IRS) and registries (OpenDOAR, ROAR) to underpin Green, though, post Finch, it will also be necessary to consider Gold OA mechanisms more fully. Wave 2 will focus on “micro-services” (N.B. I don’t fully understand what this means…)

I participated in a break-out session on deposit and learned more about RJ Broker from Ian Stewart and was interested to hear the level of engagement from publishers. I’m not sure I’m entirely clear on the advantages over the WoS / Scopus APIs increasingly implemented by CRIS (and repositories), though I appreciate it could be a valuable alternative, especially where institutions don’t subscribe to the commercial providers (it was pointed out, though, that CRIS aren’t generally compatible with SWORD, which is the mechanism that RJ Broker utilises). There was an interesting and less formal discussion around some of this with JISC’s Balviar Notay, James Toon and others in the pub later and Balviar did convince me of the importance of RJ Broker in terms of cultural change.

This morning before I rushed off I attended a session on Augmented content. I confess to not fully understanding the technicalities of the first presentation, “Augmenting open repositories with a social functions ontology”, but it was interesting nevertheless and made me consider just how static and unsocial many of our repositories still are. “Microblogging Macrochallenges for Repositories” was good fun and I might even have a go at implementing it myself, though it did make me wonder whether there would be any issues with Twitter’s ToS. The third and final presentation of the session was “Beyond Bibliographic Metadata: Augmenting the HKU IR”, a very impressive CRIS-like implementation of DSpace at Hong Kong University.

A cup of tea and half an hour’s networking brought us to my final session of the conference, another round of Pecha Kucha presentations collectively organised around “National Infrastructures” and including a presentation from UKCoRR’s very own Paul Stainthorp. Paul’s slides are available at http://paulstainthorp.com/2012/07/11/oa-the-noo-my-ukcorr-pecha-kucha-slides-from-or2012-in-edinburgh/

All in all a hugely enjoyable and informative couple of days, with plenty more to come for those still in Edinburgh. The full programme is available at https://www.conftool.net/or2012/sessions.php and I for one will be keeping at least one eye on the #or2012 hashtag.

CORE “Similar documents” widget

What with the Finch report making a bit of a splash right now and by all accounts downplaying the role of repositories and Green Open Access (expect a formal response from UKCoRR in due course, though in the meantime see this critical overview from Peter Suber – https://plus.google.com/109377556796183035206/posts/DsBAeSCofDX) it seems like a good time to remind ourselves of some of the exciting developments in repository world that should help us make Green a more viable alternative to Gold than the Finch report might seem to suggest. One such (technical) development is the Connecting Repositories (CORE) project at the Open University who have developed a widget that can be embedded in your local IR to generate links to similar documents from other repositories harvested by CORE.

The original CORE project focussed on harvesting papers from repositories and used text mining techniques to calculate ‘semantic similarity’ between different papers. This data was published openly, but not really in a format that could easily be used by others (at least outside a specialist community) so the CORE team have developed some simple javascript that, in theory, can be dropped into any repository setup and that will take the current document being displayed in the web interface and, if it is present in CORE data with some similar documents listed, will display links back to those documents in the CORE interface (these papers may come from any CORE harvested repository).
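The post does not describe how CORE actually calculates ‘semantic similarity’, so the following is only a sketch of one common text-mining stand-in: cosine similarity over simple word counts. The function names and the toy documents in the usage note are made up for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Count lower-cased alphabetic tokens in a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    ca, cb = term_counts(a), term_counts(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def most_similar(current: str, candidates: dict, top_n: int = 3) -> list:
    """Return ids of the candidates most similar to the current text.

    Candidates with no overlap at all are dropped, so the result may be
    empty, mirroring a widget that shows nothing when there is no match.
    """
    scored = [(cosine_similarity(current, text), doc_id)
              for doc_id, text in candidates.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]
```

For example, `most_similar("open access repositories", {"a": "open access policy", "b": "marine biology"})` keeps only `"a"`, since `"b"` shares no terms with the current document. A production system would typically weight terms (e.g. TF-IDF) and work on full text rather than short strings.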

Obviously this means that CORE needs to be harvesting full text from your repository with a reasonable success rate for the plugin to work effectively (data on whether your repository is being harvested is available at http://core.kmi.open.ac.uk/repository_analytics/), although where there are no similar documents it should do nothing, so it won’t do any harm in any event.

The plugin is based on jQuery (a JSONP call), which is compatible with any browser and any library system, and can be styled to fit the UI of any repository. Currently ORO is the only repository to have implemented the plug-in (see this example record – http://oro.open.ac.uk/7123/) where it works only for full-text resources, though the team have developed an update that recommends full-text resources to metadata records (i.e. if there is a record in repository A that is metadata only, it will try to recommend related papers that do have full text available). The update also filters duplicates and is more flexible.
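To make the behaviour of the update a little more concrete, here is a minimal Python sketch of the two steps described: keep only candidates that have full text, and drop duplicates. The record fields (`title`, `has_fulltext`) and the choice of a normalised title as the duplicate key are assumptions for illustration; the post does not say how the CORE team implement either step.

```python
import re


def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation/whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def recommend_fulltext(candidates: list, limit: int = 5) -> list:
    """Pick up to `limit` full-text candidates, skipping duplicate titles.

    `candidates` is assumed to be ordered by similarity, best first, and each
    record is a dict with hypothetical keys "title" and "has_fulltext".
    """
    seen = set()
    picks = []
    for doc in candidates:
        if not doc.get("has_fulltext"):
            continue  # metadata-only records are never recommended
        key = normalise_title(doc["title"])
        if key in seen:
            continue  # same paper already recommended from another repository
        seen.add(key)
        picks.append(doc)
        if len(picks) == limit:
            break
    return picks
```

Because the filter only looks at the candidates, it works the same way whether the record being viewed is full text or metadata only, which is the point of the update.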

The widget is currently available as a plug-in for EPrints from http://core-project.kmi.open.ac.uk/files/widget3.0.zip and the team are considering making it available from the Bazaar but are happy to assist in implementing in repositories running on other software.

Contacts are:

Owen Stephens (CORE Project Manager) – owen@owenstephens.com / @ostephens

Petr Knoth (Research Associate from the OU working on CORE) – p.knoth@open.ac.uk

Chris Yates (Systems Librarian at the OU who has implemented the plug-in in ORO) – c.s.yates@open.ac.uk / @chris_s_yates