The Unfulfilled Promise of Aggregating Institutional Repository Content (Guest Post)

Our thanks to Neil Stewart, Digital Repository Manager at City University London, for the following guest post, which raises some interesting questions for us all.
----

Björn Brembs recently posed a very good question on Stephen Curry's blog (Curry and Brembs are two of the more prominent supporters of the Elsevier boycott):

I’ve always wondered why the institutional repositories aren’t working with, e.g. PubMed etc. to make sure a link to their version is displayed with the search results. I mean, how difficult can this be?

This got me thinking: how difficult can it be? Aggregating and re-using institutional repository (IR) content at subject level is, after all, one of the promises of the Green road to Open Access.

The infrastructure is already in place: there are many OAI-PMH-compliant institutional repositories out there, and there is also SWORD, the repository deposit protocol, whose clients allow flexible transfer of content into repositories. Some examples do exist, such as the Economists Online service, which harvests material from selected research-intensive economics departments and makes it available via a single portal. But, to my knowledge, there has been no work on ensuring that, for example, all of a repository's eligible physics content is automatically uploaded to ArXiv, or all of its biomedical research to UKPMC.
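To give a feel for how little code the harvesting half needs, here is a minimal Python sketch. It assumes a repository that exposes simple Dublin Core (`oai_dc`) records over OAI-PMH; it builds a `ListRecords` request URL and then picks out records whose `dc:subject` mentions a given discipline, keeping the repository's own URL so that a subject service could link back to the IR copy. The base URL, set name, identifiers and sample records are all hypothetical, and matching on free-text `dc:subject` is a deliberate simplification. A real harvester would also fetch the URL over HTTP, follow `resumptionToken` paging and handle OAI-PMH error responses, none of which is shown here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces from the OAI-PMH 2.0 and oai_dc / Dublin Core schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, set_spec=None):
    """Build an OAI-PMH ListRecords request for Dublin Core metadata."""
    params = [("verb", "ListRecords"), ("metadataPrefix", "oai_dc")]
    if set_spec:  # hypothetical set name; each repository defines its own sets
        params.append(("set", set_spec))
    return base_url + "?" + urlencode(params)


def select_records(xml_text, subject_keyword):
    """Return (oai_id, title, landing_url) for records whose dc:subject matches."""
    root = ET.fromstring(xml_text)
    selected = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        subjects = [s.text or "" for s in dc.findall("dc:subject", NS)]
        if not any(subject_keyword.lower() in s.lower() for s in subjects):
            continue
        oai_id = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = dc.findtext("dc:title", default="", namespaces=NS)
        # Use the first http(s) dc:identifier as the link back to the IR copy.
        links = [i.text for i in dc.findall("dc:identifier", NS)
                 if i.text and i.text.startswith("http")]
        selected.append((oai_id, title, links[0] if links else None))
    return selected


# Hypothetical ListRecords response from an institutional repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.ac.uk:101</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Dark matter halo profiles</dc:title>
          <dc:subject>Physics</dc:subject>
          <dc:identifier>https://repo.example.ac.uk/101/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.ac.uk:102</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Guild records of early modern London</dc:title>
          <dc:subject>History</dc:subject>
          <dc:identifier>https://repo.example.ac.uk/102/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:repo.example.ac.uk:103</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("https://repo.example.ac.uk/oai", set_spec="physics"))
    print(select_records(SAMPLE_RESPONSE, "physics"))
```

Only the physics record survives the filter, and it keeps its IR landing-page URL, which is exactly the "metadata pointing back to IR holdings" a subject repository could display alongside its own search results.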

Subject repositories have gained critical mass in certain disciplines (in addition to the examples above, see RePEc for economics, SSRN for the social sciences and DBLP for computer science), so a paper that doesn't appear there is far less visible. As a result, the incentive to deposit locally is greatly reduced: yes, your paper will appear in Google, but a paper in ArXiv will appear both in Google and in ArXiv's own interface, where everyone else in your discipline is depositing.

So if the infrastructure is there and the rationale for creating these links exists, why hasn't it happened to any meaningful extent already? I suspect it is because the IR landscape is, by its very nature, fragmented. Those responsible for IRs (repository managers, IT staff and senior management) are understandably concerned with local issues: ensuring that IRs are properly managed and integrated with the university's systems, as well as the usual open access advocacy and awareness-raising. Finding time to think about automatically populating ArXiv with papers from your home repository is probably far down the to-do list.

That's not to say that repository managers are oblivious to these issues, but here another problem arises. Few repository managers, I would guess, believe they could single-handedly persuade ArXiv that automatically harvesting physics content from their repository, and theirs alone, would be worth ArXiv's while. This is perhaps where UKCoRR (or other national bodies, JISC perhaps?) might come in. If ArXiv or similar subject repositories could be persuaded of the merits of harvesting IR content (whether full text or metadata pointing back to IR holdings), all repositories could plug into this system and offer it as a service to academics (two-for-one deposit: local IR and ArXiv at the same time!).

So, what do people think? Is there any appetite for turning this into a project that UKCoRR members could take forward, perhaps with UKCoRR and/or JISC oversight? Comments please!