Technology words for repository managers

Posted on behalf of Nancy Pontika, UKCoRR External Liaison Officer and Open Access Aggregation Officer for CORE

The role of the repository manager is constantly evolving. Today’s repository manager needs to be aware of, and able to interpret, not only their own institution’s open access policies but also the national and international ones that emerge from public funding agencies. The proliferation of these policies introduces technical requirements for repositories, such as the use of current research information systems (CRIS) and the installation of various plug-ins. Repository managers often serve as an intermediary between their institution’s IT department and their supervisors or library directors, and have to relay messages and requests between them. A couple of months ago on the UKCoRR members list we had a discussion about the specific technological terms that repository managers hear regularly.

Even though I am not currently a repository manager, I couldn’t help but sympathise. For the past year I have been working for CORE – a global repositories harvesting service – where I am the only non-developer in a team of four (wonderful!) developers. As a result, there are times when I feel lost in our discussions, so I decided to put together a list of frequently used technical terms that relate to repositories. As a first step, I created a Google Spreadsheet with some basic terms. Then the UKCoRR list members were asked to add more of these jargon words and to weight each term based on how often they tend to hear it. (I am going to keep this list there for future reference; do not hesitate to save a local copy if you find it useful.) In the end, with the help of two CORE developers – Samuel Pearce and Matteo Cancellieri – we tried to provide brief definitions and give simple examples where possible.

The following table contains a list of these terms and their definitions. Since this is not an exhaustive list (and it was never meant to be), feel free to add other terms in the comments area of this blog post.

Web technologies
Apache	Apache is a web server. When your browser (such as Internet Explorer or Google Chrome) requests a website (for example http://core.ac.uk), Apache is the software that returns the webpage to your browser.
Tomcat	Tomcat is a web application server. Tomcat works like a web server, but it serves more complex pages and operations than a plain web server does. For example, your online banking system uses a web application server, while this blog uses a web server.
Java	A programming language often used for applications that run in web application servers (such as Tomcat). CORE uses Java to run the harvesting of the repositories.
PHP	An Open Source scripting language particularly suited for website development. It is commonly installed with Apache, which allows web pages to be more complex without having to run a separate web application server such as Tomcat. For example, CORE uses PHP in its web pages.
robots.txt	A text file on a web server that tells automatic content downloaders (such as crawlers) which parts of the website they may access. Well-behaved downloaders follow these rules voluntarily; for example, CORE follows the rules in the robots.txt file. The rules may limit the number of requests per second made to your web server or restrict access to certain places on your website, such as a login page.
SSH (Secure Shell)	A protocol that allows one computer to connect to another and send commands by typing in text rather than clicking buttons.
MySQL	MySQL is an Open Source Database Management System owned by the Oracle Corporation.
Perl	A programming language usually used for scripting and text processing.
JavaScript	A programming language that usually runs in your browser to allow web pages to be more dynamic and reactive. Web forms may use JavaScript to ensure they are filled in correctly before submitting them.
Crawler	A crawler is a program that automatically visits web pages and processes them. A common example is Google’s crawler, which visits websites, extracts content and makes it available via the Google search engine.
Cron jobs	Programs that are set to run at specific times. For example, they are used for periodic tasks such as running automatic updates every day at midnight or extracting and processing the text from your full-text outputs in your repository to make them searchable.
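To make the robots.txt entry above more concrete, here is a minimal sketch in Python of how a polite crawler might read such a file before downloading anything. The robots.txt content, the crawler name "ExampleBot" and the repository.example.org address are all invented for illustration; this is not how CORE or any particular crawler is implemented. Note also that the Crawl-delay rule is a widely used convention rather than part of the formal standard, so not every crawler honours it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /login
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up crawler name; the URLs are placeholders.
allowed_home = parser.can_fetch("ExampleBot", "https://repository.example.org/")
allowed_login = parser.can_fetch("ExampleBot", "https://repository.example.org/login")

# Number of seconds the crawler is asked to wait between requests.
delay = parser.crawl_delay("ExampleBot")

print(allowed_home, allowed_login, delay)
```

In this sketch the home page may be fetched, the login page may not, and the crawler is asked to pause between requests.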
Development
dev site	A website used for testing. This allows developers to test and process information without the risk of breaking the “live” production website.
Git	A version control system, like Subversion (SVN). It enables tracking changes in code.
SVN/Subversion	“SubVersioN” is a version control system, like Git. It enables tracking changes in code.
clone	A command in Git that copies code from a remote server to a local machine.
Other
UNIX	An operating system analogous to DOS, Windows and Mac OS. Nowadays, Unix refers to a group of operating systems that adhere to the Unix specification. Examples of Unix-based operating systems are Linux and Mac OS.
LINUX	An operating system based on Unix. The Linux code is open source, which allows anyone to modify and distribute the software and source code, creating different variants of ‘Linux’. The most popular versions of Linux include Ubuntu, Red Hat, Debian and Fedora.
HTTP proxy	An HTTP proxy is a gateway through which users on a network access the internet. This allows large organisations to track internet usage and also reduces the amount of data downloaded by storing copies within the proxy. The next time the same website is requested, the local copy is sent to the user rather than being downloaded again.
External resolver	An external resolver service (such as The DOI® System or HDL.NET®) allows a digital object, such as a research output, to have a globally unique identifier.
Mirrors	A Mirror is a copy of another website. An organisation may mirror a website to reduce traffic and hits to the source website.
Metadata Protocols
OAI-PMH	OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is a standard for exposing metadata in a structured way, so that computers can harvest and understand it.
SWORD	SWORD (Simple Web-service Offering Repository Deposit) is a protocol that simplifies and standardises the way content is deposited into repositories.
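As a rough illustration of the OAI-PMH entry above, the sketch below builds the kind of web address a harvester would request and then reads the titles out of a reply. The repository address, the record identifier and the sample reply are invented for illustration, and nothing is actually downloaded; only the `ListRecords` verb, the `oai_dc` metadata format and the XML namespaces come from the OAI-PMH and Dublin Core standards.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical OAI-PMH endpoint; a real repository publishes its own base URL.
BASE_URL = "https://repository.example.org/oai"

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the address a harvester requests to list a repository's records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# A shortened, invented reply of the shape a repository would send back.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_titles(xml_text: str) -> list[str]:
    """Collect every Dublin Core title found in an OAI-PMH reply."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iterfind(".//dc:title", NS)]


print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also have to follow the resumption tokens that repositories use to split long lists across several replies, which this sketch leaves out.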
Data access
API	An API (Application Programming Interface) is a set of rules that defines how parts of a piece of software, or two separate programs, interact with each other. For example, the CORE API allows developers to use CORE’s data from within their own applications.
Widget	A small application with limited functionality that runs within a larger application or program. The CORE Similarity Widget retrieves similar articles based on metadata and runs within the larger application of a repository.
Plugin	Similar to a widget, a Plugin adds extra functionality to software. This may add new features or change the way an existing feature works.
Text mining	The process in which high-quality information is extracted from text using a computer.
Data Dumps	One or more files that contain a large set of data.
