Thursday, 15 July 2010

Reflections from OR2010: Part 2

Another activity I was involved in at OR2010 was the Developer Challenge, for which I was a ‘non-techie’ judge. Organised by the DevCSI project (managed by UKOLN, funded by JISC), this year’s challenge was to ‘create a functioning repository user-interface, presenting a single metadata record which includes as many automatically created, useful links to related external content as possible.’

The winning entry was from Richard Davis and Rory McNicholl of the University of London Computer Centre (ULCC), who enhanced the records of the Linnean Collections, held on EPrints, for which ULCC are responsible. As many of the metadata fields in the record as possible linked out to external sites: some general, such as Google and Wikipedia, and some more subject-specific, such as horticultural indexes. Although only one metadata record was demonstrated, the links which appeared in the record were determined by the entries on a master sheet (an Excel spreadsheet) and would therefore apply to all records within the repository. This development came out on top because, although the links weren’t truly automated, they were managed externally; it was felt that this was actually an advantage, as a non-techie repository manager could update them for themselves rather than calling on the support of a tame developer.
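For the technically curious, the master-sheet idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the ULCC team's code: the sheet layout, the field names, the sample record and the search URL patterns are all my own assumptions.

```python
import csv
import io
from urllib.parse import quote_plus

# Hypothetical master sheet, exported from the spreadsheet as CSV:
# one row per (metadata field, link target). Layout and URLs are assumptions.
MASTER_SHEET = """field,label,url_template
title,Google,https://www.google.com/search?q={value}
creator,Wikipedia,https://en.wikipedia.org/wiki/Special:Search?search={value}
"""

def load_link_rules(sheet_text):
    """Group the sheet's rows by metadata field name."""
    rules = {}
    for row in csv.DictReader(io.StringIO(sheet_text)):
        rules.setdefault(row["field"], []).append((row["label"], row["url_template"]))
    return rules

def build_links(record, rules):
    """Return (field, label, url) for every record field that has a rule."""
    links = []
    for field, value in record.items():
        for label, template in rules.get(field, []):
            links.append((field, label, template.format(value=quote_plus(value))))
    return links

# Hypothetical record; every record in the repository is handled the same way.
record = {"title": "Species Plantarum", "creator": "Carl Linnaeus", "date": "1753"}
links = build_links(record, load_link_rules(MASTER_SHEET))
for field, label, url in links:
    print(f"{field} -> {label}: {url}")
```

Adding a row to the sheet adds a link to every record carrying that field, which is exactly the kind of change a repository manager could make without a developer.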

Coming in a narrow second was ChEsis, presented by Sam Adams, University of Cambridge, which created links to enhance Chemistry e-thesis records. Links were available to show the chemical structures of molecules and crystals used or created, along with their mass spectra. Fun links were also included, such as the Last.fm playlist of the student when their thesis was submitted and the BBC headlines for the day.

Another entry utilised OpenCalais to automatically create links from their created repository record. OpenCalais is a free-to-use service which automatically creates links from open content to other open content sites such as Twitter, YouTube and Flickr. It can be used to add a bit of fun to any open web content such as a blog, but be warned: the links are automatic and you can’t necessarily restrict what content it links to!

A full write-up and videos of all the Developer Challenge entries are available via the DevCSI blog.

Wednesday, 14 July 2010

Reflections from OR2010: Part 1

Last week Antony and I attended the 5th International Conference on Open Repositories in Madrid. The conference boasted a fully packed, four-day programme including ‘General’ presentation sessions, user group sessions, working groups and forums. Nearly 500 delegates were in attendance, representing countries from all across the globe.

One of the reasons Antony and I were in attendance was to present a Poster, authored in conjunction with Glen Robson and Ioan Isaac-Richards from the NLW, about the work of the Welsh e-theses harvesting service. A copy of the poster is available from the Aberystwyth University repository CADAIR.

With parallel streams running for the majority of the programme there were too many sessions for one person to attend, let alone comment on, so below I’ve discussed the sessions I found of most interest and relevance to the work of the WRN.

The first couple of interesting sessions related to nationwide open access/repository support networks: the first located in Germany, the second in Australia. The OAN (Open Access Network), initiated by DINI (German Initiative for Network Information) and funded for a two-year term by the German Research Foundation (DFG), has created an over-arching infrastructure between quality-certified German IRs to act as a single interface for research promotion and to support other DINI Open Access projects. DINI certification, a certificate of IR quality, denotes that an IR utilises international standards, such as DRIVER for metadata, has defined its policies regarding use and makes them clear and available, and is well-positioned within both its own institution and the greater open access arena.

The OAN harvests data from the DINI certified repositories within Germany, aggregates the data and puts it through a number of value added modules such as data clean-up, FT link finding, OCR, and citation tracking. The aggregated data is then presented within a single search interface, and acts as a single point for data export and further harvesting. It also acts as a single point for the other OA projects, some of which were presented at OR2010, such as OAS (Open-Access Statistik) and OAFR (OA Subject Based Repositories).

The OAN is also responsible for increasing the number of certified repositories and offers support to repository managers to help their repositories achieve certification. The alignment of WRN repositories, specifically in the area of policies, is an area of focus for the WRN team this autumn, so the DINI certification process should serve well as a basis for this work.

Caroline Drury, University of Southern Queensland, presented on ANDS (Australian National Data Service), a service looking to inform and influence national policy on the curation of data. ANDS has created Research Data Australia, a central collection of curated data sets produced by Australian academics. ANDS also offers the following services, which are related to this central collection: Publish my data; Register my data; Identify my data. Also based at Queensland is Tim McCallum, the technical support half of the CAIRSS repository support team (the team resembles the WRN team, with one technical and one organisational support officer). Piggy-backed on to a CAIRSS repository survey, ANDS has been investigating the data management practices at Australian universities. This survey found that there was a low level of repository manager involvement in data management within universities, a trend that ANDS are looking to change through Senior Management intervention, in conjunction with CAIRSS. Data management is a new area of interest for the WRN so we will be watching the progress of ANDS with interest.

The other session of direct relevance and interest to the work of the WRN, and more specifically to the e-theses harvesting service presented in our poster, was from Nikos Houssous, National Documentation Centre (EKT), Greece. Nikos described the National Archive of PhD Theses developed at EKT, a single search interface presented within DSpace. Like the NLW in Wales, the EKT have a historic role in the collection of Greek PhD theses, a role they were looking to extend to the digital realm. The EKT are undertaking a digitisation project of the PhDs currently held in print form, as well as encouraging institutions to submit theses electronically. Records are held in a bespoke theses admin system and then pushed both to the DSpace system (via SOAP in ETD-MS, a metadata standard for e-theses devised by the Networked Digital Library of Theses and Dissertations (NDLTD)) and to the EKT Library Catalogue (via Z39.50 in UNIMARC). The DSpace collection also forms a central harvesting point for DART Europe, a service aggregating PhD theses records for the whole of Europe. I was unaware of NDLTD and ETD-MS before Nikos’ presentation, and their relation to DART is of interest to the next stage of the Welsh e-theses harvesting service.

Through other sessions and networking I became aware of two other national aggregation services: NARCIS in the Netherlands and RCAAP in Portugal. Whereas RCAAP is an aggregation of IR content, NARCIS is an aggregation of IR and national information, such as DANS (Data Archiving and Networked Services). There are also plans to incorporate the data from METIS, the Dutch national CRIS, which will provide much richer information about researchers and their projects. Anecdotally, the NARCIS presenter reported that theses and dissertations were the most frequently retrieved items through the system, perhaps because NARCIS provided the only central point of discovery for these types of items.

It’s certainly nice to know that the work of the WRN parallels that carried out within other countries and that we have an extended network to call upon when in need of best practice advice.

Monday, 12 July 2010

2010 Ranking Web of Repositories

The second edition of 2010 Ranking Web of Repositories has been published:

Close to 1000 repositories have been analyzed this year and the top 800 are ranked here according to their web presence and visibility. The aim of this ranking is to support Open Access initiatives and therefore the free access to scientific publications in an electronic form and to other academic material. The web indicators are used here to measure the global visibility and impact of the scientific repositories. Two lists are available - top 800 and top 800 institutional.

I've done a bit of trawling and number crunching on the institutional list, extracting both a ranked list of UK-only institutional repositories and a subset of the Welsh repositories that appear in it. Of the top 800 institutional repositories globally, the UK has 82 entries, five of which are Welsh. Details:

International rank (UK rank in brackets)

257 (18th in UK) Aberystwyth
601 (62nd in UK) Bangor
696 (73rd in UK) Glamorgan
730 (78th in UK) UWIC
752 (80th in UK) Trinity
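
For anyone wanting to repeat the number crunching, here is a minimal Python sketch of how a national rank can be derived from the global list. The sample entries are placeholders rather than the published data, and the tuple layout is my own assumption about how the list might be keyed in.

```python
# Placeholder extract of the global list: (global rank, country code, repository).
# These entries are illustrative only, not taken from the published ranking.
GLOBAL_LIST = [
    (1, "US", "Repository A"),
    (2, "GB", "Repository B"),
    (3, "DE", "Repository C"),
    (4, "GB", "Repository D"),
    (5, "GB", "Repository E"),
]

def ordinal(n):
    """Format 1 -> '1st', 2 -> '2nd', 11 -> '11th', and so on."""
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def national_ranking(entries, country):
    """Keep one country's entries and number them in global-rank order."""
    national = sorted(e for e in entries if e[1] == country)
    return [(rank, pos, name) for pos, (rank, _, name) in enumerate(national, start=1)]

uk = national_ranking(GLOBAL_LIST, "GB")
for global_rank, uk_rank, name in uk:
    print(f"{global_rank} ({ordinal(uk_rank)} in UK) {name}")
```

The national rank is simply the position of each entry once the global list has been filtered down to one country.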

Bits and bobs

Over the past week or so I've collected together a few random repository related items that might be of interest to our partners. Enjoy!

Copyright Workflows
Ann Hanlon and Marisa Ramirez. "Asking for Permission: A Survey of Copyright Workflows for Institutional Repositories" 2010
Available at:

This poster details the results of a US survey about copyright workflows and was presented at the Annual Conference of the American Library Association, Washington, D.C. in June 2010. It explores the staffing, resources, activities and tools employed to clear copyright for published work with the intent to deposit into an IR, and nicely summarises the authors' preliminary findings. In 2008 a survey was undertaken in the UK on the same topic:

Jones, Mark. Intellectual property rights survey, University of East Anglia, 2008

New Team Digital Preservation Film
WePreserve and Planets have released their fourth Team Digital Preservation film. Team Digital Preservation and Arctic Mountain Adventure is available to view at

Digiman is baby-sitting his niece and nephew for the weekend, but things go horribly wrong when he sends them out on an arctic mountain adventure. Never fear trusty viewers, PLATO, the Planets Preservation Planning tool, comes to the rescue to show Digiman the error of his ways.

Other editions of these popular videos are available here

Metadata Forum
At the Open Repositories Conference 2010 last week in Madrid the Metadata Forum was officially launched. A new initiative, run by UKOLN at the University of Bath and funded by JISC, the Metadata Forum is planning four face-to-face meetings throughout the UK and ongoing conversations online where anyone who has an interest in metadata can ask for help, share experiences and learn from others. The Forum is open to everyone, from novice to expert and anyone in between who deals with metadata in their day-to-day work.

Get involved by following the Forum blog - or following the Forum on Twitter – @MetadataForum

SWORD v2.0: Deposit Lifecycle white paper

The aim of this paper is to stimulate discussion around introducing more complete treatment of "deposit lifecycle" management of objects in digital repositories, and to propose the next small steps in this direction. Abstract:

"SWORD is a hugely successful JISC project which has kindled repository interoperability and built a community around the software and the problem space. It explicitly deals only with creating new repository resources by package deposit, a simple case which is at the root of its success but also its key limitation. This next version of SWORD will push the standard towards supporting full repository deposit lifecycles by using update, retrieve and delete extensions to the specification. This will enable the repository to be integrated into a broader range of systems in the scholarly environment, by supporting an increased range of behaviours and use cases."
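
To picture what a "full deposit lifecycle" adds to plain package deposit, here is a toy in-memory model in Python. It is not the SWORD protocol or any real client library; the class and method names are hypothetical and simply mirror the four operations named in the abstract.

```python
import uuid

class ToyRepository:
    """In-memory stand-in for a repository; purely illustrative, not SWORD."""

    def __init__(self):
        self._items = {}

    def deposit(self, package):
        """Create a new item from a package (the only case SWORD covers today)."""
        item_id = str(uuid.uuid4())
        self._items[item_id] = package
        return item_id

    def retrieve(self, item_id):
        """Fetch a previously deposited package."""
        return self._items[item_id]

    def update(self, item_id, package):
        """Replace the package of an existing item."""
        if item_id not in self._items:
            raise KeyError(item_id)
        self._items[item_id] = package

    def delete(self, item_id):
        """Remove an item from the repository."""
        del self._items[item_id]

repo = ToyRepository()
item_id = repo.deposit(b"thesis-v1.zip contents")
repo.update(item_id, b"thesis-v2.zip contents")
print(repo.retrieve(item_id))
repo.delete(item_id)
```

With only `deposit` available, an external system can push content in but never correct or withdraw it; the other three operations are what would let a repository take part in a wider workflow.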