Tuesday, 29 June 2010

delicious

The WRN team now have a delicious site available at http://delicious.com/welshrepositorynetwork

Delicious is a social bookmarking service that allows users to tag, save, manage and share web pages centrally. For more information about social bookmarking, Wikipedia has a useful explanation.

Here in the WRN offices we started to use delicious to gather together sites of potential interest to us internally as a project team, but we've quickly come to realise that having access to these links would also be of use to the wider repository community within Wales.

[Screenshot of the WRN delicious site]

There are various ways of exploring content on delicious. One of the most useful, I find, is to use the tag list in the right-hand menu to look at sites grouped under various themes. This quickly takes you beyond the sites we've gathered at WRN and into the wider collection of URLs from across the whole site.

As with many other Web 2.0 tools, making the site useful and developing a community around it actually takes a lot of effort, and to be honest our work on populating it with links has so far been sporadic. It is another of those changes to the way we work and think that doesn't yet happen automatically. However, with time we hope to improve this situation and keep adding useful sites as we come across them.

If you know of any sites that would be useful to add to our page, or if you would like to become an active contributor, please drop us a line at the usual address: wrnstaff@aber.ac.uk.

Friday, 11 June 2010

Advocacy discussion: barriers and solutions

As part of the Repository Stream at the Gregynog Colloquium, we held a discussion session on the hurdles faced by Repository Administrators when trying to encourage academic buy-in to their systems. These are listed below, grouped by topic.

As part of the discussions we also suggested solutions for each of the obstacles; each solution appears directly below the problem it addresses. The solutions are by no means exhaustive, and there are some gaps.

Please add comments and suggestions to the list below, and suggest advocacy ideas that have worked for you. It is hoped this exchange of ideas will aid both our WRN community and the repository community as a whole.


Perception of time and effort required

No time
Demonstrate ease of deposit. Video materials to demo deposit using academic champions. Practice reduces time. Look into automatic completion APIs for repository.
Extra admin work
Mandate. Suggest using admin staff or PhD students to help: good practice for new researchers.
Backlog of research will take too much time to enter
Offer mediated deposit to relieve the backlog, then encourage self-deposit. Suggest using admin staff or PhD students to help.


Benefit of repository interaction

What’s in it for me? Apathy towards the process
Education: wider audience; greater recognition; higher, faster and sustained citation rates. Demonstration of RAE impact. Use of peers as champions. Video materials?
The paper is already published: anyone who wants to read it already has
Wider audience: publicly funded research is available to the whole public, beyond subscription barriers.
Takes time to see benefit
Difference between print and electronic world?


Perception of repository importance

Lack of integration with other Uni systems and processes
Top-level buy-in to push for integration; mandates
Repository is an archival end point
Education on benefits: use as a Management Information tool
Low perceived value of the system, given the lack of dedicated staff time
Top-level buy-in to fund positions to administer the repository. Use the wider staff network (subject liaison staff, research administrators) to spread the load and build up experts for each school or collection.


Copyright and IPR issues

Unsure of copyright status in papers
Use SHERPA RoMEO; include its API on the repository front page
Unsure of what was signed away with publishing license
Education. Feedback from academics to publishers. UKCoRR MoU
No longer have copies of different versions
Education
Worries about plagiarism and IPR protection
No real difference between the print and online worlds. Getting the paper out on the web and recognised as the author’s work should counteract the plagiarism risk. The benefits of higher citation rates and recognition should outweigh IPR risks.


Conflicts with traditional publishing

Publishing within a prestigious journal the priority
Use of OA funds to encourage OA publishing
Older research is no longer felt relevant
Evidence that older PhD work is being requested for digitisation, as it now informs modern research.


Other issues

Collection policy confusion: what can be accepted
State a clear collection policy in the repository site FAQ
Can the repository take different file types?
State a clear collection policy in the repository site FAQ. The repository can store different file types, but can end users access them easily? Preservation also needs considering.
Don’t want to make draft versions publicly available

Gregynog Repository Stream

The presentations delivered during the Repository Stream at this week's Gregynog Colloquium are now available on our project website, or via the links below.

The Power of the Mandate (Sue Hodges, University of Salford)
Research Publishing at Swansea University (Alex Roberts, Swansea University)
Research management system at the University of Glamorgan (Leanne Beevers and Neil Williams, University of Glamorgan)
Developing a repository: caring, sharing and living the dream (Misha Jepson, Glyndŵr University)
Encouraging author self-deposit at Cardiff University (Tracey Andrews and Scott Hill, Cardiff University)
Using statistics as an advocacy tool (Nicky Cashman, Aberystwyth University)
Advocacy: the theory (Jackie Knowles, WRN)

Tuesday, 8 June 2010

New IR Cross Search Service launched in Ireland

RIAN is a newly launched cross-search service for content held in the institutional repositories of seven Irish HEIs: DCU, NUIG, NUIM, TCD, UCC, UCD and UL. An outcome of a Strategic Innovation Fund project, it was sponsored by the Irish Universities Association (IUA) and funded by the Irish Higher Education Authority (HEA).

The aim of the service is 'to harvest to one portal the contents of the Institutional Repositories of the seven university libraries, in order to make Irish research material more freely accessible, and to increase the research profiles of individual researchers and their institutions.'

Thursday, 3 June 2010

DSpace Add-Ons

The following are descriptions of a range of DSpace add-ons that can be installed to provide additional functionality.

Commenting
The commenting feature brings informal communication capabilities to the DSpace environment. A threaded forum, or comments stream, can be attached to any web-page, community, collection, submitted item or e-person within DSpace. The add-on allows comments to be inserted by both anonymous users and authenticated ones while functionality for reviewing/moderating comments is also provided.

An example (in Portuguese) of comments appearing below a collection.

Compatible with DSpace 1.1 and 1.2; could possibly be updated for DSpace 1.5.x


Controlled Vocabulary/Ontology

This add-on applies a subject classification system of the institution's choice to its DSpace instance. Once it is implemented, the user chooses from the predefined taxonomy of keywords to describe items being submitted to the repository, and the same taxonomy is used to find and access items held in the repository.

An example (in Portuguese) of a subject classification system in use.

Compatible with DSpace 1.1 and 1.2; could possibly be updated for DSpace 1.5.x
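As a rough illustration of the idea (not the add-on's own code), the sketch below assumes a hypothetical two-level taxonomy held as a Python dict. It shows the two uses described above: checking submitted keywords against the controlled vocabulary, and searching with a term from the same vocabulary. The broader/narrower expansion at search time is an assumption added for illustration.

```python
from typing import Dict, List, Set

# Hypothetical two-level taxonomy: broader term -> narrower terms.
TAXONOMY: Dict[str, List[str]] = {
    "Sciences": ["Biology", "Chemistry"],
    "Humanities": ["History", "Welsh Literature"],
}
# Every term a depositor is allowed to pick.
ALLOWED: Set[str] = set(TAXONOMY) | {t for narrower in TAXONOMY.values() for t in narrower}


def check_subjects(subjects: List[str]) -> List[str]:
    """Return any submitted subjects that are not in the controlled vocabulary."""
    return [s for s in subjects if s not in ALLOWED]


def find_items(items: Dict[str, List[str]], term: str) -> List[str]:
    """Item ids tagged with the term or, for a broader term, with any of its narrower terms."""
    wanted = {term, *TAXONOMY.get(term, [])}
    return sorted(item_id for item_id, subjects in items.items() if wanted & set(subjects))
```

For example, `check_subjects(["Biology", "Astrology"])` flags `"Astrology"` for correction at submission time, and searching for `"Sciences"` finds items tagged `"Biology"` or `"Chemistry"`.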



Dublin Core Meta Toolkit

The Dublin Core Meta Toolkit gives DSpace administrators the ability to convert large amounts of information from their desktop database programs into DSpace-compatible Dublin Core metadata. The toolkit provides a number of out-of-the-box database structures to ease data collection, and lets users create custom converters for existing databases. It is well suited to converting data from Microsoft Access, MySQL and comma-separated values (CSV) files.

Compatible with DSpace 1.5.1
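The toolkit's own converters aren't reproduced here. As an illustration of the kind of conversion it performs, this minimal sketch turns CSV rows into the `dublin_core.xml` files used by DSpace's Simple Archive Format for batch import. The column names and their mapping to Dublin Core fields are hypothetical; a real export would need its own mapping.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping from spreadsheet column names to Dublin Core (element, qualifier) pairs.
COLUMN_MAP = {
    "title": ("title", "none"),
    "author": ("contributor", "author"),
    "year": ("date", "issued"),
}
# Columns assumed to hold several values separated by ";".
MULTI_VALUED = {"author"}


def row_to_dublin_core(row: dict) -> str:
    """Return a dublin_core.xml string (DSpace Simple Archive Format) for one record."""
    root = ET.Element("dublin_core")
    for column, (element, qualifier) in COLUMN_MAP.items():
        cell = (row.get(column) or "").strip()
        if not cell:
            continue  # leave empty cells out rather than writing blank fields
        values = [v.strip() for v in cell.split(";")] if column in MULTI_VALUED else [cell]
        for value in values:
            if value:
                dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
                dcvalue.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # A tiny in-memory CSV standing in for an export from Access or MySQL.
    export = io.StringIO("title,author,year\nOpen Access in Wales,Jones A.; Evans B.,2010\n")
    for record in csv.DictReader(export):
        print(row_to_dublin_core(record))
```

Each printed document would be saved as `dublin_core.xml` in its own item folder alongside the files it describes.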


Embargo

Content submitted to a repository may be restricted by laws, policies or contractual obligations that require the submitter not to publish, or enable public access to, the content for a period of time. This add-on allows DSpace administrators to build functionality for handling embargoed items into the workflow. The metadata of an embargoed item can be indexed and viewed, but its full text cannot be retrieved while the embargo is in force.

Compatible with DSpace 1.4.x, 1.5.x and 1.6
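The embargo rule described above (metadata always visible, full text withheld until the embargo ends) can be sketched in a few lines. This is not the add-on's implementation: the `Item` fields are hypothetical, and the assumption that the embargo lifts on the embargo date itself is ours.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    title: str
    file_url: str
    embargo_until: Optional[date] = None  # hypothetical field; None means no embargo


def full_text_available(item: Item, today: date) -> bool:
    """Full text can be retrieved once the embargo date is reached, or if none is set."""
    return item.embargo_until is None or today >= item.embargo_until


def public_view(item: Item, today: date) -> dict:
    """What an anonymous visitor sees: metadata always, the file link only after the embargo."""
    return {
        "title": item.title,  # metadata stays indexed and visible throughout
        "full_text": item.file_url if full_text_available(item, today) else None,
    }


if __name__ == "__main__":
    thesis = Item("Example thesis", "https://repository.example.org/thesis.pdf", date(2011, 1, 1))
    print(public_view(thesis, date(2010, 6, 29)))  # full_text is None: still embargoed
    print(public_view(thesis, date(2011, 1, 1)))   # full_text link now present
```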


Format validator and virus check

This add-on provides rough-and-ready format checking by confirming that the file (bitstream) extension matches a format that JHOVE can verify. Currently DSpace accepts a deposit's file extension as gospel, so a user could tack a .txt extension onto a GIF and DSpace would assign the wrong format to the file. The add-on also checks the file for viruses.

Compatible with DSpace 1.4.x and 1.5.x
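To show why trusting the extension goes wrong, here is a minimal sketch that compares a file's extension with its leading bytes (its "magic number"). This is a swapped-in, much simpler technique than JHOVE validation, it is not the add-on's code, and it does no virus scanning; the small signature table is illustrative only.

```python
from pathlib import Path
from typing import Optional

# A handful of well-known file signatures mapped to the usual extension.
SIGNATURES = {
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
}
# Extensions treated as equivalent to a detected format.
ALIASES = {"jpeg": "jpg"}


def sniff_format(data: bytes) -> Optional[str]:
    """Identify the format from the file's leading bytes, or None if unrecognised."""
    for magic, fmt in SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    return None


def extension_matches(filename: str, data: bytes) -> Optional[bool]:
    """True/False if the content confirms/contradicts the extension; None if we can't tell."""
    detected = sniff_format(data)
    if detected is None:
        return None  # unknown signature: flag for manual review rather than guess
    ext = Path(filename).suffix.lower().lstrip(".")
    return ALIASES.get(ext, ext) == detected
```

The GIF-renamed-to-.txt case from the paragraph above, `extension_matches("figure.txt", b"GIF89a...")`, returns `False`, so the deposit could be flagged instead of being silently mislabelled.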


Recommendation

This DSpace content-based recommendation feature automatically shows links to articles in the repository that a user is likely to be interested in, by identifying items related to the document the user is currently viewing. Similar to the functionality seen on Amazon, this feature can greatly improve the user experience.

Compatible with DSpace 1.1 and 1.2; could possibly be updated for DSpace 1.5.x
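The post doesn't say how the add-on measures relatedness. One common content-based approach is cosine similarity between bag-of-words vectors built from each item's metadata, sketched below as an assumption rather than as the add-on's method. The item ids and texts are hypothetical, and dropping words of three letters or fewer is a crude stand-in for a proper stop-word list.

```python
import math
from collections import Counter


def vectorise(text: str) -> Counter:
    """Bag-of-words term counts; very short words are dropped as a crude stop-word filter."""
    return Counter(word for word in text.lower().split() if len(word) > 3)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def recommend(current_id: str, items: dict, k: int = 3) -> list:
    """Return up to k other item ids whose metadata text is most similar to the current item's."""
    target = vectorise(items[current_id])
    scored = [
        (cosine(target, vectorise(text)), item_id)
        for item_id, text in items.items()
        if item_id != current_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for score, item_id in scored[:k] if score > 0]
```

Items with no words in common score zero and are left out, so a page only shows links when there is some overlap.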


Request Copy

This add-on creates a semi-automated mechanism whereby would-be users can request, and authors can email, an individual copy of a full text deposited in the repository whose access privileges are set to restricted.

The purpose of this feature is to increase both the content deposited in an IR and its immediate usability by providing a way to accommodate the (frequently unfounded) worries of authors and their institutions about copyright infringement during any publisher embargo periods on public self-archiving.

The link is provided on all non-OA items and opens a form in which the requester must enter their name and email address, may add a comment, and then presses a 'Request-a-copy' button. An email containing a token is sent to the depositor. Using that token, the author can reply simply by clicking one of the two buttons available: 'Send Copy' or 'Don't send copy'.

Compatible with DSpace 1.4.x and 1.5.x
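The token-based workflow described above can be sketched as follows. This is a hypothetical in-memory model, not the add-on's code: the class and field names are invented, no email is actually sent, and the assumption that each token can be used only once is ours.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CopyRequest:
    item_id: str
    requester_name: str
    requester_email: str
    comment: str = ""
    # Unguessable one-off token, standing in for the one emailed to the depositor.
    token: str = field(default_factory=lambda: secrets.token_urlsafe(16))
    decision: Optional[str] = None  # "send" or "refuse" once the author has answered


class RequestCopyDesk:
    """Hypothetical in-memory store of pending requests, keyed by token."""

    def __init__(self) -> None:
        self._pending: Dict[str, CopyRequest] = {}

    def submit(self, item_id: str, name: str, email: str, comment: str = "") -> str:
        request = CopyRequest(item_id, name, email, comment)
        self._pending[request.token] = request
        # A real system would now email the depositor a link containing request.token.
        return request.token

    def respond(self, token: str, send_copy: bool) -> CopyRequest:
        """Record the author's 'Send Copy' / 'Don't send copy' choice; each token works once."""
        request = self._pending.pop(token, None)
        if request is None:
            raise KeyError("unknown or already used token")
        request.decision = "send" if send_copy else "refuse"
        return request
```

Because the token is random and removed after use, the author's one-click reply can't easily be guessed or replayed.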


Semantic Search for DSpace
Semantic Search enables intelligent searching of DSpace content using Semantic Web technologies, and performs knowledge discovery on DSpace metadata. It draws on semantics, the study of meaning in language, to produce highly relevant search results.

Compatible with DSpace 1.4.2; could possibly be updated for DSpace 1.5.x


Statistics
This add-on allows gathering, processing and presenting usage, content and administrative statistics from the repository. The system is based on components that can easily be configured, changed or extended, to respond to different information needs.

Compatible with DSpace 1.4.x and 1.5.x (JSPUI)


Tombstoning
The add-on allows a tombstone to be added when an item is withdrawn from the repository. The user selects from 3 reasons for withdrawing the item: 1. Removed from view by legal order; 2. Removed from view by the [authority doing removal]; 3. Removed from view at request of the author.

Compatible with DSpace 1.4.x and 1.5.x


Further details about this range of options are available here. Any WRN partner interested in discussing or investigating any of these DSpace add-ons should contact the team at the usual address, wrnstaff@aber.ac.uk.

Wednesday, 26 May 2010

Statistical Evaluation

The WRN project team is currently looking in some depth at evaluating our project activities. We are gathering both qualitative data about our activities (stories, opinions and narratives from our users) and more quantitative statistical measures about repositories and their use across Wales.

Surprisingly, collecting the statistical side of our evaluation data, which we originally expected to be the quick and easy part, has proved quite problematic. Establishing a baseline set of measures has been difficult, with varying data coming out of everyone's systems and no consistent way of obtaining measures for central recording. Even the most basic measure of all, the number of deposits recorded each quarter in each repository, can be difficult to obtain, and we are only now managing to record it accurately for 100% of the repositories across Wales.

So, while we hear lots of stories about the power of statistics and the help they can offer in making a case for a repository, it seems that we still have some work to do to convince people it is worth the effort of setting up robust statistical measures. We thought we'd try and address this by providing information about a selection of basic options open to most repository managers. The following information, from the Digital Repositories InfoKit, provides an overview of some of the most commonly employed methods of collecting statistics:

http://www.jiscinfonet.ac.uk/infokits/repositories/management-framework/usage-statistics
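As an illustration of the most basic measure discussed above, deposits per quarter, here is a small sketch. It assumes each repository can export a list of deposit dates (for example from the date an item was accessioned) and counts by calendar quarter; an institution reporting by academic or financial year would need to shift the quarter boundaries.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def deposits_per_quarter(deposit_dates: Iterable[date]) -> Dict[Tuple[int, int], int]:
    """Count deposits per calendar quarter, keyed by (year, quarter 1-4), in date order."""
    counts = Counter((d.year, (d.month - 1) // 3 + 1) for d in deposit_dates)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    dates = [date(2010, 1, 15), date(2010, 3, 31), date(2010, 4, 1), date(2010, 6, 29)]
    print(deposits_per_quarter(dates))  # {(2010, 1): 2, (2010, 2): 2}
```

Running the same function over every partner's export would give directly comparable figures for central recording.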

Any WRN partner interested in reviewing their statistics and collection methods, or needing assistance in setting up any of the tools mentioned here, should contact the project team via the usual email address, wrnstaff@aber.ac.uk.

Monday, 17 May 2010

CRIS Event Cafe Society Write Up - Group 4: Data Quality

At the JISC/ARMA Repositories and CRIS event 'Learning How to Play Nicely', held at the Rose Bowl, Leeds Met University, on Friday 7th May, the afternoon was dedicated to a cafe society discussion session. Delegates explored four topics, and over the course of four blog posts we are disseminating the facilitator reports from each session.

Please use the comment option below to contribute or comment on these discussion topics.

Group 4 - Data Quality
Facilitator: Simon Kerridge, ARMA

The issue to be discussed was Data Quality, framed as “How do we ensure data quality in our systems? What are the best methods for getting data out of legacy systems?” However, a number of related issues also cropped up in the discussions.

The time was split into four 30-minute slots, with delegates attending as many as they liked. Some issues were raised on many occasions and others less often; most are presented below.

Unique Identifiers (for many, perhaps all, data items) were considered a big issue. Examples included:
• PersonId: institutions do not usually use a single one; the various systems (eg HR, CRIS, IT, PGR and others) generally used different ids. Moreover, the HR system, which might seem the obvious primary source, might have multiple entries for the same person (if they had more than one contract); worse, it usually only had entries for paid staff, and there are many examples of unpaid people involved in research.
• FunderId: many expressed problems with de-duplicating similar-looking funders. It was thought that the funders themselves could/should provide a unique reference.

Authority Lists
• Even if an institution could de-duplicate all its own data and use a single id internally, other institutions would probably not use the same system, so exchanging data would be problematic. This could be resolved by an agreed independent authority (for example, HESAid for staff), but none exists for funders, for example. This was thought to be something that would be extremely useful.
• A national policy on national data (eg FunderId) was seen as desirable
• Scopus / WoS / PubMed were seen as possible partial authority lists for publications (and authors), but they contain differing information and do not cover the whole spectrum; indeed, they are not worth using in some subject areas.

Data Quality
• Many places have a feedback loop (eg monthly show academic staff what has been added to their profile).
• Use carrots and sticks, eg only allow publications from the IR/CRIS to be used on internal promotions or for the annual report
• One stick method that was generally liked was the Norwegian system, where a prerequisite for receiving public funding for a research project was that all of the authors' publications (where possible) had to be submitted to an open access repository
• Good enough is good enough
• Data should be re-used where possible, but only where it is appropriate; sometimes systems can be developed organically to meet too many requirements and end up not doing any of them well
• Try to think about potential future uses of data and collect what you might need, but don’t go overboard. For example, one institution adds a Library of Congress classification to all publications, but so far has not used that metadata
• Have processes in place to check data quality on input and as a secondary check to ‘approve’ the data; one institution has a ‘checked by Carol’ flag!
• In general self-archiving was not approved of, due to the lack of quality and copyright checking
• There is some good software available for checking the quality of publication data (using Scopus / WoS / PubMed data) and for data aggregation
• One institution uses Levenshtein string comparison to help identify possible duplicate entries
• The RAE / REF was seen as a good driver for increasing data and data quality
• Periodic data maintenance and cleansing is essential, but often not undertaken; data quality is unsexy!
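The string comparison mentioned in the list is presumably Levenshtein (edit) distance, a standard way of spotting near-duplicate strings such as the similar-looking funder names raised earlier. A minimal sketch follows; the threshold of two edits and the example names are assumptions for illustration, not anything reported at the event.

```python
from itertools import combinations
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def possible_duplicates(names: List[str], max_distance: int = 2) -> List[Tuple[str, str]]:
    """Pairs of names within max_distance edits of each other, after light normalisation."""
    normalised = {name: " ".join(name.lower().split()) for name in names}
    return [
        (x, y) for x, y in combinations(names, 2)
        if levenshtein(normalised[x], normalised[y]) <= max_distance
    ]
```

`possible_duplicates(["Wellcome Trust", "Welcome Trust", "Leverhulme Trust"])` flags only the misspelt pair. The results are candidates for a human to check, not automatic merges.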

Data Sharing
• Authority lists would make this much easier – surely some work can be done in this area?
• Two institutions recounted the data fusion issues they faced in making a joint submission to the RAE. The experience simplified a later choice of IR: the second institution simply plumped for the same one as the first

Parallel Systems
• Many reported using parallel systems within their institutions as the data in the (normally) central system was simply not trusted by all the users.

Priority
• It was universally agreed that problems tended to occur where an issue was not given a high enough priority by the institution. For example, if a DVC took an interest in the quality of data in the IR then resources were made available to improve the processes and data quality.

Legacy Systems
• Often resources were made available for moving data from a legacy system to a new one
• However, the migration was often seen as solving data quality issues, whereas in reality data quality is an ongoing issue, and one that is often not resourced as such

Primary Data Source
• It was agreed that no single system meets all of an institution's data needs. Indeed, that might not be desirable, as individual systems tend to meet different requirements.
• However, it should be known where the primary data resides, bearing in mind that a single record (eg information about staff) might not all be held in one system.

Summary (the facilitator's view of the discussions)
Overall the discussions were very open and positive. Many participants took away ideas for use in their own institutions. Most were also sure that they would not find it easy to get the resources required to do a proper job of improving their data quality. Some systems were reportedly working very well; others were not. In general the former were the result of new developments, whereas the latter tended to be systems that had been in use for a while. Hopefully this is because better new technology is being used to support processes; however, it seems more likely that systems are neglected once they are seen as embedded and working.