Friday, August 27, 2010

Guest post: Elena Pierazzo on the Arabic ENRICH schema

Elena Pierazzo, from the Centre for Computing in the Humanities at King’s College London, describes a new metadata schema for Arabic manuscript cataloguing. The Wellcome Arabic Manuscript Cataloguing Partnership, as previously announced, is working toward providing greater access to the Arabic manuscripts held in the Wellcome Library. One of its goals was to employ a metadata schema appropriate to the rich descriptions that can be captured for this type of material. Elena Pierazzo developed this schema.

Creating a cataloguing model for the Wellcome Library has been an exciting opportunity for me to learn a lot about Arabic manuscripts. My experience so far, although extensive, had concerned only Western manuscripts, and I was curious to see where the differences, if any, were to be found. Needless to say, the challenge has proved both invigorating and rewarding.

Two main design principles were established from the very beginning:

1. The model should provide a flexible, extensible framework able to accommodate sophisticated data relating to the structure of the physical object (the manuscript) as well as its cultural content. In a manuscript the text cannot be separated from its container without loss of information: the two represent the yin and the yang, the body and soul, of the same entity.

2. The model should be compliant with the main international standards for cataloguing and classification.

A model based on the Text Encoding Initiative (TEI) therefore seemed the best choice. The TEI Guidelines provide a very flexible but rigorous framework for encoding humanities data of a heterogeneous nature. TEI uses XML as its base technology, meaning that records expressed in TEI are software and platform independent and can easily be used on the web. Furthermore, the TEI Guidelines provide scholarly support for every sector of the humanities. In particular, TEI has proved extremely successful for cataloguing manuscripts: it was chosen as the basis of ENRICH, a very important European project that aimed to provide a framework for cataloguing European manuscripts across different countries and libraries.

For us, the possibility of adopting the same format used by the ENRICH project was very appealing, as it would allow us to share and interchange data with other libraries and scholars in Europe and beyond. On the other hand, this model had been designed to describe and catalogue Western manuscripts, so we soon found that some adjustments were necessary. We had, for instance, to increase the number of possible calendars to cover a wider variety of dates, and to replace entirely the list of script types used by the scribes. In the end, the project required extensive customization of the schema in order to ensure compliance with established standards (such as those of the Library of Congress) as well as emerging ones (the Ligatus project), while at the same time trying to maintain compliance with ENRICH.
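To make the two adjustments mentioned above a little more concrete, here is a minimal Python sketch that reads the calendar of a date and the script of a hand from a TEI manuscript description. The record fragment is hypothetical and not taken from the Arabic ENRICH schema: the element names (msDesc, origDate, handNote) and the calendar and script attributes are standard TEI, but the values "#Hijri-qamari" and "naskh" are assumptions chosen for illustration, and the fragment is too incomplete to validate against any schema.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace, bound to a prefix for use in the searches below.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical record fragment. The attribute values are illustrative
# assumptions, not values confirmed to exist in the Arabic ENRICH schema.
RECORD = """
<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <physDesc>
    <handDesc>
      <handNote script="naskh">Written in a clear hand.</handNote>
    </handDesc>
  </physDesc>
  <history>
    <origin>
      <origDate calendar="#Hijri-qamari">1005 AH</origDate>
    </origin>
  </history>
</msDesc>
"""


def summarise(record_xml):
    """Return the calendar, date text and script found in one record."""
    root = ET.fromstring(record_xml)
    date = root.find(".//tei:origDate", TEI_NS)
    hand = root.find(".//tei:handNote", TEI_NS)
    return {
        "calendar": date.get("calendar"),
        "date_text": date.text,
        "script": hand.get("script"),
    }


if __name__ == "__main__":
    print(summarise(RECORD))
```

Because the records are plain XML, any general-purpose XML library can read them in this way; a customized schema simply changes which attribute values are considered valid.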

While we were developing the model for the Wellcome Library, another important initiative for cataloguing Arabic manuscripts was coming to the same conclusions; this was the joint JISC project undertaken by the Bodleian Library in Oxford and Cambridge University Library. The model produced for the Wellcome Library was evaluated positively by the Oxbridge project team, and they decided to adapt it for their own catalogues. This makes our model the main format for cataloguing Arabic manuscripts in the UK.

We think that what we have called the Arabic ENRICH schema could also be used by other libraries and projects, and for this reason we decided to make it available to everybody. An annotated template of a typical record and the ODD file that is used for generating a TEI schema are available via the Wellcome Library’s project website. (If you don’t know what an ODD file is, or you are not familiar with the TEI Guidelines, you can find more information on the TEI website.)

Dr Elena Pierazzo
Centre for Computing in the Humanities
King’s College London

 