=======================================================
 Archimedes Palimpsest Digital Release README Document
=======================================================

:Authors: Mike Toth, Doug Emery
:Date: October 29, 2008

.. contents::

..
   1 Rights and Conditions of Use
   2 Intended Audience and Consumers
   3 Digital Project Data Set Purpose
   4 Data Set Contents
     4.1 Core Data Content
     4.2 Documentation
       4.2.1 External Documentation
       4.2.2 Internal Documentation
     4.3 Supporting Functional Files
     4.4 Supplemental Files
     4.5 Contributed Research Files
   5 How to Use This Data Set
     5.1 General Orientation
     5.2 Metadata
     5.2 Computer Access Tools
     5.3 Scientific Information

1 Rights and Conditions of Use
===============================

The Archimedes Palimpsest data is released with license for use under
Creative Commons Attribution 3.0 Unported Access Rights. It is requested
that copies of any published articles based on the information in this data
set be sent to The Curator of Manuscripts, The Walters Art Museum, 600 North
Charles Street, Baltimore MD 21201.

2 Intended Audience and Consumers
=================================

The Archimedes Palimpsest Digital Product is intended to serve any
interested user or party. However, its content is focused on serving the
following groups:

1. Scholars of Greek and mathematics
2. Application providers
3. Libraries and archives
4. Image scientists, and scientists in other disciplines interested in the
   production of the images

3 Digital Project Data Set Purpose
==================================

The Archimedes Palimpsest Digital Product provides all the digital
information available on the Archimedes Palimpsest in a single digital data
set, with a standard structure. Its purposes are threefold:

1. Serve as the authoritative digital data set of images in a standardized
   format that meets the needs of users, information providers, archives and
   libraries.
2. Provide derived information (i.e. transcriptions, processing information)
   in the context of digital images of the original manuscript in a single
   integrated package.
3. Offer a standard product sustainable by users to which current or future
   contributors can add additional standardized information (e.g. alternate
   texts, image analyses or conservation information).

4 Data Set Contents
====================

This data set consists of:

1. a *core* content set of digital images and transcriptions of the
   Archimedes Palimpsest, each with accompanying metadata and checksums
2. project-generated and third-party documentation of all included
   components
3. supporting functional files, including XML schemas and cascading style
   sheet files
4. supplemental versions of the transcriptions by treatise and work
5. a directory for researcher-contributed content files, not a part of the
   core data set

4.1 Core Data Content
---------------------

The core content of images and supporting transcriptions is the focus of the
Digital Product. For each folio of the palimpsest, a comprehensive set of
registered images is provided. Available transcriptions are provided to
support use of the images.

For this release, the core data includes:

a. Image data consisting of large 8-bit image files, including requantized
   raw images, processed pseudo-color images, registered Heiberg images and
   registered XRF images. All these files have embedded metadata and
   accompanying metadata files.
b. A set of TEI (Text Encoding Initiative) conformant XML tagged Unicode
   transcriptions from all Archimedes and Hyperides texts, including
   embedded metadata and associated metadata files.
c. Spatially mapped transcriptions for each palimpsest folio of the
   Archimedes and Hyperides texts.

For each folio in the palimpsest, the data set provides:

* All eight-bit raw and processed registered TIFF images for the directory’s
  folio, including XRF images or images of prints of Heiberg’s 1906
  photographs, when they exist, images of photographs of an unfoliated
  palimpsest leaf from Cambridge University, and an image of a negative of
  folio 57v from the University of Chicago.
* For all of the Archimedes and available Hyperides texts, an XML encoded
  transcription of the directory’s folio spatially mapped to all the
  registered images in the directory
* An XML metadata file for each of the TIFF files in the directory
  [forthcoming]
* An MD5 checksum file for each of the TIFF and XML content files

All file names follow strict naming conventions to facilitate easy
identification of file type and content.

The core content set contains folio-by-folio versions of the Netz-Wilson
transcriptions of the Archimedes texts and of the Hyperides transcriptions,
together with line-by-line text-to-image spatial mappings, in integrated
files. These files collect in one place transcription mapping data for all
images of a single undertext folio.

In addition to its images and transcriptions, each content directory
provides preservation information in the form of:

* Metadata embedded in image files
* XML metadata files for each image [forthcoming]
* Metadata embedded in the mapped transcription file
* MD5 checksum data for all TIFF and XML files to verify their fixity

The metadata for images and transcriptions complies with the Archimedes
Palimpsest project metadata standards, which are provided with this set as
documentation. The metadata provides investigative, data sharing and
scientific information on the images and transcriptions.

Metadata are data elements about the content, quality, condition, and other
characteristics of the data sets that make up the digital holdings. Metadata
records are produced according to rules and definitions governing several
subtypes:

1. Identification Information
2. Spatial Data Reference Information (images and spatial indexes, only)
3. Imaging and Spectral Data Reference Information (images only)
4. Data Type Information
5. Data Content Information
6. Metadata Reference Information

4.2 Documentation
-----------------

Documents are provided to fully describe the contents of the data set and
facilitate their use. There are both *external* and *internal* documents.
External documents detail data standards, file specifications, and
technologies used by the project, such as the TIFF specification, MD5
checksum algorithm, and various XML-related technologies. Internal documents
detail project data standards and practices, image processing algorithms,
and information required to use the data set not detailed in the external
documentation.

4.2.1 External Documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

External documentation includes:

* ASCII specification [forthcoming]
* CSS 2.0
* Dublin Core [forthcoming]
* GNU TAR file archive algorithm [TBD]
* GZIP file compression algorithm [TBD]
* HTML 4.0
* MD5 hash - rfc1321.txt
* PDF 1.7
* RELAX NG
* TIFF 6.0
* XML 1.0
* XML Schema
* XSL 1.0
* Unicode

  - Unicode Code charts
  - Unicode specifications and technical reports

* ZIP file format specification 6.3.2

4.2.2 Internal Documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Internal documentation includes:

* Archie Image Manipulation software documentation

  - Manual [to be updated]
  - Algorithms employed [forthcoming]
  - C code [TBD]

* File Naming Conventions
* Folio Index
* MD5 How-To
* Metadata Data Dictionary [forthcoming]
* Metadata How-To [TBD]
* Metadata Standard
* Transcription Integration Plan
* Transcription Metadata Standard
* Scientific documents describing:

  - Spectral image capture techniques [forthcoming]
  - XRF image capture techniques [forthcoming]
  - Image processing [forthcoming]

* XRF Metadata Extensions

4.3 Supporting Functional Files
-------------------------------

The data set provides supporting files needed to
share or work with the Digital Product content data. Primarily these files
are XML schema documents used to validate and process transcription, spatial
index, and metadata files in XML format. The following supporting file
collections are included:

* Archimedes-Palimpsest: Custom XML schema files for working with project
  metadata XML files and custom mapped transcription formats [forthcoming]
* TEI: Documentation and XML schema files for the TEI guidelines
* Dublin-Core: XML schema files for the Dublin Core metadata elements
* CHS: RELAX NG schemas for Center for Hellenic Studies spatial indexing XML
  files

4.4 Supplemental Files
----------------------

The purpose of the Supplemental material is to provide alternate
presentations of XML-encoded data for scholars, application developers, and
other interested parties who may want to use them. It contains “master”
files created for the transcription and spatial mapping efforts. For each
work there may be:

* TEI XML-encoded transcriptions

  - All Archimedes works have a Netz-Wilson transcription
  - Heiberg transcriptions are provided for the Archimedes texts On Floating
    Bodies, On Spiral Lines, and Sphere and Cylinder
  - Hyperides transcriptions are included

* XML-encoded line-by-line mappings of transcriptions to images

The combined folio-by-folio spatially mapped transcription files included in
the core data set have been derived via XSL transformation from the
transcription and mapping files.

4.5 Contributed Research Files
------------------------------

This Contributed Research data is intended initially to include useful and
specialized images contributed to the project by image scientists. These are
images useful to scholars, but not integrated into the core data set
because, for example, they are not registered to core image dimensions or
they are not accompanied by complete metadata.
Over the life of the data set, this directory may be used to include
carefully vetted contributions of critical value to the data set, such as
conservation, codicological, and other information. This component includes
experimental diagrams, and may later contain close-up images of special
regions of interest and images captured or processed using experimental
techniques.

5 How to Use This Data Set
==========================

This data set contains supporting documentation to enable discovery of the
data and available access tools. The files named below may be located by
using the file 1_FileList.txt, which accompanies this ReadMe file.

5.1 General Orientation
-----------------------

For General Orientation to the data set, see

* 0_ReadMe.txt: this file
* 1_FileIndex.txt: list of files in the data set
* FileNamingConventions.txt: a description of naming conventions for image,
  XML, and MD5 files
* FolioIndex.txt: a list of the Archimedes Palimpsest folios by work,
  undertext folio, and Euchologion folio
* MD5_README.txt: a brief how-to on using MD5 files to confirm the integrity
  of content files
* TBD: A lay description of the image types.

5.2 Metadata
------------

Metadata information for the images and transcriptions is described in
several supporting documents.

* Image_Metadata_Standard.pdf: The project's imaging metadata standard
  document.
* Image_Metadata_Standard_XRF_Extensions.pdf: Extensions to the metadata
  standard to support XRF imaging
* Transcription_Metadata_Standard.pdf: Metadata elements for transcriptions
  and spatial mappings of transcriptions to images
* Transcription_Metadata_Mapping.txt: A mapping between project-selected
  Dublin Core identification elements and TEI header elements used for
  metadata in the transcription files
* MetadataDataDictionary.txt: A complete dictionary of the metadata elements
  used in all contexts
* TEI documentation: Documentation of the TEI guidelines used for the
  transcriptions
* rfc5013.txt: Dublin Core metadata elements
* DCMI_Metadata_Terms: Dublin Core metadata term specification
* ArchimedesPalimpsestXML.xml: Documentation of Archimedes Palimpsest custom
  metadata schemas for metadata and content management

5.2 Computer Access Tools
-------------------------

For machine access to the files in this data set, the following files can be
used.

* ArchimedesPalimpsestXML.txt: Documentation of Archimedes Palimpsest custom
  metadata schemas for metadata and content management [forthcoming]
* Content.xml: a machine readable table of contents for the data set,
  connecting content files to their unique identifiers, metadata records,
  and folios [forthcoming]
* FolioIndex.xml: a machine readable list of the Archimedes Palimpsest
  folios, by work, prayer book folio, and undertext folio [forthcoming]
* XML schemas and DTDs for working with content XML files, including TEI,
  DublinCore, and custom schemas created for the data set
* TEI documentation: Documentation of the TEI guidelines used for the
  transcriptions

5.3 Scientific Information
--------------------------

The included scientific texts provide descriptions of image capture and
processing techniques used to create the data set.

* ImageCapture.txt: Documentation of techniques used to capture spectral
  images used in the data set [TBD]
* ImageProcessing.txt: Documentation of techniques and algorithms used to
  create the processed images used in the data set [TBD]
* XRFCaputre.txt: Documentation of XRF imaging used to capture XRF images
  used in the data set [TBD]
* Archie_1.0.pdf: Documentation of the Archie 1.0 image manipulation
  software suite [to be updated]
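
The MD5 checksum files that accompany the TIFF and XML content files can be
checked with a few lines of Python. The following is a minimal sketch and
not part of the project's tooling. It assumes that each checksum file holds
a single line in the common ``md5sum`` layout (32 hexadecimal digits,
whitespace, file name) and that it sits in the same directory as the content
file it describes. Both points are assumptions, as are the function names;
MD5_README.txt remains the authoritative how-to.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use low for large TIFF images.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum_file(md5_path):
    """Return True if the content file matches its recorded digest.

    ASSUMPTION: the checksum file contains one line of the form
    "<32 hex digits>  <file name>", as written by md5sum, and the
    named file is in the same directory as the checksum file.
    """
    md5_path = Path(md5_path)
    expected, name = md5_path.read_text(encoding="ascii").split(maxsplit=1)
    name = name.strip().lstrip("*")  # md5sum marks binary mode with '*'
    actual = md5_of_file(md5_path.parent / name)
    return actual == expected.lower()
```

Calling ``verify_checksum_file`` on a checksum file returns ``True`` when
the recomputed digest matches the recorded one and ``False`` otherwise. If
the checksum files in this data set use a different layout, only the parsing
line in ``verify_checksum_file`` needs to change.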