================================================================ Archimedes Palimpsest encoding scheme for digital transcriptions ================================================================ :Authors: Alex Lee, Doug Emery :Date: April 7, 2010 .. contents:: Included transcriptions ======================= The digital product includes the following transcriptions reporting the readings of the manuscript undertext: ================================ ============================ Text As read by ================================ ============================ Equilibrium of Planes (PE) Netz-Wilson Floating Bodies (FB) Netz-Wilson, Heiberg Measurement of the Circle (DC) Netz-Wilson Method (ME) Netz-Wilson, Heiberg Sphere and Cylinder (SC) Netz-Wilson Spiral Lines (SL) Netz-Wilson, Heiberg Stomachion (ST) Netz-Wilson, Heiberg Hyperides (HYP) ZPE publication ================================ ============================ Transcriptions are available both as full treatises (e.g. ``FloatingBodies-NW-p5.xml``) or as individual folios (e.g. ``017r-016v_Arch07r_Tei_Netz-Wilson.xml``). TEI conformance =============== These transcriptions are presented as XML files conforming to the `TEI P5 standard`_. The purpose of this document is to describe the encoding decisions specific to these transcriptions; for all other issues, the TEI standards should be consulted. Only a subset of the available TEI modules are used in these transcriptions: core, header, textstructure, analysis, figures, linking, transcr, and gaiji. The TEI `Roma`_ generator has been used to generate a `RelaxNG`_ schema for validating the transcriptions. The Roma customization file is provided as ``archie-tei.xml``; the schema file is ``archie-tei.rng``. .. _TEI P5 standard: http://www.tei-c.org/ .. _RelaxNG: http://www.relaxng.org/ .. _Roma: http://www.tei-c.org/Roma/ Overall structure (``div``, ``milestone``, ``head``, ``ab``, ``lb``) ==================================================================== For treatises that are divided into books (e.g. FB), the content for each book has been enclosed in a ``div[@type="book"]`` element. Other treatises have their content directly within the ``body`` element. The content itself comes as a series of ``milestone``, ``head``, and ``ab`` elements. Here the milestones indicate units of text such as postulates or propositions. These are presented according to Heiberg's edition. Each ``head`` or ``ab`` element contains the words of the transcription. Preceding each line of transcription is an ``lb`` element, whose ``@n`` attribute indicates the number of the line that follows. Lines are numbered according to each column of the undertext folio. Within the ``head`` and ``ab`` elements there occur two further types of milestones. Those of type "folio" indicate the overtext (= Euchologion) folio number and undertext column number, e.g. *81v1*. Those of type "underTextFolio" indicate the undertext (= Archimedes or Hyperides) folio number, e.g. *Arch03v*. In most cases, the undertext folio milestone will be followed by four overtext folio milestones. For example, *Arch03v* is composed of *81v1, 88r1, 81v2, 88r2*. Words (``w``, ``unclear``, ``supplied``) ======================================== If a word has no further markup associated with it, it is simply provided as text, e.g. "ὥστε τῶν μέρων". But if a word has any extra markup, it is contained within a ``w`` element. *NB. All textual whitespace at any level within a `w` element is to be ignored or discarded.* The tags ``unclear`` and ``supplied`` never span across words; instead, separate tags are used for each word. So, for example, "κ̣α<ί ἐ>στι" would be encoded as:: κα ί ἐστι Line breaks ----------- When a word breaks across a line, it is presented in two parts, the initial ``w[@part="I"]`` and the final ``w[@part="F"]``, with the line break ``lb`` placed between. For example, given this text:: 5 ... ... ... συνε- 6 χέων ... ... ... the corresponding markup would be:: ... συνε χέων ... Here the word συνεχέων breaks with συνε at the end of line 5 and is completed by χέων in line 6. In cases where a preceding or following folio is missing, it is possible that a word-part will lack its complement. Abbreviations (``choice``, ``abbr``, ``am``, ``g``, ``expan``, ``ex``) ====================================================================== All abbreviations are encoded as ``choice`` elements. Each ``choice`` element contains an ``abbr`` element giving the manuscript form of the word and an ``expan`` element giving the editorially expanded form of the word. Partial-word abbreviations -------------------------- For abbreviations in which only a portion of the word has been abbreviated, such as "ἔχο(ν)", the entire word is represented by the ``choice`` element. An ``am`` element is used in the ``abbr`` to show where the abbreviation occurs, and a corresponding ``ex`` gives the expansion within the ``expan``. The ``am`` always contains a single ``g``, to represent whatever symbol was used for the abbreviation. (No details are given about the symbols used, though this would be a nice addition to the transcriptions down the road.) :: ἔχο ἔχον Here an enclosing ``w`` element is optional. Multiple ``am`` and ``ex`` instances are possible, so long as they occur in pairs. The doubly abbreviated word "τ(ου)τ(έστιν)" would be encoded as:: ττ τουτέστιν If the abbreviation has been flagged by the editors as ``supplied`` or ``unclear``, these tags will go outside of the ``am`` and the ``ex`` (it is not permitted by the schema to have them within those elements, nor would this have any clear meaning). Full-word abbreviations ----------------------- For many common, short words, the entire word is represented with a symbol. In such cases any extra markup, such as ``supplied`` or ``unclear``, is placed outside of the choice. For example, suppose that a symbol representing the word καὶ has been supplied by the editors; the word is demanded by the context, and the manuscript shows only a one-character space, so an abbreviation is necessary. The word would be encoded as:: καὶ Abbreviations at word breaks ---------------------------- One final complication occurs when an abbreviated word occurs at a line break. Here the word is divided into two parts, ``w[@part="I"]`` and ``w[@part="F"]``, as mentioned above. When one of those parts contains an abbreviation, its entire contents will be contained within a ``choice``. So, given the following text:: 1 ... ... ... ἔχ(ου)- 2 σαν ... ... ... the corresponding markup would be:: ἔχ ἔχου σαν Because the initial part of the word, before the line break, contains an abbreviation, its entire contents are given in a ``choice`` element. The final part of the word, which resumes after the line break, needs no ``choice`` because it does not contain an abbreviation. Encoding of other features (``hi``, ``figure``) =============================================== The manuscript contains marginal numbers (typically indicating a proposition), as well as a few instances of marginal text. These are enclosed within ``hi[@rend="margin"]`` elements. Cases of raised text are enclosed within ``hi[@rend="superscript"]`` elements. Diagrams are represented with ``figure`` elements, each of which has an ``@n`` attribute and a short ``figDesc`` within it. Note that these elements are also included when the manuscript merely contains space for a figure, even though no figure has been drawn in. Some features of the manuscript, including large text, extra spaces, and out-dented text, have not been included in this transcription. Editorial additions =================== All diacritics, word spacing, and punctuation are editorial additions. Punctuation always appears within ``pc`` tags (so that punctuation can easily be stripped out if so desired). On occasion the reading in the manuscript is such that it could be taken for an error in the transcription (for instance if there has been scribal error or some other form of corruption). These readings have been flagged with a ``sic`` element. Such elements can span multiple words. Whitespace ========== As mentioned above, in certain elements whitespace must be ignored. This can be accomplished, in XSLT for instance, by the following template::

The need for this workaround is regrettable, but it seems to be the most robust way of accounting for the whitespace-handling behaviors of various XML tools. Linking to text images for individual folios (``facsimile``, ``surface``, ``graphic``, ``zone``) ================================================================================================ Each individual folio transcription (e.g. ``017r-016v_Arch07r_TEI_Netz-Wilson.xml``) is mapped to the accompanying images of the corresponding folio using the ``facsimile`` section following the TEI header. A single ``surface`` element is used, containing one ``graphic`` element for each image file and one ``zone`` element for each mapped line or other element (such as figures, and identified proposition and book enumerations). The following shows a portion of a facsimile element. :: The ``surface``'s ``@ulx``, ``@uly``, ``@lrx``, and ``@lry`` attributes provide the pixel dimensions of the images referenced by the enclosed graphic elements. The ``zone``'s ``@ulx``, ``@uly``, ``@lrx``, and ``@lry`` attributes describe a box containing the referenced component. The ``zone/@xml:id`` begins with "z", followed by the undertext folio, prayer book folio and column (if applicable), and the ``@n`` attribute value of the referenced element. Elements that are not lines, such as figures, have the element type in the ID in lieu of the column number, for example: ``zArch02r_14r_fig_1``. In the text portion of the file, each mapped element will employ the ``@facs`` attribute to reference the corresponding ``zone`` by its ``@xml:id``, as follows:: φοτέρου τᾶς