Manuel de Codage
The "Manuel de Codage" is a standard for describing hieroglyphic texts.
Its official description can be found in [BGH+88].
Most hieroglyphic typesetting systems use this standard, but they generally extend or modify it.
These modifications address a number of needs:
first, users want some control over the way hieroglyphic texts
are drawn, and second, the modifications correct a number of bugs in the original design of the standard
(notably for group shading).
This document purports to describe the « current
state » of the standard (Macscribe, Winglyph 1.2, Tksesh).
Elements presented without specific comment are part of the original standard
([BGH+88]).
Those that come from Winglyph are followed by WG; those
that come from Tksesh are followed by Tk. The Tksesh
version referred to is the one currently under development (and not yet
released). I don't know Macscribe well enough to include it here, but help is welcome on this. Finally,
new extensions are followed by the symbol PROP.
Conventions
A text in Manuel de Codage format is
written a priori in ASCII or ISO 8859-1.
Because of the
multiplicity of platforms, it would be desirable (to my knowledge it is not
currently the case) that any program processing
texts in MDC format recognize the sequences of
ASCII codes (10), (13, 10) and (13) as ends of lines.
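The end-of-line rule above can be sketched in Python as follows. This is only an illustration: the function name split_mdc_lines is hypothetical and does not come from any of the programs mentioned here.

```python
import re

# The three end-of-line conventions: CR LF (13, 10), lone CR (13), lone LF (10).
# CR LF is listed first so that it is consumed as a single end of line.
_EOL = re.compile(r"\r\n|\r|\n")

def split_mdc_lines(text: str) -> list[str]:
    """Split MDC source text into lines, whatever platform produced it."""
    return _EOL.split(text)

print(split_mdc_lines("i-w\r\nn:n\rA1\nZ1"))  # ['i-w', 'n:n', 'A1', 'Z1']
```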
Casual description of the Manuel de Codage
The general principle of the Manuel is as follows: the signs
are coded according to their typographical code in the font made
for A. H. Gardiner (A40, I3, etc.), except for the uniliteral,
biliteral, and triliteral signs. For those, it is also
possible to use their transliteration (see table for the codes).
In the event of ambiguity, only the most frequent sign is
accessible by its transliteration.
The choices are sometimes curious. For example, iw
is N18 (the island), not D54 (moving legs). Was the
corpus used to make these choices representative of the whole body of
texts? (In the current state of my database, where
Late Egyptian and hieratic texts are well
represented, I find 40 N18 for 892 D54!)
Sign variants, whose codes are followed by an
asterisk in the Gardiner list, like A14*, are coded by replacing the
asterisk with "A": "A14A". Those followed by two asterisks, like G7**,
are coded with a "B": "G7B".
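As a small illustration of the asterisk rule, here is a sketch in Python. The function name variant_code is an assumption made for this example only.

```python
def variant_code(gardiner: str) -> str:
    """Turn a Gardiner-list code with asterisks into its MDC code."""
    # Two asterisks must be tested first, since such a code also ends with one.
    if gardiner.endswith("**"):
        return gardiner[:-2] + "B"
    if gardiner.endswith("*"):
        return gardiner[:-1] + "A"
    return gardiner  # no asterisk: the code is used unchanged

print(variant_code("A14*"))  # A14A
print(variant_code("G7**"))  # G7B
```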
Signs are combined by separating them with the symbols "-",
"*" and ":". "-" separates two quadrats, ":" stacks signs
vertically, and "*" places signs side by side inside
a quadrat. Parentheses make it possible to change the
priorities.
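To make the priorities concrete, here is a minimal recursive-descent sketch in Python. It assumes the precedence given by the grammar later in this document ("*" binds tighter than ":", which binds tighter than "-") and handles only sign codes, the three separators and parentheses. The name parse_group and the tuple representation are choices made for this example, not part of the standard.

```python
import re

_SEPARATORS = {"-", ":", "*", "(", ")"}
# A token is one separator or parenthesis, or a run of other non-blank characters.
_TOKEN = re.compile(r"[-:*()]|[^-:*()\s]+")

def parse_group(source: str):
    """Parse signs combined with '-', ':' and '*' into nested tuples.

    ("seq", [...]) is a sequence of quadrats, ("v", [...]) a vertical stack,
    ("h", [...]) a horizontal group; a single sign is a plain string.
    """
    tokens = _TOKEN.findall(source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def level(kind, separator, parse_item):
        # Parse item (separator item)*; a lone item needs no wrapper node.
        items = [parse_item()]
        while peek() == separator:
            take()
            items.append(parse_item())
        return items[0] if len(items) == 1 else (kind, items)

    def atom():
        if peek() == "(":
            take("(")
            inner = sequence()
            take(")")
            return inner
        tok = take()
        if tok in _SEPARATORS:
            raise ValueError(f"unexpected {tok!r}")
        return tok

    def horizontal():
        return level("h", "*", atom)

    def vertical():
        return level("v", ":", horizontal)

    def sequence():
        return level("seq", "-", vertical)

    tree = sequence()
    if peek() is not None:
        raise ValueError(f"unexpected {peek()!r}")
    return tree

print(parse_group("tA:Aa32*(t:xAst)"))
```

For the input shown, the sketch reads tA stacked over a horizontal group made of Aa32 and the parenthesized stack t over xAst.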
Ligatures (TK, WG)
The ligature system described here is specific to the
current version of Tksesh. Winglyph uses the same notation, but
treats ligatures as composite hieroglyphs. In
particular, for Winglyph, a ligature is either known or unknown. For
Tksesh, a ligature is a particular way to group hieroglyphs. If
the ligature is defined, Tksesh groups the signs in a special way;
if not, it behaves as if the signs were separated by '*'.
The symbol used to bind hieroglyphs is '&'.
Modifiers such as \, \t, etc. are not taken into account.
Grammatical marks, on the other hand, are significant.
Examples
- A&t
- simple ligature of the signs A and t
- t&w&t
- ligature of three signs. One can bind as many signs as one wants.
- k-A &p-w
- ligature of two signs not belonging to the same word
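The Tksesh behaviour described above (special grouping when the ligature is defined, plain "*" grouping otherwise) can be sketched as follows. The table KNOWN_LIGATURES and the function resolve_ligature are hypothetical names used only for illustration; the actual Tksesh data structures are not described here.

```python
# Hypothetical table of ligatures for which a special layout is defined.
KNOWN_LIGATURES = {("A", "t"), ("t", "w", "t")}

def resolve_ligature(code: str) -> tuple[str, list[str]]:
    """Decide how a group of signs joined by '&' should be laid out."""
    signs = [s.strip() for s in code.split("&")]
    # Modifiers (anything from the first backslash on) are not taken
    # into account when looking the ligature up.
    key = tuple(s.split("\\")[0] for s in signs)
    if key in KNOWN_LIGATURES:
        return ("ligature", signs)
    # Undefined ligature: behave as if the signs were separated by '*'.
    return ("horizontal", signs)

print(resolve_ligature("A&t"))  # ('ligature', ['A', 't'])
print(resolve_ligature("p&w"))  # ('horizontal', ['p', 'w'])
```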
Grammar of the Manuel de Codage
We give here a formal grammar of the Manuel de Codage. This
grammar is intended for programmers who would like to write
software that reads and writes texts in Manuel de Codage
format.
The description of this grammar uses the following notations:
- [...]
- text between square brackets is optional
- (...)
- parentheses group text
- *
- text followed by a star can be repeated 0 or more times
- +
- text followed by a plus sign can be repeated 1 or more times
- A|B
- the bar means "or": text A or text B
- "..."
- text between quotation marks appears as is in the document.
For instance, "*" means the star sign.
mdcfile     ::= [WORDEND] [SEPARATOR] (textitem [SEPARATOR])* textitem [SEPARATOR]
textitem    ::= uppercadrat | PAGEEND | LINEEND | textsuper
textsuper   ::= TEXTSUPER TEXTSUPERCHAR*
uppercadrat ::= cadrat [SHADING]
              | TOGGLE
              | TEXT
cadrat      ::= (subcadrat ":")* subcadrat
subcadrat   ::= (inhierlist "*")* inhierlist
inhierlist  ::= sign
              | STARTCONSTRUCT cadrats ENDCONSTRUCT
              | "(" cadrats ")"
cadrats     ::= uppercadrat ([SEPARATOR] uppercadrat)*
sign        ::= hieroglyphs | lig
              | HALFSPACE | FULLSPACE | REDPOINT | BLACKPOINT
              | SHADG | SHADV | SHADT | SHADH
hieroglyphs ::= hieroglyph | hieroglyph OVERWRITE hieroglyph
lig         ::= hieroglyph ("&" hieroglyph)+
hieroglyph  ::= [GRAMMAR] HIEROGLYPH MODIFIER* [WORDEND]
Note: text in Latin characters cannot
appear as an element of a quadrat (+lsic+s:n,
for example, is illegal).
It is unclear whether this text may be used
between parentheses, as in (g-+lsic+s-t):pt.
The
current system tends to disallow it.
Lexical elements
- ESPSO ::= (" "|\t|\n|\015|"_"|"-")*
- (Tk) Optional spaces. This token
makes the parser more robust. It is
legal after most of the other tokens, wherever a space is irrelevant.
- WORDEND ::= " " | "  " (two spaces) | "_" | "__"
- A space or an underscore marks the end of a word.
Two spaces or two underscores mark the end of a sentence.
(Tk) Note: the marker
immediately follows the last sign of the word.
Thus, the sequence "tA:Aa32*(t:xAst)" must be coded
"tA:Aa32*(t:xAst_)" and not "tA:Aa32*(t:xAst)_".
- SHADING ::= "#"["1"]["2"]["3"]["4"]
- (WG, included in Tk) Simple shading
system. The quadrat is divided into four quadrants, numbered 1 to 4.
Each quarter can be shaded or not. Thus, "#124" means that
quarters 1, 2, and 4 are shaded.
- PAGEEND ::= "-!!"
- PAGEEND ::= "-!!" [ ENTIER "%" ] ESPSO (WG)
- LINEEND ::= "-!"
- LINEEND ::= "-!" [ ENTIER "%" ] ESPSO (WG)
- Marks the end of a page (respectively, of a line).
The optional integer that follows indicates a vertical space in Winglyph.
- SEPARATOR ::= "-"
- SEPARATOR ::= "-" ESPSO (Tk)
- Separates two quadrats.
- OVERWRITE ::= "#" | "##"
- Superposition of two signs. The first form should be
avoided and ought to be removed from the MDC, because it is
ambiguous.
- TOGGLE ::= ("-#-" | "#b" | "#e" | "$r" | "$b" | "?" | "??" | "^" | "$" ) ESPSO
- These separators are "flags". They modify the text that
follows them, and the state of the text remains
modified until another TOGGLE changes it again. For example,
the text after "$r" is in red until "$b" switches it back to black.
- -#- : switch from shaded to not shaded and vice versa
- #b : switch to "shaded" mode (WG)
- #e : switch to "unshaded" mode (WG)
- $r : switch to red mode (WG)
- $b : switch to black mode (WG)
- $ : switch between red and black modes (avoid it)
- ? : begin/end of lacuna
- ?? : begin/end of line in lacuna
- ^ : begin/end of haplography
- +[sl] : switch to hieroglyphic mode/roman text/italic...
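The flag behaviour of the colour and shading toggles can be pictured as a small state machine. The sketch below is an illustration under stated assumptions: the names TextState and apply_toggle are hypothetical, the initial state (black, unshaded) is assumed, and the lacuna, haplography and text-mode toggles are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TextState:
    red: bool = False     # assumed default: black text
    shaded: bool = False  # assumed default: unshaded

def apply_toggle(state: TextState, toggle: str) -> TextState:
    """Return the state in force after one colour or shading TOGGLE."""
    red, shaded = state.red, state.shaded
    if toggle == "$r":
        red = True
    elif toggle == "$b":
        red = False
    elif toggle == "$":      # plain switch between red and black
        red = not red
    elif toggle == "#b":
        shaded = True
    elif toggle == "#e":
        shaded = False
    elif toggle == "-#-":    # plain switch between shaded and unshaded
        shaded = not shaded
    # Any other toggle is ignored by this sketch.
    return TextState(red, shaded)

state = TextState()
for toggle in ["$r", "-#-", "$b"]:
    state = apply_toggle(state, toggle)
print(state)  # TextState(red=False, shaded=True)
```

The plain "$" switch depends on the state that precedes it, whereas "$r" and "$b" give the same result whatever came before.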
- TEXT ::= ([^+])*
- The text in a passage in alphabetical characters. Note
that the symbol "+" cannot be used. One could replace it by "+"
- TEXTSUPER ::= "|"
- TEXTSUPERCHAR ::= [^-]
- A line number is indicated by a "|" followed by the text to put over the bar.
It ends with a "-".
Cartouches
Cartouches, serekhs, etc. are built with "<" and ">".
- "<"["S"|"F"|"H"]["b"|"m"|"e"] ESPSO... ">"
- Cartouche. The first letter indicates the type of cartouche:
- nothing : cartouche
- S : serekh
- F : enclosure
- H : Hwt-castle
The second letter specifies which part of the cartouche must be drawn:
- nothing : the whole cartouche
- b : beginning
- m : middle
- e : end
- "<"["s"|"f"|"h"][0123] ESPSO... ["s"|"f"|"h"][0123]">"
- (WG) System similar to the preceding one. Note that the
cartouche-type names are lowercase. The numbers have the
following meanings:
- 0: this end of the cartouche is not drawn
- 1: this end is drawn as the undecorated end of the cartouche
(for a true cartouche, the end without the knot; for a serekh, the "high" part, etc.)
- 2: this end is drawn as the decorated end of the cartouche
- 3: for Hwt only, the end decorated with the square in the upper part.
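As an illustration of the first (non-WG) notation, the opening marker of a cartouche can be decoded as below. The sketch follows the production "<"["S"|"F"|"H"]["b"|"m"|"e"], which uses "e" for the end part. The function name and the returned labels are assumptions made for this example.

```python
import re

_KINDS = {"": "cartouche", "S": "serekh", "F": "enclosure", "H": "Hwt-castle"}
_PARTS = {"": "whole", "b": "beginning", "m": "middle", "e": "end"}

# "<", an optional uppercase type letter, an optional lowercase part letter.
_OPENING = re.compile(r"<([SFH]?)([bme]?)")

def parse_cartouche_opening(marker: str) -> tuple[str, str]:
    """Return (type, part) for an opening marker such as "<", "<S" or "<Hb"."""
    match = _OPENING.fullmatch(marker)
    if match is None:
        raise ValueError(f"not a cartouche opening: {marker!r}")
    return _KINDS[match.group(1)], _PARTS[match.group(2)]

print(parse_cartouche_opening("<"))    # ('cartouche', 'whole')
print(parse_cartouche_opening("<Hb"))  # ('Hwt-castle', 'beginning')
```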
Philological comments
These constructions are:
- [[..]] : erased text (drawn with [...]). Note
that Winglyph considers [[ and ]] as symbols
(and gives them the same class as sign in our grammar).
This makes it difficult to determine which part of the text is erased, as an opening
"[[" can have more than one "]]". Yet the corresponding drawing is closer to usual
practice.
[[*pt:p*]]*t
- [{...}] : superfluous text (drawn as {....})
- ["..."] : vanished, but formerly readable text
- ['...'] : scribal addition (drawn as '...')
- [&...&] : editorial addition (drawn as <...>)
Modifiers
Modifiers (MODIFIER
in the grammar above) make it possible
to set or change certain characteristics of the signs. Currently,
the recognized modifiers are:
- "\" NUMBER
- Sets the size of the sign, expressed as a percentage of its "normal" size.
The meaning of "normal size" is not fixed a priori.
- "\"
- Reverses a sign. If the text is oriented left to right, the sign is
drawn as oriented right to left. Example:
C2\-A30
- "\r" ("1"|"2"|"3"|"4")
- Counterclockwise rotation (that is, in the trigonometric
direction). Each unit corresponds to 90 degrees.
- "\t" ("1"|"2"|"3"|"4")
- Reversal followed by clockwise rotation.
Each unit corresponds to 90 degrees. The sign is
reversed before it is rotated.
- "\R" ["-"]ENTIER
- Clockwise rotation, in degrees.
- "\i"
- (Tk) Sign to ignore. This sign is there
only for space-filling purposes. The system may either
not draw it at all, or draw it in dotted lines, pale gray,
etc. The sign is not taken into account in
lexicographical applications either.
The point of
this modifier is to solve the problem that arises when the
logical structure of the text and the graphic structure of the
quadrats do not match. For a real-world example, see
Hornung, Amduat, III, p. 780. A
column starts as follows: i\i*(n\i:U36*(1:n)
because i-n
is already written on the
preceding page.
The idea of this extension owes much
to Spencer Tasker, in the preliminary versions of ETML.
- "\inword" NUMBER
- (PROP) The sign belongs to the word whose number
follows. This is an alternative method to delimit words.
Two possibilities are to be studied. From a logical point of
view, this system works, but it complicates scanning for the
words of a text. One could add a marker for the beginning of a
word and a marker for the end of a word. That would also make it possible
to keep the numbers local, if
desired.
Example:
H\fw1-Xr\fw2\ew2 -b\ew1
makes it possible to keep
track of the two different words which form the group Xry-Hbd.
A problem with this approach is that we would have two
ways of designating words, compatible but
difficult to combine.
General form for modifiers
For the sake of compatibility between applications, we suggest
that any sequence of the form "\" LETTER* DIGIT* be
recognized as a modifier, and ignored if not understood.
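A reader following this suggestion could recognize modifiers with one regular expression and silently skip those it does not know. The sketch below is only an illustration: the function name split_modifiers and the KNOWN set (built from the modifiers listed in this document) are assumptions, and LETTER is taken to mean an ASCII letter.

```python
import re

# General form suggested above: "\" LETTER* DIGIT*
_MODIFIER = re.compile(r"\\([A-Za-z]*)([0-9]*)")

# Modifier names listed in this document; "" stands for a bare "\"
# (reversal, or size when digits follow).
KNOWN = {"", "r", "t", "R", "i", "inword"}

def split_modifiers(sign: str) -> tuple[str, list[tuple[str, str]]]:
    """Split a sign code into its base code and its understood modifiers."""
    cut = sign.find("\\")
    if cut == -1:
        return sign, []
    base, tail = sign[:cut], sign[cut:]
    # Each match is a (letters, digits) pair; unknown names are ignored.
    modifiers = [(name, digits)
                 for name, digits in _MODIFIER.findall(tail)
                 if name in KNOWN]
    return base, modifiers

print(split_modifiers("A1\\r2\\i"))    # ('A1', [('r', '2'), ('i', '')])
print(split_modifiers("A1\\zz9\\50"))  # ('A1', [('', '50')])
```

The signed argument of "\R" does not fit the LETTER* DIGIT* form, so a real reader would need to treat that modifier separately.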
Other elements
- HALFSPACE ::= "."
- Half space: a quarter-quadrat.
- FULLSPACE ::= ".."
- Full space: one quadrat long.
- SHADG ::= "//"
- shading the size of a whole quadrat
- SHADV ::= "v/"
- vertical shading, half a quadrat wide
- SHADH ::= "h/"
- horizontal shading, half a quadrat high
- SHADT ::= "/"
- shading the size of a quarter quadrat
Bibliography
- [BGH+88]
- Jan BUURMAN, Nicolas GRIMAL, Michael HAINSWORTH, Jochen HALLOF and Dirk VAN DER PLAS. « Inventaire des signes hiéroglyphiques en vue de leur saisie informatique ». Mémoires de l'Académie des Inscriptions et Belles Lettres. Institut de France, Paris, 1988.
Serge Rosmorduc