Manuel de Codage

The "Manuel de Codage" is a standard for describing hieroglyphic texts. Its official description can be found in [BGH+88]. Most hieroglyphic typesetting systems use this standard, but they generally extend or modify it. These modifications address two needs: first, users want some control over the way hieroglyphic texts are drawn; second, they correct a number of flaws in the original design of the standard (notably for group shading).

This document aims to describe the "current state" of the standard (macscribe, winglyph1.2, tksesh). Elements presented without specific comment are part of the original standard ([BGH+88]). Those which come from Winglyph are followed by WG; those which come from Tksesh are followed by Tk. The Tksesh version referred to is the one currently under development (and not yet released). I don't know Macscribe well enough to include it here, but help is welcome on this. Lastly, new extensions are followed by the symbol PROP.

Conventions

A text in Manuel de Codage format is written, in principle, in ASCII or ISO 8859-1.

Because of the variety of platforms, it would be desirable (to my knowledge, it is not currently the case) for any program processing texts in MDC format to recognize the ASCII code sequences (10), (13, 10) and (13) as ends of lines.
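As an illustration of this recommendation, here is a minimal Python sketch that splits a text on the three line endings listed above. The function name split_mdc_lines is hypothetical; it does not come from any of the programs discussed here.

```python
import re

# CR LF must be tried first, so that (13, 10) counts as one line end
# and not as two.
_EOL = re.compile(r"\r\n|\r|\n")


def split_mdc_lines(text: str) -> list[str]:
    """Split a text on LF (10), CR LF (13, 10) or CR (13)."""
    return _EOL.split(text)
```

A regular expression is used rather than str.splitlines, because the latter also splits on several other characters that this document does not mention.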

Casual description of the Manuel de Codage

The general principle of the Manuel is as follows: signs are coded according to their typographical code in the font made for A. H. Gardiner (A40, I3, etc.), except for the uniliteral, biliteral, and triliteral signs. For those, it is also possible to use their transliteration (see table for the codes). In the event of ambiguity, only the most frequent sign is accessible by its transliteration.

The choices are sometimes curious. For example, iw is N18 (the island), not D54 (moving legs). Was the corpus used to make these choices representative of the texts as a whole? (In the current state of my database, where Late Egyptian and hieratic texts are heavily represented, I find 40 N18 for 892 D54!)

Sign variants whose codes are followed by an asterisk in the Gardiner list, like A14*, are coded by replacing the asterisk with "A": "A14A". Those followed by two asterisks, like G7**, are coded with a "B": "G7B".
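The variant rule can be sketched in Python as follows. This is only an illustration: the function name variant_code and the pattern used to recognize a Gardiner code (an uppercase letter, an optional lowercase letter, then digits) are assumptions of this sketch.

```python
import re

# Assumed shape of a Gardiner code with one or two trailing asterisks.
_VARIANT = re.compile(r"^([A-Z][a-z]?\d+)(\*{1,2})$")


def variant_code(gardiner: str) -> str:
    """Replace one trailing asterisk with "A" and two with "B"."""
    m = _VARIANT.match(gardiner)
    if m is None:
        # Not a starred variant: the code is returned unchanged.
        return gardiner
    base, stars = m.groups()
    return base + ("A" if len(stars) == 1 else "B")
```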

Signs are combined by separating them with the symbols "-", "*" and ":". "-" separates two quadrats, ":" stacks signs vertically, and "*" places signs side by side inside a quadrat. Parentheses make it possible to change the priorities.
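To make the role of the three operators and of the parentheses concrete, here is a small recursive-descent sketch in Python. It assumes, as in the formal grammar of this document, that "*" binds more tightly than ":". It only handles plain sign codes made of letters and digits; modifiers, spaces, shading and all other tokens are left out. The function name parse_mdc and the tuple tags "v", "h" and "seq" are inventions of this sketch.

```python
import re

_TOKEN = re.compile(r"[-:*()]|[A-Za-z0-9]+")
_OPERATORS = {"-", ":", "*", ")"}


def parse_mdc(text):
    """Parse a simplified MDC string into a list of quadrats."""
    tokens = _TOKEN.findall(text)
    if "".join(tokens) != text:
        raise ValueError("unsupported character in input")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def quadrats():  # quadrat ("-" quadrat)*
        items = [quadrat()]
        while peek() == "-":
            take()
            items.append(quadrat())
        return items

    def quadrat():  # row (":" row)*, stacked vertically
        rows = [row()]
        while peek() == ":":
            take()
            rows.append(row())
        return rows[0] if len(rows) == 1 else ("v", rows)

    def row():  # atom ("*" atom)*, placed side by side
        atoms = [atom()]
        while peek() == "*":
            take()
            atoms.append(atom())
        return atoms[0] if len(atoms) == 1 else ("h", atoms)

    def atom():  # sign | "(" quadrats ")"
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        if tok == "(":
            take()
            inner = quadrats()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return inner[0] if len(inner) == 1 else ("seq", inner)
        if tok in _OPERATORS:
            raise ValueError(f"unexpected {tok!r}")
        return take()

    result = quadrats()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result
```

With this sketch, parse_mdc("p*t:pt-A1") yields two quadrats: the first stacks the horizontal pair p, t above pt, and the second is the single sign A1.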

Ligatures (TK, WG)

The ligature system proposed here is specific to the current version of Tksesh. Winglyph uses the same notation, but treats ligatures as composite hieroglyphs. In particular, for Winglyph, a ligature is either known or unknown. For Tksesh, a ligature is a particular way to group hieroglyphs. If the ligature is defined, Tksesh will group the signs in a special way; if not, it will behave as if the signs were separated by "*". The symbol used to bind hieroglyphs is "&". Modifiers like \, \t, etc. are not taken into account. The grammatical marks, on the other hand, are significant.

Examples

A&t: simple ligature of signs A and t
t&w&t: ligature of three signs. One can bind as many signs as desired.
k-A &p-w: ligature of two signs not belonging to the same word
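The Tksesh behaviour described above (special grouping when the ligature is defined, plain horizontal grouping otherwise) could be sketched like this in Python. The table KNOWN_LIGATURES and the function resolve_ligature are purely hypothetical; the actual list of ligatures known to Tksesh is not given in this document.

```python
# Hypothetical table of ligatures that a renderer knows how to draw.
KNOWN_LIGATURES = {("A", "t"), ("t", "w", "t")}


def resolve_ligature(group: str):
    """Decide how a group of signs joined by "&" should be laid out."""
    signs = tuple(group.split("&"))
    if len(signs) < 2 or "" in signs:
        raise ValueError("a ligature binds at least two signs")
    if signs in KNOWN_LIGATURES:
        # Defined ligature: the signs are grouped in a special way.
        return ("ligature", signs)
    # Undefined ligature: behave as if the signs were separated by "*".
    return ("h", signs)
```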

Grammar of the manuel de codage

We give here a formal grammar of the Manuel de Codage. This grammar is intended for programmers who would like to write software that reads and writes texts in Manuel de Codage format.

The description of this grammar uses the following notations:

[...]: text between square brackets is optional
(...): parentheses group text
*: text followed by a star can be repeated 0 or more times
+: text followed by a plus can be repeated 1 or more times
A|B: the bar means "or": text A or text B
"...": text between quotation marks is found as is in the document. For instance, "*" means the star sign.

mdcfile	::=	[WORDEND] [SEPARATOR] (textitem [SEPARATOR])* textitem [SEPARATOR]
textitem	::=	uppercadrat | PAGEEND | LINEEND | textsuper
textsuper	::=	TEXTSUPER TEXTSUPERCHAR*
uppercadrat	::=	cadrat [SHADING]
	::=	TOGGLE
	::=	TEXT
cadrat	::=	(subcadrat ":")* subcadrat
subcadrat	::=	(inhierlist "*")* inhierlist
inhierlist	::=	sign
	::=	STARTCONSTRUCT cadrats ENDCONSTRUCT
	::=	"(" cadrats ")"
cadrats	::=	uppercadrat ([SEPARATOR] uppercadrat)*
sign	::=	hieroglyphs | lig
		| HALFSPACE | FULLSPACE | REDPOINT | BLACKPOINT
		| SHADG | SHADV | SHADT | SHADH
hieroglyphs	::=	hieroglyph | hieroglyph OVERWRITE hieroglyph
lig	::=	hieroglyph ("&" hieroglyph)+
hieroglyph	::=	[GRAMMAR] HIEROGLYPH MODIFIER* [WORDEND]

Note: text in Latin characters cannot occur as an element of a quadrat (+lsic+s:n, for example, is illegal). It is unclear whether it is permitted to use such text between parentheses, as in (g-+lsic+s-t):pt. The current system tends to disallow it.

Lexical elements

ESPSO ::= (" "|\t|\n|\015|"_"|"-")*

(Tk) Optional spaces. This token makes the parser more robust. It is legal after most of the other tokens, wherever a space is irrelevant.

WORDEND ::= " " | "  " | "_" | "__"

A space or an underscore marks the end of a word. Two spaces or two underscores mark the end of a sentence.

(Tk) Note: the marker immediately follows the last sign of the word. Thus, the sequence "tA:Aa32*(t:xAst)" must be coded "tA:Aa32*(t:xAst_)" and not "tA:Aa32*(t:xAst)_".

SHADING ::= "#"["1"]["2"]["3"]["4"]

(WG, included in Tk) Simple shading system. The quadrat is divided into four quarters, as follows:

1	2
3	4

Each quarter can be shaded or not. Thus, "#124" means that quarters 1, 2, and 4 are shaded.
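As a sketch, a SHADING token can be decoded into the set of shaded quarters as follows. The function name parse_shading is hypothetical. Note that the grammar also allows a bare "#"; its meaning is not stated here, so this sketch simply returns an empty set and leaves the interpretation to the caller.

```python
import re

# "#" followed by the optional digits 1 to 4, in that order, as in the
# SHADING rule.
_SHADING = re.compile(r"#(1?)(2?)(3?)(4?)")


def parse_shading(token: str) -> set[int]:
    """Return the set of shaded quarters named by a SHADING token."""
    m = _SHADING.fullmatch(token)
    if m is None:
        raise ValueError(f"not a SHADING token: {token!r}")
    return {int(digit) for digit in m.groups() if digit}
```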

PAGEEND ::= "-!!"

PAGEEND ::= "-!!" [ ENTIER "%" ] ESPSO (WG)

LINEEND ::= "-!"

LINEEND ::= "-!" [ ENTIER "%" ] ESPSO (WG)

Marks the end of a page (respectively, of a line). The optional integer which follows indicates a vertical space in Winglyph.

SEPARATOR ::= "-"

SEPARATOR ::= "-" ESPSO (Tk)

Separates two quadrats.

OVERWRITE ::= "#" | "##"

Superposition of two signs. The first form is ambiguous and is to be avoided; it should be removed from the MDC.

TOGGLE ::= ("-#-" | "#b" | "#e" | "$r" | "$b" | "?" | "??" | "^" | "$" ) ESPSO

These tokens are "flags". They modify the text which follows them, and the state of that text remains modified until another TOGGLE changes it again. For example, the text after "$r" is in red until "$b" switches it back to black.

-#- : switch from shaded to not shaded and vice versa
#b : switch to "shaded" mode (WG)
#e : switch to "unshaded" mode (WG)
$r : switch to red mode (WG)
$b : switch to black mode (WG)
$ : switch between red and black modes (avoid it)
? : begin/end of lacuna
?? : begin/end of line in lacuna
^ : begin/end of haplography
+[sl] : switch to hieroglyphic mode/roman text/italic...
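The "flag" behaviour of the toggles can be sketched as a small state tracker in Python. Only the colour and shading toggles are handled; the names TextState and apply_toggle, and the choice of black, unshaded text as the initial state, are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class TextState:
    # Assumed initial state: black, unshaded text.
    red: bool = False
    shaded: bool = False


def apply_toggle(state: TextState, toggle: str) -> TextState:
    """Return the state in force after the given toggle."""
    red, shaded = state.red, state.shaded
    if toggle == "$r":
        red = True
    elif toggle == "$b":
        red = False
    elif toggle == "$":
        red = not red
    elif toggle == "#b":
        shaded = True
    elif toggle == "#e":
        shaded = False
    elif toggle == "-#-":
        shaded = not shaded
    # The other toggles ("?", "??", "^") are left out of this sketch
    # and do not change the state.
    return TextState(red=red, shaded=shaded)
```

A reader would apply apply_toggle to each TOGGLE it meets and attach the resulting state to every quadrat that follows, until the next toggle.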

TEXT ::= ([^+])*

The text in a passage in alphabetical characters. Note that the symbol "+" cannot be used. One could replace it by "+"

TEXTSUPER ::= "|"

TEXTSUPERCHAR ::= [^-]

A line number is indicated by a "|" followed by the text to put over the bar. It ends with a "-".

Cartouches

Cartouches, serekhs, etc. are built with "<" and ">".

"<"["S"|"F"|"H"]["b"|"m"|"e"] ESPSO... ">"

Cartouche. The first letter indicates which type of cartouche it is:

nothing : cartouche
S : serekh
F : enclosure
H : Hwt-castle

The second letter specifies which part of the cartouche must be drawn:

nothing : the whole cartouche
b : beginning
m : middle
e : end

"<"["s"|"f"|"h"][0123] ESPSO... ["s"|"f"|"h"][0123]">"

(WG) System similar to the previous one. Note that the cartouche-type names are lowercase. The numbers have the following meanings:

0: this extremity of the cartouche is not drawn
1: this extremity is drawn as the undecorated end of the cartouche (for a true cartouche, the one without the knot; for a serekh, the "high" part, etc.)
2: this extremity is drawn as the decorated part of the cartouche
3: for Hwt only, the extremity decorated with the square in the upper part.

Philological comments

These constructions are:

[[..]] : erased text (drawn with [...]). Note that WinGlyph considers "[[" and "]]" as symbols (and gives them the same class as sign in our grammar). This makes it difficult to determine which part of the text is erased, as an opening "[[" can have more than one "]]". Yet the corresponding drawing is closer to usual practice.
```
[[*pt:p*]]*t 
```
[{...}] : superfluous text (drawn as {....})
["..."] : vanished, but formerly readable text
['...'] : scribal addition (drawn as '...')
[&...&] : editorial addition (drawn as <...>)

Modifiers

Modifiers (MODIFIER in the grammar above) make it possible to set or change certain characteristics of the signs. Currently, the recognized modifiers are:

"\" NUMBER: sets the size of the sign, expressed as a percentage of its "normal" size. The meaning of "normal size" is not fixed a priori.
"\": reverses a sign. If the text is oriented left to right, the sign is drawn as oriented right to left. Example: C2\-A30
"\r" ("1"|"2"|"3"|"4"): counterclockwise rotation (that is, in the trigonometric direction). Each unit corresponds to 90 degrees.
"\t" ("1"|"2"|"3"|"4"): reversal followed by clockwise rotation. Each unit corresponds to 90 degrees. The sign is reversed before it is rotated.
"\R" ["-"]ENTIER: clockwise rotation, in degrees.
"\i": (Tk) sign to ignore. This sign is there only for space-filling purposes. The system may either not draw it at all, or draw it in dotted lines, pale gray, etc... The sign won't be taken into account in lexicographical applications either.
The purpose of this modifier is to solve the problem which arises when the logical structure of the text and the graphic structure of the quadrats do not match. For a real-world example, see Hornung, Amduat, III, p. 780. A column starts as follows: i\i*(n\i:U36*(1:n) because i-n is already written on the preceding page.
The idea of this extension owes much to Spencer Tasker, in the preliminary versions of ETML.
"\inword" NUMBER: (PROP) The sign belongs to the word whose number follows. This is an alternative method of delimiting words. Two possibilities are to be studied. From a logical point of view, this system works, but it complicates scanning a text for its words. One could add a beginning-of-word marker and an end-of-word marker; this would also make it possible to keep the numbers local, if desired. Example: H\fw1-Xr\fw2\ew2 -b\ew1 keeps track of the two different words which form the group Xry-Hbd.
A problem with this approach is that we would have two ways of designating words, compatible but difficult to combine.

General form for modifiers

For the sake of compatibility between applications, we suggest that any sequence of the form "\" LETTER* DIGIT* be recognized as a modifier, and ignored if not understood.
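This suggestion can be sketched in Python as follows: every sequence of the general form is accepted, and each one is flagged as understood or not. The function name parse_modifiers and the set KNOWN_MODIFIERS are hypothetical. The general form as stated does not cover the optional "-" of "\R", which would need separate handling.

```python
import re

# General form: backslash, then letters, then digits (both optional).
_MODIFIER = re.compile(r"\\([A-Za-z]*)(\d*)")

# Modifier names this hypothetical application understands; the empty
# name stands for the plain "\" and "\" NUMBER forms.
KNOWN_MODIFIERS = {"", "r", "t", "R", "i"}


def parse_modifiers(tail: str):
    """Split a run of modifiers into (name, number, understood) triples.

    Modifiers that are not understood are still consumed, so that the
    application can ignore them instead of failing.
    """
    result = []
    pos = 0
    while pos < len(tail):
        m = _MODIFIER.match(tail, pos)
        if m is None:
            raise ValueError(f"not a modifier at position {pos}")
        name, digits = m.groups()
        number = int(digits) if digits else None
        result.append((name, number, name in KNOWN_MODIFIERS))
        pos = m.end()
    return result
```

An application would then apply the triples whose last field is true and silently skip the others.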

Other elements

HALFSPACE ::= "."

Half space: a quarter-quadrat.

FULLSPACE ::= ".."

Full space: one quadrat long.

SHADG ::= "//"

Shading the size of a whole quadrat.

SHADV ::= "v/"

Vertical shading half a quadrat wide.

SHADH ::= "h/"

Horizontal shading half a quadrat high.

SHADT ::= "/"

Shading the size of a quarter of a quadrat.

Bibliography

[BGH+88]

Jan BUURMAN, Nicolas GRIMAL, Michael HAINSWORTH, Jochen HALLOF and Dirk VAN DER PLAS. « Inventaire des signes hiéroglyphiques en vue de leur saisie informatique », Mémoires de l'Académie des Inscriptions et Belles-Lettres. Institut de France, Paris, 1988.

Serge Rosmorduc