Manuel de Codage

The "Manuel de Codage" is a standard for describing hieroglyphic texts. Its official description can be found in [BGH+88]. Most hieroglyphic typesetting systems use this standard, but they generally extend or modify it. These modifications address two needs: first, users want some control over the way hieroglyphic texts are drawn; second, they correct a number of bugs in the original design of the standard (notably for group shading).

This document aims to describe the "current state" of the standard (Macscribe, Winglyph 1.2, Tksesh). Elements presented without specific comment are part of the original standard ([BGH+88]). Those that come from Winglyph are marked WG; those that come from Tksesh are marked Tk. The Tksesh version referred to is the one currently in development (not yet released). I don't know Macscribe well enough to include it here, but help is welcome. Finally, new extensions are marked PROP.


A text in Manuel de Codage format is, a priori, written in ASCII or ISO 8859-1.

Because of the variety of platforms, it would be desirable (this is not currently the case, to my knowledge) for any program processing texts in MDC format to recognize each of the ASCII sequences (10), (13, 10) and (13) as an end of line.
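Here is a minimal Python sketch of this recommendation (the function names are hypothetical). It converts CR LF first, so that the pair counts as a single line end, and then converts any remaining lone CR. Python's built-in str.splitlines() also accepts these three sequences, but it splits on several other characters as well (form feed, U+2028, etc.). That is why the replacement is spelled out explicitly here.

```python
def normalize_line_ends(text: str) -> str:
    """Turn (13, 10), lone (13) and lone (10) into a single "\n"."""
    # CR LF must be handled first, or it would become two line ends.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def mdc_lines(text: str) -> list[str]:
    """Split an MDC text into lines, whatever platform produced it."""
    return normalize_line_ends(text).split("\n")


# Usage: a file mixing Unix, DOS and old Mac line ends.
print(mdc_lines("A1-n\r\nt:pt\rk-A\nw"))  # ['A1-n', 't:pt', 'k-A', 'w']
```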

Casual description of the Manuel de Codage

The general principle of the Manuel de Codage is as follows: signs are coded by their typographical code in the font made for A. H. Gardiner (A40, I3, etc.). The exceptions are uniliteral, biliteral and triliteral signs, which can also be coded by their transliteration (see the table for the codes). In case of ambiguity, only the most frequent sign is accessible by its transliteration.
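A program reading MdC can resolve transliteration codes through a lookup table and then treat every sign by its Gardiner code. The excerpt below is a hypothetical sketch. Only the iw → N18 entry comes from this document; the other entries are the usual Gardiner signs for those uniliterals and should be checked against the official table.

```python
# Hypothetical excerpt of the transliteration table. "iw" -> N18 is stated
# in this document; the uniliteral entries are the usual Gardiner signs.
TRANSLITERATION_CODES = {
    "A": "G1",    # vulture
    "i": "M17",   # reed
    "a": "D36",   # arm
    "w": "G43",   # quail chick
    "b": "D58",   # foot
    "p": "Q3",    # stool
    "f": "I9",    # horned viper
    "m": "G17",   # owl
    "n": "N35",   # water
    "r": "D21",   # mouth
    "iw": "N18",  # the island (not D54)
}


def to_gardiner(code: str) -> str:
    """Return the Gardiner code for an MdC sign code.

    Codes that are not transliterations (A40, I3, ...) are returned unchanged.
    """
    return TRANSLITERATION_CODES.get(code, code)


print([to_gardiner(c) for c in ["n", "iw", "A40"]])  # ['N35', 'N18', 'A40']
```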

The choices are sometimes curious. For example, iw is N18 (the island), not D54 (walking legs). Was the corpus used to make these choices representative of Egyptian texts as a whole? (In the current state of my database, where Late Egyptian and hieratic texts are heavily represented, I find 40 occurrences of N18 for 892 of D54!)

Sign variants whose codes are followed by an asterisk in the Gardiner list, like A14*, are coded by replacing the asterisk with "A": "A14A". Those followed by two asterisks, like G7**, are coded with a "B": "G7B".
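The variant rule is mechanical enough to express as a small Python helper. This is a sketch with a hypothetical function name:

```python
def gardiner_to_mdc(code: str) -> str:
    """Convert a Gardiner variant code (A14*, G7**) to its MdC form."""
    if code.endswith("**"):   # two asterisks -> "B"
        return code[:-2] + "B"
    if code.endswith("*"):    # one asterisk -> "A"
        return code[:-1] + "A"
    return code               # not a variant: unchanged


print(gardiner_to_mdc("A14*"), gardiner_to_mdc("G7**"), gardiner_to_mdc("A1"))
# A14A G7B A1
```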

Signs are combined by separating them with the symbols "-", "*" and ":". "-" separates two quadrats, ":" stacks signs vertically, and "*" places signs side by side horizontally inside a quadrat. Parentheses make it possible to change the priorities.
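To show how these operators nest, the Python parser below turns a group into nested tuples. It assumes that "*" binds more tightly than ":", as in the grammar given later (where "*" groups inside ":"), and that "-" binds least of all. The tuple representation and all names are illustrative choices. Modifiers, ligatures, shading and word-end markers are deliberately not handled.

```python
import re

# A sign code (A1, G7B, xAst, ...) or one of the operators/parentheses.
TOKEN = re.compile(r"\s*([A-Za-z0-9]+|[-:*()])")


def tokenize(src: str) -> list[str]:
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class QuadratParser:
    """Recursive descent; binding strength: "-" < ":" < "*"."""

    def __init__(self, src: str):
        self.tokens = tokenize(src)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def sequence(self):
        """Quadrats separated by "-"."""
        items = [self.vertical()]
        while self.peek() == "-":
            self.take("-")
            items.append(self.vertical())
        return items

    def vertical(self):
        """Groups stacked with ":"."""
        parts = [self.horizontal()]
        while self.peek() == ":":
            self.take(":")
            parts.append(self.horizontal())
        return parts[0] if len(parts) == 1 else ("vert", parts)

    def horizontal(self):
        """Groups placed side by side with "*"."""
        parts = [self.atom()]
        while self.peek() == "*":
            self.take("*")
            parts.append(self.atom())
        return parts[0] if len(parts) == 1 else ("hor", parts)

    def atom(self):
        if self.peek() == "(":
            self.take("(")
            inner = self.sequence()
            self.take(")")
            return inner[0] if len(inner) == 1 else ("seq", inner)
        tok = self.take()
        if not tok[0].isalnum():
            raise ValueError(f"sign code expected, got {tok!r}")
        return tok


def parse_quadrats(src: str):
    parser = QuadratParser(src)
    result = parser.sequence()
    if parser.peek() is not None:
        raise ValueError(f"unexpected {parser.peek()!r}")
    return result


print(parse_quadrats("tA:Aa32*(t:xAst)"))
```

In this example, the parentheses let the horizontal group Aa32*(...) contain a vertical stack, which precedence alone could not express.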

Ligatures (Tk, WG)

The ligature system proposed here is specific to the current version of Tksesh. Winglyph uses the same notation, but treats ligatures as composite hieroglyphs. In particular, for Winglyph, a ligature is either known or unknown. For Tksesh, a ligature is a particular way of grouping hieroglyphs: if the ligature is defined, Tksesh groups the signs in a special way; if not, it behaves as if the signs were separated by '*'. The symbol used to bind hieroglyphs is '&'. Modifiers such as \ and \t are not taken into account. Grammatical marks, on the other hand, are significant.


A&t
simple ligature of the signs A and t.
ligature of three signs. One can bind as many signs as one wants.
k-A&p-w
ligature of two signs that do not belong to the same word.
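The Tksesh behaviour described above can be sketched as follows: a defined ligature is drawn specially, and an undefined one falls back to "*". The ligature table is hypothetical. The phrase "modifiers are not taken into account" is read here as "ignored when looking the ligature up", which is an assumption.

```python
# Hypothetical ligature table; a real program would load its own list.
DEFINED_LIGATURES = {("A", "t")}


def group_ligature(group: str):
    """Resolve a Tksesh-style "&" group such as "A&t".

    Defined ligatures are kept as ligatures; otherwise the signs are
    grouped as if they had been separated by "*".
    """
    signs = group.split("&")
    # Modifiers (everything from a backslash on) are ignored for the lookup.
    key = tuple(sign.split("\\", 1)[0] for sign in signs)
    kind = "lig" if key in DEFINED_LIGATURES else "hor"
    return (kind, signs)


print(group_ligature("A&t"))  # ('lig', ['A', 't'])
print(group_ligature("A&p"))  # ('hor', ['A', 'p'])
```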

Grammar of the Manuel de Codage

We give here a formal grammar of the Manuel de Codage. This grammar is intended for programmers who want to write software that reads and writes texts in Manuel de Codage format.

The description of this grammar uses the following notations:

text between square brackets is optional;
parentheses group text;
text followed by a star can be repeated 0 or more times;
text followed by a plus sign ("+") can be repeated 1 or more times;
the bar means "or": A | B means text A or text B;
text between quotation marks appears as is in the document. For instance, "*" means the star sign.

mdcfile ::= [WORDEND] [SEPARATOR] (textitem [SEPARATOR])* textitem [SEPARATOR]
textitem ::= uppercadrat | PAGEEND | LINEEND | textsuper
uppercadrat ::= cadrat [SHADING]
::= TEXT
cadrat ::= (subcadrat ":")* subcadrat
subcadrat ::= (inhierlist "*")* inhierlist
inhierlist ::= sign
::= "(" cadrats ")"
cadrats ::= uppercadrat ([SEPARATOR] uppercadrat)*
sign ::= hieroglyphs | lig
hieroglyphs ::= hieroglyph | hieroglyph OVERWRITE hieroglyph
lig ::= hieroglyph ("&" hieroglyph)+

Note: text in Latin characters cannot be used as an element of a quadrat (+lsic+s:n, for example, is illegal). It is unclear whether such text is permitted between parentheses, as in (g-+lsic+s-t):pt. The current system tends to disallow it.

Lexical elements

ESPSO ::= (" "|\t|\n|\015|"_"|"-")*
(Tk) Optional spaces. This token makes the parser more robust. It is legal after most other tokens, wherever a space is irrelevant.
WORDEND ::= " " | "  " (two spaces) | "_" | "__"
A space or an underscore marks the end of a word. Two spaces or two underscores mark the end of a sentence.

(Tk) Note that the marker immediately follows the last sign of the word. Thus, the word "tA:Aa32*(t:xAst)" must be coded "tA:Aa32*(t:xAst_)" and not "tA:Aa32*(t:xAst)_".

SHADING ::= "#"["1"]["2"]["3"]["4"]
(WG, included in Tk) Simple shading system. The quadrat is divided into four quarters, numbered as follows:
1 2
3 4
Each quarter can be shaded or not. Thus, "#124" means that quarters 1, 2 and 4 are shaded.
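Reading a SHADING token amounts to collecting the digits that follow "#". In this Python sketch (hypothetical names), the regular expression enforces the 1-2-3-4 order required by the grammar, so "#21" is rejected. A bare "#" yields an empty set, since this section does not say how it should be interpreted.

```python
import re

# "#" followed by an optional 1, 2, 3 and 4, in that order.
SHADING = re.compile(r"#(1?)(2?)(3?)(4?)")


def shaded_quarters(token: str) -> set[int]:
    """Return the shaded quarters (1 2 / 3 4) named by a SHADING token."""
    m = SHADING.fullmatch(token)
    if m is None:
        raise ValueError(f"not a SHADING token: {token!r}")
    return {int(digit) for digit in m.groups() if digit}


print(sorted(shaded_quarters("#124")))  # [1, 2, 4]
```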
PAGEEND ::= "-!!"
PAGEEND ::= "-!!" [ ENTIER "%" ] ESPSO (WG)
LINEEND ::= "-!"
LINEEND ::= "-!" [ ENTIER "%" ] ESPSO (WG)
Marks the end of a page (respectively, of a line). In Winglyph, the optional integer that follows indicates a vertical space.
SEPARATOR separates two quadrats.
OVERWRITE ::= "#" | "##"
Superimposes two signs. The first form ("#") should be avoided and removed from the MDC, because it is ambiguous: "#" also introduces SHADING.
TOGGLE ::= ("-#-" | "#b" | "#e" | "$r" | "$b" | "?" | "??" | "^" | "$" ) ESPSO
These separators are "flags". They modify the text that follows them, and that text remains modified until another TOGGLE changes it again. For example, the text after "$r" is drawn in red until "$b" switches it back to black.
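Because a TOGGLE sets a state that persists, a reader has to carry that state along while scanning. The sketch below tracks only the colour flags "$r" and "$b" and skips the other flags. It assumes the input has already been split into tokens, and the function name is hypothetical.

```python
def colour_signs(tokens: list[str]) -> list[tuple[str, str]]:
    """Attach the current colour to each token, following "$r" / "$b".

    Only the two colour toggles are modelled; other TOGGLEs are skipped.
    """
    colour = "black"
    coloured = []
    for tok in tokens:
        if tok == "$r":
            colour = "red"        # state persists until the next toggle
        elif tok == "$b":
            colour = "black"
        elif tok in {"-#-", "#b", "#e", "$", "?", "??", "^"}:
            continue              # other flags: not modelled here
        else:
            coloured.append((tok, colour))
    return coloured


print(colour_signs(["A", "$r", "n", "t", "$b", "pt"]))
# [('A', 'black'), ('n', 'red'), ('t', 'red'), ('pt', 'black')]
```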
TEXT ::= ([^+])*
The content of a passage in alphabetic characters. Note that the symbol "+" cannot be used. One could replace it by "+".
A line number is indicated by a "|" followed by the text to put over the bar. It ends with a "-".


Cartouches, serekhs, etc. are built with "<" and ">".

"<"["S"|"F"|"H"]["b"|"m"|"e"] ESPSO... ">"
Cartouche. The first letter indicates the type of cartouche; the second letter specifies which part of the cartouche must be drawn.
"<"["s"|"f"|"h"][0123] ESPSO... ["s"|"f"|"h"][0123]">"
(WG) A system similar to the previous one. Note that the cartouche-type letters are lowercase. The numbers have the following meanings:

Philological comments

These constructions are:


Modifiers (MODIFIER in the previous grammar) set or change certain characteristics of signs. The currently recognized modifiers are:

Sets the size of the sign, as a percentage of its "normal" size. The meaning of "normal size" is not fixed a priori.
Reverses a sign: if the text is oriented left-to-right, the sign is drawn oriented right-to-left, and vice versa. Example: C2\-A30
"\r" ("1"|"2"|"3"|"4")
Counterclockwise rotation (that is, in the trigonometric direction). Each unit corresponds to 90 degrees.
"\t" ("1"|"2"|"3"|"4")
Reversal followed by clockwise rotation. Each unit corresponds to 90 degrees; the sign is reversed before it is rotated.
"\R" ["-"]ENTIER
Clockwise rotation, in degrees.
"\i"
(Tk) Sign to ignore. This sign is there only to fill space. The system may either not draw it at all, or draw it in dotted lines, pale gray, etc. The sign is not taken into account by lexicographical applications either.

The purpose of such signs is to solve the problem that arises when the logical structure of the text and the graphical structure of the quadrats do not match. For a real-world example, see Hornung, Amduat, III, p. 780. A column starts as follows: i\i*(n\i:U36*(1:n), because i-n is already written on the preceding page.

The idea of this extension owes much to Spencer Tasker, in the preliminary versions of ETML.
"\inword" NUMBER
(PROP) The sign belongs to the word whose number follows. This is an alternative way of delimiting words, and two variants should be studied. From a logical point of view, this system works, but it makes scanning a text for its words complicated. One could instead add a word-beginning marker and a word-end marker, which would also keep the numbers local, if desired. Example: H\fw1-Xr\fw2\ew2-b\ew1 keeps track of the two different words that form the group Xry-Hbd.

One problem with this approach is that we would then have two ways of designating words, compatible but difficult to combine.
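The begin/end-marker variant from the example above can be sketched in Python as follows. The markers \fw and \ew are taken from the example. The rule that a sign belongs to the innermost open word, the restriction to "-"-separated signs, and the function names are assumptions made for this illustration.

```python
import re

# One sign code followed by any number of \fwN / \ewN markers.
SIGN = re.compile(r"([A-Za-z0-9]+)((?:\\(?:fw|ew)[0-9]+)*)")
MARKER = re.compile(r"\\(fw|ew)([0-9]+)")


def words_from_markers(src: str) -> dict[int, list[str]]:
    """Group "-"-separated signs by their \\fw / \\ew word markers."""
    words: dict[int, list[str]] = {}
    open_words: list[int] = []
    for part in src.split("-"):
        m = SIGN.fullmatch(part.strip())
        if m is None:
            raise ValueError(f"cannot read sign {part!r}")
        code, markers = m.groups()
        marks = MARKER.findall(markers)
        for kind, num in marks:          # words starting on this sign
            if kind == "fw":
                open_words.append(int(num))
        if open_words:                   # innermost open word (assumption)
            words.setdefault(open_words[-1], []).append(code)
        for kind, num in marks:          # words ending on this sign
            if kind == "ew":
                open_words.remove(int(num))
    return words


print(words_from_markers(r"H\fw1-Xr\fw2\ew2-b\ew1"))
# {1: ['H', 'b'], 2: ['Xr']}
```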

General form for modifiers

For the sake of compatibility between applications, we suggest that any sequence of the form "\" LETTER* DIGIT* be recognized as a modifier, and ignored if not understood.
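Here is a Python sketch of this recommendation (hypothetical names). Modifiers are matched with the general form, and those whose letters the program does not understand are removed. The general form matches only "\R" in "\R-90", so a program that does not know \R would leave a stray "-90" behind. The form must not be extended to accept a minus sign, however: in C2\-A30, the "-" after the bare "\" is the quadrat separator.

```python
import re

# General form suggested above: a backslash, letters, then digits.
MODIFIER = re.compile(r"\\([A-Za-z]*)([0-9]*)")


def drop_unknown_modifiers(src: str, known: set[str]) -> str:
    """Remove modifiers whose letter part is not in `known`.

    The empty string stands for the bare "\\" (sign reversal).
    """
    def keep_or_drop(m):
        return m.group(0) if m.group(1) in known else ""

    return MODIFIER.sub(keep_or_drop, src)


KNOWN = {"", "r", "t", "R", "i"}
print(drop_unknown_modifiers(r"C2\-A30\zz3-n\t2", KNOWN))  # C2\-A30-n\t2
```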

Other elements

Half space: a quarter quadrat.
SPACE ::= ".."
Space one quadrat long.
SHADG ::= "//"
Shading the size of a whole quadrat.
SHADV ::= "v/"
Vertical shading, half a quadrat wide.
SHADH ::= "h/"
Horizontal shading, half a quadrat high.
SHADT ::= "/"
Shading the size of a quarter quadrat.


[BGH+88] Jan BUURMAN, Nicolas GRIMAL, Michael HAINSWORTH, Jochen HALLOF and Dirk VAN DER PLAS. « Inventaire des signes hiéroglyphiques en vue de leur saisie informatique ». Mémoires de l'Académie des Inscriptions et Belles-Lettres, Institut de France, Paris, 1988.

Serge Rosmorduc