OK, the aforementioned conference is over, which hopefully means I’ll be able to spend a bit more time on this series.
Let’s start off with something reassuringly vague and all-encompassing:
What are writing systems? How do they work?
I’ll begin by attempting my own definition:
writing systems allow spoken language to be recorded visually such that someone else can read back the same text verbatim.
Of course, this definition immediately runs into edge cases. Most obviously, it’s possible to create a system in which text can be written, but not read back unambiguously. For one example, many shorthand systems approach this situation: the writer can usually read their shorthand easily, but others often find it far more difficult or even impossible to read. More generally, this is the case for any personal system of transcription. (Though most of the time, it’s still theoretically possible to teach a personal system to others.)
It is also possible for preexisting writing systems to become so ambiguous they are essentially impossible to read back. For instance, one extreme case was detailed in Donohue’s draft grammar of Skou (p. 100), which is worth quoting nearly in full:
Donohue wrote:
The received wisdom on the subject of a Skou orthography was that it was no problem to write the language, but that there was no point in doing so, since neither you nor anyone else could then read what you had written. This apparent paradox has its roots in the representation for the non-back rounded vowels, and suprasegmentals.
[…]
While [in the locally-developed orthography] the grapheme ê is used a lot, it is equally clear that its use is not random. It serves several distinct and easily defined functions. This letter+diacritic ê is used:
- to mark the non-back rounded vowels in all environments;
- to mark the falling tone in all environments;
and
- to mark nasalisation on a non-low, non-high vowel.
While consistent, and certainly not hard to learn, this orthography does suffer from the fact that, of the 39 contrasting rimes in Skou, 23 of them are represented by the same grapheme ê. This led, as mentioned above, to a writing system that is easy to learn, but pointless to apply: you can write things down with no difficulty, but no one can then read your composition. An example of this can be seen in the very plausible sentence below:
(87) Written: <Hê pe tê> for
Hòe pe=tue.
sago 3SG.F=3SG.F.do
‘She cooked sago’.
[…]
Possible plausible interpretations for <hê pe tê>: [omitting glosses] ‘She yawned’, ‘She accused’, ‘She bled/menstruated’, ‘She did something else’, ‘She make a roof’, ‘She whistled’, ‘She cooked sago’, ‘She hammered’, ? ‘She had sex with a woman’, ‘She considered’.
Arguably, something similar occurred historically in Book Pahlavi (about which I hope to say more later). Generally speaking, writing systems which become this ambiguous seem to fall out of use pretty quickly.
(On the other hand, it does seem like orthographies can tolerate a high degree of ambiguity: consider English and Chinese, for instance. Is there a maximum tolerable degree? Probably, but I have no idea what it is. In fact, until now I’ve never even considered the question.)
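To make the Skou situation concrete, here’s a toy sketch of the problem (with an invented mini-inventory, not real Skou data): when several distinct rimes all share one spelling, writing stays trivial while the number of possible readbacks multiplies with every ambiguous grapheme.

```python
from itertools import product

# Invented mini-inventory: five rimes, three of which are all spelled <ê>
# (standing in for the rounded-vowel, falling-tone and nasalised readings).
spell = {
    "oe_round": "ê",
    "e_fall":   "ê",
    "e_nasal":  "ê",
    "a":        "a",
    "i":        "i",
}

def write(rimes):
    """Writing is easy: every rime has exactly one spelling."""
    return "".join(spell[r] for r in rimes)

def readings(text):
    """Reading back is not: enumerate every rime sequence with this spelling."""
    options = [[r for r, g in spell.items() if g == ch] for ch in text]
    return list(product(*options))

word = write(["oe_round", "a"])
print(word)                  # êa
print(len(readings(word)))   # 3 possible readbacks
```

With 23 of 39 rimes collapsing onto one grapheme, as in Skou, the ambiguity grows combinatorially with word length, which is exactly why Donohue’s (87) has ten plausible interpretations.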
We can also consider the reverse case: a system which can be unambiguously read, but not written. Obviously, this case is practically nonexistent in natural scripts — but some particularly perverse conscripts have come very close. A quick scroll through Omniglot reveals some examples: for instance Timescript (which is animated), Betamaze (which is maze-based), Undine (based on cephalopod skin), and Pipeline 3D (which is, well, a 3D pipeline). Without a computer, these would be almost completely unusable for humans.
In any case, the overwhelming majority of scripts can easily be both written and read. This implies two requirements: it must be easy to physically create glyphs using the writing materials on hand, and there must be a reasonably straightforward correspondence between those glyphs and the spoken sounds. We can identify these as the form and function of the script respectively.
Note that these are more or less orthogonal concepts. It is common for one script to have many different stylistic variants: these usually change the form of glyphs without changing their functions. Conversely, when a script is adapted to a different language, the glyph forms can stay the same, but acquire a very different correspondence to the spoken language. (For instance, this occurred dramatically in the use of Hebrew/Aramaic script for Yiddish.)
At this point I’ve used the term glyph a few times, so I’d better clarify what that means. To be honest, it feels to me like a somewhat ambiguous term, alongside its close relative grapheme. Wikipedia defines a ‘grapheme’ as the smallest meaningful unit of writing (analogous to ‘phoneme’ and ‘morpheme’), and a ‘glyph’ as its specific written realisation (analogous to ‘phone’). This seems reasonable to me, so let’s go with that.
However, it would also be nice to have a higher-level term, since writing systems very often group graphemes together to form higher-level units. This is the case, for example, for Latin-script diacritics around letters, for Indic or Hebrew consonant+vowel combinations, and for Hangeul syllable blocks. Since I’m not aware of any formal name for these, I’ll just call them ‘blocks’ or ‘units’.
On this basis we can create a functional classification of writing systems, in terms of which phonological units are represented by their graphemes, and in what way those units are represented. For those who aren’t yet familiar with this, the most common classification scheme is as follows:
- Abjads, in which one grapheme corresponds to one consonant;
- Alphabets, in which one grapheme corresponds to one phoneme;
- Syllabaries, in which one grapheme corresponds to one syllable;
- Abugidas, in which each consonantal grapheme carries an inherent vowel, which can be changed (or removed) by vocalic modifiers to form syllabic blocks; and
- Logographies, in which one grapheme corresponds to one word.
Arguably, ‘semi-syllabaries’ can also be added to the list, though they seem distinctly rarer than the other systems. Possibly ‘semi-abugidas’ too, but at this point the different categories start to become hard to distinguish.
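As a rough illustration of the first three categories, here’s how the same word might be segmented under each (an invented three-syllable CV word in a toy romanisation; abugidas and logographies don’t reduce to simple segmentation like this, so they’re omitted):

```python
# Invented word of three CV syllables, in a toy romanisation.
word = ["ka", "ta", "ba"]

abjad     = [syl[0] for syl in word]             # consonants only
alphabet  = [ph for syl in word for ph in syl]   # one grapheme per phoneme
syllabary = word                                 # one grapheme per syllable

print(abjad)      # ['k', 't', 'b']
print(alphabet)   # ['k', 'a', 't', 'a', 'b', 'a']
print(syllabary)  # ['ka', 'ta', 'ba']
```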
I won’t be saying very much about logographies in this series — partly because they’re just so different from the other systems, but mostly because zompist’s ALC already has an excellent chapter on the subject. However, I’ll be aiming to talk a little bit about each of these categories, as well as some scripts which don’t fit well into any of them.
At a slightly lower level, we can talk about the orthography of individual languages: that is, the specific rules mapping graphemes to sounds. While some languages have highly regular orthographies, almost all display some degree of irregularity (generally as a result of historical sound change). This can often end up distorting the nice picture presented by the high-level classification: for instance, by removing the inherent vowel of an abugida, or by creating many-to-many correspondences in alphabets. (I’ve even seen a serious argument that English has become a near-logography; alas, I seem to have lost the paper.)
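For a concrete instance of such many-to-many correspondence, consider English <ough>, a standard example (pronunciations below are in rough IPA; the word lists are just illustrative):

```python
# One spelling, many sounds: English <ough>.
ough_readings = {
    "though":  "oʊ",
    "through": "uː",
    "tough":   "ʌf",
    "cough":   "ɒf",
    "bough":   "aʊ",
}

# One sound, many spellings: /f/ in English.
f_spellings = ["f (fish)", "ph (photo)", "gh (laugh)"]

print(len(set(ough_readings.values())))  # 5 distinct readings of <ough>
print(len(f_spellings))                  # 3 spellings of /f/
```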
Turning now to the form of glyphs, there’s less that one can say synchronically. Basically any shape is usable for a glyph, as long as it’s easy enough to write. Like I said, different writing instruments do tend to produce different shapes, but talking only goes so far: by far the best way to understand that is to try them out yourself.
However, form is a central issue when considering the evolution of writing systems over time. As I see it, the basic drivers are similar to the rest of historical linguistics: the desire to clearly distinguish glyphs is counterbalanced by the requirement to make writing quick and easy. Areal effects can also play an important role, though possibly less so than in spoken languages. However, there are also a few factors which are unique to written scripts: most notably, a change in writing tool will generally cause a dramatic change in glyph form.
Another unique phenomenon is the high prevalence of conscripts (alternatively, neographies). It is vanishingly rare for conlangs to acquire native speakers, but there have been many instances throughout history when writing systems have been invented from whole cloth. (Most famously Hangeul and Cherokee, but also Canadian Aboriginal Syllabics, N'Ko, Adlam, Thaana, Ol Chiki, Vai, Kpelle and a host of others.)
In fact, since all writing is to some extent a conscious activity, it can be difficult to draw a boundary between neographies and ‘naturally’ evolved scripts. Major innovations in writing systems, such as the inventions of the abjad and the alphabet, are often considered deliberate human inventions rather than spontaneous evolution.
The final area I want to cover in this series is typography. I will use this term in a rather broad sense, to encompass anything involving the arrangement of glyph units on a page to form coherent texts and documents. This ranges from details such as writing direction, to the punctuation and spacing required to make texts readable, to the layout of structured text on pages. Typography in this sense overlaps with both form and function, but isn’t particularly closely connected with any one particular writing system.
In general, the most intricate typographical traditions have developed in the West, so I’ll be focussing on that a lot. (Unlike the other areas, which are equally relevant for all writing systems.) But I will try to talk about what I know of other cultures, when applicable.
…and with that, I’ve noticed the time is 2am, so I should probably stop around here. Do let me know if I missed anything important.
Next up: abjads and alphabets, most probably. (Unless I change my mind, of course.)