Alphabets: basics
In the last post, I defined alphabets as writing systems where one grapheme corresponds to one phoneme. This gives them a particularly simple structure: ideally, it should be possible to read out a sentence simply by reading each grapheme in turn. An example should illustrate the principle well (using the first sentence of the UDHR in Georgian):
ყველა ადამიანი იბადება თავისუფალი და თანასწორი თავისი ღირსებითა და უფლებებით.
- ⟨ყ ვ ე ლ ა⟩ /qʼ v e l a/
- ⟨ა დ ა მ ი ა ნ ი⟩ /a d a m i a n i/
- ⟨ი ბ ა დ ე ბ ა⟩ /i b a d e b a/
- ⟨თ ა ვ ი ს უ ფ ა ლ ი⟩ /t̪ʰ a v i s u pʰ a l i/
- ⟨დ ა⟩ /d a/
- ⟨თ ა ნ ა ს წ ო რ ი⟩ /t̪ʰ a n a s tsʼ o r i/
- ⟨თ ა ვ ი ს ი⟩ /t̪ʰ a v i s i/
- ⟨ღ ი რ ს ე ბ ი თ ა⟩ /ɣ i r s e b i t̪ʰ a/
- ⟨დ ა⟩ /d a/
- ⟨უ ფ ლ ე ბ ე ბ ი თ⟩ /u pʰ l e b e b i t̪ʰ/
Romanisation: q'vela adamiani ibadeba tavisupali da tanasts'ori tavisi ghirsebita da uplebebit.
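To make the "read each grapheme in turn" idea concrete, here is a minimal Python sketch. The table covers only the letters that occur in the sample sentence, with the phoneme values given above; the names `GEORGIAN_TO_IPA` and `read_out` are my own, made up for illustration.

```python
# Grapheme-to-phoneme table, limited to the letters in the sample sentence.
# The IPA values are the ones listed in the example above.
GEORGIAN_TO_IPA = {
    "ა": "a", "ბ": "b", "დ": "d", "ე": "e", "ვ": "v", "თ": "t̪ʰ",
    "ი": "i", "ლ": "l", "მ": "m", "ნ": "n", "ო": "o", "რ": "r",
    "ს": "s", "უ": "u", "ფ": "pʰ", "ღ": "ɣ", "ყ": "qʼ", "წ": "tsʼ",
}


def read_out(text: str) -> str:
    """Transcribe text one grapheme at a time.

    Characters without a table entry (spaces, punctuation) pass through unchanged.
    """
    return "".join(GEORGIAN_TO_IPA.get(ch, ch) for ch in text)


sentence = "ყველა ადამიანი იბადება თავისუფალი და თანასწორი თავისი ღირსებითა და უფლებებით."
print(read_out(sentence))
```

The whole "reader" is a single dictionary lookup per character, with no context and no lookahead. The deviations discussed below are exactly the cases where a lookup this simple stops being enough.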
Other examples of modern-day alphabets include Latin, Greek, Cyrillic, Armenian, Mongolian, N'ko, Adlam and Osage. Ancient examples include Coptic, Avestan, Futhark and Ogham.
Of course, not everything that is called an ‘alphabet’ satisfies the definition as strictly as Georgian. Instead, most display cases in which graphemes do not correspond to single phonemes. This can occur in several ways.
The most straightforward violation of the definition occurs when single graphemes denote phoneme sequences. This seems to occur most commonly in Greek and derived alphabets. The Ancient Greek alphabet includes ⟨ζ ψ ξ⟩ /zd~dz ps ks/, though in Modern Greek ⟨ζ⟩ has been simplified to /z/. Coptic has inherited both ⟨ⲯ ⲝ⟩ /ps ks/, as well as deriving ⟨ⲋ⟩ /dz/ from an ⟨στ⟩ ligature, and ⟨Ϯ⟩ /ti/ from Demotic Egyptian. Latin inherited ⟨x⟩ /ks/ from a local Greek variant, but lost ⟨ξ⟩. Additionally, the unrelated Armenian alphabet has ⟨և⟩ /(j)ɛv/, derived from a ligature of ⟨եւ⟩.
(Note that in all these cases, only a few graphemes represent phoneme sequences. If this occurs systematically for a large number of graphemes, then the writing system is better analysed as an abugida, syllabary or semi-syllabary, rather than an alphabet.)
A related phenomenon is observed in Cyrillic, in which many graphemes add palatalisation to the preceding consonant. This is the case for the vowel graphemes ⟨Е Ё Ю Я⟩, which palatalise the previous consonant if there is one, and represent the phoneme sequences /je jo ju ja/ otherwise. Historically, ⟨Е Ю Я⟩ derive from ligatures ⟨Ѥ Ю Ꙗ⟩, though only ⟨Ю⟩ has retained the ligated shape. There is also the soft sign ⟨Ь⟩, which palatalises the previous consonant while representing no phoneme of its own, and similarly the hard sign ⟨Ъ⟩, which depalatalises the previous phoneme.
(Again, Cyrillic can still be called an alphabet because this only affects a small number of graphemes. A writing system where many graphemes are used only to mark subphonemic features might be better analysed as a featural system.)
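The context-dependence of the Cyrillic vowel letters can be sketched in a few lines of Python. This is a deliberately simplified toy, not a real transcriber of Russian: it handles lower-case input only, knows a handful of consonants, and ignores stress and vowel reduction. I am also assuming that the hard sign simply blocks palatalisation, so that a following vowel letter is read as /j/ plus vowel. All names here (`CONSONANTS`, `IOTATED`, `transcribe`) are invented for illustration.

```python
# Toy tables: a few consonants, the plain vowels, and the four vowel
# letters discussed above with the vowel each one contributes.
CONSONANTS = {"б": "b", "д": "d", "м": "m", "н": "n", "т": "t"}
PLAIN_VOWELS = {"а": "a", "о": "o", "у": "u"}
IOTATED = {"е": "e", "ё": "o", "ю": "u", "я": "a"}


def transcribe(word: str) -> str:
    """Rough transcription showing how the reading depends on the previous letter."""
    out = []
    prev_is_consonant = False
    for ch in word:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            prev_is_consonant = True
        elif ch in IOTATED:
            # After a consonant: palatalise it. Otherwise: /j/ + vowel.
            out.append(("ʲ" if prev_is_consonant else "j") + IOTATED[ch])
            prev_is_consonant = False
        elif ch == "ь":
            # Soft sign: palatalises the previous consonant, no phoneme of its own.
            if prev_is_consonant:
                out.append("ʲ")
            prev_is_consonant = False
        elif ch == "ъ":
            # Hard sign (assumption): no phoneme of its own, blocks palatalisation.
            prev_is_consonant = False
        else:
            # Plain vowels are looked up; unknown characters pass through unchanged.
            out.append(PLAIN_VOWELS.get(ch, ch))
            prev_is_consonant = False
    return "".join(out)


for word in ["яма", "няня", "мать", "адъютант"]:
    print(word, transcribe(word))
```

Unlike the Georgian case, the reader now has to carry one bit of state (was the previous letter a consonant?), which is one way to see how far these graphemes depart from a one-to-one mapping.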
A similar case occurs in Irish, where many vowel graphemes serve only to mark the secondary articulation of an adjacent consonant. All vowel graphemes are considered either caol ‘slender’ ⟨e é i í⟩, indicating palatalisation, or leathan ‘broad’ ⟨a á o ó u ú⟩, indicating velarisation. Thus, for instance, ⟨dáil⟩ ‘assembly’ is /d̪ˠaːlʲ/: the ⟨i⟩ only marks the palatalisation of the following consonant. Presumably this system originated by transcribing the offglides of consonants.
Many alphabets also contain multigraphs: grapheme sequences which are treated and read as a single unit. (Two graphemes form a digraph, three a trigraph, and so on.) Examples include ⟨ch sh th ng⟩ /tʃ ʃ θ~ð ŋ(ɡ)/ in English, ⟨ch cz sz rz⟩ /x t͡ʂ ʂ ʐ/ in Polish, ⟨ch dd ff ll ng ph rh si th⟩ /χ ð f ɬ ŋ(ɡ) f r̥ ʃ θ/ in Welsh, and ⟨αι ει οι ου υι γγ τσ τζ γκ μπ ντ⟩ /e̞ i i u i (ŋ)ɡ ts dz ɡ b d/ in Greek. Particularly close-knit digraphs sometimes ligate into a single grapheme, as seen e.g. with ⟨IJ⟩ in Dutch.
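One way to picture "treated and read as a single unit" is as a segmentation step that runs before any grapheme-to-phoneme lookup. Below is a minimal Python sketch using greedy longest-match, restricted to the digraphs listed above (the real inventories of these languages are larger). The function name `segment` and the set names are hypothetical.

```python
# Digraph inventories, limited to the examples given above.
ENGLISH_DIGRAPHS = {"ch", "sh", "th", "ng"}
POLISH_DIGRAPHS = {"ch", "cz", "sz", "rz"}
WELSH_DIGRAPHS = {"ch", "dd", "ff", "ll", "ng", "ph", "rh", "si", "th"}


def segment(word: str, multigraphs: set) -> list:
    """Split a word into reading units, preferring the longest known multigraph."""
    longest = max((len(m) for m in multigraphs), default=1)
    units = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, falling back to a single letter.
        for size in range(longest, 0, -1):
            chunk = word[i:i + size]
            if size == 1 or chunk in multigraphs:
                units.append(chunk)
                i += len(chunk)
                break
    return units


print(segment("szczecin", POLISH_DIGRAPHS))   # ['sz', 'cz', 'e', 'c', 'i', 'n']
print(segment("llanddwyn", WELSH_DIGRAPHS))   # ['ll', 'a', 'n', 'dd', 'w', 'y', 'n']
```

Greedy matching is only an approximation: it cannot tell a true digraph from two separate letters that happen to be adjacent. With `ENGLISH_DIGRAPHS`, for instance, it groups the ⟨t⟩ and ⟨h⟩ of "hothouse" into ⟨th⟩, which a human reader would not.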
Interestingly, it seems that writing systems often have one or two preferred strategies they use to form multigraphs. This can be seen in the examples: English adds ⟨h⟩, Polish adds ⟨z⟩, Welsh adds ⟨h⟩ or doubles the letter, and Greek adds ⟨ι⟩ or combines a voiced continuant with a voiceless stop. Why this happens, I’m not entirely sure, but we can speculate on some historical drivers:
- Shared inheritance: for instance Ancient Greek ⟨φ θ χ⟩ /pʰ tʰ kʰ/ were represented in Latin as ⟨ph th ch⟩, before the Greek letters shifted to /f θ χ/, forcing the Latin digraphs to be treated as single units. Similarly Greek ⟨αι ει οι ου υι⟩ became single units via sound change.
- Analogy: given an existing digraph, forming new digraphs using the same strategy is intuitively reasonable.
- Disambiguation: more of a functional motivation, but it seems plausible that multigraphs might become easier to distinguish from ordinary letters when given some regular marker.
Finally, most (if not all) writing systems display some degree of irregularity. In an alphabet, this will of course obscure the grapheme–phoneme correspondence. In extreme cases, such as English, I’ve seen people seriously argue that this approaches a logography, since the spelling of homophones such as ⟨by⟩/⟨bye⟩/⟨buy⟩ is related to semantics rather than phonology. But I’d suggest such a system is still primarily alphabetic, since the spelling still corresponds systematically to phonology. (Of course, it might be a different story if we were to spell /ba͡i/ as ⟨xaz⟩.)
Next up: beyond the definition, what properties are common in alphabetic scripts? And what happens when those are violated?