Graphemic space

alice · Post by **alice** » Tue Oct 19, 2021 3:13 am

By analogy with the tendency of vowel qualities to spread out evenly across the possible vocalic space, is there anything comparable for graphemes ("letters of the alphabet")? This might explain why conlangers who try to create conscripts based on cursive handwriting have trouble coming up with graphemes which are satisfactorily different from those of the Roman or Cyrillic alphabets, for example. It does, however, necessitate a taxonomy of graphemes which probably doesn't exist yet; any thoughts?

bradrn · Post by **bradrn** » Tue Oct 19, 2021 3:35 am

alice wrote: ↑Tue Oct 19, 2021 3:13 am By analogy with the tendency of vowel qualities to spread out evenly across the possible vocalic space, is there anything comparable for graphemes ("letters of the alphabet")? This might explain why conlangers who try to create conscripts based on cursive handwriting have trouble coming up with graphemes which are satisfactorily different from those of the Roman or Cyrillic alphabets, for example. It does, however, necessitate a taxonomy of graphemes which probably doesn't exist yet; any thoughts?

I have wondered about this idea myself. I think zompist has also suggested something similar. But my understanding of real-world writing systems suggests otherwise: writing systems seem to have no problem with graphemes being excessively close to each other. Thai is an especially prominent example here, but writing systems as diverse as Kurrent, Geʼez, Tangut and Cherokee all show the same phenomenon. The most that could be said is that writing systems tend to evolve to make letters more distinguishable. (This article on Thai is especially interesting here.)

Creyeditor · Post by **Creyeditor** » Tue Oct 19, 2021 4:42 am

I think it's similar for graphemes and vowel phonemes. There is a tendency for them to be as different as possible (by exploiting given parameters) but it's just a tendency. Vowel phoneme clouds overlap (even in languages with small vowel phoneme inventories) and graphemes are sometimes surprisingly similar. Yet, no language only has /a/ vs. /æ/ and no script looks like Tengwar.

bradrn · Post by **bradrn** » Tue Oct 19, 2021 5:15 am

Creyeditor wrote: ↑Tue Oct 19, 2021 4:42 am I think it's similar for graphemes and vowel phonemes. There is a tendency for them to be as different as possible (by exploiting given parameters) but it's just a tendency. Vowel phoneme clouds overlap (even in languages with small vowel phoneme inventories) and graphemes are sometimes surprisingly similar. Yet, no language only has /a/ vs. /æ/ and no script looks like Tengwar.

I don’t think they’re quite that comparable. For one thing, vowels tend to be fairly far apart, even in large systems: I know of no language which genuinely has both [a] and [æ], or any other distinction as small (with the sole exception of Kensiu). But writing systems with distinctions as small as those of Tengwar are not difficult to find, despite the fact that there are far fewer writing systems than languages: Lampung and Hangeul are probably most comparable to Tengwar, and then there’s Thai, Cherokee etc. as I mentioned above.

Pabappa · Post by **Pabappa** » Tue Oct 19, 2021 6:55 am

i started with an alphabet that i acknowledged was Romanesque .... but as I worked with it over the years, I realized there were a lot of shapes that had just never occurred to me. We have only one Q, one G, etc .... but letter shapes with bumps and bolts in other places would still look Romanesque. However, I have no cursive form of the script and don't intend to create one.

Creyeditor · Post by **Creyeditor** » Tue Oct 19, 2021 12:49 pm

bradrn wrote: ↑Tue Oct 19, 2021 5:15 am
Creyeditor wrote: ↑Tue Oct 19, 2021 4:42 am I think it's similar for graphemes and vowel phonemes. There is a tendency for them to be as different as possible (by exploiting given parameters) but it's just a tendency. Vowel phoneme clouds overlap (even in languages with small vowel phoneme inventories) and graphemes are sometimes surprisingly similar. Yet, no language only has /a/ vs. /æ/ and no script looks like Tengwar.
I don’t think they’re quite that comparable. For one thing, vowels tend to be fairly far apart, even in large systems: I know of no language which genuinely has both [a] and [æ], or any other distinction as small (with the sole exception of Kensiu). But writing systems with distinctions as small as those of Tengwar are not difficult to find, despite the fact that there are far fewer writing systems than languages: Lampung and Hangeul are probably most comparable to Tengwar, and then there’s Thai, Cherokee etc. as I mentioned above.

I think graphemes and vowel phonemes are similar in that they are not points in space but clouds. Each realization is a bit different. And vowel clouds frequently overlap even in languages with three vowel phonemes, e.g. Kabardian.
And grapheme clouds can overlap, too. In German contemporary handwriting some realizations of <u> look identical to some realizations of <n>.

Also, I was referring to an improbable vowel phoneme inventory only consisting of /a/ and /æ/ without any other vowels. Fine vowel quality distinctions in larger phoneme inventories are common, I think. German lax high front vowels and tense mid vowels mostly differ in length, which is arguably neutralized in some contexts. And Moro (Kordofanian) has two schwas.

Moose-tache · Post by **Moose-tache** » Tue Oct 19, 2021 7:41 pm

There are real context differences between speaking and reading that may affect this question. For a long time, reading and writing meant entering a very special mode, which for some people made up only a small fraction of their language use and for others was a highly ritualized career tool. Text is sometimes stylized to make letters deliberately more similar, in a way that rarely happens to vowels (you don't see people in legal courts switching to all centralized voiceless vowels, for example). The purpose, ambiguity tolerance, required shared education, and aesthetic considerations for writing are all very different than speaking.

zompist · Post by **zompist** » Tue Oct 19, 2021 9:09 pm

FWIW, my statement in the LCK was modal-- it was about best practices, not about what natlang writing systems actually do. Actual writing systems can be terrible, with multiple letters merging (as in Arabic) or barely distinguishable (as in traditional Hebrew fonts). Medieval European calligraphy ("Gothic" letters) seems to aim for a forest of undifferentiated stalks and serifs. And if you do have elegant and mnemonic glyphs, users will distort them into unrecognizability in a few centuries.

bradrn · Post by **bradrn** » Tue Oct 19, 2021 9:36 pm

Creyeditor wrote: ↑Tue Oct 19, 2021 12:49 pm I think graphemes and vowel phonemes are similar in that they are not points in space but clouds. Each realization is a bit different. And vowel clouds frequently overlap even in languages with three vowel phonemes, e.g. Kabardian.

Of course, but I wasn’t claiming that vowel clouds never overlap. Just that they tend to spread out as far as possible in vowel space.

Also, I was referring to an improbable vowel phoneme inventory only consisting of /a/ and /æ/ without any other vowels. Fine vowel quality distinctions in larger phoneme inventories are common, I think. German lax high front vowels and tense mid vowels mostly differ in length, which is arguably neutralized in some contexts. And Moro (Kordofanian) has two schwas.

I don’t think such fine distinctions are all that common when you look at vowel systems phonetically. Vanishingly few languages have two vowels separated by only length, or only one height level. There’s a reason why e.g. English has [iː] and [ɪ], rather than [i] and [ɪ], or [iː] and [i]. My dialect does actually have a genuine example of a length distinction in [e̞] vs [e̞ː] (DRESS vs SQUARE), but even then the latter tends to be slightly diphthongised as something like [e̞ˑə̆], and most other dialects separate them even more.

On the other hand, my point is that writing systems are far less resistant to such clashes. Almost any writing system you might think of will have at least two letters which are difficult to tell apart, and some reach levels of confusion which are unheard-of in vowel systems. (The standout example here is undoubtedly Book Pahlavi, which underwent such extreme mergers that it only had 13 distinguishable graphemes. Though admittedly it fell out of use quite quickly.)

Moose-tache wrote: ↑Tue Oct 19, 2021 7:41 pm There are real context differences between speaking and reading that may affect this question. For a long time, reading and writing meant entering a very special mode, which for some people made up only a small fraction of their language use and for others was a highly ritualized career tool. Text is sometimes stylized to make letters deliberately more similar, in a way that rarely happens to vowels (you don't see people in legal courts switching to all centralized voiceless vowels, for example). The purpose, ambiguity tolerance, required shared education, and aesthetic considerations for writing are all very different than speaking.

I agree that this is a key point. An additional point underlying all of this is that writing is a far more conscious process than speaking: all scripts are, ultimately, conscripts, and people can consciously control their writing far more easily than they can their speaking.

Nortaneous · Post by **Nortaneous** » Tue Oct 19, 2021 10:46 pm

zompist wrote: ↑Tue Oct 19, 2021 9:09 pm FWIW, my statement in the LCK was modal-- it was about best practices, not about what natlang writing systems actually do. Actual writing systems can be terrible, with multiple letters merging (as in Arabic) or barely distinguishable (as in traditional Hebrew fonts).

Tocharian monks decided to adapt a full Brahmic script to a language with one stop series and then merge the letters for /t/ and /n/, to the point where arguments from sound change were needed to revise the reading of the verbal ending -mntär from earlier -mttär (specifically parallelism with -mc- > -mñc-).

Then again I don't think I reliably distinguish <a e o u> in cursive.

Creyeditor · Post by **Creyeditor** » Wed Oct 20, 2021 4:09 am

bradrn wrote: ↑Tue Oct 19, 2021 9:36 pm
Creyeditor wrote: ↑Tue Oct 19, 2021 12:49 pm I think graphemes and vowel phonemes are similar in that they are not points in space but clouds. Each realization is a bit different. And vowel clouds frequently overlap even in languages with three vowel phonemes, e.g. Kabardian.
Of course, but I wasn’t claiming that vowel clouds never overlap. Just that they tend to spread out as far as possible in vowel space.

Also, I was referring to an improbable vowel phoneme inventory only consisting of /a/ and /æ/ without any other vowels. Fine vowel quality distinctions in larger phoneme inventories are common, I think. German lax high front vowels and tense mid vowels mostly differ in length, which is arguably neutralized in some contexts. And Moro (Kordofanian) has two schwas.
I don’t think such fine distinctions are all that common when you look at vowel systems phonetically. Vanishingly few languages have two vowels separated by only length, or only one height level. There’s a reason why e.g. English has [iː] and [ɪ], rather than [i] and [ɪ], or [iː] and [i]. My dialect does actually have a genuine example of a length distinction in [e̞] vs [e̞ː] (DRESS vs SQUARE), but even then the latter tends to be slightly diphthongised as something like [e̞ˑə̆], and most other dialects separate them even more.

On the other hand, my point is that writing systems are far less resistant to such clashes. Almost any writing system you might think of will have at least two letters which are difficult to tell apart, and some reach levels of confusion which are unheard-of in vowel systems. (The standout example here is undoubtedly Book Pahlavi, which underwent such extreme mergers that it only had 13 distinguishable graphemes. Though admittedly it fell out of use quite quickly.)

I agree that graphemes and vowel phonemes are different. There is more pressure for uniformity in writing. But I think we have to agree to disagree on fine phonetic vowel quality distinctions.

Travis B. · Post by **Travis B.** » Wed Oct 20, 2021 1:37 pm

bradrn wrote: ↑Tue Oct 19, 2021 9:36 pm I don’t think such fine distinctions are all that common when you look at vowel systems phonetically. Vanishingly few languages have two vowels separated by only length, or only one height level. There’s a reason why e.g. English has [iː] and [ɪ], rather than [i] and [ɪ], or [iː] and [i]. My dialect does actually have a genuine example of a length distinction in [e̞] vs [e̞ː] (DRESS vs SQUARE), but even then the latter tends to be slightly diphthongised as something like [e̞ˑə̆], and most other dialects separate them even more.

Just using another English example, though, the English here has separate vowel quality and vowel quantity systems, such that ladder and latter, and madder and matter, are distinguished solely by vowel length, and vowel quantity is derived from historical consonant quality and consonant elision (which tends to make vowels longer by merging or lengthening them whenever hiatus is not possible) while vowel quality is derived from historical vowel quality/quantity.

bradrn · Post by **bradrn** » Wed Oct 20, 2021 5:55 pm

Travis B. wrote: ↑Wed Oct 20, 2021 1:37 pm
bradrn wrote: ↑Tue Oct 19, 2021 9:36 pm I don’t think such fine distinctions are all that common when you look at vowel systems phonetically. Vanishingly few languages have two vowels separated by only length, or only one height level. There’s a reason why e.g. English has [iː] and [ɪ], rather than [i] and [ɪ], or [iː] and [i]. My dialect does actually have a genuine example of a length distinction in [e̞] vs [e̞ː] (DRESS vs SQUARE), but even then the latter tends to be slightly diphthongised as something like [e̞ˑə̆], and most other dialects separate them even more.
Just using another English example, though, the English here has separate vowel quality and vowel quantity systems, such that ladder and latter, and madder and matter, are distinguished solely by vowel length, and vowel quantity is derived from historical consonant quality and consonant elision (which tends to make vowels longer by merging or lengthening them whenever hiatus is not possible) while vowel quality is derived from historical vowel quality/quantity.

What strange dialect do you speak? I have the [æː] vs [æ] distinction, and it even seems to be a purely length-based distinction, but the difference between ladder/latter is voicing (and tapping in the former word) rather than any sort of length.

kodé · Post by **kodé** » Wed Oct 20, 2021 7:40 pm

A different difference, as it were, between vowel phonemes and graphemes, is that graphemes can have several different allographs, but often in different ways than phonemes have allophones. Allophony is contextual based on phonological context, i.e., other phonemes (or phonological structure). Some allography is based on graphic context, such as traditional typesetting of sequences like ‘fi’, or on structure, like with initial vs. medial vs. final vs. isolation forms of many Arabic letters. But other allography is based on non-graphical features, like non-italicized vs. italicized graphs, or lowercase vs. capital. These features can be syntactic (capitalization, sometimes), lexical (capitalization, other times), discourse-sensitive (CRUISE CONTROL FOR COOL), or sociolinguistic. I’m sure you could find allophony based on lexical or discourse factors, but I’m also pretty sure it isn’t systematic in the way allography is. You couuuuuld argue that font variation is similar to dialectal or register variation, but it bears a lot more thinking out.

As far as easily confusable graphemes, I’m a bit surprised that no one’s brought up Armenian (as an Armenian, I’m required to never shut up about Armenian). Even in clear script, Չ Ջ Ձ Զ are hard to distinguish—and have been hard for me for almost three decades. Certain printed fonts are pretty unreadable, and handwriting can be baaad.

Travis B. · Post by **Travis B.** » Wed Oct 20, 2021 8:19 pm

bradrn wrote: ↑Wed Oct 20, 2021 5:55 pm
Travis B. wrote: ↑Wed Oct 20, 2021 1:37 pm
bradrn wrote: ↑Tue Oct 19, 2021 9:36 pm I don’t think such fine distinctions are all that common when you look at vowel systems phonetically. Vanishingly few languages have two vowels separated by only length, or only one height level. There’s a reason why e.g. English has [iː] and [ɪ], rather than [i] and [ɪ], or [iː] and [i]. My dialect does actually have a genuine example of a length distinction in [e̞] vs [e̞ː] (DRESS vs SQUARE), but even then the latter tends to be slightly diphthongised as something like [e̞ˑə̆], and most other dialects separate them even more.
Just using another English example, though, the English here has separate vowel quality and vowel quantity systems, such that ladder and latter, and madder and matter, are distinguished solely by vowel length, and vowel quantity is derived from historical consonant quality and consonant elision (which tends to make vowels longer by merging or lengthening them whenever hiatus is not possible) while vowel quality is derived from historical vowel quality/quantity.
What strange dialect do you speak? I have the [æː] vs [æ] distinction, and it even seems to be a purely length-based distinction, but the difference between ladder/latter is voicing (and tapping in the former word) rather than any sort of length.

The strange dialect I speak is that of Milwaukee, WI. All in all, the diachronics are pretty simple. First, all phonemic vowel length was lost, reducing vowel distinctions to quality alone. Then, all vowels before fortis obstruents (with or without an intervening sonorant) turned short, and all other vowels turned long - simple vowel length allophony at this point. Then some voicing contrasts were lost, such as the distinction between intervocalic /t/ and /d/ and the voicing of /t/ versus /d/ before another plosive (in this case, though, the preglottalization distinction is still preserved). Additionally, many intervocalic consonants and even consonant clusters were lost; where hiatus was not possible, and in some cases where it was possible, the preceding and following vowels merged either into diphthongs or lengthened vowels; if the preceding vowel was short, the resulting diphthong or lengthened vowel is long, and if the preceding vowel was long, the resulting diphthong or lengthened vowel is overlong. Note that here vowel nasalization was preserved, as if either original vowel was nasalized, the resulting diphthong or lengthened vowel is also nasalized.

bradrn · Post by **bradrn** » Wed Oct 20, 2021 8:38 pm

kodé wrote: ↑Wed Oct 20, 2021 7:40 pm A different difference, as it were, between vowel phonemes and graphemes, is that graphemes can have several different allographs, but often in different ways than phonemes have allophones. Allophony is contextual based on phonological context, i.e., other phonemes (or phonological structure). Some allography is based on graphic context, such as traditional typesetting of sequences like ‘fi’, or on structure, like with initial vs. medial vs. final vs. isolation forms of many Arabic letters. But other allography is based on non-graphical features, like non-italicized vs. italicized graphs, or lowercase vs. capital. These features can be syntactic (capitalization, sometimes), lexical (capitalization, other times), discourse-sensitive (CRUISE CONTROL FOR COOL), or sociolinguistic. I’m sure you could find allophony based on lexical or discourse factors, but I’m also pretty sure it isn’t systematic in the way allography is. You couuuuuld argue that font variation is similar to dialectal or register variation, but it bears a lot more thinking out.

I’d argue that there are four main motivations for allography:

Free variation: decision between allographs is purely at the whims of the writer: e.g. Latin ⟨a~ɑ⟩, serif vs sans-serif
Stylistic: decision between allographs affects only emphasis and tone: e.g. italicisation, full-caps
Contextual: decision between allographs depends on context: e.g. initial/final forms in Arabic/Hebrew/Greek, sentence-initial capitalisation, ligatures
Semantic: there are minimal pairs between allographs and these have different meanings: e.g. sentence-internal capitalisation

Perhaps more relevantly for this thread, I’d also argue that there are two entirely different forms of allography. As usual, I prefer to analyse it in terms of prototypes:

Polytypicality: a single grapheme with multiple prototypes: e.g. ⟨a⟩ vs ⟨ɑ⟩, or ⟨צ⟩ vs ⟨ץ⟩
Variation within the prototype: e.g. ⟨a⟩ vs ⟨a⟩, or ⟨ש⟩ vs ⟨ש⟩

I hypothesise that variation within the prototype is mostly associated with free variation, whereas polytypicality is mostly associated with other motivations for allography.

As far as easily confusable graphemes, I’m a bit surprised that no one’s brought up Armenian (as an Armenian, I’m required to never shut up about Armenian).

Only because I didn’t know about it! I’ll add it to my list of examples, thanks.

Zompist Bboard Again
