A guide to writing systems

xxx · Post by **xxx** » Thu Dec 28, 2023 5:52 am

zompist wrote: ↑Wed Dec 27, 2023 5:28 pm I don't know what you're trying to say. If they're unknown, they're not objective

the objective fact is that alphabets are read as logographies...
no matter how...
I'm not interested in hypotheses about the inner workings of the black box...
I might as well abandon all human sciences and wait to reduce them to cerebral determinism...

let's stick to the subject, the validity and limits of different types of writing...
here I was pointing out that logographies relegated to the infancy of writing
are in fact much more natural and remain central to the use of purely phonetic systems,
which never really achieve their goal, and are perhaps an impoverishment rather than a positive evolution...

bradrn · Post by **bradrn** » Thu Dec 28, 2023 6:20 am

It feels like xxx’s posts (and the resulting discussion) are drifting well off-topic for this thread. Is there any chance of that being split off into a separate thread? (‘The relative merits of logographies’, perhaps?)

xxx · Post by **xxx** » Thu Dec 28, 2023 6:37 am

Sorry to derail the thread,
I tend to get too interested in the subject's limits,
and push it to its edges...
here the limits of the writing systems categories...
I'll try to interfere less...

bradrn · Post by **bradrn** » Thu Dec 28, 2023 7:18 am

Terminology, mark 2

From the preceding discussion, it seems clear that my original post was less clear than it should be. Let me try to present my terminology again, in a more systematic way.

The basic unit of a writing system is the grapheme. In different writing systems, graphemes can denote phonemes, syllables or words. Note that graphemes are abstract units, like phonemes: they are realised in written form as one or more allographs.

Many writing systems assemble graphemes into distinct graphical units, which for want of a standard term I have been calling blocks. Very often each block writes a single syllable, in which case I call them syllable blocks. Not all scripts have blocks as a distinct unit: they are missing in scripts such as Latin, Tifinagh and Meroitic. (And it is a moot point whether syllabaries should be considered as having one-grapheme-long syllable blocks or not.)

The next level of the hierarchy is the line of text. (Admittedly I use this word with some trepidation here, since text can also be written in spirals, circles, etc.) This is formed by arranging blocks next to each other in sequence, or individual graphemes for those scripts which lack blocks. Invariably, they are arranged linearly, in the same order as spoken, in a specific chosen direction: usually left-to-right, right-to-left, or top-to-bottom.

Finally, at the highest level we have the whole text, formed by arranging lines in sequence, generally arranged perpendicularly to their long axis. In some ancient texts, every second line is flipped (i.e. written boustrophedon). Other texts can have additional levels above this, e.g. by arranging groups of lines into columns.
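For anyone who finds a data structure easier to follow than prose, here is a rough Python sketch of the hierarchy just described. The class and method names (`Block`, `Line`, `Text`, `visual_rows`) are purely illustrative assumptions, and the boustrophedon handling only reverses the order of blocks in every second line; it does not model glyph mirroring or any of the extra levels such as columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    # Graphemes in spoken order; their visual arrangement inside the block
    # is fixed by the script and is not modelled here.
    graphemes: List[str]


@dataclass
class Line:
    # Blocks in spoken order. A script without blocks can be modelled
    # as one grapheme per block.
    blocks: List[Block]


@dataclass
class Text:
    lines: List[Line]
    boustrophedon: bool = False  # hypothetical flag: flip every second line

    def visual_rows(self) -> List[List[str]]:
        """Return each line as a list of block strings in visual order."""
        rows = []
        for index, line in enumerate(self.lines):
            row = ["".join(block.graphemes) for block in line.blocks]
            if self.boustrophedon and index % 2 == 1:
                row.reverse()
            rows.append(row)
        return rows


def make_line(units: List[str]) -> Line:
    """Build a block-less line: one grapheme per block."""
    return Line([Block([unit]) for unit in units])


sample = Text([make_line(["a", "b", "c"]), make_line(["d", "e", "f"])],
              boustrophedon=True)
print(sample.visual_rows())
```

The point of the sketch is only that the arrangement inside a `Block` and the arrangement of blocks inside a `Line` are stored at different levels, so neither determines the other.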

All that being said… having written this, I think I see why masako got confused earlier: my concept of ‘linearity’ conflated different levels of the hierarchy. Specifically, I was calling a script ‘linear’ if its blocks were written in the same direction as its lines, or if it had no blocks at all.

And of course, this was confused for precisely the reasons people have been telling me. Lines of text are always linear — no matter whether their components are graphemes or blocks of graphemes. (And it’s pretty clear that we process them in one direction, too, which is presumably where all the ‘psychology’ stuff came from.) Meanwhile, blocks are tightly-bound units, for which the relative positioning of their constituent graphemes is fixed, and has nothing much to do with linearity at the next level up (that being the lines they comprise).

What makes this all even more confusing is that sloppy writers have previously been happy to mix up these separate concepts. To take but one example, the only reason why anyone would consider 'Phagspa to be at all similar to an alphabet is if the positioning of graphemes in syllable blocks (top-to-bottom) is conflated with the direction of writing in lines (also top-to-bottom).

But still, I think we can define different levels of ‘block-ness’, depending on the graphical salience of the block relative to the surrounding text. If a block is written in line with the surrounding text, that naturally decreases its salience. This is the case with 'Phagspa, Pollard and Pahawh Hmong — and interestingly, all three add spaces around their blocks to make them more distinct (plus a connecting bar in 'Phagspa). For that matter, Vietnamese does the same for Latin, creating syllable blocks in a script which usually doesn’t have them. But in all of these cases, the blocks are still less immediately obvious than those of, for instance, Thaana, Hangeul and Ge'ez.

(Hmm… if I’m saying that Vietnamese has syllable blocks, does that also mean that English has ‘word blocks’? I dislike this analysis, largely because English words have no rigidly fixed internal structure or relative positioning, so they behave quite differently to the blocks I’ve identified in other scripts. But it’s an interesting thought!)

In this light, let me revisit the properties of typical alphabets from my last post:

bradrn wrote: ↑Tue Dec 26, 2023 5:34 am

Graphemes are arranged in a single line in one direction

Consonants and vowels are graphically equal

Each phoneme corresponds to one grapheme

It is simpler and more revealing to merge both (1) and (2) together, by saying that alphabets tend not to have blocks. Alphabets which defy this have often left writers confused, as with Pollard, Lao and Thaana.

Similarly, we can say that abugidas tend to have blocks — and indeed people have also been confused by ones which don’t, as with Meroitic.

But I dislike thinking of blocks as part of the definition: rather, I define ‘alphabets’ and ‘abugidas’ by which unit of language their graphemes represent. Whether they have syllable blocks or not is then an entirely orthogonal property.

keenir · Post by **keenir** » Thu Dec 28, 2023 7:36 am

here the limits of the writing systems categories...
I'll try to interfere less...

Nobody is accusing you of interfering.

xxx wrote: ↑Thu Dec 28, 2023 6:37 am Sorry to derail the thread,
I tend to get too interested in the subject's limits,
and push it to its edges...

Let's first make sure we're using the words the same way: you mentioned that logographies are what arise early in a script's development... do you perhaps mean "pictographs" or maybe "ideograms"?

xxx wrote: ↑Thu Dec 28, 2023 5:52 am
zompist wrote: ↑Wed Dec 27, 2023 5:28 pm I don't know what you're trying to say. If they're unknown, they're not objective
the objective fact is that alphabets are read as logographies...
no matter how...

and how are logographies read?

here I was pointing out that logographies relegated to the infancy of writing
are in fact much more natural and remain central to the use of purely phonetic systems,
which never really achieve their goal, and are perhaps an impoverishment rather than a positive evolution...

xxx · Post by **xxx** » Thu Dec 28, 2023 7:46 am

keenir wrote: ↑Thu Dec 28, 2023 7:36 am Nobody is accusing you of interfering.

bradrn does, it's enough...
elsewhere, if you want...

keenir · Post by **keenir** » Thu Dec 28, 2023 8:05 am

xxx wrote: ↑Thu Dec 28, 2023 7:46 am
keenir wrote: ↑Thu Dec 28, 2023 7:36 am Nobody is accusing you of interfering.
bradrn does, it's enough...

derailing =/= interfering

Richard W · Post by **Richard W** » Fri Dec 29, 2023 6:20 pm

bradrn wrote: ↑Wed Dec 27, 2023 7:04 am And of course, all syllabaries have syllable blocks by definition! The property ‘having syllable blocks’ is independent of the property ‘is an abugida’ (i.e. ‘has an inherent vowel’), though there is admittedly a strong correlation.

I don't agree that all syllabaries have syllable blocks. A pure CV syllabary (like, I think, Linear B) just has CV elements.

bradrn · Post by **bradrn** » Sat Dec 30, 2023 1:53 am

Richard W wrote: ↑Fri Dec 29, 2023 6:20 pm
bradrn wrote: ↑Wed Dec 27, 2023 7:04 am And of course, all syllabaries have syllable blocks by definition! The property ‘having syllable blocks’ is independent of the property ‘is an abugida’ (i.e. ‘has an inherent vowel’), though there is admittedly a strong correlation.
I don't agree that all syllabaries have syllable blocks. A pure CV syllabary (like, I think, Linear B) just has CV elements.

My later post was more explicit and measured about this:

bradrn wrote: ↑Thu Dec 28, 2023 7:18 am (And it is a moot point whether syllabaries should be considered as having one-grapheme-long syllable blocks or not.)

Richard W · Post by **Richard W** » Sun Dec 31, 2023 6:22 am

bradrn wrote: ↑Thu Dec 28, 2023 7:18 am But I dislike thinking of blocks as part of the definition: rather, I define ‘alphabets’ and ‘abugidas’ by which unit of language their graphemes represent. Whether they have syllable blocks or not is then an entirely orthogonal property.

I commend Bright's term 'alphasyllabary' to you.

The first abugida I learnt was the Thai script for Thai, and that has perhaps shaped my interpretation of other abugidas. Ignoring digraphs and their like, I would say that both consonants and vowels are represented by graphemes. One sometimes has a null grapheme, in contextually influenced environments, both for vowels and the final glottal stop. If we pursue the phoneme analogy, I would say you were thinking of the vowel symbols of abugidas as suprasegmentals.

One aspect that we easily overlook is the independent vowels in Brahmic scripts. The Indian treatment is of them as allographs of the dependent vowels, and this seems to be native, not a view deriving from the handling of vowels in alphabets and abjads. While I see precious little connection between them in the early Brahmi script, it seems as well based as the allomorphy in suppletive paradigms such as go, went, been 'to go (and later return)'. This allography breaks down for vernacular languages when the independent vowel /a/ is adopted as a consonant.

Thinking emically, one item you've not addressed is punctuation, and in particular, aeration and word dividers. While it's largely orthogonal to the development of systems, it does affect how the systems work when it comes to reading. And at least one writing system, though sometimes alleged to be counterfeit, the Ramkhamhaeng orthography for Thai, used spaces before to mark off codas.

bradrn · Post by **bradrn** » Sun Dec 31, 2023 7:33 am

Richard W wrote: ↑Sun Dec 31, 2023 6:22 am
bradrn wrote: ↑Thu Dec 28, 2023 7:18 am But I dislike thinking of blocks as part of the definition: rather, I define ‘alphabets’ and ‘abugidas’ by which unit of language their graphemes represent. Whether they have syllable blocks or not is then an entirely orthogonal property.
I commend Bright's term 'alphasyllabary' to you.

I’ve already seen the term, but don’t see the connection with what I said.

The first abugida I learnt was the Thai script for Thai, and that has perhaps shaped my interpretation of other abugidas. Ignoring digraphs and their like, I would say that both consonants and vowels are represented by graphemes. One sometimes has a null grapheme, in contextually influenced environments, both for vowels and the final glottal stop.

Indeed, I consider both consonants and vowels as being represented by graphemes. (Like we’ve both said, the vowels don’t even have to be diacritical.) In my head, I’ve been analysing it as a system where a consonantal grapheme can also denote a whole CV syllable, but the ‘null grapheme’ analysis could work equally well. Are there any obvious points to recommend one over the other?

If we pursue the phoneme analogy, I would say you were thinking of the vowel symbols of abugidas as suprasegmentals.

Not so. For that matter, I’m not even sure what the equivalent of a ‘suprasegmental’ would be for a writing system — typographic emphasis, perhaps? But the analogy feels like it’s getting strained at this point.

One aspect that we easily overlook is the independent vowels in Brahmic scripts. The Indian treatment is of them as allographs of the dependent vowels, and this seems to be native, not a view deriving from the handling of vowels in alphabets and abjads.

That’s quite interesting; do you have a reference?

This allography breaks down for vernacular languages when the independent vowel /a/ is adopted as a consonant.

Wait, what‽ Independent /a/ has been used as a consonant? Where?

Thinking emically, one item you've not addressed is punctuation, and in particular, aeration and word dividers. While it's largely orthogonal to the development of systems, it does affect how the systems work when it comes to reading.

Yet another thing I plan to cover later! (And in as much detail as I can manage, given that it’s one of those areas which is often neglected.) My plan was to cover the main functional types of writing systems, then move on to punctuation and other typographical issues, though clearly that seems to be taking much longer than I imagined…

(That being said, my last post did talk a little bit about the role of spacing in establishing syllable blocks.)

And at least one writing system, though sometimes alleged to be counterfeit, the Ramkhamhaeng orthography for Thai, used spaces before to mark off codas.

I hadn’t heard of this before; do you have any more details on how this works?

EDIT: never mind, found it on Ian James’s website:

skyknowledge wrote: Generally, and especially where there may be ambiguities, a syllable’s vowel letter(s) will stick close to the syllable’s initial consonant, and there will appear to be a slight space before any syllabic-final consonant. The clustering of a syllable’s initial consonant with /r/ and /l/ will also be written closely, as will the tone-changing letter /h/ before its consonant.

So, in my view, it looks like this script consistently associates consonant clusters with the following syllable… which makes a lot of sense: why bother distinguishing CVC+CV from CV+CCV? (I seem to recall Javanese does something like this too, though I’d have to double-check.)

EDIT 2: yes, Javanese does this. In fact it goes further: according to a Javanese person I’ve previously asked about this, it’s grouped with the following vowel even when that’s a different word! (Recall that the grouping is much more graphically obvious in Javanese, too.)

zompist · Post by **zompist** » Sun Dec 31, 2023 8:05 pm

bradrn wrote: ↑Thu Dec 28, 2023 7:18 am But still, I think we can define different levels of ‘block-ness’, depending on the graphical salience of the block relative to the surrounding text. If a block is written in line with the surrounding text, that naturally decreases its salience. This is the case with 'Phagspa, Pollard and Pahawh Hmong — and interestingly, all three add spaces around their blocks to make them more distinct (plus a connecting bar in 'Phagspa). For that matter, Vietnamese does the same for Latin, creating syllable blocks in a script which usually doesn’t have them. But in all of these cases, the blocks are still less immediately obvious than those of, for instance, Thaana, Hangeul and Ge'ez.

(Hmm… if I’m saying that Vietnamese has syllable blocks, does that also mean that English has ‘word blocks’? I dislike this analysis, largely because English words have no rigidly fixed internal structure or relative positioning, so they behave quite differently to the blocks I’ve identified in other scripts. But it’s an interesting thought!)

I think you have a good insight here that's often overlooked. Linguists often have to tell people that spaces are not a linguistic fact, because of course they're not part of spoken language. But you can definitely make a case that spaces, and other conventions like lines and spacing dots, are an important part of written language, on a par with the symbols, and presumably meet some perceived need. (Early Sumerian writing put each word in a separate box, without worrying too much about order within the box.)

And yeah, if this is correct, you should probably not go on to say that "alphabets tend not to have blocks".

Richard W · Post by **Richard W** » Sun Dec 31, 2023 8:41 pm

bradrn wrote: ↑Sun Dec 31, 2023 7:33 am
Richard W wrote: ↑Sun Dec 31, 2023 6:22 am
bradrn wrote: ↑Thu Dec 28, 2023 7:18 am But I dislike thinking of blocks as part of the definition: rather, I define ‘alphabets’ and ‘abugidas’ by which unit of language their graphemes represent. Whether they have syllable blocks or not is then an entirely orthogonal property.
I commend Bright's term 'alphasyllabary' to you.
I’ve already seen the term, but don’t see the connection with what I said.

The first abugida I learnt was the Thai script for Thai, and that has perhaps shaped my interpretation of other abugidas. Ignoring digraphs and their like, I would say that both consonants and vowels are represented by graphemes. One sometimes has a null grapheme, in contextually influenced environments, both for vowels and the final glottal stop.
Indeed, I consider both consonants and vowels as being represented by graphemes. (Like we’ve both said, the vowels don’t even have to be diacritical.) In my head, I’ve been analysing it as a system where a consonantal grapheme can also denote a whole CV syllable, but the ‘null grapheme’ analysis could work equally well. Are there any obvious points to recommend one over the other?

If we pursue the phoneme analogy, I would say you were thinking of the vowel symbols of abugidas as suprasegmentals.
Not so. For that matter, I’m not even sure what the equivalent of a ‘suprasegmental’ would be for a writing system — typographic emphasis, perhaps? But the analogy feels like it’s getting strained at this point.

You said that you distinguish the categories, such as alphabets and abugidas, by what the graphemes represent. But in both alphabets and abugidas, the graphemes principally represent consonants and vowels. In Daniels' definitions, the difference between an alphabet and an abugida is principally whether an absence of a vowel grapheme indicates the lack of a vowel or the default grapheme. What's the difference in what is represented that you had in mind?

One advantage of the notion of a null grapheme (though the concept of being missing also works) is that it makes it easier to switch writing system. For example, there are two major ways of writing Pali in the Thai script. The more formal system is an abugida, where absence of vowel marking implies the implicit vowel, while in the informal system, the absence of vowel marking implies the absence of a vowel. The inherent vowel isn't really inherent in the consonant; there is a script-wide rule for what vowel to apply when there is no vowel mark. Now, in languages which have or had vowel registers, the consonant cluster may determine what the default vowel is, but the cluster also switches between two sets of vowel readings. The best known example of this is Khmer. Thai is a bit different, in that open and closed syllables have different default vowels, but all the consonants have the same rules for the default vowel. Thai also has an epenthetic vowel, whose presence is almost predictable, that complicates analysis. Its phonological status is unclear to me; some Thais have denied that it is the same as the implicit vowel. Major authorities, who also deny the existence of the cluster /sr/ (recorded in the definitive Thai dictionary) and the cluster /tʰr/ (found in an Indic loanword, and seemingly as stable in loanwords from English as the acknowledged /tr/) and of falling tone on short checked syllables, do not acknowledge this difference.

bradrn wrote: ↑Sun Dec 31, 2023 7:33 am

One aspect that we easily overlook is the independent vowels in Brahmic scripts. The Indian treatment is of them as allographs of the dependent vowels, and this seems to be native, not a view deriving from the handling of vowels in alphabets and abjads.
That’s quite interesting; do you have a reference?

It's a realisation that came to me when reading Kaccayana's Pali grammar. The most plausible date for it is around 500 AD, although the author is traditionally supposed to have been one of the Buddha's disciples. It's well-argued that the author was very familiar with writing, and thought in terms of writing. Perhaps a commemorative nom de plume? I was reading it in Devanagari, but functionally the Sinhala and Burmese scripts have no relevant difference from Devanagari.

This allography breaks down for vernacular languages when the independent vowel /a/ is adopted as a consonant.
Wait, what‽ Independent /a/ has been used as a consonant? Where?

In Indospheric mainland SE Asia, the main languages have a syllable structure with an obligatory onset, and so the independent vowels are interpreted as including an initial glottal stop. So, the Indic initial /a/ is borrowed as /ʔa/, and so the independent vowel for /a/ is interpreted as being the letter for /ʔ/. The intervocalic glottal stop is quite salient in Thai (stronger than in Estuarine English, and both are different from that in Norfolk English), and glottal stops partake in initial consonant clusters in Khmer. The general rule is to use independent vowels for obvious Indic loans and the consonant for other words. The Thai script subfamily (so including Lao and Tai Viet) is unique in having ditched the other independent vowels.

Another innovation from that area is the use of matres lectionis in an abugida, not an abjad. The subscript letters for Indic /a/, /v/ and /y/ are used for representing vowels (Khmer and Tai languages) and tones (Burmese). I think the starting point for the vowels was to approximate [ua] by writing it as <ava> and then subscripting to indicate that the implicit vowel wasn't there, while the tone sounds a bit like the effect of a glottalised coda. I'm not sure how a glottal stop represents a long vowel, but I've always compared it to Arabic alif for vowels. In the Thai writing system, subscripting stopped a long time ago, and now we even have a Thai parallel to alif otiosum.

The Bengali writing system has also developed a mater lectionis.

bradrn wrote: ↑Sun Dec 31, 2023 7:33 am
Thinking emically, one item you've not addressed is punctuation, and in particular, aeration and word dividers. While it's largely orthogonal to the development of systems, it does affect how the systems work when it comes to reading.
Yet another thing I plan to cover later! (And in as much detail as I can manage, given that it’s one of those areas which is often neglected.) My plan was to cover the main functional types of writing systems, then move on to punctuation and other typographical issues, though clearly that seems to be taking much longer than I imagined…

(That being said, my last post did talk a little bit about the role of spacing in establishing syllable blocks.)

And at least one writing system, though sometimes alleged to be counterfeit, the Ramkhamhaeng orthography for Thai, used spaces before to mark off codas.
I hadn’t heard of this before; do you have any more details on how this works?

EDIT: never mind, found it on Ian James’s website:

skyknowledge wrote: Generally, and especially where there may be ambiguities, a syllable’s vowel letter(s) will stick close to the syllable’s initial consonant, and there will appear to be a slight space before any syllabic-final consonant. The clustering of a syllable’s initial consonant with /r/ and /l/ will also be written closely, as will the tone-changing letter /h/ before its consonant.
So, in my view, it looks like this script consistently associates consonant clusters with the following syllable… which makes a lot of sense: why bother distinguishing CVC+CV from CV+CCV? (I seem to recall Javanese does something like this too, though I’d have to double-check.)

EDIT 2: yes, Javanese does this. In fact it goes further: according to a Javanese person I’ve previously asked about this, it’s grouped with the following vowel even when that’s a different word! (Recall that the grouping is much more graphically obvious in Javanese, too.)

Writing CVC+CV as CV+CCV goes back to the beginning of the Indic scripts. The general writing of CVC took a while to get established in Brahmi, with two different solutions. The only word-final consonant in Prakrit was the place-neutral nasal, and early Brahmi took the early Latin solution of not writing geminates. I'm not sure about the timing in Kharoshthi.

Spacing is also significant in notating CC in the Sinhala script. The simplest way of writing a 'conjunct' is for the successive consonants to touch. (Confusingly, sometimes the letters visibly don't actually touch.) For Sinhala, the term 'conjunct' is also restricted to those forms where there is a more complicated 'ligature' - sometimes the rightmost arch of a letter will rise and join the start of the following letter, and the other general design is to reduce the first letter to an additional stroke attached on the left of the following letter, when the conjunct is nasal + stop or a geminate cluster. Preceding and following <r> attach as repha and rakar in conjuncts, and following <y> takes a different shape in what is also termed a conjunct.

bradrn · Post by **bradrn** » Sun Dec 31, 2023 10:42 pm

zompist wrote: ↑Sun Dec 31, 2023 8:05 pm Linguists often have to tell people that spaces are not a linguistic fact, because of course they're not part of spoken language. But you can definitely make a case that spaces, and other conventions like lines and spacing dots, are an important part of written language, on a par with the symbols, and presumably meet some perceived need. (Early Sumerian writing put each word in a separate box, without worrying too much about order within the box.)

Indeed, I agree with this. As I’ve discussed with you before, I’ve come to believe that ‘word’ doesn’t exist as a valid concept in most languages… but in written language they’re very important!

And yeah, if this is correct, you should probably not go on to say that "alphabets tend not to have blocks".

Well, as with most linguistic terms, I suppose that it’s more of a continuum. The ideal block has properties such as:

fixed internal structure;
constituents not arranged the same way as the overall writing direction;
is clearly graphically distinguished from surrounding blocks;
denotes a sensible phonological category;

and probably other things I haven’t thought of. This ideal case is found in, for instance, Hangeul and Pahawh Hmong. But then you get cases like Pollard and Vietnamese which satisfy some but not all of the criteria, and English where the blocks are graphically present but have few other properties.

Along these lines, here’s an interesting test I just thought of: what happens if you ask someone to reverse a sentence? With scripts like Thai and Hangeul, presumably people would keep syllable blocks together and just reverse the order of the syllables. But in English, it would be equally valid to reverse all the characters, or to rearrange the words within the sentence. So Thai and Hangeul have more prominent syllable blocks than does English.
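As a rough illustration of the reversal test, here is a small Python sketch. It is only a toy: the function names are made up for this post, and it leans on the fact that Unicode encodes modern Hangeul syllables as single precomposed code points, so a naive character reversal happens to keep each syllable block intact. Decomposing to jamo first (NFD) shows what reversing the individual graphemes would do instead. Thai is not covered here, since its syllables are not single code points.

```python
import unicodedata


def reverse_characters(text: str) -> str:
    """Reverse code point by code point."""
    return text[::-1]


def reverse_words(text: str) -> str:
    """Reverse the order of space-separated words, keeping each word intact."""
    return " ".join(reversed(text.split(" ")))


english = "the cat sat"
print(reverse_characters(english))  # tas tac eht
print(reverse_words(english))       # sat cat the

hangeul = "한글"
# Each precomposed syllable is one code point, so the blocks survive reversal.
print(reverse_characters(hangeul))  # 글한

# Decompose into conjoining jamo: three graphemes per syllable here.
jamo = unicodedata.normalize("NFD", hangeul)
print(len(hangeul), len(jamo))      # 2 6

# Reversing at the grapheme level breaks the blocks: the jamo no longer
# recompose into the block-reversed string.
grapheme_reversed = unicodedata.normalize("NFC", jamo[::-1])
print(grapheme_reversed == reverse_characters(hangeul))  # False
```

For English both reversals look like equally reasonable answers, whereas for Hangeul only the block-level reversal yields readable syllables, which is roughly the asymmetry the test is meant to expose.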

It’s probably also relevant that spaces can be present at different levels of the phonological hierarchy. In Pahawh Hmong and Vietnamese, spaces separate syllables; in English, they separate words; and in Thai, they separate clauses. (And letterspacing is used for emphasis in German.) Thus, just because something is spaced out, that doesn’t mean it’s a ‘block’ in the sense I’m using! For instance, Hangeul uses blocks for syllables, but spaces between words.

Richard W wrote: ↑Sun Dec 31, 2023 8:41 pm You said that you distinguish the categories, such as alphabets and abugidas, by what the graphemes represent. But in both alphabets and abugidas, the graphemes principally represent consonants and vowels. In Daniels' definitions, the difference between an alphabet and an abugida is principally whether an absence of a vowel grapheme indicates the lack of a vowel or the default grapheme. What's the difference in what is represented that you had in mind?

Well, what I was thinking is that a consonantal grapheme would represent either a single consonant or a single syllable. But I’m leaning towards the ‘null grapheme’ analysis, for the same kinds of reasons you mentioned. I hadn’t heard of Pali Thai specifically, but I know that Hindi Devanagari does similar things.

Another advantage is that it allows a uniform account of the typology:

In alphabets, both C and V are written explicitly;
In abugidas, C is written explicitly while V may be null
In abjads, C is written explicitly while V is always null.
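To make the three-way contrast concrete, here is a toy Python sketch under the ‘null grapheme’ analysis. Everything in it is an assumption for illustration: the romanised input, the function name `spell`, the choice of `a` as the default vowel, and the restriction to plain CV syllables (so there is no vowel-killer mark and no optional vowel pointing).

```python
from typing import List, Tuple

Syllable = Tuple[str, str]  # (consonant, vowel), both romanised


def spell(syllables: List[Syllable], system: str, inherent: str = "a") -> str:
    """Write CV syllables under a toy alphabet, abugida or abjad."""
    written = []
    for consonant, vowel in syllables:
        if system == "alphabet":
            # Both C and V are written explicitly.
            written.append(consonant + vowel)
        elif system == "abugida":
            # V is null exactly when it is the default vowel.
            written.append(consonant if vowel == inherent else consonant + vowel)
        elif system == "abjad":
            # V is always null.
            written.append(consonant)
        else:
            raise ValueError(f"unknown system: {system}")
    return "".join(written)


word = [("k", "a"), ("t", "i"), ("b", "a")]
for system in ("alphabet", "abugida", "abjad"):
    print(system, spell(word, system))
# alphabet katiba
# abugida ktib
# abjad ktb
```

Note that the sketch treats the default vowel as a script-wide rule rather than a property of each consonant, so it would need changing for something like Khmer.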

On the other hand, the other way of thinking has advantages too. For instance it accounts better for cases like Old Persian, where consonantal graphemes genuinely do represent syllables. There’s also scripts like Ethiopic and Tamil, where the vowel diacritics are irregular enough that it approaches a syllabary. Even Khmer may be better analysed like this, given that the inherent vowel is strongly associated with the choice of consonant. It also explains why reading Brahmic letters out loud gives names like ‘ka, kha, ga, gha’. Like I said, either analysis can be valid, and different ones are better for different scripts. (And probably the truth is that it’s ambiguous.)

Thai is a bit different, in that open and closed syllables have different default vowels

I hadn’t known this; however did that evolve?

In Indospheric mainland SE Asia, the main languages have a syllable structure with an obligatory onset, and so the independent vowels are interpreted as including an initial glottal stop. So, the Indic initial /a/ is borrowed as /ʔa/, and so the independent vowel for /a/ is interpreted as being the letter for /ʔ/.

Ah, so that’s where the ‘zero initial’ grapheme came from!

Another innovation from that area is the use of matres lectionis in an abugida, not an abjad. The subscript letters for Indic /a/, /v/ and /y/ are used for representing vowels (Khmer and Tai languages) … The Bengali writing system has also developed a mater lectionis.

I don’t find this particularly obvious when looking at the scripts. Do you have a source for this history?

In the Thai writing system, subscripting stopped a long time ago, and now we even have a Thai parallel to alif otiosum.

Hmm, what’s alif otiosum? Searching the term online turns up nothing helpful.

Writing CVC+CV as CV+CCV goes back to the beginning of the Indic scripts.

Sure, my point is that in Javanese it works across words as well.

(Incidentally, I asked my contact again, and they provided me with an example: aku mangan ayam, written ⟨ꦲꦏꦸꦩꦔꦤ꧀ꦲꦪꦩ꧀⟩. Here ⟨ꦤ꧀ꦲ⟩ is a single syllable nha — it would appear that ⟨ꦲ⟩ ha has become a zero-initial consonant. I asked them when independent vowels are used, but they weren’t sure.)
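For the curious, the grouping in that example can be made visible with a few lines of Python. This is only a sketch under simple assumptions: it uses the Unicode character names to tell letters from signs, attaches every non-letter sign to the preceding letter, and stacks a letter onto the previous cluster when that cluster ends in the pangkon. The helper names are invented for this post, and the rule ignores the rest of Javanese orthography.

```python
import unicodedata

PANGKON = "JAVANESE PANGKON"


def is_letter(ch: str) -> bool:
    """True for Javanese base letters, judged by Unicode character name."""
    return unicodedata.name(ch, "").startswith("JAVANESE LETTER")


def clusters(text: str) -> list:
    """Split text into written clusters using the simplified rule above."""
    out = []
    for ch in text:
        joins_previous = bool(out) and (
            not is_letter(ch)  # vowel sign or pangkon attaches to its base
            or unicodedata.name(out[-1][-1], "") == PANGKON  # stacked consonant
        )
        if joins_previous:
            out[-1] += ch
        else:
            out.append(ch)
    return out


text = "ꦲꦏꦸꦩꦔꦤ꧀ꦲꦪꦩ꧀"
groups = clusters(text)
for group in groups:
    print(group, [unicodedata.name(ch) for ch in group])
print(len(text), len(groups))  # 11 7
```

The fifth cluster comes out as ⟨ꦤ꧀ꦲ⟩, combining the final n of mangan with the initial letter of ayam, which is the cross-word grouping described above.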

Spacing is also significant in notating CC in the Sinhala script. The simplest way of writing a 'conjunct' is for the successive consonants to touch. (Confusingly, sometimes the letters visibly don't actually touch.) For Sinhala, the term 'conjunct' is also restricted to those forms where there is a more complicated 'ligature' - sometimes the rightmost arch of a letter will rise and join the start of the following letter, and the other general design is to reduce the first letter to an additional stroke attached on the left of the following letter, when the conjunct is nasal + stop or a geminate cluster. Preceding and following <r> attach as repha and rakar in conjuncts, and following <y> takes a different shape in what is also termed a conjunct.

Very interesting, thanks!

Richard W · Post by **Richard W** » Mon Jan 01, 2024 12:33 pm

bradrn wrote: ↑Sun Dec 31, 2023 10:42 pm
Richard W wrote: ↑Sun Dec 31, 2023 8:41 pm In the Thai writing system, subscripting stopped a long time ago, and now we even have a Thai parallel to alif otiosum.
Hmm, what’s alif otiosum? Searching the term online turns up nothing helpful.

Also known as 'guarding alif' or 'separating alif', which are translations of the Arabic names. When waw is not joined to the previous letter, it is hard to tell whether it belongs with the previous sequence of joined letters or with the next. For clarification, an alif is inserted after waw for the third person plural of the verb. Details are in paragraph 7 remark (a) on p11 of the 3rd edition of Wright's grammar.

The Thai parallel is that in open syllables, Thai writes the glottal stop letter after the normal symbol for the vowel /ɯː/; glottal stops don't follow long vowels. Conceivably the vowel symbol overhung its phonetically leading consonant on the right, misimplying that the next consonant was part of the same phonetic syllable. Compared to another vowel symbol that goes above, this vowel has an extra vertical slope on the right. This extra letter doesn't appear in the corresponding Lao spelling.

bradrn · Post by **bradrn** » Mon Jan 01, 2024 5:45 pm

Richard W wrote: ↑Mon Jan 01, 2024 12:33 pm
bradrn wrote: ↑Sun Dec 31, 2023 10:42 pm
Richard W wrote: ↑Sun Dec 31, 2023 8:41 pm In the Thai writing system, subscripting stopped a long time ago, and now we even have a Thai parallel to alif otiosum.
Hmm, what’s alif otiosum? Searching the term online turns up nothing helpful.
Also known as 'guarding alif' or 'separating alif', which are translations of the Arabic names. When waw is not joined to the previous letter, it is hard to tell whether it belongs with the previous sequence of joined letters or with the next. For clarification, an alif is inserted after waw for the third person plural of the verb. Details are in paragraph 7 remark (a) on p11 of the 3rd edition of Wright's grammar.

The Thai parallel is that in open syllables, Thai writes the glottal stop letter after the normal symbol for the vowel /ɯː/; glottal stops don't follow long vowels. Conceivably the vowel symbol overhung its phonetically leading consonant on the right, misimplying that the next consonant was part of the same phonetic syllable. Compared to another vowel symbol that goes above, this vowel has an extra vertical slope on the right. This extra letter doesn't appear in the corresponding Lao spelling.

Thanks for explaining!

Richard W · Post by **Richard W** » Mon Jan 01, 2024 6:47 pm

bradrn wrote: ↑Sun Dec 31, 2023 10:42 pm Well, what I was thinking is that a consonantal grapheme would represent either a single consonant or a single syllable. But I’m leaning towards the ‘null grapheme’ analysis, for the same kinds of reasons you mentioned. I hadn’t heard of Pali Thai specifically, but I know that Hindi Devanagari does similar things.

Another advantage is that it allows a uniform account of the typology:

In alphabets, both C and V are written explicitly;

In abugidas, C is written explicitly while V may be null;

In abjads, C is written explicitly while V is always null.

On the other hand, the other way of thinking has advantages too. For instance it accounts better for cases like Old Persian, where consonantal graphemes genuinely do represent syllables. There’s also scripts like Ethiopic and Tamil, where the vowel diacritics are irregular enough that it approaches a syllabary. Even Khmer may be better analysed like this, given that the inherent vowel is strongly associated with the choice of consonant. It also explains why reading Brahmic letters out loud gives names like ‘ka, kha, ga, gha’. Like I said, either analysis can be valid, and different ones are better for different scripts. (And probably the truth is that it’s ambiguous.)

"Silence is golden." Why haven't you mentioned graphemes for the absence of a vowel?

But in Old Persian, consonant graphemes only genuinely represent syllables if there is no following vowel grapheme, in which case the default vowel is implied.

I've found a few Sinhala CV combinations that are weird enough that they have to be learnt for reading. Tamil CV combinations are generally guessable for reading; they're just a problem for writing.

Khmer is a bit more complicated than your argument assumes. It's not just that some consonants belong to Series 1 and others to Series 2. Within words, there's a thing called consonant governance, or rightward register spreading, by which the vowel reading rule for the second syllable is determined by the initial consonant of the first syllable if the first syllable starts with an occlusive consonant and the second with a resonant. It does have lexically-specified exceptions. The rule also applies to Mon, and to Thai, where it's not the vowel that's affected, but the tone. (Well, the tone is in the onset, isn't it?)

In Thai, it chiefly applies to Indic words, as the spelling generally reflects the actual pronunciation in non-Indic words. Oh, and Khmer has a diacritic (treisap) to turn a Series 1 consonant into a Series 2 consonant, and another diacritic (~muusakatoan, = rat's teeth) to turn a Series 2 consonant into Series 1.

bradrn wrote: ↑Sun Dec 31, 2023 10:42 pm
Thai is a bit different, in that open and closed syllables have different default vowels
I hadn’t known this; however did that evolve?

I don't know. The sounds in Thai are /a/ and /o/, but I think there are quite a few complicating factors:

Khmer had 2 orthographic vowel notions, <a> (null glyph) and <ā>, to cover 2 degrees of aperture and 2 lengths. They prioritised aperture.
Extreme reluctance to change Indic spellings.
Thai barely has phonetically short vowels in open syllables - the preferred treatment is to declare them as underlyingly terminated in a glottal stop, and delete that as appropriate.
From somewhere there came a notion of doubling a final consonant to indicate vowel quality. This is best known from Old Thai orthography, but also appeared in Angkorian Khmer, which is older.
It may reasonably be suspected that the royal court of Siam spoke Khmer at some stage.

The short 'a' as for Indic words may be written in modern Khmer with the diacritic sanya, which looks very like the vowel sign for /a/ in closed syllables in Thai. In the 19th century, Michell, in his Thai-English dictionary, denied accusations that he was treating this Thai diacritic as a vowel symbol!

The sound of the Thai implicit vowel in orthographically closed syllables is determined by the following consonant letter. Before <r> (which is sounded /n/) it is /ɔː/. Before <h>, which is silent, it is /ɔʔ/, but is written as such, so that from Sanskrit graha we get เคราะห์ /kʰrɔʔ/. Otherwise, it is just /o/. To me, there is another indication that final <h> was pronounced in a relevant form of Siamese, albeit as a generally marginal sound: the tone marks in some Indic loanwords, such as เสน่ห์. The tone mark indicates the tone one would get if the now silent <h> had been a stop consonant. (I think occlusive is actually the relevant phonological category.)

The orthographically open vowel of Thai is occasionally /o/ or /ɔː/, though in most of the words I know, apart from letter names, the latter is shortened to /ɔ/ in normal speech. In the names of letters, it is /ɔː/, and this includes acronyms.
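The default readings described above can be summarised as a small lookup. This is only a sketch under stated assumptions: the function name `implicit_vowel` and the romanised letter labels are invented for illustration, /a/ is taken as the usual open-syllable default, and the lexical exceptions (open-syllable /o/ or /ɔː/) are not modelled. The <h> case is left out because, as noted, that vowel is written explicitly.

```python
def implicit_vowel(closed, next_letter=None):
    """Default reading of an unwritten Thai vowel, following the rules above.

    closed      -- True if the syllable is orthographically closed
    next_letter -- romanised label of the following consonant letter
                   (hypothetical labels such as "r" or "n")
    """
    if not closed:
        return "a"    # assumption: usual default in open syllables
    if next_letter == "r":
        return "ɔː"   # final <r> is itself sounded /n/
    return "o"        # every other closed syllable

print(implicit_vowel(closed=False))
print(implicit_vowel(closed=True, next_letter="r"))
print(implicit_vowel(closed=True, next_letter="n"))
```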

More replies anon.

masako · Post by **masako** » Mon Jan 01, 2024 7:38 pm

"There’s also scripts like Ethiopic and Tamil, where the vowel diacritics are irregular enough that it approaches a syllabary."

Ge'ez and Tamil are both abugidas, and another way of describing an abugida is "alphasyllabary". This synonym has been around for decades, so when you say "it approaches a syllabary" it just sounds redundant and kinda silly.

bradrn · Post by **bradrn** » Mon Jan 01, 2024 8:43 pm

Richard W wrote: ↑Mon Jan 01, 2024 6:47 pm
On the other hand, the other way of thinking has advantages too. For instance it accounts better for cases like Old Persian, where consonantal graphemes genuinely do represent syllables. There’s also scripts like Ethiopic and Tamil, where the vowel diacritics are irregular enough that it approaches a syllabary. Even Khmer may be better analysed like this, given that the inherent vowel is strongly associated with the choice of consonant. It also explains why reading Brahmic letters out loud gives names like ‘ka, kha, ga, gha’. Like I said, either analysis can be valid, and different ones are better for different scripts. (And probably the truth is that it’s ambiguous.)
"Silence is golden." Why haven't you mentioned graphemes for the absence of a vowel?

I don’t know, actually. That’s another good argument for the syllabic analysis!

Khmer is a bit more complicated than your argument assumes. It's not just that some consonants belong to Series 1 and others to Series 2. Within words, there's a thing called consonant governance, or rightward register spreading, by which the vowel reading rule for the second syllable is determined by the initial consonant of the first syllable if the first syllable starts with an occlusive consonant and the second with a resonant. It does have lexically-specified exceptions. The rule also applies to Mon, and to Thai, where it's not the vowel that's affected, but the tone. (Well, the tone is in the onset, isn't it?) In Thai, it chiefly applies to Indic words, as the spelling generally reflects the actual pronunciation in non-Indic words.

Oh, very interesting! It makes perfect sense diachronically, of course, but I hadn’t considered the synchronic implications.

(And now I’m wondering if Burmese has anything similar?)

Khmer had 2 orthographic vowel notions, <a> (null glyph) and <ā>, to cover 2 degrees of aperture and 2 lengths. They prioritised aperture.

I don’t quite understand this… consulting Wikipedia, it looks like all Khmer vowel sounds are written separately?

Thai barely has phonetically short vowels in open syllables - the preferred treatment is to declare them as underlyingly terminated in a glottal stop, and delete that as appropriate.

From somewhere there came a notion of doubling a final consonant to indicate vowel quality. This is best known from Old Thai orthography, but also appeared in Angkorian Khmer, which is older.

Now these two points are interesting… they remind me strongly of how consonant-doubling gets used with the Latin script (especially in Germanic). What vowel quality difference does consonant-doubling achieve in Thai?

masako wrote: ↑Mon Jan 01, 2024 7:38 pm "There’s also scripts like Ethiopic and Tamil, where the vowel diacritics are irregular enough that it approaches a syllabary."

Ge'ez and Tamil are both abugidas, and another way of describing an abugida is "alphasyllabary". This synonym has been around for decades, so when you say "it approaches a syllabary" it just sounds redundant and kinda silly.

Alphasyllabaries and syllabaries are still different things, even if their names are similar. So I don’t think there’s any redundancy in saying that these systems have attributes of both.

(There’s also the fact that many abugidas are better analysed as having a null vowel grapheme, as Richard and I have been discussing. From that perspective, the term ‘alphasyllabary’ is perhaps a bit of a misnomer.)

bradrn · Post by **bradrn** » Tue Jan 02, 2024 7:57 am

[This was originally a reply to masako’s now-deleted post, but I think it has some useful points in it so I’ll just post it irrespective]

On this point:

bradrn wrote: ↑Sun Dec 31, 2023 10:42 pm There’s also scripts like Ethiopic and Tamil, where the vowel diacritics are irregular enough that it approaches a syllabary.

For maximum clarity, let me repeat my current working definitions again (as refined by discussion throughout the thread):

Abugidas are segmental scripts which write all consonants and vowels as single graphemes, except for one vowel which is null
Syllabaries are non-segmental scripts which write each syllable as a single fused grapheme

So: an ‘abugida approaching a syllabary’ is a writing system which can be analysed as writing C and V separately, but which also fuses them sufficiently that they could also be analysed as a single CV syllable. I think this well describes the situation of scripts like Ethiopic.

Of course, these definitions are not the only possible ones. I know that many people consider the essential component of abugida-hood to be the writing of C+V in a single ‘syllable block’ (as I’ve been calling it). By this standard, scripts like Lao would be abugidas, while Meroitic and 'Phags-pa would be alphabetic. And, indeed, by this standard, ‘abugidas approaching syllabaries’ would be a somewhat redundant phrase.

But I’ve come to believe that the definitions I gave above are more useful and better-motivated. This is mostly because they have a clear focus on function alone: the functional classification ‘alphabet/abugida/abjad’ becomes fully independent of the more formal one ‘has/lacks syllable blocks’, resulting in a more coherent typology. By contrast, when the definition incorporates formal criteria like syllable blocks, rare cases become more difficult to classify, and one ends up treating them more specially than they deserve. (This is why my earlier posts on Hangeul made it sound more unusual than it actually is: I was struggling with a bad classification. Hangeul is simply an alphabet with syllable blocks.)
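One way to picture the two independent axes is as two separate fields on a record. The sketch below is illustrative only: the `Script` class, its field names and the `describe` helper are all invented here, and the three-way split follows the "V written / V may be null / V always null" summary from earlier in the thread.

```python
from dataclasses import dataclass

# Functional axis: how many vowels are left unwritten (null).
FUNCTIONAL_TYPE = {
    "none": "alphabet",  # C and V both written explicitly
    "some": "abugida",   # C written, V may be null
    "all": "abjad",      # C written, V always null
}

@dataclass(frozen=True)
class Script:
    name: str
    null_vowels: str   # "none", "some" or "all"
    has_blocks: bool   # formal axis, independent of the functional one

def describe(script: Script) -> str:
    """Combine the functional and formal classifications into one label."""
    kind = FUNCTIONAL_TYPE[script.null_vowels]
    blocks = "with" if script.has_blocks else "without"
    return f"{script.name}: {kind} {blocks} syllable blocks"

hangeul = Script("Hangeul", null_vowels="none", has_blocks=True)
latin = Script("Latin", null_vowels="none", has_blocks=False)

print(describe(hangeul))
print(describe(latin))
```

Because `has_blocks` never feeds into the functional label, Hangeul and Latin come out as the same functional type and differ only on the formal axis, which is the point being argued.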

Zompist Bboard Again
