Representing suprasegmentals in sound change appliers
Posted: Wed Jun 05, 2024 5:01 pm
(Apologies, this post ended up a bit rambly… hope it’s readable!)
The problem described in the title is one I’ve been pondering for a very long time: how does one make a sound change applier which can simulate sound changes involving suprasegmentals? One of my original goals with Brassica was to solve this, but I never did find a way to do so. This problem has taken on new urgency with the new Index Diachronica project, so perhaps it’s time to write up my thoughts and ask if anyone else has got any ideas.
The first thing to establish is that this is actually not one problem, but two:
- Suprasegmentals: what is an ergonomic way to represent phonemic features which exist beyond the level of individual segments?
- Autosegmentals: how does one describe sound changes which take place at a ‘longer range’ than individual neighbouring segments?
Of course, these two go together quite often (e.g. in sound changes involving stress or tone movement). But this is not always the case. It’s quite easy to find examples of phonemic features which are autosegmental but not suprasegmental: for instance, long-distance harmony effects, as in Guaraní nasal harmony. The opposite seems rarer, but e.g. tones in Mandarin Chinese are suprasegmental features which don’t seem to participate in many autosegmental processes.
So then, how does one go about simulating such things in a sound change applier?
Suprasegmentals are easier, because they’re not really a fundamental problem. One can quite easily ‘attach’ them (so to speak) to defined segments, e.g. the syllable nucleus. Then, instead of writing sound changes like (say) e → o / _ u, one can instead write [e é è] → [o ó ò] / _ [u ú ù], and so on.
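To make the expansion concrete, here is a minimal Python sketch of the ‘spelled-out’ version of the rule. It is not how Brassica or any other SCA implements this: the function apply_rule and the three lists are made up for illustration, and the accents simply stand in for whatever suprasegmental feature is being tracked.

```python
# Illustrative only: these names are hypothetical, not taken from any SCA.
# Each list holds the plain, acute and grave variants of one vowel, in the same order.
E = ["e", "é", "è"]
O = ["o", "ó", "ò"]
U = ["u", "ú", "ù"]

def apply_rule(word, targets, replacements, following):
    """targets[i] -> replacements[i] / _ (any member of `following`).

    `word` is a list of graphemes. Matching is done against the input word,
    so all replacements happen simultaneously.
    """
    out = list(word)
    for i in range(len(word) - 1):
        if word[i] in targets and word[i + 1] in following:
            out[i] = replacements[targets.index(word[i])]
    return out

for w in (["k", "e", "u"], ["k", "é", "ù"], ["k", "è", "p"]):
    print("".join(w), "->", "".join(apply_rule(w, E, O, U)))
# keu -> kou
# kéù -> kóù
# kèp -> kèp
```

The positional pairing of E with O is what keeps the accent intact across the change, and it is also why every vowel needs its full list of variants written out.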
The problem is one of ergonomics: doing this is an utter pain. The seemingly obvious solution is therefore to get the SCA to do it for you: automatically interpret e as [e é è], and similarly for all the other suprasegmental features. This is what Brassica does, and to a certain extent it actually does work.
Alas, with this approach it’s non-obvious when a certain character triggers a category rather than an individual grapheme, and that makes it thoroughly confusing to use. (I’ve had at least one ‘bug report’ which turned out to be a non-obvious error of this type.) It seems clear to me that the ‘proper’ way to solve this problem would be to represent feature tags separately from the grapheme itself in some way. But I’m really not sure what the best approach would be to do that. And it has its own issues which need to be solved… like, how does the user see the features associated with a grapheme? How are features transferred from input to output? And so on. I’m sure there’s some nice way to do it, but I’m not yet sure what it is.
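One possible shape for ‘feature tags kept separate from the grapheme’ is sketched below in Python. This is only a guess at a design, not a description of any existing SCA: the Segment class, the rewrite function and the bracketed display format are all hypothetical, and the sketch answers the two questions above in the simplest way I can think of (features are shown in brackets after the grapheme, and are copied unchanged from the matched input segment to its replacement).

```python
from dataclasses import dataclass

# Hypothetical representation: a bare grapheme plus a set of feature tags.
@dataclass(frozen=True)
class Segment:
    base: str                           # e.g. "e"
    features: frozenset = frozenset()   # e.g. frozenset({"high"})

def rewrite(word, target, replacement, following):
    """target -> replacement / _ following, matching on `base` only.

    Assumption: the replacement inherits the features of the segment it replaces.
    """
    out = []
    for i, seg in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else None
        if seg.base == target and nxt is not None and nxt.base == following:
            out.append(Segment(replacement, seg.features))
        else:
            out.append(seg)
    return out

def show(word):
    """One possible display: grapheme followed by its tags in brackets."""
    return " ".join(
        s.base + ("[" + ",".join(sorted(s.features)) + "]" if s.features else "")
        for s in word
    )

word = [Segment("k"), Segment("e", frozenset({"high"})), Segment("u", frozenset({"low"}))]
print(show(rewrite(word, "e", "o", "u")))
# k o[high] u[low]
```

With this layout the rule e → o / _ u never has to mention tone at all, and there is no ambiguity about whether e means one grapheme or a whole category. What it does not settle is how a rule should refer to the features when it does care about them.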
That said, on to the next problem: autosegmentals. This problem is less obvious, but more fundamental. Put simply, most (if not all) SCAs operate on individual segments next to each other. This is fine most of the time, but makes ‘long-distance’ changes very, very difficult.
There are various ways to work around this problem, of course. Brassica has a few characters which can repeat or skip graphemes, such that rules can match varying amounts of material. Lexurgy has (IIRC) a way to ‘filter out’ characters, such that a sound change applies e.g. only to the vowels of a word.
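The filtering idea can be sketched in a few lines of Python. To be clear, this is neither Lexurgy’s syntax nor its implementation, just an illustration of the general technique: pull out the segments of interest, apply an ordinary ‘neighbouring segments’ rule to that subsequence, then write the results back into their original positions. The names apply_on_tier and e_to_o_before_u are invented for the example.

```python
VOWELS = set("aeiou")

def apply_on_tier(word, in_tier, rule):
    """Apply `rule` to the segments selected by `in_tier`, leaving the rest in place.

    `word` is a list of graphemes; `rule` maps a list of graphemes to a list
    of the same length (this sketch does not handle insertion or deletion).
    """
    positions = [i for i, g in enumerate(word) if in_tier(g)]
    tier = [word[i] for i in positions]
    new_tier = rule(tier)
    if len(new_tier) != len(tier):
        raise ValueError("this sketch only supports length-preserving rules")
    out = list(word)
    for i, g in zip(positions, new_tier):
        out[i] = g
    return out

def e_to_o_before_u(tier):
    # e -> o / _ u, where 'next' now means 'next segment on the tier'
    return [
        "o" if g == "e" and i + 1 < len(tier) and tier[i + 1] == "u" else g
        for i, g in enumerate(tier)
    ]

def is_vowel(g):
    return g in VOWELS

print("".join(apply_on_tier(list("petku"), is_vowel, e_to_o_before_u)))
# potku
```

On the vowel tier the e and u of petku are adjacent, so the rule fires even though two consonants intervene in the full word.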
But really, those are just workarounds. I know of no sound change applier in which one can directly specify, for instance, ‘apply this rule if there are any nasals in the word’, or ‘move each tone one syllable to the right’. Sometimes these can be simulated via indirect means, but it’s complicated and obscures the sound change which is actually being applied. And, unlike the previous problem, this one is fundamental: it means there are certain sound changes which cannot be specified at all.
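For comparison, here is what ‘move each tone one syllable to the right’ looks like when written directly in a general-purpose language, with tones stored apart from the segments. This is a hedged sketch under several assumptions of mine: words arrive already syllabified as (segments, tone) pairs, the final tone is simply lost, and the first syllable receives a default tone. None of that comes from a real SCA or a real language.

```python
def shift_tones_right(syllables, default="L"):
    """Move each tone one syllable to the right.

    `syllables` is a list of (segments, tone) pairs, e.g. [("ka", "H"), ("ta", "L")].
    Assumptions for this sketch: the last tone falls off the end of the word,
    and the first syllable gets `default`.
    """
    tones = [tone for _, tone in syllables]
    shifted = [default] + tones[:-1]
    return [(segs, tone) for (segs, _), tone in zip(syllables, shifted)]

print(shift_tones_right([("ka", "H"), ("ta", "L"), ("mi", "H")]))
# [('ka', 'L'), ('ta', 'H'), ('mi', 'L')]
```

The operation is a one-liner on the tone sequence once tones live on their own tier; the hard part is finding a rule notation in which it can be said that directly.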
(Note that this is a practical problem, not just a theoretical problem. Some weeks ago a linguist asked me if I could use Brassica to run some reversed sound changes for him. I managed to do almost everything he wanted, except one sound change: that one was a long-distance change involving word-level nasalisation, which I’m pretty sure cannot be expressed in the current version of Brassica at all.)
The one thing I’m aware of which approaches a solution to this problem is an idea summarised in a presentation by Tresoldi (2020, slide 28). The details are slightly unclear because his references appear inaccessible, but as I understand it, his suggestion is to maintain automatically computed feature tiers for each segment, in order to encode global and suprasegmental information. Thus, one can have a tier nasal_in_word, and simply apply a sound change when this tier is True on any particular segment.
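As I read the idea, it could look something like the following Python sketch. This is my own reconstruction from the description above, not Tresoldi’s code: the functions compute_tiers and nasalise_vowels, the nasal inventory, and the example rule are all assumptions. Only the tier name nasal_in_word comes from the description.

```python
NASALS = set("mnŋ")  # illustrative inventory

def compute_tiers(word):
    """Return one dict of automatically computed tier values per segment."""
    nasal_in_word = any(g in NASALS for g in word)
    # Word-level information is copied onto every segment, so that a rule
    # looking at a single segment can still 'see' it.
    return [{"nasal_in_word": nasal_in_word, "index": i} for i in range(len(word))]

def nasalise_vowels(word):
    # Example rule: a -> ã wherever nasal_in_word is True on that segment
    tiers = compute_tiers(word)
    return ["ã" if g == "a" and t["nasal_in_word"] else g for g, t in zip(word, tiers)]

print("".join(nasalise_vowels(list("tapan"))))
print("".join(nasalise_vowels(list("tapak"))))
```

The rule itself stays local (it inspects one segment and its tiers), while the long-distance part is pushed into the tier computation. The open question is who writes compute_tiers, and in what language.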
However, it’s surely relevant here that his SCA is embedded in Python. I’m not sure if there’s any way to encode these sorts of computations without ending up with a full-blown programming language, which is precisely what I wanted to avoid in the first place by creating special sound-change syntax. If there is a way to express such tiers without one, I think it would be my favoured solution.
So… those are my thoughts on the topic. Does anyone else have any thoughts? Ideas? Suggestions? I would be eager to hear, if so!