Representing suprasegmentals in sound change appliers

Post by **bradrn** » Wed Jun 05, 2024 5:01 pm

(Apologies, this post ended up a bit rambly… hope it’s readable!)

The problem described in the title is one I’ve been pondering for a very long time: how does one make a sound change applier which can simulate sound changes involving suprasegmentals? One of my original goals with Brassica was to solve this, but I never did find a way to do so. This problem has taken on new urgency with the new Index Diachronica project, so perhaps it’s time to write up my thoughts and ask if anyone else has got any ideas.

The first thing to establish is that this is actually not one problem, but two:

Suprasegmentals: what is an ergonomic way to represent phonemic features which exist beyond the level of individual segments?
Autosegmentals: how does one describe sound changes which take place at a ‘longer range’ than individual neighbouring segments?

(Yes, I’m using these terms in a slightly different way to usual, but it’s close enough.)

Of course, these two go together quite often (e.g. in sound changes involving stress or tone movement). But this is not always the case. It’s quite easy to find examples of phonemic features which are autosegmental but not suprasegmental: for instance, long-distance consonant harmony effects, as in Guaraní nasal harmony. The opposite seems rarer, but e.g. tones in Mandarin Chinese are suprasegmental features which don’t seem to participate in many autosegmental processes.

So then, how does one go about simulating such things in a sound change applier?

Suprasegmentals are easier, because they’re not really a fundamental problem. One can quite easily ‘attach’ them (so to speak) to defined segments, e.g. the syllable nucleus. Then, instead of writing sound changes like (say) e → o / _ u, one can instead write [e é è] → [o ó ò] / _ [u ú ù], and so on.

The problem is one of ergonomics: doing this is an utter pain. The seemingly obvious solution is therefore to get the SCA to do it for you: automatically interpret e as [e é è], and similarly for all the other suprasegmental features. This is what Brassica does, and to a certain extent it actually does work.
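A minimal Python sketch of that auto-expansion, to make the mechanism concrete. The `VARIANTS` table and the function names are hypothetical illustrations, not Brassica's actual code. Each bare grapheme in a rule is silently read as the whole category listed for it, and target and replacement categories correspond position by position, so the accent carries over.

```python
# Hypothetical table: a bare vowel in a rule stands for all its accented forms.
VARIANTS = {
    "e": ["e", "é", "è"],
    "o": ["o", "ó", "ò"],
    "u": ["u", "ú", "ù"],
}


def expand(symbol):
    """Read a bare grapheme as the category of all its variants."""
    return VARIANTS.get(symbol, [symbol])


def apply_rule(word, target, replacement, following):
    """Apply `target → replacement / _ following`, expanding every symbol.

    Target and replacement categories correspond index by index, so é → ó
    and è → ò: the accent is carried over without being mentioned.
    """
    targets = expand(target)
    replacements = expand(replacement)
    followers = expand(following)
    out = list(word)
    for i in range(len(word) - 1):
        if word[i] in targets and word[i + 1] in followers:
            out[i] = replacements[targets.index(word[i])]
    return "".join(out)


print(apply_rule("téu", "e", "o", "u"))  # tóu
print(apply_rule("tèú", "e", "o", "u"))  # tòú
```

The convenience is also the trap: nothing in a call like `apply_rule(word, "e", "o", "u")` tells the reader that `e` also matches `é` and `è`.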

Alas, with this approach it’s non-obvious when a certain character triggers a category rather than an individual grapheme, and that makes it thoroughly confusing to use. (I’ve had at least one ‘bug report’ which turned out to be a non-obvious error of this type.) It seems clear to me that the ‘proper’ way to solve this problem would be to represent feature tags separately from the grapheme itself in some way. But I’m really not sure what the best approach would be to do that. And it has its own issues which need to be solved… like, how does the user see the features associated with a grapheme? How are features transferred from input to output? And so on. I’m sure there’s some nice way to do it, but I’m not yet sure what it is.
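One way to keep feature tags apart from graphemes, sketched in Python under stated assumptions: a hypothetical `Segment` type holds the base grapheme and a separate `tone` tag, rules mention only the base, the tag is copied from input to output, and the user sees it again because rendering re-attaches a combining diacritic. This illustrates the idea only; it is not a design Brassica implements.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical display table: tone tag -> combining diacritic.
DIACRITICS = {"H": "\u0301", "L": "\u0300"}  # combining acute, combining grave


@dataclass(frozen=True)
class Segment:
    base: str                   # the grapheme proper, e.g. "e"
    tone: Optional[str] = None  # a separate feature tag, e.g. "H"

    def render(self) -> str:
        """Show the segment to the user as base grapheme plus diacritic."""
        return self.base + DIACRITICS.get(self.tone, "")


def change_base(word, old, new, before):
    """Rule `old → new / _ before`, matched on base graphemes only.

    The tone is never mentioned in the rule: dataclasses.replace copies it
    from the input segment to the output segment unchanged.
    """
    out = list(word)
    for i in range(len(word) - 1):
        if word[i].base == old and word[i + 1].base == before:
            out[i] = replace(word[i], base=new)
    return out


word = [Segment("t"), Segment("e", "H"), Segment("u")]
result = change_base(word, "e", "o", "u")
print(result[1])                             # Segment(base='o', tone='H')
print("".join(s.render() for s in result))  # o followed by a combining acute
```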

That said, on to the next problem: autosegmentals. This problem is less obvious, but more fundamental. Put simply, most (if not all) SCAs operate on individual segments next to each other. This is fine most of the time, but makes ‘long-distance’ changes very, very difficult.

There are various ways to work around this problem, of course. Brassica has a few characters which can repeat or skip graphemes, so that rules can match varying amounts of material. Lexurgy has (IIRC) a way to ‘filter out’ characters, so that a sound change applies, e.g., only to the vowels of a word.
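To make the ‘filter’ workaround concrete, here is a small Python sketch: a rule runs on the vowels alone, as if the consonants had been removed, and the results are written back into place. `VOWELS`, `on_vowel_tier` and the `backing` rule are hypothetical; this is not how Lexurgy implements its filters.

```python
VOWELS = set("aeiou")  # hypothetical vowel inventory


def on_vowel_tier(word, rule):
    """Run `rule` on the vowels alone, as if the consonants were filtered
    out, then write the results back into their original positions."""
    positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    changed = rule([word[i] for i in positions])
    out = list(word)
    for i, vowel in zip(positions, changed):
        out[i] = vowel
    return "".join(out)


def backing(vowels):
    """Hypothetical harmony rule on the vowel tier: e → o after a back vowel
    (u, o, a). Each vowel sees the already-updated vowel before it, so the
    change spreads through a run of e's."""
    out = list(vowels)
    for i in range(1, len(out)):
        if out[i] == "e" and out[i - 1] in "uoa":
            out[i] = "o"
    return out


print(on_vowel_tier("kuntre", backing))  # kuntro
print(on_vowel_tier("kutene", backing))  # kutono
```

On the filtered tier the cluster `ntr` is simply invisible, which is exactly what makes this useful for vowel harmony, and also why it only covers changes that fit a single tier.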

But really, those are just workarounds. I know of no sound change applier in which one can directly specify, for instance, ‘apply this rule if there are any nasals in the word’, or ‘move each tone one syllable to the right’. Sometimes these can be simulated via indirect means, but it’s complicated and obscures the sound change which is actually being applied. And, unlike the previous problem, this one is fundamental: it means there are certain sound changes which cannot be specified at all.

(Note that this is a practical problem, not just a theoretical problem. Some weeks ago a linguist asked me if I could use Brassica to run some reversed sound changes for him. I managed to do almost everything he wanted, except one sound change: that one was a long-distance change involving word-level nasalisation, which I’m pretty sure cannot be expressed in the current version of Brassica at all.)

The one thing I’m aware of which approaches a solution to this problem is an idea summarised in a presentation by Tresoldi (2020, slide 28). The details are slightly unclear because his references appear inaccessible, but as I understand it, his suggestion is to maintain automatically computed feature tiers for each segment, in order to encode global and suprasegmental information. Thus, one can have a tier nasal_in_word, and simply apply a sound change when this tier is True on any particular segment.
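A rough Python sketch of how I read the tier idea, applied to the word-level nasalisation case. Only the tier name `nasal_in_word` comes from the slides; the inventories, function names and the rule itself are hypothetical, and this is not Tresoldi's code.

```python
NASALS = set("mnŋ")  # hypothetical nasal inventory
VOWELS = set("aeiou")
TILDE = "\u0303"     # combining tilde, marks nasalisation


def compute_tiers(word):
    """Attach automatically computed tiers to each segment.

    nasal_in_word is global information: it has the same value on every
    segment, but storing it per segment lets an ordinary local rule test it.
    """
    nasal_in_word = any(seg in NASALS for seg in word)
    return [{"seg": seg, "nasal_in_word": nasal_in_word} for seg in word]


def nasalise_vowels(word):
    """Nasalise each vowel whose nasal_in_word tier is True."""
    out = []
    for tiers in compute_tiers(word):
        seg = tiers["seg"]
        if seg in VOWELS and tiers["nasal_in_word"]:
            seg += TILDE
        out.append(seg)
    return "".join(out)


print(nasalise_vowels("takan"))  # both vowels nasalised
print(nasalise_vowels("takat"))  # takat
```

The rule itself stays local; all the long-distance work happens in `compute_tiers`, which is where the worry about needing a full programming language comes in.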

However, it’s surely relevant here that his SCA is embedded in Python. I’m not sure if there’s any way to encode these sorts of computations without ending up with a full-blown programming language — which is precisely what I wanted to avoid in the first place by creating special sound-change syntax. If there does exist any way to do that, I think it would be my favoured solution.

So… those are my thoughts on the topic. Does anyone else have any thoughts? Ideas? Suggestions? I would be eager to hear, if so!

Post by **zompist** » Wed Jun 05, 2024 6:32 pm

bradrn wrote: ↑Wed Jun 05, 2024 5:01 pm This is what Brassica does, and to a certain extent it actually does work.

Hmm, my offhand reaction is that this isn't much less tedious than defining separate categories (but this is a matter of taste). Making "a" have two meanings (category or simple vowel) seems to me to invite errors.

Do any SCAs know features inherently? E.g. they just know that eéè are related, or that nñŋ are all nasals?

But really, those are just workarounds. I know of no sound change applier in which one can directly specify, for instance, ‘apply this rule if there are any nasals in the word’,

That doesn't seem hard... define some syntax that means "iterate on all segments". Let's say it's €. So you can have a condition €[+Nasal] or €N or something. This is probably not general enough, though.

or ‘move each tone one syllable to the right’.

That sounds tricky. I expect this sort of thing is going to turn rules into algorithms. I ran into this problem trying to make a robust generative grammar gadget: either you make the base program know an awful lot about linguistics (so it can interpret very abstract rules like the above), or you implement a mini-programming language.

(I started writing rules for one before remembering that "syllable" isn't a universally defined concept...)

Post by **bradrn** » Wed Jun 05, 2024 6:54 pm

zompist wrote: ↑Wed Jun 05, 2024 6:32 pm Making "a" have two meanings (category or simple vowel) seems to me to invite errors.

And indeed, this is precisely the problem! (Maybe I didn’t emphasise that point enough.)

Do any SCAs know features inherently? E.g. they just know that eéè are related, or that nñŋ are all nasals?

Yes, there are many which do allow for this sort of thing. (I already linked Lexurgy, which is one of them.) The problem is that you end up with a very long list of repetitive feature declarations — so it doesn’t get rid of the annoyance, it just moves it elsewhere.

But really, those are just workarounds. I know of no sound change applier in which one can directly specify, for instance, ‘apply this rule if there are any nasals in the word’,
That doesn't seem hard... define some syntax that means "iterate on all segments". Let's say it's €. So you can have a condition €[+Nasal] or €N or something. This is probably not general enough, though.

Yes, generality is one issue here. Another is: precisely how is it interpreted? Every other lexeme has the semantics ‘match some number of characters starting at this point in the input stream’, but this one would not.
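One way to see the semantic mismatch, sketched in Python: model each lexeme as a function from a start position to the list of positions where a match could end. Ordinary lexemes fit that shape; a `€`-style lexeme has to inspect the whole word and consume nothing. All names here are hypothetical, and this is not how Brassica or SCA² is implemented.

```python
from typing import Callable, List

# A lexeme takes the word and a start position and returns every position
# at which a match could end (an empty list means no match).
Lexeme = Callable[[str, int], List[int]]


def literal(ch: str) -> Lexeme:
    """Ordinary lexeme: match one character at the current position."""
    return lambda word, pos: [pos + 1] if word[pos:pos + 1] == ch else []


def anywhere(chars: str) -> Lexeme:
    """Hypothetical `€` lexeme: succeed if any of `chars` occurs anywhere in
    the word. It ignores `pos` except to return it, consuming nothing."""
    return lambda word, pos: [pos] if any(c in chars for c in word) else []


def match_sequence(lexemes: List[Lexeme], word: str, pos: int) -> List[int]:
    """Match lexemes one after another, trying every possible end position."""
    ends = [pos]
    for lex in lexemes:
        ends = [e2 for e in ends for e2 in lex(word, e)]
    return ends


# Condition "a, in a word containing a nasal", tested at position 1.
pattern = [anywhere("mn"), literal("a")]
print(match_sequence(pattern, "tan", 1))  # [2]
print(match_sequence(pattern, "tat", 1))  # []
```

It can be squeezed into the same type, but only by making it a zero-width global check, which is a different kind of thing from every other lexeme.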

or ‘move each tone one syllable to the right’.
That sounds tricky. I expect this sort of thing is going to turn rules into algorithms. I ran into this problem trying to make a robust generative grammar gadget: either you make the base program know an awful lot about linguistics (so it can interpret very abstract rules like the above), or you implement a mini-programming language.

Well, the thing is, natural languages are generally quite constrained in what they allow. In papers, the vast majority of phonological changes can be fully described in less than one line (usually much less). And even then a lot of changes are disallowed — as I recall, even tonal movement by two syllables is quite rare. So this gives me hope that some ergonomic solution can be found which is less than fully general.

(Generative grammar, I don’t know so much about. I suspect the rules there can get more complicated, because they need to operate on trees rather than lists.)

(I started writing rules for one before remembering that "syllable" isn't a universally defined concept...)

Yeah, it would be nice to avoid relying on such concepts.

Post by **Ketsuban** » Thu Jun 06, 2024 4:06 am

It wouldn't work for all cases (e.g. languages where the tone-bearing unit is the mora rather than the syllable, languages that go all in on syllabic consonants) but I feel like it ought to be possible to define what a syllable is so the machine can automatically syllabify a word from its representation (maybe with two alternates for ambiguities like Latin volucris) and sound change rules can refer to properties of a preceding or following syllable.

Post by **bradrn** » Thu Jun 06, 2024 4:52 am

Ketsuban wrote: ↑Thu Jun 06, 2024 4:06 am It wouldn't work for all cases (e.g. languages where the tone-bearing unit is the mora rather than the syllable, languages that go all in on syllabic consonants) but I feel like it ought to be possible to define what a syllable is so the machine can automatically syllabify a word from its representation (maybe with two alternates for ambiguities like Latin volucris) and sound change rules can refer to properties of a preceding or following syllable.

In fact, this was the very first approach I tried when I started to make Brassica. Unfortunately, it runs into a lot of non-obvious issues. For instance: what happens when sound changes alter syllable structure? Or: how does one parse sounds into syllables? (A particularly difficult problem if you allow ambiguity.) Or, for that matter, how does it cope with languages which don’t have syllables in the usual sense? I’m not sure that there’s any solution to these problems which doesn’t severely impact usability.

Post by **Ketsuban** » Thu Jun 06, 2024 9:49 am

I should have been more explicit that the user supplies the definition of a syllable for their language if there is one. If there isn't one then you can't syllabify words automatically (and thus can't apply things like stress rules which rely on being able to syllabify words automatically) and sound changes which rely on syllables being defined don't run.
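A minimal Python sketch of syllabification from a user-supplied definition, assuming the user lists nucleus vowels and legal onsets and the program applies onset maximisation. The inventories and names are hypothetical. Note that Latin *volucris* comes out only as vo.lu.cris; the alternative vo.luc.ris would need extra machinery for ambiguity.

```python
# Hypothetical user-supplied definition for one language: the nucleus
# segments, and the consonant clusters allowed as syllable onsets.
VOWELS = set("aeiou")
ONSETS = {"c", "l", "p", "r", "s", "t", "v", "cr", "pl", "tr"}


def syllabify(word):
    """Split a word into syllables, giving each nucleus the longest legal
    onset (onset maximisation).

    A sketch only: one vowel per nucleus, no syllabic consonants, and exactly
    one parse per word, so an ambiguous word gets only one of its parses.
    """
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not nuclei:
        raise ValueError(f"no syllable nucleus in {word!r}")
    starts = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        # Scan from the left, so the first legal suffix found is the longest.
        for split in range(prev + 1, nxt + 1):
            if split == nxt or word[split:nxt] in ONSETS:
                starts.append(split)
                break
    starts.append(len(word))
    return [word[a:b] for a, b in zip(starts, starts[1:])]


print(syllabify("volucris"))  # ['vo', 'lu', 'cris']
print(syllabify("patria"))    # ['pa', 'tri', 'a']
```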

Post by **bradrn** » Thu Jun 06, 2024 9:55 am

Ketsuban wrote: ↑Thu Jun 06, 2024 9:49 am I should have been more explicit that the user supplies the definition of a syllable for their language if there is one. If there isn't one then you can't syllabify words automatically (and thus can't apply things like stress rules which rely on being able to syllabify words automatically) and sound changes which rely on syllables being defined don't run.

I assumed that the user would define the notion of ‘syllable’ in any case. It doesn’t make the problems go away — if anything, it adds to them, since now you need to add a sublanguage which can cope with all the different varieties of syllables.

Post by **alice** » Thu Jun 06, 2024 3:17 pm

In case this is interesting and/or useful: my SCA distinguishes between "base tokens" and "modifiers" and provides a few ways to match and replace combinations thereof. So you could define some base tokens for vowels and some modifiers to represent tones, and do things like:

a! > e! _     # convert all /a/ to /e/, keeping tones
a&12/3 a/1 _  # remove tone '1' from all /a/ which also have tone 2

assuming, of course, that your language has vowels which can take several tones, but the principle is what's important.
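A Python sketch of the matching semantics, based on alice's explanation later in the thread: `!` in the replacement keeps whatever modifiers the token had, and elsewhere means "with any modifiers or none". That a plain pattern matches only a bare token, and that a plain replacement drops modifiers, are my assumptions. `Token` and the function names are hypothetical, not her SCA's code.

```python
from typing import FrozenSet, List, NamedTuple


class Token(NamedTuple):
    base: str                           # base token, e.g. "a"
    mods: FrozenSet[str] = frozenset()  # modifiers, e.g. tones {"1", "2"}


def matches(pattern: str, tok: Token) -> bool:
    """`a!` matches /a/ with any modifiers or none; plain `a` (assumed) only
    matches a bare /a/."""
    if pattern.endswith("!"):
        return tok.base == pattern[:-1]
    return tok.base == pattern and not tok.mods


def rewrite(pattern: str, tok: Token) -> Token:
    """`e!` keeps the token's modifiers; plain `e` (assumed) yields a bare /e/."""
    if pattern.endswith("!"):
        return Token(pattern[:-1], tok.mods)
    return Token(pattern)


def apply_rule(target: str, replacement: str, word: List[Token]) -> List[Token]:
    return [rewrite(replacement, t) if matches(target, t) else t for t in word]


word = [Token("k"), Token("a", frozenset({"1", "2"})), Token("a")]
print(apply_rule("a!", "e!", word))  # both /a/ become /e/, tones kept
print(apply_rule("a", "e", word))    # only the bare /a/ changes
```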

Or, you can work at the level of features:

V[+rising] V[+level] _  # convert rising tones to level
V[f=2] V[f=1]           # convert value '2' of feature 'f' to '1'; more abstract

More generally, this leads to the question of exactly how to represent a "phoneme" in an SCA, and at what level of phonetic detail. Those of you with long memories may recall me asking several questions about this a while ago. We're just waiting for someone to define the definitive set of phonological features which apply to all known or possible languages.

Post by **zompist** » Thu Jun 06, 2024 4:28 pm

bradrn wrote: ↑Wed Jun 05, 2024 6:54 pm
zompist wrote: ↑Wed Jun 05, 2024 6:32 pm
Do any SCAs know features inherently? E.g. they just know that eéè are related, or that nñŋ are all nasals?
Yes, there are many which do allow for this sort of thing. (I already linked Lexurgy, which is one of them.) The problem is that you end up with a very long list of repetitive feature declarations — so it doesn’t get rid of the annoyance, it just moves it elsewhere.

I looked at Lexurgy's documentation, and so far as I can see you have to define your features explicitly. That's a lot of work, especially if you happen to need a particular feature for just one rule— it is then easier to do something ad hoc.

What I meant was, do any SCAs come with a set of feature definitions? That is, the program itself knows about e's, and nasals, and so on.

This doesn't seem that hard to do— I mean, looking at a standard IPA chart, you have maybe four dozen features. (And that's without trying to do anything clever to reduce the number.)

Obviously the user should be able to redefine features, or define their own.

You could even have the default behavior be to ignore rules that result in nonexistent phonemes. E.g. if you don't have /ʒ/, then a rule that voices intervocalic consonants wouldn't apply to /ʃ/, without having to state an exception. (That would mean you have to define what phonemes exist, but that's probably not a big ask, since we need to do that anyway.)
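A Python sketch of that default, assuming a hypothetical declared `INVENTORY` and a hypothetical voicing table: the rule computes its output as usual, but an output that is not in the inventory is silently skipped, so /ʃ/ needs no stated exception.

```python
# Hypothetical inventory: this language has /ʃ/ but no /ʒ/.
INVENTORY = set("aeioptkbdgsʃzmn")
VOWELS = set("aeio")
VOICING = {"p": "b", "t": "d", "k": "g", "s": "z", "ʃ": "ʒ"}


def voice_intervocalic(word):
    """Voice consonants between vowels, skipping any output that is not in
    the inventory, so no exception for /ʃ/ has to be written."""
    out = list(word)
    for i in range(1, len(word) - 1):
        if word[i - 1] in VOWELS and word[i + 1] in VOWELS:
            voiced = VOICING.get(word[i])
            if voiced is not None and voiced in INVENTORY:
                out[i] = voiced
    return "".join(out)


print(voice_intervocalic("apata"))  # abada
print(voice_intervocalic("aʃa"))    # aʃa
```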

In general I like the idea of tokens having features, without having to define the features, or use categories to do so.

Probably where it all falls apart is notation.

Do people want to be concise, or readable? Maybe ideally you can have both.

Post by **bradrn** » Thu Jun 06, 2024 6:05 pm

alice wrote: ↑Thu Jun 06, 2024 3:17 pm In case this is interesting and/or useful: my SCA distinguishes between "base tokens" and "modifiers" and provides a few ways to match and replace combinations thereof.

This is the kind of thing I was looking for! Some questions:

1. In practice, how many / which kinds of issues do you run into?
2. What happens if you specify a phoneme without any modifiers at all? Is the bang necessary?
3. Can a phoneme have multiple modifiers?
4. Precisely how are ‘modifiers’ related to ‘features’?
5. Does this go any way towards solving the problem of autosegmental or long-distance sound changes?

More generally, this leads to the question of exactly how to represent a "phoneme" in an SCA, and at what level of phonetic detail. Those of you with long memories may recall me asking several questions about this a while ago. We're just waiting for someone to define the definitive set of phonological features which apply to all known or possible languages.

I’ve come to feel that SCAs exist on a scale in this regard. On the one hand, you can define everything very rigorously in terms of well-defined linguistic features, using all the latest theories and so on. (I know one person who tried to make an SCA using feature trees.) On the other, you can ignore the linguistics and treat it as simply a search/replace algorithm on lists of various kinds.

Brassica tends very strongly towards the latter approach. I’ve been trying to keep it as atheoretical as possible, to increase its generality. I think this is a good approach, and want to continue on like this. It’s proven particularly useful for the Index Diachronica project: it lets us represent all kinds of sound changes from all kinds of different papers, without needing to alter their chosen transcription conventions.

zompist wrote: ↑Thu Jun 06, 2024 4:28 pm I looked at Lexurgy's documentation, and so far as I can see you have to define your features explicitly. That's a lot of work, especially if you happen to need a particular feature for just one rule— it is then easier to do something ad hoc.

What I meant was, do any SCAs come with a set of feature definitions? That is, the program itself knows about e's, and nasals, and so on.

Yes, this approach is quite common as well. (Indeed, I believe alice’s SCA is one which works like this.) It’s not particularly difficult — it’s just a matter of supplying a long list of default features, rather than making the user input them all.

The tradeoff is that it greatly limits flexibility. Essentially, you end up needing to do everything in IPA (or whichever transcription convention it’s predefined), using only the defined features. Like I said above, this would be very unsuitable for Brassica, especially if I want to keep on using it as the basis for the new ID.

Post by **alice** » Fri Jun 07, 2024 2:57 pm

bradrn wrote: ↑Thu Jun 06, 2024 6:05 pm

1. In practice, how many / which kinds of issues do you run into?
2. What happens if you specify a phoneme without any modifiers at all? Is the bang necessary?
3. Can a phoneme have multiple modifiers?
4. Precisely how are ‘modifiers’ related to ‘features’?
5. Does this go any way towards solving the problem of autosegmental or long-distance sound changes?

1. Ask me again when I've actually used it. For reasons too boring to go into here I haven't done much with it; I was hoping other users would provide me with feedback. There are trivial details like keeping the modifiers in the correct order and avoiding incompatible modifiers, plus the syntax is a bit awkward.
2. A phoneme without modifiers is just that. The pling (that's what *I* call it) in the replacement part means "keep whatever modifiers it currently has", and elsewhere it means "with any modifiers or none".
3. Yes; you can do things like "kʷʰ", which some people might find useful occasionally.
4. At a low level, when you define a modifier you specify a feature and a value within the values allowable for that feature. Adding a feature to a base phoneme then effectively assigns the value of the modifier to the phoneme.
5. After all this, no; but my SCA would handle these with the "match anything" and repeat specifiers; "a e _.*e" means "change /a/ to /e/ if another /e/ follows".
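For comparison, a rough Python equivalent of `a e _.*e` using the standard `re` module (an analogy, not alice's implementation). The lookahead `(?=.*e)` checks the rest of the word without consuming it, so each /a/ is tested against the original word.

```python
import re


def a_to_e_before_e(word):
    """`a e _.*e`: change /a/ to /e/ if another /e/ follows anywhere later."""
    return re.sub(r"a(?=.*e)", "e", word)


print(a_to_e_before_e("tanake"))  # teneke
print(a_to_e_before_e("tanaka"))  # tanaka
```

This is the same kind of wildcard workaround Brassica offers: it handles "somewhere to the right", but not conditions phrased over the word as a whole.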

bradrn wrote: ↑Thu Jun 06, 2024 6:05 pm I’ve come to feel that SCAs exist on a scale in this regard. On the one hand, you can define everything very rigorously in terms of well-defined linguistic features, using all the latest theories and so on. (I know one person who tried to make an SCA using feature trees.)

You can make that two!

bradrn wrote: ↑Thu Jun 06, 2024 6:05 pm On the other, you can ignore the linguistics and treat it as simply a search/replace algorithm on lists of various kinds.

I've gone back and forth between the two many times. My SCA is probably best regarded as a glorified search-and-replace, with linguistics-oriented extras.

bradrn wrote: ↑Thu Jun 06, 2024 6:05 pm Yes, this approach is quite common as well. (Indeed, I believe alice’s SCA is one which works like this.) It’s not particularly difficult — it’s just a matter of supplying a long list of default features, rather than making the user input them all.

"works like this" is giving it too much credit; it comes with a default set of settings which make it appear to do so.

bradrn wrote: ↑Thu Jun 06, 2024 6:05 pm The tradeoff is that it greatly limits flexibility. Essentially, you end up needing to do everything in IPA (or whichever transcription convention it’s predefined), using only the defined features.

Or, you can do what I did and make it configurable enough to keep everyone happy.

Post by **bradrn** » Fri Jun 07, 2024 6:47 pm

alice wrote: ↑Fri Jun 07, 2024 2:57 pm 2. A phoneme without modifiers is just that. The pling (that's what *I* call it) in the replacement part means "keep whatever modifiers it currently has", and elsewhere it means "with any modifiers or none".

Hmm… so simply writing, say, a > e isn’t enough to match á, à, etc. I can see the reasoning behind this, but it means this isn’t really the solution I’m looking for.

3. Yes; you can do things like "kʷʰ", which some people might find useful occasionally.
4. At a low level, when you define a modifier you specify a feature and value within the values allowable for that feature. Adding a feature to a base phoneme then effectively assigns the value of the modifier to the phoneme.

From this, it sounds like your ‘modifiers’ aren’t necessarily suprasegmental at all — they’re just a convenient way to ease the specification of some repetitive features.

5. After all this, no; but my SCA would handle these with the "match anything" and repeat specifiers; "a e _.*e" means "change /a/ to /e/ if another /e/ follows".

OK, so this is very similar to what Brassica does. I’ve already mentioned why I don’t consider this a complete solution.

bradrn wrote: ↑Thu Jun 06, 2024 6:05 pm The tradeoff is that it greatly limits flexibility. Essentially, you end up needing to do everything in IPA (or whichever transcription convention it’s predefined), using only the defined features.
Or, you can do what I did and make it configurable enough to keep everyone happy

There are limits, though: I’d be unhappy even with the configurability. Needing to write down features for every single phoneme is too annoying, and using a predefined feature set is too constraining.

Post by **Man in Space** » Sat Jun 08, 2024 7:04 pm

zompist wrote: ↑Thu Jun 06, 2024 4:28 pm What I meant was, do any SCAs come with a set of feature definitions? That is, the program itself knows about e's, and nasals, and so on.

phonix (by J.S. Bangs) does, though I find its implementation to be a little disingenuous.

Post by **alice** » Sun Jun 09, 2024 2:15 pm

bradrn wrote: ↑Fri Jun 07, 2024 6:47 pm

bradrn wrote: ↑Thu Jun 06, 2024 6:05 pm The tradeoff is that it greatly limits flexibility. Essentially, you end up needing to do everything in IPA (or whichever transcription convention it’s predefined), using only the defined features.
Or, you can do what I did and make it configurable enough to keep everyone happy
There’s limits, though: I’d be unhappy even with the configurability. Needing to write down features for every single phoneme is too annoying, and using a predefined feature set is too constraining.

Ideally, of course, someone will eventually sort everything out, and we'll have definitive sets of phonemes and features. Although by that time advances in "AI" and brain-technology interfaces will mean that we'll be able to model arbitrarily complex sound-changes by the power of thought alone.

Post by **methor** » Tue Jun 11, 2024 11:31 pm

bradrn wrote: ↑Wed Jun 05, 2024 5:01 pm But really, those are just workarounds. I know of no sound change applier in which one can directly specify, for instance, ‘apply this rule if there are any nasals in the word’, or ‘move each tone one syllable to the right’. Sometimes these can be simulated via indirect means, but it’s complicated and obscures the sound change which is actually being applied. And, unlike the previous problem, this one is fundamental: it means there are certain sound changes which cannot be specified at all.

I must be misunderstanding you because both of those are easy in Lexurgy. The first is the environment

// $ [-nasal]* _ [-nasal]* $

And the second is the rule

shift-tone rtl:
<syl> => [$tone] / <syl>&[$tone] _

Post by **bradrn** » Wed Jun 12, 2024 4:20 am

methor wrote: ↑Tue Jun 11, 2024 11:31 pm I must be misunderstanding you because both of those are easy in Lexurgy.

This is the kind of thing I was interested in knowing, thanks!

The first is the environment
// $ [-nasal]* _ [-nasal]* $

I’m unfamiliar with Lexurgy, so I’m not sure I understand how this works. To me, it seems that this environment would match if the word consists only of non-nasals before and after the target. This seems like the opposite of the desired rule, which is ‘apply if at least one nasal exists anywhere in the word’.
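Read literally, the bracketed environment does say "no nasals before or after the target". What may resolve the puzzle is the leading `//`: my understanding (an assumption here, worth checking against the Lexurgy documentation) is that Lexurgy uses `//` for an exception environment, i.e. "apply except here". The Python sketch below, with a hypothetical nasal class `[mnŋ]`, shows that under that assumption the rule fires exactly when some other segment of the word is nasal. It illustrates the logic only and is not Lexurgy's code.

```python
import re

NON_NASAL_RUN = "[^mnŋ]*"  # hypothetical: any run of non-nasal segments


def environment_matches(word, i):
    """Literal reading of `$ [-nasal]* _ [-nasal]* $` with the target at i:
    everything before and after the target is non-nasal."""
    before, after = word[:i], word[i + 1:]
    return (re.fullmatch(NON_NASAL_RUN, before) is not None
            and re.fullmatch(NON_NASAL_RUN, after) is not None)


def rule_applies(word, i):
    """Assumption: `//` introduces an exception environment, so the rule
    applies exactly when the environment above does not match, i.e. when
    some other segment of the word is nasal."""
    return not environment_matches(word, i)


for w in ["taka", "tana", "nata"]:
    print(w, rule_applies(w, 1))  # taka False, tana True, nata True
```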

And the second is the rule
shift-tone rtl:
<syl> => [$tone] / <syl>&[$tone] _

OK, this is very interesting… it looks like the key is being able to match features between the environment and the replacement, correct? That seems like a very useful capability in general, and one which I could easily implement in Brassica.
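The other half of that rule is the `rtl` flag. A Python sketch of why it matters, modelling a word as a list of syllable tones; this is my reading of methor's rule, not Lexurgy's implementation, and `shift_tones` is a hypothetical name.

```python
def shift_tones(tones, direction="rtl"):
    """Give each syllable the tone of the syllable before it.

    Right-to-left application reads each preceding tone before it is
    overwritten, so every tone moves one syllable to the right. Left-to-right
    application reads values it has already updated, so the first tone
    spreads over the whole word instead. The first syllable has no preceding
    syllable, so the rule never applies to it and it keeps its tone.
    """
    out = list(tones)
    if direction == "rtl":
        order = range(len(out) - 1, 0, -1)
    else:
        order = range(1, len(out))
    for i in order:
        out[i] = out[i - 1]
    return out


print(shift_tones(["H", "L", "M"], "rtl"))  # ['H', 'H', 'L']
print(shift_tones(["H", "L", "M"], "ltr"))  # ['H', 'H', 'H']
```

So the feature variable copies the tone, and the direction of application decides whether the result is movement or spreading.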

As for the issue of suprasegmentals, it would appear that Lexurgy implements this using automated syllabification plus syllable-level features — that is, it uses the same design I originally intended for Brassica. I’ve already explained why I decided against this approach. (In short, because it isn’t general enough, and because I want to stay as far away as possible from such theoretical notions.)

Post by **bradrn** » Sat Jun 15, 2024 8:24 pm

I’ve been thinking a bit more about this problem of ‘long-range’ sound changes, and I think the first step should be to make a taxonomy of the difficult cases. Here’s what I have so far:

* Harmony: changes where phonemes must agree in a certain feature to the left or right of a source.
  Examples: ‘all vowels agree in roundedness with the first vowel’; ‘consonants before a nasal must be nasalised, until a voiceless stop is reached’
* Footing and stress: changes where a word is divided into segments using a pattern.
  Examples: ‘stress every second syllable, with the last instance taking primary stress’; ‘stress every first syllable restarting at long vowels’
* Obligatoriness and culminativity: changes which must ensure a word contains at least / at most one instance of something.
  Examples: ‘delete all high tones after the first’; ‘add an ⟨n⟩ after at least one nasal vowel’ (a baroque but actual case I had to deal with)
* Autosegments: many-to-many (or one-to-many, etc.) relationships between features and phonemes.
  Examples: ‘spread each tone to the right forming contours’, ‘convert consecutive HH autosegments to HL’ (the ‘Obligatory Contour Principle’)
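To show why footing is really iteration, here is a Python sketch of the first footing example, assuming the word is already split into syllables and that "every second syllable" counts from the first. That starting point is an assumption, the function name is hypothetical, and the "restarting at long vowels" variant is not covered.

```python
PRIMARY = "\u02c8"    # IPA primary stress mark
SECONDARY = "\u02cc"  # IPA secondary stress mark


def alternating_stress(syllables):
    """Stress every second syllable counting from the first (1st, 3rd, ...);
    the last stressed syllable takes primary stress, the rest secondary.

    Counting from the left edge is an assumption: languages differ on the
    starting edge and on which syllable of each pair is stressed.
    """
    stressed = list(range(0, len(syllables), 2))
    out = list(syllables)
    for i in stressed:
        out[i] = (PRIMARY if i == stressed[-1] else SECONDARY) + out[i]
    return out


print(alternating_stress(["pa", "ta", "ka", "la", "ma"]))
```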

Now, there’s probably not much that can be done for the last item on this list. In a segmental SCA, there’s simply no room for autosegments: there is no obvious way to say, for instance, ‘these two acute accents represent a single H tone’. On the other hand, I suspect that this won’t be a huge problem in practice, and that in almost all cases the sound change can be rephrased in segmental terms. (I find a similar thing happens with ostensibly syllabification-sensitive sound changes.)

The other cases are less easily dismissable. They appear quite often, and they’re difficult to rephrase in other terms. Sometimes it’s possible, but it often ends up depending on internal details of the SCA algorithm, which is less than ideal and makes them difficult to write.

On reflection, I think I can identify the commonality between these sound changes. It is that they start at some reference point which is not modified, and then modify things around that point depending on it. This is precisely the opposite of how we usually specify sound changes, which is by specifying what is modified and then qualifying that with conditions.

Thus, for instance, take an example from the list above: ‘delete all high tones after the first’. This is phrased in terms of what is not modified, which is the first high tone. It can be rephrased equivalently in terms of what is modified: ‘delete a high tone if it is preceded by another high tone’. But that feels like a much less natural formulation.

Or consider: ‘consonants before a nasal must be nasalised, until a voiceless stop is reached’. It can be rephrased: ‘nasalise every consonant which is followed by a sequence of non-voiceless-stops and then a nasal’. But again, that’s much less natural.
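The two phrasings of the high-tone example can be sketched side by side in Python, with tones as a list per syllable and `None` for a deleted tone (a hypothetical representation). The first function starts from the unmodified reference point; the second states what is modified. The final check confirms they agree on every H/L string of up to four syllables.

```python
from itertools import product


def delete_after_first(tones):
    """Reference-point phrasing: find the first H, leave it alone, and
    delete every H after it."""
    if "H" not in tones:
        return list(tones)
    first = tones.index("H")
    return [t if i <= first or t != "H" else None for i, t in enumerate(tones)]


def delete_if_preceded(tones):
    """Target-first phrasing: delete an H if another H precedes it anywhere
    in the (original) word."""
    return [None if t == "H" and "H" in tones[:i] else t
            for i, t in enumerate(tones)]


print(delete_after_first(["H", "L", "H", "H"]))  # ['H', 'L', None, None]

# The two phrasings agree on every tone string of up to four syllables.
assert all(
    delete_after_first(list(t)) == delete_if_preceded(list(t))
    for n in range(5)
    for t in product("HL", repeat=n)
)
```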

So perhaps what is needed is not just a few new symbols for different types of matching, but rather an alternative type of sound change. I’m not sure precisely what that would look like, though. I’ll need to continue thinking about it.

A complication is that many of these sound changes are also bound up with iteration, which is difficult with most if not all current SCAs. Indeed, iteration is probably the main factor for footing and stress rules. But it’s also present in some of the other sound changes I’ve listed — in some way it may even be relevant to all of them. I’m not sure precisely how this affects the situation, and it will have to be another thing for me to continue thinking about.

Post by **zompist** » Sun Jun 16, 2024 12:20 am

bradrn wrote: ↑Sat Jun 15, 2024 8:24 pm On reflection, I think I can identify the commonality between these sound changes. It is that they start at some reference point which is not modified, and then modify things around that point depending on it. This is precisely the opposite of how we usually specify sound changes, which is by specifying what is modified and then qualifying that with conditions.

That's a good observation. However, I think we're not so much replacing the idea of the location to be changed, as broadening it.

E.g. we have a rule where F changes to G after A and before B:

F/G/A_B

Your rules are like an expansion of the _. We can change any series of tokens within that environment. I'll represent this * (a combination of the usual "you are here" underline with the * for an arbitrary string):

F/B/A*B

Obviously I'm using the notation I'm familiar with; the point is the same if you specify e.g. [+front] for F.

Going through your list:

Harmony: changes where phonemes must agree in a certain feature to the left or right of a source.
Examples: ‘all vowels agree in roundedness with the first vowel’; ‘consonants before a nasal must be nasalised, until a voiceless stop is reached’

F/B/B*
B/F/F*

C/N/*(P)…N

Footing and stress: changes where a word is divided into segments using a pattern
Examples: ‘stress every second syllable, with the last instance taking primary stress’; ‘stress every first syllable restarting at long vowels’

I think you need a notation for this:
* first, a way to indicate the first or last vowel (or other nucleus), like $1 / $ℓ. ($ is often used to mark syllable boundaries.)
* then, a way to indicate offsets from this, like $ℓ[-2] for antepenult
* then, counting syllables, ugh. In this case you could get away with odd vs even syllables.

All complicated by questions of what is a syllable. Probably the user has to define this.

Obligatoriness and culminativity: changes which must ensure a word contains at least / at most one instance of something.
Examples: ‘delete all high tones after the first’; ‘add an ⟨n⟩ after at least one nasal vowel’ (a baroque but actual case I had to deal with)

H/L/H*

I'm not sure I get the second one... where do you add the n?

Autosegments: many-to-many (or one-to-many, etc.) relationships between features and phonemes.
Examples: ‘spread each tone to the right forming contours’, ‘convert consecutive HH autosegments to HL’ (the ‘Obligatory Contour Principle’)

tone($)/tone($[-1])/$*

H/L/H_ (I must be missing something on this one)

Obviously this is terribly ad hoc, but the idea is to look up the feature (here, tone) for each syllable. Even trickier, I assume (though I'm not sure) that you mean the original tones spread rightward. So you'd want to start at the right.

A complication is that many of these sound changes are also bound up with iteration, which is difficult with most if not all current SCAs. Indeed, iteration is probably the main factor for footing and stress rules. But it’s also present in some of the other sound changes I’ve listed — in some way it may even be relevant to all of them. I’m not sure precisely how this affects the situation, and it will have to be another thing for me to continue thinking about.

I'm not sure what you mean here... my SCA works by iterating through the word, and sometimes has to sub-iterate e.g. in a rule like AB/CD/_X.

bradrn · Post by **bradrn** » Sun Jun 16, 2024 7:20 am

zompist wrote (Sun Jun 16, 2024 12:20 am): Your rules are like an expansion of the _. We can change any series of tokens within that environment. I'll represent this as * (a combination of the usual "you are here" underline with the * for an arbitrary string):

F/B/A*B

I’m not sure I understand exactly what this means. How is this different from a sound change rule F/B/A…*…B? (Not sure if SCA² supports wildcards in those positions, but at least Brassica does.)

But in any case, this isn’t actually a solution to the problem I have; see below for details.

Harmony: changes where phonemes must agree in a certain feature to the left or right of a source.
Examples: ‘all vowels agree in roundedness with the first vowel’; ‘consonants before a nasal must be nasalised, until a voiceless stop is reached’
F/B/B*
B/F/F*

C/N/*(P)…N

Unless I’m misunderstanding the notation, this last change is misimplemented. Assuming that (P) is an optional P like in SCA², this would nasalise consonants even when followed by a voiceless stop — precisely the opposite of what is desired.

What is really needed here is a wildcard which can match everything except a certain class of sounds. In Brassica, this would be something like [C V -VlStop]*. The sound change would then be, ‘nasalise consonants when followed by any number of non-voiceless-stops, followed by a nasal’.
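For comparison, here is that negated formulation as a Python regular expression. This is a minimal sketch, not Brassica syntax: the toy inventory and the b/d/g → m/n/ŋ mapping are assumptions. The lookahead `[^ptk]*` is exactly the ‘followed by any number of non-voiceless-stops’ part.

```python
import re

# Hypothetical toy inventory: voiced stops nasalise to the homorganic nasal.
NASALISED = {"b": "m", "d": "n", "g": "ŋ"}

# The target, then a lookahead for "any number of non-voiceless-stops, then a nasal".
# [^ptk]* is the negated wildcard: it may cross vowels and other consonants,
# but never a voiceless stop.
RULE = re.compile(r"[bdg](?=[^ptk]*[mnŋ])")


def nasalise(word: str) -> str:
    return RULE.sub(lambda m: NASALISED[m.group()], word)


print(nasalise("badagan"))  # manaŋan
print(nasalise("bapadan"))  # bapanan: the p blocks nasalisation of the b
```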

The ultimate problem here is that, like I said, this isn’t a natural way of thinking about these sound changes. Instead of directly saying, ‘stop applying the sound change at a voiceless stop’, one must negate it: ‘apply when followed by non-voiceless-stops’. Your notation makes it easier to write some sound changes, but it doesn’t actually solve the problem of needing to think in this indirect way.

Footing and stress: changes where a word is divided into segments using a pattern
Examples: ‘stress every second syllable, with the last instance taking primary stress’; ‘stress every first syllable restarting at long vowels’
I think you need a notation for this:
* first, a way to indicate the first or last vowel (or other nucleus), like $1 / $ℓ. ($ is often used to mark syllable boundaries.)
* then, a way to indicate offsets from this, like $ℓ[-2] for antepenult
* then, counting syllables, ugh. In this case you could get away with odd vs even syllables.

All complicated by questions of what is a syllable. Probably the user has to define this.

Well… as it happens, you don’t actually need syllables for this. You just need to know what the syllable nucleus is, and mark stress on that. The precise syllable boundaries don’t usually matter (and when they do, they’re predictable).

Once you realise that, it becomes possible (though not necessarily easy) to implement most stress rules using existing machinery. Here are the footing and stress rules (in Brassica) for my current language:

/ | / _ C V C #
/ | / _ C Vl #
/ | / _ C V (C) C Vs #
/ | / _ C V C |
/ | / _ C Vl |
/ | / _ C V (C) C Vs |
/ | / _ C V C | // | _
/ | / _ C Vl | // | _
/ | / _ C V (C) C Vs | // | _
| C Unstr / C PriS
PriS / SndS / _ ^PriS

Here, | is a foot boundary, determined by syllable weight. Then unstressed vowels are given primary or secondary stress depending on their location.

The big annoyance here is iteration: I need to manually repeat the rules three times, so this only works on words which have fewer than six morae. It would be much easier to say, ‘apply these rules RTL as many times as needed’. Brassica actually does have some capability to do this, although not enough for this case. I note that Lexurgy has a more capable directive, which I may steal if I can’t find any more principled way to do this.
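A minimal Python sketch of the ‘as many times as needed’ idea: wrap a single rule in a loop that reapplies it until the word stops changing. `foot_step` is a hypothetical toy that builds disyllabic feet from the right and ignores syllable weight entirely, so it is not the rule set above; only the iteration strategy is the point.

```python
import re

# Toy syllable shape: any non-vowel onset plus exactly one vowel (an assumption).
SYLLABLE = re.compile(r"[^aeiou|]*[aeiou]")


def foot_step(word: str) -> str:
    """One right-to-left footing step: put the last two unfooted syllables in a foot."""
    head, sep, tail = word.partition("|")  # head = the part not yet footed
    syllables = SYLLABLE.findall(head)
    if len(syllables) < 2 or "".join(syllables) != head:
        return word  # nothing left to foot (or head is not a plain CV string)
    new_head = "".join(syllables[:-2]) + "|" + "".join(syllables[-2:])
    return new_head + sep + tail


def apply_until_stable(rule, word: str, max_steps: int = 100) -> str:
    """Reapply `rule` until the word stops changing, instead of copying it n times."""
    for _ in range(max_steps):
        new = rule(word)
        if new == word:
            return word
        word = new
    raise RuntimeError("rule did not reach a fixed point")


print(apply_until_stable(foot_step, "lapatakasa"))  # la|pata|kasa
```

The same loop handles an eight-syllable word without any extra copies of the rule; the `max_steps` guard only protects against rules that never settle.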

I'm not sure I get the second one... where do you add the n?

This is quite a tricky one: it’s a reversed sound change taking one output to multiple inputs.

Some broader context might help here. I was asked by a linguist if Brassica is able to simulate some sound changes in reverse. I was able to simulate almost everything they wanted, but got stuck on this particular sound change. In the forwards direction, it nasalises all vowels in a word when the word contains at least one nasal. Therefore, in the reverse direction, it needs to produce all words where at least one nasal is introduced after at least one nasal vowel. That is, it can take Ṽ to both {Vn, V} if the word already contains at least one nasal sound, but Ṽ must turn into Vn if the word does not yet contain any nasals.
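One brute-force way to get this behaviour, sketched in Python under the assumption that the forward change is just Vn → Ṽ plus nasal spreading, so only ⟨n⟩ ever needs restoring: enumerate every V / Vn choice for each nasal vowel, then keep only candidates containing a nasal consonant. This is a generic generate-and-filter approach, not necessarily how Brassica's reverse mode works, and it grows exponentially with the number of nasal vowels; `reverse_candidates` is a hypothetical name.

```python
from itertools import product

DENASAL = {"ã": "a", "ẽ": "e", "ĩ": "i", "õ": "o", "ũ": "u"}
NASALS = set("mnŋ")


def reverse_candidates(word: str) -> list[str]:
    """All possible inputs for a word whose nasal vowels came from V or Vn."""
    # Each nasal vowel may reverse to a plain vowel or to vowel + n.
    options = [
        (DENASAL[ch], DENASAL[ch] + "n") if ch in DENASAL else (ch,)
        for ch in word
    ]
    candidates = ("".join(combo) for combo in product(*options))
    # Keep only inputs containing a nasal: otherwise nothing would have
    # triggered the nasalisation in the forwards direction.
    return [c for c in candidates if any(ch in NASALS for ch in c)]


print(reverse_candidates("pãtã"))  # ['patan', 'panta', 'pantan']
print(reverse_candidates("mãtã"))  # ['mata', 'matan', 'manta', 'mantan']
```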

tone($)/tone($[-1])/$*

[…]

Obviously this is terribly ad hoc, but the idea is to look up the feature (here, tone) for each syllable. Even trickier, I assume (though I'm not sure) that you mean the original tones spread rightward. So you'd want to start at the right.

It’s not easy to know what $[-1] must do in the general case… but I really like what methor said about doing this in Lexurgy, which is that it can match stuff between the environment and the replacement. In hypothetical Brassica syntax, it would look something like -rtl V / @last V / @last V C* _. (As you say, this needs to be RTL, but that’s something Brassica can do.)

H/L/H_ (I must be missing something on this one)

What you’re missing is that one H tone can be attached to more than one vowel. So you could have a word which looks like /pátáká/, but is actually:

pa ta ka
 |  | /
 H  H

In which case this sound change would take it to /pátaka/, not /pátaká/. You might call this contrived, but it does seem to be the case in a lot of the languages which use this Obligatory Contour Principle.
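A minimal Python sketch of the two-tier representation, with hypothetical names (`Word`, `ocp`, `surface`): tones live on their own tier, and each syllable points at the tone it is linked to. The OCP rule then operates on adjacent tones on that tier rather than on adjacent vowels.

```python
from dataclasses import dataclass

ACUTE = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}


@dataclass
class Word:
    syllables: list[str]  # segmental tier
    tones: list[str]      # tonal tier: one entry per autosegment, not per syllable
    links: list[int]      # links[k] = index of the tone linked to syllable k


def ocp(word: Word) -> Word:
    """HH -> HL, applied to adjacent autosegments on the tonal tier."""
    tones = list(word.tones)
    for i in range(1, len(tones)):
        if tones[i - 1] == "H" and tones[i] == "H":
            tones[i] = "L"
    return Word(word.syllables, tones, word.links)


def surface(word: Word) -> str:
    """Spell out the word, marking vowels linked to an H with an acute accent."""
    return "".join(
        "".join(ACUTE.get(c, c) for c in syl) if word.tones[t] == "H" else syl
        for syl, t in zip(word.syllables, word.links)
    )


shared = Word(["pa", "ta", "ka"], ["H", "H"], [0, 1, 1])  # as in the diagram
separate = Word(["pa", "ta", "ka"], ["H", "H", "H"], [0, 1, 2])
print(surface(shared), "->", surface(ocp(shared)))      # pátáká -> pátaka
print(surface(separate), "->", surface(ocp(separate)))  # pátáká -> pátaká
```

Both words are spelled /pátáká/, but only the tier structure decides whether the change yields /pátaka/ or /pátaká/.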

A complication is that many of these sound changes are also bound up with iteration, which is difficult with most if not all current SCAs. Indeed, iteration is probably the main factor for footing and stress rules. But it’s also present in some of the other sound changes I’ve listed — in some way it may even be relevant to all of them. I’m not sure precisely how this affects the situation, and it will have to be another thing for me to continue thinking about.
I'm not sure what you mean here... my SCA works by iterating through the word, and sometimes has to sub-iterate e.g. in a rule like AB/CD/_X.

The problem with this is that the iteration is implicit: in most cases, it’s an internal detail of the SCA algorithm. And that’s fine, because most sound changes don’t really care about how the iteration is applied; or if they do, they need only crude controls like RTL vs LTR application.

But these are sound changes where describing the iteration is a key part of the change itself. So, to implement them, you need a reasonably good understanding of the SCA internals, which isn’t ideal. And of course, sometimes the form of iteration implemented by the SCA isn’t suitable for the sound change at all.

For instance, consider a progressive vowel harmony rule. In Brassica, you might think to implement this as Vu / Vr / Vr C* _. But this actually doesn’t work: Brassica doesn’t (yet) allow overlapping rule applications, so this turns e.g. päteke into *pätëke, but can’t continue to use that output ⟨ë⟩ as input for further interactions.

In fact, and somewhat to my surprise, it turns out that this sound change is not expressible in Brassica at all. I need to do the iteration manually, by repeating the rule as many times as I can have syllables (similarly to the footing sound change above). But it takes considerable knowledge of Brassica’s internals to see why this is necessary. It would be much easier to simply say, ‘repeat this sound change from the beginning to the end of the word’. But I’m not sure of the most general way to express such things.
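A Python sketch contrasting the two behaviours. The vowel classes are hypothetical (e belongs to Vu with Vr counterpart ë, and ä counts as Vr); the point is only the difference between non-overlapping matches and a directional pass whose outputs feed later matches.

```python
VOWELS = set("aeiouäëö")
# Hypothetical classes for illustration: ä and ë count as Vr, and e is the
# Vu vowel whose Vr counterpart is ë.
VR = {"ä", "ë"}
VU_TO_VR = {"e": "ë"}


def non_overlapping(word: str) -> str:
    """One pass of Vu / Vr / Vr C* _ where matches may not overlap."""
    out = list(word)
    i = 0
    while i < len(out):
        if out[i] in VR:
            j = i + 1
            while j < len(out) and out[j] not in VOWELS:
                j += 1  # skip the C* part of the environment
            if j < len(out) and out[j] in VU_TO_VR:
                out[j] = VU_TO_VR[out[j]]
                i = j + 1  # resume after the match, so the new vowel cannot trigger
                continue
        i += 1
    return "".join(out)


def feeding_ltr(word: str) -> str:
    """Left-to-right pass in which each output vowel is the trigger for the next."""
    out = list(word)
    prev = None
    for k, ch in enumerate(out):
        if ch in VOWELS:
            if prev in VR and ch in VU_TO_VR:
                out[k] = VU_TO_VR[ch]
            prev = out[k]
    return "".join(out)


print(non_overlapping("päteke"))  # pätëke
print(feeding_ltr("päteke"))      # pätëkë
```

Running `non_overlapping` twice also gives pätëkë, which is the manual repetition described above; `feeding_ltr` gets there in one pass by reading each trigger from the output built so far.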

bradrn · Post by **bradrn** » Mon Jun 17, 2024 1:20 pm

Having thought about it, I believe that zompist’s * notation actually does solve my problem! I’m not sure if my interpretation is precisely the same as what zompist meant, but it does indeed seem to cover all the cases.

To summarise the idea: take a rule target / replacement / LHS * RHS. This would look for spans delimited by LHS and RHS, and match any number of instances of the target which might be embedded in these spans (even if separated by other graphemes).

The application algorithm would be as follows, highlighting in bold where it’s different from the usual algorithm:

1. Start at the beginning of the word.
2. Search forward until the LHS matches.
3. Continue searching (possibly skipping graphemes) until the target or RHS matches:
   - If the RHS matches, go to step 4.
   - If the target matches, replace it with the replacement and repeat step 3.
4. Repeat steps 2–3 until reaching the end of the word.
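The steps above can be sketched in Python as follows. This is a minimal sketch under simplifying assumptions: words are lists of segments, and target, LHS and RHS each match a single segment (given as hypothetical predicate functions) rather than full Brassica patterns. The `rtl` flag implements the right-to-left variant by scanning the reversed word, so the span opens at the RHS and closes at the LHS.

```python
from typing import Callable, List, Optional

Pred = Callable[[str], bool]


def apply_star(
    word: List[str],
    target: Pred,
    replace: Callable[[str], str],
    lhs: Optional[Pred] = None,
    rhs: Optional[Pred] = None,
    rtl: bool = False,
) -> List[str]:
    """Hypothetical sketch of `target / replacement / LHS * RHS`.

    Each predicate tests a single segment; a missing LHS or RHS means the
    span starts or ends at the edge of the word.
    """
    segs = list(reversed(word)) if rtl else list(word)
    # Scanning right-to-left, a span opens at the RHS and closes at the LHS.
    opener, closer = (rhs, lhs) if rtl else (lhs, rhs)
    i, n = 0, len(segs)
    while i < n:
        if opener is not None:
            # Step 2: search forward until the opening environment matches.
            if not opener(segs[i]):
                i += 1
                continue
            i += 1
        # Step 3: skip or replace segments until the closing environment matches.
        while i < n and not (closer is not None and closer(segs[i])):
            if target(segs[i]):
                segs[i] = replace(segs[i])
            i += 1
        # Step 4: go back to step 2 (a span tied to the word edge opens only once).
        if opener is None:
            break
    return list(reversed(segs)) if rtl else segs


# Hypothetical segment classes for the two examples.
HIGH_TO_LOW = {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u"}
NASAL_V = {"a": "ã", "e": "ẽ", "i": "ĩ", "o": "õ", "u": "ũ"}


def is_high(s: str) -> bool:
    return s in HIGH_TO_LOW


# 'delete all high tones after the first': V́ / V / V́ *
print("".join(apply_star(list("pátáká"), is_high, lambda s: HIGH_TO_LOW[s], lhs=is_high)))
# pátaka

# -rtl: nasalise vowels leftward from a nasal, stopping at a voiceless stop
print("".join(apply_star(list("tapawan"), lambda s: s in NASAL_V, lambda s: NASAL_V[s],
                         lhs=lambda s: s in "ptk", rhs=lambda s: s in "mn", rtl=True)))
# tapãwãn
```

Multi-segment environments, real categories and the autosegmental cases would all need more machinery than this; the sketch only shows the span-scanning control flow.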

(You’d probably want a RTL version of this too, but that should hopefully be a straightforward modification.)

This neatly accounts for almost every one of the sound changes I’ve suggested so far (except the autosegmental ones, which as mentioned are out of scope). And, what’s more, it does so in a way which agrees with how you’d naturally talk about them, which is important:

‘all vowels agree in roundedness with the first vowel’: Vunrounded / Vrounded / Vrounded * and vice versa
‘consonants before a nasal must be nasalised, until a voiceless stop is reached’: -rtl Vnonnas / Vnas / [Stop -Voiced] * Nasal
‘stress every second syllable’: C* V C* V C* / C* V C* V́ C* (doesn’t actually require iteration control, if the SCA has non-overlapping targets)
‘stress every first syllable restarting at long vowels’: Vlong / V́long, then C* V C* V C* / C* V́ C* V C* / * V́
Pre-Eŋes stress rule (code block in the last post): Vlong / V́long and V / V́ / _ C [C #], then -rtl C* V C* V C* / C* V́ C* V C* / V́ *
‘delete all high tones after the first’: V́ / V / V́ *
‘add an ⟨n⟩ after at least one nasal vowel’: this is the only sound change where this form doesn’t help. Instead, I think the best fix for this would be to add a notion of output constraints (as in rsca), in which case it would require something like forbid # [C -Nasl V]* Ṽ [C -Nasl V]* # (i.e., ‘forbid a word with a nasal vowel and no nasal consonants’).
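A rough Python rendering of that output constraint, assuming a toy inventory where the nasal consonants are m n ŋ and the nasal vowels are ã ẽ ĩ õ ũ. It mirrors only the pattern written above, not rsca's actual constraint syntax or semantics: sound changes may overgenerate, and the constraint then discards any word that matches the forbidden pattern from edge to edge. `allowed` is a hypothetical name.

```python
import re

# forbid # [C -Nasl V]* Ṽ [C -Nasl V]* #, i.e. a nasal vowel and no nasal consonant.
# Toy inventory (an assumption): nasal consonants m n ŋ, nasal vowels ã ẽ ĩ õ ũ.
FORBIDDEN = re.compile(r"[^mnŋ]*[ãẽĩõũ][^mnŋ]*")


def allowed(word: str) -> bool:
    """True unless the whole word matches the forbidden pattern (# ... # anchoring)."""
    return FORBIDDEN.fullmatch(word) is None


candidates = ["pãtã", "pãtãn", "pata", "mãtã"]
print([w for w in candidates if allowed(w)])  # ['pãtãn', 'pata', 'mãtã']
```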

This new kind of sound change could even be generalised further, by giving the environment its usual interpretation (being immediately adjacent to the target), and splitting off the ‘iteration bounds’ into its own section: something like target / replacement / LHS _ RHS / LHS * RHS. But I think this is an unnecessary complication.

So, if we take this problem as solved, the remaining problem is how to represent suprasegmentals. I’m starting to think that this problem is bound up with that of representing features more generally, which is an area where Brassica is less complete than I’d like. I’ll have to further ponder whether there’s any way of doing this which doesn’t require endless feature lists for every single phoneme.

Zompist Bboard Again
