The Index Diachronica

Ares Land · Post by **Ares Land** » Fri Nov 29, 2024 9:14 am

My first impression is that the interests of conlangers vs. actual linguists are completely at odds here.

For actual linguistics, and the process of finding large-scale trends, the cognate sets approach would certainly be most useful.
As a conlanger, I think using reconstructions and sound changes, no matter how speculative, is most useful and cognate sets would leave me stumped.

As for coding by degree of confidence -- sure! but that would leave the 'directly attested' dataset with a heavy European bias, I believe.

Lērisama · Post by **Lērisama** » Fri Nov 29, 2024 10:25 am

I don't think the cognate set method is entirely empirical – the decision whether to consider a set cognate is also just the opinion of whoever wrote the paper. Admittedly it is often a trivial decision, but certainly not always¹, and it relies on the same correspondence patterns that reconstruction does (obviously) – the reconstruction is just a convenient way of writing it. I'd suggest both, including reconstructions and sound changes but also at least examples of the data that they are based on.

bradrn · Post by **bradrn** » Fri Nov 29, 2024 6:37 pm

Ares Land wrote: ↑Fri Nov 29, 2024 9:14 am My first impression is that the interests of conlangers vs. actual linguists are completely at odds here

I don’t necessarily think this is so (and I probably made it sound more binary than it is). I think we’re both interested in the two goals of (a) investigating large-scale trends, and (b) creating a reference resource for individual languages/families. From there it’s a matter of working out how to prioritise those goals, and what level of rigour to use — which are relevant problems for conlangers as much as linguists.

For actual linguistics, and the process of finding large-scale trends, the cognate sets approach would certainly be most useful.
As a conlanger, I think using reconstructions and sound changes, no matter how speculative, is most useful and cognate sets would leave me stumped.

I may have explained it poorly. (My excuse is that I was writing at 2am.) The idea is that you take those cognate sets and decompose them into synchronic phoneme correspondences. Thus, take a set like this one from Austronesian:

mata / moto / matəh / mtɔ / maka ‘eye’

Our approach would be to rely on the reconstruction (here PAN *mata), and write down the sound changes people have inferred from that: *a→o, *a→əh/_#, and so on. Alex’s approach would be to simply index the synchronic correspondences: thus you’d get something like m↔m, a↔o↔əh↔ɔ↔∅, t↔k. This gives you a lot less information, but preserves the most important data of which phonemes connect to each other. The advantages are, firstly, that it’s far easier to understand where the data comes from; and, secondly, that it’s much more generally applicable.

To be clear, I’m not necessarily in favour of this approach, at least not alone. I think a combined approach, with this plus reconstructed sound changes, could be very powerful. But this is something to debate.
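
To make the decomposition concrete, here is a minimal Python sketch of the idea for the ‘eye’ set above. It is only an illustration under stated assumptions: the alignment is done by hand, "∅" marks a gap, "əh" is treated as a single unit, and the function names are made up for this example. It indexes each aligned position separately, so the two vowel positions come out as two sets rather than the single merged chain written above.

```python
# Hand-aligned forms for the 'eye' set: mata / moto / matəh / mtɔ / maka.
# Assumptions: the alignment is manual, "∅" marks a gap, and "əh" is
# treated as one unit. A real tool would have to compute the alignment.
ALIGNED = [
    ["m", "a", "t", "a"],
    ["m", "o", "t", "o"],
    ["m", "a", "t", "əh"],
    ["m", "∅", "t", "ɔ"],
    ["m", "a", "k", "a"],
]


def column_correspondences(rows):
    """Return, for each aligned position, the set of segments found there."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same aligned length")
    return [set(column) for column in zip(*rows)]


def pairwise(column):
    """List the unordered pairs of distinct segments within one column."""
    segments = sorted(column)
    return [(a, b) for i, a in enumerate(segments) for b in segments[i + 1:]]


if __name__ == "__main__":
    for column in column_correspondences(ALIGNED):
        print("↔".join(sorted(column)))
```

Note that no reconstruction is consulted anywhere: the only inputs are the attested forms and the alignment, which is exactly where the judgement calls end up hiding.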

Lērisama wrote: ↑Fri Nov 29, 2024 10:25 am I don't think the cognate set method is entirely empirical – the decision whether to consider a set cognate is also just the opinion of whoever wrote the paper. Admittedly it is often a trivial decision, but certainly not always¹, and it relies on the same correspondence patterns that reconstruction does (obviously) – the reconstruction is just a convenient way of writing it. I'd suggest both, including reconstructions and sound changes but also at least examples of the data that they are based on.

This is a very important point. I recall discussing something similar — that cognates are identified on the basis of their sound correspondences, so inferring the sound correspondences from the cognate sets runs a risk of circular reasoning. But is it better or worse than blindly trusting the judgement of the linguist who wrote down a sound change? I don’t know…

Zju · Post by **Zju** » Sat Nov 30, 2024 3:14 pm

Ares Land wrote: ↑Fri Nov 29, 2024 9:14 am My first impression is that the interests of conlangers vs. actual linguists are completely at odds here

For actual linguistics, and the process of finding large-scale trends, the cognate sets approach would certainly be most useful.
As a conlanger, I think using reconstructions and sound changes, no matter how speculative, is most useful and cognate sets would leave me stumped.

As for coding by degree of confidence -- sure! but that would leave the 'directly attested' dataset with a heavy European bias, I believe.

As a conlanger, I'm interested in both. Cognate sets beget reconstructions. If I wanted to know an innovative way to obtain /q/, I'd be interested in "all" the cognates of "all" the words that contain /q/.

It'd be extra effort, but I think it's best to contain both sets of information: correspondences, and best-established reconstructions.

bradrn · Post by **bradrn** » Sat Nov 30, 2024 6:52 pm

Zju wrote: ↑Sat Nov 30, 2024 3:14 pm It'd be extra effort, but I think it's best to contain both sets of information: correspondences, and best-established reconstructions.

Yes: this is basically what I’m thinking too.

(It’s actually not a particularly huge amount of extra effort, since the hope is that we can calculate phoneme correspondences in an automated way. In my estimation, handling the reconstructions would require most of the work.)

Lērisama · Post by **Lērisama** » Sun Dec 01, 2024 5:04 am

bradrn wrote: ↑Sat Nov 30, 2024 6:52 pm
Zju wrote: ↑Sat Nov 30, 2024 3:14 pm It'd be extra effort, but I think it's best to contain both sets of information: correspondences, and best-established reconstructions.
Yes: this is basically what I’m thinking too.

I would also like to support this, for the reasons I mentioned earlier.

Man in Space · Post by **Man in Space** » Mon Dec 09, 2024 6:07 pm

Another citation for the Index.

Zju · Post by **Zju** » Tue Dec 10, 2024 3:49 pm

Thanks for sharing - seems like an interesting paper.

bradrn · Post by **bradrn** » Fri Jan 17, 2025 5:37 am

bradrn wrote: ↑Fri Nov 29, 2024 9:06 am By contrast, Alex suggested a fundamentally different approach, based on his EvoSem database (which is genuinely really useful, go check it out!). In brief, the idea would be to take synchronic cognate sets, and compile sets of phoneme correspondences from those, completely ignoring how some linguist or another may have reconstructed the original situation. Of course, this would lose a large amount of information about the precise nature of diachronic sound change. In exchange, we get far more raw data, including from language families where reconstructions are poor or absent. And of course, that data would be far more reliable and empirically grounded.

There is progress on this! I wrote a Python program which can take in a database of cognates, align them, and count how often each sound correspondence is attested. Using the ACD as a convenient data set, the top 20 inferred sound correspondences are:


{g,k}: 91
{e,i}: 87
{i,o}: 77
{b,p}: 74
{h,j}: 72
{e,o}: 72
{aː,e}: 70
{l,r}: 62
{e,ə}: 61
{aː,o}: 61
{e,u}: 58
{h,t}: 58
{j,r}: 56
{h,n}: 56
{g,ʔ}: 54
{h,r}: 53
{h,l}: 53
{p,v}: 51
{h,s}: 51
{c,s}: 51

(The counts next to them are, basically, the number of disjoint groupings of languages which attest that sound correspondence at least once. But we still need to work out the best method of ranking sound correspondences, because it makes a big difference to the results.)
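
For readers who want a feel for the counting step, here is a rough Python sketch. It is not the actual program: the data is a made-up toy set rather than the ACD, the forms are assumed to be aligned already, and the count is simply the number of cognate sets attesting a pair at least once, which is a cruder measure than the disjoint language groupings described above.

```python
from collections import Counter
from itertools import combinations

# Toy, hand-aligned cognate sets. This is hypothetical data, NOT taken
# from the ACD; each inner list is one form, already split and aligned.
TOY_SETS = [
    [["m", "a", "t", "a"], ["m", "a", "k", "a"]],
    [["t", "a", "l", "i"], ["k", "a", "r", "i"]],
    [["l", "i", "m", "a"], ["r", "i", "m", "a"]],
]


def count_correspondences(cognate_sets):
    """Count, for each unordered segment pair, how many sets attest it."""
    counts = Counter()
    for rows in cognate_sets:
        attested = set()
        for column in zip(*rows):
            # Every pair of distinct segments in a column is a correspondence.
            for pair in combinations(sorted(set(column)), 2):
                attested.add(pair)
        # Each pair is counted at most once per cognate set.
        counts.update(attested)
    return counts


if __name__ == "__main__":
    for (a, b), n in count_correspondences(TOY_SETS).most_common():
        print(f"{{{a},{b}}}: {n}")
```

Counting once per cognate set rather than once per occurrence is one of several possible choices, and swapping it out is one place where the ranking could change substantially.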

On the whole, I think this list looks pretty promising. Even at such an early stage, the program already produces a list of plausible-looking synchronic correspondences in a reasonable-looking order. (Keeping in mind, of course, that this is from Austronesian, where sound changes such as *C→h are very common.) I’m personally feeling very optimistic about how useful this resource could become.

Lērisama · Post by **Lērisama** » Fri Jan 17, 2025 10:17 am

bradrn wrote: ↑Fri Jan 17, 2025 5:37 am [snip – it's the post above and quite long]

I quite like this, and I think it would be useful, but I don't think it's the Index Diachronica. A major part of the usefulness of the current index is that it shows how¹ these synchronic correspondences arose, and while this is useful, it's a different kind of resource built from the same material. This feels like a different thing – an Index Synchronica. The best-supported idea seemed to be including both though, so maybe this is intended as part of the index. For what it's worth my vote would be to make them two separate, but related, indices.

¹ Insert caveats here

bradrn · Post by **bradrn** » Fri Jan 17, 2025 6:51 pm

Lērisama wrote: ↑Fri Jan 17, 2025 10:17 am For what it's worth my vote would be to make them two separate, but related, indices.

Yes, I agree with this. I thought at one point that they might be mergeable, but I don’t think so now.

Glass Half Baked · Post by **Glass Half Baked** » Fri Aug 01, 2025 8:25 pm

I am working my way through the Muskogean sound changes, and I finally have the PM to Creek changes done. This is all from Booker 2005.

[edit: phpBB has ruined the subscripts]

0. These changes can probably be projected back to Proto-Eastern-Muskogean.
ts, tʃ, ʃ > ts/tʃ
s > s/ʃ
θ > ɬ

1. Weak consonants do not disappear all at once. A glottal stop between two identical vowels is lost before other changes.
V0CaV0 > V0

2. K is lost in a number of ways. It is impossible to order the changes to kʷ, but they are listed here since some of the changes are similar, and may be related. Whether the preservation of preconsonantal k takes place before or after the loss of intervocalic k shifts which syllable is penultimate is not recoverable from Booker 2005 or any other paper I’ve read on Muskogean historical phonology.
k > : / _n, _m
k > 0 / _C (except after penultimate syllables)
V1k(ʷ)V2 > V2 (root-finally in roots of four syllables)
V1k(ʷ)V2 > V2: (root-finally in roots of three syllables)
kʷ > p / V_V, k_ (due to voicing rules, this change could have been kʷ > b, which then merged with p)
kʷ > k everywhere else

3. The other weak consonants are lost at some point after step 2.
V1C[weak]V2 > V2

4. Muskogee has geminate consonants. Booker, like most linguists, assumes a degemination at some point in the history of Muskogee. But every example I have ever seen can be accounted for by the loss of preconsonantal k. Therefore, there is no actual need for degemination. And degemination raises more questions than it answers anyway, given the changes below, all of which yield geminates today.
kl > kk
yy > ll
ht > tt
tl > tt (this change occurs after the rhythmic loss of vowels between morphemes, causing havoc for the active suffix -li)

5. Changes to X
V0xV0 > V0: (There are no cognates that allow us to order this change based on its effect on which syllable is penultimate)
x > w / a_o
x > h

6. Changes to L
l > 0 / _x
ali > a: root-finally
ali > ay when not root-final or root-initial, also in the 1sg suffix -ali
ila > iya (>ya word-initially)
ili > i:

7. Other Changes, which are impossible to order.
At some point, plosives and affricates between vowels voice.
At some point, final long vowels shorten.
o:mV > õ:wV (Between a long vowel and another vowel, m merges with w, with compensatory nasalization)
xʷ > ɸ could happen at any time. But there is no clear evidence that the phoneme in Proto-Muskogean was xʷ and not ɸ to begin with, so the notation used is arbitrary.
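
Since ordering matters so much here, a small Python sketch of step 5 (the changes to x) may help show why. Everything beyond the three rules themselves is an assumption for illustration: the vowel set is taken to be i, a, o, length is written with ":", and the input strings are invented shapes, not real Proto-Muskogean words.

```python
import re

# Step 5 as ordered regex rewrites. Assumptions for illustration only:
# the vowels are i, a, o and ":" marks length.
X_RULES = [
    (r"([iao])x\1", r"\1:"),    # V0xV0 > V0:  (x lost between identical vowels)
    (r"(?<=a)x(?=o)", "w"),     # x > w / a_o
    (r"x", "h"),                # x > h (everywhere else)
]


def apply_changes(form, rules):
    """Apply each (pattern, replacement) rule once, in the order given."""
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


if __name__ == "__main__":
    # Invented test shapes, not attested forms.
    for form in ["paxa", "taxo", "nixa"]:
        print(form, ">", apply_changes(form, X_RULES))

    # With the general rule moved first, it bleeds the other two.
    reordered = [X_RULES[2], X_RULES[0], X_RULES[1]]
    print("taxo", ">", apply_changes("taxo", reordered))
```

The elsewhere rule x > h has to come last in this step; run first, it removes every x before the conditioned changes can apply.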

Zompist Bboard Again
