Zompist Bboard Again

Posted: **Fri Nov 29, 2024 9:14 am**

My first impression is that the interests of conlangers vs. actual linguists are completely at odds here.

For actual linguistics, and the process of finding large-scale trends, the cognate sets approach would certainly be most useful.
As a conlanger, I think using reconstructions and sound changes, no matter how speculative, is most useful and cognate sets would leave me stumped.

As for coding by degree of confidence -- sure! But that would leave the 'directly attested' dataset with a heavy European bias, I believe.

Posted: **Fri Nov 29, 2024 10:25 am**

I don't think the cognate set method is entirely empirical – the decision whether to consider a set cognate is also just the opinion of whoever wrote the paper. Admittedly it is often a trivial decision, but certainly not always¹, and it relies on the same correspondence patterns that reconstruction does (obviously) – the reconstruction is just a convenient way of writing it. I'd suggest both, including reconstructions and sound changes but also at least examples of the data that they are based on.

Posted: **Fri Nov 29, 2024 6:37 pm**

Ares Land wrote: ↑Fri Nov 29, 2024 9:14 am My first impression is that the interests of conlangers vs. actual linguists are completely at odds here

I don’t necessarily think this was so (and I probably made it sound more binary than it is). I think we’re both interested in the two goals of (a) investigating large-scale trends, and (b) creating a reference resource for individual languages/families. From there it’s a matter of working out how to prioritise those goals, and what level of rigour to use — which are relevant problems for conlangers as much as linguists.

For actual linguistics, and the process of finding large-scale trends, the cognate sets approach would certainly be most useful.
As a conlanger, I think using reconstructions and sound changes, no matter how speculative, is most useful and cognate sets would leave me stumped.

I may have explained it poorly. (My excuse is that I was writing at 2am.) The idea is that you take those cognate sets and decompose them into synchronic phoneme correspondences. Thus, take a set like this one from Austronesian:

mata / moto / matəh / mtɔ / maka ‘eye’

Our approach would be to rely on the reconstruction (here PAN *mata), and write down the sound changes people have inferred from that: *a→o, *a→əh/_#, and so on. Alex’s approach would be to simply index the synchronic correspondences: thus you’d get something like m↔m, a↔o↔əh↔ɔ↔∅, t↔k. This gives you a lot less information, but preserves the most important data of which phonemes connect to each other. The advantages are, firstly, that it’s far easier to understand where the data comes from; and, secondly, that it’s much more generally applicable.

To be clear, I’m not necessarily in favour of this approach, at least not alone. I think a combined approach, with this plus reconstructed sound changes, could be very powerful. But this is something to debate.
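The correspondence-indexing idea described above can be sketched roughly as follows. This is only an illustration, not Alex's or anyone's actual implementation: the alignment is done by hand, the language labels are placeholders, treating "əh" as a single segment simply follows the list given above, and merging columns that share a segment is an assumption made so that the output matches that list.

```python
GAP = "∅"  # marks a position where a language has no segment

# Hand-aligned version of the 'eye' set. The labels lang1..lang5 are
# placeholders, and the alignment itself is an assumption.
aligned = {
    "lang1": ["m", "a", "t", "a"],
    "lang2": ["m", "o", "t", "o"],
    "lang3": ["m", "a", "t", "əh"],
    "lang4": ["m", GAP, "t", "ɔ"],
    "lang5": ["m", "a", "k", "a"],
}


def column_correspondences(rows):
    """Return the distinct segments of each aligned column, in first-seen order."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must be aligned to the same length")
    return [list(dict.fromkeys(column)) for column in zip(*rows)]


def merge_overlapping(columns):
    """Merge columns that share at least one segment (an assumed grouping rule)."""
    merged = []
    for column in columns:
        current = list(column)
        untouched = []
        for group in merged:
            if set(group) & set(current):
                # Fold the overlapping group into the current one.
                current = group + [s for s in current if s not in group]
            else:
                untouched.append(group)
        merged = untouched + [current]
    return merged


cols = column_correspondences(list(aligned.values()))
groups = merge_overlapping(cols)
for group in groups:
    print("↔".join(group))
```

With this alignment the merged groups are {m}, {t, k} and {a, o, ∅, əh, ɔ}, which is the same set of correspondences as listed above. Merging on any shared segment is transitive, so on real data it could easily lump unrelated correspondences together; a real tool would probably want to keep the per-column sets as well.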

Lērisama wrote: ↑Fri Nov 29, 2024 10:25 am I don't think the cognate set method is entirely empirical – the decision whether to consider a set cognate is also just the opinion of whoever wrote the paper. Admittedly it is often a trivial decision, but certainly not always¹, and it relies on the same correspondence patterns that reconstruction does (obviously) – the reconstruction is just a convenient way of writing it. I'd suggest both, including reconstructions and sound changes but also at least examples of the data that they are based on.

This is a very important point. I recall discussing something similar — that cognates are identified on the basis of their sound correspondences, so inferring the sound correspondences from the cognate sets runs a risk of circular reasoning. But is it better or worse than blindly trusting the judgement of the linguist who wrote down a sound change? I don’t know…

Posted: **Sat Nov 30, 2024 3:14 pm**

Ares Land wrote: ↑Fri Nov 29, 2024 9:14 am My first impression is that the interests of conlangers vs. actual linguists are completely at odds here

For actual linguistics, and the process of finding large-scale trends, the cognate sets approach would certainly be most useful.
As a conlanger, I think using reconstructions and sound changes, no matter how speculative, is most useful and cognate sets would leave me stumped.

As for coding by degree of confidence -- sure! but that would leave the 'directly attested' dataset with a heavy European bias, I believe.

As a conlanger, I'm interested in both. Cognate sets beget reconstructions. If I wanted to know an innovative way to obtain /q/, I'd be interested in "all" the cognates of "all" the words that contain /q/.

It'd be extra effort, but I think it's best to contain both sets of information: correspondences, and best established reconstructions.

Posted: **Sat Nov 30, 2024 6:52 pm**

Zju wrote: ↑Sat Nov 30, 2024 3:14 pm It'd be extra effort, but I think it's best to contain both sets of information: correspondences, and best established reconstructions.

Yes: this is basically what I’m thinking too.

(It’s actually not a particularly huge amount of extra effort, since the hope is that we can calculate phoneme correspondences in an automated way. In my estimation, handling the reconstructions would require most of the work.)

Posted: **Sun Dec 01, 2024 5:04 am**

bradrn wrote: ↑Sat Nov 30, 2024 6:52 pm
Zju wrote: ↑Sat Nov 30, 2024 3:14 pm It'd be extra effort, but I think it's best to contain both sets of information: correspondences, and best established reconstructions.
Yes: this is basically what I’m thinking too.

I would like to also support this, for the reasons I mentioned earlier.

Posted: **Mon Dec 09, 2024 6:07 pm**

Another citation for the Index.

Posted: **Tue Dec 10, 2024 3:49 pm**

Thanks for sharing - seems like an interesting paper.

Posted: **Fri Jan 17, 2025 5:37 am**

bradrn wrote: ↑Fri Nov 29, 2024 9:06 am By contrast, Alex suggested a fundamentally different approach, based on his EvoSem database (which is genuinely really useful, go check it out!). In brief, the idea would be to take synchronic cognate sets, and compile sets of phoneme correspondences from those, completely ignoring how some linguist or another may have reconstructed the original situation. Of course, this would lose a large amount of information about the precise nature of diachronic sound change. In exchange, we get far more raw data, including from language families where reconstructions are poor or absent. And of course, that data would be far more reliable and empirically grounded.

There is progress on this! I wrote a Python program which can take in a database of cognates, align them, and count how often each sound correspondence is attested. Using the ACD as a convenient data set, the top 20 inferred sound correspondences are:

{g,k}: 91
{e,i}: 87
{i,o}: 77
{b,p}: 74
{h,j}: 72
{e,o}: 72
{aː,e}: 70
{l,r}: 62
{e,ə}: 61
{aː,o}: 61
{e,u}: 58
{h,t}: 58
{j,r}: 56
{h,n}: 56
{g,ʔ}: 54
{h,r}: 53
{h,l}: 53
{p,v}: 51
{h,s}: 51
{c,s}: 51

(The counts next to them are, basically, the number of disjoint groupings of languages which attest that sound correspondence at least once. But we still need to work out the best method of ranking sound correspondences, because it makes a big difference to the results.)
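For readers who want a feel for the align-and-count step, here is a minimal sketch. It is not the program described above: it assumes plain Needleman-Wunsch alignment with unit costs, the function names and language labels are made up for illustration, and the count is simply the number of distinct language pairs attesting a correspondence, which is a cruder stand-in for the "disjoint groupings" measure.

```python
from collections import defaultdict
from itertools import combinations

GAP = "∅"


def align(a, b, gap_cost=1, mismatch_cost=1):
    """Globally align two segment lists (Needleman-Wunsch, unit costs assumed)."""
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0 if a[i - 1] == b[j - 1] else mismatch_cost
            cost[i][j] = min(cost[i - 1][j - 1] + step,
                             cost[i - 1][j] + gap_cost,
                             cost[i][j - 1] + gap_cost)
    # Trace back from the bottom-right corner, preferring substitutions.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        step = 0 if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch_cost
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + step:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap_cost:
            pairs.append((a[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def count_correspondences(cognate_sets):
    """Count, per unordered segment pair, the language pairs that attest it."""
    attested = defaultdict(set)
    for forms in cognate_sets:  # forms maps language label -> segment list
        for (l1, w1), (l2, w2) in combinations(sorted(forms.items()), 2):
            for x, y in align(w1, w2):
                if x != y:
                    attested[frozenset((x, y))].add((l1, l2))
    return {pair: len(langs) for pair, langs in attested.items()}


# The 'eye' set quoted earlier in the thread; labels A..E are placeholders.
eye = {
    "A": list("mata"),
    "B": list("moto"),
    "C": list("matəh"),
    "D": list("mtɔ"),
    "E": list("maka"),
}
counts = count_correspondences([eye])
for pair, n in sorted(counts.items(), key=lambda kv: -kv[1])[:5]:
    print("{" + ",".join(sorted(pair)) + "}: " + str(n))
```

Unit costs break ties arbitrarily, so some of the alignments this produces are phonetically implausible (consonants paired with vowels, for instance). A serious version would need phonetically informed costs and proper segmentation of long vowels and digraphs, and the choice of counting scheme changes the ranking, as noted above.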

On the whole, I think this list looks pretty promising. Even at such an early stage, the program already produces a list of plausible-looking synchronic correspondences in a reasonable-looking order. (Keeping in mind, of course, that this is from Austronesian, where sound changes such as *C→h are very common.) I’m personally feeling very optimistic about how useful this resource could become.

Posted: **Fri Jan 17, 2025 10:17 am**

bradrn wrote: ↑Fri Jan 17, 2025 5:37 am [snip – it's the post above and quite long]

I quite like this, and I think it would be useful, but I don't think it's the Index Diachronica. A major part of the usefulness of the current index is that it shows how¹ these synchronic correspondences arose, and while this is useful, it's a different kind of resource built from the same material. This feels like a different thing – an Index Synchronica. The best-supported idea seemed to be including both though, so maybe this is intended as part of the index. For what it's worth my vote would be to make them two separate, but related, indices.

¹ Insert caveats here

Posted: **Fri Jan 17, 2025 6:51 pm**

Lērisama wrote: ↑Fri Jan 17, 2025 10:17 am For what it's worth my vote would be to make them two separate, but related, indices.

Yes, I agree with this. I thought at one point that they might be mergeable, but I don’t think so now.

The Index Diachronica
