The Index Diachronica

For the Index Diachronica project
Ares Land
Posts: 3112
Joined: Sun Jul 08, 2018 12:35 pm

Re: The Index Diachronica

Post by Ares Land »

My first impression is that the interests of conlangers vs. actual linguists are completely at odds here :)

For actual linguistics, and the process of finding large-scale trends, the cognate sets approach would certainly be most useful.
As a conlanger, I think using reconstructions and sound changes, no matter how speculative, is most useful and cognate sets would leave me stumped.

As for coding by degree of confidence -- sure! but that would leave the 'directly attested' dataset with a heavy European bias, I believe.
Lērisama
Posts: 168
Joined: Fri Oct 18, 2024 9:51 am

Re: The Index Diachronica

Post by Lērisama »

I don't think the cognate set method is entirely empirical – the decision whether to consider a set cognate is also just the opinion of whoever wrote the paper. Admittedly it is often a trivial decision, but certainly not always¹, and it relies on the same correspondence patterns that reconstruction does (obviously) – the reconstruction is just a convenient way of writing it. I'd suggest both, including reconstructions and sound changes but also at least examples of the data that they are based on.
LZ – Lēri Ziwi
PS – Proto Sāzlakuic (ancestor of LZ)
PRk – Proto Rākēwuic
XI – Xú Iạlan
VN – verbal noun
SUP – supine
DIRECT – verbal directional
My language stuff
bradrn
Posts: 6460
Joined: Fri Oct 19, 2018 1:25 am

Re: The Index Diachronica

Post by bradrn »

Ares Land wrote: Fri Nov 29, 2024 9:14 am My first impression is that the interests of conlangers vs. actual linguists are completely at odds here :)
I don’t necessarily think this is so (and I probably made it sound more binary than it is). I think we’re both interested in the two goals of (a) investigating large-scale trends, and (b) creating a reference resource for individual languages/families. From there it’s a matter of working out how to prioritise those goals, and what level of rigour to use — which are relevant problems for conlangers as much as linguists.
Ares Land wrote: For actual linguistics, and the process of finding large-scale trends, the cognate sets approach would certainly be most useful.
As a conlanger, I think using reconstructions and sound changes, no matter how speculative, is most useful and cognate sets would leave me stumped.
I may have explained it poorly. (My excuse is that I was writing at 2am.) The idea is that you take those cognate sets and decompose them into synchronic phoneme correspondences. Thus, take a set like this one from Austronesian:

mata / moto / matəh / mtɔ / maka ‘eye’

Our approach would be to rely on the reconstruction (here PAN *mata), and write down the sound changes people have inferred from that: *a→o, *a→əh/_#, and so on. Alex’s approach would be to simply index the synchronic correspondences: thus you’d get something like m↔m, a↔o↔əh↔ɔ↔∅, t↔k. This gives you a lot less information, but preserves the most important data of which phonemes connect to each other. The advantages are, firstly, that it’s far easier to understand where the data comes from; and, secondly, that it’s much more generally applicable.
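
As a rough sketch of the correspondence-indexing idea, assuming the forms have already been aligned by hand: each column of the alignment yields one correspondence set. The alignment below, the `GAP` symbol and the function name `correspondence_sets` are illustrative assumptions, not part of anyone's actual tool.

```python
GAP = "∅"  # assumed placeholder for "no segment in this position"


def correspondence_sets(aligned_forms):
    """Return one frozenset of segments per alignment column."""
    lengths = {len(form) for form in aligned_forms}
    if len(lengths) != 1:
        raise ValueError("all aligned forms must have the same number of columns")
    # zip(*rows) transposes the alignment, giving one tuple per column
    return [frozenset(column) for column in zip(*aligned_forms)]


# Hand-made alignment of mata / moto / matəh / mtɔ / maka 'eye',
# treating "əh" as a single unit, as in the text above.
aligned = [
    ["m", "a", "t", "a"],
    ["m", "o", "t", "o"],
    ["m", "a", "t", "əh"],
    ["m", GAP, "t", "ɔ"],
    ["m", "a", "k", "a"],
]

columns = correspondence_sets(aligned)
for column in columns:
    print("↔".join(sorted(column)))
```

The first and third columns give m↔m and t↔k directly. The two vowel columns are kept separate here; taking their union gives the single a↔o↔əh↔ɔ↔∅ set written above.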

To be clear, I’m not necessarily in favour of this approach, at least not alone. I think a combined approach, with this plus reconstructed sound changes, could be very powerful. But this is something to debate.
Lērisama wrote: Fri Nov 29, 2024 10:25 am I don't think the cognate set method is entirely empirical – the decision whether to consider a set cognate is also just the opinion of whoever wrote the paper. Admittedly it is often a trivial decision, but certainly not always¹, and it relies on the same correspondence patterns that reconstruction does (obviously) – the reconstruction is just a convenient way of writing it. I'd suggest both, including reconstructions and sound changes but also at least examples of the data that they are based on.
This is a very important point. I recall discussing something similar — that cognates are identified on the basis of their sound correspondences, so inferring the sound correspondences from the cognate sets runs a risk of circular reasoning. But is it better or worse than blindly trusting the judgement of the linguist who wrote down a sound change? I don’t know…
Conlangs: Scratchpad | Texts | antilanguage
Software: See http://bradrn.com/projects.html
Other: Ergativity for Novices

(Why does phpBB not let me add >5 links here?)
Zju
Posts: 934
Joined: Fri Aug 03, 2018 4:05 pm

Re: The Index Diachronica

Post by Zju »

Ares Land wrote: Fri Nov 29, 2024 9:14 am My first impression is that the interests of conlangers vs. actual linguists are completely at odds here :)

For actual linguistics, and the process of finding large-scale trends, the cognate sets approach would certainly be most useful.
As a conlanger, I think using reconstructions and sound changes, no matter how speculative, is most useful and cognate sets would leave me stumped.

As for coding by degree of confidence -- sure! but that would leave the 'directly attested' dataset with a heavy European bias, I believe.
As a conlanger, I'm interested in both. Cognate sets beget reconstructions. If I wanted to know an innovative way to obtain /q/, I'd be interested in "all" the cognates of "all" the words that contain /q/.

It'd be extra effort, but I think it's best to contain both sets of information: correspondences, and best established reconstructions.
/j/ <j>

Ɂaləɂahina asəkipaɂə ileku omkiroro salka.
Loɂ ɂerleku asəɂulŋusikraɂə seləɂahina əɂətlahɂun əiŋɂiɂŋa.
Hərlaɂ. Hərlaɂ. Hərlaɂ. Hərlaɂ. Hərlaɂ. Hərlaɂ. Hərlaɂ.
bradrn
Posts: 6460
Joined: Fri Oct 19, 2018 1:25 am

Re: The Index Diachronica

Post by bradrn »

Zju wrote: Sat Nov 30, 2024 3:14 pm It'd be extra effort, but I think it's best to contain both sets of information: correspondences, and best established reconstructions.
Yes: this is basically what I’m thinking too.

(It’s actually not a particularly huge amount of extra effort, since the hope is that we can calculate phoneme correspondences in an automated way. In my estimation, handling the reconstructions would require most of the work.)
Lērisama
Posts: 168
Joined: Fri Oct 18, 2024 9:51 am

Re: The Index Diachronica

Post by Lērisama »

bradrn wrote: Sat Nov 30, 2024 6:52 pm
Zju wrote: Sat Nov 30, 2024 3:14 pm It'd be extra effort, but I think it's best to contain both sets of information: correspondences, and best established reconstructions.
Yes: this is basically what I’m thinking too.
I would also like to support this, for the reasons I mentioned earlier.
Zju
Posts: 934
Joined: Fri Aug 03, 2018 4:05 pm

Re: The Index Diachronica

Post by Zju »

Thanks for sharing - seems like an interesting paper.
bradrn
Posts: 6460
Joined: Fri Oct 19, 2018 1:25 am

Re: The Index Diachronica

Post by bradrn »

bradrn wrote: Fri Nov 29, 2024 9:06 am By contrast, Alex suggested a fundamentally different approach, based on his EvoSem database (which is genuinely really useful, go check it out!). In brief, the idea would be to take synchronic cognate sets, and compile sets of phoneme correspondences from those, completely ignoring how some linguist or another may have reconstructed the original situation. Of course, this would lose a large amount of information about the precise nature of diachronic sound change. In exchange, we get far more raw data, including from language families where reconstructions are poor or absent. And of course, that data would be far more reliable and empirically grounded.
There is progress on this! I wrote a Python program which can take in a database of cognates, align them, and count how often each sound correspondence is attested. Using the ACD as a convenient data set, the top 20 inferred sound correspondences are:

{g,k}: 91
{e,i}: 87
{i,o}: 77
{b,p}: 74
{h,j}: 72
{e,o}: 72
{aː,e}: 70
{l,r}: 62
{e,ə}: 61
{aː,o}: 61
{e,u}: 58
{h,t}: 58
{j,r}: 56
{h,n}: 56
{g,ʔ}: 54
{h,r}: 53
{h,l}: 53
{p,v}: 51
{h,s}: 51
{c,s}: 51
(The counts next to them are, basically, the number of disjoint groupings of languages which attest that sound correspondence at least once. But we still need to work out the best method of ranking sound correspondences, because it makes a big difference to the results.)
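
To make the counting step concrete, here is a minimal sketch under simplifying assumptions. It is not the program described above and does not implement its "disjoint groupings of languages" measure. It simply counts, for every unordered pair of distinct segments, how many cognate sets show that pair in the same alignment column at least once. The function name `count_correspondences` and the toy cognate sets are made up for illustration and are not taken from the ACD.

```python
from collections import Counter
from itertools import combinations


def count_correspondences(aligned_sets):
    """Count, per segment pair, the cognate sets attesting it at least once."""
    counts = Counter()
    for aligned_forms in aligned_sets:
        seen = set()  # pairs found in this cognate set, so each counts once
        for column in zip(*aligned_forms):
            # sorted() gives each unordered pair a single canonical form
            for a, b in combinations(sorted(set(column)), 2):
                seen.add((a, b))
        counts.update(seen)
    return counts


# Invented, pre-aligned toy cognate sets (two "languages" each).
toy_data = [
    [["m", "a", "t", "a"], ["m", "a", "k", "a"]],
    [["t", "a", "l", "u"], ["k", "o", "l", "u"]],
    [["p", "i", "t", "u"], ["p", "i", "k", "u"]],
]

counts = count_correspondences(toy_data)
for (a, b), n in counts.most_common():
    print(f"{{{a},{b}}}: {n}")
```

Counting once per cognate set, rather than once per column or per language pair, is only one of several possible choices, which is exactly why the ranking method changes the results so much.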

On the whole, I think this list looks pretty promising. Even at such an early stage, the program already produces a list of plausible-looking synchronic correspondences in a reasonable-looking order. (Keeping in mind, of course, that this is from Austronesian, where sound changes such as *C→h are very common.) I’m personally feeling very optimistic about how useful this resource could become.
Lērisama
Posts: 168
Joined: Fri Oct 18, 2024 9:51 am

Re: The Index Diachronica

Post by Lērisama »

bradrn wrote: Fri Jan 17, 2025 5:37 am [snip – it's the post above and quite long]
I quite like this, and I think it would be useful, but I don't think it's the Index Diachronica. A major part of the usefulness of the current index is that it shows how¹ these synchronic correspondences arose, and while this is useful, it's a different kind of resource built from the same material. This feels like a different thing – an Index Synchronica. The best-supported idea seemed to be including both though, so maybe this is intended as part of the index. For what it's worth my vote would be to make them two separate, but related, indices.

¹ Insert caveats here
bradrn
Posts: 6460
Joined: Fri Oct 19, 2018 1:25 am

Re: The Index Diachronica

Post by bradrn »

Lērisama wrote: Fri Jan 17, 2025 10:17 am For what it's worth my vote would be to make them two separate, but related, indices.
Yes, I agree with this. I thought at one point that they might be mergeable, but I don’t think so now.