The Index Diachronica

bradrn · Post by **bradrn** » Thu Sep 01, 2022 7:51 pm

Moose-tache wrote: ↑Thu Sep 01, 2022 10:09 am What I mean is, even if it's cited, one single ZBBer might just have a hobby horse for, say, glottalic theory or something. I feel much more confident in the result if multiple contributors have looked at it, possibly made tweaks, and then said "Yup, that's good." I mean, if nobody is going to check my work, I could cite Edo Nyland for my Basque-to-Modern-Xhosa sound changes. Ideally we should have more than one person who can provide some review to each entry.

Yes, of course — we definitely need to have some review and approval process before changes can be added. Sorry for not explicitly stating that earlier.

fusijui · Post by **fusijui** » Thu Sep 01, 2022 9:45 pm

I could do something covering Tungusic pretty readily, a little less readily for Mongolic, and with some actual inconvenience/legwork, I suppose, Salishan, Turkic, Bodic, maybe some others.

bradrn · Post by **bradrn** » Thu Sep 01, 2022 10:54 pm

fusijui wrote: ↑Thu Sep 01, 2022 9:45 pm I could do something covering Tungusic pretty readily …

Yes please! And I know Moose-tache has also mentioned they know something about Tungusic, which might make it a bit easier.

Man in Space · Post by **Man in Space** » Fri Sep 02, 2022 3:55 pm

OK. So how do we want to organize things? One thread per language family where we compile everything and discuss?

Are there any FOSS platforms we can use to construct the site? My coding skills leave much to be desired (Bootstrap and HTML are about as much as I can do in these regards, and even that’s stretching it.)

bradrn · Post by **bradrn** » Fri Sep 02, 2022 10:01 pm

Man in Space wrote: ↑Fri Sep 02, 2022 3:55 pm OK. So how do we want to organize things? One thread per language family where we compile everything and discuss?

Sounds good to me. Also one thread for discussion about the software and database organisation.

Are there any FOSS platforms we can use to construct the site? My coding skills leave much to be desired (Bootstrap and HTML are about as much as I can do in these regards, and even that’s stretching it.)

Our problem domain is fairly specialised, so I strongly suspect we’ll need to make our own website. I can write Haskell, and have made a website using it before; it was bad, but I’m a lot better now. Of rather more significance is that I’ve also made a sound change applier with it, and I can report that Haskell is pretty good at representing sound changes. (We could even use that SCA directly, as a library, though I doubt all our sound changes will fit into its format yet.) But I’m fine to use any other language if that’s what other people find easier — although we do seem to have rather a lot of Haskell developers here…

dhok · Post by **dhok** » Sat Sep 03, 2022 3:35 pm

I think we should think a little bit about how sound changes should be represented. I'm going to go ahead and say that I don't think everything should be normalized to IPA—for languages in the Americas, for example, Americanist transcription is fine and (IMO) looks a lot better; Indo-European's in-house transcription is so ubiquitous that it would be flat-out bizarre (and misleading given e.g. the uncertainty surrounding the laryngeals) to try and normalize it.

We may wish to restrict ourselves to a handful of transcription systems, with all pages stating explicitly what transcription they are using. Ad hoc cover symbols (for series of stops at the same POA, for example) should be defined rigorously.
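One possible way to enforce this, as a minimal sketch only: each page records its transcription system and an explicit definition for every cover symbol, and a rule can be checked against those definitions. The class name `FamilyPage`, the list of allowed systems, and the convention that cover symbols are single upper-case letters are all assumptions made for illustration; nothing here reflects an agreed design.

```python
from dataclasses import dataclass, field

# Hypothetical set of permitted transcription systems (illustrative names only).
ALLOWED_TRANSCRIPTIONS = {"IPA", "Americanist", "Indo-Europeanist"}


@dataclass
class FamilyPage:
    """One page of sound changes that states its own transcription explicitly."""
    family: str
    transcription: str
    # Maps each ad hoc cover symbol to the explicit set of segments it stands for.
    cover_symbols: dict[str, frozenset[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.transcription not in ALLOWED_TRANSCRIPTIONS:
            raise ValueError(f"unknown transcription system: {self.transcription!r}")

    def undefined_symbols(self, rule: str) -> set[str]:
        """Return the upper-case cover symbols used in `rule` that the page never defines."""
        # Assumption: any upper-case letter in a rule is a cover symbol.
        used = {ch for ch in rule if ch.isupper()}
        return used - set(self.cover_symbols)


# Placeholder page: T is defined, D and V are not.
page = FamilyPage("Example family", "Americanist", {"T": frozenset({"p", "t", "k"})})
missing = page.undefined_symbols("T > D / V_V")
```

Here `missing` contains `D` and `V`, so the page would be rejected until those symbols are defined as well.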

bradrn · Post by **bradrn** » Sat Sep 03, 2022 8:23 pm

dhok wrote: ↑Sat Sep 03, 2022 3:35 pm I think we should think a little bit about how sound changes should be represented. I'm going to go ahead and say that I don't think everything should be normalized to IPA—for languages in the Americas, for example, Americanist transcription is fine and (IMO) looks a lot better; Indo-European's in-house transcription is so ubiquitous that it would be flat-out bizarre (and misleading given e.g. the uncertainty surrounding the laryngeals) to try and normalize it.

We may wish to restrict ourselves to a handful of transcription systems, with all pages stating explicitly what transcription they are using. Ad hoc cover symbols (for series of stops at the same POA, for example) should be defined rigorously.

I agree this is something we need to consider. (The example that came to mind for me was Proto-Austronesian, with its /*R *D *T *C *N *S/.) But I really like your suggestion of having each page specify its own transcription: it seems like a very sensible compromise to me.

As for ad-hoc symbols, I agree that they should be defined rigorously — though I note that in many cases they can be entirely replaced with featural specifications, which are less problematic.

dhok · Post by **dhok** » Sun Sep 04, 2022 7:18 am

bradrn wrote: ↑Sat Sep 03, 2022 8:23 pm
dhok wrote: ↑Sat Sep 03, 2022 3:35 pm I think we should think a little bit about how sound changes should be represented. I'm going to go ahead and say that I don't think everything should be normalized to IPA—for languages in the Americas, for example, Americanist transcription is fine and (IMO) looks a lot better; Indo-European's in-house transcription is so ubiquitous that it would be flat-out bizarre (and misleading given e.g. the uncertainty surrounding the laryngeals) to try and normalize it.

We may wish to restrict ourselves to a handful of transcription systems, with all pages stating explicitly what transcription they are using. Ad hoc cover symbols (for series of stops at the same POA, for example) should be defined rigorously.
I agree this is something we need to consider. (The example that came to mind for me was Proto-Austronesian, with its /*R *D *T *C *N *S/.) But I really like your suggestion of having each page specify its own transcription: it seems like a very sensible compromise to me.

As for ad-hoc symbols, I agree that they should be defined rigorously — though I note that in many cases they can be entirely replaced with featural specifications, which are less problematic.

I'm not so sure they are in all instances. Does [+velar] include the palato-velar series of IE and Athabaskan? Is /ð/ a voiced fricative or an approximant? Is /a/ a front vowel or a back one?

In general, I think theoretical phonology is too heavily wedded to synchronic spherical-cow-in-a-vacuum analysis. What sort of a sound a phoneme or phone is is mostly dictated by its phonetics, but around the edges it's also (I think) determined by what it does, and that's usually a question of diachronics in the end. If /ð/ patterns with /θ/ (with regard to POA) or with /v z ʒ/ (with regard to MOA) it is a fricative; if it patterns with /r l/ it is an approximant. This could be true even if the phonetics of /ð/ are identical in two languages: its phonology could be wildly different.

I have definitely been annoyed in the past by the use of featural representations on the Index that didn't specify exactly what they meant. I am almost tempted to say the ideal format for sound changes should be a heavily-annotated Brassica file: all categories and features must be specified before they can be used.

This is even more true of proto-languages, where it may be possible to determine certain features of a sound quite precisely from the changes it undergoes or induces in the daughters, even though its exact identity cannot be pinpointed.

Also, it's always worth remembering that sound changes (at least initially) apply to phones and not phonemes, and to the extent a change acts on a feature it may only cover certain allophones of a phoneme. Since we usually want to write sound changes phonemically, we usually solve this by adding an environment, but the change itself may be effectively unconditional.
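The "define before use" idea above can be sketched roughly as follows. This is not Brassica's format or API; `Phonology`, `define`, and `members` are hypothetical names, and the two languages are invented. The point is only that category membership is declared per language, so a phonetically identical /ð/ can be a fricative in one language and an approximant in another, and an undeclared category is an error rather than a guess.

```python
class UndefinedCategory(Exception):
    """Raised when a rule refers to a category that was never declared."""


class Phonology:
    """Per-language category table: membership is declared, never assumed."""

    def __init__(self, language: str) -> None:
        self.language = language
        self.categories: dict[str, frozenset[str]] = {}

    def define(self, name: str, segments: str) -> None:
        # Segments are given as a space-separated string, e.g. "v z ʒ ð".
        self.categories[name] = frozenset(segments.split())

    def members(self, name: str) -> frozenset[str]:
        if name not in self.categories:
            raise UndefinedCategory(f"{self.language}: category {name!r} is not defined")
        return self.categories[name]


# Two invented languages in which /ð/ patterns differently.
lang_a = Phonology("Language A")
lang_a.define("fricative", "θ ð v z ʒ")
lang_a.define("approximant", "r l")

lang_b = Phonology("Language B")
lang_b.define("fricative", "θ v z ʒ")
lang_b.define("approximant", "r l ð")
```

Asking either language for `members("velar")` raises `UndefinedCategory`, which forces whoever writes the entry to say whether a palato-velar series is included.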

bradrn · Post by **bradrn** » Sun Sep 04, 2022 10:00 am

dhok wrote: ↑Sun Sep 04, 2022 7:18 am
bradrn wrote: ↑Sat Sep 03, 2022 8:23 pm
dhok wrote: ↑Sat Sep 03, 2022 3:35 pm I think we should think a little bit about how sound changes should be represented. I'm going to go ahead and say that I don't think everything should be normalized to IPA—for languages in the Americas, for example, Americanist transcription is fine and (IMO) looks a lot better; Indo-European's in-house transcription is so ubiquitous that it would be flat-out bizarre (and misleading given e.g. the uncertainty surrounding the laryngeals) to try and normalize it.

We may wish to restrict ourselves to a handful of transcription systems, with all pages stating explicitly what transcription they are using. Ad hoc cover symbols (for series of stops at the same POA, for example) should be defined rigorously.
I agree this is something we need to consider. (The example that came to mind for me was Proto-Austronesian, with its /*R *D *T *C *N *S/.) But I really like your suggestion of having each page specify its own transcription: it seems like a very sensible compromise to me.

As for ad-hoc symbols, I agree that they should be defined rigorously — though I note that in many cases they can be entirely replaced with featural specifications, which are less problematic.
I'm not so sure they are in all instances.

I agree, which is why I said ‘many cases’, not ‘all cases’. That being said, your point that features need to be well-defined too is a good one.

Is /a/ a front vowel or a back one?

Neither; it’s central.

I have definitely been annoyed in the past by the use of featural representations on the Index that didn't specify exactly what they meant. I am almost tempted to say the ideal format for sound changes should be a heavily-annotated Brassica file: all categories and features must be specified before they can be used.

As it happens, I’ve been thinking along precisely the same lines (and not just because it’s my own SCA!). I’ve already mentioned the possibility of using Brassica itself as a backend for this project; perhaps at some point we might even be able to make some kind of auto-export functionality. Then again, that may be doomed to failure, since I have a horrible suspicion that even the most detailed of well-researched sound changes are thoroughly incomplete — unless, of course, that’s a problem an auto-exporter could help solve…

bradrn · Post by **bradrn** » Tue Sep 06, 2022 11:52 pm

In the hope of inspiring further discussion, I have created two threads for specific language families: one for Tungusic (because we have at least two people here who are knowledgeable about that family), and one for Anglic (because I’ve already started collecting changes for that). But feel free to make more if you want!

bradrn · Post by **bradrn** » Fri Sep 09, 2022 10:21 am

Znex made what struck me as some very relevant points in the Anglic thread, on how to get a fully correct list of ordered sound changes:

Znex wrote: ↑Thu Sep 08, 2022 10:27 am … if you're not sure, you want to find multiple pages and examples of sound changes to corroborate any particular sound change.

If sound changes happen in any particular order, it's not too hard to figure out by comparing examples and seeing what words are affected by some change and not others. And there's plenty of information out there to compare, especially considering it's English we're talking about here.
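As a rough illustration of how comparing examples can pin down relative order, the sketch below tries every ordering of a rule set and keeps those that derive all the attested forms. The rules and word forms are invented toy data, not taken from English or any real language, and regular expressions are only a stand-in for a proper sound change applier.

```python
import re
from itertools import permutations

# Invented toy rules, not taken from any real language.
RULES = {
    "voicing": (r"(?<=[aeiou])t(?=[aeiou])", "d"),  # t > d between vowels
    "apocope": (r"[aeiou]$", ""),                   # final vowel is lost
}


def apply_in_order(word: str, order: tuple[str, ...]) -> str:
    """Apply the named rules to `word` one after another."""
    for name in order:
        pattern, replacement = RULES[name]
        word = re.sub(pattern, replacement, word)
    return word


def consistent_orders(pairs: list[tuple[str, str]]) -> list[tuple[str, ...]]:
    """Return every ordering of RULES that derives all attested forms."""
    return [
        order
        for order in permutations(RULES)
        if all(apply_in_order(proto, order) == attested for proto, attested in pairs)
    ]


# Toy evidence: earlier "ata" corresponds to later "ad".
orders = consistent_orders([("ata", "ad")])
```

Only the ordering with voicing before apocope survives here: if the final vowel were lost first, the t would no longer stand between vowels and could not voice.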

In particular, I’m not quite sure how we should deal with the latter point: it’s indisputably true that comparing examples helps to verify changes, but that seems slightly too close to ‘original research’ for my comfort. Besides, how can we do anything like this with language families we’re not familiar with (which, let’s face it, is most of them)?

In fact, now that I look through this thread again, I note dhok said pretty much the same thing:

dhok wrote: ↑Wed Aug 31, 2022 10:32 am Also, some measure of allowing for "original research" might not be the worst idea...the problem is that to really know the context for a historical phonology paper you do want to have some measure of familiarity with the language or its family.

But we don’t seem to have properly discussed this point yet. (Currently I’m leaning towards Kuchigakatai’s idea of including argumentation for specific sound changes when something isn’t obvious from the reference alone, which would allow this ‘original research’ to be included in a principled way, but I’m still not quite sure.)

Nortaneous · Post by **Nortaneous** » Tue Oct 11, 2022 9:43 pm

bradrn wrote: ↑Wed Aug 31, 2022 11:48 pm However, I think we have some room to include less well-vetted sources too. This is why I like the idea of a ‘tiers’ system so much: it lets us include more data, while still maintaining reliability. Then, if the user only wants vetted sources, they can filter for ‘most reliable’ or whatever we end up calling it. As long as they’re not actively wrong, I see no reason to exclude sources just because they don’t comprehensively present a complete history.
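A tiers system of this kind might be modelled along the following lines. This is only a sketch under assumptions: the tier names, the `Entry` fields, and the sample rows are all placeholders, since the thread has not settled on how many tiers there would be or what they would be called.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical reliability tiers; a higher value means more thoroughly vetted."""
    UNREVIEWED = 0
    CITED = 1
    REVIEWED = 2


@dataclass(frozen=True)
class Entry:
    family: str
    change: str
    source: str
    tier: Tier


def at_least(entries: list[Entry], minimum: Tier) -> list[Entry]:
    """Keep only the entries whose tier is `minimum` or better."""
    return [e for e in entries if e.tier >= minimum]


# Placeholder entries; families, changes and sources are not real data.
entries = [
    Entry("Family X", "t > d / V_V", "placeholder source 1", Tier.REVIEWED),
    Entry("Family X", "k > x", "placeholder source 2", Tier.UNREVIEWED),
    Entry("Family Y", "s > h / #_", "placeholder source 3", Tier.CITED),
]
vetted = at_least(entries, Tier.REVIEWED)
```

Using an ordered enumeration means a "most reliable only" filter and a "cited or better" filter are the same function with a different threshold.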

cf. Index Phonemica, which mostly takes the caveat-emptor approach of just indexing and making searchable data from papers (incl. SIL phonology sketches, which are often quite bad... but which are also often the only source on a language), but which has editorial conventions (as I understand it mostly for tractability - trying to minimize diphthong inventories by positing glides, that sort of thing)

and Pyysalo's "System PIE" web app, built around the idea that the sound changes from Pyysalo-Indo-European to the various languages were complete enough that the program could compute the regular form and highlight irregularities. Building a searchable dictionary of every language is probably out of scope, but maybe we could import extant databases somehow? (ABVD, the online Proto-Siouan dictionary, etc... unfortunately for Indo-European there isn't a better resource than Wiktionary)
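The general idea of computing the regular form and highlighting irregularities could look something like the sketch below. It is not how System PIE is implemented (its internals are not described in this thread); the rule list and word pairs are invented, and `regular_outcome` and `irregular` are hypothetical names.

```python
import re

# Invented ordered rule list for an invented language; purely illustrative.
ORDERED_RULES = [
    (r"(?<=[aeiou])t(?=[aeiou])", "d"),  # t > d between vowels
    (r"[aeiou]$", ""),                   # final vowel is lost
]


def regular_outcome(proto: str) -> str:
    """Apply every rule in order and return the predicted reflex."""
    for pattern, replacement in ORDERED_RULES:
        proto = re.sub(pattern, replacement, proto)
    return proto


def irregular(pairs: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return (proto, expected, attested) for each pair the rules fail to predict."""
    return [
        (proto, regular_outcome(proto), attested)
        for proto, attested in pairs
        if regular_outcome(proto) != attested
    ]


# Toy lexicon: the second pair does not follow the rules.
flagged = irregular([("ata", "ad"), ("iti", "it")])
```

In this toy data only the second pair is flagged, because the rules predict "id" where "it" is attested. A check like this is only as good as the completeness of the rule list, which is the concern raised earlier in the thread.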

bradrn wrote: ↑Sat Sep 03, 2022 8:23 pm I agree this is something we need to consider. (The example that came to mind for me was Proto-Austronesian, with its /*R *D *T *C *N *S/.)

Austronesian will be tricky - I don't think there's a consensus inventory or transcription system the way there is for PIE. Using Blust's reconstruction and transcription system is defensible enough, but some of the graphical values are misleading (e.g. *N), and that can't capture correspondences relating to extensions to Blust's inventory. Maybe it'd be better to start with subgroups, but I don't know of any comprehensive overviews of even as well-studied a subgroup as Polynesian.

In cases where there's no single authoritative source, an introduction and bibliography might be necessary, although I'm not sure how exactly that should be handled.

bradrn · Post by **bradrn** » Wed Oct 12, 2022 2:35 am

Nortaneous wrote: ↑Tue Oct 11, 2022 9:43 pm
bradrn wrote: ↑Wed Aug 31, 2022 11:48 pm However, I think we have some room to include less well-vetted sources too. This is why I like the idea of a ‘tiers’ system so much: it lets us include more data, while still maintaining reliability. Then, if the user only wants vetted sources, they can filter for ‘most reliable’ or whatever we end up calling it. As long as they’re not actively wrong, I see no reason to exclude sources just because they don’t comprehensively present a complete history.
cf. Index Phonemica, which mostly takes the caveat-emptor approach of just indexing and making searchable data from papers (incl. SIL phonology sketches, which are often quite bad... but which are also often the only source on a language), but which has editorial conventions (as I understand it mostly for tractability - trying to minimize diphthong inventories by positing glides, that sort of thing)

Interesting point, although I’m not terribly familiar with IP, so I don’t know how well this works in practice. (I know that at least PHOIBLE can be rather suspect, but that one has different indexing conventions…) I imagine we’ll need some editorial conventions either way.

and Pyysalo's "System PIE" web app, built around the idea that the sound changes from Pyysalo-Indo-European to the various languages were complete enough that the program could compute the regular form and highlight irregularities.

I’m even less familiar with this one, though I confess that similar ideas have occurred to me. How well does it work, though?

Building a searchable dictionary of every language is probably out of scope, but maybe we could import extant databases somehow? (ABVD, the online Proto-Siouan dictionary, etc... unfortunately for Indo-European there isn't a better resource than Wiktionary)

This would definitely be achievable (modulo licensing restrictions of course), by web-scraping if nothing else, but it might perhaps be somewhat out of scope for a new ID. As a follow-up project, perhaps?

bradrn wrote: ↑Sat Sep 03, 2022 8:23 pm I agree this is something we need to consider. (The example that came to mind for me was Proto-Austronesian, with its /*R *D *T *C *N *S/.)
Austronesian will be tricky - I don't think there's a consensus inventory or transcription system the way there is for PIE. Using Blust's reconstruction and transcription system is defensible enough, but some of the graphical values are misleading (e.g. *N), and that can't capture correspondences relating to extensions to Blust's inventory. Maybe it'd be better to start with subgroups, but I don't know of any comprehensive overviews of even as well-studied a subgroup as Polynesian.

I wasn’t aware that the transcription system for Austronesian was unstandardised. Personally, I’m inclined to just use the one corresponding to the reconstruction on which we end up relying most heavily.

(Also, I’m surprised to hear that even Polynesian doesn’t have any comprehensive overview. It shouldn’t be too hard to collect one, though…)

In cases where there's no single authoritative source, an introduction and bibliography might be necessary, although I'm not sure how exactly that should be handled.

By adding introduction and bibliography sections to the web interface, perhaps? I was thinking we’d need those in any case.

Actually, that reminds me — I’ve been meaning to make a mockup of the web interface for the Middle English sound changes, those being the only ones which are anything like complete at the moment. It’s escaped my attention due to my being utterly swamped with other work at the moment, but hopefully I’ll get around to it sometime in the next month or two.

Zju · Post by **Zju** » Wed Oct 12, 2022 12:23 pm

Nortaneous wrote: ↑Tue Oct 11, 2022 9:43 pm ....and Pyysalo's "System PIE" web app, built around the idea that the sound changes from Pyysalo-Indo-European....

Uuuurrrghhhhhhhhh........ Is System PIE still a thing? Now, about that famous dinosaur conlang of mine...

Nortaneous wrote: ↑Tue Oct 11, 2022 9:43 pm ...unfortunately for Indo-European there isn't a better resource than Wiktionary

Do you mean to tell me that I've been using the best PIE resource there is for quite a while?

Nortaneous · Post by **Nortaneous** » Wed Oct 12, 2022 7:49 pm

bradrn wrote: ↑Wed Oct 12, 2022 2:35 am
and Pyysalo's "System PIE" web app, built around the idea that the sound changes from Pyysalo-Indo-European to the various languages were complete enough that the program could compute the regular form and highlight irregularities.
I’m even less familiar with this one, though I confess that similar ideas have occurred to me. How well does it work, though?

I haven't spent much time with the web app, but it seems almost decent - the main technical problem is that there's no real search, but this is typical for academic projects. (see also: PHOIBLE)

The trouble is that the reconstruction makes no sense - the two laryngeals (*h *ɦ) can only occur next to the low vowel *ɑ, which itself is typically lost between two consonants, so, for example, Albanian derë, which in the usual reconstruction goes back to *dhwōr-eh2, is given as descending from... *dɦɑoirēɑh.

Which can't even be mechanically converted to the standard model, because it's saying the /e/ comes from *oi > *ai > e rather than *ō > *ø > (v)e. (I'm not even sure if Pyysalo recognizes this rule - the only example I can think of where v- is preserved is vesh 'ear', which I can't find. I'd search the 'show all' page - you can only search on the page you're looking at, which is truly bizarre - but my computer doesn't have enough free memory to render the page.)

(To be fair, good search is a difficult technical and design problem... but adequate search would be doable with a decent working knowledge of SQL. Which every web developer ought to know. Especially the academic ones - I've seen people do things with pandas that should be illegal.)
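For what "adequate search with SQL" might mean in practice, here is a minimal sketch using Python's standard `sqlite3` module. The table layout, column names, and rows are assumptions made up for illustration; they are not a proposed schema and the rows are not real sound changes.

```python
import sqlite3

# In-memory database with a hypothetical one-table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE changes ("
    "family TEXT, source_sound TEXT, target_sound TEXT, environment TEXT)"
)
conn.executemany(
    "INSERT INTO changes VALUES (?, ?, ?, ?)",
    [
        # Placeholder rows only.
        ("Family X", "t", "d", "V_V"),
        ("Family Y", "t", "θ", ""),
        ("Family Y", "k", "x", ""),
    ],
)


def search(conn, source=None, target=None):
    """Find changes by source and/or target sound, using bound parameters."""
    clauses, params = [], []
    if source is not None:
        clauses.append("source_sound = ?")
        params.append(source)
    if target is not None:
        clauses.append("target_sound = ?")
        params.append(target)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = (
        "SELECT family, source_sound, target_sound, environment FROM changes"
        + where
        + " ORDER BY family"
    )
    return conn.execute(sql, params).fetchall()


from_t = search(conn, source="t")
```

Because the query runs over the whole table rather than the page currently on screen, it avoids the "you can only search on the page you're looking at" problem described above.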

Man in Space · Post by **Man in Space** » Wed Oct 12, 2022 8:56 pm

Nortaneous wrote: ↑Wed Aug 31, 2022 8:58 pm (Is "Index Diachronica" valid Latin? Cf. "Index Librorum Prohibitorum", etc. I talked to the Index Phonemica guy a few times and he was pretty unhappy about this.)

I just realized I never responded to this…anyway, the answer, at least if you believe Wiktionary, is yes.

Travis B. · Post by **Travis B.** » Thu Oct 13, 2022 11:27 am

Nortaneous wrote: ↑Wed Oct 12, 2022 7:49 pm
bradrn wrote: ↑Wed Oct 12, 2022 2:35 am
and Pyysalo's "System PIE" web app, built around the idea that the sound changes from Pyysalo-Indo-European to the various languages were complete enough that the program could compute the regular form and highlight irregularities.
I’m even less familiar with this one, though I confess that similar ideas have occurred to me. How well does it work, though?
I haven't spent much time with the web app, but it seems almost decent - the main technical problem is that there's no real search, but this is typical for academic projects. (see also: PHOIBLE)

The trouble is that the reconstruction makes no sense - the two laryngeals (*h *ɦ) can only occur next to the low vowel *ɑ, which itself is typically lost between two consonants, so, for example, Albanian derë, which in the usual reconstruction goes back to *dhwōr-eh2, is given as descending from... *dɦɑoirēɑh.

Which can't even be mechanically converted to the standard model, because it's saying the /e/ comes from *oi > *ai > e rather than *ō > *ø > (v)e. (I'm not even sure if Pyysalo recognizes this rule - the only example I can think of where v- is preserved is vesh 'ear', which I can't find. I'd search the 'show all' page - you can only search on the page you're looking at, which is truly bizarre - but my computer doesn't have enough free memory to render the page.)

(To be fair, good search is a difficult technical and design problem... but adequate search would be doable with a decent working knowledge of SQL. Which every web developer ought to know. Especially the academic ones - I've seen people do things with pandas that should be illegal.)

Es scheint, dass man Deutsch kennen muss, diese Webanwendung zu benutzen. (It seems that one has to know German to use this web app.)

Raphael · Post by **Raphael** » Thu Oct 13, 2022 11:32 am

Travis B. wrote: ↑Thu Oct 13, 2022 11:27 am
Es scheint so, als ob man Deutsch kennen muss, um diese Webanwendung zu benutzen. (It seems as if one has to know German in order to use this web app.)

(Sorry, I don't remember how to make words red right now.)

Man in Space · Post by **Man in Space** » Sat Oct 22, 2022 3:44 pm

I just realized I ought also to mention SquiDark’s errata compilation.

Also: How much of the old Correspondence Library proper (that is to say, the OG changes from the old forum and KneeQuickie) would it be feasible to rely on while maintaining quality?

bradrn · Post by **bradrn** » Sat Oct 22, 2022 7:37 pm

Man in Space wrote: ↑Sat Oct 22, 2022 3:44 pm I just realized I ought also to mention SquiDark’s errata compilation.

You (or someone else) already mentioned it on the other thread — but thanks for reminding us about it!

Also: How much of the old Correspondence Library proper (that is to say, the OG changes from the old forum and KneeQuickie) would it be feasible to rely on while maintaining quality?

Wait, I thought the ID was the Correspondence Library… where do the sound changes in the former come from, then?
