Re: The Index Diachronica
Posted: Wed May 22, 2024 4:19 am
by Darren
- Examples: it would be nice to have examples for each sound change. This shouldn’t be too hard for any halfway reliable source, although it would make for more work. Given the hyperlinked nature of the new ID, he suggested that example words could e.g. be linked to the corresponding Wiktionary entry when present. (I think it shouldn’t be very hard to retrofit this into the existing data schema.)
That sounds like a good idea. Please badger me to do this with all the families I've done so far.
The rest of the suggestions I fully support but they sound more like coding problems which are way beyond me :/
Re: The Index Diachronica
Posted: Wed May 22, 2024 4:47 am
by bradrn
Darren wrote: ↑Wed May 22, 2024 4:19 am
That sounds like a good idea.
Great, then I’ll implement it when I get time.
Please badger me to do this with all the families I've done so far.
No need: I’m working from the same papers, so I can add the examples myself.
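A minimal sketch of how an example word could carry a Wiktionary link, assuming each example stores the attested word and the name of its language. The `Example` record and `wiktionary_url` helper are hypothetical (not part of the ID's current schema); the only external assumption is English Wiktionary's usual URL shape, `https://en.wiktionary.org/wiki/<entry>#<Language>`. Reconstructed forms live under Wiktionary's separate `Reconstruction:` namespace and are not handled here.

```python
from dataclasses import dataclass
from urllib.parse import quote

WIKTIONARY_BASE = "https://en.wiktionary.org/wiki/"


def wiktionary_url(word: str, language: str) -> str:
    """Link to an English Wiktionary entry, jumping to one language's section.

    Spaces become underscores (as in Wiktionary page titles and section
    anchors); everything else is percent-encoded so non-ASCII letters are safe.
    """
    page = quote(word.replace(" ", "_"))
    anchor = quote(language.replace(" ", "_"))
    return f"{WIKTIONARY_BASE}{page}#{anchor}"


@dataclass(frozen=True)
class Example:
    """Hypothetical example word attached to one sound change."""
    language: str  # language of the attested word, e.g. "English"
    word: str      # word as cited in the source
    gloss: str     # English gloss

    def link(self) -> str:
        return wiktionary_url(self.word, self.language)


ex = Example(language="English", word="water", gloss="water")
print(ex.link())  # https://en.wiktionary.org/wiki/water#English
```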
The rest of the suggestions I fully support but they sound more like coding problems which are way beyond me :/
Graphs are really just a data analysis problem. Maps are similar, with the added task of requiring location metadata. Neither is feasible with what we have right now, but once we have search capabilities they should be straightforward.
Improving the representation of suprasegmentals, on the other hand, is a bigger problem. It’s probably something which needs to be fixed within Brassica itself, not just in the ID. I’ve been thinking about it for quite some time, and like I said, I’m unsure how to solve it. (Probably I should make a dedicated discussion thread for the issue at some point.)
Re: The Index Diachronica
Posted: Tue Aug 27, 2024 8:07 am
by Neonnaut
Would be great to see Proto Korean to Modern Korean.
Re: The Index Diachronica
Posted: Tue Aug 27, 2024 8:13 am
by bradrn
Neonnaut wrote: ↑Tue Aug 27, 2024 8:07 am
Would be great to see Proto Korean to Modern Korean.
If you know anything about it, please feel free to start a thread for Koreanic and start writing up changes!
(The website component of this project is on hold at the moment, pending improvements to Brassica… which I suspect will be particularly important here, given the tonal nature of Middle Korean (IIRC). I’ll resume transferring changes to the website once I release Brassica v1.0.0.)
Re: The Index Diachronica
Posted: Tue Aug 27, 2024 6:19 pm
by fusijui
Neonnaut wrote: ↑Tue Aug 27, 2024 8:07 am
Would be great to see Proto Korean to Modern Korean.
I know professional Koreanists who feel the same way.
Re: The Index Diachronica
Posted: Tue Aug 27, 2024 8:33 pm
by bradrn
fusijui wrote: ↑Tue Aug 27, 2024 6:19 pm
Neonnaut wrote: ↑Tue Aug 27, 2024 8:07 am
Would be great to see Proto Korean to Modern Korean.
I know professional Koreanists who feel the same way.
Though, more seriously… if you know any, could they perhaps be persuaded to get involved?
Re: The Index Diachronica
Posted: Wed Aug 28, 2024 5:27 am
by fusijui
I have tried pointing some likely pros in this direction, but it's been quite a while -- I should give it another go, as the opportunities present themselves. Thanks for the reminder!
Re: The Index Diachronica
Posted: Wed Aug 28, 2024 5:29 am
by bradrn
fusijui wrote: ↑Wed Aug 28, 2024 5:27 am
I have tried pointing some likely pros in this direction, but it's been quite a while -- I should give it another go, as the opportunities present themselves. Thanks for the reminder!
That would be great if you could, thanks!
Re: The Index Diachronica
Posted: Sun Nov 10, 2024 7:20 pm
by bradrn
After a hiatus, I’m starting to get back into this project. Just now I finished updating the code to work with Brassica 1.0.0. From the user’s side that doesn’t yield any huge changes, but now it should be a bit easier to write changes involving tones and suchlike. Next, when I get the time (and the inclination), I’m going to go over some of the new entries in this subforum and start transferring them over to the website. I also want to experiment with writing some code for analysis, to prove that we can do something with this data more interesting than ‘format the sound changes on a website’.
In other news: in the background, Man in Space and I have been talking to some linguists who are very enthusiastic about having a sound change database for their own purposes. We haven’t agreed on very much yet, but hopefully we can put together a design which will be useful for everyone involved.
(In the course of that discussion Man in Space linked this article. It’s really great, and I strongly suggest that everyone here read it.)
Re: The Index Diachronica
Posted: Wed Nov 13, 2024 9:25 pm
by bradrn
I think it’s time I wrote down the design principles I’ve been implicitly (or explicitly) assuming:
- No original research. It could easily take a whole PhD to work out a canonical set of sound changes for a language family, and we have neither the time nor the manpower. So we should avoid doing original research by staying as close to the secondary sources as we can. (And if we ever do find ourselves conducting original research, that fact should be made very clear in the published version and in the data.)
- Reference everything. A corollary of the above: every piece of data should be accompanied by a citation pointing back to where we got that data from. (This is of course standard academic practice. Also very similar to Wikipedia’s approach.)
- Use a structured format. Currently, this is Brassica syntax for sound changes. This helps to keep everything consistent, and makes it possible to process sound changes on a computer.
- Avoid editorialising. It’s easy to misunderstand papers, and that makes the data unreliable (see: WALS). The easiest way to avoid this problem is to write down the data as transparently as possible. Thus, if a source writes down a sound change in a certain way, we should strive to write it as similarly as possible, using the same symbols and expressing it the same way. If something can’t be exactly represented in our sound change format, we should get as close as possible, then note down the rest in English text (ideally using direct quotes from the source).
- Map transcriptions to IPA. A corollary of the above is that inconsistencies in transcription between sources get carried over into our database. Mapping transcriptions to a common standard makes it possible to compare sound changes between sources, which is vital. It also exposes ambiguities where a symbol could refer to several different phones. If a source is unclear, we should note that fact but still make a best guess. (This is editorialising, but it’s fine as long as it’s identified as such.)
And, of course, the overall goal which drives all of these choices: to have a database of sound changes which is reliable and useful for as wide a range of people as possible.
I know that some of these principles have been controversial, especially (1). I’m writing them down here so that we can have a proper conversation about them. Feel free to disagree if you can think of a better approach — or conversely, mention if you have any more suggestions to add to the list.
bradrn wrote: ↑Sun Nov 10, 2024 7:20 pm
I also want to experiment with writing some code for analysis, to prove that we can do something with this data more interesting than ‘format the sound changes on a website’.
Meanwhile, I’ve made a start on this. I wrote some code (in the ‘analysis’ branch on GitHub) to extract input→output phoneme pairs from our data so far. The output is easy enough to graph with Gephi:
[Attached image: diachronica-graph.png]
I colourised it using Gephi’s cluster analysis. The results aren’t very good, I think because there’s not much data. (I’ve gotten more interesting results using the data from the old Index Diachronica, but of course the quality of that data isn’t very good.)
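The code in the ‘analysis’ branch isn’t reproduced here, but the Gephi step can be sketched roughly as follows, assuming the extraction yields a plain list of (input, output) phoneme pairs. Duplicate pairs are collapsed into one weighted edge and written as a CSV with `Source`, `Target` and `Weight` columns, which as far as I know is the edge-table shape Gephi’s spreadsheet importer expects. The function name and the sample pairs are made up for illustration and don’t come from the ID data.

```python
import csv
import io
from collections import Counter


def to_gephi_csv(pairs):
    """Turn (input, output) phoneme pairs into a weighted Gephi edge list.

    Each distinct pair becomes one directed edge; its weight is the number
    of times that pair was extracted.
    """
    counts = Counter(pairs)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (src, tgt), n in sorted(counts.items()):
        writer.writerow([src, tgt, n])
    return buf.getvalue()


# Illustrative pairs only, not real ID data.
pairs = [("k", "tʃ"), ("e", "i"), ("k", "tʃ"), ("p", "f")]
print(to_gephi_csv(pairs))
```

With the sample pairs this prints a header line followed by `e,i,1`, `k,tʃ,2` and `p,f,1`, which can then be loaded into Gephi as an edges table.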
Re: The Index Diachronica
Posted: Thu Nov 14, 2024 6:28 pm
by Man in Space
I’m getting the hang of Brassica (or trying to, at least).
Two questions:
- Will we be writing these lists inline in a Brassica document (presumably using the ; flag), or will the Brassica file be like a supplement to some sort of comprehensive page?
- How do 4 and 5 interplay? I think I'm missing something because it’s not clicking. Is 4 kind of talking about describing the conditioning/making reference to features?
Re: The Index Diachronica
Posted: Thu Nov 14, 2024 8:42 pm
by bradrn
Man in Space wrote: ↑Thu Nov 14, 2024 6:28 pm
1. Will we be writing these lists inline in a Brassica document (presumably using the ; flag), or will the Brassica file be like a supplement to some sort of comprehensive page?
So far I’ve been using a half-baked custom format (sample) which mixes Brassica sound changes with references and other metadata. I considered using comments as you say, but unfortunately the very first thing the Brassica parser does is to strip them out, so it would require some thinking about how to do that best…
2. How do 4 and 5 interplay? I think I'm missing something because it’s not clicking. Is 4 kind of talking about describing the conditioning/making reference to features?
I’m not quite sure what you’re asking here, sorry. My thought was that, to ensure accuracy, we want to preserve the original data as it was presented (4), but we also want sound changes to be comparable with each other so we need to map those source-specific conventions to something more universal (5). The alternative would be writing everything as IPA directly, which is possible but I think obscures the nature of reconstructed phonemes. (The old ID is a bit inconsistent but it basically does this, IIRC.)
Re: The Index Diachronica
Posted: Fri Nov 15, 2024 7:18 pm
by bradrn
Just remembered that fusijui made this pertinent comment a few months ago:
fusijui wrote: ↑Sun Sep 01, 2024 10:49 pm
For most if not all of the language families/groupings I personally know much about, what it sounds like you're expecting simply doesn't exist. There is not the documentation of sound changes that's published and also (plausibly) comprehensive, let alone also uncontroversial and widely accepted.
Additionally, those who have access to the kinds of material you want may volunteer to transcribe the data you want into the database structure you want, but that in itself is an ask, even if the actual goodies are even there in the first place.
Meaningful/usable results; freedom from editing + high verifiability; volunteer engagement: pick two at most.
My response to that reads as, essentially, an earlier version of those design principles I set out above. Now that I’ve written them up properly, I’d be very interested to hear if fusijui has any further thoughts.
Re: The Index Diachronica
Posted: Sat Nov 16, 2024 11:51 am
by Zju
Re "5. Map transcriptions to IPA.", I presume original notation would be presented in parallel to IPA?
I have one question, though. Say there's e → i / _Cj:
1. Does the language have /ɛ ɔ/ in the first place, or just /e o/?
2. What do we do if the phonetic nature of the phonemes is not, or hardly, discussed?
3. Would sound change sections have some metadata, e.g. how many and what phonemes the language has at the start and end of its historical / reconstructed development?
These are considerations if we want to be able to study the typology of sound change on the basis of the new ID.
Re: The Index Diachronica
Posted: Sat Nov 16, 2024 1:18 pm
by Lērisama
Based on the examples (there are more, but this is the one I could find quickly), the answer to 3. is yes, a lot of it. Presumably any ambiguities in the source would be discussed in the notes of the mapping to IPA.
Edit: I would like to help with the production of the index, but I don't currently have enough time to spare for everything I want to do. If I come across a good paper, I'd be glad to transcribe it (if I remember), and I'd be happy to review someone else's transcription. If someone has a paper they don't want to transcribe for whatever reason, I'd gladly do it, but I'm not going to spend my time hunting for papers to transcribe.
Re: The Index Diachronica
Posted: Sat Nov 16, 2024 6:36 pm
by bradrn
Zju wrote: ↑Sat Nov 16, 2024 11:51 am
Re "5. Map transcriptions to IPA.", I presume original notation would be presented in parallel to IPA?
Indeed, this is the whole point. Lērisama linked an early draft, but you can find outputs from the actual project here:
https://bradrn.com/index-diachronica/
I have one question, though. Say there's e → i / _Cj:
1. Does the language have /ɛ ɔ/ in the first place, or just /e o/?
[…]
3. Would sound change sections have some metadata, e.g. how many and what phonemes the language has at the start and end of its historical / reconstructed development?
The database (at least as it is currently designed) does indeed list phoneme inventories, when they are provided by a source. So this information should be easy enough to see.
(This was also a key point of the article I mentioned above: there is sampling bias in the sound changes reported by the literature. Providing phoneme inventories should help this somewhat, but we should continue thinking about ways to ameliorate the problem.)
2. What do we do if the phonetic nature of the phonemes is not, or hardly, discussed?
I’ve been sort of handling this case in two ways:
- If the transcription seems likely to be IPA (or consistent with IPA), then I’ve just been using it as-is with no special note.
- If the transcription seems non-IPA, I’ve had to guess what the IPA could be. For this case I added a field to the metadata to mark when the IPA is a guess vs being explicitly specified in the source.
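A minimal sketch of the two cases above, assuming a per-source table that maps each non-IPA symbol to an IPA value plus a flag saying whether that value was stated in the source or guessed by us. `SymbolMapping` and `to_ipa` are hypothetical names rather than the project’s actual metadata fields, and the `c` and `y` values are illustrative only. Symbols with no table entry are treated as already being IPA and pass through without a note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolMapping:
    """One source symbol and our IPA reading of it (hypothetical schema)."""
    source: str    # symbol exactly as printed in the source
    ipa: str       # IPA interpretation
    guessed: bool  # True if the IPA value is our inference, not stated in the source


def to_ipa(segments, table):
    """Convert a list of source segments to IPA.

    Returns the IPA string and the list of source symbols whose IPA value
    was guessed. Segments missing from the table are assumed to be IPA
    already and are copied through unchanged.
    """
    out = []
    guesses = []
    for seg in segments:
        mapping = table.get(seg)
        if mapping is None:
            out.append(seg)  # looks like IPA: use as-is, no note
        else:
            out.append(mapping.ipa)
            if mapping.guessed:
                guesses.append(seg)
    return "".join(out), guesses


# Illustrative table, not taken from any particular source.
table = {
    "c": SymbolMapping("c", "ts", guessed=True),
    "y": SymbolMapping("y", "j", guessed=False),
}
print(to_ipa(["c", "a", "y"], table))  # ('tsaj', ['c'])
```

Keeping the guessed symbols in a separate list means they can be surfaced wherever the IPA is displayed, rather than being silently merged with source-stated values.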
These are considerations if we want to be able to study the typology of sound change on the basis of the new ID.
Yes indeed. (For more on this I will refer you to that linked article, which is really very good.)
Re: The Index Diachronica
Posted: Sat Nov 16, 2024 7:58 pm
by Man in Space
Zju pretty much asked the question I was getting at in my earlier post in a better manner than I did.
Re: The Index Diachronica
Posted: Sun Nov 17, 2024 3:59 am
by Lērisama
I somehow missed that. Good to know.
I’ve been sort of handling this case in two ways:
- If the transcription seems likely to be IPA (or consistent with IPA), then I’ve just been using it as-is with no special note.
- If the transcription seems non-IPA, I’ve had to guess what the IPA could be. For this case I added a field to the metadata to mark when the IPA is a guess vs being explicitly specified in the source.
This seems unsatisfactory, but is probably the best that can be done. Maybe guesses (not just for IPA values, but sound changes the source notates ambiguously etc.) could be held to a higher standard of review?
Re: The Index Diachronica
Posted: Sun Nov 17, 2024 5:05 am
by bradrn
Lērisama wrote: ↑Sun Nov 17, 2024 3:59 am
I’ve been sort of handling this case in two ways:
- If the transcription seems likely to be IPA (or consistent with IPA), then I’ve just been using it as-is with no special note.
- If the transcription seems non-IPA, I’ve had to guess what the IPA could be. For this case I added a field to the metadata to mark when the IPA is a guess vs being explicitly specified in the source.
This seems unsatisfactory, but is probably the best that can be done. Maybe guesses (not just for IPA values, but sound changes the source notates ambiguously etc.) could be held to a higher standard of review?
Honestly, the whole review system needs further working-out. This could certainly be a part of it.