bradrn wrote: ↑Sun Nov 10, 2024 7:20 pm
In other news: in the background, myself and Man in Space have been talking to some linguists who are very enthusiastic about having a sound change database for their own purposes. We haven’t agreed on very much yet, but hopefully we can put together a design which will be useful for everyone involved.
This has progressed! Today we held an initial online meeting to discuss the design of a future sound change database (whether a direct successor of the
ID, or of some other design). On our side, Man in Space and I were there; the other attendees were Alex François (who I mentioned earlier), as well as Charles Zhang, a student whose research involves sound change simulation.
The meeting was long and productive, and covered a lot of ground. But to me the central question which came out of the meeting was one of
empiricism: to what extent should the database be grounded in solidly attested linguistic data, as opposed to speculation?
Our current approach is determinedly anti-empirical: aside from a small amount of directly attested linguistic history (from Romance and suchlike), all our data comes from reconstructions and the comparative method. Or, to put it another way, it’s all just the personal opinions of the linguist(s) who happened to write the articles we use. We’ve contemplated working on our own reconstructions, but similarly that would just be our own opinion. We can work to make our database an
accurate reflection of the literature, but by its nature, it would still be just a pile of speculations.
By contrast, Alex suggested a fundamentally different approach, based on his
EvoSem database (which is genuinely really useful, go check it out!). In brief, the idea would be to take
synchronic cognate sets, and compile sets of phoneme correspondences from those, completely ignoring how some linguist or another may have reconstructed the original situation. Of course, this would lose a large amount of information about the precise nature of diachronic sound change. In exchange, we get far more raw data, including from language families where reconstructions are poor or absent. And of course, that data would be far more reliable and empirically grounded.
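To make that idea a bit more concrete, here’s a toy sketch of what ‘compiling phoneme correspondences from cognate sets’ could look like. Everything here is invented for illustration (the language names, the data, the function name), and it assumes the forms are already segmented and aligned position-by-position; real data would need a proper alignment step first (à la LingPy and similar tools):

```python
from collections import Counter
from itertools import combinations

def correspondences(cognate_set):
    """Count pairwise phoneme correspondences across one cognate set.

    cognate_set maps a language name to a list of phonemes. This toy
    version assumes the forms are pre-segmented and pre-aligned
    (equal length); it makes no reconstruction at all, it just
    tallies which phoneme in language A lines up with which in B.
    """
    counts = Counter()
    for (lang_a, form_a), (lang_b, form_b) in combinations(cognate_set.items(), 2):
        for ph_a, ph_b in zip(form_a, form_b):
            counts[(lang_a, ph_a, lang_b, ph_b)] += 1
    return counts

# Hypothetical data, purely for illustration:
cognates = {
    "LangA": ["p", "a", "t"],
    "LangB": ["f", "a", "t"],
}
print(correspondences(cognates))
```

Summed over thousands of cognate sets, those tallies become exactly the kind of empirically grounded correspondence data Alex is describing, without ever committing to anyone’s reconstruction.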
The question is then: which of these approaches would be more useful for actual linguists? Perhaps a pile of speculative reconstructions would be intrinsically unreliable, and thus not as helpful as we’re assuming it would be. Conversely, maybe a purely empirical approach would lose too much vital information. Of course, the possibilities are not either/or. We could combine both datasets in one database, for instance. Or there’s Charles’s suggestion that sound changes could be coded with their degree of confidence (from ‘directly attested’ to ‘completely speculative’). And, of course, there could be other possibilities entirely which we haven’t thought of.
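Charles’s suggestion could be as simple as one extra field on each record. A hypothetical sketch (the field names and confidence levels are my own invention, not anything we’ve agreed on):

```python
from dataclasses import dataclass
from enum import Enum

class Confidence(Enum):
    """How well-grounded a recorded sound change is."""
    DIRECTLY_ATTESTED = 3   # e.g. Latin > Romance, with written evidence
    RECONSTRUCTED = 2       # from the comparative method
    SPECULATIVE = 1         # one linguist's educated guess

@dataclass
class SoundChange:
    source: str        # input phoneme
    target: str        # output phoneme
    environment: str   # conditioning environment, free-text for now
    lineage: str       # which language/branch the change belongs to
    confidence: Confidence

# Example record:
change = SoundChange("p", "f", "word-initially", "Proto-X > LangB",
                     Confidence.RECONSTRUCTED)
```

Users could then filter on that field, taking only directly attested changes for strict empirical work, or the whole pile for conlanging purposes.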
So — what do you all think about this question? Is there anything that we’ve missed here?