Phoneme frequency
Zomp clearly thinks that phoneme frequency is something that needs to be considered during the development of a language (he brings it up in the Language Construction Kit, p56, and gen is set up to emphasise it pretty heavily), but I feel like it's the kind of thing that will organically fall out of the process of coining words and creating affixes and such. (Maybe the problem is just that I've seen so little data on the subject of phoneme frequency in languages generally.) I know of Pabappa's Poswa as a language that heavily skews its phonology (in favour of labial sounds) for a particular feel, but has anyone else purposefully tailored the frequency of phonemes in their own language? Conversely, what phoneme frequencies fall out of a language which wasn't so tailored?
Re: Phoneme frequency
I tried to do it for Hlʉ̂, which was meant to have lots of /p b/. And, indeed, they were inspired by Pabappa’s languages. But possibly a bigger inspiration was Skou, a natlang allowing sentences such as "Ya pe ne wí wí pa ne pang pang pa" and "Pe pe wa wa po te te báng".
Ketsuban wrote: Conversely, what phoneme frequencies fall out of a language which wasn't so tailored?
This is the case for most of my languages:
- Eŋes (the most developed) ended up with a lot of /s/, mostly because several very common morphemes happen to start with it (sar- progressive, si- perfective, se- accusative). The 10 most common phonemes in a short text are /s a n i e m w o r ŋ/. (Travis has complained about the nasal distribution.) There are lots of onset clusters, the most common of which seem to be /wr wl ws ls/, though I don’t yet have enough text to rank them.
- Proto-Savanna (ancestor of Eŋes, a little less developed) has a much larger inventory with a quite different frequency ranking, /a i ə ŋ ʔ s u b n w/. I wasn’t expecting /ŋ/ to be so frequent; as with Eŋes /s/, it seems to be due to several frequent morphemes (/tʰaŋ/ singular definite article, /ŋaj/ 3s pronoun, /ŋaŋ/ negative focus). I think it’s also a clear sign that I was making up all the words from scratch, rather than deriving them from something else. Along similar lines, it’s possible to see how the Eŋes frequencies evolved from those of Proto-Savanna: vowels were syncopated, and several common /ŋ/ morphemes were simplified or lost.
- I don’t have enough text to do a proper ranking for Wēchizaŋkəŋ. In any case, its aesthetic is very heavily influenced by phonological processes which produce lots of lenited /β ɹ ɰ/ ⟨ꞵ th ch⟩. The frequencies would be very different depending on whether you look at the underlying or the surface forms: e.g. Wēchizaŋkəŋ /ˈweːɰiˌzaŋkəŋ/ is underlyingly //iwekizəmkəŋ// (ignoring the consonant mutations).
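A ranking like the ones above can be produced by counting phoneme tokens in a transcribed text. Here is a minimal Python sketch, assuming the text is already in a space-separated phonemic transcription and that multi-character phonemes such as /tʰ/ are listed explicitly (the function names are hypothetical, not from any existing tool). Feeding it just the three Proto-Savanna morphemes mentioned above shows how a handful of frequent morphemes can push /ŋ/ up the ranking.

```python
from collections import Counter


def segment(word, multigraphs):
    """Split a transcribed word into phonemes, greedily matching the
    longest listed multi-character phoneme first (e.g. 'tʰ' before 't')."""
    ordered = sorted(multigraphs, key=len, reverse=True)
    segments = []
    i = 0
    while i < len(word):
        for seg in ordered:
            if word.startswith(seg, i):
                segments.append(seg)
                i += len(seg)
                break
        else:
            # No multigraph matched here: treat this single character as a phoneme.
            segments.append(word[i])
            i += 1
    return segments


def rank_phonemes(text, multigraphs=(), top=10):
    """Count phoneme tokens in running text and return the most frequent."""
    counts = Counter()
    for word in text.split():
        counts.update(segment(word, multigraphs))
    return counts.most_common(top)


print(rank_phonemes("tʰaŋ ŋaj ŋaŋ", multigraphs=["tʰ"]))
# → [('ŋ', 4), ('a', 3), ('tʰ', 1), ('j', 1)]
```

Counting underlying rather than surface forms (as with Wēchizaŋkəŋ) would just mean feeding in a different transcription.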
Conlangs: Scratchpad | Texts | antilanguage
Software: See http://bradrn.com/projects.html
Other: Ergativity for Novices
(Why does phpBB not let me add >5 links here?)
Re: Phoneme frequency
Ketsuban wrote: ↑Sat Jan 11, 2025 3:17 am Zomp clearly thinks that phoneme frequency is something that needs to be considered during the development of a language (he brings it up in the Language Construction Kit, p56, and gen is set up to emphasise it pretty heavily), but I feel like it's the kind of thing that will organically fall out of the process of coining words and creating affixes and such.
Just to clarify, you don't need to do anything; but this is based on my experience with gen: you get a very different feel with different frequencies.
An example: here's some text from gen with one set of frequencies, then with the same phonemes, but reversing the frequencies:
1. Salo trotra iso akroituko i kuelo. Tuna pao itro ozine sranankam otu. Fesinsa aga teaso sarao nesapra sea? Tota. So ako tau matunfo sea senemuo.
2. U zui zuadada idezugu a rensonzlu. Vunuo iben uganzruu mi buzableglu gezru uizo! Drigu zan vrirlofunze gu vo? O vlusa luullo zilo didii zre iu frulu.
If you do all your words by hand, you may not think about frequencies explicitly, but you'll still impose them yourself, based on the consonants you prefer. I use gen because (again, in my experience) when I try to be 'random' I end up being quite repetitive.
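The reversed-frequency experiment above can be imitated with a toy generator. This is a minimal Python sketch, not gen's actual algorithm: it assumes a plain CV syllable structure and a hypothetical geometric drop-off in which each phoneme is a fixed fraction as likely as the one ranked above it. Both runs use the same seed, so the word lengths match and the only difference between the two outputs is the frequency ranking.

```python
import random


def rank_weights(n, dropoff=0.3):
    """Hypothetical geometric drop-off: each phoneme is (1 - dropoff)
    times as likely as the one ranked just above it."""
    return [(1 - dropoff) ** r for r in range(n)]


def make_word(rng, consonants, vowels, c_weights, v_weights, syllables):
    """Build a word from CV syllables, sampling each phoneme by weight."""
    parts = []
    for _ in range(syllables):
        c = rng.choices(consonants, weights=c_weights)[0]
        v = rng.choices(vowels, weights=v_weights)[0]
        parts.append(c + v)
    return "".join(parts)


def sample_text(consonants, vowels, reverse=False, words=8, seed=1):
    c_w = rank_weights(len(consonants))
    v_w = rank_weights(len(vowels))
    if reverse:
        # Same phonemes, opposite frequency ranking.
        c_w, v_w = c_w[::-1], v_w[::-1]
    rng = random.Random(seed)
    return " ".join(
        make_word(rng, consonants, vowels, c_w, v_w, rng.randint(1, 3))
        for _ in range(words)
    )


CONSONANTS = list("tsknrmpf")  # assumed ranking, most frequent first
VOWELS = list("aoieu")
print(sample_text(CONSONANTS, VOWELS))
print(sample_text(CONSONANTS, VOWELS, reverse=True))
```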
Re: Phoneme frequency
Apparently natural languages follow a Yule distribution in their phonemes. For this reason I prefer Lingweenie's word gen over others: it uses a distribution similar to Yule (Gusein-Zade) by default.
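For anyone who wants to try this in their own generator: one commonly cited form of the Gusein-Zade rank-frequency formula gives the r-th most frequent of n phonemes a probability of (ln(n + 1) - ln r) / n. The Python sketch below computes that, renormalised so it sums to exactly 1, alongside the exact harmonic ("broken-stick") sum that the logarithm approximates. I haven't checked how Lingweenie's generator implements it, so treat the details as an assumption; the phoneme ranking in the usage line is hypothetical.

```python
import math
import random


def gusein_zade(n):
    """Probabilities for ranks r = 1..n using p(r) = (ln(n + 1) - ln r) / n,
    renormalised, since the raw values only sum to approximately 1."""
    raw = [(math.log(n + 1) - math.log(r)) / n for r in range(1, n + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def broken_stick(n):
    """Exact harmonic form approximated by the logarithm above:
    p(r) = (1/n) * sum of 1/k for k = r..n, which sums to 1 by construction."""
    return [sum(1 / k for k in range(r, n + 1)) / n for r in range(1, n + 1)]


# Hypothetical ranking, most frequent phoneme first.
phonemes = list("aitnskemou")
weights = gusein_zade(len(phonemes))
print(random.choices(phonemes, weights=weights, k=12))
```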
Re: Phoneme frequency
well, these frequencies, like naturalism in general,
are only the result of significant use,
in time and number of speakers...
by nature every language had a beginning,
it is not abnormal for a young, or isolated language,
to deviate from the norm which is only statistical...
the languages simulated, in fiction,
do not bother with calculations,
just with the ear of their designer,
who instinctively knows how to make it sound natural...
so one might as well create one's language as it comes,
and accept its abnormalities,
in an imaginary world everything is possible,
in the real world, well, it is its own proof of its possibility...
Re: Phoneme frequency
I think I agree with xxx (although I'm not 100% sure if I know what he's saying). Sound changes tend to cause mergers and splits that influence phoneme frequency considerably, so I don't think every language would fit a specific kind of distribution. Especially if you only have a small phoneme inventory; I checked consonant frequencies for a few very small-inventoried languages and they were all over the place. And also there's bound to be a significant difference between 'wordlist frequency' (i.e. what you get out of a generator) and 'text frequency' based on chance formation of common morphemes.
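The wordlist-vs-text distinction can be made concrete by counting the same lexicon two ways: once per word (type frequency, roughly what a generator's wordlist gives you) and once per occurrence in running text (token frequency). In this minimal Python sketch the lexicon and text are hypothetical and letters stand in for phonemes; a single frequent particle "se" is enough to inflate /s/ in the text count, much like the Eŋes case mentioned earlier in the thread.

```python
from collections import Counter


def frequencies(words):
    """Relative frequency of each letter (standing in for a phoneme)."""
    counts = Counter(ch for w in words for ch in w)
    total = sum(counts.values())
    return {ph: n / total for ph, n in counts.items()}


# Hypothetical toy lexicon and text: 'se' is a very common particle.
lexicon = ["se", "taku", "miro", "palun", "kiva"]
text = "se taku se miro se kiva se palun se taku".split()

by_type = frequencies(lexicon)  # each word counted once ("wordlist frequency")
by_token = frequencies(text)    # each occurrence counted ("text frequency")
print(round(by_type["s"], 3), round(by_token["s"], 3))
```

Here /s/ is 1 of 19 letters in the wordlist but 5 of 31 in the text.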
Re: Phoneme frequency
I'm not sure i get it... all languages are going to have a certain frequency of each phoneme in general, however you calculate it (what percentage of phoneme X in a given reference corpus, what percentage of phoneme X in a given wordlist, how often it appears in some sample of natural speech, whatever) and if that language experiences this or that sound change, what's going to happen is that said list of frequencies will be some other list: /b/ will go from having a frequency of 0.3011% to 0.4277%, for example, and so on for other phonemes.
like... for any possible language and every possible way to measure the frequency of a phoneme, every phoneme in that language will have a given frequency, right? even if you make the language undergo sound changes, or lexicon changes, or whatever else, all you're doing is changing the frequencies, just like you change the width of a lump of clay while you're working it on one of those spinning-plate pottery things. it's all somewhat arbitrary lines in the sand, i grant, which features we measure and how we measure them, but then again, that's true of... everything.
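As a small illustration of that point, here is a Python sketch that applies one hypothetical sound change (intervocalic /p/ > /b/) to a toy wordlist and recomputes the letter frequencies. The measurement is the same before and after; only the numbers shift (here /b/ rises and /p/ falls). The words and the rule are made up for the example.

```python
import re
from collections import Counter


def frequencies(words):
    """Relative frequency of each letter (standing in for a phoneme)."""
    counts = Counter(ch for w in words for ch in w)
    total = sum(counts.values())
    return {ph: n / total for ph, n in counts.items()}


def voice_intervocalic_p(word):
    """Hypothetical sound change: p > b between vowels."""
    return re.sub(r"(?<=[aeiou])p(?=[aeiou])", "b", word)


before = ["apa", "pobu", "tipe", "bar", "kupi"]
after = [voice_intervocalic_p(w) for w in before]
print(after)
print(frequencies(before)["b"], frequencies(after)["b"])
```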
I've found much the same thing zompist mentions: coming up with words just in one's mind, one gets much less diversity and covers a much smaller area of the possibility space of "vibes" or "sounds" that can be had for any given set of phonemes, and gen and tools like it are very useful for visiting chunks of the possibility space that one wouldn't be able to imagine off the top of one's head. I sometimes write my own little scripts to come up with wordlists that are kinda like gen but add things like, say, a rule where phoneme X becomes Y% more likely after phoneme Z, different frequencies in onset vs. coda position, and stuff like that, and it gives you a much broader palette.
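Scripts like the ones described can be fairly small. This is a minimal Python sketch, not Torco's actual code: the phoneme weights, the separate onset and coda tables, and the single context rule (k's weight multiplied by 1.5 after a coda n) are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical weights: separate tables for onsets and codas.
ONSET_WEIGHTS = {"p": 5, "t": 4, "k": 3, "s": 2, "m": 2, "l": 1}
VOWEL_WEIGHTS = {"a": 4, "i": 2, "u": 2, "e": 1}
CODA_WEIGHTS = {"": 6, "n": 3, "s": 1}  # "" means an open syllable

# Hypothetical context rule: (previous, phoneme) -> weight multiplier.
BOOSTS = {("n", "k"): 1.5}


def pick(rng, weights, previous=None):
    """Sample one phoneme, applying any boost conditioned on the previous one."""
    options = list(weights)
    w = [weights[o] * BOOSTS.get((previous, o), 1.0) for o in options]
    return rng.choices(options, weights=w)[0]


def make_word(rng, syllables):
    """Build a (C)V(C) word, tracking the last phoneme for context rules."""
    parts, previous = [], None
    for _ in range(syllables):
        onset = pick(rng, ONSET_WEIGHTS, previous)
        vowel = pick(rng, VOWEL_WEIGHTS, onset)
        coda = pick(rng, CODA_WEIGHTS, vowel)
        parts += [onset, vowel, coda]
        previous = coda or vowel  # an empty coda leaves the vowel as context
    return "".join(parts)


rng = random.Random(42)
print([make_word(rng, rng.randint(1, 3)) for _ in range(10)])
```

Because the boost multiplies a weight rather than a probability, k ends up somewhat less than 50% more likely after n once the weights are renormalised.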
Re: Phoneme frequency
the advantage of a language with semantic primitives
is that its frequency of phonemes doesn't depend
on a random turnkey generator,
or arbitrary piecemeal choices,
but on the internal genius of the language,
which self-organizes according to the meanings of the words...
Re: Phoneme frequency
Torco wrote: ↑Mon Jan 13, 2025 11:32 am I've found much the same thing zompist mentions: just coming up with words in one's mind one will get much less diversity, one will cover a much smaller area of the possibility space of "vibes" or "sounds" that can be had for any given set of phonemes, and gen and stuff like that is very useful for sort of visiting chunks of the possibility space that one wouldn't be able to imagine just from the top of one's head. I sometimes write my own little scripts to come up with wordlists that are kinda like gen but add things like say, a rule where phoneme X becomes Y% more likely after phoneme Z, different frequencies in onset vs. coda position, and stuff like that, and it gives you a much broader palette.
The key thing is that one may have in one's head an underlying idea of how a language will sound that may not be realized if one just pulls frequencies out of a hat and assigns them to phonemes that are randomly chosen by a word generator.
Yaaludinuya siima d'at yiseka wohadetafa gaare.
Ennadinut'a gaare d'ate eetatadi siiman.
T'awraa t'awraa t'awraa t'awraa t'awraa t'awraa t'awraa.
Re: Phoneme frequency
the more specific the vibe one has in one's mind the harder it is to stumble upon an algorithm that will generate words with that vibe, but i feel as if it is probably true that for any vibe one has in mind there is at least one such algorithm that will generate words that will fit said vibe.
Re: Phoneme frequency
Torco wrote: ↑Mon Jan 13, 2025 2:06 pm the more specific the vibe one has in one's mind the harder it is to stumble upon an algorithm that will generate words with that vibe, but i feel as if it is probably true that for any vibe one has in mind there is at least one such algorithm that will generate words that will fit said vibe.
Ancient Mesopotamian language vibe algorithm
Re: Phoneme frequency
Ahzoh wrote: ↑Mon Jan 13, 2025 5:38 pm Ancient Mesopotamian language vibe algorithm
Aiming for a natlang as a model, in this case Akkadian I presume, is definitely one way to do it.
Re: Phoneme frequency
Travis B. wrote: ↑Mon Jan 13, 2025 5:52 pm Aiming for a natlang as a model, in this case Akkadian I presume, is definitely one way to do it.
Akkadian is not the only "Mesopotamian language"; you also have Sumerian, Hittite, Hurrian, Urartian, Elamite, etc. These are largely unrelated languages (Hurrian and Urartian excepted), spread widely in time, yet all have the same kind of "vibe", perhaps due to some areal influence.
Re: Phoneme frequency
Vrkhazian definitely has much more of an Akkadian vibe than a Sumerian vibe or a Hittite vibe to me.
Re: Phoneme frequency
Travis B. wrote: ↑Mon Jan 13, 2025 10:30 pm Vrkhazian definitely has much more of an Akkadian vibe than a Sumerian vibe or a Hittite vibe to me.
Yes, agreed.
In general, I think these languages end up looking more similar than they actually are, because (a) cuneiform doesn’t preserve phonemic distinctions very well and (b) they all use very similar transcription conventions when written in Latin. There are clear differences, but you have to look more closely to see them.
Re: Phoneme frequency
bradrn wrote: ↑Mon Jan 13, 2025 10:43 pm Yes, agreed.
In general, I think these languages end up looking more similar than they actually are, because (a) cuneiform doesn’t preserve phonemic distinctions very well and (b) they all use very similar transcription conventions when written in Latin. There are clear differences, but you have to look more closely to see them.
I can tell Akkadian and Sumerian, as transcribed in Latin script, apart just by glancing at them. Transcribed Sumerograms in the middle of otherwise transcribed Akkadian text look completely out of place, and would even if they weren't written in all uppercase.
Re: Phoneme frequency
Ahzoh wrote: ↑Mon Jan 13, 2025 5:38 pm Ancient Mesopotamian language vibe algorithm
relevant
Re: Phoneme frequency
bradrn wrote: ↑Mon Jan 13, 2025 10:43 pm Yes, agreed.
In general, I think these languages end up looking more similar than they actually are, because (a) cuneiform doesn’t preserve phonemic distinctions very well and (b) they all use very similar transcription conventions when written in Latin. There are clear differences, but you have to look more closely to see them.
There are clear differences but also clear similarities. The languages of that region all have very similar phonologies, with only a few outlier phonemes like Sumerian's velar nasal, and this is not just an impression created by cuneiform's limited ability to represent phonemes (the writers of the various languages were very creative).
They also have similar phonotactics and "word shape": outside of the Indo-European languages, they avoid complex onset and coda clusters, and vowel hiatus is absent or limited (unlike, e.g., Hawaiian, which can have three or more vowels in hiatus in a row). They do, however, really, really like gemination, some (Hurrian) more than others.
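Word-shape constraints like these can be encoded as a filter on generated words. A minimal Python sketch, assuming a simple romanisation in which a, e, i, o, u are the only vowel letters and every other character counts as one consonant (digraphs and long-vowel marks are not handled). It only models the non-Indo-European pattern described here, not Hittite, and the function name is hypothetical.

```python
import re

VOWELS = "aeiou"  # assumed romanisation: these are the only vowel letters


def fits_word_shape(word):
    """Hypothetical filter: no vowel hiatus, no consonant cluster at the
    start or end of the word, and at most two consonants in a row
    word-internally (coda + onset, which includes geminates)."""
    if re.search(f"[{VOWELS}]{{2}}", word):
        return False  # two adjacent vowels: hiatus
    if re.match(f"[^{VOWELS}]{{2}}", word):
        return False  # complex onset at the start of the word
    if re.search(f"[^{VOWELS}]{{2}}$", word):
        return False  # coda cluster at the end of the word
    if re.search(f"[^{VOWELS}]{{3}}", word):
        return False  # medial cluster longer than coda + onset
    return True


for w in ["šarrum", "strata", "kaia", "tarkt", "ankta"]:
    print(w, fits_word_shape(w))
```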
And grammatically, the languages all tend to be very agglutinating, though more in what they do with verbs than in what they do with nouns (that is, they don't do the kind of noun-derivation morpheme stacking you get in Turkish). Suffixaufnahme is also common. I should also mention that many of these languages display some degree of "ergativeness", some leaning more ergative-absolutive (like Sumerian and Hurro-Urartian), others more split-ergative (like Hittite).
Interestingly, Urartian is the only non-Semitic language of the region with so-called "emphatic consonants", though the Hurro-Urartian family has also been speculated to be related to the Caucasian languages, so that probably tracks.
Last edited by Ahzoh on Tue Jan 14, 2025 2:08 pm, edited 1 time in total.
Re: Phoneme frequency
Remember, though, that Semitic "emphatic" consonants originated as simple ejectives (as still reflected in South Semitic), and ejectives are by no means unique to Semitic. Also, original ejectives have had a similar outcome elsewhere, e.g. in the Berber languages.
Re: Phoneme frequency
Travis B. wrote: ↑Tue Jan 14, 2025 2:02 pm Remember, though, that Semitic "emphatic" consonants originated as simple ejectives, as still reflected by South Semitic, which are by no means unique to Semitic. Also, a similar outcome of original ejectives has happened elsewhere, e.g. in the Berber languages.
They are called emphatic because they often don't stay ejectives. All we can say is that the speakers distinguished them from their plain and voiced counterparts.
And in this case, the nature of Urartian's "emphatic" series (likely borrowed terminology) is unknown, just like that of Akkadian's emphatics, so we can't say whether they were exactly ejectives.