Frequencies of grammatical categories
Posted: Sat Sep 12, 2020 11:31 am
by alice
Has anyone investigated the relative frequencies of, for example, specific noun cases or verb persons? I know that the dative case and the 1pl are rare compared to the others, but is anything known in more detail?
Re: Frequencies of grammatical categories
Posted: Sat Sep 12, 2020 3:00 pm
by KathTheDragon
In what language(s)? Why should the results be entirely comparable?
Re: Frequencies of grammatical categories
Posted: Sat Sep 12, 2020 3:42 pm
by Kuchigakatai
KathTheDragon wrote: ↑Sat Sep 12, 2020 3:00 pm
In what language(s)? Why should the results be entirely comparable?
It sounds like a pretty interesting thing to do if you compare languages with somewhat similar systems, say, German's nominative/accusative/dative (plus an obsolescent genitive, largely being replaced by prepositions that take the dative) and modern Greek's nominative/accusative/genitive/vocative (whose genitive covers dative-like functions too). Or Old French's nominative/oblique vs. Classical Arabic's nominative/accusative/genitive (where the accusative, which has adverbial functions, and the genitive, which is used with almost all prepositions, together kind of correspond to the Old French oblique).
Occurrences of person agreement would be even easier to compare, even if some languages would have very obvious skewings (like neutral/colloquial-register French and its high use of the 1PL construction (which uses 3SG verb forms, e.g. "on fait") as a passive voice, or the limited use of 3PL as an impersonal passive in Spanish).
I find that a lot of these statistical questions are surprisingly understudied. Some 6 or 7 years ago we had a thread here in which Nortaneous and I tried to find studies of actual frequencies of phonemes (not letters in spellings, which is what most related studies are about) in both real texts and dictionary lemmata in any given language, of which we only found about six studies. Which were still interesting! Like how the top two vowels, typically something like [a] and [e], eclipse other vowels by a very wide margin. Or coronals being more common overall, by a certain margin I don't remember. (The thread was sadly pruned as part of the regular pruning of one-year-old threads that the L&L forum used to have...)
Re: Frequencies of grammatical categories
Posted: Sat Sep 12, 2020 7:38 pm
by bradrn
What a coincidence! I was asking about this topic just yesterday. (Specifically, I was asking about the frequencies of different types of SVC.)
Re: Frequencies of grammatical categories
Posted: Sat Sep 12, 2020 7:59 pm
by Richard W
For the persons, a lot will depend on the corpus. For example, in an encyclopaedia, the first person will be very rare, and the second person also. If the corpus consists of conversations, the first person is likely to be very common. In a book with a lot of dialogue, the 3s, 1s and 3p will be common (probably in that order), and the 1p and 2p will be almost nowhere. The 2s may be quite infrequent.
Re: Frequencies of grammatical categories
Posted: Sun Sep 13, 2020 4:55 am
by KathTheDragon
Ser wrote: ↑Sat Sep 12, 2020 3:42 pm
KathTheDragon wrote: ↑Sat Sep 12, 2020 3:00 pm
In what language(s)? Why should the results be entirely comparable?
It sounds like a pretty interesting thing to do if you compare languages with somewhat similar systems, say, German's nominative/accusative/dative (plus an obsolescent genitive, largely being replaced by prepositions that take the dative) and modern Greek's nominative/accusative/genitive/vocative (whose genitive covers dative-like functions too). Or Old French's nominative/oblique vs. Classical Arabic's nominative/accusative/genitive (where the accusative, which has adverbial functions, and the genitive, which is used with almost all prepositions, together kind of correspond to the Old French oblique).
Well, that's what I'm getting at. How d'you know if case X in language A is comparable to case Y in language B? If they happen to be closely related, you may be in luck, but most languages aren't.
A better approach may be to decompose cases into the "microfunctions" they're made up of, and count those.
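To make the counting idea concrete, here is a minimal Python sketch. It assumes a corpus where every noun token has already been hand-tagged with a microfunction. The `TOKENS` rows, the microfunction labels, and the `microfunction_shares` helper are all hypothetical, made up for illustration, and not real corpus data.

```python
from collections import Counter

# Hypothetical annotations: (language, case, microfunction) per noun token.
# These rows are invented for illustration only, not real corpus counts.
TOKENS = [
    ("German", "dative", "recipient"),
    ("German", "dative", "prepositional object"),
    ("German", "accusative", "direct object"),
    ("Greek", "genitive", "recipient"),
    ("Greek", "genitive", "possessor"),
    ("Greek", "accusative", "direct object"),
]


def microfunction_shares(tokens, language):
    """Return each microfunction's share of the tokens for one language."""
    # Count by microfunction and ignore the case label, so that languages
    # with different case inventories can be compared on the same axis.
    counts = Counter(func for lang, _case, func in tokens if lang == language)
    total = sum(counts.values())
    return {func: n / total for func, n in counts.items()}


if __name__ == "__main__":
    for language in ("German", "Greek"):
        print(language, microfunction_shares(TOKENS, language))
```

Because the case label is dropped before counting, "recipient" can be compared across the two languages even though one marks it with the dative and the other with the genitive. The hard part, deciding on the microfunction inventory and tagging the tokens, is not something this sketch addresses.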