Definition. The complexity of a language on a given sentence (let’s denote it as χ) is defined by finding each proposition which could be associated with that sentence, getting the quantum number for each such proposition, and then taking the difference between the highest and lowest calculated quantum numbers.
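Read literally, this definition is a range over quantum numbers. Here is a minimal Python sketch, assuming the quantum number for each candidate proposition has already been worked out by hand. The function name `chi` and the sample counts are hypothetical and only for illustration.

```python
def chi(quantum_numbers):
    """Complexity of a language on one sentence: highest minus lowest quantum number.

    `quantum_numbers` holds one count per proposition that could be
    associated with the sentence. How each count is obtained is left open.
    """
    counts = list(quantum_numbers)
    if not counts:
        raise ValueError("need at least one proposition")
    return max(counts) - min(counts)


# Hypothetical counts for three propositions associated with one sentence.
example_counts = [4, 6, 9]
print(chi(example_counts))
```

A sentence with only one associated proposition gets a value of zero under this reading, since the highest and lowest counts coincide.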
I think this is reasonable, but we can use cross-linguistic analysis to help us define our "minimal" number of quanta, as in my examples below. This whole discussion started as a question of whether it's actually more complex to learn Cherokee than Spanish, or if it merely feels that way. So if we compare Spanish, Cherokee, and many other languages over large passages, we should be able to quantify how many units of information must be included in Spanish or Cherokee beyond the pieces of information that are common to all or nearly all control languages. These quanta of information would include mandatory morphological marking, marked syntactic structures, suppletion, slang, and anything else that would require its own discrete lexical construction in the mind of the speaker in order to be used correctly.
Let's take three examples:
Korean: gwisin i mwusewe [ghost subj frightening] (This is the most natural translation of "I am afraid of ghosts," and does not necessarily mean "Ghosts are scary in general.")
Choctaw: shilop i~ mahlatalih [ghost 3rd-afraid-1st]
Mandarin: wo3 pa4 gui3 [1st afraid ghost]
Both Korean and Choctaw have obligatory tense, realized here more or less as null suffixes, whereas Mandarin has none (though of course every language has the option of elaborating on time). Mandarin and Choctaw require the first person to be overtly marked, but Korean lets it be implied (the assumption that an emotion verb with no overt subject marking refers to the speaker is common cross-linguistically, and even shows up in English). Korean requires that “ghost” take argument marking, while in Choctaw it needs only a third-person agreement prefix on the verb, and in Mandarin it carries no overt marking beyond its syntactic position to the right of the verb. The only things all three languages have in common are that “ghost” and “scared” appear, along with whatever syntactic alignment is the default. (Once again I am treating word choice as one quantum. In a language with fifty words for fear, each with its own default syntactic alignment, we might need to unpack the semantic aspect of complexity further. But for now I'm focusing on morpho-syntax.)
So does “scared of ghosts” inherently require person marking for the subject? Does it inherently require overt tense? Does it inherently require an adposition to specify the syntactic role of “ghost”? I would suggest that, at a fundamental level, it does not. The upshot is that person marking, argument marking, and tense all get thrown into one of those two buckets marked “redundancy” and “irregularity.” Also important: the three examples carry relatively similar amounts of redundancy/irregularity, despite having radically different amounts of “morphology” in most typological analyses.
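The comparison of the three examples can be sketched as a small set computation. This is only a rough illustration. The feature labels are my own shorthand for the obligatory markings described above, how finely to split one quantum from another is a judgment call, and the baseline here is a strict intersection across the three languages rather than "all or nearly all" of a large control sample.

```python
# Obligatory quanta per example, read off the discussion of the three sentences.
# The labels are hypothetical shorthand, not an established feature inventory.
SHARED = {"word: ghost", "word: afraid", "default syntactic alignment"}

obligatory = {
    "Korean": SHARED | {"tense", "argument marking on 'ghost'"},
    "Choctaw": SHARED | {"tense", "first-person marking", "third-person agreement"},
    "Mandarin": SHARED | {"first-person marking"},
}

# Baseline: the quanta that every language in the comparison requires.
baseline = set.intersection(*obligatory.values())


def surplus(language):
    """Quanta a language requires beyond the cross-linguistic baseline."""
    return obligatory[language] - baseline


for language in obligatory:
    extra = surplus(language)
    print(language, len(extra), sorted(extra))
```

Under this particular way of splitting things up, tense and first-person marking both fall outside the baseline, because each is absent from one of the three languages. With only three sentences the tallies are crude. The point is that the quantity being compared is the surplus over the baseline, not the total amount of morphology.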