I've done some Googling and found a relevant study. I'll quote the interesting bits here. The introduction:
F1 correlates with vowel height: the higher the vowel, the lower the F1. A number of studies have found a positive correlation between F1 and duration in vowels in various languages, such as English (Heffner, 1937; House & Fairbanks, 1953; Peterson & Lehiste, 1960; Scharf, 1962), German (Meyer, 1904; Fischer-Jørgensen, 1940; Maack, 1949), Swedish (Elert, 1964), Inari Saami (Äimä, 1918; Stone, 2014), Thai (Abramson, 1962), and Spanish (Navarro Tomas, 1916). In other words, high vowels are shorter than low vowels. This paper revisits that generalization, as well as the question of whether the generalization is phonetic (mechanical, extrinsic) or phonological (controlled, intrinsic).
The traditional explanation for the positive correlation between F1 and duration appeals to physiology: low vowels take longer to produce because of the extra time it takes for the jaw to open (e.g., Lehiste 1970:19, Lehnert-LeHouillier 2007:80), or because the jaw position of high vowels is close to the jaw position held during the production of most consonants (Catford 1977:197; Maddieson 1997; Gussenhoven 2007). An alternative explanation is that each vowel has a phonologized duration target. Several arguments have been put forward for this view. Lisker (1974) points out that if low vowels are longer because of the time it takes the jaw to move, we would expect the onset and offset formant movements towards low vowels to be longer, not the steady-state of low vowels. However, the steady state is in fact remarkably long (Lehiste & Peterson, 1961). In addition, Tauberer & Evanini (2009) found that duration does not increase as vowels are lowered in language change. Finally, Sole & Ohala (2010) present data from Japanese, Catalan and English, where they investigate the effect of speech rate on vowel duration. For Catalan and English, they find that the duration differences have different size depending on speech rates. This is not expected if the effect is solely mechanical. The results differed for Japanese, where they found a constant change in duration as the speech rate changed. Sole & Ohala (2010) conclude that the positive correlation between vowel duration and F1 is controlled (phonological or high-level phonetic) in English and Catalan, but mechanical (low-level phonetic) in Japanese.
If the duration of vowels depends directly on how much the jaw moves, we would expect a positive correlation within categories as well as between categories. In other words, multiple tokens of the same vowel (e.g., [ɪ]) would be expected to display a correlation similar to the correlation between vowels; that is, a slightly lower pronunciation of a given vowel should be slightly longer.
We investigate the vowel duration and height between and within categories in English and Swedish, using F1 as a measure of vowel height. The between-category investigation confirms previous studies: high vowels are shorter than low vowels. However, we did not find the same correlation within categories: a higher instance of the vowel [ɪ] is not shorter than a lower instance of [ɪ].
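To make the between-category versus within-category distinction concrete, here is a small Python sketch of my own (it is not from the paper). The F1 and duration values are invented toy numbers, chosen so that the category means rise together while tokens inside each category show no relationship. The vowel labels, the `tokens` dictionary, and the `pearson` helper are all assumptions made for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mean(values):
    return sum(values) / len(values)

# Hypothetical tokens: (F1 in Hz, duration in ms). NOT data from the study.
tokens = {
    "ɪ": [(370, 92), (390, 88), (410, 88), (430, 92)],
    "ɛ": [(520, 122), (540, 118), (560, 118), (580, 122)],
    "æ": [(670, 152), (690, 148), (710, 148), (730, 152)],
}

# Between categories: correlate the mean F1 of each vowel with its mean duration.
mean_f1 = [mean([f1 for f1, _ in toks]) for toks in tokens.values()]
mean_dur = [mean([dur for _, dur in toks]) for toks in tokens.values()]
between = pearson(mean_f1, mean_dur)

# Within categories: correlate F1 with duration separately for each vowel.
within = {
    vowel: pearson([f1 for f1, _ in toks], [dur for _, dur in toks])
    for vowel, toks in tokens.items()
}

print("between-category r:", round(between, 3))
for vowel, r in within.items():
    print("within", vowel, "r:", round(r, 3))
```

With these toy numbers the between-category correlation is strongly positive while each within-category correlation is zero, which is the same kind of pattern the authors describe for their English and Swedish data. A purely mechanical jaw-opening account would instead predict a positive correlation in both places.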
The conclusion:
Overall, the conclusions of our studies are clear: higher vowels are generally shorter than lower vowels, as has previously been found. However, two different realizations of the same vowel do not display the same tendency. In other words, the within-category results do not mirror the between-category results; at least not in English and Swedish.
...
We interpret these data as problematic for the physiological explanation of intrinsic vowel duration. If the effect is purely mechanical, we would expect to find it within as well as between categories. ... If the jaw opening explanation is incorrect, what explains the universal tendency for high vowels to be shorter than low vowels? Sole & Ohala (2010:647) note: “Duration is one of several distinctive manifestations of vowel identity, but the specific durational targets of vowels may have originated in biomechanical differences. The cross-linguistic tendency then has a physiological explanation, but this tendency has been phonologized in some languages and not others.” This reasoning makes sense to us.
However, the results of this study (and also the results of Lisker (1974), Tauberer & Evanini (2009), and Sole & Ohala (2010), mentioned in the introduction) suggest that a straightforward production-based analysis is problematic, at least for Swedish and English. Perhaps a perception-based analysis is possible? Perhaps high vowels somehow “sound” shorter than low vowels and this gets phonologized? This is still an open question, but the results of Gussenhoven (2007) instead seem to suggest that high vowels in fact sound longer than low vowels.