Linguistic Miscellany Thread
Re: Linguistic Miscellany Thread
The same story also appears in Isaiah (chapters 36 to 39).
-
- Site Admin
- Posts: 2944
- Joined: Sun Jul 08, 2018 5:46 am
- Location: Right here, probably
- Contact:
Re: Linguistic Miscellany Thread
Ah right, thanks. I should have checked!
Re: Linguistic Miscellany Thread
I got the story wrong, btw -- I didn't remember it very well.
Hezekiah's emissaries wanted to conduct the negotiations in Aramaic, but the Assyrian envoy insisted on Hebrew -- he wanted to deliver demoralizing propaganda.
It seems the Assyrians really were assholes. The Persians had many faults, but they did make an effort to be decent rulers. By contrast the Assyrians were more destructive.
Last edited by Ares Land on Mon Jul 25, 2022 3:16 am, edited 1 time in total.
-
- Site Admin
- Posts: 2944
- Joined: Sun Jul 08, 2018 5:46 am
- Location: Right here, probably
- Contact:
Re: Linguistic Miscellany Thread
Oh, no question about that. They regularly deported whole populations (Israel was only one instance); the kings boasted about how they terrorized everybody; they had no pretense of being benign overlords. (And it wasn't that everyone was like that -- the Kassites, by contrast, were pretty chill.)
-
- Posts: 1746
- Joined: Fri Aug 24, 2018 2:12 am
Re: Linguistic Miscellany Thread
More like Ass-yrians.
I did it. I made the world's worst book review blog.
Re: Linguistic Miscellany Thread
Why is there such a long and ignoble tradition of computer systems turning umlauts, letters with accents, and the like into completely different special characters seemingly at random?
Re: Linguistic Miscellany Thread
Is that a rhetorical question? If it's not, I can start a rant on character encodings.
Re: Linguistic Miscellany Thread
It's not a rhetorical question, and feel free to rant about character encodings as much as you like.
Re: Linguistic Miscellany Thread
It's that encodings other than UTF-8 have never been completely phased out across the computing world, so programs still get confused about encodings every so often.
Yaaludinuya siima d'at yiseka wohadetafa gaare.
Ennadinut'a gaare d'ate eetatadi siiman.
T'awraa t'awraa t'awraa t'awraa t'awraa t'awraa t'awraa.
Ennadinut'a gaare d'ate eetatadi siiman.
T'awraa t'awraa t'awraa t'awraa t'awraa t'awraa t'awraa.
Re: Linguistic Miscellany Thread
The short version is: because most computer stuff has been developed in the U.S.A., and people from the U.S.A. tend to forget that there are letters beyond the basic Latin alphabet.
The long version... would take several pages, and references. Maybe some other day.
The middle version
Text always has to be encoded: the letters have to be turned into bits, sequences of zeros and ones. The oldest encoding still in active use is ASCII, which covers the basic Latin alphabet, the digits 0 to 9, parentheses, brackets, and basic punctuation. That was enough for English. ASCII uses 7 bits (it was developed for teletypes, not computers), which means 128 characters. Since computers usually handle bits in groups of 8 (bytes), one bit, the first one, was left unused.
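If you want to see this for yourself, here's a minimal sketch (assuming a Python 3 interpreter):
# Plain English text in ASCII: every byte value is below 128,
# so the high (first) bit is always zero.
data = "Hello, world!".encode("ascii")
print(list(data))                   # [72, 101, 108, 108, 111, 44, 32, ...]
print(all(b < 128 for b in data))   # True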
When non-Anglophones started using computers, they noticed that they needed letters that weren't in ASCII: accents, umlauts, or even non-Latin alphabets. Since they wanted to keep compatibility with ASCII, they invented various forms of Extended ASCII. The point is that:
- Bytes with the first bit equal to zero are the same as ASCII.
- The other bytes (128 values with the first bit equal to one) are other characters: letters with accents or umlauts, non-Latin letters, etc.
The problem is that everybody went in a different direction. Windows-1252 and Mac OS Roman have roughly the same characters as Latin-1, but not in the same order. And there is no reliable way to say "this text uses encoding X" and be sure that the program decoding your text knows it is in encoding X. So what happens is that someone writes a text with the letter "ö" and encodes it as Windows-1252... but you read it on a Mac, your Mac thinks the text is in Mac OS Roman, and displays "^" instead. Since "normal" letters are part of ASCII and everybody is compatible with ASCII, there is no problem with them: only the letters with umlauts get garbled.
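You can reproduce that exact garbling in a couple of lines of Python, if you're curious ("cp1252" and "mac_roman" are just Python's names for Windows-1252 and Mac OS Roman):
# Write "ö" in Windows-1252, then read the same byte as Mac OS Roman:
data = "ö".encode("cp1252")           # b'\xf6'
print(data.decode("mac_roman"))       # 'ˆ' -- a stray circumflex instead of "ö"
# Pure-ASCII text survives, because both encodings agree on ASCII:
print("Zurich".encode("cp1252").decode("mac_roman"))   # Zurich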
Re: Linguistic Miscellany Thread
The solution is Unicode: the One Standard to rule them all. Unicode is a collection of codes for almost every character under the Sun, at least the ones that are used in a real language, or have been used in some computer system.
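(If it helps to make that concrete: in Python, a Unicode code point is simply what ord() gives you, and chr() goes the other way.)
print(ord("ö"))       # 246, i.e. code point U+00F6
print(chr(0x0939))    # 'ह', a Devanagari letter
print(chr(0x1F600))   # '😀', from the emoji block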
The most widely used encoding for Unicode is UTF-8, which is a variable-length encoding. Some characters use only one byte: they are... well, precisely the characters from ASCII. Backward compatibility is important. Other characters use two, three, or four bytes. With modern computers, the added space doesn't really matter.
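A quick Python sketch of those lengths:
# UTF-8 uses 1 to 4 bytes per character:
for ch in ["A", "ö", "愛", "𝄞"]:
    print(ch, len(ch.encode("utf-8")))
# prints: A 1, ö 2, 愛 3, 𝄞 4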
UTF-8 is the future. There are some legitimate criticisms of Unicode, but nothing even comes close to it. UTF-8 allows you to encode everything: even if conlangs aren't supported by default, there are ways to add a conscript locally.
The problem is... not everything is up to speed. Sometimes a program still encodes text in Latin-1; sometimes a program receives text in UTF-8 but thinks it's in Windows-1252. So you get text where the "normal" Latin letters are correct but the umlauts are garbled. And that's a best-case scenario: texts in non-Latin scripts would be completely garbled.
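Again, easy to reproduce in Python if you want to see both cases:
# UTF-8 bytes wrongly decoded as Windows-1252:
print("Köln".encode("utf-8").decode("cp1252"))     # KÃ¶ln -- only the "ö" is garbled
# A non-Latin script fares much worse:
print("Ελλάδα".encode("utf-8").decode("cp1252"))   # Î•Î»Î»Î¬Î´Î±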
For some applications (file names, email addresses and titles), I tend to err on the side of caution and limit myself to strict-ASCII characters. You never know.
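If you want to check a string before using it somewhere fragile, Python has str.isascii(), or you can just try encoding to ASCII and catch the error (the file names below are made-up examples):
print("rapport_fevrier.txt".isascii())   # True -- safe pretty much everywhere
print("rapport_février.txt".isascii())   # False -- you never know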
Re: Linguistic Miscellany Thread
Thank you, very informative!
Edit: Is it the norm to write "USA" as "U.S.A." in French?
Re: Linguistic Miscellany Thread
I should note that back in the day when people used Teletypes, accented characters were generated by overprinting, i.e. a letter would be printed, then a backspace character, then the accent character printed over it. This is the real reason ASCII includes characters such as backticks and carets: for printing grave and circumflex accents. Of course this quickly became obsolete with the spread of video terminals.
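To put actual bytes on it, here's a minimal sketch of the sequence a Teletype would have received to print "è" (my reconstruction of the idea, not a quote from any spec):
# letter, BACKSPACE (0x08), then the accent character overprinted on top:
sequence = b"e\x08`"          # 'e', BS, backtick used as a grave accent
print(list(sequence))         # [101, 8, 96]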
Yaaludinuya siima d'at yiseka wohadetafa gaare.
Ennadinut'a gaare d'ate eetatadi siiman.
T'awraa t'awraa t'awraa t'awraa t'awraa t'awraa t'awraa.
Ennadinut'a gaare d'ate eetatadi siiman.
T'awraa t'awraa t'awraa t'awraa t'awraa t'awraa t'awraa.
-
- Posts: 1746
- Joined: Fri Aug 24, 2018 2:12 am
Re: Linguistic Miscellany Thread
I'm trying to learn about the pre-Hangeul writing systems of Korea, especially Idu and Hyangchal. But I'm struggling to find anything that lays out everything we know about these systems. Wikipedia links go to generic survey textbooks that say nothing. Academia.org has nothing. Google Books has nothing. Somebody somewhere must have written about these systems in detail, in any language, but I can't find it. Any ideas?
I did it. I made the world's worst book review blog.
Re: Linguistic Miscellany Thread
Moose-tache wrote: ↑Tue Aug 09, 2022 2:24 am I'm trying to learn about the pre-Hangeul writing systems of Korea, especially Idu and Hyangchal. But I'm struggling to find anything that lays out everything we know about these systems. Wikipedia links go to generic survey textbooks that say nothing. Academia.org has nothing. Google Books has nothing. Somebody somewhere must have written about these systems in detail, in any language, but I can't find it. Any ideas?
I can't say I've heard of either of them, but I'll search through my library's copy of The World's Writing Systems next time I'm there.
Happy hunting!
-
- Site Admin
- Posts: 2944
- Joined: Sun Jul 08, 2018 5:46 am
- Location: Right here, probably
- Contact:
Re: Linguistic Miscellany Thread
I'll save you the trip: there's a single paragraph about pre-Hankul systems.
Ross King in WWS wrote: The Hyangchal system, preserved in lyric texts, is reminiscent in some ways of the Japanese man'yogana, on which it doubtless had a formative influence. The abbreviated characters of the Kwukyel system, a transcription for interpretation and translation of Chinese texts, resemble the Japanese kana in some way, just as the Kwukyel system of annotating Chinese texts resembles Japanese kambun traditions. The Itwu 'clerk readings' were a system of prose transcription used widely in administrative contexts. At the time of the promulgation of the Hwunmin cengum (1446), the Hyangchal system was moribund, but Kwukyel and Itwu were still in use long after the invention of the Korean alphabet.
No details or pictures. And it sure sounds like one of the uses of Kwukyel above was intended to refer to something else.
-
- Posts: 1746
- Joined: Fri Aug 24, 2018 2:12 am
Re: Linguistic Miscellany Thread
Hyangchal is basically the precursor to Manyogana. Idu is a slightly different set of phonograms, but operates on the same principles. Gugyeol is a notation system based on this Idu character set. The gugyeol set is very limited, since it's just a series of marks added to Chinese texts to allow them to be read with Korean grammatical information (kind of like an ancient Roman adding little case endings to French words so the sentence makes sense). But Hyangchal and Idu had hundreds of phonograms, and there doesn't appear to be a master list anywhere. Introductory texts in English and Korean from the last fifty years all fail to provide one. For writing systems that were in continual use for about 1500 years, this is pretty shocking.
I did it. I made the world's worst book review blog.
-
- Posts: 1307
- Joined: Mon Jul 09, 2018 4:19 pm
Re: Linguistic Miscellany Thread
Moose-tache wrote: ↑Tue Aug 09, 2022 2:24 am I'm trying to learn about the pre-Hangeul writing systems of Korea, especially Idu and Hyangchal. But I'm struggling to find anything that lays out everything we know about these systems. Wikipedia links go to generic survey textbooks that say nothing. Academia.org has nothing. Google Books has nothing. Somebody somewhere must have written about these systems in detail, in any language, but I can't find it. Any ideas?
I asked around and someone from Korea gave me this link, which apparently contains... a lot of stuff about Joseon-period Idu (late Middle Korean ~ early modern Korean).
https://kostma.aks.ac.kr/dic/dicMain.aspx?mT=C
However, they would also like to warn that:
this isnt exactly accurate
like theres spelling errors everywhere with middle korean words
but thats what i usually look up
when i need to translate idu
[...]
i mean its enough for me
like the spelling errors are like
confusing arae a and normal a
which is understandable for modern speakers because they dont have that distinction anymore
They'd also like to point out that content words were written semantically, and it was the syllables of grammatical morphemes that were written phonetically with Chinese characters. It's not like you could write Middle Korean using Chinese characters phonetically only, the way the Japanese did with kana.
EDIT: someone else later added:
btw in hyangchal some hanja ARE used phonetically in spelling root words, but only usually to represent the final consonant or syllable
and i don't know of a full list of those, although wikipedia has something on their old korean article iirc
if you can find 향가 해독 자료집 it would probably help