AI in conlanging - present and future

xxx
Posts: 810
Joined: Sun Jul 29, 2018 12:40 pm

Post by xxx »

such a quantity of words is irrelevant,
even if generative artificial "intelligences"
try to replace the qualitative by the quantitative,
but all they produce is barroom talk with an erudite dummy...
keenir
Posts: 948
Joined: Fri Apr 05, 2019 6:14 pm

Post by keenir »

zompist wrote: Sat May 11, 2024 5:16 am
I got ChatGPT to estimate the total number of words it has been exposed to in all of its training. [...]
Its ultimate estimate, IIRC, was 5 quintillion.
There are some estimates of the total size of the Internet, but I don't think they're comparable... consider how much of the Internet is pictures, video, HTML codes, or other code.
given that this is AI, does it count code and/or HTML as words?
zompist
Site Admin
Posts: 2945
Joined: Sun Jul 08, 2018 5:46 am
Location: Right here, probably

Post by zompist »

keenir wrote: Sat May 11, 2024 11:55 pm
zompist wrote: Sat May 11, 2024 5:16 am
I got ChatGPT to estimate the total number of words it has been exposed to in all of its training. [...]
Its ultimate estimate, IIRC, was 5 quintillion.
There are some estimates of the total size of the Internet, but I don't think they're comparable... consider how much of the Internet is pictures, video, HTML codes, or other code.
given that this is AI, does it count code and/or HTML as words?
Doing some Googling, it appears GPT-4 was trained on 13 trillion tokens. (Tokens are somewhere between words and morphemes.) At that level of analysis, I don't think they're separating out programs from human language.

That's nowhere near 5 quintillion, so I think ChatGPT got confused here. :)
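
For a sense of scale, here is a back-of-envelope sketch in Python comparing the two figures. The 13 trillion and 5 quintillion numbers are the ones quoted in this thread; the words-per-token ratio is an assumed rule of thumb (about 0.75 for English), and the function name `overshoot` is made up for this example.

```python
# Back-of-envelope comparison of the two figures quoted in this thread.
GPT4_TOKENS = 13e12      # "13 trillion tokens" (figure from the post above)
CLAIMED_WORDS = 5e18     # "5 quintillion" (ChatGPT's own estimate)

# ASSUMPTION: roughly 0.75 English words per token. This is a commonly
# cited rule of thumb, not an exact or official figure.
WORDS_PER_TOKEN = 0.75


def overshoot(claimed_words, tokens, words_per_token=WORDS_PER_TOKEN):
    """Return how many times larger the claimed word count is than the
    word count implied by the token count."""
    approx_words = tokens * words_per_token
    return claimed_words / approx_words


ratio = overshoot(CLAIMED_WORDS, GPT4_TOKENS)
print(f"Claimed figure is about {ratio:,.0f} times the token-based estimate")
```

Under that assumption the claimed figure overshoots by a factor in the hundreds of thousands, so the exact words-per-token ratio hardly matters to the conclusion.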
sasasha
Posts: 468
Joined: Mon Aug 06, 2018 11:41 am

Post by sasasha »

You’re quite right. I assumed the figure came from training corpus plus input from other AIs plus interactions with users, but didn’t interrogate that particularly well.

Here’s ChatGPT’s explanation:
User
Hi. I am interested in your training corpus. Please could you roughly estimate its total size in numbers of words?

ChatGPT
The training corpus used for training me contains around 570 gigabytes of text data, which roughly translates to over 45 trillion words.

User
Does this represent the total number of words you have ever learned from, or do you have a learning process that goes on as users interact with you?

ChatGPT
The training corpus represents the total number of words I learned from during my initial training process. However, I don't stop learning after that. I continuously learn and adapt from interactions with users like you, incorporating new information and refining my responses over time. So, in a way, my learning process is ongoing.

User
Whilst I am aware you may not have access to detailed diagnostic data on this front, I would appreciate it if you could give me a very rough ballpark figure of the total number of words you have ever interacted with, including in your interactions with users.

ChatGPT
While I don't have access to specific diagnostic data, I can give you a rough estimate. Based on my interactions and conversations with users since my launch, I've likely encountered billions to trillions of words. This includes all the conversations and text inputs I've received from users like you.

User
So, would you say that a previous estimate you gave, of ever having interacted with 5 quintillion words, was too high ‒ or are there other factors that might have led you to answer with this number? What do you suspect motivated this answer?

ChatGPT
Apologies for any confusion. The estimate of 5 quintillion words was indeed an overestimation. There might have been a misinterpretation or exaggeration in the response. It's important to clarify that while I've interacted with a significant volume of words, the actual number would be far less than 5 quintillion. My goal is to provide accurate information to the best of my ability.

User
No problem. What kind of factors do you suspect could lead you to exaggerate a response in this way?

ChatGPT
Exaggeration in responses could stem from a few factors, such as misunderstanding the question, misinterpretation of data, or providing a hypothetical scenario without considering the exact parameters. Additionally, there might be instances where I aim to convey the vastness of the dataset or the scope of my learning, which could lead to an exaggerated estimate. However, accuracy is always my priority, and I strive to provide the most precise information possible.

User
Have you been trained on input from other machine learning systems?

ChatGPT
I haven't been directly trained on input from other machine learning systems. However, during my training process, I was exposed to a wide variety of texts, which might include outputs generated by other machine learning systems. These texts could come from various sources such as websites, books, articles, forums, and more.

I didn’t save the chat where the very large estimate was given, so I can’t remember how it got there.
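
The "570 gigabytes" and "over 45 trillion words" figures in the transcript above can be checked against each other with simple arithmetic. The sketch below is only a sanity check: the average of 6 bytes per word (letters plus a space) is an assumption, and the function names are invented for the example.

```python
# Sanity check on the two figures ChatGPT gave in the transcript.
CORPUS_BYTES = 570e9     # "570 gigabytes of text data"
CLAIMED_WORDS = 45e12    # "over 45 trillion words"


def bytes_per_word(total_bytes, total_words):
    """Average storage per word implied by a byte count and a word count."""
    return total_bytes / total_words


def plausible_word_count(total_bytes, avg_bytes_per_word=6.0):
    """Word count implied by a byte count.

    ASSUMPTION: about 6 bytes per English word including the following
    space; the true average depends on language and encoding.
    """
    return total_bytes / avg_bytes_per_word


implied = bytes_per_word(CORPUS_BYTES, CLAIMED_WORDS)
estimate = plausible_word_count(CORPUS_BYTES)
print(f"Implied bytes per word: {implied:.4f}")
print(f"Plausible word count:   {estimate:.2e}")
```

The claimed pair implies far less than one byte per word, which plain text cannot achieve, so the two numbers in that answer are not consistent with each other. At the assumed 6 bytes per word, 570 GB comes out at roughly 95 billion words.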
xxx
Posts: 810
Joined: Sun Jul 29, 2018 12:40 pm

Post by xxx »

as my ai-enabled search engine says,
“AI-generated response. Remember to check important information”...
believe me, but not too much...
Otto Kretschmer
Posts: 525
Joined: Tue Mar 16, 2021 4:09 pm
Location: Poland

Post by Otto Kretschmer »

AI models are getting better at conlanging. The following text is courtesy of Claude 3.5 Sonnet. Which language is this conlang descended from?

On heste tid, sceap tat hæfde næn ull sæh hesta on feld. An drog swær wagn, oter bær micele byrten, and tridde bær mann swiftlice.
Sceap cwæt to hestum: "Min heorte is sær, tonne ic seo mann ridan hesta."
Hestas cwædon: "Hlyste, sceap! Ure heortan sint sære tonne we seot: mann, hlaford, macet ull of sceapum to warm clatum for him sylf. And sceap hæft næn ull."
Ta sceap hærde tis, hit fleah into feld.


IPA transcript:
/ɔn heste tiːd ʃeːap tat hæfde næːn ul sæːh hesta ɔn feld an drɔg swæːr wagn ɔter bæːr mikele byrten and tride bæːr man swiftlike/
/ʃeːap kwæːt tɔ hestum miːn heorte is sæːr tɔne ik seːo man riːdan hesta/
/hestas kwæːdɔn hlyste ʃeːap uːre heortan sint sæːre tɔne weː seːot man hlaːfɔrd maket ul ɔf ʃeːapum tɔ warm klaːtum fɔr him sylf and ʃeːap hæːft næːn ul/
/ta ʃeːap hæːrde tis hit fleːah intɔ feld/
WeepingElf
Posts: 1513
Joined: Sun Jul 15, 2018 12:39 pm
Location: Braunschweig, Germany

Post by WeepingElf »

Otto Kretschmer wrote: Mon Jun 24, 2024 9:12 am AI models are getting better at conlanging. The following text is courtesy of Claude 3.5 Sonnet. Which language is this conlang descended from?
Old English.
... brought to you by the Weeping Elf
My conlang pages
Zju
Posts: 912
Joined: Fri Aug 03, 2018 4:05 pm

Post by Zju »

To clarify, is it outright OE text, and not descended from it?
/j/ <j>

Ɂaləɂahina asəkipaɂə ileku omkiroro salka.
Loɂ ɂerleku asəɂulŋusikraɂə seləɂahina əɂətlahɂun əiŋɂiɂŋa.
Hərlaɂ. Hərlaɂ. Hərlaɂ. Hərlaɂ. Hərlaɂ. Hərlaɂ. Hərlaɂ.
WeepingElf
Posts: 1513
Joined: Sun Jul 15, 2018 12:39 pm
Location: Braunschweig, Germany

Post by WeepingElf »

Zju wrote: Mon Jun 24, 2024 11:47 am To clarify, is it outright OE text, and not descended from it?
I am not an expert on Old English, but to me it looks just like outright Old English, not a conlang descended from it.
alice
Posts: 962
Joined: Mon Jul 09, 2018 11:15 am
Location: 'twixt Survival and Guilt

Post by alice »

WeepingElf wrote: Mon Jun 24, 2024 1:15 pm
Zju wrote: Mon Jun 24, 2024 11:47 am To clarify, is it outright OE text, and not descended from it?
I am not an expert on Old English, but to me it looks just like outright Old English, not a conlang descended from it.
I'd agree with this, or perhaps suggest that it looks like a particularly good sample of Markov-generated text from OE inputs.
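
For anyone unfamiliar with the idea, a Markov text generator of the kind mentioned here can be sketched in a few lines of Python. This is a minimal character-level version, not a claim about how the posted text was actually made. The function names are invented for the example, and the "corpus" is just the first sentence of the sample posted above, lowercased, standing in for real Old English input.

```python
import random
from collections import defaultdict


def build_model(text, order=2):
    """Map each `order`-character context to the characters seen after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model


def generate(model, seed, length=60, rng=None):
    """Extend `seed` one character at a time by sampling from the model.

    `seed` must be exactly as long as the order the model was built with.
    Generation stops early if the current context was never seen.
    """
    rng = rng or random.Random()
    order = len(seed)
    out = seed
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:
            break
        out += rng.choice(followers)
    return out


# Stand-in corpus (ASSUMPTION: a real run would use a large Old English text).
corpus = "on heste tid, sceap tat hæfde næn ull sæh hesta on feld. "
model = build_model(corpus, order=2)
sample = generate(model, seed="on", length=60, rng=random.Random(0))
print(sample)
```

With a corpus this small the output mostly reshuffles the input; the Old-English-looking effect only appears with much more training text and usually a higher order.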
Self-referential signatures are for people too boring to come up with more interesting alternatives.
WeepingElf
Posts: 1513
Joined: Sun Jul 15, 2018 12:39 pm
Location: Braunschweig, Germany

Post by WeepingElf »

alice wrote: Mon Jun 24, 2024 3:04 pm
WeepingElf wrote: Mon Jun 24, 2024 1:15 pm
Zju wrote: Mon Jun 24, 2024 11:47 am To clarify, is it outright OE text, and not descended from it?
I am not an expert on Old English, but to me it looks just like outright Old English, not a conlang descended from it.
I'd agree with this, or perhaps suggest that it looks like a particularly good sample of Markov-generated text from OE inputs.
It is obviously Schleicher's Fable, in a Germanic language that looks like Old English - but it doesn't really seem to be that, rather it seems to show some North Germanic traits, such as hest 'horse' or ull 'wool'. My guess is that it is a bogolang, obtained by running Old Norse through the sound changes from Common (West) Germanic to Old English. This kind of thing doesn't really require AI, just a sound change applier ;)
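
To illustrate what a sound change applier does, here is a minimal Python sketch: an ordered list of regex rewrite rules applied to each word in turn. The three rules are toy examples made up for this post and are not the actual Germanic-to-Old-English changes; `RULES`, `apply_rules` and `apply_to_text` are hypothetical names.

```python
import re

# HYPOTHETICAL toy rules, invented for illustration only.
# Each rule is (regex pattern, replacement); rules apply in order.
RULES = [
    (r"r$", ""),           # drop a word-final r
    (r"k(?=[ei])", "c"),   # respell k as c before e or i
    (r"þ", "t"),           # replace þ with t
]


def apply_rules(word, rules=RULES):
    """Apply every rule to one word, in order, and return the result."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


def apply_to_text(text, rules=RULES):
    """Apply the rules to each whitespace-separated word of a text."""
    return " ".join(apply_rules(w, rules) for w in text.split())


print(apply_to_text("hestr þat mikil"))
```

Rule order matters in real sound change appliers, since later changes operate on the output of earlier ones; that is why the rules are kept in a list rather than a dictionary.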