Glossing multi-line texts on the ZBB

rotting bones
Posts: 2836
Joined: Tue Dec 04, 2018 5:16 pm

Glossing multi-line texts on the ZBB

Post by rotting bones »

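Warning: This script doesn't fail gracefully if the romanization and the gloss have different numbers of words. No error is reported: romanized words without a gloss get an empty gloss tag, and surplus gloss words are silently dropped.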

I couldn't figure out how to do this using Neonnaut’s gloss generator, so I wrote a simple Python script for glossing multi-line texts. The input is a series of entries separated by blank lines, and each entry must have exactly four lines: original, romanization, gloss and translation. I'm sure you can figure out which lines to comment out if you don't need them all.

#!/usr/bin/env python3
import sys
import io

# must have UTF-8
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

def batch_glosses(text):
    # The parameter is named `text` to avoid shadowing the built-in input()
    lines = text.split('\n')
    glosses = []
    current_gloss = []

    # Split input separated by blank lines
    for line in lines:
        if line.strip() == '':
            if len(current_gloss) > 0:
                glosses.append(current_gloss)
                current_gloss = []
        else:
            current_gloss.append(line)
    
    if len(current_gloss) > 0:
        glosses.append(current_gloss)

    # Loop through each gloss
    results = []
    for gloss in glosses:
        if len(gloss) < 4:
            results.append('[color=red]Each gloss needs 4 lines[/color]\n')
            continue

        original = gloss[0]
        romanization = gloss[1]
        gloss_line = gloss[2]
        translation = gloss[3]

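        # Split the romanization and gloss lines into words on whitespace
        rom_words = romanization.split()
        gloss_words = gloss_line.split()

        # Wrap each romanized word in a gloss tag. A word with no matching
        # gloss gets an empty tag; surplus gloss words are silently dropped.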
        tagged_rom_parts = []
        for i, rom_word in enumerate(rom_words):
            gloss_word = gloss_words[i] if i < len(gloss_words) else ''
            tagged_rom_parts.append(f'[gloss={gloss_word}]{rom_word}[/gloss]')
        
        tagged_rom=' '.join(tagged_rom_parts)

        quoted = translation if translation.startswith('"') else f'"{translation}"'

        results.append(f'{original}\n{tagged_rom}\n{quoted}')

    return '\n\n'.join(results)


def main():
    if len(sys.argv) < 2:
        print('Specify the input file')
        sys.exit(1)

    path = sys.argv[1]

    try:
        with open(path, 'r', encoding='utf-8') as f:
            text = f.read()

        output = batch_glosses(text)
        print(output)
    except FileNotFoundError:
        print(f'Input file not found: {path}', file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f'Error reading file: {str(e)}', file=sys.stderr)
        sys.exit(1)


if __name__ == '__main__':
    main()
Example input:

রাস্তায় গাড়িঘোড়ার বিরাম নাই,
rasta-i gaɽi-ɡʱoɽa-r biram nai
road.loc car-horse.gen rest is.none
On the road, the cars and horses have no rest

ফেরিওয়ালা অবিশ্রাম হাঁকিয়া চলিয়াছে,
pʰeriwala ɔbisram hãkija tʃolijatʃʰe
hawker no-rest call.part walk.3.perf.cont
the street hawkers keep walking, calling without rest,

যাহারা আপিসে কালেজে আদালতে যাইবে
dʒahara ɔpiʃ-e kɔledʒ-e ad̪alɔt-e dʒaib-e
those office.loc college.loc court.loc go.3.fut
those who will go to offices, colleges, law courts,

Corresponding output:

রাস্তায় গাড়িঘোড়ার বিরাম নাই,
[gloss=road.loc]rasta-i[/gloss] [gloss=car-horse.gen]gaɽi-ɡʱoɽa-r[/gloss] [gloss=rest]biram[/gloss] [gloss=is.none]nai[/gloss]
"On the road, the cars and horses have no rest"

ফেরিওয়ালা অবিশ্রাম হাঁকিয়া চলিয়াছে,
[gloss=hawker]pʰeriwala[/gloss] [gloss=no-rest]ɔbisram[/gloss] [gloss=call.part]hãkija[/gloss] [gloss=walk.3.perf.cont]tʃolijatʃʰe[/gloss]
"the street hawkers keep walking, calling without rest,"

যাহারা আপিসে কালেজে আদালতে যাইবে
[gloss=those]dʒahara[/gloss] [gloss=office.loc]ɔpiʃ-e[/gloss] [gloss=college.loc]kɔledʒ-e[/gloss] [gloss=court.loc]ad̪alɔt-e[/gloss] [gloss=go.3.fut]dʒaib-e[/gloss]
"those who will go to offices, colleges, law courts,"
What it looks like on the ZBB:

রাস্তায় গাড়িঘোড়ার বিরাম নাই,
rasta-i   gaɽi-ɡʱoɽa-r   biram  nai
road.loc  car-horse.gen  rest   is.none

"On the road, the cars and horses have no rest"

ফেরিওয়ালা অবিশ্রাম হাঁকিয়া চলিয়াছে,
pʰeriwala  ɔbisram  hãkija     tʃolijatʃʰe
hawker     no-rest  call.part  walk.3.perf.cont

"the street hawkers keep walking, calling without rest,"

যাহারা আপিসে কালেজে আদালতে যাইবে
dʒahara  ɔpiʃ-e      kɔledʒ-e     ad̪alɔt-e   dʒaib-e
those    office.loc  college.loc  court.loc  go.3.fut

"those who will go to offices, colleges, law courts,"

---

If you use this tool, consider contributing to any of my conlang or natlang threads. Thank you.

Edit: I told a custom SLM to convert this script into a webpage: https://drive.google.com/file/d/1Yr2QnO ... sp=sharing I gave the SLM some inputs to test the script. It should be correct. Download the file and open it in your browser. (The SLM runs on my local machine, not a data center. If you are opposed to AIs in any capacity whatsoever regardless of environmental impact, then stick to the Python script above.)
Lērisama
Posts: 747
Joined: Fri Oct 18, 2024 9:51 am
Location: Kernow Voy

Re: Glossing multi-line texts on the ZBB

Post by Lērisama »

Thank you, that looks useful.
LZ – Lēri Ziwi
PS – Proto Sāzlakuic (ancestor of LZ)
PRk – Proto Rākēwuic
XI – Xú Iạlan
VN – verbal noun
SUP – supine
DIRECT – verbal directional
My language stuff
rotting bones
Posts: 2836
Joined: Tue Dec 04, 2018 5:16 pm

Re: Glossing multi-line texts on the ZBB

Post by rotting bones »

Lērisama wrote: Sat Dec 20, 2025 8:03 am Thank you, that looks useful.
You're welcome.

PS. Let me know if you see any problems.
Neonnaut
Posts: 132
Joined: Wed Jun 02, 2021 4:23 am

Re: Glossing multi-line texts on the ZBB

Post by Neonnaut »

I have already solved this problem a long time ago with Gloss My Gloss.

https://neonnaut.github.io/
Lērisama
Posts: 747
Joined: Fri Oct 18, 2024 9:51 am
Location: Kernow Voy

Re: Glossing multi-line texts on the ZBB

Post by Lērisama »

Neonnaut wrote: Sat Dec 20, 2025 10:47 am I have already solved this problem a long time ago with Gloss My Gloss.

https://neonnaut.github.io/
The whole point of it was that Rotting Bones couldn't work out how to do multiline texts using Gloss My Gloss. If there is a way to do that, I would be very happy to hear it.
Neonnaut
Posts: 132
Joined: Wed Jun 02, 2021 4:23 am

Re: Glossing multi-line texts on the ZBB

Post by Neonnaut »

Oh, I'm sorry, I skim-read the post.

If I think back that far, I reasoned that someone would do them one at a time if there were multiple ... uh, pieces of a translation.
Lērisama
Posts: 747
Joined: Fri Oct 18, 2024 9:51 am
Location: Kernow Voy

Re: Glossing multi-line texts on the ZBB

Post by Lērisama »

Maybe add an “empty line as a new gloss” option?
Neonnaut
Posts: 132
Joined: Wed Jun 02, 2021 4:23 am

Re: Glossing multi-line texts on the ZBB

Post by Neonnaut »

I'll make it do multiple translations on whitespace by default. This gives me a reason to rewrite this program in TypeScript, and to fix my consistent misspelling of separate as 'seperate'. OP's program is pretty cool. They mention that it doesn't fail gracefully if the romanisation has a different length from the gloss. To handle this, find the line with the most words and pad every line other than the translation line to that length.
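
The padding idea described above could look roughly like this in Python. This is only a sketch under a few assumptions: `pad_to_longest` and `tag_words` are hypothetical helper names that appear in neither tool, and the `?` filler is just one way of making a mismatch visible in the rendered post.

```python
def pad_to_longest(word_lines, filler='?'):
    """Pad every list of words to the length of the longest list."""
    width = max((len(words) for words in word_lines), default=0)
    return [words + [filler] * (width - len(words)) for words in word_lines]


def tag_words(romanization, gloss_line):
    """Pair each romanized word with its gloss, padding the shorter line."""
    rom_words, gloss_words = pad_to_longest(
        [romanization.split(), gloss_line.split()]
    )
    return ' '.join(
        f'[gloss={g}]{r}[/gloss]' for r, g in zip(rom_words, gloss_words)
    )


# One gloss word is missing, so the second word gets the filler as its gloss.
print(tag_words('biram nai', 'rest'))
# prints: [gloss=rest]biram[/gloss] [gloss=?]nai[/gloss]
```

Because both lines are padded to the same length, neither extra romanized words nor extra gloss words disappear from the output.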
rotting bones
Posts: 2836
Joined: Tue Dec 04, 2018 5:16 pm

Re: Glossing multi-line texts on the ZBB

Post by rotting bones »

Thanks, it will be nice to have full-featured software that supports this. I only wrote a new script because I wanted to quickly process four paragraphs of text that was broken up into a few words per line.
Neonnaut
Posts: 132
Joined: Wed Jun 02, 2021 4:23 am

Re: Glossing multi-line texts on the ZBB

Post by Neonnaut »

I have made the changes; it now does multiple glosses delimited by blank lines.
rotting bones
Posts: 2836
Joined: Tue Dec 04, 2018 5:16 pm

Re: Glossing multi-line texts on the ZBB

Post by rotting bones »

Neonnaut wrote: Wed Jan 21, 2026 2:22 pm I have made the changes, it now does multiple glosses delimited by blank lines.
Thanks, I'll try it.
Axas mlö
Posts: 42
Joined: Thu Dec 12, 2024 12:13 am
Location: Luna City

Re: Glossing multi-line texts on the ZBB

Post by Axas mlö »

Glossing engines are great.

I'm wondering if the forum can advise about the rest of formatting a post: I've been writing my drafts elsewhere, copy-pasting them into the ZBB post-drafting-box, and then manually fixing all the formatting that was lost. Besides glosses, that includes section headings, italicized conlang words in the middle of English paragraphs, and so on. Fixing them manually is fine, but is there a better way to do this?

Sorry for the (I assume) noob question, and possibly I'm posting it in the wrong place. I did look on the forum to see if this was already discussed. I also searched online (I DuckDuckWent) for "markdown to bbcode", and there are tons of options, which is frankly overwhelming, and I don't know how to tell which ones are trustworthy.
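
For the two cases named above (section headings and italics), a conversion small enough to read and check yourself might look like the sketch below. It assumes the draft uses Markdown-style `#` headings, `*italic*` and `**bold**`, and that the board accepts the standard phpBB `[b]`, `[i]` and `[size=...]` tags. `markdown_to_bbcode` is a hypothetical name, and rendering headings as large bold text is an arbitrary choice, not a ZBB convention.

```python
import re


def markdown_to_bbcode(text):
    """Convert a small subset of Markdown (headings, bold, italics) to BBCode."""
    converted = []
    for line in text.split('\n'):
        heading = re.match(r'^#{1,6}\s+(.*)$', line)
        if heading:
            # Assumption: headings are rendered as large bold text
            line = f'[size=150][b]{heading.group(1)}[/b][/size]'
        converted.append(line)
    text = '\n'.join(converted)
    # Bold must be handled before italics, since ** also contains *
    text = re.sub(r'\*\*(.+?)\*\*', r'[b]\1[/b]', text)
    text = re.sub(r'\*(.+?)\*', r'[i]\1[/i]', text)
    return text


print(markdown_to_bbcode('# Verbs\nThe word *biram* means **rest**.'))
```

It does not handle lists, links, or asterisks that are meant literally, so the result still needs a read-through before posting.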