I’ve used Toki Pona privately for about eight years but never properly joined a speaker community. I found this place while lurking on Reddit and thought to myself, why not lol.
I’m definitely most attracted to engelangs – I dabbled in a lot of the popular ones (Lojban, Láadan, Ithkuil…) but never properly studied any of them besides TP. Just a few days ago I was eyeing Toaq and considering studying it for real, so finding this community now feels very serendipitous!
I once made a horrible oligosynthetic minimal IAL – think Searight’s Sona, but somehow collapsed into 40-ish morphemes – back when I only knew about Esperanto. None of my efforts to make a proper conlang since then have succeeded; I always end up burning out partway through. But oh well, someday.
I’m a computer science student (currently writing my bachelor’s thesis) and Rust enjoyer. I have a private conlang-related programming project cooking, but it will probably take a lot more time; I’ve been at it on and off for about two years already, and the only results so far are two half-baked prototypes and a whole lot of concept notes. To wind down from work I like to crochet things like amigurumi. I’d like to knit something sometime too, but I’m pretty bad at it hehe
In this field, a “language” is a set of words. Every word is either in the language or not in it.

One such language is \{ ab^n \mid n \in \mathbb{N}_0 \}. It contains exactly these words: at the start there is a single letter ‘a’, which may be followed by any number of letters ‘b’.

There are word machines. Such a machine looks at a word and says either “this word is in my language” or “this word is not in my language.”

(Inside, the machine has many states and many transitions. It starts in one state; whenever a letter arrives, it moves along a transition. If the machine ends up in an accepting state, the word is in its language. In English this kind of machine is called a deterministic finite automaton. Other kinds of machines exist too.)
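A minimal sketch of such a machine in Python (the state names and transition table are my own illustration, not anything from the platform):

```python
# A deterministic finite automaton for { ab^n | n >= 0 }:
# exactly one 'a', followed by any number of 'b's.
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "seen_a",
}
ACCEPTING = {"seen_a"}

def accepts(word: str) -> bool:
    state = "start"
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition for this symbol: reject
            return False
    return state in ACCEPTING

# "a", "ab", "abbb" are in the language; "", "b", "ba", "aab" are not.
```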
In teaching, something like this happens:

Teacher: look at this word machine. What is its language?

Student: at the start there is a single letter ‘a’, which may be followed by any number of letters ‘b’.

Teacher: correct. (Or:) almost. The number of letters ‘b’ must be two or more; with only one ‘b’, the word is not in the language.

Right now, the teacher’s job requires a human. My thesis is about how a machine could do the teacher’s job: look at the student’s description, understand its meaning, and respond to it helpfully.
My thesis is on parsing descriptions of formal languages (and constructs like automata) that are written in natural language.
This research is to be integrated into an education platform for theoretical computer science; the vision is that someday the platform can auto-grade and give helpful feedback for every type of exercise that commonly appears in exams, without requiring human tutors to go through all the submissions. For tasks like “construct a finite automaton that accepts words with an even number of a’s” this is easy enough with formal methods, but the opposite direction (“describe what words this automaton accepts”) really isn’t.
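For the “easy” direction, the standard textbook idea (not necessarily the platform’s actual implementation) is to compare the student’s automaton against a reference via the product construction: two complete DFAs accept the same language iff no reachable state pair is accepting in exactly one of them. A sketch, with an illustrative “even number of a’s” exercise:

```python
def dfa_equivalent(dfa1, dfa2, alphabet):
    """Check language equality of two complete DFAs by exploring the
    product automaton: they differ iff some reachable state pair is
    accepting in exactly one of them. Each DFA is a triple
    (start, transitions, accepting) with transitions mapping
    (state, symbol) -> state."""
    (s1, t1, f1), (s2, t2, f2) = dfa1, dfa2
    seen = {(s1, s2)}
    frontier = [(s1, s2)]
    while frontier:
        q1, q2 = frontier.pop()
        if (q1 in f1) != (q2 in f2):  # distinguishing word reachable
            return False
        for a in alphabet:
            nxt = (t1[(q1, a)], t2[(q2, a)])
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True

# Reference DFA: even number of 'a's (state = parity of 'a's seen so far).
REFERENCE = ("even",
             {("even", "a"): "odd", ("even", "b"): "even",
              ("odd", "a"): "even", ("odd", "b"): "odd"},
             {"even"})
# A hypothetical student submission: different state names, same language.
STUDENT = ("q0",
           {("q0", "a"): "q1", ("q0", "b"): "q0",
            ("q1", "a"): "q0", ("q1", "b"): "q1"},
           {"q0"})
```

Here `dfa_equivalent(REFERENCE, STUDENT, "ab")` returns `True`; flipping the student’s accepting set to `{"q1"}` (odd number of a’s) makes it `False`, and the first mismatching pair could even be turned into a counterexample word for feedback.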
Happy with it so far, I finally get to put my linguistics knowledge to good use :D
Do you think part of what makes this hard is that the task (as stated) doesn’t limit the time complexity the reader needs in order to evaluate the description? And if it’s for pedagogical purposes you kinda can’t impose such a limit, since students can’t be expected to evaluate that yet!
To be a bit clearer about what I mean: describing a formal language is in some sense the same thing as describing which words are in it. But (considering that deciding whether a CFG is ambiguous is undecidable) if I give you a CFG describing a regular language without telling you it’s regular, I force you to assume cubic time to recognize its words, whereas if I describe the same language with a regex, you know you can recognize its words in linear time.
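For concreteness, that cubic bound is the CYK recognizer, whose three nested loops over span length, span start, and split point are exactly the cost in question. A sketch, with a made-up CNF grammar for the \{ ab^n \mid n \in \mathbb{N}_0 \} example (the rule names are my own):

```python
def cyk(word, terminal_rules, binary_rules, start):
    """CYK recognition for a grammar in Chomsky normal form: O(n^3).
    terminal_rules: terminal -> set of nonterminals A with A -> terminal
    binary_rules: (B, C) -> set of nonterminals A with A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # ignoring the empty word for simplicity
    # table[i][l] = nonterminals deriving word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = set(terminal_rules.get(ch, ()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            for split in range(1, length):  # split point -> cubic
                for left in table[i][split]:
                    for right in table[i + split][length - split]:
                        table[i][length] |= binary_rules.get((left, right), set())
    return start in table[0][n]

# CNF grammar for { a b^n | n >= 0 }:
#   S -> A B | 'a',  B -> C B | 'b',  A -> 'a',  C -> 'b'
TERMINAL_RULES = {"a": {"A", "S"}, "b": {"B", "C"}}
BINARY_RULES = {("A", "B"): {"S"}, ("C", "B"): {"B"}}
```

A regex engine compiled to a DFA recognizes the same words in a single linear pass, which is the asymmetry described above.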
Another possible reason: is it hard because it can be hard to determine whether two different descriptions describe the same formal language? Like, for DFAs equivalence is nearly linear IIRC, but for CFGs I believe it’s undecidable.
Or is it some more down to earth “people’s natural language descriptions of formal languages are messy” thing?
> Or is it some more down to earth “people’s natural language descriptions of formal languages are messy” thing?
It's rather the latter. I doubt the platform will ever be able to parse every regular/context-free/… language, because you very quickly reach a point where you have to word things very carefully to prevent ambiguity, and at that point programmatically figuring out what was meant, let alone giving constructive feedback, becomes a nightmare. So the focus is on parsing the descriptions into an intermediate format that supports common patterns like "ending with X", "X followed by Y", "one more X than Y" in rather simple, well-behaved combinations. Since the target language is known and selected by the teacher, I doubt time complexity or computability is a problem in practice here.
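An intermediate format like that might look roughly as follows; this is purely my own toy illustration of the idea (the node names and combinator are invented, not the platform's actual format):

```python
from dataclasses import dataclass

# Toy intermediate representation for common description patterns.

@dataclass
class StartsWith:
    prefix: str

@dataclass
class EndsWith:
    suffix: str

@dataclass
class And:
    left: object
    right: object

def matches(desc, word: str) -> bool:
    """Interpret a parsed description against a single word."""
    if isinstance(desc, StartsWith):
        return word.startswith(desc.prefix)
    if isinstance(desc, EndsWith):
        return word.endswith(desc.suffix)
    if isinstance(desc, And):
        return matches(desc.left, word) and matches(desc.right, word)
    raise ValueError(f"unknown node: {desc!r}")

# "starting with 'a' and ending with 'bb'" might parse to:
desc = And(StartsWith("a"), EndsWith("bb"))
```

Because the nodes are this well-behaved, each one could also be compiled to a small automaton and the whole description checked against the teacher's target language by a standard equivalence test, rather than by matching individual words.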
I know those have been challenges in implementing other types of exercises on the platform, though. Particularly for checking equivalence of context-free languages (which you need if you let students construct CFGs/PDAs themselves), I think they've come up with a subset of the context-free languages for which equivalence is decidable even if only one of the two languages is in the subset.