Bawkalm

Bawkalm was inspired by imagining how I might reform Lojban if I were completely unconstrained by its morphology and syntax. However, it has evolved to the point that it’s almost unrecognizable. Lojban is in its DNA, but it is its own language. Its major features are:

  • Minimalist grammar
    • This does not mean simple. It just means fewer, more powerful constructs
    • The philosophy is that semantics should be conveyed primarily by words, and grammar just glues words together
  • Well-defined semantics that lean closer to logic than natural language
  • One of the first pitch-accent-based SSMs (This language started development years ago, so it did this back when Toaq still had the ō tone)
  • Grammatical infixes (oh my!)
  • Relatively free, case-based word order with ergative alignment

Resources

I’m working on a website for the reference grammar and the dictionary, but it’s slow going. Until then, I should document most of it here. Note the language is not quite complete (e.g. I need a better solution for connectives and numbers), but it’s certainly usable. I currently store the vocab in a Google sheet here, but the state of that file is rough.

Phonology

The phonology is designed to be more systematic. For consonants there are three places of articulation and 2 × 2 + 2 manners of articulation, then an additional two glottal consonants treated specially. There are the usual five vowels and no phonemic schwa. Bawkalm distinguishes both gemination and vowel length.

Consonants

We use a series of abbreviations for the qualities of consonants summarized in the following tables.

Points of Production:

| Symbol | Class Name | Points of Production | Sound made with |
| --- | --- | --- | --- |
| P | Labial | bilabials or labiodentals | lips |
| T | Coronal | dentals, alveolars, and retroflex | tip of the tongue |
| K | Dorsal | palatals, velars | base of the tongue |
| H | Glottal | glottals | the throat |

Manners of Articulation

| Symbol | Manner of Articulation | Description |
| --- | --- | --- |
| S | Plosive | Also known as stops, this stops air flow briefly |
| F | Fricative | A narrowing at the point of production creates turbulent airflow |
| U | Unvoiced | Vowel is pronounced after or before consonant. These should be |
| M | Voiced | Also known as modal voice, the vowel is pronounced during the consonant. |
| N | Nasal | Sound allowed through nose |
| Y | Approximant | Also known as glides, the sound stream is perturbed without creating turbulence |

A consonant is made from one point of production and one of {S, F} × {U, M}, N, or Y. There are also two glottal consonants, which are treated specially. Each cell contains the single letter used to write that sound, and its optimal pronunciation.

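|  | SU | SM | FU | FM | N | Y |
| --- | --- | --- | --- | --- | --- | --- |
| P | p /pʰ/ | b /b/ | f /f/ | v /v/ | m /m/ | w /w/ |
| T | t /tʰ/ | d /d/ | s /s/ | z /z/ | n /n/ | l /l~ɹ/ |
| K | k /kʰ/ | g /g/ | x /ɕ~x/ | j /ʑ~ʁ/ | q /ŋ/ | y /j/ |

H (glottal, treated specially): ' /ʔ/, h /h/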
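As a rough sketch of how systematic the inventory is, the table can be generated from the classes themselves. The names `PLACES`, `MANNERS`, `LETTERS`, and `INVENTORY` are made up for this illustration; only the letters and class symbols come from the tables above.

```python
from itertools import product

# Three places of articulation (the glottals are handled separately).
PLACES = ["P", "T", "K"]

# {S, F} x {U, M}, then N and Y: 2 * 2 + 2 = 6 manners.
MANNERS = ["".join(pair) for pair in product("SF", "UM")] + ["N", "Y"]

# One letter per cell, in the same column order as MANNERS.
LETTERS = {"P": "pbfvmw", "T": "tdsznl", "K": "kgxjqy"}

INVENTORY = {
    (place, manner): LETTERS[place][i]
    for place in PLACES
    for i, manner in enumerate(MANNERS)
}

# The two glottal consonants sit outside the grid.
GLOTTALS = ["'", "h"]

print(MANNERS)                    # the six manner symbols
print(INVENTORY[("K", "FM")])     # the dorsal voiced fricative
print(len(INVENTORY) + len(GLOTTALS), "consonants in total")
```

Looking up a cell by (place, manner) mirrors how the table is read: row first, then column.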

Part of the design philosophy of Bawkalm’s phonology is that unless a language has a very minimal inventory, there will always be some speakers for whom one of its sounds is difficult to pronounce. Therefore, this phonology is designed to be both forgiving and easy to explain. That is, if a sound is not present in a learner’s language, it should be easy to explain how to make a decent approximation of the proper sound by analogy to the sounds they already know.

:warning: Warning: Speakers of languages with [ʃ] and [ʒ] (sh and the s in measure in English) should be careful not to pronounce x and j this way, since they’re likely to be misheard as s and z.

Vowels

There are (the usual) five vowels, i, e, a, o, u, which are class V. Properly they’re all unrounded, but the back vowels can be rounded as long as they stay short. These are approximations. As long as the vowel is in its general

 ---- ----ɯ u
\  ɪ \    | i
   --- ---ɤ o
  \   \   |
    ɛ-- --  e
    \  \  |
       - -ɑ  a 

All diphthongs and long vowels are formed with Y class consonants (glides), i.e. are class VY. Even if it seems odd, Vl is considered a diphthong for morphological purposes, but these are predictably pronounced. In general, a diphthong can be pronounced as it’s spelled, but certain pronunciations are more canonical.

| Spelling | Pronunciation | Notes |
| --- | --- | --- |
| iw | /y:/ | In general, w rounds vowels |
| ew | /œ:/ |  |
| aw | /ɑʊ/ |  |
| ow | /oʊ/ |  |
| uw | /u:/ | Long u and i are spelled with their corresponding glides |
| iy | /i:/ |  |
| ey | /e:~eɪ/ |  |
| ay | /aɪ/ |  |
| oy | /ɤɪ/ |  |
| uy | /ɯɪ/ |  |
| ih | /i:/ | Use iy when not followed by hV |
| eh | /ɛ:/ |  |
| ah | /ɑ:/ |  |
| oh | /ɤ:/ |  |
| uh | /ɯ:/ | Use uw when not followed by hV |

It’s fairly common to have series of V(YV)*, e.g. uhu and oya. Here the diphthong from the above table transitions naturally into the glide, but the vowel is considered short.

Alphabetic Order

Bawkalm has an alternate alphabetic order:

' i e a o u p b f v m w t d s z n l k g x j q y h

This is not always used, but worth mentioning. Note that there is no c or r.
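If a word list ever needs to be sorted in this order, a small sort key is enough. This is only a sketch: `ORDER`, `RANK`, and `bawkalm_key` are names invented here, and it assumes words contain only the 25 letters listed above.

```python
# The alternate alphabetic order, written as one string.
ORDER = "'ieaoupbfvmwtdsznlkgxjqyh"
RANK = {letter: i for i, letter in enumerate(ORDER)}

def bawkalm_key(word):
    """Map a word to a list of letter ranks so it sorts in Bawkalm order."""
    return [RANK[ch] for ch in word.lower()]

words = ["ziyo", "pe", "'i", "bwe", "Miy"]
print(sorted(words, key=bawkalm_key))
```

Lowercasing first means capitalized verbs sort together with particles rather than ahead of them.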

Next Time: Morphology

That’s all I’ll type out for now. I’ll add more later.

Hell yeah dude, the long-awaited bawkalm is upon us

Morphology

There are two main morphological classes: verbs[1] and particles. Verbs carry all the meaning, and particles control the syntax. Here’s a toy grammar that describes the morphology of the two word types. Note that this is assumed to be case insensitive, and simplifies some things for clarity.

US <- 'p' | 't' | 'k'
UF <- 'f' | 's' | 'x'
U <- US | UF
M <- 'b' | 'v' | 'd' | 'z' | 'g' | 'j'
N <- 'm' | 'n' | 'q'
C <- U | M | N
Y <- 'w' | 'l' | 'y'
H <- 'h'
V <- 'i' | 'e' | 'a' | 'o' | 'u'
vowels <- V ((H | Y) V)*
initial <- ('\'' | C) Y?
particle <- initial vowels
CC <- N (C | '\'') | H H | U (U | N | '\'') | M (M | N)
cluster <- Y (C | '\'' | CC) | CC
coda <- M | N | UF
verb <- initial vowels (cluster vowels)* (Y coda? | coda | H)

This is a little dense, so let’s break down what this means:

Vowel sequences

Firstly, we must understand what a vowel sequence is. These are called vowels in the above grammar, and are considered one vowel stretched across multiple syllables. They are formed by multiple short vowels separated by a single approximant (Y) or H between each vowel, i.e. VYVYV…. Though they can be arbitrarily long, they’re almost always shorter than three short vowels. For example:

  • Valid:
    • e
    • uhi
    • awa
    • oweyi
    • iyayiyayo
  • Invalid
    • ewya (Short vowels only/two approximants)
    • oia (No buffer semivowels)
    • ayhwao (Everything is wrong here)
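The `vowels` rule is regular, so it can be checked with an ordinary regular expression. The following is a minimal sketch of that one rule; `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical names, not part of any Bawkalm tooling.

```python
import re

# vowels <- V ((H | Y) V)*
# One short vowel, then any number of (h or glide) + short vowel pairs.
VOWEL_SEQUENCE = re.compile(r"[ieaou](?:[hwly][ieaou])*", re.IGNORECASE)

def is_vowel_sequence(text):
    """True if the whole string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None

for candidate in ["e", "uhi", "awa", "oweyi", "iyayiyayo", "ewya", "oia", "ayhwao"]:
    print(candidate, is_vowel_sequence(candidate))
```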

Particle (Valgen) Morphology

All words start with an initial: either any non-approximant (non-Y) consonant (we refer to this class as C) or the glottal stop ('), optionally followed by any glide. A particle is an initial followed by a vowel sequence ending in a short vowel. These are called “Valgen” in Bawkalm.

Particle examples:

  • Valid
    • 'i
    • pe
    • 'la
    • bwe
    • 'weya
    • ziyo
    • bahaha
    • twalu
  • Invalid
    • pey (Long vowel → verb)
    • 'wehal (Long vowel → verb)
    • spa (No initial clusters)
    • mhi (h not allowed as an initial glide)
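The particle rule from the toy grammar (`particle <- initial vowels`) can be sketched the same way. The names below are invented for the example, and the pattern follows the simplified toy grammar rather than a full reference grammar.

```python
import re

C = "ptkfsxbvdzgjmnq"   # U, M and N classes
Y = "wly"               # glides
V = "ieaou"             # short vowels

# initial <- ('\'' | C) Y?      vowels <- V ((H | Y) V)*
INITIAL = rf"['{C}][{Y}]?"
VOWELS = rf"[{V}](?:[h{Y}][{V}])*"
PARTICLE = re.compile(INITIAL + VOWELS, re.IGNORECASE)

def is_particle(word):
    """True if the whole word matches the toy particle rule."""
    return PARTICLE.fullmatch(word) is not None

for word in ["'i", "pe", "'la", "bwe", "'weya", "ziyo", "bahaha", "twalu",
             "pey", "'wehal", "spa", "mhi"]:
    print(word, is_particle(word))
```

Because a vowel sequence must end in a short vowel, anything ending in a glide or a consonant falls through, which is exactly what separates particles from verbs.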

Verb morphology

Verbs are more complex than particles, but their essence is easy to recognize: They are an initial, a vowel sequence, any number of two-consonant cluster then vowel sequence pairs, then a coda. A cluster is any two C consonants that match in voicing. A coda is a long vowel followed by an optional consonant, or a short vowel followed by a mandatory consonant.

It’s easier to understand verbs as one or more CV(YV)*C segments which we’ll call Qafkalm, which roughly translates to “stem” or “root word”. Then, a Qafkalm starts with an initial, and ends with one of:

  • Long pure vowel, i.e. one of iy, eh, ah, oh, uw.[2]
  • Coda consonant, which is one of:
    • Nasal (N)
    • Voiced (M)
    • Unvoiced fricative (UF)
  • A long diphthong vowel followed by an optional consonant coda.

Qafkalm examples:

  • Valid
    • 'Aw
    • Miy
    • Dah
    • Qol
    • Jed
    • Xalg
    • Mam
    • Days
    • Gelun
    • 'Uwes
    • Taluwoyz
  • Invalid
    • Tsoy (Starts with cluster)
    • Bedz (Ends with a cluster)
    • Batpal (Valid verb, but not a single Qafkalm)
    • Tat (Ends in an unvoiced plosive)
    • 'Oya (A particle disguising itself with a capital letter)
    • Buht (No consonants after long pure vowels)
    • Dlun †

† This is morphologically valid, but is not a stem since it’s declined with an infix -l-. In general, there is an optional Y? after the initial consonant, but it is used for infix conversion (see: TODO[3]). Some combinations such as Cyi and Cwu are difficult to distinguish. In general, it’s best to avoid using the infix here[4], but if it is used, a buffer vowel (preferably a short schwa, but any central mid-to-close vowel will work) can be inserted between the initial and the glide to ease pronunciation, e.g.

Jyil
/ʒə̆ˈjɪl/
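To try the `verb` rule against the examples above, here is a hedged sketch that transcribes the toy grammar into a Python regular expression. Two caveats: a backtracking regex is slightly more permissive than strict PEG ordered choice, and the check is for verbs in general, so it accepts «Batpal» and «Dlun» even though neither is a single undeclined Qafkalm. All names are invented for the illustration.

```python
import re

U, M, N = "ptkfsx", "bvdzgj", "mnq"
C = U + M + N
Y, V = "wly", "ieaou"

INITIAL = rf"['{C}][{Y}]?"
VOWELS = rf"[{V}](?:[h{Y}][{V}])*"
# CC <- N (C | '\'') | H H | U (U | N | '\'') | M (M | N)
CC = rf"(?:[{N}]['{C}]|hh|[{U}]['{U}{N}]|[{M}][{M}{N}])"
# cluster <- Y (C | '\'' | CC) | CC
CLUSTER = rf"(?:[{Y}](?:{CC}|['{C}])|{CC})"
# coda <- M | N | UF
CODA = rf"[{M}{N}fsx]"
# verb <- initial vowels (cluster vowels)* (Y coda? | coda | H)
VERB = re.compile(
    rf"{INITIAL}{VOWELS}(?:{CLUSTER}{VOWELS})*(?:[{Y}]{CODA}?|{CODA}|h)",
    re.IGNORECASE,
)

def is_verb(word):
    """True if the whole word matches the toy verb rule."""
    return VERB.fullmatch(word) is not None

for word in ["'Aw", "Xalg", "Taluwoyz", "Batpal", "Dlun", "Tsoy", "Bedz", "Tat", "'Oya", "Buht"]:
    print(word, is_verb(word))
```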

Derivation

To derive a new verb from two old ones, simply concatenate two together. Compounds are technically opaque, but the components can serve as mnemonic for the underlying meaning. Usually these are derived from complex chains (see: TODO[5]), so the first Qafkalm determines the case frame (see: TODO[6]) of the compound verb. For example, «Bawkalm» is «Baw» + «Kalm» where «Baw» is the root for language, and «Kalm» is the language’s endonym.

Though concatenation is usually sufficient for verb formation, there are some exceptions:

  • Final stops and fricatives match the voicing of initial stops and fricatives.[7] E.g.
    • «Mex» + «Gal» = «Mejgal»
  • Glottal stops are considered unvoiced
    • «Xed» + «'Annak» = «Xet'annak»
  • A final long pure vowel, and an initial glottal stop (') combine into a short vowel, and

If a case does not appear in this list, the two Qafkalm can be concatenated with no changes. We call attention to some specific cases:

  • Gemination is permitted in clusters (and only in clusters). This can result from devoicing of a final consonant. Ex:
    • 'Uwesseyn
    • «Sab» + «Peq» = «Sappeq»
  • A diphthong followed by an initial consonant does not result in any ambiguity since all words must start with a consonant. E.g.
    • «Sah» + «'Aw» = «Sahhaw»
  • Nasals do not assimilate, and can precede or follow any other consonant. In particular, they do not assimilate even to other nasals.[8]
    • «Xamney»

Conventions

Note that by convention, verbs are capitalized, and initial glottal stops are written. This is to clarify pronunciation. As long as one includes the proper spaces, the following is perfectly and unambiguously legible:

i la miy xu aw pled la geq

which is properly written

'i 'la Miy xu 'Aw Pled 'la Geq

Some find the former style more visually appealing. The author has no aesthetic sense and fills her Lojban with periods and diacritics, so use what you prefer.
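Since the bare style is unambiguous, the conventional spelling can be recovered mechanically. The sketch below assumes two heuristics inferred from the morphology above: a word that begins with a vowel or glide is missing its glottal stop, and a word that does not end in a short vowel is a verb. The function names are made up, and punctuation is not handled.

```python
VOWELS = "ieaou"
GLIDES = "wly"

def canonical(word):
    """Restore the initial glottal stop and verb capitalization of one word."""
    w = word.lower()
    if w[0] in VOWELS + GLIDES:      # no valid initial: a glottal stop was dropped
        w = "'" + w
    if w[-1] not in VOWELS:          # ends in a consonant or long vowel: a verb
        i = 1 if w[0] == "'" else 0  # capitalize the first letter after any '
        w = w[:i] + w[i].upper() + w[i + 1:]
    return w

def canonical_sentence(text):
    return " ".join(canonical(w) for w in text.split())

print(canonical_sentence("i la miy xu aw pled la geq"))
```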

Punctuation is unnecessary, but appreciated. Typically sentence level punctuation precedes the sentence, and uses the inverted form. It replaces the letter for the glottal stop if present, e.g.

¿iya Kag 'le Mihon
.i Jew ¡i Kaq 'le Mihon

Guillemets «» should be used around quotes. Commas should be used after terminators ('e, 'u, 'o, 'a, etc.). Colons are used after subordinating determiner phrases (introduced with «qu»).


Next up, we describe the SSM, which should hopefully be a shorter post.


  1. Technically these are contentives since Bawkalm has no nouns, but “verb” is much more broadly understood and easier to type. ↩︎

  2. Note that though I include them here, I’m unsure whether I will keep these. I worry about ambiguity, but the SSM should be able to handle it. It’s also nice to be able to loan “Pitsah” directly. ↩︎

  3. Infix case conversion redirects cases to other case roles of the verb. For example, «'le Dlun» is “the giver” instead of «'le Dun», which is “the gift”. This is roughly similar to «se» in Lojban, or bo- in Toaq. ↩︎

  4. The best way is to instead decline the identity predicate «'Ew». For example «Jyil» could also be equivalently expressed «'Yew Jil», which is much easier to pronounce. ↩︎

  5. Chaining is the principal method of subordination in Bawkalm. Every verb has a chaining slot, and a verb that appears after it forms a (lambda) clause that’s passed to that slot. Therefore, these are what should be lexified into new verbs. ↩︎

  6. Instead of a place structure, a verb can take 0 to 3 case objects, and 1 to 2 subordinate clauses. This is the case frame, and is essentially the definition of a verb. ↩︎

  7. This matching of voicing can also happen outside of compounds for verbs ending in plosives. That is, «Sab Peq» may be pronounced [sap.peŋ]. This is /p~b/, /t~d/, and /k~g/ in the coda. ↩︎

  8. This is an unfortunate consequence of loaning from Lojban. If I were designing a morphology from scratch, this design would be very different. ↩︎

Are you planning to write a formal minimalist grammar? If so I think that would be a first for a loglang(!). If so, and if you want inspiration on parsing, I have a handful of papers laying around on parsing algorithms for minimalist grammars.

I did not mean in the formal sense, but reading briefly, I think I could ? There are some constructions that I’m not sure can be derived from simple left, right elimination, e.g. free term order and some clause inversion. I was mostly comparing to Lojban and to a lesser degree Toaq where there are many grammatical classes. In Bawkalm, most grammar is just connecting predicate slots, so I think it could be compatible.

I know I’ve looked at a type theoretic approach to case elimination, but there was no way to derive the type classes automatically, not even with functional deps. We’re getting ahead of ourselves, but Bawkalm allows places to be filled in any order, so there’s no obvious way to curry, for example. There’s only three cases, so it could be done exhaustively, but that feels sort of inelegant.

I would love to see the papers, but I can’t promise I would have time to read them. I honestly don’t know how you have the time to read all this literature.

For [insert tedious argument reducing minimalist grammar to MCFG for which I have intuition here] reasons, I expect this would be possible, but I can’t guarantee that it will be pretty.

I’m procrastinating on writing my thesis :no_mouth:

No worries, I finally went ahead and actually imported all the random PDFs I’ve downloaded into Zotero so I can actually find them again, so, paper dump incoming!

  • Minimalism and Computational Linguistics: defines standard MG and mentions a smörgåsbord of useful extensions (of the kind you might want to employ as a language designer) which are all weakly equivalent (same strings, not necessarily same parse trees) to standard MG. Also talks about MGC, minimalist grammars with copying, and how they’re weakly equivalent to PMCFG (MCFG with copying; no surprise, since MG and MCFG are weakly equivalent). Also has a section on subregular complexity, specifically subregular phonology (a field built on the conjecture that even mere regular languages are way more expressive than you need for a theory of any natural phonotactics), which you might find interesting
  • IDL-PMCFG, a Grammar Formalism for Describing Free Word Order Languages: title kinda says it all. This formalism has more expressive power than MCG/PMCFG, but it comes at a cost: :warning: it’s NP-hard to parse
  • Parsing Minimalist Languages with Interpreted Regular Tree Grammars: ok, time to parse! This has the lowest upper bound for parsing MGs that I know of, by re-analyzing an older algorithm and showing that its former analysis was too pessimistic
  • On the Computational Complexity of Head Movement and Affix Hopping: provides a simpler algorithm than the one above, with the same complexity, and then extends it more cheaply than the previously best known technique to achieve an apparently very sought-after one of those extra features that doesn’t change the weak generative capacity
  • Two models of minimalist, incremental syntactic analysis: this one comes with code! code that hasn’t been touched since 2013, mind you, but code nonetheless, in several languages including Haskell and Prolog. The paper describes a pretty simple top-down parser, and also argues (pretty convincingly!) why, despite being weakly equivalent, MGs are better than MCFs
  • Parsing as Deduction Revisited: Using an Automatic Theorem Prover to Solve an SMT Model of a Minimalist Parser: I mean, MG parsing isn’t NP-hard, but sure, why not. Not entirely as unnecessary as it sounds, the idea is to use it for underspecified models. Also comes with code
  • Wide-Coverage Neural A* Parsing for Minimalist Grammars: a little empirical snack to finish off. O(n³) empirically on their corpus despite being (using the tighter bounds for the algorithm they use) O(n¹⁵) theoretically

Self Segmenting Morphology (SSM)

An SSM is an integral part of any loglang. There are many known strategies which are primarily morphemic in nature. Bawkalm takes a different approach: The key insight is that many natural languages distinguish word boundaries with a natural rhythm and melody. We attempt to copy this approach into Bawkalm, but with a far more systematic approach. This is sometimes described as a pitch-accent SSM, and Bawkalm was the first language to use this approach[1].

Though the SSM may seem complex and impossible to manage at first, note two things. First, this does get easier with practice. It feels natural once fluent. Second, part of the philosophy of Bawkalm's SSM is "defense in depth". It's a dance between tone, stress, rhythm, and morphology where each boundary (or lack thereof) is marked in multiple ways. This means that an error in one dimension will be compensated for by another.

Morphology

We have already reviewed some aspects of the SSM. Only Qafkalm always end in a consonant or a long vowel (or both), so a long vowel or a cluster always marks a Qafkalm boundary. Particles can only contain vowel sequences, so any single consonant after a short vowel marks a boundary. A verb can be made of multiple Qafkalm, so the rest of the SSM is built around distinguishing the boundaries between serial verbs (see: TODO[2]).

Stress

Bawkalm has initial stress on verbs, and particles take no stress. If a verb has three or more syllables, there's secondary stress on the final syllable. This alone is technically sufficient to achieve SSM, but the author does not feel stress is reliable enough to distinguish word boundaries. This is particularly true for long series of monosyllabic verbs. Regardless, here are some examples.

| Bawkalm | IPA | Meaning |
| --- | --- | --- |
| 'le Malt xu Mel | /ʔlɛ ˈmalt ɕɯ ˈmɛl/ | The cat is pretty |
| 'Uwesseyn 'le Miy | /ˈʔuwesːˌeɪn ʔlɛ miː/ | I'm American |
| 'la Miy xu 'Aw Biy Zam Tals | /ʔla miː ɕɯ ˈʔɑʊ ˈbiː ˈzam ˈtals/ | I want to become stronger. |

Tone

As we can see with the last example, many stressed or unstressed syllables in a row is a common occurrence in Bawkalm. In addition to the above systems, Bawkalm has structured melody. Some languages may call this a pitch accent, but Bawkalm's melody is a little more flexible. The general idea is that verbs go up in pitch at the beginning, then fall to the end, and particles have a low tone. The minutiae are just a systematization of this.

All particles have a relatively lower tone, so most of this will cover verbs. The treatment of verbs is dependent on the number of syllables. Note that we do mean syllables here. For example, «Gelun» is one Qafkalm, but has two syllables. There are four main classes:

  • Monosyllabic verbs have a simple falling tone.

    • Ex: «Mel» is /mɛl˥˩/
  • Declined monosyllabic verbs, that is, monosyllabic verbs with a conversion infix, have a rising-falling tone, but the rise is shorter and starts from a mid tone.

    • Ex: «Dlun» is /dl˧˥ɯn˥˩/.
  • Two-syllable verbs have a rising tone on the first syllable, and a falling tone on the second.

    • Ex: «Gelun» is /gɛl˩˥ɯn˥˩/
    • Ex: «Bawkalm» is /bɑʊ˩˥kɑlm˥˩/
  • Verbs of three or more syllables still start with a rising tone and have a falling tone on the second syllable, but the tone never falls all the way. Then, there's another falling tone on the final syllable (i.e. the same syllable that takes secondary stress).

    • Ex: «Bawindon» is /bɑ˩˥wɪn˥˨dɤn˦˩/
    • This is a good example of how Bawkalm's SSM integrates: For a very long verb (rare in Bawkalm), the intervening third and beyond syllables may feel difficult to distinguish in terms of tone and stress, but the consonant clusters (and rhythm) will make it clear the verb is not yet over.
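The four classes can be summarized as a small lookup by syllable count. This is a sketch under stated assumptions: syllables are counted as short vowels, a verb is treated as declined when a glide directly follows its initial, and the contours for any syllables between the second and the last are left as "?" because the description above does not pin them down. `tone_contours` is a hypothetical name.

```python
def tone_contours(verb):
    """Return one tone contour per syllable for a verb, per the four classes."""
    w = verb.lower()
    syllables = sum(ch in "ieaou" for ch in w)    # one short vowel per syllable
    declined = len(w) > 1 and w[1] in "wly"       # glide right after the initial
    if syllables == 1:
        # Plain falling tone, or a short mid rise followed by the fall.
        return ["˧˥˥˩"] if declined else ["˥˩"]
    if syllables == 2:
        return ["˩˥", "˥˩"]
    # Three or more: rise, partial fall, unspecified middle, final fall.
    return ["˩˥", "˥˨"] + ["?"] * (syllables - 3) + ["˦˩"]

for verb in ["Mel", "Dlun", "Gelun", "Bawkalm", "Bawindon"]:
    print(verb, tone_contours(verb))
```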

Rhythm

Different elements of the morphology have a different length when spoken. We think of these in terms of "beats". A single, monosyllabic verb is one beat, and everything else is based on that. We denote this with musical notation, and a quarter note (𝅘𝅥) is one beat. The following table summarizes each morphological structure, the portion of a beat that it takes, and the equivalent in musical notation.

| Morphological Structure | Example | Portion | Notation |
| --- | --- | --- | --- |
| Monosyllabic Qafkalm with short vowel | Kam | 1 | 𝅘𝅥 |
| Monosyllabic declined Qafkalm with short vowel | Dlun | 1+½ | 𝅘𝅥 . |
| Monosyllabic Qafkalm ending in long vowel | Ney | 1 | 𝅘𝅥 |
| Monosyllabic declined Qafkalm ending in long vowel | Xwil | 1+½ | 𝅘𝅥 . |
| Monosyllabic Qafkalm with long vowel and consonant | Malt | 1+½ | 𝅘𝅥 . |
| Monosyllabic declined Qafkalm with long vowel and consonant | Xwayd | 1+¾ | 𝅘𝅥 .. |
| First syllable of particle | xu | ¾ | 𝅘𝅥𝅮 . |
| First syllable of declined particle | 'le | ⅞ | 𝅘𝅥𝅮 .. |
| Latter syllables of a particle | 'weya | ½ | 𝅘𝅥𝅮 |
| The first part of a polysyllabic Qafkalm | Gelun | ¾ | 𝅘𝅥𝅮 . |
| The first part of a declined, polysyllabic Qafkalm | Gwelun | ⅞ | 𝅘𝅥𝅮 .. |
| The last part of a polysyllabic Qafkalm ending with a short vowel | Gelun | ¾ | 𝅘𝅥𝅮 . |
| The last part of a polysyllabic Qafkalm ending in a long vowel | Kilow | ¾ | 𝅘𝅥𝅮 . |
| The last part of a polysyllabic Qafkalm ending in a long vowel and a consonant | Qoweyd | ⅞ | 𝅘𝅥𝅮 .. |
| Any medial part of vowel sequence | 'Iyayow | ½ | 𝅘𝅥𝅮 |

This may seem like tedious memorization, but there is a method to it. In general, a verb part has one beat, and a particle part has half a beat. For anything with more complexity, we give it another half of its value. These correspond to augmentation dots in musical notation. E.g., «Xwayd» gets one beat for being a monosyllabic verb, another half for ending in a long vowel and a consonant, and another half of a half for being declined. The initial syllable of a particle gets an extra half of its value just so particles don't get lost in the metaphorical verb soup.

Note that these are all estimates. A speaker just has to get close enough to be understood, and these can be stretched and shortened to fit into poetic or musical meter.
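The dot arithmetic can be written out explicitly. As a sketch, assuming each added bit of complexity behaves exactly like one augmentation dot, a base value with d dots lasts base × (2 − 1/2^d) beats. `dotted` is a name invented for this example.

```python
from fractions import Fraction

def dotted(base, dots=0):
    """Length in beats of a note of `base` beats with `dots` augmentation dots.

    Each dot adds half of the previous addition:
    base * (1 + 1/2 + 1/4 + ...) = base * (2 - 1/2**dots).
    """
    return Fraction(base) * (2 - Fraction(1, 2 ** dots))

QUARTER = Fraction(1)      # verb part: one beat
EIGHTH = Fraction(1, 2)    # particle part: half a beat

print(dotted(QUARTER, 0))  # Kam: plain monosyllabic verb
print(dotted(QUARTER, 1))  # Malt: long vowel and consonant
print(dotted(QUARTER, 2))  # Xwayd: long vowel and consonant, declined
print(dotted(EIGHTH, 1))   # xu: first syllable of a particle
```

Exact fractions are used instead of floats so the results line up with the ½ and ¾ values in the table.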

Holding Space

Given all these interlocking systems, it may feel impossible to coordinate, or even pause while talking. If a speaker needs to pause to think while speaking, they may hold any nasal, preferably m. That said, in combination, this system is fairly robust: One can completely ignore any one dimension, and the remaining ones will compensate. It may be better to think of these as strategies speakers and listeners can use to indicate word boundaries than as mandatory parts of correct speech.

Next Time: Verb Structure

We're finally through the tedious morphological components. Finally we get to start talking about the interesting aspects of Bawkalm: The semantics. Next time we'll talk about Bawkalm case structures and the anatomy of a verb definition.


  1. Note that the development of Bawkalm predates Xextan and Toaq's use of tone contours. ↩︎

  2. Juxtaposition of verbs is essentially grammatical subordination. ↩︎

Wow, this is advanced! Would love to hear a fluidly spoken sample to see whether I can recognize all the different parts! :eyes:

There's an example on the LLL, but it's an older version of the language, so it's not quite correct. I might translate and read some paragraph length text later just to have a sample.
