Ŋarâþ Crîþ v9n devlog

flirora · October 18, 2024, 12:33am

Ŋarâþ Crîþ v9n is a work-in-progress revision of Ŋarâþ Crîþ.

9n-proto holds both the working draft of Ŋarâþ Crîþ v9n and an implementation of Ŋarâþ Crîþ morphology in PyFoma.

2024-10-17

Background info:

A bridge is the coda of one syllable and the initial of the next. Bridge resolution has two functions: to canonicalize bridges so that they comply with the maximal-onset principle and to change hard-to-pronounce bridges to be easier to pronounce.
Oginiþe cfarðerþ is the repetition of certain sounds in ways I dislike and is thus the target of deduplication rules. In Ŋarâþ Crîþ v9e, this is limited to a select few consonants, but it is set to be much more pervasive in Ŋarâþ Crîþ v9n.

I’ve finished the first draft of the bridge resolution rules, but it has some problems.

In the context of the sound change rules as a whole, bridge resolution needs to interact with hyphens. Ŋarâþ Crîþ v9e ignores this problem, so that any hyphens that were present on the s state (see the state machine diagram) remain on the s state. For instance, the bridge ⟦rþ:-⟧ is canonicalized into ⟦r:-þ⟧, but if the coda following this bridge is ⟦þ⟧, then we have an instance of oginiþe cfarðerþ that becomes undetectable because the hyphen between the two copies of ⟦þ⟧ disappears. While most hyphens in actual Ŋarâþ Crîþ v9 inflection are on the o boundary, some affixes (such as verbal and relational object affixes) do have s-hyphens, so this problem does occur in practice.

My initial idea was to replace the s-hyphen with n- and g-hyphens around the bridge if the bridge is changed (or even unconditionally). This has problems as well. One of the key principles of oginiþe cfarðerþ is that it is only resolved when it occurs across a hyphen; any instance that occurs entirely within a morpheme is preserved (unless it overlaps with another instance between morphemes). But applying this replacement to ⟦þaþ:-a⟧ yields ⟦þa-:þ-a⟧, where an intramorphemic OC is converted to an intermorphemic one.

Another solution might be to track hyphens and move them as necessary (e.g. ⟦rþ:-⟧ → ⟦r:þ-⟧), but this is complicated for several reasons:

Hyphens can occur between components but are forbidden from appearing within them: ⟦t:r⟧ canonicalizes to ⟦:tr⟧, but if we originally had ⟦t:-r⟧, then where do we put the hyphen in ⟦:tr⟧? This also applies to the coalescence of ⟦t:š⟧ to ⟦:č⟧.
This still leaves the problem of undetectable oginiþe cfarðerþ when either the coda or the initial does change.
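
The hyphen-placement problem can be made concrete with a small sketch. It is plain Python rather than PyFoma, and the onset inventory `TOY_ONSETS` is hypothetical: it contains only the entries needed to reproduce the two bridges mentioned above, and the actual rules are richer than this greedy loop.

```python
# Hypothetical onset inventory: only what the two examples need.
TOY_ONSETS = {("þ",), ("t", "r")}

def canonicalize(coda, initial):
    """Move coda consonants into the initial while the result is a valid onset.

    Assumes a bridge of the shape  coda ':' '-' initial,  i.e. the hyphen
    starts out right after the syllable boundary. Returns the rendered
    bridge and a flag saying whether the hyphen ended up strictly inside
    the initial.
    """
    coda, initial = list(coda), list(initial)
    hyphen = 0  # index within `initial` where the hyphen sits
    while coda and tuple([coda[-1]] + initial) in TOY_ONSETS:
        initial.insert(0, coda.pop())
        hyphen += 1  # the moved consonant lands before the hyphen
    rendered = (
        "".join(coda) + ":"
        + "".join(initial[:hyphen]) + "-" + "".join(initial[hyphen:])
    )
    return rendered, 0 < hyphen < len(initial)

print(canonicalize(["r", "þ"], []))  # ('r:þ-', False)
print(canonicalize(["t"], ["r"]))    # (':t-r', True)
```

The second call is the awkward case: the hyphen has nowhere to go except inside ⟦tr⟧.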

Perhaps a solution is a hybrid approach: move the hyphen when canonicalizing the bridge (temporarily allowing hyphens within the onset?), but add a hyphen before the coda when it changes, and after the onset when that changes.

TheAndSys · October 18, 2024, 2:50am

this conlang looks SICK i love it

flirora · October 18, 2024, 4:57am

Thanks for the appreciation, but I was hoping for some more details. Is there anything specific that stood out to you? And is there anything I could explain better?

In general, I’m open to suggestions from other people.

TheAndSys · October 18, 2024, 5:12am

Oh, I don't really have any specific feedback, I'm just loving the aesthetics of the language, especially the script it's written with

zearen · October 18, 2024, 4:19pm

This looks very interesting. It is clearly a project that has been in development for a decade. As a consequence, there is a large amount of documentation to review. Technical pieces like this are easier to read than to write, so I will ask for patience as I review these. I do have some preliminary comments:

This is amazingly advanced, and I expect this to be a loglang, if you want that label. This is especially true given your morphology uses reversible ("invertible") declension.
I enjoy the use of transformers. I'm not quite sure what the practical effects are on the language yet, but I'm intrigued.
The documentation makes many allusions to previous versions. However, your audience (e.g. myself) is unlikely to be familiar with the 8 previous versions of your language. I try to move such historical information to appendices and footnotes, personally.
The documentation feels a bit scattered. I have no real advice here until I work through it all.

I have a couple preliminary questions:

What are the core features of the language? I.e. is the word order free? How many cases are there? It seems agglutinative, but I was unsure.
Is there a guiding philosophy for the language? There seems to be a core data structure, but the section explaining it lacked enough detail for me to understand it out of context.

flirora · October 18, 2024, 5:26pm

Thanks, but Ŋarâþ Crîþ is different from a loglang; there are some intentional ambiguities, such as the genitive being used both for possessive and appositive meanings or semblative-case nominalized verbs being used to mean both “as if X” and “so that X”.

Since Ŋarâþ Crîþ v9n is still in its early stages, most of the work has gone into the morphophonology, which is the part that will differ the most from v9e and is the hardest to get right. So unfortunately, you’ll have to read the v9e grammar for the full picture, though I’ll try to summarize where needed.

The word order is quite free; the verb occurs at the end of a clause, but the noun phrases can be moved relative to each other, and many modifiers are separable from their heads. The language is more fusional than agglutinative, actually. Ŋarâþ Crîþ v9n will likely have the same cases as v9e: nominative, accusative, dative, genitive, locative, instrumental, abessive, and semblative. However, I’ll probably change the number system a bit, primarily because I find myself rarely using the dual and plural numbers.

I’ve written a chapter on the principles of Ŋarâþ Crîþ in the v9e grammar. This also holds for v9n.

flirora · October 18, 2024, 9:10pm

Background info: Ŋarâþ Crîþ v9n plans to deal with three types of oginiþe cfarðerþ: Type I oginiþe cfarðerþ is (roughly) formed by duplicate consonants around a vowel. Type II oginiþe cfarðerþ is formed by two consecutive codas differing only by vowel tone, and type III oginiþe cfarðerþ is formed by three consecutive vowels of the same quality.

Yet another option would be to apply the resolution rules other than canonicalization, then the deduplication rules for (type I) oginiþe cfarðerþ, and then finally bridge canonicalization. It is easy to see that bridge canonicalization always preserves the presence of type I or type III oginiþe cfarðerþ. If we posit type II oginiþe cfarðerþ to require identical codas, then bridge canonicalization can add type II OC; e.g. ârþ:-ar lacks any type II OC in this case but canonicalizes to âr:þ-ar, which does. There is a proposal to expand the definition of type II OC, however, to only require codas to have one consonant in common, in which case bridge canonicalization can only remove existing type II OC.
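
To make the two definitions of type II OC easier to compare, here is a rough sketch in plain Python. Rimes are written out by hand as (vowel, coda) pairs rather than parsed from the bridge notation, and `has_type_ii` and `quality` are names made up for this illustration. The sketch assumes that two rimes match whenever their vowels have the same quality once tone marks are stripped.

```python
import unicodedata

def quality(vowel: str) -> str:
    """Strip tone marks (combining diacritics) from a vowel."""
    decomposed = unicodedata.normalize("NFD", vowel)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def has_type_ii(rimes, expanded=False):
    """Check consecutive (vowel, coda) pairs for type II OC.

    expanded=False: codas must be identical.
    expanded=True:  codas only need one consonant in common.
    How null codas behave is not specified in the post; this sketch
    lets two identical codas match under both definitions.
    """
    for (v1, c1), (v2, c2) in zip(rimes, rimes[1:]):
        if quality(v1) != quality(v2):
            continue
        if c1 == c2 or (expanded and set(c1) & set(c2)):
            return True
    return False

before = [("â", ("r", "þ")), ("a", ("r",))]  # ârþ:-ar
after = [("â", ("r",)), ("a", ("r",))]       # âr:þ-ar (þ is now an onset)

print(has_type_ii(before), has_type_ii(after))                # False True
print(has_type_ii(before, True), has_type_ii(after, True))    # True True
```

Under the strict definition canonicalization introduces type II OC here; under the expanded one the OC is already present before canonicalization.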

If this approach could be made to work, then it would be free of concerns about dealing with intra-initial hyphens in downstream bridge resolution. In exchange, we would need to worry about dealing with non-canonical bridges in these rules.

(A more radical solution might be to disallow s-hyphens, or rather, heavily restrict where they occur, such as only after a null, ⟦r⟧, ⟦n⟧, or ⟦l⟧ coda. This would remove the need for the downstream bridge resolution rules but would preclude the existing treatment of verbal object affixes, which are suffixed to finite forms.)

flirora · October 22, 2024, 2:21am

Alternatives to hyphens?

Since v7, Ŋarâþ Crîþ has admitted hyphens as part of morphology so that deduplication can apply to oginiþe cfarðerþ without affecting loanwords containing what would otherwise be considered OC. However, this creates the need to treat multiple consecutive hyphens as a single one, and every sound change rule has to be formulated with hyphens in mind, putting '-'? in the appropriate places.

One option is to add the concept of ‘protected’ segments. For instance, in the case of type I oginiþe cfarðerþ, consonants subject to deduplication could have a variant that is immune from being the first consonant of a pair to be deduplicated. However, this pushes the complexity onto all other rules – rules that want to work on both unprotected and protected consonants would have to use something like t|'t_protected' instead of t, and since type I oginiþe cfarðerþ affects nearly every consonant, this would quickly become untenable – add to that the need for a protection flag for the remaining two types of oginiþe cfarðerþ.

Another option is to switch to using an ‘anti-hyphen’ character that prevents sound change rules from working across them. But for a morpheme like viv (rather than being a loanword, it’s the N stem of the word for ‘fly’), where do we put the anti-hyphen? We have two plausible locations (or three if we account for the glide): v~iv or vi~v. Or maybe we could place them at all possible locations, yielding v~i~v. While this prevents deduplication of the two v’s in the morpheme, it also prevents type II and III oginiþe cfarðerþ from working: imagine that we had a suffix like -isti. Then v~i~visti would lack any type III OC to resolve, and we would get vivisti.
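
The trade-off can be sketched with ordinary Python regular expressions standing in for the real transducers. Both patterns are toy approximations invented for this illustration (plain vowels only, no tones or glides, and no requirement that a hyphen be present); the only point is that neither pattern is allowed to match across `~`.

```python
import re

V = "aeiou"  # toy vowel set; tones and glides are ignored

# Toy class-I pattern: the same consonant on both sides of one vowel.
TYPE_I = re.compile(rf"([^{V}~\-])[{V}]\1")

# Toy type III pattern: the same vowel three times, separated only by
# non-vowels other than '~' (ordinary hyphens are allowed in between).
TYPE_III = re.compile(rf"([{V}])(?:[^{V}~]*\1){{2}}")

for form in ["viv", "v~iv", "vi~v", "v~i~v"]:
    print(form, bool(TYPE_I.search(form)))
# viv True, the other three False

for form in ["vivisti", "viv-isti", "v~i~visti"]:
    print(form, bool(TYPE_III.search(form)))
# vivisti True, viv-isti True, v~i~visti False
```

Any of the three placements protects the two v’s, but the anti-hyphens also hide the run of three i’s.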

Both of these alternative approaches also rely on morphemes being annotated properly, though this could be automated.

The magic mirror and refactoring

I have ported the venerable magic mirror, which shows how all coda–initial combinations resolve, to v9n. Here’s the beginning of the output:

┌───────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│ Onset │    ∅     │    s     │    r     │    n     │    þ     │    rþ    │    l     │    t     │    c     │    f     │    ł     │    cþ    │
├───────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│   ∅   │   :-     │  s:-     │  r:-     │  n:-     │  þ:-     │ rþ:-     │  l:-     │  t:-     │  c:-     │  f:-     │  ł:-     │ cþ:-     │
│   c   │   :-c    │  s:-c    │  r:-c    │  n:-c    │  þ:-c    │ rþ:-c    │  l:-c    │ -c:-t-   │  c:-c    │  f:-c    │  ł:-c    │ -þ:-c    │
│   n   │   :-n    │  s:-n    │  r:-n    │  n:-n    │  þ:-n    │ rþ:-n    │  l:-n    │ -n:-n    │ -ŋ:-n    │  f:-n    │  ł:-n    │ -þ:-n    │
│   ŋ   │   :-ŋ    │  s:-g-   │  r:-ŋ    │  n:-ŋ    │  þ:-g-   │ rþ:-g-   │  l:-ŋ    │ -n:-ŋ    │   :-ŋ    │  f:-g-   │ -l:-g-   │ -þ:-g-   │
│   v   │   :-v    │  s:-v    │  r:-v    │  n:-v    │  þ:-f-   │ rþ:-f-   │  l:-v    │  t:-f-   │  c:-f-   │  f:-f-   │ -l:-v    │ -þ:-f-   │
│   s   │   :-s    │   :-þ-   │  r:-s    │  n:-s    │  þ:-þ-   │ rþ:-þ-   │  l:-s    │  t:-s    │  c:-s    │  f:-s    │  ł:-t-   │ cþ:-þ-   │
│   þ   │   :-þ    │  s:-þ    │  r:-þ    │  n:-þ    │  þ:-þ    │ rþ:-þ    │  l:-þ    │  t:-þ    │  c:-þ    │  f:-þ    │  ł:-t-   │ cþ:-þ    │

(You might notice the ŋ coda in some places and ask whether it’s suddenly become valid in v9n. This was introduced in v9e as a ‘pseudo-coda’ that ξ-transforms the preceding vowel and becomes r; ŋ is still a pseudo-coda in v9n, but thanks to the new proposed rule application order, it will be able to participate in deduplication of type I oginiþe cfarðerþ.)

I also refactored the bridge rules to be more maintainable and fixed some bugs in the process. Now most of the grunt work is encapsulated in an auxiliary function:

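from pyfoma import FST

# dd, depends_on, defs, and passthru are helpers defined elsewhere in the project.
# CollapseHyphensB deletes any hyphen immediately followed by another hyphen or by ':'.
dd('CollapseHyphensB', "$^rewrite('-':'' / _ ('-'|':'))")

@depends_on(('CollapseHyphensB',), ('passthru',))
def bridgerule(x: FST) -> FST:
    # Allow an optional hyphen on either side of x, then collapse redundant hyphens.
    p = FST.re("('-'? $x '-'?) @ $CollapseHyphensB", {**defs, 'x': x})
    # Hand the result to the passthru helper.
    return FST.re("$^passthru($p)", {'p': p}, functions={passthru.func})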
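
For readers unfamiliar with the rewrite syntax, the effect of CollapseHyphensB on a plain string can be approximated with Python's `re` module. This is only a string-level analogue for illustration, not how the transducer is built, and `collapse_hyphens_b` is a made-up name.

```python
import re

def collapse_hyphens_b(s: str) -> str:
    """Delete every '-' that is immediately followed by '-' or ':'."""
    return re.sub(r"-(?=[-:])", "", s)

print(collapse_hyphens_b("r--þ"))    # r-þ
print(collapse_hyphens_b("rþ-:-a"))  # rþ:-a
```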

flirora · October 23, 2024, 7:14pm

Revisiting oginiþe cfarðerþ detection

I’m fixing some bugs in the transducers for detecting oginiþe cfarðerþ, and it turns out that pickle recurses too deeply when trying to serialize large automata. (Also, combining HasType{I,II,III}OC takes over 10 minutes and produces an automaton with 32798 states, which might indicate that the definitions are too complex.)

Type I OC is the most complex to detect. To simplify the rules, we could remove consonant class II (where VCVC or CVCV with identical vowels constitutes OC), moving these consonants into class I (CVC constitutes OC). This saves having to repeat the pattern for each vowel quality, shrinking HasTypeIOC from 4963 states to 521 (edit: 422 after fixing another bug) and allowing it to be serialized by pickle. Regardless, it’s clear that I’m going to run into this limitation again at some point.

Edit: What about a different definition for class II?

Instead of requiring the vowels to be identical in the CVCV and VCVC pattern, we could remove that constraint. This would still be distinct from class I since a CVC pattern surrounded either by consonants or word boundaries would be excluded from oginiþe cfarðerþ. This modestly increases the size of the associated automaton (689 states for HasTypeIOC), but this is still a viable approach.
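
As a rough illustration of the three candidate definitions, here they are as Python regular expressions for a single consonant. This is a simplification under stated assumptions: a five-vowel toy inventory, no tones, no hyphens, and function names invented for the example. The per-vowel repetition in `class_ii_identical` mirrors why the original class II is more expensive.

```python
import re

VOWELS = "aeiou"  # toy inventory; tones and hyphens are ignored

def class_i(c):
    """CVC with the same consonant c on both sides."""
    c = re.escape(c)
    return re.compile(f"{c}[{VOWELS}]{c}")

def class_ii_identical(c):
    """VCVC or CVCV where both vowels are the same: one branch pair per vowel."""
    c = re.escape(c)
    return re.compile("|".join(f"{v}{c}{v}{c}|{c}{v}{c}{v}" for v in VOWELS))

def class_ii_relaxed(c):
    """VCVC or CVCV with no constraint tying the two vowels together."""
    c = re.escape(c)
    any_v = f"[{VOWELS}]"
    return re.compile(f"{any_v}{c}{any_v}{c}|{c}{any_v}{c}{any_v}")

for word in ["tat", "tata", "tati", "stats"]:
    print(
        word,
        bool(class_i("t").search(word)),
        bool(class_ii_identical("t").search(word)),
        bool(class_ii_relaxed("t").search(word)),
    )
# tat   True False False
# tata  True True  True
# tati  True False True
# stats True False False
```

The last row shows what still separates the relaxed class II from class I: a bare CVC flanked by consonants or word boundaries is not OC.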

flirora · October 24, 2024, 6:31am

Laying the groundwork for docaroginat’ing the cfarðerþ

For resolving type I oginiþe cfarðerþ, I’m going to need something more powerful than the built-in $^rewrite. I’m still a ways from my goal, but with some help from trusty old Kaplan & Kay (1994), I’ve started with a version of $^rewrite that supports directional application. The approach used by PyFoma seems to be quite different from the one described in Kaplan & Kay’s paper, though, so I have to figure out myself how I’d implement parallel (“batch”) rules^[1]. I also have an issue open on PyFoma’s repository, but I can only wonder when Mans (or someone else) will get back to me.

[1] Alternatively, I could implement the traditional rewrite rule formalism rather than PyFoma’s version, but I want to try PyFoma’s approach first.
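
For anyone wondering what directional application changes, here is a minimal string-level sketch in plain Python. It is not how PyFoma or Kaplan & Kay construct the transducers. It only shows the behavioural difference for one made-up rule, a → b / b _, and `apply_rule` is a hypothetical helper.

```python
def apply_rule(s, target, repl, left, mode):
    """Apply  target -> repl / left _  to s in the given mode.

    "simultaneous": contexts are checked against the input only.
    "left-to-right" / "right-to-left": each rewrite can feed later ones.
    """
    if mode == "simultaneous":
        return "".join(
            repl if ch == target and i > 0 and s[i - 1] == left else ch
            for i, ch in enumerate(s)
        )
    if mode == "left-to-right":
        order = range(len(s))
    elif mode == "right-to-left":
        order = reversed(range(len(s)))
    else:
        raise ValueError(f"unknown mode: {mode}")
    chars = list(s)
    for i in order:
        # The context is read from the partially rewritten string.
        if chars[i] == target and i > 0 and chars[i - 1] == left:
            chars[i] = repl
    return "".join(chars)

for mode in ("simultaneous", "left-to-right", "right-to-left"):
    print(mode, apply_rule("baa", "a", "b", "b", mode))
# simultaneous bba
# left-to-right bbb
# right-to-left bba
```

Only the left-to-right pass lets the first rewrite create the context for the second.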

flirora · November 3, 2024, 10:21pm

Planning out Type I deduplication rules

I’ve worked on a spreadsheet for planning out the deduplication of Type I oginiþe cfarðerþ.

The “Consonants” sheet lists the possible start and end sides for each consonant of Class II or higher.
The “Rules-!α” sheet lists the planned rules for handling Type I oginiþe cfarðerþ when the start side is a bridge (i.e. the OC is after the start of the word). In this case, the first instance of the offending consonant is changed. Right now, I have equivalent rules for the consonants that have deduplication rules in v9e, but the ones for the rest are still undetermined. I’m also unsure whether these rules should be allowed to create instances of Type I oginiþe cfarðerþ with class III consonants (recall that existing instances of class III OC are allowed to remain).
If the start side is the first initial of a word, then the second consonant is changed. The sheet for this has yet to be created.