Complex Segment Learner

Boumaa Fijian (Austronesian)

The Fijian simulation is described in detail in the early sections of the paper.
The corpus is from An Crúbadán, which is compiled from online texts. The data contain quite a few English words. These were eliminated by intersecting the corpus with CELEX and removing the matches, and then by removing any remaining words that had non-Fijian orthographic consonant clusters. See here for more.
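The two cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the script used for the simulation: the function name filter_corpus, the toy word lists, and the choice to treat "dr" as the only permitted orthographic consonant cluster are all assumptions made for the example, and the English lexicon is a stand-in for a real CELEX word list.

```python
import re

# Any run of two or more ASCII consonant letters counts as a cluster.
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-z]{2,}")


def filter_corpus(words, english_lexicon, allowed_clusters=("dr",)):
    """Return the words that survive both cleaning steps.

    Step 1: drop every word that also occurs in the English lexicon.
    Step 2: drop every remaining word containing a consonant cluster
            that is not in allowed_clusters (an assumed whitelist).
    """
    kept = []
    for word in words:
        lowered = word.lower()
        if lowered in english_lexicon:
            continue  # step 1: word is shared with the English lexicon
        runs = CONSONANT_RUN.findall(lowered)
        if any(run not in allowed_clusters for run in runs):
            continue  # step 2: contains a disallowed cluster
        kept.append(word)
    return kept


# Toy data for illustration only; not taken from the actual corpus.
toy_corpus = ["vinaka", "the", "street", "dredre", "bula", "school"]
toy_english = {"the", "street"}
cleaned = filter_corpus(toy_corpus, toy_english)
```

Note that step 1 also discards any genuine Fijian word that happens to be spelled like an English one, which is the price of a simple intersection-based filter.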

Simulation data at a glance

Simulation name | Initial state Learning Data | Initial state features
only version | LearningData.txt | Features.txt

Simulation details for Fijian

Input:

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning Data produced | Features produced | Inseparability | New Segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | tʃ, mb, nd, ŋɡ | b, d, ɡ, ʃ
2 | LearningData.txt | Features.txt | [download] [view] | nr, ndʒ | ʒ

Summary of inventory changes

Stage | Consonant set
Input | p t k b d ɡ m n ŋ f s ʃ ʒ l r β ð ʔ w j
Output | p t k m n ŋ f s l r β ð ʔ w j tʃ mb nd ŋɡ nr ndʒ
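The output inventory can be cross-checked against the iteration summary: removing each iteration's dropped segments from the input set and appending its new segments should yield the output set. The sketch below does this bookkeeping only; the function name apply_iterations is hypothetical, and the code does not reproduce the learner itself.

```python
# Input consonant set, copied from the inventory table.
input_inventory = "p t k b d ɡ m n ŋ f s ʃ ʒ l r β ð ʔ w j".split()

# Segments added and removed per iteration, copied from the iteration summary.
iterations = [
    {"added": ["tʃ", "mb", "nd", "ŋɡ"], "removed": ["b", "d", "ɡ", "ʃ"]},
    {"added": ["nr", "ndʒ"], "removed": ["ʒ"]},
]


def apply_iterations(inventory, steps):
    """Apply each step's removals and additions in order, keeping list order."""
    current = list(inventory)
    for step in steps:
        # Drop the segments this iteration removed.
        current = [seg for seg in current if seg not in step["removed"]]
        # Append the new complex segments, skipping any already present.
        current.extend(seg for seg in step["added"] if seg not in current)
    return current


output_inventory = apply_iterations(input_inventory, iterations)
print(" ".join(output_inventory))
```

Five simple segments are removed and six complex segments are added, so the inventory grows from 20 to 21 consonants.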

Simulation Plots

Figure: inseparability plots (/media/fijian/simulation/insep_plots.png)