Complex Segment Learner

Bolivian Quechua (Quechuan)

Quechua is the last case study discussed in the paper, and the one that most clearly shows how much the nature of the learning data matters. As described in the paper, the simulations on word corpora fail in various ways to arrive at the posited (uncontroversial) inventory of affricates. The reason is that the phonotactics and morphology of Quechua conspire to inflate the frequencies of certain clusters, diluting the frequencies of affricates. Reasonable-looking results emerge only when the learner is trained on more abstract data: roots or a morpheme list. Quechua is also a case where we tried to decompose aspirated and ejective plosives into more primitive parts. The learner does not find these segments when trained on words.

Simulation data at a glance

Simulation name | Initial state learning data | Initial state features
Words Broad | LearningData.txt | Features.txt
Roots | LearningData.txt | Features.txt
Morphemes Broad | LearningData.txt | Features.txt
Words_With_Mb | LearningData.txt | Features.txt
Words Narrow | LearningData.txt | Features.txt
Words Glot_Feats_As_Segs | LearningData.txt | Features.txt
Morphemes Narrow | LearningData.txt | Features.txt

Simulation details for Quechua words broad

Input:

This word list was compiled from an online Quechua newspaper. See Gouskova and Gallagher 2019 for further details.

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning data produced | Features produced | Inseparability | New segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | | ŋ, ɴ
2 | LearningData.txt | Features.txt | [download] [view] | tʃ', sq, ɲtʃ, jk | ʃ'
3 | LearningData.txt | Features.txt | [download] [view] | nk | None
4 | LearningData.txt | Features.txt | [download] [view] | tʃʰ, nt, rq | ʃʰ
5 | LearningData.txt | Features.txt | [download] [view] | nq | None
6 | LearningData.txt | Features.txt | [download] [view] | xt, mp, jt | None
7 | LearningData.txt | Features.txt | [download] [view] | sp, ʎp' | None
8 | LearningData.txt | Features.txt | [download] [view] | rp | None
9 | LearningData.txt | Features.txt | [download] [view] | rl | None
10 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | p t k q p' t' ʃ' k' q' pʰ tʰ ʃʰ kʰ qʰ s ʃ x h m n ɲ r l ʎ j w ŋ ɴ
Output | p t k q p' t' k' q' pʰ tʰ kʰ qʰ s ʃ x h m n ɲ r l ʎ j w tʃ tʃ' sq ɲtʃ jk nk tʃʰ nt rq nq xt mp jt sp ʎp' rp rl

Simulation Plots

/media/quechua/words/broad/simulation/insep_plots.png


Simulation details for Quechua roots

Input:

This list of roots was prepared by Gillian Gallagher from the Laime Ajacopa (2007) dictionary of Quechua. See Gouskova and Gallagher 2019 for details and references.

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning data produced | Features produced | Inseparability | New segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | tʃ', tʃʰ, tʃ | ʃ', ʃʰ, ɲ, ŋ, ɴ
2 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | p t k q p' t' ʃ' k' q' pʰ tʰ ʃʰ kʰ qʰ s ʃ x h m n ɲ r l ʎ j w ŋ ɴ
Output | p t k q p' t' k' q' pʰ tʰ kʰ qʰ s ʃ x h m n r l ʎ j w tʃ' tʃʰ tʃ
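
The relationship between the iteration summary and the inventory summary can be checked mechanically: the output set should equal the input set minus the removed segments plus the added ones. The following is a minimal sketch of that bookkeeping using the roots tables above. The function name `apply_iterations` and the data layout are illustrative assumptions, not part of the learner itself.

```python
# Consonant inventories copied from the roots simulation tables.
input_inventory = "p t k q p' t' ʃ' k' q' pʰ tʰ ʃʰ kʰ qʰ s ʃ x h m n ɲ r l ʎ j w ŋ ɴ".split()
reported_output = "p t k q p' t' k' q' pʰ tʰ kʰ qʰ s ʃ x h m n r l ʎ j w tʃ' tʃʰ tʃ".split()

# One (added, removed) pair per iteration; rows reading "None" become empty lists.
iterations = [
    (["tʃ'", "tʃʰ", "tʃ"], ["ʃ'", "ʃʰ", "ɲ", "ŋ", "ɴ"]),
    ([], []),
]


def apply_iterations(inventory, iterations):
    """Hypothetical helper: replay removals and additions in iteration order."""
    current = list(inventory)
    for added, removed in iterations:
        # Drop the removed segments, then append any new complex segments.
        current = [seg for seg in current if seg not in removed]
        current.extend(seg for seg in added if seg not in current)
    return current


derived_output = apply_iterations(input_inventory, iterations)
print(derived_output == reported_output)
```

The same check can be run on the other simulations by swapping in their inventories and iteration rows.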

Simulation Plots

/media/quechua/roots/simulation/insep_plots.png


Simulation details for Quechua morphemes broad

Input:

This morpheme list is tokenized from the morpheme-segmented word list. Nasal place assimilation is not transcribed.
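
As a rough illustration of how a morpheme list might be derived from a morpheme-segmented word list, the sketch below splits each word at a boundary symbol and collects the pieces. The hyphen as boundary marker, the choice to keep each morpheme only once, and the function name `morphemes_from_words` are all assumptions made for the example; the source does not specify the actual file format or whether duplicates are kept.

```python
def morphemes_from_words(segmented_words, boundary="-"):
    """Split morpheme-segmented words and keep each morpheme once, in first-seen order.

    Assumption: morpheme boundaries are marked with `boundary` inside each word.
    """
    seen = set()
    morphemes = []
    for word in segmented_words:
        for morpheme in word.split(boundary):
            morpheme = morpheme.strip()
            if morpheme and morpheme not in seen:
                seen.add(morpheme)
                morphemes.append(morpheme)
    return morphemes


# Made-up placeholder strings, not real Quechua forms.
toy_words = ["ab-cd", "ab-ef-cd", "gh"]
toy_morphemes = morphemes_from_words(toy_words)
print(toy_morphemes)
```

Training on a list like this removes clusters that arise only across morpheme boundaries, which is the property the morpheme simulations rely on.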

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning data produced | Features produced | Inseparability | New segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | tʃ', tʃʰ, tʃ | ʃ', ʃʰ
2 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | p t k q p' t' ʃ' k' q' pʰ tʰ ʃʰ kʰ qʰ s ʃ x h m n r l ʎ j w
Output | p t k q p' t' k' q' pʰ tʰ kʰ qʰ s ʃ x h m n r l ʎ j w tʃ' tʃʰ tʃ

Simulation Plots

/media/quechua/morphemes/broad/simulation/insep_plots.png


Simulation details for Quechua words_with_mb

Input:

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning data produced | Features produced | Inseparability | New segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | tʃ, sq, jk, ŋk, ɴq | None
2 | LearningData.txt | Features.txt | [download] [view] | tʃ', tʃʰ, nt, ɲtʃ, rq, ʎp', ŋk' | ʃ', ʃʰ
3 | LearningData.txt | Features.txt | [download] [view] | sp, xt, rp, rl, ɴqʰ | None
4 | LearningData.txt | Features.txt | [download] [view] | sk', xs, mp, nt', rqʰ, ʎq, jk' | None
5 | LearningData.txt | Features.txt | [download] [view] | sk, sq', xʎ, jt, jw | None
6 | LearningData.txt | Features.txt | [download] [view] | ʃk, mp', rk', rm, lt, ʎp, jʎ, wp, ws | None
7 | LearningData.txt | Features.txt | [download] [view] | st', ʃkʰ, xr, xtʃ, ɲtʃ', ʎtʃ', jq, jn | None
8 | LearningData.txt | Features.txt | [download] [view] | xt', xtʃ', rk, rq', lw, jm, jtʃ' | None
9 | LearningData.txt | Features.txt | [download] [view] | sp', xj, mpʰ, rkʰ, lm, jq', jr | None
10 | LearningData.txt | Features.txt | [download] [view] | ʃw, xp, xw, rw, js | None
11 | LearningData.txt | Features.txt | [download] [view] | nr, rt, ʎm | None
12 | LearningData.txt | Features.txt | [download] [view] | st, xn, ms, ʎk, jtʃ | None
13 | LearningData.txt | Features.txt | [download] [view] | sm, xm | None
14 | LearningData.txt | Features.txt | [download] [view] | ns | None
15 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | p t k q p' t' k' q' pʰ tʰ kʰ qʰ s ʃ ʃ' ʃʰ x h m n ɲ r l ʎ j w ŋ ɴ
Output | p t k q p' t' k' q' pʰ tʰ kʰ qʰ s ʃ x h m n ɲ r l ʎ j w ŋ ɴ tʃ sq jk ŋk ɴq tʃ' tʃʰ nt ɲtʃ rq ʎp' ŋk' sp xt rp rl ɴqʰ sk' xs mp nt' rqʰ ʎq jk' sk sq' xʎ jt jw ʃk mp' rk' rm lt ʎp jʎ wp ws st' ʃkʰ xr xtʃ ɲtʃ' ʎtʃ' jq jn xt' xtʃ' rk rq' lw jm jtʃ' sp' xj mpʰ rkʰ lm jq' jr ʃw xp xw rw js nr rt ʎm st xn ms ʎk jtʃ sm xm ns

Simulation Plots

/media/quechua/words_with_mb/simulation/insep_plots.png


Simulation details for Quechua words narrow

Input:

This is the same word list as "words broad", but with uvular retraction transcribed on sonorants.

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning data produced | Features produced | Inseparability | New segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | tʃ, jk, ŋk, ɴq, r̞q | None
2 | LearningData.txt | Features.txt | [download] [view] | tʃ', sq, nt, ɲtʃ, ŋk', r̞qʰ, ʎ̞q, j̠̠q | ʃ', ʎ̞
3 | LearningData.txt | Features.txt | [download] [view] | tʃʰ, xt, mp, jt, ɴqʰ, r̞q', j̠̠q' | ʃʰ, r̞, j̠̠
4 | LearningData.txt | Features.txt | [download] [view] | sp, rp, ʎp' | None
5 | LearningData.txt | Features.txt | [download] [view] | rl | None
6 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | p t k q p' t' ʃ' k' q' pʰ tʰ ʃʰ kʰ qʰ s ʃ x h m n ɲ r l ʎ j w ŋ ɴ r̞ l̞ ʎ̞ j̠̠ w̞
Output | p t k q p' t' k' q' pʰ tʰ kʰ qʰ s ʃ x h m n ɲ r l ʎ j w ŋ ɴ l̞ w̞ tʃ jk ŋk ɴq r̞q tʃ' sq nt ɲtʃ ŋk' r̞qʰ ʎ̞q j̠̠q tʃʰ xt mp jt ɴqʰ r̞q' j̠̠q' sp rp ʎp' rl

Simulation Plots

/media/quechua/words/narrow/simulation/insep_plots.png


Simulation details for Quechua words glot_feats_as_segs

This simulation demonstrates that breaking down aspirated and ejective plosives into sequences does not help in this case; it actually makes things worse. Certain laryngeally marked stops [kʰ, t'] are so infrequent that they fail to be unified, whereas clusters common in morphologically complex words do get unified.

Input:

This version of the Quechua word list transcribes ejectives and aspirates as sequences with glottal stops and [h] respectively.
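
The retranscription described here can be pictured as a simple rewrite over segment lists. The sketch below is an assumption-laden illustration rather than the script actually used: it assumes words are already split into segments, that ejectives end in `'` and aspirates in `ʰ` (as in the inventories on this page), and the function name `decompose_laryngeals` is hypothetical. It does not attempt to reproduce the separate `t˗` symbol that appears in this simulation's input inventory.

```python
EJECTIVE_MARK = "'"
ASPIRATION_MARK = "ʰ"


def decompose_laryngeals(segments):
    """Rewrite ejectives as stop + glottal stop and aspirates as stop + [h]."""
    result = []
    for seg in segments:
        if len(seg) > 1 and seg.endswith(EJECTIVE_MARK):
            result.extend([seg[:-1], "ʔ"])
        elif len(seg) > 1 and seg.endswith(ASPIRATION_MARK):
            result.extend([seg[:-1], "h"])
        else:
            result.append(seg)
    return result


# Placeholder segment sequence for illustration, not a real word from the corpus.
decomposed = decompose_laryngeals(["k'", "a", "qʰ", "a"])
print(decomposed)
```

After this rewrite, a laryngeally marked stop only re-enters the inventory if the learner unifies the stop with the following glottal stop or [h], which is the step that the infrequent stops fail to pass in this simulation.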

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning data produced | Features produced | Inseparability | New segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | t˗ʃ, ŋk, ɴq | t˗, ŋ, ɴ
2 | LearningData.txt | Features.txt | [download] [view] | sq, ɲt˗ʃ, jk | None
3 | LearningData.txt | Features.txt | [download] [view] | qh, nt | None
4 | LearningData.txt | Features.txt | [download] [view] | rq | None
5 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | p t k q t˗ s ʃ x ʔ h m n ɲ r l ʎ j w ŋ ɴ
Output | p t k q s ʃ x ʔ h m n ɲ r l ʎ j w t˗ʃ ŋk ɴq sq ɲt˗ʃ jk qh nt rq

Simulation Plots

/media/quechua/words/glot_feats_as_segs/simulation/insep_plots.png


Simulation details for Quechua morphemes narrow

Input:

This version was created by tokenizing the morpheme-segmented wordlist. Nasal place assimilation is transcribed.

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning data produced | Features produced | Inseparability | New segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | tʃ', tʃʰ, tʃ, ŋk, ɴq | ʃ', ʃʰ, r̞, l̞, ʎ̞, j̠̠, w̞
2 | LearningData.txt | Features.txt | [download] [view] | ʃk, nt, ɲtʃ, ŋk' | None
3 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | p t k q p' t' ʃ' k' q' pʰ tʰ ʃʰ kʰ qʰ s ʃ x h m n ɲ r l ʎ j w ŋ ɴ r̞ l̞ ʎ̞ j̠̠ w̞
Output | p t k q p' t' k' q' pʰ tʰ kʰ qʰ s ʃ x h m n ɲ r l ʎ j w ŋ ɴ tʃ' tʃʰ tʃ ŋk ɴq ʃk nt ɲtʃ ŋk'

Simulation Plots

/media/quechua/morphemes/narrow/simulation/insep_plots.png