Complex Segment Learner

Sengwato / Standard Tswana (Niger-Congo)

Mentioned in the paper only in passing, Tswana is a particularly rich case: the argumentation from phonotactics is reasonably clear, but its labiocoronals are controversial because they are typologically rare. This case study demonstrates that what is rare cross-linguistically is also rare intra-linguistically: the labiocoronal sounds found in the Sengwato dialect occur in just a handful of morphemes. Since this dialect does not have a dictionary of its own, we built a corpus for it from the only electronic dictionary of Tswana available to us. The very thorough description of the dialect in One Tlale's 2005 dissertation was an invaluable aid.

While related to the Zimbabwean Shona languages, Tswana differs in affording its nasals syllabic and tone-bearing status. As a result, Tswana is not usually analyzed as having prenasalized consonants. It is, however, analyzed as having labialized nasals. Our learner finds [ŋw] and [nw], but the corpus had no instances of [ɲw], so the learner did not find it.

Yet another difference between Zezuru Shona and Sengwato Tswana is that the latter has a peculiarly gapped inventory of consonants. Because just about all the sequences of consonants are analyzed as complex segments in Tswana, the gappy inventory is not a problem for the learner.

Simulation data at a glance

Simulation name | Initial state Learning Data | Initial state features
Standard | LearningData.txt | Features.txt
Sengwato | LearningData.txt | Features.txt

Simulation details for Tswana Standard

Input:

This is a more or less unmodified version of Creissels' 1996 dictionary. See here for details.

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning Data produced | Features produced | Inseparability | New Segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | dʒ, ts, tsh, tɬ, tɬh, kx, ŋw | sh, ɸ, ʒ
2 | LearningData.txt | Features.txt | [download] [view] | tʃ, tʃh, kw, sw, xw, rw, lw, tsw, tshw, tɬw, kxw | ʃh
3 | LearningData.txt | Features.txt | [download] [view] | tw, thw, nw, tɬhw | None
4 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | h b p ph d t th k kh s sh x ɸ ʃ ʒ ʃh ɬ ɬh m n ŋ ɲ r l j w
Output | h b p ph d t th k kh s x ʃ ɬ ɬh m n ŋ ɲ r l j w dʒ ts tsh tɬ tɬh kx ŋw tʃ tʃh kw sw xw rw lw tsw tshw tɬw kxw tw thw nw tɬhw
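
The bookkeeping behind the two tables above can be checked mechanically. The following is a minimal sketch, not the learner itself: it only replays the "New Segments added" and "Segments removed" columns against the input consonant set and confirms that the result matches the output set. The function name `apply_iterations` is hypothetical; the segment lists are copied from the tables.

```python
# Replays the per-iteration additions and removals for the Standard simulation.
# This is only an arithmetic check on the tables, not the learning algorithm.
# `apply_iterations` is a hypothetical helper name, not part of the learner.

def apply_iterations(inventory, iterations):
    """Apply each (added, removed) pair in order and return the final set."""
    current = set(inventory)
    for added, removed in iterations:
        current |= set(added.split())   # segments the learner adds
        current -= set(removed.split()) # segments the learner removes
    return current

INPUT = "h b p ph d t th k kh s sh x ɸ ʃ ʒ ʃh ɬ ɬh m n ŋ ɲ r l j w".split()

# One (added, removed) pair per iteration; empty strings stand for "None".
ITERATIONS = [
    ("dʒ ts tsh tɬ tɬh kx ŋw", "sh ɸ ʒ"),
    ("tʃ tʃh kw sw xw rw lw tsw tshw tɬw kxw", "ʃh"),
    ("tw thw nw tɬhw", ""),
    ("", ""),
]

OUTPUT = set(
    "h b p ph d t th k kh s x ʃ ɬ ɬh m n ŋ ɲ r l j w "
    "dʒ ts tsh tɬ tɬh kx ŋw "
    "tʃ tʃh kw sw xw rw lw tsw tshw tɬw kxw "
    "tw thw nw tɬhw".split()
)

final = apply_iterations(INPUT, ITERATIONS)
print(final == OUTPUT)  # True when the tables are mutually consistent
print(len(INPUT), "->", len(final))
```

Treating each segment as an opaque string keeps the check independent of any feature analysis; it verifies only that the iteration summary and the inventory summary agree.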

Simulation Plots

/media/tswana/standard/simulation/insep_plots.png


Simulation details for Tswana Sengwato

Input:

The Sengwato corpus was created by converting Denis Creissels' 1996 electronic dictionary from CBOLD into Sengwato by script. Since the standard dialect has neutralized labiocoronal consonants with affricates (e.g., Sengwato [mptʃa] 'dog' is [ntʃa] in other dialects), we had to add/fix some words by hand, based on One Tlale Boyer's 2005 Georgetown thesis. See for more.

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning Data produced | Features produced | Inseparability | New Segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | dʒ, ts, tsh, tʃh, kx, ŋw | sh
2 | LearningData.txt | Features.txt | [download] [view] | pʃh, tʃ, tw, thw, kw, sw, xw, ɸʃ, rw, lw, tsw, tshw, kxw | ʃh
3 | LearningData.txt | Features.txt | [download] [view] | nw | None
4 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | h b p ph d t th k kh s sh x ɸ ʃ ʒ ʃh m n ŋ ɲ r l j w
Output | h b p ph d t th k kh s x ɸ ʃ ʒ m n ŋ ɲ r l j w dʒ ts tsh tʃh kx ŋw pʃh tʃ tw thw kw sw xw ɸʃ rw lw tsw tshw kxw nw
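
To see where the two simulations diverge, the two output inventories can be compared as sets. This is a rough sketch using only the output consonant sets listed on this page; the variable names are hypothetical, and reading pʃh and ɸʃ as the labiocoronals discussed above is our interpretation of the segment labels, not something the tables state.

```python
# Compares the learned output inventories of the two simulations.
# Segment lists are copied from the "Summary of inventory changes" tables;
# the variable names are hypothetical.

STANDARD = set(
    "h b p ph d t th k kh s x ʃ ɬ ɬh m n ŋ ɲ r l j w "
    "dʒ ts tsh tɬ tɬh kx ŋw "
    "tʃ tʃh kw sw xw rw lw tsw tshw tɬw kxw "
    "tw thw nw tɬhw".split()
)

SENGWATO = set(
    "h b p ph d t th k kh s x ɸ ʃ ʒ m n ŋ ɲ r l j w "
    "dʒ ts tsh tʃh kx ŋw "
    "pʃh tʃ tw thw kw sw xw ɸʃ rw lw tsw tshw kxw "
    "nw".split()
)

only_sengwato = SENGWATO - STANDARD  # segments in the Sengwato output only
only_standard = STANDARD - SENGWATO  # segments in the Standard output only
shared = SENGWATO & STANDARD         # segments both simulations end up with

print("Sengwato only:", only_sengwato)
print("Standard only:", only_standard)
print("Shared:", len(shared))
```

On these lists, the Sengwato-only segments are ɸ, ʒ, pʃh, and ɸʃ, and the Standard-only segments are the laterals ɬ, ɬh, tɬ, tɬh, tɬw, and tɬhw. The comparison says nothing about why the inventories differ; it only locates the differences.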

Simulation Plots

/media/tswana/sengwato/simulation/insep_plots.png