Complex Segment Learner

Zezuru Shona (Niger-Congo)

Shona is a term for several languages of Zimbabwe, which differ primarily in their inventories of complex segments. We focus on Zezuru Shona, the dialect described by Fortune and the subject of some debate (see the Kadenge and Maddieson works cited in our paper). Our paper summarizes only the main results for the most complex segments, but as the simulation details below show, there is considerably more to say about Shona.

The source of our data is the excellent ALLEX project, which hosts a digital version of Chimhundu's dictionary. We converted the orthography into transcriptions following Fortune (1980); the script and all the materials are on GitHub.

Simulation details for Shona


Summary of iterations:

Iteration | Learning Data produced | Features produced | Inseparability | New Segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | pf, tʃ, dz, dʒ, mb, nd, ŋg | None
2 | LearningData.txt | Features.txt | [download] [view] | ts, tʂ, kw, bw, bʋ, dʐ, gw, ɦw, nz, nʐ, ɲdʒ, dʒg, mbʒ | None
3 | LearningData.txt | Features.txt | [download] [view] | pkw, tkw, skw, ʃw, mw, mʋ, nw, ɲŋ, rw, tʃk, dzgw, ŋgw, tskw, nzgw, ɲdʒg | None
4 | LearningData.txt | Features.txt | [download] [view] | st, zgw, ŋw, tʃkw, dʒgw, ndw | None
5 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | p t k b d g bʱ dʱ f s ʃ ʂ ɦ vʱ z ʐ ʒ m mʱ n nʱ ɲ ŋ r l j w ʋ
Output | p t k b d g bʱ dʱ f s ʃ ʂ ɦ vʱ z ʐ ʒ m mʱ n nʱ ɲ ŋ r l j w ʋ pf tʃ dz dʒ mb nd ŋg ts tʂ kw bw bʋ dʐ gw ɦw nz nʐ ɲdʒ dʒg mbʒ pkw tkw skw ʃw mw mʋ nw ɲŋ rw tʃk dzgw ŋgw tskw nzgw ɲdʒg st zgw ŋw tʃkw dʒgw ndw
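
The two summaries above can be cross-checked against each other: the output consonant set should be the input set plus the segments added at each iteration, in order, with learning stopping at the first iteration that adds nothing. The following is a minimal Python sketch of that bookkeeping only. It replays the additions recorded in the tables and does not reproduce the learner itself; the names INPUT, ADDED, and replay are hypothetical and are not taken from the project's code.

```python
# Input consonant inventory, copied from the "Input" row above.
INPUT = "p t k b d g bʱ dʱ f s ʃ ʂ ɦ vʱ z ʐ ʒ m mʱ n nʱ ɲ ŋ r l j w ʋ".split()

# Segments added at each iteration, copied from the iteration summary.
# Iteration 5 added nothing, so it is recorded as an empty list.
ADDED = {
    1: "pf tʃ dz dʒ mb nd ŋg".split(),
    2: "ts tʂ kw bw bʋ dʐ gw ɦw nz nʐ ɲdʒ dʒg mbʒ".split(),
    3: "pkw tkw skw ʃw mw mʋ nw ɲŋ rw tʃk dzgw ŋgw tskw nzgw ɲdʒg".split(),
    4: "st zgw ŋw tʃkw dʒgw ndw".split(),
    5: [],
}


def replay(inventory, added_by_iteration):
    """Replay recorded additions (hypothetical helper, not the learner).

    Returns the final inventory and a list of (iteration, inventory size)
    pairs, stopping at the first iteration that adds no segments.
    """
    inventory = list(inventory)
    sizes = []
    for iteration in sorted(added_by_iteration):
        new_segments = added_by_iteration[iteration]
        if not new_segments:
            break  # no new segments: the inventory has stopped changing
        inventory.extend(new_segments)
        sizes.append((iteration, len(inventory)))
    return inventory, sizes


final_inventory, sizes = replay(INPUT, ADDED)
for iteration, size in sizes:
    print(f"after iteration {iteration}: {size} consonants")
```

Replaying the tables this way takes the inventory from 28 input consonants to 69 after iteration 4, with no segments removed at any stage.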

