Complex Segment Learner

American English (Indo-European)

English is discussed in its own section of the paper, which reviews the arguments for its two purported affricates and describes the data sources we used. Briefly, the arguments are not particularly strong, and the results of our simulations are correspondingly fragile: the learner finds [tʃ] only when the stop portion is transcribed narrowly. Such a transcription can be motivated, but it is not necessary in other cases.

Simulation data at a glance

Simulation name | Initial state learning data | Initial state features
Celex Broad | LearningData.txt | Features.txt
Cmu Narrow | LearningData.txt | Features.txt
Celex Narrow | LearningData.txt | Features.txt
Cmu Broad | LearningData.txt | Features.txt

Simulation details for English celex broad

Input:

Data source: Celex lemmas. The Celex transcriptions assume British English pronunciations.

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning data produced | Features produced | Inseparability | New segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | dʒ | None
2 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | p b m f v θ ð t d s z n l ʃ ʒ ɹ j k g ŋ h w
Output | p b m f v θ ð t d s z n l ʃ ʒ ɹ j k g ŋ h w dʒ

Simulation Plots

/media/english/celex/broad/simulation/insep_plots.png


Simulation details for English cmu narrow

Input:

Same data as for CMU "broad", with the same manipulation as for Celex "narrow", except that [t ʃ] and [d ʒ] were changed by an across-the-board search-and-replace, since CMU does not mark morpheme boundaries in any form.
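The across-the-board replacement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats a transcription as a space-separated string of segments, which may not match the actual LearningData.txt layout, and the function name `narrow_across_the_board` is hypothetical.

```python
def narrow_across_the_board(segments):
    """Return a copy of `segments` with t -> c before ʃ and d -> ɟ before ʒ.

    Assumption: `segments` is a list of single-segment strings.
    Every [t ʃ] and [d ʒ] sequence is changed, regardless of context.
    """
    out = list(segments)
    for i in range(len(out) - 1):
        if out[i] == "t" and out[i + 1] == "ʃ":
            out[i] = "c"
        elif out[i] == "d" and out[i + 1] == "ʒ":
            out[i] = "ɟ"
    return out


# Illustrative word only; not taken from the actual data files.
word = "t ʃ ɪ p".split()
print(" ".join(narrow_across_the_board(word)))
```

Because no boundary information is consulted, a compound such as "pot shot" would also be changed under this procedure.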

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning data produced | Features produced | Inseparability | New segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | cʃ, ɟʒ | None
2 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | p b m f v θ ð c ɟ t d s z n l ʃ ʒ ɹ j k g ŋ h w
Output | p b m f v θ ð c ɟ t d s z n l ʃ ʒ ɹ j k g ŋ h w cʃ ɟʒ

Simulation Plots

/media/english/cmu/narrow/simulation/insep_plots.png


Simulation details for English celex narrow

Input:

This is the same dataset as Celex "broad", but with [t ʃ] and [d ʒ] replaced by [c ʃ] and [ɟ ʒ] respectively. Celex shows morpheme boundaries as syllabification in cases like "pot shot", so [t ʃ] sequences that span such a boundary were transcribed with alveolar first halves.
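A boundary-sensitive version of the replacement might look like the sketch below. It rests on several assumptions that the source does not confirm: the boundary is represented by a separate "." token between segments, the boundary token is dropped from the output, and the function name `narrow_respecting_boundaries` is hypothetical.

```python
def narrow_respecting_boundaries(segments, boundary="."):
    """Narrow t/d before ʃ/ʒ only when no boundary intervenes.

    Assumptions: `segments` is a list of segment strings in which a
    syllable or morpheme boundary appears as its own token (`boundary`),
    and boundary tokens are removed from the returned list.
    """
    out = []
    for i, seg in enumerate(segments):
        if seg == boundary:
            continue  # boundaries are consulted but not kept
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if seg == "t" and nxt == "ʃ":
            out.append("c")
        elif seg == "d" and nxt == "ʒ":
            out.append("ɟ")
        else:
            out.append(seg)
    return out


# "pot shot": the boundary separates t from ʃ, so t stays alveolar.
print(" ".join(narrow_respecting_boundaries("p ɒ t . ʃ ɒ t".split())))
# A tautosyllabic sequence is narrowed.
print(" ".join(narrow_respecting_boundaries("t ʃ ɪ p".split())))
```

The only difference from an across-the-board replacement is that a stop immediately followed by the boundary token is left unchanged.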

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning data produced | Features produced | Inseparability | New segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | cʃ, ɟʒ | ɟ
2 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | p b m f v θ ð c ɟ t d s z n l ʃ ʒ ɹ j k g ŋ h w
Output | p b m f v θ ð c t d s z n l ʃ ʒ ɹ j k g ŋ h w cʃ ɟʒ
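The added and removed segments can be recovered by comparing the Input and Output consonant sets directly. The following is a minimal sketch, assuming each inventory is a space-separated string as shown above; `inventory_changes` is a hypothetical helper, not part of the learner.

```python
def inventory_changes(input_set, output_set):
    """Return (added, removed) segment lists, preserving listing order.

    Assumption: inventories are space-separated strings, so a complex
    segment such as "ɟʒ" is one token, distinct from "ɟ" and "ʒ".
    """
    before = input_set.split()
    after = output_set.split()
    added = [seg for seg in after if seg not in before]
    removed = [seg for seg in before if seg not in after]
    return added, removed


celex_narrow_input = "p b m f v θ ð c ɟ t d s z n l ʃ ʒ ɹ j k g ŋ h w"
celex_narrow_output = "p b m f v θ ð c t d s z n l ʃ ʒ ɹ j k g ŋ h w cʃ ɟʒ"

added, removed = inventory_changes(celex_narrow_input, celex_narrow_output)
print("added:", added)      # ['cʃ', 'ɟʒ']
print("removed:", removed)  # ['ɟ']
```

For the Celex narrow simulation this reproduces the iteration summary: cʃ and ɟʒ are added, and ɟ is removed.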

Simulation Plots

/media/english/celex/narrow/simulation/insep_plots.png


Simulation details for English cmu broad

Input:

Data source: Carnegie Mellon University pronunciation dictionary, as prepared by Bruce Hayes and Jamie White.

LearningData.txt | Features.txt

Summary of iterations:

Iteration | Learning data produced | Features produced | Inseparability | New segments added | Segments removed
1 | LearningData.txt | Features.txt | [download] [view] | dʒ | None
2 | No new learning data | No new features | [download] [view] | None | None

Summary of inventory changes

Stage | Consonant set
Input | p b m f v θ ð t d s z n l ʃ ʒ ɹ j k g ŋ h w
Output | p b m f v θ ð t d s z n l ʃ ʒ ɹ j k g ŋ h w dʒ

Simulation Plots

/media/english/cmu/broad/simulation/insep_plots.png