European / Quebecois French (Indo-European)
European French is traditionally analyzed as having no complex segments, even though it is phonotactically fairly permissive, and allows [tʃ] in word-initial position (alongside [ps], [ks] and other stop-fricative clusters).
By contrast, Quebecois French is described as having the affricates [ts] and [dz], which occur before high front vowels [i, y]. This analysis is supported by some experimental evidence (Béland and Kolinsky 2005, Journal of Multilingual Communication Disorders 3:2, pp. 110-117)
We tried the learner on two data sources: a large corpus of child-directed speech, and the Lexique dictionary of French (where 71K words have transcriptions). We transcribed both datasets as either European French (unmodified transcriptions from Lexique and CHILDES) or as having the affrication rule (replacing t/d with ts/dz before {i, y}). Our learner finds the affricates in the CHILDES corpus, which represents token frequencies in connected speech. Training the learner on the Lexique transcribed as Quebecois does not bring the learner to threshold, although [d z] and [t s] are the most inseparable clusters in the data. (The same result obtains when we train the learner on CHILDES tokenized by word; no complex segments are found.)
Simulation data at a glance
Click on simulation name to view additional simulation details.
Simulation details for French euro lexique
Input:
The data come from Lexique, which is a dictionary with transcriptions provided for about 71,000 words. We did not change anything else in the European version of the dataset.
Summary of iterations:
| Iteration | Learning Data produced | Features produced | Inseparability | New Segments added | Segments removed |
| 1 |
No new learning data |
No new features |
[download] [view] |
None |
None |
Summary of inventory changes
| Stage | Consonant set |
| Input | p b m f v t d s z n l ʃ ʒ ʁ j w ɥ k g ŋ ɲ |
| Output | p b m f v t d s z n l ʃ ʒ ʁ j w ɥ k g ŋ ɲ |
Simulation Plots

Simulation details for French quebec lexique
In this simulation, the learner does not unify [ts] and [dz], but they are at the top in terms of inseparability scores (dz = 0.91, ts = 0.64).
Input:
This dataset is Lexique, with {t, d} replaced by corresponding affricates {t s}, {d z} before {y/i}.
Summary of iterations:
| Iteration | Learning Data produced | Features produced | Inseparability | New Segments added | Segments removed |
| 1 |
No new learning data |
No new features |
[download] [view] |
None |
None |
Summary of inventory changes
| Stage | Consonant set |
| Input | p b m f v t d s z n l ʃ ʒ ʁ j w ɥ k g ŋ ɲ |
| Output | p b m f v t d s z n l ʃ ʒ ʁ j w ɥ k g ŋ ɲ |
Simulation Plots

Simulation details for French euro childes_cds_token
Input:
The corpus of French infant-directed speech comes from CHILDES, and was prepared by Maria Julia Carbajal, Camillia Bouchon, Emmanuel Dupoux, & Sharon Peperkamp (2018). We used their IPA correspondence table to transcribe the files. Since the corpus consists of utterances rather than words, we split the utterances on commas, exclamation marks, question marks, and "unintelligible" signs, so each connected utterance appears without word breaks on its own line. The corpus represents token frequencies rather than type frequencies; i.e., the word "ty" (you) would appear multiple times in connected speech.
Summary of iterations:
| Iteration | Learning Data produced | Features produced | Inseparability | New Segments added | Segments removed |
| 1 |
No new learning data |
No new features |
[download] [view] |
None |
None |
Summary of inventory changes
| Stage | Consonant set |
| Input | p b m f v t d s z n l ʃ ʒ ʁ j w ɥ k g ŋ ɲ |
| Output | p b m f v t d s z n l ʃ ʒ ʁ j w ɥ k g ŋ ɲ |
Simulation Plots

Simulation details for French quebec childes_cds_type
The learner does not find any complex segments to unify when trained on type frequencies from child-directed speech in Quebecois. Note that just as in Lexique, [dz] and [ts] are on top in terms of inseparability (though well below the threshold of 1).
Input:
The corpus of French infant-directed speech comes from
CHILDES, and was prepared by
Maria Julia Carbajal, Camillia Bouchon, Emmanuel Dupoux, & Sharon Peperkamp (2018). We used their IPA correspondence table to transcribe the files. This is the version of the data tokenized by word, so only
type frequencies are represented, i.e., the word "ty" appears just once in the data instead of multiple times.
The Quebecois variant was generated by substituting all occurrences of [t] and [d] before [i, y] with [t s] and [d z], respectively.
Summary of iterations:
| Iteration | Learning Data produced | Features produced | Inseparability | New Segments added | Segments removed |
| 1 |
No new learning data |
No new features |
[download] [view] |
None |
None |
Summary of inventory changes
| Stage | Consonant set |
| Input | p b m f v t d s z n l ʃ ʒ ʁ j w ɥ k g ŋ ɲ |
| Output | p b m f v t d s z n l ʃ ʒ ʁ j w ɥ k g ŋ ɲ |
Simulation Plots

Simulation details for French quebec childes_cds_token
This simulation is the only one where the learner succeeds in identifying [ts] and [dz] in Quebecois.
Input:
The corpus of French infant-directed speech comes from
CHILDES, and was prepared by
Maria Julia Carbajal, Camillia Bouchon, Emmanuel Dupoux, & Sharon Peperkamp (2018). We used their IPA correspondence table to transcribe the files. Since the corpus consists of utterances rather than words, we split the utterances on commas, exclamation marks, question marks, and "unintelligible" signs, so each connected utterance appears without word breaks on its own line. This is the version of the data where token rather than type frequencies are represented, i.e., the word "ty" appears multiple times and not just once.
The Quebecois variant was generated by substituting all occurrences of [t] and [d] before [i, y] with [t s] and [d z], respectively.
Summary of iterations: