Observed/Expected Counting Utility
This utility calculates the Observed/Expected (O/E) statistic (Trubetzkoy 1939) for a learning data file. The calculation can run over local (adjacent) strings only, or over nonlocal (non-adjacent) strings. For example, if you are interested in the co-occurrence of [p, t, k], the local calculation counts only one substring of [p a t k a], "t k". The nonlocal calculation counts "p, t" and "t, k", and also "p, k". The formula for the calculation is as follows:
- OBSERVED: this is the simple count of how often a sequence occurs in your data.
- EXPECTED: the number of pairs in which the first segment occurs in first position (out of all segments under consideration), multiplied by the number of pairs in which the second segment occurs in second position, divided by the total number of attested pairs of segments under consideration.
- In short: O/E = N(S1S2) / (N(S1)*N(S2)/N(all pairs))
- Note that O/E is a relative measure, not an absolute one. For example, the calculation can change for [p t k] if the segments [b d g] are added to the set you are counting, compared to just [p t k].
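The calculation described above can be sketched in Python. This is only an illustration, not the utility's actual code: the function names count_pairs and observed_expected are hypothetical, words are assumed to be already split into lists of segments, and N(S1) and N(S2) are assumed to count pairs with S1 in first position and S2 in second position.

```python
from collections import Counter
from itertools import combinations


def count_pairs(words, segments, local=True):
    """Count ordered pairs (s1, s2) of the segments under consideration.

    words: iterable of words, each a list of segments (assumed format).
    segments: the set of segments being counted, e.g. {"p", "t", "k"}.
    local: True counts adjacent pairs only; False counts every pair of
           positions in a word, ignoring whatever stands in between.
    """
    pairs = Counter()
    for word in words:
        candidates = zip(word, word[1:]) if local else combinations(word, 2)
        for s1, s2 in candidates:
            if s1 in segments and s2 in segments:
                pairs[(s1, s2)] += 1
    return pairs


def observed_expected(pairs, s1, s2):
    """Return O/E = N(S1S2) / (N(S1) * N(S2) / N(all pairs))."""
    total = sum(pairs.values())
    # Assumption: N(S1) counts pairs with s1 first, N(S2) pairs with s2 second.
    n_first = sum(n for (a, _), n in pairs.items() if a == s1)
    n_second = sum(n for (_, b), n in pairs.items() if b == s2)
    if n_first == 0 or n_second == 0:
        return float("nan")  # expected count is zero, so O/E is undefined
    # Algebraically the same as observed / (n_first * n_second / total).
    return pairs[(s1, s2)] * total / (n_first * n_second)


words = [["p", "a", "t", "k", "a"]]
ptk = {"p", "t", "k"}

local_pairs = count_pairs(words, ptk, local=True)      # only ("t", "k")
nonlocal_pairs = count_pairs(words, ptk, local=False)  # p-t, p-k and t-k
print(observed_expected(nonlocal_pairs, "p", "k"))
```

Because the total and the per-segment counts are taken only over pairs drawn from the chosen segment set, adding [b d g] to that set changes every term in the expected count, which is why O/E is a relative measure.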
The requirements for the input file are the same as for the other utilities on this site; see Help.