Mass Producing Minimal Pairs

Although my recent attempts to get something meaningful out of a Wikipedia data dump have hit a few walls, I still hope to eventually apply named entity recognition or some sort of (semantic?) dependency parsing to it in order to construct simple comprehension questions.

But that’s being shelved for the time being.

The more positive news is that my exploratory research into mass production of minimal pairs is looking very promising, and I’m ready to set up a proof of concept data set for a basic web app.

But first, a bit of background:

For the past 2 years or so, I had been looking on and off for a bit of research I thought I remembered coming across. It investigated the effect of minimal pair training (a minimal pair is a pair of words that differ in only one sound, for example: bet/bit) when multiple speakers say the minimal pairs instead of just one speaker, as is commonly done in commercial language textbooks.

Turns out that nobody I’ve talked to has ever heard of the one particular study I obviously imagined, but I did finally stumble across a body of literature that references this more generally…

It’s called High Variability Phonetic Training (HVPT), and it’s been around for a while. Two interesting blog posts that describe it a bit are this shorter one, and this longer one.

The key takeaways, however, are that the minimal pairs are presented to the listener by a number of different speakers (I’ve seen numbers listed anywhere between 6 and 10), and that the feedback on whether the listener has successfully identified the correct sound difference is immediate.

Now one big disadvantage of this approach is that the number of recordings you need is (number of speakers) × (number of minimal pairs), so it grows quickly as you add either speakers or pairs.

As pointed out in the insightful comment section of the longer blog:

I haven’t heard of this technique before, but I would like to try it in my classrooms. Have any materials been produced, or would I have to try to create all the recordings in my own unpaid time, having recruited unpaid volunteers to be speakers?

So basically the question becomes, who’s going to record all these and compile them and put them all in one place?

…sounds like a job for micromaterials!

As a quick recap of this post, two of the key advantages of micromaterials are that they are automatically produced from available content, and that they can give immediate feedback to the student, wherever an internet connection is available.

What I propose is to automatically generate minimal pairs from transcripts of TED talks, then selectively pull out those particular words into separate audio files. The only limits would be on the number of transcript/audio file pairs we could get (it looks like 2500+ official TED videos, or 90,000+ TEDx videos), and the storage space for all the audio files.
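To make the idea of automatically generating minimal pairs concrete, here is a minimal sketch in Python. It is not the exact method used in my testing; it assumes each word can be looked up as a sequence of phonemes, and treats two words as a minimal pair when those sequences have the same length and differ in exactly one position. The `PRONUNCIATIONS` table and the function names are made up for illustration; a real run would need a full pronouncing dictionary.

```python
from itertools import combinations

# Hypothetical toy pronunciation table (ARPAbet-style phonemes, no stress marks).
# This is an assumption for illustration, not real project data.
PRONUNCIATIONS = {
    "bet": ("B", "EH", "T"),
    "bit": ("B", "IH", "T"),
    "bat": ("B", "AE", "T"),
    "ship": ("SH", "IH", "P"),
    "sheep": ("SH", "IY", "P"),
    "talk": ("T", "AO", "K"),
}


def is_minimal_pair(a, b):
    """True when two phoneme sequences have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def find_minimal_pairs(transcript, pronunciations=PRONUNCIATIONS):
    """Return sorted (word, word) minimal pairs among the known words in a transcript."""
    # Lowercase each token and strip surrounding punctuation.
    words = {w.strip(".,!?;:\"'").lower() for w in transcript.split()}
    # Only words with a known pronunciation can be compared.
    known = sorted(w for w in words if w in pronunciations)
    return [
        (w1, w2)
        for w1, w2 in combinations(known, 2)
        if is_minimal_pair(pronunciations[w1], pronunciations[w2])
    ]


if __name__ == "__main__":
    for pair in find_minimal_pairs("I bet the ship and the sheep bit a bat."):
        print(pair)
```

This only catches substitutions (bet/bit), not pairs that differ by an added or dropped sound, and it says nothing about cutting the matching words out of the audio, which would need word-level timings for each transcript.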

Given that the number of minimal pairs extracted in my testing is around 1% of the total word count of each transcript, and that 250GB of cloud storage runs around $5/month, I don’t think this will necessarily be an issue.
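As a rough back-of-the-envelope check, the storage estimate can be sketched like this. Only the 1% extraction rate and the video counts come from the numbers above; the words-per-talk and size-per-clip values are placeholder assumptions, so treat the result as an order of magnitude and not a measurement.

```python
def estimate_storage_gb(num_talks, words_per_talk, clip_fraction=0.01, clip_kb=20.0):
    """Estimate total storage in GB for single-word audio clips.

    clip_fraction: share of each transcript's words kept as clips (about 1% in testing).
    clip_kb: assumed average size of one clip in kilobytes (placeholder, not measured).
    """
    clips = num_talks * words_per_talk * clip_fraction
    return clips * clip_kb / 1_000_000  # KB -> GB (decimal units)


if __name__ == "__main__":
    # words_per_talk=2000 is an assumed average transcript length.
    for talks in (20, 2500, 90_000):
        print(talks, "talks:", round(estimate_storage_gb(talks, 2000), 2), "GB")
```

Under these assumed values, even the 90,000-video case comes out far below 250GB, so the clip size would have to be off by a lot before storage became the bottleneck.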

I’ll probably start out with only 20 videos (each with a unique speaker), just to get something up and working. If it scales well, I can easily imagine a system with 200+ speakers, showcasing varying accents (both native AND non-native…Global English is the most common form these days).

The second challenge will be making it into a nice looking web app, though thankfully I can draw on expertise in the local web development community for that.
