Google Docs Speech-to-Text: Observations and Updates

How to access Voice Typing in Google Docs

There is a mutual relationship between the advancement of technology and the ongoing evolution of language. Technology shapes how we communicate with one another, and language itself informs and inspires developers to create human-like input and output of linguistic data. Due to this interplay and the ubiquitous presence of technology in our students’ lives, it is important for us as ESL practitioners to be aware of and to take advantage of this mesh of language and technology for the benefit of our students.

One example of a developing linguistic technology is Google’s Speech to Text feature in Google Docs, which can give students autonomy in recognizing areas of improvement in spoken English production. I will share my experiences using this product in various contexts at the International English Center as well as some strengths and weaknesses of the product as they pertain to student learning outcomes.

The goal of this software is to recognize spoken language and transcribe it into text on the Google Docs application. It can be used on a PC/Mac with a microphone input or on the Google Docs application on a smartphone. As a teaching tool, students can read scripted language into the microphone (e.g. a news article, target grammar examples from their textbook, presentation script, etc.) and Google’s voice typing algorithm “types” the perceived spoken words and phrases. One assumption is that if the transcribed words or phrases are not exactly what the student intended to say, then there could be inaccuracies in their spoken English.

The technology is far from perfect, but it gives students the chance to see how their own language production is perceived by this software through the relationship between what they intended to say and what the computer “heard”. This process can be both rewarding and infuriating for the students. Here are some takeaways from teaching with this technology:

Patience and Context

It can be very frustrating for students when they repeat the same word or phrase and the software keeps transcribing a word that they did not mean to say. For instance, if a student keeps saying the word dog, Google could repeatedly transcribe it as dock. You will then have a transcript that looks like this and a very frustrated student:

            Dock dock dock dock dock

Instead, the student could say the word in a sentence, which Google’s algorithm will most likely understand:

            She has a cute dog.

Google seems to get it right when the word is spoken in a sentence context. This will prevent student fatigue and frustration.

Reading and Spacing

In my experience, it is best if the students practice with this technology when they have something written before they begin. For instance, if students perform “free speech” into the microphone, especially something extended, it is easy for students to forget what they initially said. Having a script beforehand can help students compare what they meant to say with what Google actually heard.
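For instructors or students comfortable with a little scripting, the comparison between the script and the transcript can also be automated. The following is only a minimal sketch, assuming the student pastes both texts in as plain strings; the function names `words` and `compare` are hypothetical and are not part of Google Docs or any Google tool. It uses Python's standard `difflib` module to list the places where the transcript differs from the script.

```python
import difflib
import re


def words(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def compare(script, transcript):
    """Return (intended, heard) pairs for each stretch where the texts differ."""
    intended, heard = words(script), words(transcript)
    matcher = difflib.SequenceMatcher(a=intended, b=heard, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append((" ".join(intended[i1:i2]), " ".join(heard[j1:j2])))
    return mismatches


# Example texts (illustrative only): what the student meant to read,
# and what voice typing produced.
script = "She has a cute dog."
transcript = "she has a cute dock"
for intended, heard in compare(script, transcript):
    print(f"intended: {intended!r}  heard: {heard!r}")
```

A mismatch in this list is only a starting point for practice. It shows where the software heard something different, not necessarily that the student's pronunciation was wrong.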

When students are reading a script, have them pause the microphone after every sentence or long clause, then reactivate the microphone and read the next one. In addition to reinforcing meaningful phrasing for the student, this spacing or chunking helps Google's still-developing software understand what was said.

Students can also say the words “comma” and “period” to insert punctuation, as another option to reinforce phrasing, if needed.

Technology that Models Output

When students work on this project independently and do not have an instructor on hand to show how a given word is pronounced, there are several applications that can model pronunciation. First, students can search Google for the target word plus the word “definition”. They can then click on the little speaker icon, and the computer will read the word aloud and show its pronunciation.

For phrases, sentences, and longer samples of language, there are many text-to-speech platforms. With these, the user can highlight a chunk of text and the program will “read aloud” that sample. The output sounds like a robot, but it does pronounce words and phrases accurately, including syllable and word stress and other prosodic features of English. One example is the Immersive Reader in Office 365 Word Online. Here is a screengrab of this feature:

Another option is a Chrome extension. These technology tools can help students practice the pronunciation of words or phrases independently by hearing them and then mirroring the language into Google Docs speech to text, thus integrating both listening and speaking skills with engaging technology.

Student Feedback

Here are some samples of student feedback about this technology:

“Practicing with Google is quite fun, and it can help to know the problem with my speaking. However, Google Docs sometimes processes data slowly and I must speak slowly to make sure it can catch all the words.” – ESLG Pronunciation Student.

“People, especially native [speakers], can understand me even when I speak in wrong way with wrong pronunciation, but Google doesn’t. So I think Google can be a quite harsh teacher.” – ESLG Pronunciation Student

“When I read the same sentence several times, I noticed that the words I intended and the words recognized by Google Docs were [not] consistent. Therefore, I think that it is effective to read the same sentence several times and choose the part where the error is pointed out constantly and to practice the part intensively.” – ESLG Pronunciation Student

“Google documents has been a tool that has helped me to practice my pronunciation a lot. My experience as a non-native speaker at the beginning was difficult because you have to repeat the same mistake several times and it’s [tiring], but once you achieve the correct pronunciation the satisfaction is enormous.” – A2 L/S Student

Conclusion

As you use this technology, remember that, like most new technologies, it is not perfect and should not serve as a summative evaluation of students’ pronunciation. As an emerging tool, it can effectively link technology and language production for students in an interesting and autonomous way.