Caption (Speech-To-Text) support for Twilio Programmable Video

Has anybody been able to add captions (speech-to-text) to Twilio Programmable Video, using either the Twilio ecosystem or Google Cloud?

Any insights would be appreciated.


  • sbrijmohan (admin)
    edited August 2021

    Hi @sotheara, this isn't something I've tried yet. I think it is going to be tricky: it would probably require a good deal of WebRTC knowledge so that you can extract the raw audio from the participants and send it to a speech-to-text service. Technically, it should be possible to use Google Cloud Speech-to-Text (there is more information on it in this article), but I'm not entirely sure how much of a lift that is.

    @pnash do you have any insight on this?

  • Hey @sotheara, this is something I've thought about before, but I've not actually built it myself, so all I have is ideas.

    I am assuming that what you want here is to capture the audio from each participant in a conversation, send that off to a speech-to-text service, then take the result and send it to the other participants in the room so that it can be displayed on their screen, over the video as a caption. If you are looking for something different, then let me know, but that's what I'm going with.

    Twilio doesn't have an in-house speech-to-text capability, so we will want to use another service, like Google Cloud's Speech-to-Text. In the Chrome browser you can actually access this service for free using the Web Speech API. I wrote an article that shows you how to transcribe speech to text in browsers that support the Web Speech API here. In browsers that don't support this, you will need to capture the audio and send it off to the transcription service yourself; this seems like quite a good blog post that explains how to do that.
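
    As a rough sketch of the browser side of that idea (not code from either article), the snippet below feature-detects the Web Speech API and hands each final transcript to a callback. The names `findRecognitionCtor`, `finalTranscript` and `startCaptions`, the `scope` parameter and the locally declared interfaces are all assumptions made for illustration; only the `SpeechRecognition` members it touches (`continuous`, `interimResults`, `lang`, `onresult`, `start()`, `stop()`) come from the API itself.

    ```typescript
    // Minimal shapes for the parts of the Web Speech API used here. They are
    // declared locally (hypothetical names) so the sketch does not depend on
    // which TypeScript DOM typings are installed.
    interface RecognitionAlternative { transcript: string; }
    interface RecognitionResult { isFinal: boolean; 0: RecognitionAlternative; }
    interface RecognitionEvent {
      resultIndex: number;
      results: { length: number; [index: number]: RecognitionResult };
    }
    interface Recognition {
      continuous: boolean;
      interimResults: boolean;
      lang: string;
      onresult: ((event: RecognitionEvent) => void) | null;
      start(): void;
      stop(): void;
    }
    type RecognitionCtor = new () => Recognition;

    // Chrome exposes the constructor with a webkit prefix, so check both names.
    function findRecognitionCtor(scope: any = globalThis): RecognitionCtor | null {
      return scope.SpeechRecognition ?? scope.webkitSpeechRecognition ?? null;
    }

    // Join the text of every final result carried by a single result event.
    function finalTranscript(event: RecognitionEvent): string {
      let text = "";
      for (let i = event.resultIndex; i < event.results.length; i++) {
        const result = event.results[i];
        if (result && result.isFinal) text += result[0].transcript;
      }
      return text.trim();
    }

    // Start recognition and report final transcripts through onCaption.
    // Returns null where the API is missing, so the caller can fall back to
    // capturing audio and sending it to a transcription service instead.
    function startCaptions(
      onCaption: (text: string) => void,
      lang: string = "en-US",
      scope: any = globalThis
    ): Recognition | null {
      const Ctor = findRecognitionCtor(scope);
      if (!Ctor) return null;
      const recognition = new Ctor();
      recognition.continuous = true;      // keep listening across pauses
      recognition.interimResults = false; // only report settled text
      recognition.lang = lang;
      recognition.onresult = (event) => {
        const text = finalTranscript(event);
        if (text) onCaption(text);
      };
      recognition.start();
      return recognition;
    }
    ```

    Each participant would run this against their own microphone, and a `null` return is the signal to take the capture-and-upload route instead.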

    Once you receive the result for each participant, you need to then send the transcribed text to the other participants in the room. The Video SDK provides a way to send arbitrary data to other participants using the DataTrack API. There's a good blog post on how to connect participants using the DataTrack API here.
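
    To make that step a little more concrete, here is a minimal sketch of packaging a caption as JSON before sending it over a data track. The `CaptionMessage` shape and the `encodeCaption`, `decodeCaption` and `sendCaption` helpers are hypothetical, not part of the Video SDK; the only thing assumed about the track is that it has a `send()` method that accepts a string, as `LocalDataTrack` does.

    ```typescript
    // Hypothetical wire format for one caption; not defined by the Twilio SDK.
    interface CaptionMessage {
      type: "caption";
      identity: string; // who spoke
      text: string;     // transcribed text
      sentAt: number;   // milliseconds since the epoch
    }

    // Anything with a string-accepting send() fits here, which keeps the
    // helpers independent of the SDK and easy to try out with a stub.
    interface CaptionSender {
      send(data: string): void;
    }

    function encodeCaption(identity: string, text: string, sentAt: number = Date.now()): string {
      const message: CaptionMessage = { type: "caption", identity, text, sentAt };
      return JSON.stringify(message);
    }

    // Returns null for anything that is not a well-formed caption message,
    // so other traffic on the same data track is simply ignored.
    function decodeCaption(data: unknown): CaptionMessage | null {
      if (typeof data !== "string") return null;
      try {
        const parsed = JSON.parse(data);
        if (
          parsed &&
          parsed.type === "caption" &&
          typeof parsed.identity === "string" &&
          typeof parsed.text === "string" &&
          typeof parsed.sentAt === "number"
        ) {
          return parsed as CaptionMessage;
        }
      } catch {
        // Not JSON, so not a caption.
      }
      return null;
    }

    function sendCaption(track: CaptionSender, identity: string, text: string): void {
      track.send(encodeCaption(identity, text));
    }
    ```

    In the app, `track` would be the data track the local participant published, and `decodeCaption` would be called with whatever arrives in the `message` event of each remote data track before the text is drawn over that participant's video.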

    The DataTrack API is ephemeral, so it doesn't store the text you are sending. If you want something more permanent you could add the Conversations SDK to the application and send the transcribed messages as if they were chat messages. This blog post will show you how to add Conversations to a Twilio Video room.

    So, like I said, I haven't done most of this and these are suggestions. I hope that it perhaps points you in the right direction though.

  • @pnash Thank you so much for your ideas. I'll bring them to the team and we'll see what we can do. I'll post an update once I have one. :smile:

  • @pnash Hello again! I hope you still remember me :) I would like to thank you and share an update: we were able to implement a working captioning solution by following your direction.

    Right now, though, we only support one stream, and Firefox is limiting us because it does not support the sampleRate and sampleSize constraints in getUserMedia(); not sure if you have any ideas about that.

    Again, thank you so much 🙏

  • Hey @sotheara, so glad to hear it's working!

    I'm not sure I understand what the problem in Firefox is though? It does not support those media constraints, you're right, but how does that affect the captions?

  • Hello @pnash

    We need to downsample the audio to 16000 Hz for better results with Google Speech-to-Text. The AudioContext does support sampleRate but getUserMedia() does not, so Firefox sees the two as having different sampling rates. That is correct, since getUserMedia() chooses the default rate, which is most likely 44100 Hz, and Firefox just throws an unsupported exception.

  • This is old, but it's the first thing I found. You could use it to resample the audio in browsers that don't support sampleRate in the media constraints.
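
    For illustration only (this is not the code from that link), a minimal sketch of the same idea: take mono Float32 samples at whatever rate the browser captured them, average them down to 16000 Hz, and convert the result to 16-bit PCM. The function names `downsample` and `toPcm16` are made up here, and block averaging is only a crude stand-in for a proper low-pass filter, so treat it as a starting point.

    ```typescript
    // Downsample mono samples by averaging each run of input samples that
    // maps onto one output sample. Averaging acts as a rough low-pass filter.
    function downsample(input: Float32Array, inputRate: number, outputRate: number = 16000): Float32Array {
      if (outputRate > inputRate) {
        throw new RangeError("outputRate must not exceed inputRate");
      }
      if (outputRate === inputRate) return input.slice();

      const ratio = inputRate / outputRate;
      const outLength = Math.floor(input.length / ratio);
      const output = new Float32Array(outLength);

      for (let i = 0; i < outLength; i++) {
        const start = Math.floor(i * ratio);
        // Always average at least one sample and never read past the end.
        const end = Math.min(input.length, Math.max(start + 1, Math.floor((i + 1) * ratio)));
        let sum = 0;
        for (let j = start; j < end; j++) sum += input[j];
        output[i] = sum / (end - start);
      }
      return output;
    }

    // Convert floats in [-1, 1] to signed 16-bit PCM, clamping anything
    // outside that range.
    function toPcm16(samples: Float32Array): Int16Array {
      const pcm = new Int16Array(samples.length);
      for (let i = 0; i < samples.length; i++) {
        const s = Math.max(-1, Math.min(1, samples[i]));
        pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
      }
      return pcm;
    }
    ```

    The idea would be to read `inputRate` from the `sampleRate` of the AudioContext that is actually processing the microphone stream, rather than asking getUserMedia() for a specific rate, and to send the resulting Int16 samples to the speech-to-text service as 16000 Hz linear PCM.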

If this is an emergency, please contact Twilio Support. This is not an official Support channel.