Captions, subtitles, translations, and interpretation for virtual events

This page gives an overview of your options for providing subtitles, closed captions, translations, transcription, or sign language interpretation for your online events, whether you are streaming live video or prerecorded content.


Using prerecorded video provides the greatest flexibility and accuracy for various captioning workflows. If you’re hosting a live event and are unable to provide real-time captions, consider making a captioned recording available afterwards. The one exception is sign language interpretation, which is typically done live or at the time the event is recorded, so prerecording does not open up additional workflow options for it.


Subtitles or closed captions are preferable to transcriptions when possible (see the definitions at the end of this page for the difference between subtitles, captions, and transcripts). However, providing a transcription is still more accessible than not providing anything, and having a script or transcription will make the process of generating captions faster and more accurate if you do end up adding them later on.


Prioritizing accessibility is part of an overall culture of inclusion; it is not separate from the other ways our organizations can seek, in each decision, to generate equal opportunities and minimize injustice. If you can afford to hire a professional captioning editor or translator, consider hiring a freelancer or small business directly, or examine the payment and hiring practices of any company you consider contracting. Ask yourself how the values you have chosen to prioritize might be made actionable in your hiring process. For example, if you have made a commitment to sensory accessibility, consider contracting an organization that is owned or run by people who have a sensory disability.

For Live Video

Subtitles, closed captions, or translations

Providing live subtitling or translation requires relatively specialized skills, so you’ll likely need to hire a professional or third-party service. Your streaming platform must also support closed captions.¹

More automated options for subtitles are likely to appear as the technology improves. Indeed, a handful of services already do this for common platforms like Zoom and YouTube.

Transcription



Subtitles or closed captions are preferable to transcriptions when possible, but providing transcriptions is better than not providing anything at all! Once you have a transcription, you can also more easily and accurately generate captions after the fact if you are making the recording available after your event. There are two major options for providing transcription for live events:


  1. Hire a professional or third-party service. The major advantage is its higher accuracy. Its drawback is the higher cost (on average about $90-180 per hour).
  2. Use an automated speech-to-text service. The major advantage is its lower cost (on average about $1.25 per minute, although Otter AI provides monthly plans for $20). Its drawback is lower accuracy, especially in settings with ambient noise and, as a result of biases in the AI training sets, when transcribing voices with non-English accents or the voices of women and people of color. Not all services support all languages, but many widely spoken world languages are covered.
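
To compare the two options for a specific event, the ballpark rates above can be turned into a quick estimate. This is only a sketch: the rates are the rough averages quoted in this list, not vendor quotes, and the function names are made up for illustration.

```python
# Rough cost comparison for live transcription, using the ballpark
# rates quoted above. All rates are estimates, not vendor quotes.

PROFESSIONAL_PER_HOUR = (90.0, 180.0)  # low and high ends of the quoted range
AUTOMATED_PER_MINUTE = 1.25            # quoted average for pay-per-minute services


def professional_cost(minutes: float) -> tuple[float, float]:
    """Return the (low, high) estimate for a professional service."""
    hours = minutes / 60
    low, high = PROFESSIONAL_PER_HOUR
    return (low * hours, high * hours)


def automated_cost(minutes: float) -> float:
    """Return the estimate for a pay-per-minute automated service."""
    return AUTOMATED_PER_MINUTE * minutes


if __name__ == "__main__":
    event_minutes = 90
    low, high = professional_cost(event_minutes)
    print(f"Professional: ${low:.2f} to ${high:.2f}")
    print(f"Automated:    ${automated_cost(event_minutes):.2f}")
```

For a 90-minute event this gives roughly $135 to $270 for a professional service and about $112.50 for a pay-per-minute automated one. A flat monthly subscription may work out cheaper still if you host events regularly.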

One of the most affordable options for live automated transcription is Otter AI. This service is included for Zoom at the monthly subscription rate of $20-30, and support for streaming to other platforms may be available for an additional fee.


Live transcription is a challenging task even for fast typists, so it is difficult to provide in-house unless members of your staff or volunteers already do it professionally.


¹Search for details in your platform’s documentation or ask during a demo. YouTube, Zoom, and Facebook Live each publish requirements and instructions for live captioning. This process typically involves sending the captions as HTTP POST requests from captioning software to your streaming platform.
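
To give a sense of what "sending the captions as HTTP POST" looks like, here is a minimal sketch that builds one such request without sending it. Everything platform-specific here is an assumption: the URL, the `key` and `seq` query parameters, and the plain-text body are placeholders, and each platform documents its own endpoint and body format.

```python
from urllib.request import Request

# Hypothetical ingestion URL; a real one comes from your platform's settings.
CAPTION_URL = "https://captions.example.com/ingest?key=YOUR_STREAM_KEY"


def build_caption_request(text: str, sequence: int) -> Request:
    """Build (but do not send) an HTTP POST carrying one line of caption text."""
    # "seq" is an assumed parameter name for ordering caption lines.
    url = f"{CAPTION_URL}&seq={sequence}"
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_caption_request("Welcome, everyone.", sequence=1)
    print(request.get_method(), request.full_url)
    # Sending would be urllib.request.urlopen(request); it is omitted
    # here because the URL above is only a placeholder.
```

In practice your captioning software or provider handles this step for you. The sketch is only meant to show why the platform has to give you a caption URL or key before live captions can work.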


Sign Language Interpretation

Sign language interpretation may be provided onsite, where the interpreter is in the same room as the performer or speaker, or remotely, in which case it is commonly referred to as video remote interpretation (VRI).


  • Costs for VRI may be around $3.50 per minute or starting from around $200 per hour.
  • Costs for onsite sign language interpretation of a live streamed event vary by location and other factors such as time of day. You can expect to pay about $250 per hour, although providers may have minimum event lengths or price by the day rather than hourly.


If it’s your first time providing sign language interpretation, here are a few things you should know:

    • There are many sign languages! Check whether your provider offers the sign language local to or most commonly used among your intended audience. Ask your community or let them request specific languages if you’re not sure.
    • As in subtitling and translations, the cost of sign language interpretation, whether onsite or remote, may be higher for events using specialized language or jargon.
    • Whether onsite or remote, interpreters typically work in pairs (simultaneous interpreters) for each active language combination and rotate off every 20-30 minutes. Examples of active language combinations are English to American Sign Language, Spanish to American Sign Language, or English to XYZ.

For Prerecorded Video

Subtitles, closed captions, translations, or transcripts

There is a wider range of options available for providing assistive text for recorded videos compared to live streamed ones. They include the following:

  1. Hire a professional or third-party service.
  2. Use an automated speech-to-text transcription or subtitling service. The major advantage is its lower cost (on average about $10-12 per hour, though in some use cases free). Its drawback is lower accuracy, especially in settings with ambient noise and, as a result of biases in the AI training sets, when transcribing voices with non-English accents or the voices of women and people of color. Not all services support all languages, but many widely spoken world languages are covered. This option cannot generate captions, but it can generate subtitles, transcripts, and in some cases translations.
  3. Do it yourself, either from scratch or in combination with the use of a speech-to-text service.

If your organization can afford to, we recommend hiring a professional or third-party service. This is especially helpful if you need fast turnarounds or have a large amount of material, for example for a festival or industry conference. Just watch out for predatory pricing by a handful of international companies in this market, which is often backed by exploitative labor practices.


Otherwise, we’d recommend using an automated speech-to-text service for an initial pass that is then edited and optimized by a member of your organization. This prioritizes quality captions and can be done at a minimal cost. See details below about creating subtitles, closed captions, or translations yourself for prerecorded videos.


Specific free workflows for common platforms

If you are hosting your video on YouTube, YouTube will automatically generate subtitles from the audio.

If you used Zoom to record some or all of your content and have a Business, Education, or Enterprise license, Zoom can generate automated audio transcription for recordings saved to the cloud. These can later be embedded into the video either as a transcript or subtitles. Learn more about using Zoom to generate audio transcripts.


Sign Language Interpretation

We haven’t seen sign language interpretation of prerecorded video commonly offered as a service, but you could discuss your needs with your sign language interpreter. In most cases you would hire a sign language interpreter during the filming of your event, and can then incorporate that video into your final production.

Creating subtitles, closed captions, or translations yourself for prerecorded videos

If you’re using YouTube

At the time of writing, YouTube provides the most robust free service for automatically generating and then editing subtitles, captions, and translations. If YouTube is one of the destinations for your video, you can create subtitles, captions, and translations directly within YouTube Studio.


If you do not wish to use YouTube as a final destination for your video, you can still upload your content as a private video to use these features and then download the caption file for use on another platform.

Workflow alternatives to YouTube

There are just two main steps to the process of creating subtitles or closed captions (whether in the original language of the video or a translation): create the subtitle file, and add it to your video.


1. Create the subtitle file

A subtitle file is a simple text file that abides by a specific formatting structure. One common cross-platform subtitle file format is SRT.
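
To make that formatting structure concrete, here is a minimal sketch that writes SRT text from a list of timed cues. An SRT file is a series of blocks separated by blank lines, each made up of a cue number, a timing line in the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, and the text to display. The function names and the sample cues are invented for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(cues: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) cues."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Blocks are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    sample_cues = [
        (0, 2.5, "Hello and welcome."),
        (2.5, 5, "[upbeat music]"),
    ]
    print(build_srt(sample_cues))
```

Saving the result with a `.srt` extension gives a file that most players and video platforms can read. The second sample cue shows how a caption, unlike a subtitle, can describe a non-speech sound.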


Rather than building this file from scratch, there are many services to help make it easier to create and edit subtitle files so that they line up with the spoken text. They may also make it easier to support closed captioning features like the identification of different speakers.


These include downloadable applications, many of which are open source, like autoEdit, Y, and Z, as well as web-based platforms. Most web platforms for creating and editing subtitles also let you pay for automated transcription as a first step in the process. This can save significant time.³ Just be aware that some platforms require you to pay for a first automated pass before you can use their subtitle editor, charged either by the minute or by subscription, which may or may not fit your needs. Platforms that permit both free manual subtitle editing and editable automated transcription are a great choice if you already have a script or subtitle file to work from as a starting point. For example, you could download a subtitle file from YouTube’s free automated service, upload it into the platform, and make manual edits and review from there.


Pro tip: If you already have a script or transcript, upload it into your subtitle editing program before running any subtitle generation. Many programs permit this, and it will help improve accuracy.


³Creating subtitles entirely from scratch typically takes about 5-10 times the duration of the video, even when using user-friendly software to generate the timestamps. By contrast, manually editing and proofing an automatically generated transcript takes about three times the duration of the video for a complete proof, including watching it all the way through. All of these times are reduced if you already have a script or transcription file you can work from.
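
As a planning aid, the multipliers in this note (5-10 times the video length from scratch, about three times when proofing an automated transcript) can be applied to your own material. This is a rough sketch; the function name is hypothetical, and real times will vary with audio quality and the editor’s experience.

```python
# Multipliers taken from the rule of thumb in the note above.
FROM_SCRATCH_MULTIPLIERS = (5, 10)  # low and high estimates
PROOFING_MULTIPLIER = 3             # editing an automated first pass


def editing_hours(video_minutes: float, multiplier: float) -> float:
    """Estimate working hours for a video of the given length."""
    return video_minutes * multiplier / 60


if __name__ == "__main__":
    video_minutes = 45
    low, high = (editing_hours(video_minutes, m) for m in FROM_SCRATCH_MULTIPLIERS)
    proof = editing_hours(video_minutes, PROOFING_MULTIPLIER)
    print(f"From scratch: {low:.2f} to {high:.2f} hours")
    print(f"Proofing an automated transcript: {proof:.2f} hours")
```

For a 45-minute video this works out to about 3.75 to 7.5 hours from scratch, compared with about 2.25 hours when proofing an automated first pass.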


2. Add the subtitle or caption file to your video

If you are uploading to a video platform like YouTube or Vimeo, you should upload the video as recorded and upload the subtitle file separately. Many streaming services offer this option as well; search for documentation for your specific platform.

If you are distributing on platforms that don’t let you add a subtitle file, you can burn the text into the video file itself. If you’re using a professional video editing program, search for documentation for your specific application. If not, the free app HandBrake lets you burn subtitles or open captions into the video; see the HandBrake documentation for instructions.
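
If you need to burn subtitles into many videos, HandBrake’s command-line version can be scripted instead of using the desktop app. The sketch below only builds and prints the command. It assumes `HandBrakeCLI` is installed and that the `--srt-file` and `--srt-burn` options behave as described in its help output, so check `HandBrakeCLI --help` for your installed version before relying on it. The file names are placeholders.

```python
import shlex


def handbrake_burn_command(video_in: str, subtitle_file: str, video_out: str) -> list[str]:
    """Build a HandBrakeCLI command that burns an SRT file into the video."""
    return [
        "HandBrakeCLI",
        "-i", video_in,               # source video
        "-o", video_out,              # output video with burned-in text
        "--srt-file", subtitle_file,  # external SRT file to import
        "--srt-burn",                 # burn the imported SRT into the picture
    ]


if __name__ == "__main__":
    command = handbrake_burn_command("talk.mp4", "talk.srt", "talk-captioned.mp4")
    # Print the command for review; run it yourself once you have
    # confirmed the options against your HandBrake version.
    print(shlex.join(command))
```

Remember that burned-in text cannot be turned off by viewers, so keep the original video and the separate subtitle file for platforms that do support closed captions.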


Although several of these terms are colloquially used interchangeably, in this material we respect the formal meanings of the following terms.


Subtitles – Subtitles provide a text representation of spoken language in a video or sound recording. They may be in the original language or translations. They incorporate time information so that they are presented in relative sync with the audio or video.


Captions – Captions provide a text representation of all essential audio in a video or sound recording, including non-speech sounds, music, and other information. Captions can be open captions (“burned into” the video so they are always visible) or closed captions (users can turn captions on or off). They incorporate time information so that they are presented in relative sync with the audio or video.


Transcripts – Transcripts provide a static text representation of material originally presented verbally or in another medium. For online events, the transcript is typically made available via a separate webpage or file. The content is very similar or identical to subtitles, but a transcript neither includes time information nor displays automatically in sync with the audio or video.


VRI (video remote interpretation) – Sign language interpretation services that are provided remotely via video.


RTMP / RTMPS (Real-Time Messaging Protocol [Secure]) – One common protocol that your streaming platform might use to receive closed captions from specialized captioning software.
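
To illustrate the difference between subtitles and transcripts defined above, here is a minimal sketch that turns an SRT subtitle file into a plain transcript by dropping the cue numbers and timing lines. It assumes a well-formed SRT file with Unix line endings and a blank line between blocks; the function name is made up for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:02,500 --> 00:00:05,000".
TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt_to_transcript(srt_text: str) -> str:
    """Drop cue numbers and timing lines, keeping only the displayed text."""
    lines = []
    for block in srt_text.strip().split("\n\n"):
        block_lines = block.splitlines()
        # A well-formed block is: cue number, timing line, then text lines.
        if len(block_lines) >= 3 and TIMING_LINE.match(block_lines[1]):
            lines.append(" ".join(block_lines[2:]))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:02,500\nHello and welcome.\n\n"
        "2\n00:00:02,500 --> 00:00:05,000\nThanks for\njoining us.\n"
    )
    print(srt_to_transcript(sample))
```

The result still needs a human pass to add paragraph breaks and speaker names before it is published as a transcript.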