The issue

Missing video captions are an accessibility failure that directly excludes deaf and hard-of-hearing people. WCAG success criterion 1.2.2 requires that prerecorded synchronized media (video with audio) be accompanied by synchronized captions. For any video containing speech, they are indispensable.

Captions are not translations: they are a synchronized text transcription of a video's audio content in the same language. They must include not only dialogue but also relevant sound information (music, background noise, sound effects) that contributes to understanding the content. A caption such as '[dramatic music]' or '[door slamming]' provides essential context that dialogue text alone does not convey.

Automatic captioning offered by YouTube, Vimeo, and other platforms is a starting point but is rarely sufficient as-is. Speech recognition makes errors, particularly with regional accents, technical vocabulary, proper nouns, and noisy environments. Unreviewed automatic captions can be misleading and do not meet WCAG's expectation that captions accurately convey the audio.

Captions must be synchronized with the audio track, segmented into readable units (a common guideline is no more than two lines of about 37 characters each), and positioned so as not to obscure important visual information. The standard format for the web is WebVTT (.vtt), attached with a track element inside the HTML video element. For videos hosted on third-party platforms (YouTube, Vimeo), captions must be uploaded or edited through the platform's own captioning tools.
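The segmentation guideline above (at most two lines of 37 characters per caption) lends itself to an automated check before captions are published. The following is a minimal TypeScript sketch, not part of any standard tool: the `Cue` shape, the `checkCue` function, and the two constants are hypothetical names chosen for illustration, and the limits simply mirror the figures quoted above.

```typescript
// Hypothetical limits mirroring the guideline quoted in the text; adjust to your own style guide.
const MAX_LINES = 2;
const MAX_CHARS_PER_LINE = 37;

// Hypothetical cue shape: times in seconds, text lines separated by "\n".
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Returns a list of human-readable problems; an empty list means the cue passes.
function checkCue(cue: Cue): string[] {
  const problems: string[] = [];
  if (cue.end <= cue.start) {
    problems.push("end time must be after start time");
  }
  const lines = cue.text.split("\n");
  if (lines.length > MAX_LINES) {
    problems.push("has " + lines.length + " lines (max " + MAX_LINES + ")");
  }
  lines.forEach(function (line, index) {
    if (line.length > MAX_CHARS_PER_LINE) {
      problems.push(
        "line " + (index + 1) + " has " + line.length +
        " characters (max " + MAX_CHARS_PER_LINE + ")"
      );
    }
  });
  return problems;
}
```

Running `checkCue` over every cue of a caption file and listing the non-empty results gives a quick review list. It only checks form; it cannot tell whether the text matches what is actually said.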

Impact on users

Worldwide, approximately 466 million people have disabling hearing loss. Without captions, your site's videos are totally inaccessible to them. They cannot follow a product presentation, understand a tutorial, or watch a client testimonial. This represents lost information and potentially lost revenue.

Captions also benefit a much wider audience than people with hearing disabilities. People in noisy environments (public transport, open offices), people watching a video without sound (at the office, at night), and people whose native language is not the video's language all use captions. According to a Verizon Media study, 80% of people who use captions are not deaf or hard of hearing. Captions can also help SEO: caption text that search engines are able to index makes your videos easier to discover via text search.

Code example

Before (non-compliant)
<video controls>
  <source src="/videos/presentation.mp4" type="video/mp4">
</video>

<!-- YouTube video without captions -->
<iframe src="https://www.youtube.com/embed/abc123"
  allowfullscreen></iframe>

After (compliant)
<video controls>
  <source src="/videos/presentation.mp4" type="video/mp4">
  <track kind="captions" src="/videos/presentation.vtt"
    srclang="en" label="English" default>
</video>

<!-- YouTube video with captions enabled by default -->
<iframe src="https://www.youtube.com/embed/abc123?cc_load_policy=1&amp;cc_lang_pref=en"
  allowfullscreen
  title="Presentation of our services — with captions">
</iframe>

Example WebVTT file (presentation.vtt)
WEBVTT

00:00:01.000 --> 00:00:04.000
Hello and welcome to this presentation.

00:00:04.500 --> 00:00:08.000
[Intro music]

00:00:08.500 --> 00:00:12.000
Today, we will talk about web accessibility.
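A file like the presentation.vtt example can also be generated from timing data, for instance after correcting an automatic transcript. The sketch below shows one possible way to do it in TypeScript. `Cue`, `formatTimestamp`, and `toWebVtt` are hypothetical names chosen for illustration, and the code assumes non-negative times expressed in seconds.

```typescript
// Hypothetical cue shape: times in seconds, caption text as a plain string.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Left-pads a number with zeros to the given width.
function pad(value: number, width: number): string {
  let out = String(value);
  while (out.length < width) {
    out = "0" + out;
  }
  return out;
}

// Formats a non-negative number of seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "." + pad(ms, 3);
}

// Builds the file content: the WEBVTT header, then one block per cue,
// with blocks separated by a blank line.
function toWebVtt(cues: Cue[]): string {
  const blocks = cues.map(function (cue) {
    return formatTimestamp(cue.start) + " --> " + formatTimestamp(cue.end) + "\n" + cue.text;
  });
  return ["WEBVTT"].concat(blocks).join("\n\n") + "\n";
}
```

The resulting string can be saved as a .vtt file and referenced from the track element. Servers should deliver it with the `text/vtt` content type.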

Frequently Asked Questions

Are YouTube's automatic captions sufficient for WCAG?
No. Automatic captions often contain transcription errors (misrecognized words, missing punctuation, distorted proper nouns). WCAG requires relevant and correctly synchronized captions. You must review and correct automatic captions before considering them conformant. YouTube allows editing automatic captions in YouTube Studio.
What is the difference between captions and a text transcript?
Captions are synchronized with the video and appear as an overlay during playback. A text transcript is static text available next to or below the video, containing all spoken content and sound information. For prerecorded video with audio, WCAG success criterion 1.2.2 requires captions; a transcript alone does not satisfy it. Ideally, provide both.
Must speechless videos be captioned?
If the video contains significant sounds (sound effects, meaningful music, ambient noise that matters for understanding), yes. If the video is purely visual with no relevant audio, captions are not necessary, but a text alternative or an audio description may be required if the visual content carries information.
How much does professional captioning cost?
Professional captioning costs between $3 and $10 per minute of video. To reduce costs, you can use automatic captions as a base and only pay for human correction. Tools like Rev, Otter.ai, or Happy Scribe offer AI-assisted captioning with human correction at competitive rates.
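
To turn the per-minute range above into a rough budget, the arithmetic can be sketched as follows. This is only an illustration in TypeScript: `estimateCaptioningCost` and `CostRange` are hypothetical names, the $3 and $10 rates are the figures quoted above rather than any provider's price list, and actual invoices may round or bill differently.

```typescript
// Rates quoted in the answer above, in US dollars per minute of video.
const LOW_RATE_PER_MINUTE = 3;
const HIGH_RATE_PER_MINUTE = 10;

interface CostRange {
  low: number;
  high: number;
}

// Rounds an amount to whole cents.
function roundToCents(amount: number): number {
  return Math.round(amount * 100) / 100;
}

// Estimates a budget range for a catalog of videos, given their durations in seconds.
// Assumption: billing is proportional to total duration, with no minimum fee.
function estimateCaptioningCost(durationsInSeconds: number[]): CostRange {
  let totalSeconds = 0;
  durationsInSeconds.forEach(function (duration) {
    totalSeconds += duration;
  });
  const totalMinutes = totalSeconds / 60;
  return {
    low: roundToCents(totalMinutes * LOW_RATE_PER_MINUTE),
    high: roundToCents(totalMinutes * HIGH_RATE_PER_MINUTE)
  };
}
```

For example, three videos of 90, 300, and 210 seconds add up to 10 minutes, so the estimate ranges from $30 to $100.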
