Video Accessibility: Adding Captions

You have probably seen it while browsing your favourite social media feed: a video plays silently, but captions running across the bottom indicate that something interesting is happening. Personally, I am more likely to stop and watch a video in my social feeds if there are captions (while the video plays silently). I’m pretty sure that content creators on Facebook and Twitter have found this to be true as well, and use captions to draw more viewers.  In addition to being more catchy and engaging, video captions let people who cannot hear the audio know what is happening in the video, improving the accessibility of your content.

I’m Sold!  How do I Add Captions?

This is easy.  The HTML5 video player supports captions out of the box.  Simply adding a track element that points to a caption file will add the captions:

<video autoplay muted controls>
  <source src="myvideo.webm">
  <source src="myvideo.mp4">
  <track default srclang="en" kind="captions" src="myvideo.vtt">
  <track srclang="es" kind="captions" src="myvideo-es.vtt">
</video>

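In the above example, there are two caption files: the default is in English, but captions are also provided in Spanish.  What are all the attributes?  default marks the track that is enabled unless the user’s preferences say otherwise, in this case English.  The language of each track is declared with the srclang attribute (a plain lang attribute is not used for this), and the player can use it to describe the track in the menu where users choose the caption language; a label attribute can supply a friendlier name.  kind describes the type of track, for example captions or subtitles.  There *are* semantic differences between these tracks: subtitles transcribe or translate the dialogue for viewers who can hear the audio but may not understand it, whereas captions also cover relevant non-speech audio such as sound effects and music, for viewers who cannot hear the audio at all.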
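If you would rather switch caption languages from your own UI than rely on the browser’s menu, the tracks are also exposed to script through the video element’s textTracks list.  The following is a minimal sketch, assuming each track declares its language with srclang; the helper name showCaptions is my own and not part of any API.

```javascript
// Hypothetical helper: show the caption track for one language and
// switch every other caption/subtitle track off.
// `tracks` is expected to be a video element's `textTracks` list
// (anything array-like whose items have `kind`, `language` and `mode` works).
function showCaptions(tracks, language) {
  let found = false;
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    // Leave metadata, chapters and descriptions tracks alone.
    if (track.kind !== "captions" && track.kind !== "subtitles") continue;
    const match = !found && track.language === language;
    track.mode = match ? "showing" : "disabled";
    if (match) found = true;
  }
  return found; // false when no track exists for that language
}
```

In a browser you would call it as showCaptions(document.querySelector("video").textTracks, "es").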


Which should you use?  Go with what data you have!  What if you don’t have either? We’ll also look at how to automatically generate your captions!


The VTT File

The text for the captions is stored in a WebVTT file (Web Video Text Tracks), usually saved with a .vtt extension.  The format is very simple (CSS and other features can be added, but let’s start with a vanilla file).


WEBVTT

00:00:15.000 --> 00:00:17.951
At the left we can see...

00:00:18.166 --> 00:00:20.083
At the right we can see the...

00:00:20.119 --> 00:00:21.962
...the head-snarlers

00:00:21.999 --> 00:00:24.368
Everything is safe.
Perfectly safe.

00:00:24.582 --> 00:00:27.035

00:00:28.206 --> 00:00:29.996
Watch out!

00:00:47.037 --> 00:00:48.494
Are you hurt?


The VTT file is identified by WEBVTT on its first line, followed by a number of cues.  Each cue is broken into 3 parts: the identifier, the timing, and the cue payload itself.  Cues are separated from each other by one (or more) blank lines.

  1. The identifier is optional and does not need to be a number (and if numbered, the identifiers do not have to be sequential). It just cannot contain the string “-->”.
  2. The timing is the start time for the cue to appear in hh:mm:ss.ttt (the hours are optional), followed by “-->” and the end time of the cue.  Cues can overlap in time.
  3. The cue payload itself (the words that will appear on the video).
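To make the three parts of a cue concrete, here is a rough sketch of a reader for the vanilla format described above.  It is not a full WebVTT parser (cue settings, NOTE and STYLE blocks are simply ignored), and the function names parseTimestamp and parseVtt are my own.

```javascript
// Convert "hh:mm:ss.ttt" (hours optional) to milliseconds; NaN if malformed.
function parseTimestamp(text) {
  const m = /^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$/.exec(text);
  if (!m) return NaN;
  const [, h = "0", min, s, ms] = m;
  return ((Number(h) * 60 + Number(min)) * 60 + Number(s)) * 1000 + Number(ms);
}

// Split a WebVTT document into cues: { identifier, start, end, text }.
function parseVtt(source) {
  const blocks = source.trim().replace(/\r\n?/g, "\n").split(/\n{2,}/);
  if (!/^WEBVTT(?:[ \t].*)?$/.test(blocks[0].split("\n")[0])) {
    throw new Error("Not a WebVTT file: first line must be WEBVTT");
  }
  const cues = [];
  for (const block of blocks.slice(1)) {
    const lines = block.split("\n");
    // The timing line is first, or second when an identifier is present.
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex !== 0 && timingIndex !== 1) continue; // not a cue block
    const [startText, rest = ""] = lines[timingIndex].trim().split(/[ \t]+-->[ \t]+/);
    const start = parseTimestamp(startText);
    const end = parseTimestamp(rest.split(/[ \t]+/)[0]); // drop any cue settings
    if (Number.isNaN(start) || Number.isNaN(end)) {
      throw new Error(`Bad cue timing: ${lines[timingIndex]}`);
    }
    cues.push({
      identifier: timingIndex === 1 ? lines[0] : null,
      start, // milliseconds
      end, // milliseconds
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}
```

Times are kept as whole milliseconds to avoid floating-point surprises when comparing or sorting cues.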


Creating the VTT file

Assuming that there is no transcript with timings, and that you do not wish to create the transcript manually, there are a number of video transcription services online.  Some use humans (and cost more money), while others use computer algorithms to create the VTT files.  In this example, I’m going to use Cloudinary’s Microsoft Azure Video Indexer add-on (Cloudinary also offers a similar service through Google Cloud).

Building a simple Node application, I can upload a video and request a transcription in English (full working code can be found on GitHub for both the Azure and Google transcription services):

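const cloudinary = require("cloudinary").v2; // credentials come from your Cloudinary config

// process.argv[2] is the path to the video; id is the public ID to store it under
cloudinary.uploader.upload(process.argv[2], { resource_type: "video", public_id: id,
  raw_convert: "azure_video_indexer:vtt:en-US" }, function (err, result) {
  if (err) { console.warn(err); return; }
  console.log("** File Uploaded!");
  console.log("* public_id for the uploaded video: " + result.public_id);
});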

This will add a VTT file in your Cloudinary account with the name of the video as the filename.

Before blindly using the VTT file in your app or website, you should read the text to make sure that the transcription was done correctly. You don’t want the name of your company to be incorrect, or misspelled!  However, since these are just text files, you can open them in any text editor and resolve the issues quickly.  The captions created for this video have two issues that should be resolved: the name of the company is misspelled, and one of the sentences ends in a “?” that should probably just be a “.”
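If the same mistakes recur across many files, fixes like these can be scripted instead of made by hand.  The sketch below applies literal find-and-replace pairs to the cue text only, so timings and identifiers are never altered.  The function name correctCaptions and the sample misspelling in the usage line are invented for illustration; they are not taken from the actual transcript.

```javascript
// Apply literal text corrections to cue payload lines only, leaving the
// WEBVTT header, cue identifiers and timing lines untouched.
// `corrections` is a list of [wrong, right] string pairs.
function correctCaptions(vttText, corrections) {
  let inPayload = false;
  return vttText
    .split("\n")
    .map((line) => {
      if (line.trim() === "") {
        inPayload = false; // a blank line ends the current cue
        return line;
      }
      if (line.includes("-->")) {
        inPayload = true; // everything until the next blank line is payload
        return line;
      }
      if (!inPayload) return line; // header or cue identifier
      return corrections.reduce(
        (text, [wrong, right]) => text.split(wrong).join(right),
        line
      );
    })
    .join("\n");
}
```

For example, correctCaptions(vtt, [["Cloud Nary", "Cloudinary"]]) would fix a (made-up) mis-hearing of the company name everywhere it appears in the captions.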




Internationalisation with Captions

The Azure tool above has another really cool option: it will translate your VTT files into multiple languages.  The full list of languages can be found in the documentation.  Again, I would urge caution with machine-translated captions: have someone who understands the language review and edit the text before publishing.  The video in the link above has four different subtitle tracks available.

By changing the language, your video is now potentially understandable to more of your users.
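Once you have several translated files, writing the track elements by hand gets repetitive.  As a small sketch, the markup can be generated from a list of languages.  The function names, file names and labels below are placeholders of my own; substitute the VTT files you actually generated.

```javascript
// Escape the characters that matter inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Build one <track> element per language; the first entry becomes the default.
// Each entry is { code, label, src }, where `code` is the language tag
// used for srclang and `label` is the name shown in the caption menu.
function buildTrackTags(languages, kind = "captions") {
  return languages
    .map(({ code, label, src }, index) => {
      const isDefault = index === 0 ? " default" : "";
      return (
        `<track${isDefault} kind="${escapeAttr(kind)}"` +
        ` srclang="${escapeAttr(code)}"` +
        ` label="${escapeAttr(label)}"` +
        ` src="${escapeAttr(src)}">`
      );
    })
    .join("\n");
}

// Placeholder data for illustration only.
const tags = buildTrackTags([
  { code: "en", label: "English", src: "myvideo.vtt" },
  { code: "es", label: "Español", src: "myvideo-es.vtt" },
]);
console.log(tags);
```

The resulting lines go inside the video element, after the source elements.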



Adding captions to your online videos improves accessibility, but can also be used to draw interest to autoplaying silent videos.  If captions or transcriptions do not exist, look at automated tools that can generate these files for you (like those offered by Cloudinary above).  These tools can also help with your site internationalisation by creating captions for those who may not understand your default language, helping your content reach a wider audience!

The best thing is – with the tools outlined in this post – it isn’t terribly hard to add captions, and this addition will differentiate your content from the content of your competition!



