Later this month (June 2020), I am speaking at the ‘virtual’ ImageCon. All of the speakers were invited to record their talks in advance, and the talks will be posted on the website. I wanted to have more fun than the suggested “just record your talk in Zoom and share the link,” and decided to build my conference talk with Cloudinary’s video transformations.

Conferences use many creative approaches to produce videos of presentations. Typically, these videos have a custom backdrop and two synced videos: a small one of the speaker talking, usually off to the side, and a larger one of the slides that the speaker is presenting.

In Part 1 of Creating a Virtual Conference Talk, I created a template webpage to lay out how the videos will appear on the page:

With the Cloudinary URL that I receive from this page, I can substitute videos for the images.

Of course, when you record 2 videos at the same time, it is very difficult to start both of them at exactly the same moment. Even if you press “start” on both devices simultaneously, software delays may prevent the recordings from starting together. Rather than attempting to sync the two videos during recording, we can do this in post-production. In this post, we’ll sync the two videos so that their playback matches.

Step 1: Presentation start times in video

The first step is to estimate the video start times. The startTime.html page is in the conference template GitHub repository. You can add your video to the page using the URL parameter “url.” Now play the video in the browser to identify the time at which the presenter begins speaking. The rewind button moves backwards in 0.5 second increments. Below the buttons is the current time in the video playback.

startTime.html showing that the video starts at ~8 seconds.

Estimating the start time of both the presentation video and the presenter video will simplify the syncing step. In the screenshot above, I have estimated that the presentation begins at 8 seconds into the video.

Step 2: Syncing presentation startup

Knowing approximately when the presentation begins in each of the 2 videos allows us to sync the presentation start times more easily. The syncTime.html page takes 4 URL parameters: url1, url2, start1, and start2. These denote the 2 videos to sync and the approximate start time of the presentation in each of them. If you open the page, you’ll see the following:
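As a rough illustration of how the four parameters fit together, a link to the sync page could be assembled like this. The page address and the two video URLs are placeholders (assumptions, not the real locations), and which video goes in url1 versus url2 is also a guess. Only the parameter names come from the page described above.

```python
from urllib.parse import urlencode

# Hypothetical location of syncTime.html; substitute wherever you host the template.
SYNC_PAGE = "https://example.com/syncTime.html"


def build_sync_url(url1: str, url2: str, start1: float, start2: float) -> str:
    """Return a syncTime.html link carrying both videos and their start estimates."""
    query = urlencode({
        "url1": url1,      # first video to sync
        "url2": url2,      # second video to sync
        "start1": start1,  # estimated talk start (seconds) in the first video
        "start2": start2,  # estimated talk start (seconds) in the second video
    })
    return f"{SYNC_PAGE}?{query}"


link = build_sync_url(
    "https://example.com/presenter.mp4",  # placeholder video URLs
    "https://example.com/slides.mp4",
    8,
    5,
)
print(link)
```

urlencode percent-encodes the video URLs, so any query string they carry themselves does not collide with the page's own parameters.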

synctime screenshot

This page has the two videos at the top, set to start at the suggested start times (also shown in the two boxes). In this example, the left video starts 8 seconds in, and the right video starts 5 seconds in.

Below the two videos is a waveform image for 10 seconds of the video, centered around the video start time. The top waveform is for the presenter video, and the bottom is the presentation. This is a visual representation of the audio, and we can see that the peaks for the audio do not line up.

With Cloudinary, you can easily create a waveform from any audio or video file: simply change the format to ‘.jpg’ (or another image format) and add the parameter ‘fl_waveform’. In the case above, I use so_ and eo_ to set the exact start and end offset times, so that the waveform shows just 10 seconds of the video.
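The URL rewrite described above can be sketched in a few lines. This is a minimal sketch, assuming a standard Cloudinary delivery URL containing "/upload/" with no existing transformations. The helper name waveform_url and the example URL are illustrative, not part of the template repository.

```python
import re


def waveform_url(video_url: str, center: float, window: float = 10.0) -> str:
    """Turn a Cloudinary video delivery URL into a waveform image URL.

    The waveform covers `window` seconds around `center`; if that would
    begin before 0, the window is shifted to start at 0 instead.
    """
    start = max(0.0, center - window / 2)
    end = start + window
    # fl_waveform requests the waveform; so_/eo_ are the start and end offsets.
    transformation = f"fl_waveform,so_{start:g},eo_{end:g}"
    # Assumption: the transformation goes directly after "/upload/".
    with_flags = video_url.replace("/upload/", f"/upload/{transformation}/", 1)
    # Swap the video extension for an image format.
    return re.sub(r"\.\w+$", ".jpg", with_flags)


# Placeholder URL; replace with your own cloud name and video.
print(waveform_url("https://res.cloudinary.com/demo/video/upload/dog.mp4", 8))
```

For a start time of 8 seconds this requests offsets 3 through 13, which matches the 10-second window used on the sync page.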

Pressing the play button will start both of the videos at the same time (you can try it yourself with the link above). The audio is not *quite* synced up. From the waveform, it looks like the presentation audio plays later than the presenter audio. I can shift the video start times in the two input boxes and press reset to try again. When I set the presentation start time to 5.25s, there is almost no echo in the audio, meaning that the two videos are now synced.
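Once both start times are known, the relationship between the two recordings is a single offset. The sketch below uses the numbers from this example (8 seconds for the presenter video, 5.25 seconds for the presentation video). The function names are illustrative only.

```python
def sync_offset(presenter_start: float, presentation_start: float) -> float:
    """Extra lead-in (seconds) in the presenter video relative to the presentation video."""
    return presenter_start - presentation_start


def to_presentation_time(t_presenter: float, offset: float) -> float:
    """Map a timestamp in the presenter video onto the presentation video's timeline."""
    return t_presenter - offset


offset = sync_offset(8, 5.25)
print(offset)                            # extra lead-in in the presenter video
print(to_presentation_time(60, offset))  # same moment, presentation timeline
```

With these values the presenter video carries 2.75 seconds more lead-in, so a moment at 60 seconds in the presenter video corresponds to 57.25 seconds in the presentation video.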


In Part One of this series, we created the template for our conference talk – laying out the way the two videos will be displayed in the final presentation. In this post, I synced the two videos. It no longer matters how the videos were recorded – the audio is now synced, meaning that the content on the slides will exactly match what the presenter is saying.

In Part Three of this series, we’ll take the layout we created in Part One and the video start times uncovered in this post, and use them to generate the final conference talk video.
