Creating a Virtual Conference Talk Part 3: Pulling it all Together

In 2020, ImageCon (like many conferences) morphed into a virtual conference. I was invited to give a talk on video streaming at the conference. To record the talk, the organizers recommended presenting over Zoom, recording the "call", and submitting the video that Zoom creates. That video would then be made available on demand as part of the conference.

I thought it would be more fun to create my conference talk video with the tools found in Cloudinary. In Part 1, I used image manipulations to move and arrange a template for the video of my presentation. In Part 2, I took my recorded videos and synced them so that the video and audio of both presentations were aligned (almost) perfectly.

In this final post in the series, I will take the template arrangement and the video start times and build my conference video.

Testing Video Creation

One of my favourite things about doing demos with Cloudinary is that I can simply manipulate the URL, and the change happens on the fly in my browser, giving instant feedback on images and video. These 'on the fly' transformations are limited to 40 MB of data. Since in this case I am working with 20-minute videos, I am well over this limit.
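For context, an on-the-fly transformation lives directly in the delivery URL. Here is a tiny illustrative helper (not the SDK, which builds these URLs for you) showing the URL pattern; the cloud name and public ID below are just examples:

```javascript
// Illustrative only: builds a Cloudinary-style video delivery URL by hand to
// show where the transformation lives. In real code the Node SDK does this.
function cloudinaryVideoUrl(cloudName, publicId, transformations) {
  // Each step is an object like { w: 640, du: 6 }; parameters within a step
  // are comma-separated, and chained steps are separated by "/".
  const path = transformations
    .map(step => Object.entries(step).map(([k, v]) => `${k}_${v}`).join(','))
    .join('/');
  return `https://res.cloudinary.com/${cloudName}/video/upload/${path}/${publicId}.mp4`;
}

// Scale to 640 wide and trim to the first 6 seconds:
console.log(cloudinaryVideoUrl('dougsillars', 'imagecon_video', [{ w: 640, du: 6 }]));
// → https://res.cloudinary.com/dougsillars/video/upload/w_640,du_6/imagecon_video.mp4
```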

To perform these manipulations, I will use explicit transformations with the Node.js API. Explicit transformations can be performed on files already in your Cloudinary account and generate the URLs for your modified content – but the encoding happens asynchronously (since these jobs take longer to run).

The sample code for this is on GitHub, but let’s walk through what is happening.

First, I set my credentials:

// set your env variable CLOUDINARY_URL or set the following configuration
const cloudinary = require('cloudinary').v2;

cloudinary.config({
  cloud_name: 'dougsillars',
  api_key: 'XXX',
  api_secret: 'YYY'
});

Next, I begin to define my transformation (it is called an eager transformation, as it is defined at upload time). These large transformations must run asynchronously, so I supply a notification URL that tells me when the transformation is complete:

// File upload
{ type: "upload", resource_type: "video",
eager_async: true,
eager_notification_url: "",

But we are all here for the eager transformation – so let’s see how I do this.

  • In the first JSON section, I set the overall canvas to 1920×1080 (width × height). For the background video, I remove the audio (we only need audio from one source), and – for testing purposes – I set the duration to 6 seconds. These large video transformations can take a long time, so I test with just the beginning rather than re-encoding all 20 minutes.
  • The second JSON section sets the background image across the entire canvas of the video, and it is applied with the layer_apply.

format: "mp4",
transformation: [
  // base canvas: strip the audio; trim to 6 seconds while testing
  { width: 1920,
    height: 1080,
    audio_codec: "none",
    duration: 6 },
  // background image across the full canvas (overlay public ID omitted here)
  { width: 1920,
    height: 1080,
    flags: "layer_apply" },

  • Now, we begin adding videos – first the presentation video 'video:imagecon_video', with no audio, the start offset I determined when syncing the videos, and a 720p size. I then position the video with the x and y parameters, measured from the upper left corner (gravity: "north_west"), using the values I determined with the video layout template.

start_offset: 5.2,
width: 1280,
height: 720,
gravity: "north_west",
x: 475,
y: 200,
flags: "layer_apply"

And finally, we add the video from the presenter's camera. Again, the offset comes from the video syncing process, and the location and size come from the video templating exercise. In this case, the audio is kept from this source.

width: 350,
height: 500,
crop: "fill",
x: 50,
y: 350,
flags: "layer_apply"
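Putting the fragments together, here is a sketch of how the full options object for cloudinary.v2.uploader.explicit might be assembled. The overlay public IDs, the background image ID, the camera's start offset, and the notification URL are placeholders of my own – swap in your own values:

```javascript
// Sketch of the assembled eager transformation. Placeholder values are
// marked; the "video:" prefix tells Cloudinary the overlay is a video.
const options = {
  type: 'upload',
  resource_type: 'video',
  eager_async: true,
  eager_notification_url: 'https://example.com/cloudinary-webhook', // placeholder
  eager: [{
    format: 'mp4',
    transformation: [
      // 1. Base canvas: 1080p, audio stripped, trimmed to 6s for testing
      { width: 1920, height: 1080, audio_codec: 'none', duration: 6 },
      // 2. Background image stretched over the whole canvas (placeholder ID)
      { overlay: 'imagecon_background', width: 1920, height: 1080, flags: 'layer_apply' },
      // 3. Muted presentation video, offset to sync, 720p, placed from the top-left
      { overlay: 'video:imagecon_video', audio_codec: 'none', start_offset: 5.2, width: 1280, height: 720 },
      { gravity: 'north_west', x: 475, y: 200, flags: 'layer_apply' },
      // 4. Presenter camera, keeps its audio, cropped to fill its frame
      // (start_offset here is a placeholder; use your own synced value)
      { overlay: 'video:camera_video', start_offset: 3.0, width: 350, height: 500, crop: 'fill' },
      { gravity: 'north_west', x: 50, y: 350, flags: 'layer_apply' },
    ],
  }],
};

// Passed to the SDK like:
//   cloudinary.v2.uploader.explicit('my_background_video', options, callback);
console.log(options.eager[0].transformation.length); // prints 6
```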

Running this Node project results in a 6-second video that can be reviewed to make sure that everything works as expected. If there are any issues, the parameters can be adjusted and tested again until the start of the video is perfect.

Once we are happy with the way the video runs, we can create the entire video with the presentation_nocreds.js Node file (just add your credentials). The big change here is removing duration: 6, allowing Cloudinary to render the entire video.

NOTE: Creating a large file like this can take some time, and Cloudinary has a 15-minute processing time limit. You may need to file a support ticket to have that limit removed from your account.


There is code in both Node files to create caption files for the videos. However, the captions are generated from the original recordings, so all of the timestamps are wrong relative to the final cut. It should be possible to run a script that subtracts the start offset from each timestamp and then attaches the corrected captions file. Generating a fresh caption file from the final video would require downloading and re-uploading it, which makes the script the better option. This is left as an exercise 🙂
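As a sketch of that timestamp-shifting idea (assuming WebVTT captions with HH:MM:SS.mmm timestamps; the 5.2-second offset below matches the presentation video's start offset):

```javascript
// Shift every WebVTT timestamp earlier by the start offset, so captions
// generated against the original recording line up with the synced final cut.
function shiftVtt(vtt, offsetSeconds) {
  const ts = /(\d{2}):(\d{2}):(\d{2})\.(\d{3})/g;
  const offsetMs = Math.round(offsetSeconds * 1000);
  return vtt.replace(ts, (_, h, m, s, ms) => {
    // Clamp cues that would start before the new time zero
    const total = Math.max(0, (+h * 3600 + +m * 60 + +s) * 1000 + +ms - offsetMs);
    const pad = (n, w) => String(n).padStart(w, '0');
    const secs = Math.floor(total / 1000);
    return `${pad(Math.floor(secs / 3600), 2)}:${pad(Math.floor(secs / 60) % 60, 2)}:` +
           `${pad(secs % 60, 2)}.${pad(total % 1000, 3)}`;
  });
}

console.log(shiftVtt('00:00:07.200 --> 00:00:09.500', 5.2));
// → 00:00:02.000 --> 00:00:04.300
```

Reading the .vtt file, passing it through shiftVtt, and writing it back out with fs.readFileSync / fs.writeFileSync completes the exercise.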


And there you go – we have taken two videos and a backdrop and turned them into a professional-looking conference video. In Part 1, we set up the layout of the video by rearranging the images on the backdrop. In Part 2, we synced the video start times. Finally, we used Cloudinary's explicit transformations to arrange our recorded videos according to the template and start times. On July 30, you'll be able to see the fruits of this exercise when my talk goes live. In the meantime, here is a sample video I created reading a popular children's book:

One Fish Two Fish
