Video Voice “Unders”

As I enter week 3 of total self-isolation (some have done a lot more, others much less), we’re all getting a little punchy.  Many parents have vented online about how hard it is to work from home while also homeschooling their children. Our kids are understandably concerned too, which makes concentrating on even the easiest tasks difficult.

What if you could get your child’s idol to personally ask them to do their homework?  Would that help push them towards getting things done faster?  Well, in this post, you’ll learn how to make customised messages from anyone (albeit in a very cheesy late-night TV sort of way).

Here’s Queen Elizabeth II asking your kids to do their homework:

Custom Voice “Unders”

The first step in creating a voice “under” is to remove the mouth from the person in the photo.  We can do this with Cloudinary’s advanced facial recognition software.  (Note: this is an add-on that must be enabled.) On upload of the image, we ask for an analysis of the facial attributes (the examples and code here are Node.js):

// Upload the photo and request advanced face detection (add-on must be enabled)
cloudinary.uploader.upload(files.imageSource.path, { detection: "adv_face", public_id: imageName, tags: "face_upload" }, function (err, image) {
  // lots more
});


The response has the coordinates of a number of features on the face in the uploaded image.  The goal here is to create an oval around the mouth, so we’ll focus on the mouth attributes:

"mouth": {
"left": {"x": 538.4879999999999,"y": 1059.48},
"right": {"x": 728.028,"y": 1078.92},
"under_lip": {"bottom": {"x": 641.52,"y": 1112.697},"top": {"x": 644.193,"y": 1086.696}},
"upper_lip": {"bottom": {"x": 642.492,"y": 1066.5269999999998},"top": {"x":642.006,"y":1049.031}},

Using these parameters, we can calculate the center x,y position of the mouth, as well as the width (left to right) and height (upper lip to lower lip) of the mouth.  In my Cloudinary account, I’ve uploaded a simple black image and named it “black”. If I create the following transformation:

transformation: {
  format: 'png',
  overlay: 'black',
  border: '2px_solid_rgb:33390b60',
  radius: 'max',
  gravity: 'center',
  x: 13,
  y: 14,
  width: 36,
  height: 13,
  effect: 'cut_out',
  flags: 'layer_apply'
}

The image becomes a PNG (PNGs support transparency, which will be important in a second). The overlay creates a box with x,y offsets and the calculated width and height. The r_max removes the corners of the box until it is an ellipse.  I then use the cut_out effect to ‘punch’ a hole in the image, leaving a transparent ellipse where the mouth used to be.  This image is uploaded as “name”-mouth.
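The center, width and height calculation described above can be sketched roughly as follows. This is only an illustration: `mouthEllipse` and `cutOutTransformation` are hypothetical helper names (not part of the Cloudinary SDK), the 1200x2000 image size is made up for the example, and the offsets are assumed to be measured from the image center because the transformation uses gravity 'center'.

```javascript
// Sketch only: these helpers are hypothetical, not Cloudinary SDK functions.

// Derive an ellipse around the mouth from the landmark object in the response.
// Offsets are measured from the image center (assumes gravity: 'center').
function mouthEllipse(mouth, imageWidth, imageHeight) {
  const top = mouth.upper_lip.top.y;
  const bottom = mouth.under_lip.bottom.y;
  const centerX = (mouth.left.x + mouth.right.x) / 2;
  const centerY = (top + bottom) / 2;
  return {
    x: Math.round(centerX - imageWidth / 2),
    y: Math.round(centerY - imageHeight / 2),
    width: Math.round(mouth.right.x - mouth.left.x),
    height: Math.round(bottom - top),
  };
}

// Fill the transformation from the post with the calculated values.
function cutOutTransformation(ellipse) {
  return {
    format: 'png',
    overlay: 'black',
    border: '2px_solid_rgb:33390b60',
    radius: 'max',
    gravity: 'center',
    x: ellipse.x,
    y: ellipse.y,
    width: ellipse.width,
    height: ellipse.height,
    effect: 'cut_out',
    flags: 'layer_apply',
  };
}

// Mouth landmarks copied from the sample response above.
const mouth = {
  left: { x: 538.4879999999999, y: 1059.48 },
  right: { x: 728.028, y: 1078.92 },
  under_lip: { bottom: { x: 641.52, y: 1112.697 }, top: { x: 644.193, y: 1086.696 } },
  upper_lip: { bottom: { x: 642.492, y: 1066.5269999999998 }, top: { x: 642.006, y: 1049.031 } },
};

// The image dimensions here are an assumption for illustration.
const ellipse = mouthEllipse(mouth, 1200, 2000);
console.log(cutOutTransformation(ellipse));
```

Rounding to whole pixels keeps the transformation values tidy. In practice you may want to pad the width and height a little so the cut-out fully covers the lips.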


Here’s Beyonce on her way to becoming a voice under:

Creating a voice under video

This was actually really simple.  By holding my phone in landscape mode really close to my face, I could use the front camera to record a video where my lips were centered in the screen.  Record yourself saying a few silly lines, then upload these videos to Cloudinary.  In my example, I have “Please do your homework,” “The dishwasher won’t empty itself,” “I know you didn’t wash your hands for 20 seconds,” and “Could you clean up your room?”  All perfect messages to send to loved ones who might not be doing their full part while quarantined at home.

Adding the video to the image

The format of the video transformation looks something like this:

…,h_540,g_center,bo_2px_solid_rgb:66390b60,e_vignette,x_-38,y_-2,c_lpad,fl_ignore_aspect_ratio/l_bey-mouth/mouthhomework.mp4

The first step (in red) is to make the video small enough to fit in the space of Beyonce’s mouth (in this case 88 pixels wide).

Next (in blue), we add the image of Beyonce, resizing the video to be large (matching the size of the image).  We move the image so that the hole is centered around the video of the mouth.  We calculated the hole location in the first step, so this is relatively easy.
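As a rough sketch of that last step: if the hole sits at some offset from the image center, shifting the image by the opposite offset puts the hole over the center of the canvas, where the mouth video sits. `overlayOffset` is a hypothetical helper, and the `scale` argument assumes the image may be resized when it is layered over the video. The numbers are illustrative and not the ones used in the URL above.

```javascript
// Hypothetical helper: given the hole's offset from the image center (the x,y
// used for the cut-out) and the factor by which the image is resized when it
// is layered over the video, return the offset to apply to the image overlay.
// Assumes the mouth video is centered on the canvas.
function overlayOffset(hole, scale = 1) {
  return {
    x: Math.round(-hole.x * scale),
    y: Math.round(-hole.y * scale),
  };
}

// Example: a hole 13px right and 14px below center, image shown at 2x size.
const offset = overlayOffset({ x: 13, y: 14 }, 2);
console.log(`x_${offset.x},y_${offset.y}`); // prints "x_-26,y_-28"
```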

And that’s it! Here she is, reminding your kids to wash their hands for 20 seconds:


Play with the Tool!

I have placed the code that I used on Glitch.  You can upload a photo of any person, and then choose one of the 4 pre-recorded messages I created. Or, feel free to remix, add your Cloudinary credentials, and create your own video messages!

This month, Cloudinary and I are celebrating “Livin’ la Video Loca”. Create a video “Voice Under” and enter our contest. Winners will have a donation made in their name to COVID-19 relief efforts.

Since we borrowed Ricky Martin’s song for the contest, I also borrowed his likeness:


We used a few advanced techniques here.  We used the advanced facial recognition tools of Cloudinary to identify the mouth on a face (and then we removed it!).  We then placed a video under the image so that a moving mouth appears in the hole where the image’s mouth once was, giving the appearance of an image delivering a personalised message (in a different voice).

UPDATE: Contest

Each week this month, I’ll have an updated video transformation contest with Cloudinary tools. We’re calling it “Livin’ la Video Loca.” Tweet your Virtual Vacation video with the hashtag #LivinlaVideoLoca and enter the contest by April 16. Cloudinary will make a donation in the winner’s name to COVID relief efforts in their region.


