Image Processing Shortcuts using AI and ML: Part 2

In my previous post on image processing shortcuts, I showed how object detection libraries can be utilised to detect objects inside images. But how are these libraries trained to identify objects? In this post, I’ll walk through the steps to train an image detection model for specific objects.

Why do models need to be trained? In the first post in this series, the ML object detection libraries were able to detect cars, buses, people and a clock. However, one example included an image of a llama, and the library wasn’t trained to know what a llama was, and labelled it as a dog (which is close, but not correct). If the library doesn’t know what a llama is, how can it identify one? (Aside: the awesome PWA app Llama Vision is indeed trained to detect llamas.)

Training a Model: Finger Identification

When training an image object detection library, you need a set of images that contain the object, along with the location of the object(s) in each image, so the algorithm can learn what it is looking for. But what should I train the model to look for? I had just finished a ropes course, and looking at my photos, I realized I had accidentally placed my finger over the lens.

‘I wish I could remove that automatically’, I thought, and a lightbulb went off. Time to train an object detection library for fingers covering the camera lens!

Google Cloud Object Detection

In order to train a model, you need images containing the object in question. So, I began snapping images with my finger over the lens, and grabbed photos from Google image search.  Once I had the raw materials, I followed the tutorial guide on how to upload my images to Google Cloud, identify the objects, and train the model.

Identifying objects for the model

Once I had the photos uploaded to Google Cloud, I needed to tell the algorithm where the fingers were in the images. Typically, this is done with a CSV file indicating (x,y) coordinates of a rectangle that encloses the object. This is great if you already have the information, but I was dreading the task of IDing the boxes, and transcribing the values into the CSV.
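To give a feel for what that transcription involves, here is a minimal sketch that turns pixel rectangles into CSV rows. The column layout (path, label, then the two corners) and the file name are my own assumptions for illustration, and I am assuming the coordinates are stored as fractions of the image size; the exact columns the Google tooling expects may differ, so check the tutorial guide before using anything like this.

```python
import csv
import io


def normalized_box(left, top, right, bottom, width, height):
    """Convert a pixel rectangle to corner coordinates in the 0..1 range."""
    return (left / width, top / height, right / width, bottom / height)


def boxes_to_csv(annotations):
    """Build CSV text from (image_path, label, pixel_box, image_size) tuples.

    The column order used here is hypothetical, not the official format.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for path, label, box, size in annotations:
        corners = normalized_box(*box, *size)
        writer.writerow([path, label] + [f"{value:.3f}" for value in corners])
    return out.getvalue()


# Hypothetical example: a finger in the lower left of an 800x400 photo.
example = boxes_to_csv(
    [("max_the_dog.jpg", "finger", (0, 300, 200, 400), (800, 400))]
)
print(example)
```

Even with a helper like this, somebody still has to measure each rectangle by hand, which is exactly the step I wanted to avoid.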

What I like about the Google tooling is the ability to draw the rectangles in the web interface, labeling the area and uploading the data in one step (with no chance of transcription errors!).

[An image of Max the Golden Retriever, with my finger over the lens at lower left.  I used the Google UI to identify a rectangle containing my finger]

Training the model

Once I was happy with the images I had uploaded and I had identified all of the regions with fingers overlapping the lens, it was time to train the model. There are lots of articles and papers about the math and procedures used when models are trained. (They do all sorts of fancy stuff like grouping images into training and testing sets, and rotating images several degrees to train the model at different perspectives.)
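As a rough illustration of the "training and testing sets" idea, here is a minimal sketch of a random split. This is not what Google's tool does internally (I do not know its exact procedure); the file names, the 15% test fraction and the fixed seed are all assumptions I picked for the example.

```python
import random


def split_dataset(items, test_fraction=0.15, seed=0):
    """Shuffle the items and hold back a fraction of them for testing.

    Returns (train_items, test_items). The model never sees the test
    items during training, so they give an honest check of its accuracy.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names standing in for my 43 uploaded images.
images = [f"finger_{i:02d}.jpg" for i in range(43)]
train, test = split_dataset(images)
print(len(train), "training images,", len(test), "test images")
```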

But really, all you have to do is click the ‘train’ button. The tool does all the math and modeling for you.  It is not trivial math, and depending on your sample size (images and objects to train), this can take a long time:

Screen Shot 2019-07-22 at 3.06.34 PM

In my case, I just used one node, and the 43 images took several hours to train. Time to go for a walk!

Training Results:

Upon completion of the training, a report is given describing the success of the modeling and testing of the model:

Screen Shot 2019-07-22 at 3.07.56 PM

In this case, my precision is 83%, meaning that 5 of every 6 fingers the model reported were really fingers (and 1 in 6 was a false positive). The recall indicates how many of the known, labelled fingers the model actually found. High recall means few false negatives, that is, few fingers that the model missed.

Remember how I said that the cloud will do all the work for us? It held back 6 of my images as a test set, hence the 5/6 score above. If I added more images to the model I could improve these numbers, but as a first attempt I was pretty happy.
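For anyone who wants to see the arithmetic, here is a minimal sketch of the two metrics using their standard definitions. The 5 correct and 1 incorrect prediction come from my report; the false-negative count passed to recall is a made-up number, used only to show how the function is called.

```python
def precision(true_positives, false_positives):
    """Share of the model's predictions that were correct."""
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted else 0.0


def recall(true_positives, false_negatives):
    """Share of the labelled objects that the model actually found."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


p = precision(true_positives=5, false_positives=1)
# The false-negative count below is hypothetical, for illustration only.
r = recall(true_positives=5, false_negatives=2)
print(f"precision = {p:.0%}")
```

False positives pull precision down, while missed objects (false negatives) pull recall down, which is why the two numbers are reported separately.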

Testing the Model:

OK, now I have a model, and I should test it against images to see if it can find a finger over the lens.  With the example Python script (from the tutorial guide), it is pretty easy to do.  In fact, I had more trouble getting my Google authentication set up properly on my computer than actually running the code.

In the screenshot above, the model is not deployed. There is an hourly cost to keeping the model deployed on a GCP node, so I typically only deploy it when I need it.  It takes around 15-20 minutes to deploy the model.

I can then run a Python script that connects to GCP with my model and Google credentials and the base-64-encoded image:

from google.cloud import automl_v1beta1

# 'content' is base-64-encoded image data.
def get_prediction(content, project_id, model_id):
    prediction_client = automl_v1beta1.PredictionServiceClient()

    # Full resource name of the deployed model.
    name = 'projects/{}/locations/us-central1/models/{}'.format(project_id, model_id)
    payload = {'image': {'image_bytes': content}}
    params = {}
    request = prediction_client.predict(name, payload, params)
    return request  # waits till request is returned

I sent this image:


and the model returns a response with the vertices of the box, (0.77, 0) and (1, 0.68), and is 99% certain that it found a “finger”.

Screen Shot 2019-07-23 at 9.26.32 AM.png
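The vertices in the response are fractions of the image size, so they need to be scaled before they can be used on real pixels. Here is a minimal sketch of that step. I represent each detection as a plain (label, score, vertices) tuple rather than the actual response object, and the 4000x3000 image size and the 0.5 score cut-off are assumptions for illustration.

```python
def to_pixel_box(vertices, width, height):
    """Scale ((x_min, y_min), (x_max, y_max)) fractions to pixel bounds."""
    (x_min, y_min), (x_max, y_max) = vertices
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


def confident_boxes(detections, width, height, min_score=0.5):
    """Keep detections whose score reaches min_score, as pixel boxes."""
    return [
        (label, to_pixel_box(vertices, width, height))
        for label, score, vertices in detections
        if score >= min_score
    ]


# The values from my response, applied to a hypothetical 4000x3000 photo.
detections = [("finger", 0.99, ((0.77, 0.0), (1.0, 0.68)))]
boxes = confident_boxes(detections, width=4000, height=3000)
print(boxes)
```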

Awesome! The model works, and it identified the finger.  This is super exciting: now I can automatically crop this area out of the picture.

Cropping out the identified object

Now that I have found the finger covering the lens of my photo, I can manipulate the image to remove it.  The simplest solution is to just crop the finger out:

Screen Shot 2019-07-23 at 9.30.09 AM.png

So, in this case, I am cropping out the top right corner of the image, and I want to lose as few pixels as possible. The box spans only 23% of the image width but 68% of its height, so I crop the image vertically with OpenCV in Python:

Screen Shot 2019-07-23 at 9.41.30 AM.png
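For readers who cannot see the screenshot, here is a minimal sketch of the "lose the fewest pixels" decision. It is a reconstruction of the idea rather than my exact script: the function name is hypothetical, it only computes the crop bounds, and the 1000x500 image size is made up for the example.

```python
def best_crop(box, width, height):
    """Return (left, top, right, bottom) pixel bounds of the largest
    full-width or full-height strip that excludes the detected box.

    box is ((x_min, y_min), (x_max, y_max)) as fractions of the image size.
    """
    (x_min, y_min), (x_max, y_max) = box
    box_left, box_right = round(x_min * width), round(x_max * width)
    box_top, box_bottom = round(y_min * height), round(y_max * height)
    candidates = [
        (0, 0, box_left, height),        # keep everything left of the box
        (box_right, 0, width, height),   # keep everything right of the box
        (0, 0, width, box_top),          # keep everything above the box
        (0, box_bottom, width, height),  # keep everything below the box
    ]
    # Pick the strip with the largest area, i.e. the fewest pixels lost.
    return max(candidates, key=lambda c: (c[2] - c[0]) * (c[3] - c[1]))


# The box from my response, applied to a hypothetical 1000x500 photo.
left, top, right, bottom = best_crop(((0.77, 0.0), (1.0, 0.68)), 1000, 500)
print(left, top, right, bottom)
```

An image loaded with OpenCV is a NumPy array indexed by row and then column, so the crop itself would be the slice img[top:bottom, left:right].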


I now have an automated system that will identify a finger over the lens of the camera, and crop the finger out of the picture!


Object detection in images is a very powerful way to automate your image processing pipeline.  In this post, I’ve walked through the steps to create an object detection model, and used that model to automatically process images to crop out the object detected.  The potential applications for object detection in images are boundless, so please share the uses you have come up with in the comments!

