Case Study: Using Machine Learning to find my Teddy Bear

Problem

For the last 8 years I’ve been travelling with my Teddy Bear (Optymis) taking “selfies” of him whenever I go. The result is a massive collection of photos like those:



I wanted to find them all to create an album.

My wife and I accumulated a lot of photos – my Google Drive shows over 20000 photos taken with my mobile phone and we have additional 450000 photos stored on NAS drive. I didn’t fancy browsing through all that manually.

Solution idea

Machine Learning excels in image recognition, so I decided to try this approach. My friend suggested looking for a pretrained model instead of starting from scratch. Quick search revealed that Inception model from Tensor Flow contains a “teddy bear” class, so it should work for me well.

Inception CNN (Convolutional Neural Network) developed by Google is mature, very sophisticated network and, luckily for me, is distributed with checkpoint file which contains network state after it was trained on 1.2M images from ImageNet contest. I decided to use latest available version which is Inception v4 with inception_v4_2016_09_09.tar.gz checkpoint (available to download here: https://github.com/tensorflow/models/tree/master/research/slim).

Development

For development I used a subset of 1323 images out of which 650 contained the teddy. I’ve sorted the photos manually to get a benchmark of results from the network.

My first approach was to try to feed the the whole image at once into a network and take five classes with highest score as an answer. This is a naive approach and can be improved by using the score threshold instead of fixed number of best guesses. It’s a first optimization point.

The result was better than expected, but far from ideal.

Total Files: 1323
Total Positives: 650

Matches:
Missed: 273
False positives:  7

I’ve checked the data and I’ve noticed that the bear was sometimes recognized as a dog, so I’ve tried to massage the data by loosening the criteria and allowing various breeds of dogs to be treated as Optymis:

Total Files: 1323
Total Positives: 650

Loose match ('teddy bear' and various dogs breeds):
Missed: 186
False:  13

There’s an improvement in matches, but I’m getting more false positives. This wasn’t the solution and, as you’ll see later, it gets worse. The point to take away: don’t do silly, random changes just because they seem to work in one specific case. It something sounds wrong, it is wrong.

Making it better – understanding the input data

My photos are “holiday snaps”, not portraits of my bear and thus are meant to show an objects and area behind him. Because of that Optymis is usually located in such way that is takes a small portion of the picture.

My first approach asked the network to recognise objects on the whole picture at once, so in many cases it did find a mountain or a church etc and missed the bear. So I decided to split the image into smaller chunks and process it in bits. I used a sliding window with 50% width and 50% height of the image. I decided to use an overlap in X axis only to limit the amount of images being worked on – it was pretty safe to do as most of my images are landscape and the bear is almost always shown near the bottom edge (to hide my hand). I left full image processing as a 7th step to catch the rare cases. This is an another naive approach. In my data the bear is almost never in 2nd or 3rd box, so I could skip them to optimise the speed.

The sliding window approach gave amazing results:

Total processing time of 1323 items was 863.86130s
	of which ML time was 519.58978s

Strict ('teddy bear' found):
Missed: 23
False:  8

Loose match ('teddy bear' and various dogs breeds):
Missed: 21
False:  25

The error rate is less than 2.5% which is incredible. There are some photos of different teddy bears (the network wasn’t trained on Optymis, so it catches other bears too). The application didn’t work correctly on panoramic images, which was expected – the image is resized to square 299×299 image before it is fed to the network, so wider the image the greater the distortion. This can be easily fixed by improvements to sliding window sizes.

The network also found out nearly 20 my mistakes I made during manual classification – I made both false positive and missed positive errors.

Here’s one of the examples of image I missed, but network recognised correctly. Chapeau bas!

The initial “optimisation” I made (using “dogs” as positive for bear) gave much worse results – a penalty for slightly lower missed rate is the higher increase in false positives. It’s probably not worth it.

Performance

The test was done using 1323 photos which are nearly 4.7GB in total stored in a Vera crypt volume.
The code I wrote is quick and dirty without much (premature) optimisations. It runs on a Windows 10 box with i7 and GTX 1070 GPU. The program is single threaded and runs in a loop: open file, cut, scale, recognise, store result in a MySQL database.

Total processing time of 1323 items was 863.86130s
	of which ML time was 519.58978s

The Tensor Flow takes about 11 seconds to initialise. After that it process a photo (7 runs – for each of the sliding windows) in about 0.4s. The average GPU utilisation is 60%, with the remaining 40% time spent preparing the input data and storing result. It should be fairly easy to shave some of the preprocessing it.

The CPU utilisation reported by Windows for this program is about 15% which it more than expected for a single core application (this is a 12 thread CPU). Some of the libraries used must be doing multi-threading by themselves (cool!).

Memory usage is negligible – about 130MB.

Production run on a bigger data set shows consistent results.

Conclusion

This weekend project was very successful. The recognition rate is incredibly high and the performance is acceptable. It would take less than 4 days to process ten years worth of my photos.

Source Code

find_a_bear.py:

import tensorflow as tf
from nets.inception_v4 import inception_v4
import nets.inception_utils as inception_utils
from PIL import Image
import numpy as np
from datasets import imagenet
from timeit import default_timer as timer

class FindABear():

    im_size = 299

    def __init__(self):
        start = timer()
        self.num_top_predictions = 5
        self.names = imagenet.create_readable_names_for_imagenet_labels()
        slim = tf.contrib.slim
        self.sess = tf.Session()
        inception_v4.default_image_size = self.im_size
        arg_scope = inception_utils.inception_arg_scope()
        self.inputs = tf.placeholder(tf.float32, (None, self.im_size, self.im_size, 3))

        with slim.arg_scope(arg_scope):
            self.logits, end_points = inception_v4(self.inputs, is_training=False)

        saver = tf.train.Saver()

        saver.restore(self.sess,'checkpoint\inception_v4.ckpt')
        end = timer()
        self.init_time=end-start

    def find(self,image):
        start = timer()
        im = Image.open(image)
        im = im.resize((299, 299))
        im = np.array(im)
        im = im.reshape(-1, 299, 299, 3)
        im = 2. * (im / 255.) - 1.

        results=self.findInImage(im)
        end = timer()
        return results, (end-start)

    def findWithSlidingWindow(self,image):
        start = timer()

        totalmltime=0

        resultsAll = []

        im = Image.open(image)

        width, height = im.size

        # X steps will be overlapping, Y steps won't
        stepsX = 2
        stepsY = 2

        windowwidth = (width / stepsX)
        windowheight = (height / stepsY)

        stepX = (width / (stepsX + 2))
        stepY = (height / stepsY)

        for x in range(0, stepsX + 1):
            for y in range(0, stepsY):
                #print("crop to (%d,%d,%d,%d)" % (stepX * x, stepY * y, stepX * x + windowwidth, stepY * y + windowheight))
                im2 = im.crop((stepX * x, stepY * y, stepX * x + windowwidth, stepY * y + windowheight))
                im2 = im2.resize((299, 299))
                im2 = np.array(im2)
                im2 = im2.reshape(-1, 299, 299, 3)
                im2 = 2. * (im2 / 255.) - 1.
                results, mltime=self.findInImage(im2)
                resultsAll = resultsAll + results
                totalmltime+=mltime

        # and now the whole image
        im = im.resize((299, 299))
        im = np.array(im)
        im = im.reshape(-1, 299, 299, 3)
        im = 2. * (im / 255.) - 1.
        results, mltime = self.findInImage(im)
        resultsAll = resultsAll + results

        results,mltime = self.findInImage(im)
        totalmltime += mltime

        end = timer()
        return resultsAll, (end - start), totalmltime

    def findInImage(self,im):
        start = timer()

        logit_values = self.sess.run(self.logits, feed_dict={self.inputs: im})
        predictions=logit_values[0]

        top_k = predictions.argsort()[-self.num_top_predictions:][::-1]
        results=[]
        for node_id in top_k:
            human_string = self.names[node_id]
            score = predictions[node_id]
            result=(node_id, score, human_string)
            results.append(result)

        end = timer()
        return results,(end-start)

runner.py

from find_a_bear import FindABear
import mysql.connector
import os

class Runner:

    def initDb(self):
        self.cnx = mysql.connector.connect(user='****', password='****',
                                  host='127.0.0.1',
                                  database='optymisie')

    def cleanUp(self):
        self.cnx.close()

    def findCandidates(self, start_path):
        addFileQuery = ("INSERT IGNORE INTO files(filename) values (%(filename)s)")
        cursor = self.cnx.cursor()
        for dirpath, dirnames, filenames in os.walk(start_path):
            for filename in [f for f in filenames if (f.endswith(".jpg") or f.endswith(".JPG"))]:
                fullfilename=dirpath+os.sep+filename
                cursor.execute(addFileQuery,{'filename':fullfilename})
        self.cnx.commit()
        cursor.close()

    def findPositives(self, start_path, data_path):

        addPositivesQuery = ("INSERT IGNORE INTO positives(filename) values (%(filename)s)")

        cursor = self.cnx.cursor()

        for dirpath, dirnames, filenames in os.walk(start_path):
            for filename in [f for f in filenames if (f.endswith(".jpg"))]:
                fullfilename=dirpath+os.sep+filename
                fullfilename=fullfilename.replace(start_path, data_path)
                cursor.execute(addPositivesQuery,{'filename':fullfilename})
        self.cnx.commit()
        cursor.close()

    def processFiles(self):
        total_processing_time=0
        total_ml_time=0
        total_items=0
        addResultQuery = ("INSERT INTO results (id_files, score, name_id, name) values (%(id_files)s, %(score)s, %(name_id)s, %(name)s)")
        findFilesToProcessQuery = ("select id_files, filename from files where result is null")

        cursor = self.cnx.cursor()
        cursor.execute(findFilesToProcessQuery)

        files=[]
        for(id_files, filename) in cursor:
            files.append((id_files,filename))
        cursor.close()

        if len(files)==0:
            print("No new files")
            return

        finder=FindABear()
        print("Init time %.3f" % finder.init_time)

        cursor = self.cnx.cursor()

        for (id_files, filename) in files:

            try:
                results,processing_time,ml_time=finder.findWithSlidingWindow(filename)
                total_processing_time+=processing_time
                total_ml_time+=ml_time
                total_items+=1
                #print('Processing time %.3f' % processing_time)
                #print(results)
                allresults=""
                for result in results:
                    name_id, score, name=result
                    cursor.execute(addResultQuery,{"id_files":id_files, "score":float(score), "name":name, "name_id":int(name_id)})
                    allresults+=name+"|"

                updateQuery=("update files set result=%(result)s where id_files=%(id_files)s")
                cursor.execute(updateQuery, {"result":allresults, "id_files":id_files})

            except ValueError:
                print("Error processing %s" % filename )

            if (total_items%100==0):
                self.cnx.commit();
                print("\tProcessing time so far of %d items was %.5f" % (total_items, total_processing_time))
                print("\t\tof which ML time was %.5f" % total_ml_time)

        self.cnx.commit();
        cursor.close()
        print ("Total processing time of %d items was %.5f" % (total_items,total_processing_time))
        print ("\tof which ML time was %.5f" % total_ml_time)

    def printResults(self):
        cursor = self.cnx.cursor()
        getAllQuery="select filename, f.id_files, name from files f left join results r on f.id_files=r.id_files"
        cursor.execute(getAllQuery)
        for (filename, id_files, name) in cursor:
            print("%d %s %s" % (id_files,filename,name))
        cursor.close()

    def calculateStats(self):
        cursor = self.cnx.cursor()

        print("Updating stats")
        cursor.execute("update files set loosly_ok=false, strict_ok=false")
        cursor.execute("update files set strict_ok=true where id_files in (select id_files from results where name='teddy, teddy bear')")
        cursor.execute("""update files set loosly_ok=true where id_files in (select id_files from results where name in ( 
            'toy poodle',
            'standard poodle',
            'miniature poodle',
            'cocker spaniel, English cocker spaniel, cocker',
            'Airedale, Airedale terrier',
            'wire-haired fox terrier',
            'Welsh springer spaniel',
            'Irish water spaniel',
            'Brittany spaniel',
            'Irish terrier',
            'Bedlington terrier',
            'Eskimo dog, husky',
            'English foxhound',
            'French bulldog'
            ))""")

        self.cnx.commit()
        cursor.close()

    def displayStats(self):
        cursor = self.cnx.cursor()

        cursor.execute("SELECT count(*) FROM  `positives` ")
        totalPositives = cursor.next()

        cursor.execute("SELECT count(*) FROM  files ")
        totalFiles = cursor.next()

        cursor.execute("SELECT count(*) FROM  `positives` p left join files f on f.filename=p.filename WHERE f.strict_ok =false")
        missedStrict=cursor.next()

        cursor.execute("SELECT count(*) FROM  `positives` p left join files f on f.filename=p.filename WHERE f.strict_ok =false and f.loosly_ok = false")
        missedLoose=cursor.next()

        cursor.execute("select count(*) from files f left join positives p on f.filename=p.filename where f.strict_ok = true and p.id_positives is null")
        falseStrict=cursor.next()

        cursor.execute("select count(*) from files f left join positives p on f.filename=p.filename where (f.strict_ok =true or loosly_ok = true) and p.id_positives is null")
        falseLoose=cursor.next()

        cursor.close()

        print("Total Files: %s" % totalFiles)
        print("Total Positives: %s" % totalPositives)

        print("\nStrict ('teddy bear' found):")
        print("Missed: %s" % missedStrict)
        print("False:  %s" % falseStrict)

        print("\nLoose match ('teddy bear' and dogs):")
        print("Missed: %s" % missedLoose)
        print("False:  %s" % falseLoose)

runner=Runner()
runner.initDb()
runner.findCandidates("m:\\Google Drive\\Google Photos (1)")
runner.findPositives("e:\\workspace\\znajdz_optymisie\\using_inception_v4\\images","m:\\Google Drive\\Google Photos") 
runner.processFiles()
runner.calculateStats()
runner.displayStats()
runner.cleanUp()

Database schema:

CREATE TABLE `files` (
  `id_files` int(11) NOT NULL,
  `filename` varchar(255) NOT NULL,
  `result` varchar(200) DEFAULT NULL,
  `strict_ok` tinyint(1) DEFAULT NULL,
  `loosly_ok` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `positives` (
  `id_positives` int(11) NOT NULL,
  `filename` varchar(200) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `results` (
  `id_results` int(11) NOT NULL,
  `id_files` int(11) NOT NULL,
  `score` float NOT NULL,
  `name_id` int(11) NOT NULL,
  `name` varchar(200) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

ALTER TABLE `files`
  ADD PRIMARY KEY (`id_files`),
  ADD UNIQUE KEY `filename` (`filename`);

ALTER TABLE `positives`
  ADD PRIMARY KEY (`id_positives`),
  ADD UNIQUE KEY `filename` (`filename`);

ALTER TABLE `results`
  ADD PRIMARY KEY (`id_results`),
  ADD KEY `name` (`name`);
Share and Enjoy:
  • Digg
  • Facebook
  • Blip
  • Kciuk.pl
  • StumbleUpon
  • Wykop
Share