Case Study: Using Machine Learning to find my Teddy Bear


For the last 8 years I’ve been travelling with my Teddy Bear (Optymis), taking “selfies” of him wherever I go. The result is a massive collection of photos like these:

I wanted to find them all to create an album.

My wife and I have accumulated a lot of photos – my Google Drive shows over 20,000 photos taken with my mobile phone, and we have an additional 450,000 photos stored on a NAS drive. I didn’t fancy browsing through all that manually.

Solution idea

Machine Learning excels at image recognition, so I decided to try this approach. My friend suggested looking for a pretrained model instead of starting from scratch. A quick search revealed that the Inception model from TensorFlow contains a “teddy bear” class, so it should work well for me.

The Inception CNN (Convolutional Neural Network) developed by Google is a mature, very sophisticated network and, luckily for me, is distributed with a checkpoint file which contains the network state after it was trained on 1.2M images from the ImageNet contest. I decided to use the latest available version, which is Inception v4 with the inception_v4_2016_09_09.tar.gz checkpoint.


For development I used a subset of 1323 images, of which 650 contained the teddy. I sorted the photos manually to get a benchmark for the network’s results.

My first approach was to feed the whole image at once into the network and take the five classes with the highest score as the answer. This is a naive approach and can be improved by using a score threshold instead of a fixed number of best guesses. That’s the first optimisation point.
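To make the two selection rules concrete (the scores below are made up for illustration, not real network output), here is the fixed top-k pick next to a threshold-based one:

```python
def top_k_labels(scores, k=5):
    # fixed number of best guesses (the naive approach)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def threshold_labels(scores, threshold=0.1):
    # keep every class whose score clears the threshold instead
    return [i for i, s in enumerate(scores) if s >= threshold]

scores = [0.02, 0.55, 0.01, 0.30, 0.12]
print(top_k_labels(scores, k=2))      # [1, 3]
print(threshold_labels(scores, 0.1))  # [1, 3, 4]
```

With a threshold, an image with no confident match returns an empty list instead of five weak guesses.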

The result was better than expected, but far from ideal.

Total Files: 1323
Total Positives: 650

Missed: 273
False positives:  7

I checked the data and noticed that the bear was sometimes recognised as a dog, so I tried to massage the data by loosening the criteria and allowing various breeds of dogs to be treated as Optymis:

Total Files: 1323
Total Positives: 650

Loose match ('teddy bear' and various dog breeds):
Missed: 186
False:  13

There’s an improvement in matches, but I’m getting more false positives. This wasn’t the solution and, as you’ll see later, it gets worse. The point to take away: don’t make silly, random changes just because they seem to work in one specific case. If something sounds wrong, it is wrong.

Making it better – understanding the input data

My photos are “holiday snaps”, not portraits of my bear, and thus are meant to show the objects and scenery behind him. Because of that, Optymis is usually positioned in such a way that he takes up only a small portion of the picture.

My first approach asked the network to recognise objects in the whole picture at once, so in many cases it found a mountain or a church and missed the bear. So I decided to split the image into smaller chunks and process it in bits. I used a sliding window with 50% of the width and 50% of the height of the image. I decided to use an overlap on the X axis only, to limit the number of images being worked on – this was pretty safe to do, as most of my images are landscape and the bear is almost always shown near the bottom edge (to hide my hand). I left full-image processing as a 7th step to catch the rare cases. This is another naive approach. In my data the bear is almost never in the 2nd or 3rd box, so I could skip them to optimise the speed.
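The window geometry described above can be sketched as a standalone helper (a sketch of the scheme, matching the full listing further down):

```python
def sliding_windows(width, height, steps_x=2, steps_y=2):
    """Crop boxes for the scheme above: each window is half the image in
    both dimensions; dividing by steps_x + 2 makes the horizontal steps
    overlap, giving 3 columns x 2 rows = 6 crops."""
    win_w, win_h = width / steps_x, height / steps_y
    step_x, step_y = width / (steps_x + 2), height / steps_y
    return [(step_x * x, step_y * y, step_x * x + win_w, step_y * y + win_h)
            for x in range(steps_x + 1) for y in range(steps_y)]

boxes = sliding_windows(4000, 3000)
print(len(boxes))  # 6 crops; the full image is the 7th run
```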

The sliding window approach gave amazing results:

Total processing time of 1323 items was 863.86130s
	of which ML time was 519.58978s

Strict ('teddy bear' found):
Missed: 23
False:  8

Loose match ('teddy bear' and various dog breeds):
Missed: 21
False:  25

The error rate is less than 2.5%, which is incredible. There are some photos of different teddy bears (the network wasn’t trained on Optymis specifically, so it catches other bears too). The application didn’t work correctly on panoramic images, which was expected – the image is resized to a square 299×299 image before it is fed to the network, so the wider the image, the greater the distortion. This can easily be fixed by improving the sliding window sizes.
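One possible fix – a sketch of the idea, not something the benchmark above used – is to derive the number of horizontal windows from the aspect ratio, so each crop stays roughly square before the 299×299 resize:

```python
def columns_for(width, height):
    # one roughly square column per unit of aspect ratio
    return max(1, round(width / height))

print(columns_for(4000, 3000))  # 1 column: a 4:3 photo is nearly square
print(columns_for(9000, 3000))  # 3 columns for a 3:1 panorama
```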

The network also found nearly 20 mistakes I had made during manual classification – both false positives and missed positives.

Here’s one example of an image I missed but the network recognised correctly. Chapeau bas!

The initial “optimisation” I made (treating “dogs” as a positive match for the bear) gave much worse results – the penalty for a slightly lower miss rate is a larger increase in false positives. It’s probably not worth it.


The test was done using 1323 photos, nearly 4.7GB in total, stored in a VeraCrypt volume.
The code I wrote is quick and dirty, without much (premature) optimisation. It runs on a Windows 10 box with an i7 and a GTX 1070 GPU. The program is single-threaded and runs in a loop: open file, cut, scale, recognise, store the result in a MySQL database.

Total processing time of 1323 items was 863.86130s
	of which ML time was 519.58978s

TensorFlow takes about 11 seconds to initialise. After that it processes a photo (7 runs – one for each sliding window plus the full image) in about 0.4s. The average GPU utilisation is 60%, with the remaining 40% of the time spent preparing the input data and storing results. It should be fairly easy to shave some time off the preprocessing.
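A sketch of that idea (the names here are illustrative, not from my program): a producer thread does the open/cut/scale work while the main loop stays busy with inference, so the CPU-side 40% overlaps with the GPU-side 60%.

```python
import threading, queue

def preprocess(item):
    return item * 2          # stand-in for the open/cut/scale work

def infer(batch):
    return batch + 1         # stand-in for the GPU inference step

def run_pipeline(items):
    q = queue.Queue(maxsize=8)   # bounded, so preprocessing can't run away

    def producer():
        for item in items:
            q.put(preprocess(item))
        q.put(None)              # sentinel: no more work

    threading.Thread(target=producer, daemon=True).start()
    results = []
    while (batch := q.get()) is not None:
        results.append(infer(batch))
    return results

print(run_pipeline([1, 2, 3]))  # [3, 5, 7]
```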

The CPU utilisation reported by Windows for this program is about 15%, which is more than expected for a single-core application (this is a 12-thread CPU). Some of the libraries used must be doing multi-threading by themselves (cool!).

Memory usage is negligible – about 130MB.

Production run on a bigger data set shows consistent results.


This weekend project was very successful. The recognition rate is incredibly high and the performance is acceptable. It would take less than 4 days to process ten years’ worth of my photos.
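That estimate follows directly from the measured throughput (using ~470,000 as the rough size of the combined library):

```python
total_time_s = 863.86130   # measured on the 1323-image benchmark
total_items = 1323
library_size = 470_000     # ~20k phone photos + ~450k on the NAS

per_item = total_time_s / total_items
days = per_item * library_size / 86_400
print(round(per_item, 3), round(days, 1))  # 0.653 3.6
```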

Source Code

import tensorflow as tf
from nets.inception_v4 import inception_v4
import nets.inception_utils as inception_utils
from PIL import Image
import numpy as np
from datasets import imagenet
from timeit import default_timer as timer

class FindABear():

    im_size = 299

    def __init__(self):
        start = timer()
        self.num_top_predictions = 5
        self.names = imagenet.create_readable_names_for_imagenet_labels()
        slim = tf.contrib.slim
        self.sess = tf.Session()
        inception_v4.default_image_size = self.im_size
        arg_scope = inception_utils.inception_arg_scope()
        self.inputs = tf.placeholder(tf.float32, (None, self.im_size, self.im_size, 3))

        with slim.arg_scope(arg_scope):
            self.logits, end_points = inception_v4(self.inputs, is_training=False)

        saver = tf.train.Saver()
        # restore the pretrained weights (path to the unpacked checkpoint file)
        saver.restore(self.sess, "inception_v4.ckpt")

        end = timer()
        self.init_time = end - start

    def find(self, image):
        start = timer()
        im = Image.open(image)
        im = im.resize((299, 299))
        im = np.array(im)
        im = im.reshape(-1, 299, 299, 3)
        im = 2. * (im / 255.) - 1.

        results, mltime = self.findInImage(im)
        end = timer()
        return results, (end - start)

    def findWithSlidingWindow(self, image):
        start = timer()

        totalmltime = 0
        resultsAll = []

        im = Image.open(image)

        width, height = im.size

        # X steps will be overlapping, Y steps won't
        stepsX = 2
        stepsY = 2

        windowwidth = (width / stepsX)
        windowheight = (height / stepsY)

        stepX = (width / (stepsX + 2))
        stepY = (height / stepsY)

        for x in range(0, stepsX + 1):
            for y in range(0, stepsY):
                im2 = im.crop((stepX * x, stepY * y, stepX * x + windowwidth, stepY * y + windowheight))
                im2 = im2.resize((299, 299))
                im2 = np.array(im2)
                im2 = im2.reshape(-1, 299, 299, 3)
                im2 = 2. * (im2 / 255.) - 1.
                results, mltime = self.findInImage(im2)
                resultsAll = resultsAll + results
                totalmltime += mltime

        # and now the whole image
        im = im.resize((299, 299))
        im = np.array(im)
        im = im.reshape(-1, 299, 299, 3)
        im = 2. * (im / 255.) - 1.
        results, mltime = self.findInImage(im)
        resultsAll = resultsAll + results
        totalmltime += mltime

        end = timer()
        return resultsAll, (end - start), totalmltime

    def findInImage(self, im):
        start = timer()

        # reconstructed: evaluate the logits for this batch of one image
        logit_values = self.sess.run(self.logits, feed_dict={self.inputs: im})
        predictions = logit_values[0]

        results = []
        top_k = predictions.argsort()[-self.num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = self.names[node_id]
            score = predictions[node_id]
            results.append((node_id, score, human_string))

        end = timer()
        return results, (end - start)

from find_a_bear import FindABear
import mysql.connector
import os

class Runner:

    def initDb(self):
        # connection details were elided in the original; host and database
        # names below are placeholders
        self.cnx = mysql.connector.connect(user='****', password='****',
                                           host='localhost', database='****')

    def cleanUp(self):
        self.cnx.close()
    def findCandidates(self, start_path):
        addFileQuery = ("INSERT IGNORE INTO files(filename) values (%(filename)s)")
        cursor = self.cnx.cursor()
        for dirpath, dirnames, filenames in os.walk(start_path):
            for filename in [f for f in filenames if (f.endswith(".jpg") or f.endswith(".JPG"))]:
                cursor.execute(addFileQuery, {"filename": os.path.join(dirpath, filename)})
        self.cnx.commit()
    def findPositives(self, start_path, data_path):

        addPositivesQuery = ("INSERT IGNORE INTO positives(filename) values (%(filename)s)")

        cursor = self.cnx.cursor()

        for dirpath, dirnames, filenames in os.walk(start_path):
            for filename in [f for f in filenames if (f.endswith(".jpg"))]:
                fullfilename = os.path.join(dirpath, filename)
                # map the manually sorted development copy back to the original location
                fullfilename = fullfilename.replace(start_path, data_path)
                cursor.execute(addPositivesQuery, {"filename": fullfilename})
        self.cnx.commit()

    def processFiles(self):
        # NOTE: several bookkeeping lines were missing from this listing
        # and have been reconstructed from context
        addResultQuery = ("INSERT INTO results (id_files, score, name_id, name) values (%(id_files)s, %(score)s, %(name_id)s, %(name)s)")
        findFilesToProcessQuery = ("select id_files, filename from files where result is null")

        cursor = self.cnx.cursor()
        cursor.execute(findFilesToProcessQuery)
        files = cursor.fetchall()

        if len(files) == 0:
            print("No new files")
            return

        finder = FindABear()
        print("Init time %.3f" % finder.init_time)

        cursor = self.cnx.cursor()

        total_items = 0
        total_processing_time = 0
        total_ml_time = 0

        for (id_files, filename) in files:
            try:
                results, processing_time, ml_time = finder.findWithSlidingWindow(filename)
                total_items += 1
                total_processing_time += processing_time
                total_ml_time += ml_time

                for result in results:
                    name_id, score, name = result
                    cursor.execute(addResultQuery, {"id_files": id_files, "score": float(score), "name": name, "name_id": int(name_id)})

                updateQuery = ("update files set result=%(result)s where id_files=%(id_files)s")
                cursor.execute(updateQuery, {"result": "done", "id_files": id_files})
                self.cnx.commit()

            except ValueError:
                print("Error processing %s" % filename)

            if total_items and total_items % 100 == 0:
                print("\tProcessing time so far of %d items was %.5f" % (total_items, total_processing_time))
                print("\t\tof which ML time was %.5f" % total_ml_time)

        print("Total processing time of %d items was %.5f" % (total_items, total_processing_time))
        print("\tof which ML time was %.5f" % total_ml_time)

    def printResults(self):
        cursor = self.cnx.cursor()
        getAllQuery = "select filename, f.id_files, name from files f left join results r on f.id_files=r.id_files"
        cursor.execute(getAllQuery)
        for (filename, id_files, name) in cursor:
            print("%d %s %s" % (id_files, filename, name))

    def calculateStats(self):
        cursor = self.cnx.cursor()

        print("Updating stats")
        cursor.execute("update files set loosly_ok=false, strict_ok=false")
        cursor.execute("update files set strict_ok=true where id_files in (select id_files from results where name='teddy, teddy bear')")
        cursor.execute("""update files set loosly_ok=true where id_files in (select id_files from results where name in ( 
            'toy poodle',
            'standard poodle',
            'miniature poodle',
            'cocker spaniel, English cocker spaniel, cocker',
            'Airedale, Airedale terrier',
            'wire-haired fox terrier',
            'Welsh springer spaniel',
            'Irish water spaniel',
            'Brittany spaniel',
            'Irish terrier',
            'Bedlington terrier',
            'Eskimo dog, husky',
            'English foxhound',
            'French bulldog'
            ))""")
        self.cnx.commit()

    def displayStats(self):
        cursor = self.cnx.cursor()

        cursor.execute("SELECT count(*) FROM  `positives` ")
        totalPositives = cursor.fetchone()[0]

        cursor.execute("SELECT count(*) FROM  files ")
        totalFiles = cursor.fetchone()[0]

        cursor.execute("SELECT count(*) FROM  `positives` p left join files f on f.filename=p.filename WHERE f.strict_ok =false")
        missedStrict = cursor.fetchone()[0]

        cursor.execute("SELECT count(*) FROM  `positives` p left join files f on f.filename=p.filename WHERE f.strict_ok =false and f.loosly_ok = false")
        missedLoose = cursor.fetchone()[0]

        cursor.execute("select count(*) from files f left join positives p on f.filename=p.filename where f.strict_ok = true and p.id_positives is null")
        falseStrict = cursor.fetchone()[0]

        cursor.execute("select count(*) from files f left join positives p on f.filename=p.filename where (f.strict_ok =true or loosly_ok = true) and p.id_positives is null")
        falseLoose = cursor.fetchone()[0]


        print("Total Files: %s" % totalFiles)
        print("Total Positives: %s" % totalPositives)

        print("\nStrict ('teddy bear' found):")
        print("Missed: %s" % missedStrict)
        print("False:  %s" % falseStrict)

        print("\nLoose match ('teddy bear' and dogs):")
        print("Missed: %s" % missedLoose)
        print("False:  %s" % falseLoose)

runner = Runner()
runner.initDb()
runner.findCandidates("m:\\Google Drive\\Google Photos (1)")
runner.findPositives("e:\\workspace\\znajdz_optymisie\\using_inception_v4\\images","m:\\Google Drive\\Google Photos")

Database schema:

CREATE TABLE `files` (
  `id_files` int(11) NOT NULL,
  `filename` varchar(255) NOT NULL,
  `result` varchar(200) DEFAULT NULL,
  `strict_ok` tinyint(1) DEFAULT NULL,
  `loosly_ok` int(11) DEFAULT NULL
);

CREATE TABLE `positives` (
  `id_positives` int(11) NOT NULL,
  `filename` varchar(200) NOT NULL
);

CREATE TABLE `results` (
  `id_results` int(11) NOT NULL,
  `id_files` int(11) NOT NULL,
  `score` float NOT NULL,
  `name_id` int(11) NOT NULL,
  `name` varchar(200) NOT NULL
);

ALTER TABLE `files`
  ADD PRIMARY KEY (`id_files`),
  ADD UNIQUE KEY `filename` (`filename`);

ALTER TABLE `positives`
  ADD PRIMARY KEY (`id_positives`),
  ADD UNIQUE KEY `filename` (`filename`);

ALTER TABLE `results`
  ADD PRIMARY KEY (`id_results`),
  ADD KEY `name` (`name`);

Tensorflow 1.5 built with AVX support

TL;DR – download tensorflow 1.5 with AVX support from the link on the bottom of this post

When running machine learning code on new hardware using libraries available from PyPI, we are not using all the capabilities provided by our CPU:
2018-01-10 09:35:05.048387: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2

Last night I rebuilt TensorFlow to support AVX CPU instructions. The setup for the build takes about an hour. The build itself took 2 hours 20 minutes on my i7-8700k desktop with Windows 10 and hit the computer quite hard.

I used the official build manual, but it doesn’t mention all the requirements:
* you need to install numpy in the environment you use for the build
* you need to install wheel in the environment you use for the build (otherwise it fails after 2 hours of building – sweet)
* if building against CUDA 9.1 you need to copy math_functions.h from cuda91/include/crt/ to the cuda91/include directory (otherwise it fails after 1 hour of building)

The results?
Sample program without AVX:
start: 2018-01-10 09:35:04.609053
finish:2018-01-10 09:36:00.339329

total: ~55.5s

The same code with AVX:
start: 2018-01-10 09:36:18.167291
finish:2018-01-10 09:36:55.693329

total: ~37.5s

Here is the wheel file with AVX support – tensorflow_gpu-1.5.0rc0-cp36-cp36m-win_amd64.whl – if you don’t want to run the build process yourself.

And here’s the CPU usage during the build (I got a new computer yesterday and I’m still excited by the new toy :))

Adding wildcards to Google AIY actions on Raspberry Pi.

I’ve been playing with Google AIY on a Raspberry Pi for nearly an hour now and I love it. If you are lucky you can get your kit with issue 57 of The MagPi magazine.

Google provided a Python-based example app that recognises the command you speak to the box and runs your action. The problem with it is that the command needs to match literally, without an option to add a variable part (a parameter). In the real world I want to give parameters to commands, for example “Add note my note“. So I’ve hacked the app to do just that. Here are the steps:
1. Modify the keyword handler to recognise patterns. In the KeywordHandler class, change the handle method:

import re

class KeywordHandler(object):

    """Perform the action when the given keyword is in the command."""

    def __init__(self, keyword, action):
        self.keyword = keyword.lower()
        self.action = action

    def get_phrases(self):
        return [self.keyword]

    def handle(self, command):
        # keywords containing '*' are treated as regular expressions
        if "*" in self.keyword:
            match = re.match(self.keyword.lower(), command.lower())
            return match is not None
        if self.keyword in command.lower():
            return True
        return False
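Stripped of the AIY plumbing, the matching rule amounts to this standalone sketch:

```python
import re

def matches(keyword, command):
    # keywords containing '*' are treated as regular expressions;
    # plain keywords fall back to substring matching
    if "*" in keyword:
        return re.match(keyword.lower(), command.lower()) is not None
    return keyword.lower() in command.lower()

print(matches("add note (.*)", "Add note buy milk"))  # True
print(matches("add note (.*)", "what time is it"))    # False
```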

2. Make sure the action you are running understands that the parameter given to it is the variable part. I’ve modified the SpeakAction to do just that:

class SpeakAction(object):

    """Says the given text via TTS."""

    def __init__(self, say, words):
        self.say = say
        self.words = words

    def run(self, voice_command):
        # reconstructed sketch: substitute $1 with the spoken command text
        self.say(self.words.replace("$1", voice_command))

3. Add a new action in the make_actor method:

    actor.add_keyword(_('add note (.*)'),SpeakAction(say,"adding $1"))

Have fun!

How to integrate Prestashop e-commerce site with Shopzilla

So you’ve decided to integrate your shop with Shopzilla. If you are reading this in March 2014, come back in a month to learn whether it was worth it. If you’ve made up your mind, read on.

There are some commercial solutions available that will connect your shop to Shopzilla. They cost roughly 100 GBP. It’s not a huge sum, but I would advise you to spend the money on Shopzilla ads instead to see if it is worth it. Which leaves the small problem of uploading the products to Shopzilla.

Shopzilla’s interface seems to be stuck between 1999 and 2005. The only reasonable way they offer is to upload a CSV file, for which they provide documentation with some infuriating errors. And their system does not provide feedback on what you are doing wrong. Are you ready?

To create the CSV file they require, I used an SQL script and ran it through phpMyAdmin to save the result in CSV format (note that you need to change the extension to .txt to upload it to Shopzilla). Save it as a tab (\t) separated file without quotes around columns and with new lines removed.

The script is below. You will have to adjust:
* the case section to provide your category mapping
* change yourshop to the base URL of your shop in the product and image URLs
* adjust id_lang in the where statement to change the language of the export. I’m using 1 for English
* the quantity in stock is hardcoded to 1000. That will be fine in most cases.
* leave the shipping cost and bid empty, because Shopzilla suggests setting those parameters via the admin panel, not in the feed itself.

The resulting txt file works fine with SEO optimized urls.

-- the Category, Manufacturer, Title, Description and SKU expressions were
-- truncated in the original listing and are reconstructed from context
SELECT
case
  when cat.name like 'Earrings' then '14138'
  when cat.name like 'Bracelets' then '14135'
  else '14157'
end as Category,
m.name AS Manufacturer,
pl.name AS Title,
replace(pl.description_short, '\n', '') AS 'Product Description',
concat('http://yourshop/',cat.link_rewrite,'/',cast(p.id_product as char),'-',pl.link_rewrite,'.html') as Link,
concat('http://yourshop/',cast(img.id_image as char),'-thickbox_default/',cast(img.id_image as char),'.jpg') as Image,
concat(m.name,'_',cast(p.id_product as char)) as SKU,
'1000' as Stock,
'New' AS 'Condition',
p.weight AS 'Shipping Weight',
'' as 'Shipping Cost',
'' as Bid,
'9 14' as 'Promotional Description',
'' as 'EAN / UPC',
p.price as Price

FROM ps_product p INNER JOIN
ps_product_lang pl ON p.id_product = pl.id_product LEFT JOIN
ps_manufacturer m ON p.id_manufacturer = m.id_manufacturer LEFT JOIN
ps_image img on p.id_product = img.id_product LEFT JOIN
ps_category_lang cat on p.id_category_default = cat.id_category
where img.position=1
and pl.id_lang=1
and cat.id_lang=1
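If you’d rather build the feed file from a script than export it via phpMyAdmin, the formatting rules (tab separators, no quotes, new lines stripped) are easy to apply by hand – a minimal sketch, with the helper name being mine:

```python
def shopzilla_row(cols):
    # tab-separated, no quotes; strip embedded tabs and new lines
    return "\t".join(str(c).replace("\n", " ").replace("\t", " ") for c in cols)

row = shopzilla_row(("14138", "Acme", "Silver hoop\nearrings"))
print(row)  # 14138<TAB>Acme<TAB>Silver hoop earrings
```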

Please let me know what your results with Shopzilla are. Was it worth it for you?

Mapping sales in UK using google geochart

My jewellery shop generates a slow but steady flow of orders, which lets me play with the data. Today I tried mapping the shipping addresses onto a map.

I decided to try Google’s geochart from the Charts API. There’s simply no way they could make it any easier. They have even created a playground where you can test the API from the browser.

There are some issues with this chart though – the map looks bland, it can’t be zoomed, and there’s no real map overlay. Also it takes ages to draw unless the markers are given in lat/lon format.

Here’s the result:
orders in uk map

Raspberry Pi, servo motor, gpio, i2c and soldering weekend

This weekend I decided to work on my soldering skills and finally assemble the PWM driver PCA9685 I bought from Adafruit a couple of weeks back. Not all the solder points look perfect, but I managed not to burn the PCB, which I consider a major success. I thank all the guys who put soldering tutorials on YouTube!


Then I played with software.
* I2C: I used Python to steer the servo controller via the I2C interface. For some reason the Raspbian image has the I2C kernel module disabled, so I had to comment out blacklist i2c-bcm2708 in /etc/modprobe.d/raspi-blacklist.conf and then add i2c-dev and i2c-bcm2708 to /etc/modules to enable them at start-up.
* GPIO: Raspbian has all the libraries needed for GPIO development in Python loaded by default. I encountered a hardware problem instead – there are many sources on the web describing the pin layout of the GPIO port, but none of them says which pin is the physical pin 1! I have some gaps in basic knowledge, so I missed the small rectangle marking P1 – it’s the one nearest to the side of the board in the bottom row. See the picture with pins P3, P5, P9, P10 and P11 connected:

And here it is – Raspberry Pi waving The Flag of the United States of America
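Back on the software side: the servo code itself isn’t shown in this post, but the PCA9685 register arithmetic is easy to sketch without hardware (register addresses are from the NXP datasheet; the helper names are mine):

```python
MODE1, PRE_SCALE, LED0_ON_L = 0x00, 0xFE, 0x06  # PCA9685 register addresses
OSC_HZ = 25_000_000                             # internal oscillator

def prescale_for(freq_hz):
    # datasheet formula: round(osc / (4096 * update_rate)) - 1
    return round(OSC_HZ / (4096 * freq_hz)) - 1

def pulse_ticks(pulse_ms, freq_hz):
    # convert a pulse width in ms to 12-bit counter ticks
    return int(4096 * pulse_ms * freq_hz / 1000)

print(prescale_for(50))      # 121: value for PRE_SCALE at 50 Hz servo PWM
print(pulse_ticks(1.5, 50))  # 307 ticks for a 1.5 ms centre pulse
```

On the Pi the resulting values would then be written over I2C (for example with smbus’s write_byte_data); I’m leaving the bus code out here since it needs the hardware attached.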

MGPlayer on Raspberry PI (javafx!)

I’ve just “successfully” run my MG Podcast Player on a Raspberry Pi using the just-released JDK8 for ARM Preview.

Performance isn’t great compared to a desktop system and it doesn’t actually play mp3 files (Media is not supported yet), but the ability to run a Java 8 application on a $35 computer is amazing by itself!

If you want to try it, follow the steps on the Oracle site to install the image and then run:
$ /opt/jdk1.8.0/bin/java -Djavafx.platform=eglfb -jar MGPlayer.jar

* -Djavafx.platform is crucial, as it lets JavaFX work on this OpenGL ES 2.0 embedded device
* I installed Java on my Raspbmc image