Deep Learning for image classification

Training an image classifier and deploying it to production

Posted by Shubham Chaudhary on April 30, 2017

Image classification is the process of categorizing images into bins.

At Zomato, a restaurant search and discovery platform, we have two main sources of images uploaded to the platform:

  1. Images uploaded by moderators when creating a new restaurant listing
  2. Images uploaded by users when they visit a restaurant

At Zomato, we had several use cases for image classification.

Earlier, we had two sources from which we could gather food and ambience shot data. When moderators uploaded images for a restaurant, they had the option to mark each image as a food or ambience shot. But this data was very limited: moderators uploaded only a few images, ~10-20 per restaurant. Split that into two categories, and showing 5-10 images per category is not very useful from a product standpoint.

Humans can only do so much. We badly needed to automate this and gain the ability to moderate all user-generated images.

Apart from food vs ambience classification, we had another use case, automated image moderation: we don’t show images with humans in them, so we wanted to remove any image that contained a human.
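To make the moderation rule concrete, here is a minimal sketch of how classifier scores could drive it. The `moderate` function, the score dictionary layout, and the 0.5 threshold are all assumptions made for illustration, not the logic running in production.

```python
def moderate(scores, human_threshold=0.5):
    """Decide visibility and tag for one image.

    `scores` is a hypothetical dict of class name -> probability,
    e.g. the output of a classifier. The threshold is an assumed value.
    """
    # Images that likely contain a human are hidden and get no tag.
    if scores.get("human", 0.0) >= human_threshold:
        return {"visible": False, "tag": None}

    # Otherwise tag the image with whichever of the two classes scores higher.
    food = scores.get("food", 0.0)
    ambience = scores.get("ambience", 0.0)
    tag = "food" if food >= ambience else "ambience"
    return {"visible": True, "tag": tag}


hidden = moderate({"human": 0.9, "food": 0.05, "ambience": 0.05})
shown = moderate({"human": 0.1, "food": 0.7, "ambience": 0.2})
```

In practice the threshold would be tuned on a validation set, trading off hidden ambience shots against humans slipping through.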


Dataset Creation:

Food & Ambience

At Zomato, we had manually tagged images marked as food and ambience shots. We downloaded 50,000 images of each class, food and ambience, for the classification problem.

Generating the dataset for menus was the easiest. At Zomato we have tons of menus, manually tagged and clustered into categories (that’s kinda how the company started). We downloaded 50,000 menu images from S3, distributed across randomly selected restaurants on Zomato.
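Sampling across randomly selected restaurants could look roughly like the sketch below. It assumes a hypothetical in-memory mapping from restaurant id to image keys; the function name, the `per_restaurant` cap, and the key layout are illustrative only, and the actual download from S3 is left out.

```python
import random


def sample_image_keys(images_by_restaurant, total, per_restaurant=5, seed=0):
    """Pick up to `total` image keys, taking at most `per_restaurant`
    from each restaurant, visiting restaurants in random order."""
    rng = random.Random(seed)
    # Sort first so the shuffle is reproducible for a given seed.
    restaurant_ids = sorted(images_by_restaurant)
    rng.shuffle(restaurant_ids)

    selected = []
    for rid in restaurant_ids:
        keys = list(images_by_restaurant[rid])
        take = min(per_restaurant, len(keys), total - len(selected))
        selected.extend(rng.sample(keys, take))
        if len(selected) >= total:
            break
    return selected


# Hypothetical catalog: 100 restaurants with 8 menu images each.
catalog = {
    f"restaurant-{i}": [f"menus/restaurant-{i}/{j}.jpg" for j in range(8)]
    for i in range(100)
}
sample = sample_image_keys(catalog, total=50, per_restaurant=5)
```

Capping the number of images per restaurant keeps a handful of large restaurants from dominating the dataset.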


Finding the right dataset for humans was tricky. There is a public dataset, the YouTube dataset. The problem with it is that it contains shots like the following image, which contains a human but also has the characteristics of an ambience shot. This confuses the ambience and human classifiers and leads to incorrect classification.

The YouTube dataset also didn’t have a lot of face shots in it. To help the model learn face shots, we used the LFW (Labeled Faces in the Wild) dataset.

confusing image
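Once the images are collected, they need to be merged into one labelled set, with the human class drawn from two sources. The sketch below shows one way to do that and hold out a validation split per class. It assumes a single manifest of (path, label) pairs; the function names, file paths, and the 20% validation fraction are hypothetical.

```python
import random


def build_manifest(sources):
    """Flatten {label: [list_of_paths, ...]} into (path, label) pairs."""
    manifest = []
    for label, path_lists in sources.items():
        for paths in path_lists:
            manifest.extend((path, label) for path in paths)
    return manifest


def stratified_split(manifest, val_fraction=0.2, seed=0):
    """Split per label so each class keeps the same train/val ratio."""
    rng = random.Random(seed)
    by_label = {}
    for path, label in manifest:
        by_label.setdefault(label, []).append(path)

    train, val = [], []
    for label, paths in by_label.items():
        rng.shuffle(paths)
        n_val = int(len(paths) * val_fraction)
        val.extend((p, label) for p in paths[:n_val])
        train.extend((p, label) for p in paths[n_val:])
    return train, val


# Tiny stand-in for the real data; the human class mixes two sources.
sources = {
    "food": [[f"food/{i}.jpg" for i in range(10)]],
    "ambience": [[f"ambience/{i}.jpg" for i in range(10)]],
    "menu": [[f"menu/{i}.jpg" for i in range(10)]],
    "human": [
        [f"youtube/{i}.jpg" for i in range(6)],
        [f"lfw/{i}.jpg" for i in range(4)],
    ],
}
manifest = build_manifest(sources)
train, val = stratified_split(manifest)
```

Splitting per class rather than over the whole manifest keeps the smaller or mixed-source classes represented in the validation set.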

F/A classification

Food/ambience images: results before and after classification
