knn over Images

Coding bits!

Posted by Ampus on February 8, 2017

In this post we apply knn (k-nearest neighbors) to classify images of cats and dogs. The dataset comes from Kaggle and contains ~25k images of cats and dogs.

Approach:

  • We use a color histogram of each image as its feature vector.
  • We extract features from the images in the train folder and train the knn on them.
  • To test the model we use the images in the test folder.
  • The model reaches an accuracy of 57%.
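
To make the first step more concrete, here is a minimal pure-Python sketch of what a coarse 3D color histogram does. The function name `color_histogram` and the toy pixel list are only illustrative assumptions, and the sketch normalizes the counts to sum to 1, which is a simplification and not necessarily the same normalization that `cv2.normalize` applies in the full code.

```python
# Hypothetical helper, for illustration only: a coarse 3D color histogram.
# Each pixel is an (h, s, v) tuple with h in [0, 180) and s, v in [0, 256),
# the 8-bit HSV ranges that OpenCV uses.

def color_histogram(pixels, bins=(8, 8, 8), ranges=(180, 256, 256)):
    hist = [0.0] * (bins[0] * bins[1] * bins[2])
    for h, s, v in pixels:
        # Map each channel value to its bin index.
        hb = h * bins[0] // ranges[0]
        sb = s * bins[1] // ranges[1]
        vb = v * bins[2] // ranges[2]
        # Flatten the 3D bin index into one position of the feature vector.
        hist[(hb * bins[1] + sb) * bins[2] + vb] += 1.0
    total = sum(hist)
    # Normalize so that images of different sizes are comparable
    # (assumption: sum-to-one normalization, chosen for simplicity).
    return [count / total for count in hist] if total else hist


pixels = [(0, 0, 0), (0, 0, 0), (179, 255, 255), (90, 128, 128)]
hist = color_histogram(pixels)
print(len(hist))  # 8 * 8 * 8 = 512 values per image
print(hist[0])    # two of the four pixels fall into the first bin
```

With 8 bins per channel every image, whatever its size, becomes a vector of 512 numbers, and that is what knn compares.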

What do we need to run knn?

Please read this article before jumping into the code below. We encode the labels as:
1 for cat and 0 for dog
because knn takes numeric (here integer) labels as input, not strings.
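
In other words, knn needs a list of feature vectors and one numeric label per vector. The sketch below shows the idea in plain Python on made-up 2D features: find the k closest training vectors and take a majority vote. `knn_predict` and the toy data are hypothetical names for illustration; this is not how OpenCV implements it internally.

```python
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    distances = []
    for feature, label in zip(train_features, train_labels):
        # Squared Euclidean distance; the ordering matches the true distance.
        d = sum((a - b) ** 2 for a, b in zip(feature, query))
        distances.append((d, label))
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data (assumption): label 1 = cat, label 0 = dog.
train_features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = [1, 1, 0, 0]

print(knn_predict(train_features, train_labels, [0.85, 0.15], k=3))  # 1
print(knn_predict(train_features, train_labels, [0.15, 0.85], k=3))  # 0
```

An odd k avoids tied votes when there are only two classes.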

How did we get the labels from the images?

The label is the part of the file name before the first dot (the training files are named like cat.0.jpg), so we get it with:
imgPath.split(".")[0]
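
As a small sketch of that step, assuming file names of the form cat.0.jpg and dog.0.jpg, the split can be wrapped in a helper that returns the integer label directly. The name `label_from_filename` is hypothetical.

```python
def label_from_filename(filename):
    """Map a file name such as 'cat.0.jpg' to 1 (cat) or 0 (anything else)."""
    # "cat.0.jpg".split(".") gives ["cat", "0", "jpg"], so [0] is the class name.
    prefix = filename.split(".")[0]
    return 1 if prefix == "cat" else 0


print(label_from_filename("cat.0.jpg"))   # 1
print(label_from_filename("dog.42.jpg"))  # 0
```

Note that any file whose name does not start with "cat" gets label 0, so this only makes sense for folders that contain nothing but cat and dog images named this way.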

How do we check the accuracy of the model?

We use the expression below to calculate the accuracy:
acc = len([i for i,j in zip(results.tolist(),test_labels) if i==j])*100.0/results.size
That is, the number of matches between results and test_labels, divided by results.size and multiplied by 100 to get a percentage.
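
The same calculation, written as a small standalone function on plain Python lists, looks roughly like this. `accuracy_percent` and the sample values are made up for illustration.

```python
def accuracy_percent(predicted, expected):
    """Percentage of positions where predicted and expected agree."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not predicted:
        return 0.0
    matches = sum(1 for p, e in zip(predicted, expected) if p == e)
    return matches * 100.0 / len(predicted)


# knn returns float predictions, while our labels are integers; 1.0 == 1 in Python.
predicted = [1.0, 0.0, 1.0, 1.0]
expected = [1, 0, 0, 1]
print(accuracy_percent(predicted, expected))  # 75.0
```

Since there are only two classes, a model that guesses at random would land around 50% on a balanced test set, which is a useful baseline when reading the number.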

Code in Python

    
    import numpy as np
    import cv2
    import sys
    import os

    labels=[]
    features=[]

    def extract_feature(img):
        # 3D color histogram in HSV space, 8 bins per channel (512 values)
        img = cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
        hist=cv2.calcHist([img],[0,1,2],None,[8,8,8],[0, 180, 0, 256, 0, 256])
        # pass hist as dst as well; OpenCV 3+ requires the dst argument
        hist=cv2.normalize(hist,hist)
        return hist.flatten()

    def knnOverImages(trainPath,testPath):
        # label each training image from its file name: 1 for cat, 0 for dog
        for imgPath in [ f for f in os.listdir(trainPath) if f.endswith(".jpg")]:
            img=cv2.imread(trainPath+"/"+imgPath)
            features.append(extract_feature(img))
            if imgPath.split(".")[0] == "cat":
                labels.append(1)
            else:
                labels.append(0)

        features1=np.array(features)
        labels1 = np.array(labels)
        
        # model knn (OpenCV 2.4 API; OpenCV 3+ uses cv2.ml.KNearest_create())
        knn=cv2.KNearest()
        knn.train(features1,labels1)
        
        # test knn               
        testData=[]
        test_labels=[]

        for imgPath in [ f for f in os.listdir(testPath) if f.endswith(".jpg")]:
            # same encoding as in training: 1 for cat, 0 for dog
            if imgPath.split(".")[0] == "cat":
                test_labels.append(1)
            else:
                test_labels.append(0)
            
            testData.append(extract_feature(cv2.imread(testPath+"/"+imgPath)))

        testData=np.array(testData)
        test_labels=np.array(test_labels)
        ret, results, neighbours, dst = knn.find_nearest(testData,9)
        
        # Now we check the accuracy of the classification:
        # the percentage of predictions in results that match test_labels
        accuracy = len([i for i,j in zip(results.tolist(),test_labels) if i==j])*100.0/results.size
        print(accuracy)

    if __name__=="__main__":    
        # get path where train and test images are present
        trainPath=sys.argv[1] # path where train images are present
        testPath =sys.argv[2] # path where test images are present
        knnOverImages(trainPath,testPath)