In this blog post we apply knn (k-nearest neighbours) to classify images of cats and dogs. The dataset comes from Kaggle and contains ~25k images of cats and dogs.
- We use a color histogram of each image as its feature vector.
- We take the images from the train folder and train the knn model on them.
- To test the model, we take the images from the test folder.
- The model reaches an accuracy of 57%.
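To make the feature vector concrete, here is a minimal NumPy-only sketch of a 3D color histogram with 8 bins per channel. It is not the OpenCV call used in this post. The function name `color_histogram` and the tiny 2x2 "image" are made up for illustration, and the channel ranges are assumed to follow OpenCV's HSV convention (H in 0..180, S and V in 0..256).

```python
import numpy as np

def color_histogram(pixels, bins=(8, 8, 8), ranges=((0, 180), (0, 256), (0, 256))):
    """Return a flattened, L2-normalized 3D histogram of an H x W x 3 image."""
    # Turn the image into a flat list of 3-channel pixels.
    flat = np.asarray(pixels, dtype=np.float64).reshape(-1, 3)
    # Count how many pixels fall into each (channel 0, channel 1, channel 2) bin.
    hist, _ = np.histogramdd(flat, bins=bins, range=ranges)
    # Normalize so that images of different sizes give comparable vectors.
    norm = np.linalg.norm(hist)
    if norm > 0:
        hist = hist / norm
    return hist.flatten()

# Hypothetical 2x2 image with HSV-like values (illustration only).
image = np.array([[[0, 0, 0], [0, 0, 0]],
                  [[90, 128, 128], [179, 255, 255]]])
feature = color_histogram(image)
print(feature.shape)  # (512,)
```

Every image, whatever its size, ends up as a vector of 8 x 8 x 8 = 512 numbers, which is what knn compares.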
What do we need to run knn?
Please read this article before jumping into the code below. We have taken the labels as:
1 for cat and 0 for dog
We use numbers because OpenCV's knn expects numeric labels, not strings such as "cat" or "dog".
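As a rough illustration of what knn does with feature vectors and numeric labels, here is a small pure-Python sketch: it finds the k training vectors closest to a query and returns the most common label among them. The function `knn_predict` and the toy 2-D data are invented for this example. The post itself uses OpenCV's implementation, not this code.

```python
import math
from collections import Counter

def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training vector.
    distances = [math.dist(f, query) for f in train_features]
    # Indices of the k smallest distances.
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D feature vectors (made up for illustration): 1 = cat, 0 = dog.
train_features = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.1, 0.9), (0.2, 0.8)]
train_labels = [1, 1, 1, 0, 0]

print(knn_predict(train_features, train_labels, (0.85, 0.15)))  # 1
print(knn_predict(train_features, train_labels, (0.15, 0.85)))  # 0
```

An odd k avoids ties in the vote when there are only two classes.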
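How did we get the labels from the images?
We read them from the file names: the images are named like cat.0.jpg and dog.0.jpg, so the part before the first dot, imgPath.split(".")[0], tells us the class.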
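A minimal sketch of that labelling step, assuming file names of the form cat.&lt;id&gt;.jpg and dog.&lt;id&gt;.jpg. The helper name `label_from_filename` and the example paths are hypothetical.

```python
import os

def label_from_filename(path):
    """Map 'cat.<id>.jpg' to 1 and anything else (e.g. 'dog.<id>.jpg') to 0."""
    # Drop any directory part, then take the text before the first dot.
    name = os.path.basename(path)
    prefix = name.split(".")[0]
    return 1 if prefix == "cat" else 0

print(label_from_filename("train/cat.0.jpg"))   # 1
print(label_from_filename("train/dog.12.jpg"))  # 0
```

Note that any file that does not start with "cat" is treated as a dog, so the folder should contain only those two kinds of file.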
How to check the accuracy of the model
We have used the method below to calculate the accuracy:
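acc = len([i for i, j in zip(results.ravel().tolist(), test_labels) if i == j]) * 100.0 / results.size
That is, the number of positions where results and test_labels agree, divided by results.size and multiplied by 100 to get a percentage. results is flattened first because OpenCV returns it as a column with one row per test image.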
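The same calculation can be written more compactly with NumPy. This is a small sketch with made-up predictions and labels. The helper name `accuracy_percent` is not part of the original code.

```python
import numpy as np

def accuracy_percent(results, test_labels):
    """Percentage of predictions that equal the expected labels."""
    # Flatten both arrays so an (N, 1) column and an (N,) vector compare element-wise.
    predicted = np.asarray(results).ravel()
    expected = np.asarray(test_labels).ravel()
    return float(np.mean(predicted == expected) * 100.0)

# Made-up example: 3 of the 4 predictions match the labels.
results = np.array([[1.0], [0.0], [1.0], [1.0]], dtype=np.float32)
test_labels = np.array([1, 0, 0, 1])
print(accuracy_percent(results, test_labels))  # 75.0
```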
Code in Python
    import os
    import sys

    import cv2
    import numpy as np

    labels = []
    features = []


    def extract_feature(img):
        # 3D color histogram in HSV space, 8 bins per channel (512 values)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                            [0, 180, 0, 256, 0, 256])
        # OpenCV 3+ needs an explicit destination array for normalize
        hist = cv2.normalize(hist, hist)
        return hist.flatten()


    def knnOverImages(trainPath, testPath):
        for imgPath in [f for f in os.listdir(trainPath) if f.endswith(".jpg")]:
            img = cv2.imread(os.path.join(trainPath, imgPath))
            features.append(extract_feature(img))
            # file names look like cat.0.jpg / dog.0.jpg: 1 for cat, 0 for dog
            if imgPath.split(".")[0] == "cat":
                labels.append(1)
            else:
                labels.append(0)
        features1 = np.array(features, dtype=np.float32)
        labels1 = np.array(labels, dtype=np.int32)

        # model knn (OpenCV 3+ API; OpenCV 2.4 used cv2.KNearest() and find_nearest)
        knn = cv2.ml.KNearest_create()
        knn.train(features1, cv2.ml.ROW_SAMPLE, labels1)

        # test knn
        testData = []
        test_labels = []
        for imgPath in [f for f in os.listdir(testPath) if f.endswith(".jpg")]:
            # same encoding as for training: 1 for cat, 0 for dog
            if imgPath.split(".")[0] == "cat":
                test_labels.append(1)
            else:
                test_labels.append(0)
            testData.append(extract_feature(cv2.imread(os.path.join(testPath, imgPath))))
        testData = np.array(testData, dtype=np.float32)
        test_labels = np.array(test_labels)
        ret, results, neighbours, dist = knn.findNearest(testData, 9)

        # Now we check the accuracy of classification
        # For that, compare the result with test_labels and check which are wrong
        accuracy = len([i for i, j in zip(results.ravel().tolist(), test_labels) if i == j]) * 100.0 / results.size
        print(accuracy)


    if __name__ == "__main__":
        # get the paths where the train and test images are present
        trainPath = sys.argv[1]  # first argument: path where train images are present
        testPath = sys.argv[2]   # second argument: path where test images are present
        knnOverImages(trainPath, testPath)