SIFT + XGBoost
View our interactive Python notebook on Kaggle here.
For this method, we used the Scale-Invariant Feature Transform (SIFT) to extract keypoints and descriptors from the images. We then used K-Means clustering to build a Bag of Visual Words (BoVW) histogram representation of each image. Finally, we fed the BoVW histograms to XGBoost.
We used SIFT from OpenCV, K-Means from cuML, and XGBoost from the XGBoost library.
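To make the BoVW step concrete, here is a minimal NumPy-only sketch of how one image's descriptors can be turned into a histogram once the cluster centers are known. It is an illustration under assumptions, not our exact notebook code: the function name `bovw_histogram` is hypothetical, random arrays stand in for real SIFT descriptors and for the K-Means centers, and the L1 normalisation is one common choice rather than something stated in this write-up.

```python
import numpy as np

def bovw_histogram(descriptors: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Return an L1-normalised visual-word histogram for one image."""
    k = centers.shape[0]
    if descriptors.size == 0:
        # An image with no keypoints gets an all-zero histogram.
        return np.zeros(k, dtype=np.float32)
    # Squared Euclidean distance from every descriptor to every center.
    diffs = descriptors[:, None, :] - centers[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    # Each descriptor votes for its nearest center (its "visual word").
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=k).astype(np.float32)
    return hist / hist.sum()

# Stand-in data (assumption): 200 centers and 75 descriptors of length 128,
# the descriptor length that SIFT produces.
rng = np.random.default_rng(0)
centers = rng.random((200, 128), dtype=np.float32)
descriptors = rng.random((75, 128), dtype=np.float32)
hist = bovw_histogram(descriptors, centers)
```

In the real pipeline the descriptors would come from SIFT and the centers from a fitted K-Means model; the resulting fixed-length vector is what the classifier consumes.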
Data preprocessing
We performed the same data preprocessing as in the HOG + SVM method.
Creating the feature extractor
We again used Optuna to tune the hyperparameters. The hyperparameters we tuned are as follows.
- SIFT hyperparameters: Number of features (keypoints) to retain per image, number of octave layers, contrast threshold, edge threshold, and sigma.
- XGBoost hyperparameters: Number of estimators, maximum depth, and learning rate.
We fixed the number of clusters at 200, as we found this value worked best for the dataset.
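As a rough illustration of how the tuned values reach each library, the sketch below splits one flat hyperparameter dictionary into keyword arguments for the SIFT extractor and for the XGBoost classifier. The helper `split_params` is hypothetical, and the example values are simply the best configuration reported later in this section. The keyword names on the right-hand side are the ones accepted by `cv2.SIFT_create` and `xgboost.XGBClassifier`.

```python
N_CLUSTERS = 200  # fixed number of K-Means clusters, as stated above

def split_params(params: dict) -> tuple:
    """Split a flat trial dictionary into (sift_kwargs, xgb_kwargs)."""
    sift_kwargs = {
        "nfeatures": params["n_features"],
        "nOctaveLayers": params["n_octave_layers"],
        "contrastThreshold": params["contrast_threshold"],
        "edgeThreshold": params["edge_threshold"],
        "sigma": params["sigma"],
    }
    xgb_kwargs = {
        "n_estimators": params["n_estimators"],
        "max_depth": params["max_depth"],
        "learning_rate": params["learning_rate"],
    }
    return sift_kwargs, xgb_kwargs

# Example values only: the best configuration reported in this section.
params = {
    "n_features": 75,
    "n_octave_layers": 4,
    "contrast_threshold": 0.053,
    "edge_threshold": 18,
    "sigma": 1.8,
    "n_estimators": 50,
    "max_depth": 3,
    "learning_rate": 0.152,
}
sift_kwargs, xgb_kwargs = split_params(params)

# With the libraries installed, the objects could then be built as:
#   sift = cv2.SIFT_create(**sift_kwargs)
#   clf = xgboost.XGBClassifier(**xgb_kwargs)
```

Keeping a single flat dictionary per trial makes it easy to log and compare configurations, while the split keeps each library's arguments separate.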
We then took the tuned hyperparameters and performed additional fine-tuning by hand, as we did with the HOG + SVM method. Doing so improved the accuracy slightly.
The trial mechanism, k-fold cross-validation, and metrics are the same as in the HOG + SVM method. The models are available to download here.
Evaluating the classifier
The best hyperparameter configuration is as follows:
{
    'n_features': 75,
    'n_octave_layers': 4,
    'contrast_threshold': 0.053,
    'edge_threshold': 18,
    'sigma': 1.8,
    'n_estimators': 50,
    'max_depth': 3,
    'learning_rate': 0.152
}
We ran the best classifier on the test data. The accuracy is 40.0000%. We plot the confusion matrix as follows.

The large difference in accuracy compared with the HOG + SVM method may be caused by the feature extractors themselves: traffic signs tend to be rigid, simple shapes and lack the complex keypoints that SIFT is better at detecting.
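For readers who want to reproduce this kind of evaluation, the following is a minimal sketch of how a confusion matrix and the accuracy can be computed from true and predicted labels. The helper names and the toy labels are made up for illustration; they are not our test data, and our notebook may compute these metrics differently.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    # Add 1 to cell (true, predicted) for every sample.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def accuracy(cm):
    """Fraction of samples on the diagonal (correct predictions)."""
    return float(np.trace(cm)) / float(cm.sum())

# Toy labels (assumption) for a three-class problem.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc = accuracy(cm)
```

Off-diagonal cells show which classes are confused with each other, which is often more informative than the single accuracy figure.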
Next, we tried running the sliding window approach on several sample images. The results are as follows.


In the first example, the model did not manage to enclose the traffic sign and found a number of false positives. The model managed to enclose the traffic sign in the second example, but it was wrongly classified.
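For reference, the window generation behind a sliding window approach can be sketched as below. This is only an illustration: the function name, the window size, and the stride are assumptions, since the values we used are not listed in this section. Each box would be cropped from the image and passed through the same feature extraction and classifier; that step is omitted here.

```python
def sliding_windows(image_w, image_h, window, stride):
    """Yield (x, y, w, h) boxes that cover the image row by row."""
    for y in range(0, image_h - window + 1, stride):
        for x in range(0, image_w - window + 1, stride):
            yield x, y, window, window

# Hypothetical sizes: a 128x96 image scanned with 64x64 windows, stride 32.
boxes = list(sliding_windows(image_w=128, image_h=96, window=64, stride=32))
```

A smaller stride gives tighter localisation at the cost of more windows to classify, and every extra window is another chance for a false positive.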