[en] Supervised learning brings genericity to image classification, which has enabled rapid progress in the field. Genericity does not imply ease of use, however, and the most accurate methods, namely convolutional neural networks, are not easy to use. In this master's thesis, we propose an alternative approach that relies on extremely randomized trees and random subwindow extraction, combined with elements of convolutional networks. We explore two ways of using the forest: primarily a direct approach in which the forest is the final classifier (ET-DIC) and, to a lesser extent, a preprocessing approach in which the forest is used to build a visual dictionary while the actual classification is performed by a support vector machine (ET-FL).
We show that, in both modes, our scheme performs better than the same scheme without the convolutional network elements, although it does not yet reach the performance of convolutional networks. The ET-DIC variant retains more of the advantages of classification forests but is less accurate. This is further highlighted by the remarkable stability of the ET-DIC mode. This stability accounts for the ease of use of the method but also prevents elaborate optimization. We reached an accuracy of 0.613, whereas the record for this mode without the convolutional network elements was 0.5367.
The ET-FL mode produces better results, but at the cost of two drawbacks: a greater variability in accuracy, due to the loss of the ability to favor the interesting filters, and greater overfitting, a consequence of losing the ensemble smoothing effect. Accuracies range from 0.55 to 0.7431 depending on the choice of hyper-parameters.
The computational cost of both methods, however, is much greater than that of a traditional forest.
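As a rough illustration of the direct mode (ET-DIC), the sketch below extracts random square subwindows from an image, labels each one, and aggregates the labels by majority vote. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function names, the size parameters, and the toy `mean_brightness_label` classifier (a stand-in for the forest of extremely randomized trees) are all hypothetical, and the convolutional filtering and any resizing step are omitted.

```python
import random
from collections import Counter


def extract_random_subwindows(image, n_windows, min_size, max_size, rng):
    """Return square crops taken at random positions and with random sizes."""
    height, width = len(image), len(image[0])
    windows = []
    for _ in range(n_windows):
        # Side length is capped so the crop always fits inside the image.
        size = rng.randint(min_size, min(max_size, height, width))
        top = rng.randint(0, height - size)
        left = rng.randint(0, width - size)
        windows.append([row[left:left + size] for row in image[top:top + size]])
    return windows


def mean_brightness_label(window):
    """Toy stand-in for the forest (assumption): label a window by its mean."""
    pixels = [value for row in window for value in row]
    return "bright" if sum(pixels) / len(pixels) > 0.5 else "dark"


def classify_image(image, window_classifier, n_windows, min_size, max_size, rng):
    """Label every subwindow, then return the most frequent label."""
    windows = extract_random_subwindows(image, n_windows, min_size, max_size, rng)
    votes = Counter(window_classifier(window) for window in windows)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[1.0] * 8 for _ in range(8)]  # hypothetical 8x8 grayscale image
    print(classify_image(image, mean_brightness_label, 10, 2, 4, rng))
```

In the ET-FL mode the per-window decisions would not be voted on directly; the forest output would instead serve as features for a support vector machine, which this sketch does not cover.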