Paper:
Evaluating Bag-of-Visual-Words Representations in Scene Classification
Yang, J., Ngo, C-W., Hauptmann, A., Jiang, Y-G.
ACM Multimedia Information Retrieval Workshop (MIR 2007) at ACM Multimedia 2007, Augsburg, Germany, September 28-29, 2007

Abstract: Based on keypoints extracted as salient image patches, an image can be described as a "bag of visual words", and this representation has been frequently used in the classification of imagery data. The representation choices regarding the dimension, selection, and weighting of visual words are crucial to the classification performance but have not been thoroughly studied in existing work. Given the analogy between this image representation and the bag-of-words representation of text documents, we apply techniques widely used in text categorization, including term weighting, stop word removal, and feature selection, to generate image representations that differ in the dimension, selection, and weighting of visual words. The impact of these representation choices on scene classification is studied through extensive experiments on the TRECVID and PASCAL collections. This study provides an empirical basis for designing visual-word representations that are likely to produce superior classification performance.
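
As a rough illustration of how two of the text-categorization techniques named in the abstract (term weighting and stop word removal) might carry over to visual words, here is a minimal Python sketch. It is not the paper's implementation: the function names `tf_idf` and `remove_stop_words`, the `max_df_ratio` threshold, the particular tf-idf variant, and the toy counts are all assumptions made for illustration. Each image is assumed to be already quantized into a histogram of visual-word counts.

```python
import math


def tf_idf(histograms):
    """Weight visual-word counts by a simple tf-idf scheme (illustrative).

    histograms[i][j] is the number of keypoints in image i that were
    assigned to visual word j. All histograms share one vocabulary.
    """
    n_images = len(histograms)
    n_words = len(histograms[0])
    # Document frequency: number of images in which each visual word occurs.
    df = [sum(1 for h in histograms if h[j] > 0) for j in range(n_words)]
    # Inverse document frequency; words that never occur get weight 0.
    idf = [math.log(n_images / d) if d else 0.0 for d in df]
    weighted = []
    for h in histograms:
        total = sum(h)
        # Term frequency, normalised by the number of keypoints in the image.
        tf = [c / total if total else 0.0 for c in h]
        weighted.append([t * w for t, w in zip(tf, idf)])
    return weighted


def remove_stop_words(histograms, max_df_ratio=0.9):
    """Drop visual words occurring in more than max_df_ratio of the images.

    The threshold is a hypothetical choice, not a value from the paper.
    Returns the reduced histograms and the indices of the words kept.
    """
    n_images = len(histograms)
    n_words = len(histograms[0])
    keep = [
        j for j in range(n_words)
        if sum(1 for h in histograms if h[j] > 0) / n_images <= max_df_ratio
    ]
    return [[h[j] for j in keep] for h in histograms], keep


# Toy data (made up): 3 images, vocabulary of 4 visual words.
counts = [
    [2, 0, 1, 1],
    [1, 3, 0, 0],
    [4, 0, 0, 2],
]
weights = tf_idf(counts)
filtered, kept = remove_stop_words(counts, max_df_ratio=0.9)
```

In this toy data, visual word 0 occurs in every image, so its idf is zero and the stop-word filter drops it. This mirrors how very frequent words are treated in text retrieval; whether the same treatment helps for visual words is the kind of question the study examines empirically.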
