In this paper, we present an efficient video shot retrieval system based on local feature detection, description and matching. A face tracker is first used to collect face information across different viewpoints. A visual vocabulary is then built off-line from an invariant descriptor computed on the tracked character face regions in all shots. The vocabulary is refined in two ways to make the retrieval system more efficient. First, the visual vocabulary is reduced by keeping only facial features that lie in face regions found by an accurate face detector. Second, three criteria, namely Inverted-Occurrence-Frequency Weights, Average Feature Location Distance and Reliable Nearest-Neighbors, are computed in advance to make the on-line retrieval procedure faster and more precise. The proposed system is evaluated on the movie "Groundhog Day". The results show that the technique is both effective and efficient for video shot retrieval.
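The abstract does not define the three criteria, so the following is only a rough illustration of the general idea behind precomputing frequency-based weights over a visual vocabulary. It assumes a standard inverse-document-frequency weighting stored in an inverted index, which may differ from the paper's Inverted-Occurrence-Frequency Weights; the function names `build_index` and `query` and the toy word ids are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(shots):
    """Off-line step: shots maps shot_id -> list of visual-word ids.

    Returns (idf, inverted), where inverted[word][shot_id] is a
    TF-IDF-style weight. This weighting is an assumption, not the
    paper's own definition.
    """
    n_shots = len(shots)
    doc_freq = Counter()
    for words in shots.values():
        doc_freq.update(set(words))  # count each word once per shot

    # Words that occur in fewer shots get a larger weight.
    idf = {w: math.log(n_shots / df) for w, df in doc_freq.items()}

    inverted = defaultdict(dict)
    for shot_id, words in shots.items():
        total = len(words)
        for w, c in Counter(words).items():
            inverted[w][shot_id] = (c / total) * idf[w]
    return idf, inverted


def query(words, idf, inverted):
    """On-line step: rank shots by weighted overlap with the query words."""
    scores = defaultdict(float)
    total = len(words)
    for w, c in Counter(words).items():
        if w not in inverted:
            continue  # word never seen off-line
        q_weight = (c / total) * idf[w]
        for shot_id, weight in inverted[w].items():
            scores[shot_id] += q_weight * weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data: integers stand in for quantized face descriptors.
    shots = {"a": [1, 1, 2], "b": [2, 3], "c": [3, 3, 4]}
    idf, inverted = build_index(shots)
    print(query([1, 2], idf, inverted))
```

Because the weights and the inverted index are computed once off-line, the on-line query only touches shots that share at least one visual word with the query, which is the kind of saving the abstract attributes to computing its criteria in advance.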
|Publication status||Published - Oct 2009|
|Event||IMCE '09 - 1st International Workshop on Interactive Multimedia for Consumer Electronics - Beijing, China|
|Duration||1 Oct 2009 → …|