Abstract:
This thesis addresses a subject of major interest to the computer vision research community: human-computer interaction. The main problem is to let human subjects interact with computers in a natural way, without requiring substantial computational and storage resources or expensive equipment. The thesis presents new solutions for tracking fingers and hands in video sequences and for recognizing dynamic hand gestures from their trajectories. On the tracking side, new solutions are proposed both for tracking initialization and for the tracking process itself. Multiple features, such as color, foreground, shape and size, are used within a finite state machine to provide reliable tracking initialization with few spatial and temporal constraints. The same basic features are combined to generate a sparse representation based on line strips; these new features then pass through multiple cascaded clustering and filtering stages, leading to finger/hand identification and localization. The proposed tracking solution proved robust in many challenging situations, including motion blur, scale changes, fast motion and occlusions. The gesture recognition contributions include a careful choice of the gesture alphabet, in which gestures are successions of line strokes, and solutions for trajectory processing and gesture recognition. Median filtering and mean shift clustering are used for trajectory segmentation, yielding a symbolic representation of the trajectory; gesture recognition is thereby reduced to comparing the symbolic representation of the trajectory with those of the gesture prototypes. The evaluation results indicate that the proposed solutions achieve the pursued goal of real-time operation on inexpensive hardware without significant sacrifices in naturalness, precision and robustness.
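As a rough illustration of the symbolic trajectory representation described above, the following Python sketch smooths a 2D trajectory with a median filter, labels each step with a stroke direction, and compares the resulting symbol string with gesture prototypes. It is a sketch under stated assumptions, not the method of the thesis: fixed quantization into four directions stands in for mean shift clustering, and the alphabet `RULD`, the function names, the window size, the `min_run` threshold and the upward-pointing y axis are all hypothetical choices made for the example.

```python
import math
from statistics import median

# Hypothetical 4-direction alphabet (right, up, left, down), y axis pointing up.
# The actual gesture alphabet of the thesis is not specified in the abstract.
DIRECTIONS = "RULD"


def median_filter(values, window=3):
    """Sliding median; the window is truncated at the sequence borders."""
    half = window // 2
    return [
        median(values[max(0, i - half):min(len(values), i + half + 1)])
        for i in range(len(values))
    ]


def to_symbols(points, window=3, min_run=2):
    """Turn a list of (x, y) points into a string of stroke symbols."""
    xs = median_filter([p[0] for p in points], window)
    ys = median_filter([p[1] for p in points], window)

    # Label each step with the nearest of the four axis directions.
    # Assumption: fixed quantization replaces mean shift clustering here.
    labels = []
    for i in range(1, len(xs)):
        dx, dy = xs[i] - xs[i - 1], ys[i] - ys[i - 1]
        if dx == 0 and dy == 0:
            continue  # no movement, no direction
        index = round(math.atan2(dy, dx) / (math.pi / 2)) % 4
        labels.append(DIRECTIONS[index])

    # Collapse runs of equal labels into one symbol; runs shorter than
    # min_run are treated as noise and dropped.
    symbols = []
    i = 0
    while i < len(labels):
        j = i
        while j < len(labels) and labels[j] == labels[i]:
            j += 1
        if j - i >= min_run and (not symbols or symbols[-1] != labels[i]):
            symbols.append(labels[i])
        i = j
    return "".join(symbols)


def recognize(points, prototypes):
    """Return the name of the prototype whose symbol string matches, else None."""
    symbols = to_symbols(points)
    for name, prototype in prototypes.items():
        if prototype == symbols:
            return name
    return None
```

For example, a trajectory that moves right and then up, such as `[(x, 0) for x in range(6)] + [(5, y) for y in range(1, 6)]`, can be passed to `recognize` together with a hypothetical prototype table like `{"L-shape": "RU", "swipe-right": "R"}`. Exact string matching keeps the comparison cheap, which fits the real-time goal, although a tolerant matcher could be substituted.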