Retrieval of complex multimedia events has long been regarded as a challenging task. Multimedia event recounting, unlike event detection, focuses on providing comprehensible evidence that justifies a detection result. Recounting enables "video skimming", which not only enhances video exploration but also makes it possible to keep a human in the loop to improve the detection result. Most existing systems treat event recounting as a disjoint post-processing step applied to the output of event detection. Unlike these systems, this doctoral research aims to provide an in-depth understanding of how recounting, i.e., evidence localization, can help event detection in the first place. This understanding can potentially benefit the overall design of an efficient event detection system, with or without a human in the loop. More importantly, we propose a framework for detecting and recounting everyday events without any training examples. The system takes only a text description of an event as input, then performs evidence localization, event detection, and recounting over a large, unlabelled video corpus. The goal of the system is to exploit event recounting to improve zero-example event detection. We present preliminary results and work in progress.