Informedia Digital Video Library:  Digital video library research at Carnegie Mellon School of Computer Science
  Carnegie Mellon University
  School of Computer Science
  5000 Forbes Avenue
  Pittsburgh, PA 15213

Informedia Video Information Summarization & Demonstration Testbed
About VACE I   |  Ongoing Project Info   |  VACE I Reports  ||   VACE II   |  Sponsor  
Howard Wactlar
Mike Christel, Alex Hauptmann, Jianbo Shi
Advanced Research and Development Activity (ARDA)
Video Analysis and Content Exploitation
September 2000 - December 2002

Project Description

The Informedia project is advancing the automatic generation of video summaries over very large archives of video segments, building on the Informedia Project's processing and integration infrastructure. We intend to extend and modularize the underlying Informedia processing, query, and display infrastructure with standardized interfaces to enable its use as a demonstration and testbed vehicle for the work of other researchers in video understanding component technology. This proposal addresses the ARDA desired target capabilities for:

  • Fully automating the indexing of video streams based on information content in image, audio and textual components.
  • Developing cross-media processing techniques to extract information using all components of a video stream.

In particular, this research will:

  • Provide unified infrastructure for integration and demonstration of
    - Object detection, recognition and tracking
    - Event understanding, query-by-example, multi-modal fusion
  • Develop new capabilities for video summaries and multimodal video mining

We will develop a new prototype video analysis system that automatically processes video data, indexes the extracted data, and provides mechanisms for search and retrieval. The system will include the current CMU face and text detection and recognition capabilities already under development in the Informedia Project, as well as CMU's Sphinx speech recognition system. Additional image processing will determine shot boundaries and allow for image similarity comparisons. Combining features from text, speech, and image analysis will enhance both the performance and the quality of the video metadata extraction processes, compared to processing each modality in isolation. All derived metadata will be indexed in support of more efficient query interfaces. We will initially populate the VACE digital video library with automatically processed data from broadcast news sources such as CNN, and will start with MPEG-1 video formats, with the goal of later handling other formats, including MPEG-2 video data streams.
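The shot-boundary step mentioned above is commonly implemented by comparing intensity or color histograms of adjacent frames; a large distance between consecutive histograms suggests a cut. The sketch below illustrates that general technique only — the frame representation, bin count, and threshold are illustrative assumptions, not the Informedia implementation:

```python
# Sketch of histogram-difference shot-boundary detection.
# A "frame" here is simply a flat list of pixel intensities.

def histogram(frame, bins=8, max_val=256):
    """Normalized intensity histogram of a frame."""
    counts = [0] * bins
    for px in frame:
        counts[px * bins // max_val] += 1
    total = len(frame)
    return [c / total for c in counts]

def detect_shot_boundaries(frames, threshold=0.5):
    """Return indices where a new shot starts, using L1 histogram distance."""
    boundaries = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        dist = sum(abs(a - b) for a, b in zip(prev, cur))
        if dist > threshold:   # abrupt histogram change => likely cut
            boundaries.append(i)
        prev = cur
    return boundaries

# Two synthetic "shots": dark frames followed by bright frames.
dark = [10] * 100
bright = [240] * 100
print(detect_shot_boundaries([dark, dark, dark, bright, bright]))  # → [3]
```

Real systems must also distinguish gradual transitions (fades, dissolves) from abrupt cuts, which a single-threshold scheme like this one handles poorly.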

Figure 1: Video summarizer metadata extraction.

This system architecture will be modularized to support tailorability: the video processing modules for shot detection, text detection/recognition, and face detection/recognition can be replaced with components developed at CMU or other research organizations. The system will also allow for the insertion of new vehicle and object detection and recognition modules, based on a standardized interface definition. The metadata extracted by the new modules will be automatically searchable in the system. This metadata will also be time-aligned to the processed video, enabling it to be used in conjunction with other synchronized metadata for building efficient, effective interfaces to the video collections.
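A standardized module interface of the kind described could take roughly the following shape. This is a speculative sketch; the names (VideoModule, MetadataItem) and fields are illustrative assumptions, not the actual Informedia interface definition:

```python
# Hypothetical pluggable processing-module contract: every module
# emits metadata time-aligned to the source video, so outputs from
# different detectors can be merged onto one timeline and indexed.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List

@dataclass
class MetadataItem:
    start_sec: float   # alignment to the video timeline
    end_sec: float
    kind: str          # e.g. "shot", "face", "text", "vehicle"
    value: str

class VideoModule(ABC):
    """A detector/recognizer plugs in by implementing process()."""
    @abstractmethod
    def process(self, video_path: str) -> List[MetadataItem]: ...

class DummyShotDetector(VideoModule):
    def process(self, video_path: str) -> List[MetadataItem]:
        # A real module would analyze the video; this stub shows the contract.
        return [MetadataItem(0.0, 4.2, "shot", "shot-1")]

def merge_timelines(modules, video_path):
    """Combine every module's output into one time-ordered metadata track."""
    items = [it for m in modules for it in m.process(video_path)]
    return sorted(items, key=lambda it: it.start_sec)
```

Because each item carries its own start/end times, swapping one detector for another leaves the downstream indexing and interface code unchanged, which is the point of the standardized interface.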

Figure 2: Conceptual view of the processing architecture.

Beyond providing a plug-and-play framework for other processing modules, the research will also develop video summaries in the form of "collages", using the metadata generated from the modules along with any available collateral data such as manually generated transcripts and closed-captioned text. Video information collages will be built by advancing information visualization research to effectively deal with multiple video documents. A video information collage is a presentation of text, images, audio, and video derived from multiple video sources in order to summarize, provide context, and communicate aspects of the content from the originating set of sources. The collages to be investigated include chrono-collages emphasizing time sequences, geo-collages emphasizing spatial relationships, and auto-documentaries, which preserve the video's temporal nature. Users will be able to interact with the video collages to generate multimodal queries across time, space, and sources. Video collages can be made adaptive by giving preference to the concepts and query terms in the user's interaction history. The synthesis and summarization functions underlying these collages will be made possible through extensions of text clustering and Expectation-Maximization algorithms to video and audio features.
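As a reminder of what the Expectation-Maximization machinery mentioned above does in its simplest form, the sketch below fits a two-component 1-D Gaussian mixture, the textbook case that the collage synthesis would extend to higher-dimensional video and audio features. The initialization and iteration count are illustrative assumptions:

```python
# Minimal EM for a two-component 1-D Gaussian mixture.
import math

def em_two_gaussians(xs, iters=50):
    mu = [min(xs), max(xs)]          # crude initialization at the extremes
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [pi[k] / math.sqrt(2 * math.pi * var[k])
                    * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                    for k in range(2)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: re-estimate parameters from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(1e-6, sum(r[k] * (x - mu[k]) ** 2
                                   for r, x in zip(resp, xs)) / nk)
            pi[k] = nk / len(xs)
    return mu, var, pi

# Points clustered near 0 and near 10 should yield means near 0 and 10.
data = [-0.5, 0.0, 0.4, 9.6, 10.0, 10.3]
mu, var, pi = em_two_gaussians(data)
```

Extending this to video means replacing the scalar observations with multimodal feature vectors (image, audio, and text features per segment), but the E-step/M-step alternation is the same.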

We will examine the effects of metadata quantity and quality on the generation and use of video information collages. We will explore the importance of particular processing modules on collages for a video genre, e.g., face detection for chrono-collages of news broadcasts. Through successful incorporation of key processing modules, video information collages can be constructed so that users can efficiently access large video collections and assimilate information relevant to their needs. The project's anticipated advances in processing architecture and video summaries complement the research of others focusing directly on the video analysis domain.

The system architecture proposed will allow video to be processed and indexed, and the resulting derived and extracted data to be searched and compared. The modular design of the database and the processing modules will simplify the exchange of video content extraction modules and the addition of specialized processing components for other video analysis and object extraction. The system will initially provide functionality such as keyframe extraction, shot-break detection, text detection and recognition, and face detection and recognition, although at a limited level of accuracy. The infrastructure reconstruction and modularization of the Informedia architecture is a two-year effort under proposed funding levels.

The proposed video summarization research will change the paradigm for accessing digital video archives so that users can explore meaningful, manipulable overviews of video document sets, issue true multimodal queries, and be aided by adaptive summarizations of very large amounts of video. Our work will enable users to more quickly interpret and assimilate information relevant to their needs via automatic, intelligent synthesis of different video sources. The summarization research proposed here initiates the early stages of a five to six year effort, leveraging related work of the broader Digital Library research program.

Copyright 1994-2002 Carnegie Mellon and its licensors. All rights reserved.