Informedia Digital Video Library: Digital video library research at Carnegie Mellon School of Computer Science

  Carnegie Mellon University
  School of Computer Science
  5000 Forbes Avenue
  Pittsburgh, PA 15213
  informedia@cs.cmu.edu


ENVIE: Extensible News Video Information Exploitation
PI: Howard Wactlar
Core Personnel: Mike Christel, Pinar Duygulu, Alex Hauptmann, Dorbin Ng, Jie Yang
Sponsor: Advanced Research and Development Activity (ARDA), Video Analysis and Content Exploitation (VACE) Phase II

Project Description

This project will develop systems and tools to automatically detect, extract, and report high-interest people, patterns, and trends in visual content from foreign news. These broadcasts can provide highly valuable information, but they are currently under-exploited for intelligence purposes because of the cost of analyzing such data manually. The project will increase the breadth and quality of core-level visual processing components for automatic detection of features such as speakers, logos, and locations. By integrating these core-level detectors across multiple modalities, the project will identify relationships among people and objects across time and place, leading to the derivation of comprehensive video events. The automatic extraction and fusion of foreign news features will be packaged in video browsing and summarization interfaces that provide efficient, effective access to material meeting the analysts' needs.

This project will build an innovative, analyst-extensible system for foreign broadcast news exploitation that puts the analyst in control to better accommodate novel situations and source material. Specifically, the analyst will be able to identify a need for a class of video detection, supply training material for that class, and iteratively evaluate and improve the resulting automatic classification produced via machine learning and rule-based techniques (see Figure 1). The extensible system accounts for needs that might arise with future world events, allows localized optimizations for a new video source from a particular political region, and enables classifiers to be developed and tested within a secure environment with no outside intervention. The resulting system, named ENVIE for Extensible News Video Information Exploitation, builds on ten years of Informedia digital video understanding research conducted at Carnegie Mellon University.


Figure 1: Example of extensible interaction: Analyst refines processing to emphasize tanks.
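
As a concrete illustration of the loop in Figure 1, the following minimal sketch realizes analyst-driven refinement as uncertainty-sampling active learning. It is an assumption-laden sketch, not the ENVIE implementation: it presumes shots are already represented as feature vectors, and the names refine_classifier and analyst_label are hypothetical.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def refine_classifier(X_labeled, y_labeled, X_pool, analyst_label,
                          rounds=5, batch=10):
        # Iteratively retrain a binary detector, asking the analyst to
        # label the shots the current model is least certain about.
        clf = LogisticRegression(max_iter=1000)
        for _ in range(rounds):
            clf.fit(X_labeled, y_labeled)
            p = clf.predict_proba(X_pool)[:, 1]          # P(positive) per shot
            # Shots nearest the 0.5 decision boundary are most informative.
            uncertain = np.argsort(np.abs(p - 0.5))[:batch]
            new_y = np.array([analyst_label(X_pool[i]) for i in uncertain])
            X_labeled = np.vstack([X_labeled, X_pool[uncertain]])
            y_labeled = np.concatenate([y_labeled, new_y])
            X_pool = np.delete(X_pool, uncertain, axis=0)
        return clf

Each pass corresponds to one trip around the Figure 1 cycle: the analyst supplies a few more labels where the current classifier is weakest, and the classifier is rebuilt.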

ENVIE will be designed to deal with the specific challenges of foreign broadcast news, in which visual elements can ameliorate the translation problem by serving as anchors relating stories on the same topic across languages. Foreign news television broadcasts contain a wealth of information, but it is delivered with cultural and political perspectives that skew the views from different sources. Browsing interfaces will be developed to showcase both the commonalities across reports from multiple sources and the differing biases, which may be as interesting to the analyst as the source material itself.

ENVIE will make use of Carnegie Mellon's successful legacy of speech recognition, computer vision, machine learning, and information retrieval for multimedia metadata, integrating these technical components into a foreign news exploitation system delivering the following capabilities:

  • Enhanced, broad, shot-level visual classifiers, enabled through an emphasis on their temporal characteristics and improved by exploiting the redundancy between frames and relatively accurate determination of camera and object motion (a shot-level aggregation sketch follows this list).

  • Evidence aggregation to label video sequences with higher-order semantics, based on reinforcing cues from visual classifiers, scene text, overlay text, and redundant footage, together with additional cues from the foreign broadcast news audio. The result is multimodal mining in which limitations in one modality are compensated by input from others (a minimal fusion sketch follows this list), with the ultimate goal of higher-order event-based summarization.

  • Support for visual browsing and contextual understanding through automatically designed visualizations, tailored to the user, task, data, and analysis history, that can be absorbed at a glance, offer drill-down access to supporting material, and explicitly represent varying degrees of credibility based on the amount of evidence, the error rates of automatic processing, and the authorship of the material.

  • Adaptive, analyst-defined criteria for extending automated processing, detection, and extraction: an analyst toolkit for tailoring the system to particular data and needs. The end user is no longer restricted to being a passive consumer of pre-built classifiers, but can reprocess video to emphasize desired aspects and lay out interfaces reporting on significant data.
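
The first capability above leans on temporal redundancy within a shot. As a minimal sketch, assuming a per-frame detector already exists, shot-level pooling of its confidences might look like the following (the function name and thresholds are hypothetical, not part of ENVIE):

    import numpy as np

    def shot_score(frame_scores, min_support=0.3):
        # frame_scores: per-frame detector confidences in [0, 1] for one shot.
        # Frames within a shot are highly redundant, so the shot label is
        # trusted only when enough frames agree, suppressing single-frame
        # false alarms from a noisy per-frame detector.
        frame_scores = np.asarray(frame_scores, dtype=float)
        support = np.mean(frame_scores > 0.5)   # fraction of positive frames
        return float(np.median(frame_scores)) if support >= min_support else 0.0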
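The evidence-aggregation capability can likewise be sketched as a simple late-fusion rule. The modality names and reliability weights below are assumptions for illustration; the point is only that when one modality yields no evidence, its weight is redistributed so the remaining modalities compensate:

    # Assumed reliability weights per modality (illustrative, not measured).
    WEIGHTS = {"visual": 0.4, "overlay_text": 0.25, "scene_text": 0.15, "audio": 0.2}

    def fuse_scores(scores):
        # scores: dict mapping modality name -> confidence in [0, 1];
        # modalities that produced no evidence are simply omitted.
        present = {m: w for m, w in WEIGHTS.items() if m in scores}
        total = sum(present.values())
        if total == 0.0:
            return 0.0
        # Renormalize weights over the modalities actually present.
        return sum(scores[m] * (w / total) for m, w in present.items())

    # A sequence with strong visual and overlay-text cues but unusable audio
    # still receives a high fused confidence:
    print(fuse_scores({"visual": 0.9, "overlay_text": 0.8}))   # ~0.86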

These points will be demonstrated across languages in a foreign broadcast news corpus, highlighting the importance of visual material for exploiting such data and illustrating the utility of video browsing and summarization interfaces for comparing and contrasting viewpoints across cultural and language boundaries. We will make use of specialized processing of Mandarin news, developed during prior technical collaborations with the Chinese University of Hong Kong, and will further tailor detection of the Chinese character set in overlay and scene text. Similarly, our experience with the European Chronicles Online community and videos in French, Dutch, and German will help with the multimedia processing of foreign broadcast news from European cultures. Specifically, we will deal with Mandarin news and German news, in addition to U.S. news, as received through our university's international cable channel.

 

Copyright 1994-2002 Carnegie Mellon and its licensors. All rights reserved.