Informedia Digital Video Library:  Digital video library research at Carnegie Mellon School of Computer Science

  Carnegie Mellon University
  School of Computer Science
  5000 Forbes Avenue
  Pittsburgh, PA 15213
  informedia@cs.cmu.edu

 
 
Project Title:
Exploiting Continuously Captured Distributed Video Sources with Errorful Interpretation
PI:
Howard Wactlar
Sponsors:
CIA, NSA, DIA

Project Description

Vast amounts of surveillance video, broadcast television and online multimedia overwhelm the analysts in the worldwide intelligence community who must watch and listen to it all. This research advances the ability to discover and track individual and group relationships from video sources. Emphasizing data mining link analysis and learning from extracted text and co-occurring imagery, the relationships amongst detected individuals and groups across locations and over time are extracted, visualized and summarized from continuously recorded and widely distributed video and other multimedia information sources. These sources include broadcast radio and television, field-captured video, and surveillance video.

Objective

Discover and track individual and group relationships from video sources.
Extract, visualize and summarize relationships amongst detected individuals across locations and over time from continuously recorded and widely distributed video and other multimedia information sources, such as broadcast radio and TV, field-captured video, and surveillance video. Utilize existing and developing Informedia Digital Video Library processing infrastructure. Develop and integrate technology to automatically detect and resolve named entities and visual attributes (which may be ambiguous and errorful) such as spoken names, pictured faces, mentioned and referenced locations, dates, and times. Augment automatic video analysis results with confidence metrics addressing the varying error rates, and "harden" such output with correlated data to improve its accuracy. Extract and classify link information and assess its reliability or confidence. Generate visualizations revealing temporal (e.g., timelines) and geo-spatial (e.g., map overlays) relationships. These static, dynamic and interactive visualizations of associated individual interactions, group activities, and events will facilitate analysis and knowledge discovery. Long-term objectives are to scale the capabilities to worldwide video sources in multiple languages and to advance the period of analysis from retrospective to contemporaneous.
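
As a rough illustration of how confidence metrics and correlated evidence might feed link extraction, the following minimal Python sketch pools errorful detections of the same entity within a video segment and then scores pairs of entities by their co-occurrence. It is not the Informedia implementation. The `Detection` schema, the source labels, the entity names, and the noisy-OR combination (which assumes the observations are independent) are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Detection:
    """One errorful detection of an entity in a video segment (hypothetical schema)."""
    segment_id: str
    entity: str
    source: str        # illustrative labels, e.g. "speech", "face", "video_ocr"
    confidence: float  # detector's own estimate in [0, 1]


def noisy_or(confidences):
    """Chance that at least one observation is correct, assuming independence."""
    miss = 1.0
    for c in confidences:
        miss *= 1.0 - c
    return 1.0 - miss


def entity_confidence_by_segment(detections):
    """Pool evidence from all sources for each (segment, entity) pair."""
    pooled = defaultdict(list)
    for d in detections:
        pooled[(d.segment_id, d.entity)].append(d.confidence)
    per_segment = defaultdict(dict)
    for (segment_id, entity), confs in pooled.items():
        per_segment[segment_id][entity] = noisy_or(confs)
    return per_segment


def link_confidences(detections):
    """Score each entity pair by combining its co-occurrences across segments."""
    per_segment = entity_confidence_by_segment(detections)
    evidence = defaultdict(list)
    for entities in per_segment.values():
        for a, b in combinations(sorted(entities), 2):
            # A co-occurrence counts only if both detections are correct.
            evidence[(a, b)].append(entities[a] * entities[b])
    return {pair: noisy_or(scores) for pair, scores in evidence.items()}


# Hypothetical detections: two sources agree on person_a in seg1.
example = [
    Detection("seg1", "person_a", "speech", 0.5),
    Detection("seg1", "person_a", "face", 0.5),
    Detection("seg1", "person_b", "video_ocr", 0.8),
    Detection("seg2", "person_a", "speech", 0.5),
    Detection("seg2", "person_b", "face", 0.5),
]
links = link_confidences(example)
```

In this sketch, agreement between a spoken name and a pictured face raises the confidence of a single sighting, and repeated co-occurrence across segments raises the confidence of the link. A real system would need to calibrate detector confidences and account for correlated errors, which the independence assumption ignores.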

Figure 1: The analyst queries a large video database for identifications of Bin Laden couriers. The result produces a temporal and graphic plotting of Bin Laden couriers with video segments from corresponding times and/or places. The selected video samples can be played and, at the same time, a map is shown that plots courier sightings at corresponding times and places.
 
Figure 2: The plotted dots on this visualization show relationships among key Al-Qaeda and Taliban members, again culled from visual media reports (simulated). Relevance and date sliders can be color or size coded and adjusted to specific relevance or time parameters. In this example, the size of the plotted squares denotes relevance, with the larger squares indicating greater relevance. The color denotes time, with the pinker squares indicating more recent stories and the bluer squares less recent.
 

Background

Exploit operational Informedia DVL infrastructure and technology.
The Informedia Digital Video Library was the only project funded by NSF under both Phases I and II of the Digital Library Initiative to focus specifically on information extraction from video and audio content. Informedia pioneered the extraction of textual, visual and geographical information from video and audio streams. Over two terabytes of online data were collected, with automatically extracted metadata and indices for retrieving videos from this library. Fundamental research and prototyping were done in the following areas as part of this multi-year, multi-researcher activity; each area is shown with a sampling of references to particular work:

  • Integration of speech, language, and image processing: generating multimedia abstractions, segmenting video into stories, and tailoring presentations based on context [Wactlar99, Christel97, Christel/Martin98].

  • Text processing: capsule generation [Hauptmann97], text clustering and topic classification [Yang98, Lafferty98, Hauptmann/Lee98], and information retrieval from spoken documents [Hauptmann/Wactlar97, Hauptmann/Witbrock97, Hauptmann98].

  • Named entity extraction and disambiguation of named faces and geographical references [Houghton99, Hauptmann/Olligschlaeger99].

  • Image and video processing: face detection [Rowley95, Schneiderman00] and image similarity matching based on regions, textures, and perception [Gong98], generation of video skims [Smith/Kanade97], video OCR [Sato98], and video trails [Kobla97].

The Informedia processing tools provide state-of-the-art access to video through extracted textual and image information. The system is also being utilized as an integration and demonstration testbed for the various research activities supported by the ARDA Video Analysis and Content Extraction (VACE) program. The proposed work will discover individuals linked across time, space, and multimedia sources by emphasizing data mining link analysis and learning from extracted text and co-occurring imagery.

Data Requirements

This effort will primarily utilize open-source TV and radio broadcasts and Internet news sources. The CMU Informedia archive has over 2000 hours of broadcast content. Some additional data that is representative of telephone and face-to-face dialogue may also be used to test information extraction and analysis. An understanding of the methodologies of the intelligence analyst will enhance the utility of the visualizations. Access to such specialists will enable studies that employ contextual inquiry and design, cognitive walkthroughs, and think-aloud protocols, resulting in revised interfaces with enhanced functionality and increased efficiency and effectiveness.

 

Copyright 1994-2002 Carnegie Mellon and its licensors. All rights reserved.