Project Title: Exploiting Continuously Captured Distributed Video Sources with Errorful Interpretation
PI: Howard Wactlar
Sponsors: CIA, NSA, DIA
Vast amounts of surveillance video, broadcast
television, and online multimedia overwhelm the human resources of the
worldwide intelligence community, who must watch and listen to it all.
This research advances the ability to discover and track individual and
group relationships from video sources. By emphasizing data mining link analysis
and learning from extracted text and co-occurring imagery, the work extracts,
visualizes, and summarizes the relationships among detected individuals and
groups across locations and over time from continuously recorded and
widely distributed video and other multimedia information sources. These
include broadcast radio and television, field-captured video, and surveillance
video.
Discover and track individual and group
relationships from video sources.
Extract, visualize and summarize relationships
amongst detected individuals across locations and over time from continuously
recorded and widely distributed video and other multimedia information
sources, such as broadcast radio and TV, field-captured video, and surveillance
video. Utilize existing and developing Informedia Digital Video Library
processing infrastructure. Develop and integrate technology to automatically
detect and resolve named entities and visual attributes (which may be
ambiguous and errorful) such as spoken names, pictured faces, mentioned
and referenced locations, dates, and times. Augment automatic video analysis
results with confidence metrics addressing the varying error rates, and
"harden" such output with correlated data to improve its accuracy.
Extract and classify link information and assess its reliability or confidence.
Generate visualizations revealing temporal (e.g., timelines) and geo-spatial
(e.g., map overlays) relationships. These static, dynamic and interactive
visualizations of associated individual interactions, group activities,
and events will facilitate analysis and knowledge discovery. Long-term
objectives will scale the capabilities to worldwide video sources in multiple
languages and advance the period of analysis in time from retrospective
to contemporaneous.
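
As a rough illustration of attaching confidence to errorful detections, "hardening" them with correlated data, and scoring the resulting links, the following minimal sketch may help. It is not the Informedia implementation. It assumes that each video segment yields per-entity detection confidences in the range 0 to 1, that detections can be treated as independent, and that co-occurrence in a segment is evidence of a link. The function names `noisy_or` and `build_links`, and the noisy-OR combination rule itself, are illustrative choices rather than anything specified by the project.

```python
from collections import defaultdict
from itertools import combinations


def noisy_or(confidences):
    """Combine confidences under an independence assumption: 1 - prod(1 - c)."""
    remaining = 1.0
    for c in confidences:
        remaining *= (1.0 - c)
    return 1.0 - remaining


def build_links(segments):
    """Score pairwise links from co-occurrence within video segments.

    segments: list of dicts, each mapping an entity name to a list of
    detection confidences for that entity in one segment (for example,
    one value from a spoken name and one from a detected face).
    Returns a dict mapping (entity_a, entity_b) to a link confidence.
    """
    evidence = defaultdict(list)
    for seg in segments:
        # "Harden" each entity by combining its correlated detections.
        hardened = {name: noisy_or(confs) for name, confs in seg.items()}
        # Every pair of entities seen in the same segment is a candidate link.
        for a, b in combinations(sorted(hardened), 2):
            evidence[(a, b)].append(hardened[a] * hardened[b])
    # Repeated co-occurrence across segments raises the link confidence.
    return {pair: noisy_or(scores) for pair, scores in evidence.items()}


if __name__ == "__main__":
    demo = [
        {"Person A": [0.5, 0.5], "Person B": [0.8]},
        {"Person A": [0.5], "Person B": [0.5]},
    ]
    for pair, score in build_links(demo).items():
        print(pair, round(score, 3))
```

In this sketch, two weak detections of the same entity in one segment reinforce each other, and a pair of entities that co-occurs in several segments receives a higher link confidence than a pair seen together only once. A real system would need to model correlated errors rather than assume independence.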
Figure 1: The
analyst queries a large video database for identifications of Bin
Laden couriers. The result produces a temporal and graphic plotting
of Bin Laden couriers with video segments from corresponding times
and/or places. The selected video samples can be played and, at the
same time, a map is shown that plots courier sightings at corresponding
times and places.
Figure 2: The
plotted dots on this visualization show relationships among key Al-Qaeda
and Taliban members, again culled from visual media reports (simulated).
Relevance and date sliders can be color or size coded and adjusted
to specific relevance or time parameters. In this example, the size
of the plotted squares denotes relevance, with the larger squares
indicating greater relevance. The color denotes time, with the pinker
squares indicating more recent stories and the bluer squares less
recent.
Exploit operational Informedia DVL infrastructure and
technology.
The Informedia Digital Video Library was the only project
funded by NSF as part of the Digital Library Initiative, in both Phases
I and II, to focus specifically on information extraction from video and
audio content. Informedia pioneered the extraction of textual, visual,
and geographical information from video and audio streams. Over two terabytes
of online data were collected, with automatically extracted metadata and
indices for retrieving videos from this library. Fundamental research
and prototyping were done in the following areas as part of this multi-year,
multi-researcher activity; each area is shown with a sampling of references
to particular work:
- Integration of speech, language, and image processing:
generating multimedia abstractions, segmenting video into stories, and
tailoring presentations based on context [Wactlar99, Christel97, Christel/Martin98].
- Text processing: capsule generation [Hauptmann97],
text clustering and topic classification [Yang98, Lafferty98, Hauptmann/Lee98],
and information retrieval from spoken documents [Hauptmann/Wactlar97,
Hauptmann/Witbrock97, Hauptmann98].
- Named entity extraction and disambiguation of named
faces and geographical references [Houghton99, Hauptmann/Olligschlaeger99].
- Image and video processing: face detection [Rowley95,
Schneiderman00] and image similarity matching based on regions, textures,
and perception [Gong98], generation of video skims [Smith/Kanade97],
video OCR [Sato98], and video trails [Kobla97].
The Informedia processing tools provide state-of-the-art
access to video through extracted textual and image information.
The system is also being utilized as an integration and demonstration
testbed for the various research activities supported by the ARDA
Video Analysis and Content Extraction (VACE)
program. The proposed work will discover individuals linked across time,
space, and multimedia sources by emphasizing data mining link analysis
and learning from extracted text and co-occurring imagery.
This effort will primarily utilize open-source TV and
radio broadcasts and Internet news sources. The CMU Informedia archive
has over 2,000 hours of broadcast content. Some additional data that is
representative of telephone and face-to-face dialogue may also be used
to test information extraction and analysis. An understanding of the methodologies
of the intelligence analyst will enhance the utility of the visualizations.
Access to such specialists will enable studies using contextual inquiry
and design, cognitive walkthroughs, and think-aloud protocols to be employed,
resulting in revised interfaces with enhanced functionality and increased
efficiency and effectiveness.