Informedia: Integrated Speech, Image, and Language Understanding for the Creation and Exploration of Digital Video Libraries
Carnegie Mellon University Informedia Digital Video Library NSF Cooperative Agreement IRI 9411299
Quarterly Report, February 1998 Howard D. Wactlar, Project Director
Following is a brief summary of research and implementation progress for the period 1 November 1997 to 31 January 1998. In this period we have: (1) improved skim selection and enhanced image matching, (2) begun constructing a Web version of the Informedia client, and (3) released version 0.91 of the software to the testbed site.
Speech, Image, and Language Understanding for Library Creation
Dynamic language models
Incorporated a dynamic language model specialized to exploit recent context, reducing the speech-recognition word-error rate on broadcast news from 55% to 34%. By automatically analyzing transcripts and generating a new language model each day, tuned to the specific vocabulary in use, this approach increases processing speed, reduces perplexity, and significantly improves recognition accuracy.
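For readers unfamiliar with the metric, word-error rate is conventionally computed as the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the reference length. The following is a minimal sketch of that standard definition, not the project's scoring tool; the function names and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error rate that was eliminated."""
    return (before - after) / before


# One substituted word out of six reference words (made-up sentences).
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
# Relative improvement implied by the reported 55% and 34% figures.
print(round(relative_reduction(0.55, 0.34), 3))
```

Under this definition, a drop from 55% to 34% is an absolute reduction of 21 points, or roughly 38% relative to the original error rate.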
Video OCR
Integrated Video OCR, which recognizes and extracts video captions, into the database version of Informedia. New data is now processed automatically and continuously. In addition, several improvements have increased recognition rates:
- Better text detection by using matched filtering
- Increased word recognition by using a character-stroke-width evaluation filter for extracting the best image of a caption
- Integration of words and captions that occur together
The previous VOCR module created a separate annotation for each text area it found within a video image. Frequently, however, texts within an image are closely related. "Wolf Blitzer," for example, may appear over "Atlanta," and a user might wish to include both keywords in a single query. The new VOCR module recognizes this co-occurrence, merging such texts into one annotation. This integration aids user understanding, especially for longer text areas, such as sentences, that were formerly divided across separate annotations.
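The merging behavior described above can be sketched as follows. This is an illustration only, under the simplifying assumption that every text area found in the same frame belongs to one annotation; the report does not state the actual co-occurrence criterion. The `TextRegion` type, its fields, the frame numbers, and the pixel coordinates are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRegion:
    frame: int  # hypothetical index of the video frame
    top: int    # y coordinate of the region's top edge, in pixels
    left: int   # x coordinate of the region's left edge, in pixels
    text: str   # recognized caption text


def merge_by_frame(regions):
    """Return {frame: one annotation string}, read top-to-bottom, left-to-right."""
    by_frame = defaultdict(list)
    for region in regions:
        by_frame[region.frame].append(region)
    return {
        frame: " ".join(r.text for r in sorted(group, key=lambda r: (r.top, r.left)))
        for frame, group in by_frame.items()
    }


regions = [
    TextRegion(frame=10, top=300, left=40, text="Atlanta"),
    TextRegion(frame=10, top=260, left=40, text="Wolf Blitzer"),
]
print(merge_by_frame(regions))
```

With a single merged annotation per frame, a query containing both keywords can match one annotation rather than two separate ones.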
Color clustering
Improved image retrieval by exploiting color-cluster and image-region characteristics. Tested image-retrieval performance using a large image set (10,000 images) from the Informedia database and modularized the method to facilitate its integration into the Informedia system.
Library Exploration
Added a document-distribution visualization capability to the IDVLS retrieval model. The Vibe "parallel query" technique represents query-result documents as icons positioned on the display according to the frequency of successful key-term matches between query and target document. Recently implemented in the client, the Vibe prototype allows a user to compare and filter large numbers of retrieved segments.
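One common way to realize this kind of layout is to give each key term a fixed anchor point on the display and place each document icon at the average of the anchors, weighted by how often the document matches each term. The sketch below assumes that weighted-centroid scheme; the report does not give the client's actual placement formula, and the function name, anchor coordinates, and match counts are hypothetical.

```python
def vibe_position(anchors, match_counts):
    """Place a document icon at the match-weighted centroid of the term anchors.

    anchors:      {term: (x, y)} display position assigned to each key term
    match_counts: {term: int}    how often the document matched each term
    Returns (x, y), or None if the document matches none of the terms.
    """
    total = sum(match_counts.get(term, 0) for term in anchors)
    if total == 0:
        return None  # nothing pulls this document toward any anchor
    x = sum(anchors[t][0] * match_counts.get(t, 0) for t in anchors) / total
    y = sum(anchors[t][1] * match_counts.get(t, 0) for t in anchors) / total
    return (x, y)


anchors = {"election": (0.0, 0.0), "economy": (100.0, 0.0)}
# Three matches on one term and one on the other pull the icon
# three quarters of the way toward the first anchor.
print(vibe_position(anchors, {"election": 3, "economy": 1}))
```

Documents that match only one term sit on that term's anchor, so clusters near an anchor or between anchors give a quick visual cue for filtering.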
Data Organization, Networking Architecture, and Interoperability
Client and data enhancements
Fully integrated face- and color-histogram matching into IDVLS. Match processes can be easily initiated, results can be sorted by date or relevance, and match locations appear in temporal context (for example, as bars on the video-player scrollbar or as "notches" in "filmstrip" sequences).
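As a rough illustration of color-histogram matching and relevance ordering, the sketch below uses histogram intersection, a widely used similarity measure. It is a stand-in chosen for clarity, not a description of the IDVLS matcher; the bin count, function names, and the tiny synthetic "images" are assumptions.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized RGB histogram; pixels is an iterable of (r, g, b) in 0..255."""
    step = 256 // bins_per_channel
    counts = [0] * bins_per_channel ** 3
    total = 0
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        counts[index] += 1
        total += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return [count / total for count in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_relevance(query_hist, library):
    """library: {name: histogram}. Returns names, best match first."""
    return sorted(library,
                  key=lambda name: histogram_intersection(query_hist, library[name]),
                  reverse=True)


red = color_histogram([(255, 0, 0)] * 4)
blue = color_histogram([(0, 0, 255)] * 4)
mixed = color_histogram([(255, 0, 0)] * 2 + [(0, 0, 255)] * 2)
print(rank_by_relevance(red, {"blue": blue, "mixed": mixed, "red": red}))
```

Sorting by date instead would only require a different sort key over the same result set.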
Improved the information display that a user sees while browsing hierarchically through the video library and extended "headlines" (video clip titles) to 128 characters. Implemented better "filmstrip" heuristics so that, when the system maps an oversegmented set of shot images to a smaller one, the resulting presentation retains those shots containing match locations. Also revised the "video-skim" format to improve playback reliability.
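The filmstrip reduction can be pictured with a small sketch: reserve slots for shots that contain match locations, then fill the remaining slots with evenly spaced unmatched shots, and present everything in temporal order. This is one plausible heuristic under those assumptions, not the client's actual algorithm; the function name and parameters are hypothetical.

```python
def select_filmstrip_shots(shot_count, matched, slots):
    """Choose at most `slots` shot indices from range(shot_count).

    Shots whose index appears in `matched` are kept first; any remaining
    slots are filled by sampling the unmatched shots at an even stride.
    The result is returned in temporal (ascending) order.
    """
    if slots <= 0:
        return []
    keep = sorted({i for i in matched if 0 <= i < shot_count})[:slots]
    kept = set(keep)
    others = [i for i in range(shot_count) if i not in kept]
    remaining = slots - len(keep)
    if remaining >= len(others):
        keep.extend(others)
    elif remaining > 0:
        stride = len(others) / remaining
        keep.extend(others[int(k * stride)] for k in range(remaining))
    return sorted(keep)


# Ten shots squeezed into four filmstrip slots; shot 7 holds a match.
print(select_filmstrip_shots(shot_count=10, matched=[7], slots=4))
```

A plain even-stride sample of ten shots into four slots would have skipped shot 7; reserving a slot for it keeps the match visible in the reduced filmstrip.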
Created a single, front-end application that works with the Informedia video library in either of two formats: the current custom, flat-file design or a relational database housing library metadata. The application facilitates testing the relational database, while enabling continued demonstrations and external customer support during transition to the new design.
Revised video-manipulation routines and playback-interface support by replacing Media Control Interface with ActiveMovie. Finalized the move to an RDBMS as a first step toward a single code base and a scalable, distributed architecture. Also optimized data-access mechanisms to facilitate the transition to a commercial relational database and an architecture that provides complete client/server separation.
Interface evaluation
Revised the IDVLS client interface based on evaluations using several HCI techniques, including contextual inquiry, heuristic evaluation, cognitive walkthrough, and think-aloud protocols.
Conducted informal user interviews and began a descriptive study of extended IDVLS usage in the K-12 school testbed environment.
External Interactions
Visitors and industry contacts
- E. Ehrlich, former Under Secretary of Commerce for Economic Affairs (Feb)
- J. Birnbaum, Hewlett Packard (Feb)
- W. Drabik, Chief, and M. Walker, Program Manager, Electronics Systems Section, Immigration and Naturalization Service, U.S. Department of Justice (Mar)
- J. Nacchio, President and CEO, and N.S. Shafei, Executive VP, Products, Qwest Communications (Mar)
- D. Steier, Price Waterhouse (Mar)
- W. Wulf, National Academy of Engineering Regional Meeting (Mar)
- M. Greis, Director University Relations, IBM (Mar)
- K. Ohnish, Director of Research, and H. Ohnish, Senior Manager, Glory Inc., Japan (Mar)
- B. Schatz, University of Illinois at Urbana-Champaign (Apr)
- P. Dietz, Technical Staff, Walt Disney Imagineering Research & Development, Inc. (Apr)
- R. Scott, Australian Caption Centre (Apr)
- W. Chew (Apr)
- Kay, Disney (Apr)
- Motorola (Apr)
Presentations
M. Christel gave an invited talk, "Multimedia Abstractions in the Informedia Digital Video Library," at the Workshop on Digital Libraries, Ohio State University (Mar).
Gave demonstrations for Congressional and Executive staff at the Highway-1 Symposium, sponsored by NCO for HPCC, Washington, DC (Mar).
Publications and Conference Papers
[Christel et al. 98]
Christel, M.G., M.A. Smith, C.R. Taylor, and D.B. Winkler. Evolving Video Skims into Useful Multimedia Abstractions. In C. Karat, A. Lund, J. Coutaz, and J. Karat (editors), Proceedings of the CHI '98 Conference on Human Factors in Computing Systems, pages 171-178. ACM, April, 1998. Los Angeles. URL: http://www.cs.cmu.edu/~christel/CHI98/CHI98.htm.
[Faloutsos 98]
Faloutsos, C. Applications, Requirements, and Databases Tools for Massive Data Mining. In Proceedings of Advances in Digital Libraries (ADL'98), IEEE, April, 1998. Invited talk. Santa Barbara, CA.
[Hauptmann and Witbrock 98]
Hauptmann, A.G., and M.J. Witbrock. Story Segmentation and Detection of Commercials in Broadcast News Video. In Proceedings of Advances in Digital Libraries (ADL'98). IEEE, April, 1998. Santa Barbara, CA.
[Hauptmann et al. 98]
Hauptmann, A.G., R.E. Jones, K. Seymore, M.A. Siegler, S.T. Slattery, and M.J. Witbrock. Experiments in Information Retrieval from Spoken Documents. In Proceedings of the 1998 DARPA Broadcast News Transcription and Understanding Workshop (BNTUW-98). DARPA, February, 1998. Lansdowne, VA.
[Hauptmann, Lee, and Kennedy 98]
Hauptmann, A.G., D. Lee, and P.E. Kennedy. Semantic Topic Labeling of Multilingual Broadcast News in the Informedia Digital Video Library. In Proceedings of the Network Operations and Management Symposium (IFIP'98). IEEE, February, 1998. New Orleans.
[Lafferty 98]
Lafferty, J. Statistical Models for Text Segmentation. Machine Learning, 1998. To appear.
[Sato et al. 98]
Sato, T., T. Kanade, E. Hughes, M. Smith, and S. Satoh. Video OCR: Indexing Digital News Libraries by Recognition of Superimposed Captions. ACM Multimedia Systems, 1998.
[Wactlar 98]
Wactlar, H.D. Lessons learned in the Informedia DVL. In Proceedings of Advances in Digital Libraries (ADL'98), IEEE, April, 1998. Invited talk. Santa Barbara, CA.
____________________
I certify that to the best of my knowledge (1) the statements herein (excluding scientific hypotheses and scientific opinions) are true and complete, and (2) the text and graphics in this report as well as any accompanying publications or other documents, unless otherwise indicated, are the original work of the signatories or individuals working under their supervision. I understand that the willful provision of false information or concealing a material fact in this report(s) or any other communication submitted to NSF is a criminal offense (U.S. Code, Title 18, Section 1011).
Howard D. Wactlar Project Director 16 Oct 98