Informedia: Integrated Speech, Image, and Language Understanding for the Creation and Exploration of Digital Video Libraries
Carnegie Mellon University Informedia Digital Video Library NSF Cooperative Agreement IRI 9411299
Quarterly Report, May 1996
Howard D. Wactlar, Project Director
Following is a brief summary of research and implementation progress for the period 1 February to 30 April. In this period we have: (1) improved skim selection and enhanced image matching, (2) begun constructing a Web version of the Informedia client, and (3) released version 0.91 of the software to the testbed site.
Speech, Image and Language Understanding for Library Creation
Video Skimming. We have automated poster frame creation using scene breaks, camera motion, and detection of blank frames and low-intensity images. Poster frames are used for displaying filmstrip and iconic representations of video segments. We are currently conducting a user study to test the usefulness of poster frames. Video skims are now created from keywords selected by TF/IDF weights, video structure derived from camera motion, and detection of human faces and text. Previously, skims incorporated only scene breaks and TF/IDF keywords. We have also incorporated audio level data into skim creation, but this will require further work before it can be integrated into the production process. Face and text detection results have recently been made available for use in characterization of news video.
Face & Name Association. We developed a face and name association method called "Name-It." The system is given news video footage (including image sequences and transcriptions). It extracts faces from the image sequences using image understanding techniques, and simultaneously extracts names from the footage using natural language processing techniques. It then evaluates the "co-occurrence" of faces and names, as well as face similarity, using eigen-image based methods, to arrive at face and name associations. Once the system has obtained face and name associations from appropriate video footage, it can determine the name of a given unknown face or produce candidate faces for a given name.
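The TF/IDF keyword weighting mentioned under Video Skimming can be illustrated with a minimal sketch. This is not the Informedia implementation: the function name `tfidf_keywords`, the whitespace tokenization, and the sample transcripts are all assumptions made for illustration. Each term is weighted by its frequency within one transcript multiplied by the log of the inverse fraction of transcripts that contain it, so words common to every transcript score zero.

```python
import math
from collections import Counter


def tfidf_keywords(documents, doc_index, top_k=3):
    """Return the top_k highest-weighted terms of one transcript.

    Hypothetical helper: tokenization is a plain lowercase whitespace split,
    which a production system would replace with real text processing.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: how many transcripts contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Term frequency within the selected transcript.
    tokens = tokenized[doc_index]
    tf = Counter(tokens)
    total = len(tokens)

    weights = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Highest weight first; ties are broken alphabetically for determinism.
    return sorted(weights, key=lambda t: (-weights[t], t))[:top_k]


# Invented sample transcripts, used only to exercise the function.
transcripts = [
    "the shuttle launch was delayed",
    "the senate vote was delayed",
    "the shuttle crew landed safely",
]
print(tfidf_keywords(transcripts, 0))
```

In a skim-selection setting, the video regions whose transcript words carry the highest weights would be natural candidates for inclusion, alongside the image-based cues described above.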
Library Exploration
Color Image Similarity Matching. We have implemented color image similarity matching using hue-based color histograms. We use a database loader program, which reads color images and generates color histograms, and a client library, which provides image similarity search by matching the stored color histograms. The technique works well, requiring only a few seconds of retrieval time with several thousand images. The library was ported to PC platforms and has been linked preliminarily into Informedia clients.
Information Retrieval. We have built a pursuit search engine that uses keyword spotting, stop words, and synonyms, and that grants a bonus to search terms with primacy. We are working to improve this simple but effective tool by measuring and improving search performance. Future work will focus on phrase matching, more complex matches (e.g., war in Vietnam == Vietnam war), using information from the search to improve the rank of the correct story, and multimodal integration.
Spoken Query Interface. Our spoken query interface now has a modified 20,000 word language model. It uses a network server for speech recognition and has been ported to Windows95. We will explore continuous listening and moving the recognition engine to a PC platform.
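The hue-based histogram matching described under Color Image Similarity Matching can be sketched as follows. This is an illustrative sketch only, not the project's loader or client library: the function names, the choice of 12 hue bins, and the use of histogram intersection as the similarity measure are all assumptions. The first function plays the role of the loader (image to histogram) and the last plays the role of the client-side match.

```python
import colorsys


def hue_histogram(pixels, bins=12):
    """Normalized hue histogram for a list of (r, g, b) pixels in 0..255.

    Simplification: gray pixels have no meaningful hue; colorsys reports
    hue 0 for them, so they are counted in the first bin here.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    counts = [0] * bins
    for r, g, b in pixels:
        hue, _sat, _val = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        counts[min(int(hue * bins), bins - 1)] += 1
    total = float(len(pixels))
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def most_similar(query_hist, database):
    """Rank image names in `database` (name -> histogram), best match first."""
    return sorted(
        database,
        key=lambda name: -histogram_intersection(query_hist, database[name]),
    )


# Invented toy "images", each just a short list of pixels.
database = {
    "sunset": hue_histogram([(250, 20, 10)] * 4),
    "ocean": hue_histogram([(0, 0, 255)] * 4),
}
query = hue_histogram([(255, 0, 0)] * 3 + [(0, 0, 255)])
print(most_similar(query, database))
```

Because the histograms are computed once at load time, a query only has to compare short fixed-length vectors, which is consistent with retrieval over several thousand images taking a few seconds.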
Testbeds, Specialized Corpuses, and User Studies
Winchester Thurston Testbed. We delivered our first testbed system to Winchester Thurston, with 8 client stations and a 50-hour video library, and trained Winchester Thurston faculty in the use of the system. We have collected feedback and presented early interface and transaction log results at the NSF site visit and at the DLI meeting in Michigan.
Data Organization, Networking Architecture, and Interoperability
We ported the database API and other utilities used in library creation to a 32-bit OS (Windows95). This allows more flexibility in the client, which often ran into memory barriers in 16-bit Windows. We changed the format of our catalog and database to binary versions to increase speed and efficiency for use in the client.
In an effort to leverage commercial products to deliver the media within our library, we visited DEC's facility in Shrewsbury, MA, to discuss their latest "MediaPlex" product. It seems to hold promise in terms of scalability to a WAN, and it provides streaming support over high-bandwidth Ethernets. The server does not, however, solve the bandwidth problem of serving video over the current Internet infrastructure.
We began a joint effort with ISI and ARPA (under Steve Gersh's direction) to apply Informedia technology to content provided by ARPA, the goal being both to seed their own video library and to showcase the Informedia client as a means of accessing it. We worked with ISI to specify the necessary hardware requirements as well as the logistics of a work flow between them and us to produce the library.
Early attempts were made at the beginning of the year to continue the previous years' interoperability experiments with either the same (NMIS/MIT) or other partners. We did find some interest on the part of the Princeton DVL, and met with one of their PIs, Wayne Wolf, to discuss some possibilities. While we have not yet begun any such partnership, we are organizing an interoperability workshop together (specifically for video libraries) at this year's ACM Multimedia Conference in November.
We began building RPC versions of the current database API as a strategy to make our library accessible to other clients. An accompanying RPC-based "library server" would communicate with clients linked against RPC-based versions of the calls already in use in our own client.
External Interactions
Visitors and Industry Contacts
- 2/6/96 Hewlett Packard. Jim Olson, General Manager, Video Communications Division.
- 2/8/96 ACOM. Col. Jim Wirth, USAF, ACOM Advanced Concept Technology Demo project leader; Maj. Paul Gilles, USMC, assistant project leader.
- 3/4/96 Corporation for Public Broadcasting. Maria Borges.
- 3/6/96 Hewlett Packard Computer Research Center. Dick Lampman, CRC Director; Denny Georg, Director of the Computer Systems Lab of CRC; Gary Herman, Director of the Broadband Information Systems Lab in CRC; Moise Zloof; Steven Rosenberg.
- 3/7/96 CNN America Incorporated. Frank Sesno, Washington News Bureau.
- 3/7/96 DARPA Headquarters, DC. Demo for members of Intelligence community.
- 3/13/96 Heinz Technology & Learning Forum Advisory Board. Pittsburgh PA.
- 3/15/96 Honeywell. Garry Nordenstam, Manager, Training Technologies.
- 3/21/96 Lawrence Livermore National Laboratory. Advanced Video Research Group.
- 3/96 Digital Equipment Corporation. Video Server Technology Group.
- 4/4/96 Office of Naval Research. Cmdr. Timothy D. Warren, Director, Automated Information Systems; William E. Smith, Sr. Research Physicist, Neural Network Development Lab.
- 4/8/96 Pixar. Ralph Guggenheim, Vice President, Feature Animation.
- 4/11/96 NBC News, News Archives. Dr. Richard S. Alben, Business Interface Planning Manager, GE Corporate R&D. NBC Interactive Media - Edmond P. Sanctis, Sr. Vice President & Executive Producer; David Britton, Director of Production; Julie Buchholz, Director/Sr. Producer Interactive Programming; Mark Kortekaas, Director Technical Operations; Eric F. Pohl, Principal Engineer, Recording Systems.
- 4/26/96 Global Field Consortium. Daimler Benz - Dieter Hege, Vice President, IT-Infrastructures; Xerox - Shirley Edwards, Malcolm Goslee.
- 4/96 Princeton Video Library project. Wayne Wolf, PI.
Public Presentations and Conference Papers
- 2/18/96 "Informedia: News-on-Demand Experiments in Speech Recognition," H. Wactlar, A. Hauptmann, M. Witbrock. In Proceedings of DARPA Speech Recognition Workshop. Arden House, Harriman, NY.
- 2/96 "Immersion into Visual Media: New Applications of Image Understanding," T. Kanade. In IEEE Expert Intelligent Systems and Their Applications, Vol. 11, No. 1, 1996, IEEE Computer Society, pp. 73-80.
- 3/96 "Video Skimming for Quick Browsing Based on Audio and Image Characterization," M. Smith, T. Kanade. In Proceedings of The Second Technical Conference on Telecommunications R&D in Massachusetts.
- 4/4/96 "Informedia Digital Video Library," H. Wactlar. Presentation and demonstration at University of Pittsburgh School of Library and Information Science.
- 4/29/96 "New and Emerging Technologies: What the Future will Bring," H. Wactlar. Presented at Ernest L. Boyer Technology Summit for Educators, sponsored by the Corporation for Public Broadcasting.
I certify that to the best of my knowledge (1) the statements herein (excluding scientific hypotheses and scientific opinions) are true and complete, and (2) the text and graphics in this report as well as any accompanying publications or other documents, unless otherwise indicated, are the original work of the signatories or individuals working under their supervision. I understand that the willful provision of false information or concealing a material fact in this report(s) or any other communication submitted to NSF is a criminal offense (U.S. Code, Title 18, Section 1011).
Howard D. Wactlar, Project Director, 08/02/96