Carnegie Mellon University, School of Computer Science, 5000 Forbes Avenue, Pittsburgh, PA 15213, informedia@cs.cmu.edu
Project Description
This project began in 1997 and was completed in September 2000.
The Informedia Experience-on-Demand Project (EOD) develops tools, techniques and systems allowing people to capture a record of their experiences unobtrusively, and share them in collaborative settings spanning both time and space. Users may range from rescue workers carrying personalized information systems in operational situations to remote crisis managers in coordinating roles. Personal EOD units record audio, video, Global Positioning System (GPS) spatial information, and other sensory data, which can be annotated by human participants. The EOD environment synthesizes data from many EOD units into a "collective experience": a global perspective of ongoing and archived personal experiences. Distributed collaborators can be brought together over time and space to share meaning and perspectives.
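One way to picture the synthesis of many units into a collective view is as a query over time- and location-stamped records. The following is a minimal sketch under stated assumptions, not the project's implementation: the `Observation` schema, the `collective_view` function, and the idea of filtering by a radius and a time window are all hypothetical illustrations. Only the haversine great-circle formula is standard.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass(frozen=True)
class Observation:
    """One time- and location-stamped record from a single EOD unit (hypothetical schema)."""
    unit_id: str
    timestamp: float  # seconds since an epoch shared by all units (assumption)
    lat: float        # degrees
    lon: float        # degrees
    note: str         # stand-in for a pointer to audio/video or an annotation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def collective_view(observations, lat, lon, radius_m, t_start, t_end):
    """Return records from all units near (lat, lon) within a time window, oldest first."""
    hits = [
        o for o in observations
        if t_start <= o.timestamp <= t_end
        and haversine_m(o.lat, o.lon, lat, lon) <= radius_m
    ]
    return sorted(hits, key=lambda o: o.timestamp)
```

Calling `collective_view` with a point, a radius, and a time window returns the matching records from every unit in chronological order, which is one simple reading of sharing experiences "over time and space".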
The foundation for this work, the Informedia Digital Video Library Project [Christel et al. 1996], has demonstrated the applicability of speech, image, and natural language processing in automatically creating a rich, searchable multimedia information resource holding over 1000 hours of video. We have built a prototype EOD system that builds on these technologies by addressing continuously captured, unstructured, unedited video in which location data is added as another information dimension. This system will be discussed in the context of gathering personal experiences in the Pittsburgh, PA area, querying and displaying those results geographically, and combining multiple views into common perspectives.
Multimedia Personal Experiences
We have experimented with portable microphones, cameras, and GPS units wired into a wearable vest and a hat for capturing audio and video with corresponding time and location in the field. We have collected over 40 hours for our initial trials. We have not focused on improving the form factor of the recording devices to make them less cumbersome and visible; instead, we have concentrated on determining whether the quality of the resulting audio and video is suitable for subsequent automatic processing. We found that microphone type and placement with respect to the speaker greatly affected the accuracy of follow-up speech recognition, with usable results of 10% word error rate or less achievable from mobile talkers. Through the use of high-accuracy GPS and digital cameras, we could reliably recreate panoramic views from various perspectives. Field experiences captured in audio and video as a form of personal memory are hence suitable for subsequent processing and use in collaborative settings.
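For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard calculation. The function name and the example transcripts are made up for illustration, and the text does not say how the project's own scoring was performed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, deletions, insertions) over reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented ten-word transcript with one substituted word: WER is 1/10.
reference = "the team reached the bridge at noon and made camp"
hypothesis = "the team reached a bridge at noon and made camp"
print(word_error_rate(reference, hypothesis))  # prints 0.1
```

Under this definition, a 10% word error rate corresponds to roughly one word in ten being substituted, dropped, or inserted relative to the reference.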
Similarly, we are modifying Informedia image processing modules to better work with field-captured motion video. Our current object detectors, for recognizing and matching faces and overlaid text, work well on broadcast news given certain assumptions, such as a well-lit face looking directly at the camera. These assumptions are less likely to be met with field video, and so we are investigating more robust techniques for object detection within video having varying shades of lighting and where the object of interest may appear at varying resolutions. One example is a face detector that will recognize profiles as well as full frontal shots of faces [Schneiderman and Kanade 1998]. A longer-term goal is to extend these image processing modules so that detection is coherent over time, enabling object tracking.
Synthesis of Personal Experience Data
A longer-term goal is to enable the user to direct which interesting events should be automatically flagged in the data as requiring further inspection. Gong and his colleagues at Singapore have developed a Scene Description Language (SDL) for video [Gong et al. 1995]. The SDL can be extended into an advanced experience parser. A user directs the image processing analysis via the SDL: which video features to focus on (such as images of flying birds), the events to be tracked (the period when a bird is in a particular area), and the changes to be monitored (when a bird enters the field of view for a particular EOD unit). When user-defined significant events are detected, alerts are communicated across the EOD environment.
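The idea of user-directed monitoring can be sketched as a rule applied to a stream of detections. The code below is not SDL and does not reflect its syntax; it is a hypothetical illustration in which `Detection`, `Rule`, and `entry_alerts` are invented names, and detections are assumed to arrive from some upstream image processing module. It raises an alert each time a watched object enters a region in a given unit's view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object in a frame (output of a hypothetical upstream detector)."""
    label: str  # e.g. "bird"
    x: float    # normalized image coordinates in [0, 1]
    y: float


@dataclass(frozen=True)
class Rule:
    """User-specified interest: a feature label and an image region to watch."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def matches(self, d: Detection) -> bool:
        return (
            d.label == self.label
            and self.x_min <= d.x <= self.x_max
            and self.y_min <= d.y <= self.y_max
        )


def entry_alerts(rule, frames):
    """Yield (unit_id, frame_time) each time a matching object enters the region.

    `frames` is an iterable of (unit_id, frame_time, detections) tuples,
    assumed to be in time order for each unit.
    """
    inside = {}  # unit_id -> whether the previous frame from that unit matched
    for unit_id, frame_time, detections in frames:
        now = any(rule.matches(d) for d in detections)
        if now and not inside.get(unit_id, False):
            yield unit_id, frame_time
        inside[unit_id] = now
```

Tracking only the transition from "absent" to "present" per unit keeps a bird that lingers in view from triggering an alert on every frame. A fuller design would also report when the object leaves, giving the period during which the event holds.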
The Informedia Digital Video Library Project defined numerous abstractions for structured broadcast-quality video, including text titles, thumbnail images, filmstrips, and skims [Wactlar et al. 1996, Christel et al. 1998]. These abstractions are now being extended to address the unbounded continuous nature of experience video, and to move from words as the primary information and indexing source to audio/image interdependence. Our goal for EOD interfaces is to allow the information to be quickly and effectively accessed, queried, viewed, abstracted, navigated, summarized, and annotated along dimensions of time, space, and user perspective.
Conclusion
Experience-on-Demand addresses collaboration and summarization of multiple simultaneous information generators integrated across people, time, and space. A wealth of information will be collected through the Informedia EOD environment. This data, collectively referred to as a "personal experience," has potential value in the following situations:
The data is by nature voluminous in size yet sparse in information content, with tremendous redundancy along the temporal and spatial dimensions and across points of view. To help analysts rapidly and effectively derive the relevant meaning from this large body of data, the EOD environment is being developed to support intelligent information analysis, organization, and manipulation techniques.
Acknowledgements
This material is based on work supported by the Defense Advanced Research Projects Agency and NRaD under contract number N66001-97-C-8517. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DARPA, the Office of Naval Research, or the U.S. Government. Special thanks go to Bryan Maher, Mark Dambacher and Ricky Houghton for their efforts with the initial EOD trials.
References
CHRISTEL, M., STEVENS, S., KANADE, T., MAULDIN, M., REDDY, R., and WACTLAR, H. Techniques for the Creation and Exploration of Digital Video Libraries. Multimedia Tools and Applications, B. Furht, ed. Boston, MA: Kluwer Academic Publishers, 1996, Chapter 8.
CHRISTEL, M., SMITH, M., TAYLOR, C.R., and WINKLER, D. Evolving Video Skims into Useful Multimedia Abstractions. Proc. ACM CHI '98 (April 1998), 171-178.
FURNAS, G. and BEDERSON, B. Space-Scale Diagrams: Understanding Multiscale Interfaces. Proc. ACM CHI '95 (May 1995), 234-241.
GONG, Y., SIN, L.T., CHUAN, H.C., ZHANG, H.J., and SAKAUCHI, M. Automatic Parsing of TV Soccer Programs. Proc. 2nd IEEE Conf. Multimedia Computing and Systems (May 1995), 167-174.
ROTH, S., LUCAS, P., SENN, J., GOMBERG, C., BURKS, M., STROFFOLINO, P., KOLOJEJCHICK, J., and DUNMIRE, C. Visage: A User Interface Environment for Exploring Information. Proc. IEEE Information Visualization (October 1996), 3-12.
SCHAFFER, D., ZUO, Z., GREENBERG, S., BARTRAM, L., DILL, J., DUBS, S., and ROSEMAN, M. Navigating Hierarchically Clustered Networks through Fisheye and Full-Zoom Methods. ACM Trans. Computer-Human Interaction 3 (1996), 162-188.
SCHNEIDERMAN, H. and KANADE, T. Probabilistic Modeling of Local Appearance and Spatial Relationships for Object Recognition. Proc. IEEE Computer Vision and Pattern Recognition (CVPR) (June 1998).
WACTLAR, H., KANADE, T., SMITH, M., and STEVENS, S. Intelligent Access to Digital Video: Informedia Project. IEEE Computer 29, 5 (1996), 46-52.
WITBROCK, M. and HAUPTMANN, A. Using words and phonetic strings for efficient information retrieval from imperfectly transcribed spoken documents. Proc. ACM Digital Libraries '97 (July 1997), 30-35.