Accomplishments
Content-searchable, speech-transcribed broadcast news stories
High-accuracy, fast speech recognition with CMU’s Sphinx system
- 2-3xRT (two to three times real time), 34% word error rate (WER) on broadcast news stories
- 19% WER with an “LM du jour”, a daily language model built from news web sources
Query by language, image and maps
Multilingual integration (Spanish and Serbo-Croatian corpora)
- Query translation and search in the target language; English-language topics and rough translations
Geo-coded content with map display
Face detection and search
Video OCR index and search
Text summaries through topics and titles
Visual summaries through key-icons, filmstrips and dynamic video skims
Visualization and manipulation of very large result sets
Spoken or typed annotations that are immediately searchable
Cut-and-paste movie clips for briefing preparation (PowerPoint or HTML)
Large, extensible SQL database
- Over 1,500 hours with 40,000 news segments dating back to 1996; 1.5 terabytes of data
Daily processing at 2xRT to produce the first searchable data in the database
Empirically validated user interface design
Full information retrieval capabilities:
- Boolean or vector-space search engine with stop-word removal, synonyms, stemming, and field-specific search
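The speech recognition figures in the list (34% and 19%) are word error rates. WER is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words: (substitutions + deletions + insertions) / N. The sketch below is a minimal version of that standard definition. The text normalization and scoring tool behind the reported numbers are not stated, so the `word_error_rate` helper and its whitespace tokenization are assumptions for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum word edits turning ref[:i] into hyp[:j] (Levenshtein table)
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution (or match when sub == 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))  # → 0.33
```

Because insertions are counted against the reference length, WER can exceed 100% when a recognizer emits many spurious words.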
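Dividing the archive figures in the list gives rough per-hour and per-segment averages. Because the collection is "over" 1,500 hours, these are approximations; they also assume decimal units (1 TB = 1,000 GB = 1,000,000 MB).

```latex
\[
\frac{1.5\,\text{TB}}{1500\,\text{h}} \approx 1\,\text{GB/h},
\qquad
\frac{1500\,\text{h} \times 60\,\text{min/h}}{40\,000\ \text{segments}} \approx 2.25\,\text{min/segment},
\qquad
\frac{1.5\,\text{TB}}{40\,000\ \text{segments}} \approx 37.5\,\text{MB/segment}
\]
```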
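The list does not say how the search engine was built. The sketch below is a toy, in-memory illustration of the listed retrieval features: Boolean AND matching, vector-space ranking (TF-IDF weights with cosine similarity), stop-word removal, synonym mapping, stemming, and field-specific search. The stop list, the synonym table, the crude suffix-stripping `stem` function, and the `TinyIndex` class are all hypothetical. A production system would use a proper stemmer (for example, Porter's) and an inverted index rather than scanning every document.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical analysis settings; the real system's lists are not documented here.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is"}
SYNONYMS = {"film": "movie", "vote": "election"}  # stemmed variant -> canonical term


def stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, stem, then map synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(stem(t), stem(t)) for t in tokens if t not in STOP_WORDS]


class TinyIndex:
    """Toy in-memory index over documents with named fields (e.g. title, transcript)."""

    def __init__(self, docs: dict[str, dict[str, str]]):
        # field_terms[doc_id][field] -> Counter of analyzed terms
        self.field_terms = {
            doc_id: {name: Counter(analyze(text)) for name, text in fields.items()}
            for doc_id, fields in docs.items()
        }

    def _terms(self, doc_id: str, field: str | None) -> Counter:
        """Term counts for one field, or summed over all fields when field is None."""
        fields = self.field_terms[doc_id]
        if field is not None:
            return fields.get(field, Counter())
        total = Counter()
        for counts in fields.values():
            total.update(counts)
        return total

    def boolean_and(self, query: str, field: str | None = None) -> set[str]:
        """Ids of documents containing every analyzed query term."""
        terms = analyze(query)
        return {d for d in self.field_terms if all(self._terms(d, field)[t] > 0 for t in terms)}

    def vector_search(self, query: str, field: str | None = None) -> list[tuple[str, float]]:
        """Documents ranked by cosine similarity of TF-IDF vectors (positive scores only)."""
        doc_counts = {d: self._terms(d, field) for d in self.field_terms}
        n = len(doc_counts)
        df = Counter(t for counts in doc_counts.values() for t in counts)  # document frequency
        idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # +1 keeps ubiquitous terms nonzero

        def weigh(counts: Counter) -> dict[str, float]:
            return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

        q = weigh(Counter(analyze(query)))
        q_norm = math.sqrt(sum(w * w for w in q.values()))
        ranked = []
        for d, counts in doc_counts.items():
            v = weigh(counts)
            v_norm = math.sqrt(sum(w * w for w in v.values()))
            dot = sum(w * v.get(t, 0.0) for t, w in q.items())
            if dot > 0 and v_norm > 0 and q_norm > 0:
                ranked.append((d, dot / (v_norm * q_norm)))
        return sorted(ranked, key=lambda pair: pair[1], reverse=True)


docs = {
    "s1": {"title": "Election results", "transcript": "Voters went to the polls as votes were counted"},
    "s2": {"title": "Film festival opens", "transcript": "Movies from many countries are showing"},
}
index = TinyIndex(docs)
print(index.boolean_and("vote"))                  # → {'s1'}  ("vote" maps to "election")
print(index.boolean_and("films", field="title"))  # → {'s2'}  ("films" -> "film" -> "movie")
print(index.vector_search("election"))
```

Applying synonym mapping after stemming lets inflected variants ("films", "votes") reach the same canonical term as their base forms.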