Design and Evaluation Challenges
2004 Digital Library Colloquium Series
University of Pittsburgh-Carnegie Mellon University
April 16, 2004
Preview
Interplay between basic research; system development and evaluation; system operation and sustainability
Overview of Open Video DL as a system
Focus on user studies that have informed redesign and future systems, and contributed to our understanding of how people make sense of video
Top View
Digital video a burgeoning DL challenge
Substantial activity on storage, retrieval
Many large-scale DLs
Informedia, Físchlár, ECHO, Internet Archive Prelinger Collection, Open Video
Most attention on system/collection building
Commercial attention on system and management
IBM, MERL, Microsoft, Artesia, Virage
NIST TREC Video Track for retrieval evaluation
Crucial need for evaluation that includes human factors
Open Video Vision/Contributions
An open repository of video files that can be re-used in a variety of ways by the education and research communities
Encourages contributions
A testbed for interactive interfaces
An easy-to-use DL based upon the agile views interface design framework
Multiple, cascading, easy to control views (pre, over, re, shared, peripheral)
Views based upon empirically validated surrogates
An environment for building theory of human information interaction
A set of methods and metrics that reveal how people understand digital video through surrogates
Background & Status
Begun 1995 with colleagues at UMD & BCPS
Current funding: NSF# IIS-0099538
Collaborators/Contributors: I2-DSI, ibiblio, CMU, UMD, NIST, Internet Archive, NASA, CHI community
~2000+ video segments
~1400 different titles
~24000 unique visitors per month (March 04)
~3,000,000 hits/month (March 04)
I2-DSI video channel
MPEG-1, MPEG-2, MPEG-4, QT
OAI provider
Ongoing user studies
Open Video As a System
Backend Tools and Services
Workstations, servers, disk arrays
Tape players (VHS, Beta SP, PAL), digitization boards (e.g., Broadway), and software for AVI/MOV to MPEG-1, MPEG-2, and QuickTime (Media Cleaner, Adobe Premiere, Final Cut Pro)
Bandwidth (UNC-CH switched ethernet)
Linux OS, PHP scripting language, MySQL DBMS, Apache server
Backend Tools and Services (cont’d)
Merit (UMCP UMIACS), ported to Linux to extract candidate keyframes
Speech to text (e.g., Sphinx at CMU)
VAST keyframe/posterframe extraction, selection, and management
Transaction logs and scripts (for evaluation and for recommenders)
Peer to peer exchange
ISEE (shared remote video use, e.g., DE)
Indexer workstation (VIVO)
Tools and Services for User Studies
Database driven web pages for user interaction
Usability workstation (multiple camera, mixer, VCR)
eye tracking system
Speech synthesis (for audio keywords)
Java and Perl scripts for managing, moving files, managing server (security, upgrades, etc.)
Agile Views Interface Research
Provide a variety of access representations (e.g., indexes) and control mechanisms
Usual search and browse capabilities
Leverage both visual and linguistic cues
Create and test surrogates for overview, preview, shared, and history views
Digital Video Surrogates
Classes
Textual
Visual
Audio
Cost benefit analysis: maximize ‘meaning’ per unit time
Transmission time
Compaction rate
Cognitive processing time
Performance vs. Preference
Research Framework
Surrogates Examined
Storyboard with text keywords (20-36 per board@ 500 ms)
Storyboard with audio keywords
Slide show with text keywords (250ms repeated once)
Slide show with audio keywords
Fast forwards 32X, 64X, 128X, 256X
Poster frames (1-3)
Real time clips/excerpts (7 sec)
Text
Visual features (e.g., in/out, people, etc.)
Surrogate Examples
Metrics
User Studies
Qualitative Comparison of Surrogates (Spring 02, ECDL 02)
Fast Forwards (Fall 02, JCDL 03)
Text or Pictures (Spring 03, CIVR 03)
Narrativity (CHI 02, ASIST 03)
Shared views and History Views (Geisler dissertation)
TREC evaluation (Spring/summer 03)
ViSOR (Gruss Master’s paper)
Look vs Read (Hughes Master’s paper)
Current studies
Exploratory Study to Constrain Surrogate Design Space (Spring 02)
What are the strengths and weaknesses of different surrogates from the users’ perspective?
Are any of the surrogates better than the others in supporting user performance?
The Surrogates
Storyboard with text keywords (20-36 per board@ 500 ms)
Storyboard with audio keywords
Slide show with text keywords (250ms repeated once)
Slide show with audio keywords
Fast forward (~ 4X)
Method
7 video segments (2-10 min), 5 surrogates created for each
10 subjects with high video and computer experience
Three phases (all multi-camera videotaped)
View full video then use 3 surrogates, repeat
Participant observation and debriefing
Do NOT view full video, use 3 surrogates, repeat
Participant observation and debriefing
Complete 3 assigned tasks with surrogates of choice
Think aloud and debriefing
Tasks
Gist determination—free text
Gist determination—multiple choice
Object recognition—textual
Object recognition—graphical
Action recognition (2-3 second clips)
Visual gist (predict which frames belong)
Performance
No statistically reliable difference (SRD) on gist (both free text and multiple choice)
SRD on action recognition favoring ff
‘Near’ SRD on text object recognition favoring SB/w audio keywords
4:1 to 29:1 compaction rates suitable for tasks
Psychometric and face validity support for the tasks (means and variances; relevant to real tasks)
SRD in gist and visual gist for one video
→ Homogeneity of frames diminishes surrogate value
→ Keywords help when visual variability decreases
Qualitative Results
Subjects suggested different surrogates for different tasks (e.g., ff for judging kid safe, sb for identifying images, ff for video styles)
Three senses of gist
Topic (T)
Narrativity (N)
T+N+visual style
Individual preferences and experiences influence surrogate effectiveness
Fast Forward Study (Fall 02)
How fast can we make fast forwards?
4 ff conditions (32X, 64X, 128X, 256X)
Four video segments for each condition
45 subjects (1/2 UG, 1/2 grad, 2/3 female)
6 tasks (full text gist, multiple choice gist, word object recognition, graphical object recognition, action recognition, visual gist)
Counterbalance speed and videos
Web-driven experimental condition, 3-camera video tapes, single subject at a time in usability laboratory
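The slides do not say how speed and videos were counterbalanced. One common approach is a cyclic Latin square, sketched below under that assumption. The video labels A-D and the `assignment` helper are hypothetical names used only for illustration.

```python
def latin_square(items):
    """Cyclic Latin square: row i is `items` rotated left by i positions."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]


speeds = ["32X", "64X", "128X", "256X"]   # the four ff conditions
videos = ["A", "B", "C", "D"]             # hypothetical video labels

square = latin_square(speeds)


def assignment(participant_index):
    """Pair each video with one speed for a given participant.

    Participants cycle through the rows of the square, so across every
    block of four participants each video is seen once at each speed.
    """
    row = square[participant_index % len(square)]
    return dict(zip(videos, row))


if __name__ == "__main__":
    for p in range(4):
        print(p, assignment(p))
```

With 45 subjects the rows cannot be used equally often (45 is not a multiple of 4), so a real design would need to decide how to handle the remainder.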
Sample A. 9:19 at 32X
Sample B. 19:48 at 64X
Sample C. 14:00 at 128X
Sample D. 14:09 at 256X
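A rough sense of how long each fast-forward sample takes to watch can be worked out from the figures above. This sketch assumes the listed times are the full-length durations (mm:ss) of the source videos and that an NX fast forward plays in 1/N of the original time. Neither assumption is stated on the slides.

```python
def to_seconds(mmss):
    """Convert a 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def surrogate_seconds(mmss, speedup):
    """Viewing time of a fast forward, assuming it runs at 1/speedup of full length."""
    return to_seconds(mmss) / speedup


# Durations and speeds as listed for Samples A-D.
samples = {
    "A": ("9:19", 32),
    "B": ("19:48", 64),
    "C": ("14:00", 128),
    "D": ("14:09", 256),
}

if __name__ == "__main__":
    for label, (length, speedup) in samples.items():
        secs = surrogate_seconds(length, speedup)
        print(f"Sample {label}: {secs:.1f} s at {speedup}X")
```

Under these assumptions the four samples run for roughly 17.5 s, 18.6 s, 6.6 s, and 3.3 s, so the speedup factor doubles as the compaction rate.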
Example Image Recognition Stimulus
Results
SRD on 4 of 6 tasks as speed increases; however, performance remained reasonable even at the highest rate
Video content/genre interacts with performance
Preference does not parallel performance (people can perform well under extreme conditions but do not like/enjoy)
No user characteristic differences (age, sex)
→ Give users control but select appropriate defaults
Caveat: controlled, independent focus on FF, likely a lower bound on performance
Speed Effects on Performance
Text or Pictures? (Spring 03)
Research Questions:
Given both textual and visual metadata, which surrogate will be used and which will be preferred?
Does the placement of the surrogates affect how they are used?
Does the assigned task affect how surrogates are used?
Does personal preference play a role in how surrogates are used?
Study Methods / Procedures
12 undergraduate students (paid volunteers)
Pre-Study questionnaire
Demographics
Visual vs. Verbal learning style (VVQ)
10 search problems
Counter-balanced
Design 1 and 2
1 : text on left / visuals on right
2 : visuals on left / text on right
Eyetracking
Post-study questionnaire
Follow up questions
Results
All participants over all tasks:
Mean time looking at text = 29.7 sec.
Mean time looking at pics = 6.8 sec.
75% of fixations over text
18% of fixations over pics
First fixations over text = 65
First fixations over pics = 54
Text requires and gets more user attention
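Measures like the fixation shares and looking times above are typically derived by assigning each fixation to an area of interest (AOI) on the page. The sketch below shows one way to do that. The AOI rectangles, the `(x, y, duration_ms)` record format, and the toy fixations are all made up for illustration and are not the study's data or its actual analysis pipeline.

```python
from collections import Counter

# Hypothetical AOIs as (left, top, right, bottom) in pixels,
# laid out like Design 1 (text on left, visuals on right).
AOIS = {"text": (0, 0, 400, 600), "pics": (400, 0, 800, 600)}


def classify(x, y, aois=AOIS):
    """Return the name of the AOI containing the point, or 'other'."""
    for name, (left, top, right, bottom) in aois.items():
        if left <= x < right and top <= y < bottom:
            return name
    return "other"


def fixation_shares(fixations):
    """Fraction of fixations that fall in each AOI."""
    counts = Counter(classify(x, y) for x, y, _ in fixations)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def dwell_time_ms(fixations):
    """Total fixation duration (ms) accumulated in each AOI."""
    totals = Counter()
    for x, y, duration_ms in fixations:
        totals[classify(x, y)] += duration_ms
    return dict(totals)


# Toy fixations (x, y, duration_ms); invented values, not study data.
toy = [(100, 50, 300), (120, 80, 250), (500, 100, 200), (900, 700, 100)]

if __name__ == "__main__":
    print(fixation_shares(toy))
    print(dwell_time_ms(toy))
```

Fixations that land outside every AOI go to an "other" bucket, which is one reason the text and picture shares need not sum to 100%.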
Results cont’d
Design 1 vs. Design 2
When text was placed on the left, mean time per fixation was slightly higher
VVQ
Balanced group spent more time looking at text
Tasks
Varied by task:
Time spent looking at text
Time spent per fixation over text
Frequency of fixations over text
Screen Shots
Tasks
Please find a video that discusses the destruction earthquakes can do to buildings. These search results are from a search on the word “Earthquake”.
Please find a video that discusses nurses and their contributions to the United States Army.  These search results are from a search on the word “Work”.
Please choose a video from the following list that you think would be entertaining for you and your friends to watch.
Discussion
In this restricted situation (i.e. pre-formulated results page) participants used text as the main anchor point
Because text is a better surrogate?
Because text contains more information?
Because text is more familiar to people?
Because tasks directed users to text?
Text or Pictures?
Text was reported as:
Being the search anchor
Containing significant topical information
Taking longer to read than pictures
Visuals were reported as:
Being globally liked
Being used to quickly narrow down choices
Taking less time to decode than text
All participants said the results page would be weaker without them
Often lacking in reference points
Conclusion
Visual metadata was used to make (or confirm?) relevance judgments
Combination of visual & verbal stronger than one or the other
Generalize with caution:
Small number of study participants
Specific set of search results pages
Ten specific search tasks.
Narrativity Study (CHI 02)
Walk-up kiosk at CHI, used by 20 people
20 one-minute clips (half b&w, no audio) selected on 2 criteria: contain characters, have cause/effect relations between scenes (5 in each category)
SRD on characters, cause/effect, and their interaction
Shared Views and History Views Studies (02-03)
Evaluate AV Design Framework by instantiating and evaluating a design
Shared (based on recommendations) and History Views (based on logs)
Phase 1: compare OV to agile views (AV) interface (28 participants). OV better on accuracy; no SRD on time, but a learning effect; AV better on navigation/efficiency and on satisfaction
Phase 2: qualitative analysis of shared and history views
VisOR study (Fall 03)
Interface effects of automatically extracted features (TREC 02 features); 17 subjects each doing 14 search tasks
Sliders to adjust weights of different features did not affect performance
Keywords, indoors/outdoors and cityscape/landscape most useful
Use of color and brightness helped with exact match searches
General satisfaction with using different features
Look vs Read Study (Sp 03)
Twelve subjects think aloud while viewing results pages for five search tasks with text (titles, descriptions) or visual (3 keyframes, storyboard) surrogates
Surrogates used differently depending on task; neither primary with considerable switching and combining (e.g., find airplane, most used visual first)
Time a factor in deciding which to use and when
TREC 03 Study
Compare transcript only, feature only, and combined surrogates with 36 subjects
No SRD in precision across the 3 surrogates; transcript-only and combined yielded statistically reliably higher recall in less time, and greater satisfaction.
Current Studies
Relative value of surrogates in context
Four sets of surrogates (ff, sb, excerpt, combined) compared (Spring 04)
Mu dissertation: cognitive load effects on collaborative learning with video (ISEE)
Investigation of tasks
Yang dissertation: how do people make relevance judgments about video?
Take Away Summary
User studies inform good design
Give people multiple views and easy control mechanisms
No silver bullets (many factors determine performance and preference)
Video offers new kinds of potentials for learning and communication