The rise of Cultural Informatics
Gregory Crane
Professor of Classics
Winnick Family Chair of Technology and Entrepreneurship
Perseus Project
Tufts University
Perseus Project
DL development 1987-
Ancient Greco-Roman Culture
DLI-2: “A Digital Library for the Hum.”
Up through early 20th century
Calculatedly disparate collections
Production: www.perseus.tufts.edu
9 million pages/month, 85% Greco-Roman
Research: what characterizes cultural DLs?
Audience / Services / Content Model Triad
Cross-over: e.g. NSDL work
Cultural Informatics
Why not “Computational Humanities,” “Humanities Computing,” “Computing and the Humanities”?
Too confining
Textual and fine arts
Associations with canonical culture, esp. western
Cultural Informatics -- very broad
Challenging but important perspective
Cultural Informatics
Object of Study:
Geo-spatially open: All cultures of the world
Temporally open: cultures as evolving process
Past, present and future
Goals:
Analysis of cultures
Communication between cultures
Fundamental to world peace and prosperity
Cultural Informatics
Two domains of study
Knowledge as end product
Wissenschaft / Germanic model
Knowledge as means to an end …
But what end?
British tradition: Character formation
Traditional culture: cultivation of sensibilities
How important is cultural knowledge?

Is this a significant “end”?
Does culture matter?
Jerusalem
Kosovo
Baghdad
Mecca
Congo
Rwanda
Applications
Visualization:
Tracking anger against the US
(terrorism/national security)
Identifying cultural trends
(Marketing/trade)
Broad educational
Acquiring information, individual and comparative
Mapping Trends
Culture Matters!
F-Measures for Place Name Identification
Includes semantic classification and identification (which Springfield?)
Greco-Roman Sources: 95%
European Sources: 90%
US Sources: 80%!!!
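The slide reports F-measures without saying how they were computed. As a reminder of what the metric combines, here is a minimal sketch assuming the usual balanced F-measure (the harmonic mean of precision and recall). The function name `f_measure` and the counts are hypothetical illustrations, not data from the talk.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int, beta: float = 1.0) -> float:
    """Return the F-measure for a set of place-name decisions.

    true_pos:  place names found and resolved to the right place
    false_pos: strings wrongly tagged, or resolved to the wrong place
    false_neg: place names in the text that the system missed
    beta:      weight of recall relative to precision (1.0 = balanced F1)
    """
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts chosen only to illustrate the calculation.
score = f_measure(true_pos=90, false_pos=10, false_neg=10)
print(f"F1 = {score:.2f}")
```

Because the score penalizes both missed names and wrong resolutions, an ambiguous name such as "Springfield" can lower it even when the name itself is found.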
Applications
Customized knowledge support
What info do readers A vs. B need?
Backgrounds, purposes etc.
Documents: what am I reading?
Objects: what is this thing?
Spaces: where am I moving?
Audiences
Tourists and visitors
Peace-keepers and ground forces
Business
Cultural knowledge is a means for … ?
What am I looking at?
Cambridge Civil War Monument (1870)
Linking to other data
City Directories
Regimental Histories
Period Maps
Old Photographs
Cultural Knowledge
Instantaneous responses shape experience
Can you look up the point of a joke?
Externalized and objectified knowledge
FOUNDATION of modern society
Internalized and personalized knowledge
FOUNDATION of human consciousness
Cultural Knowledge
Internalized, evolving, and dynamic
Requires intensive knowledge acquisition
Cf. Running, finger exercises, practice, practice
Enhances every element of society
We see more where we look
We hear more when we listen
We feel more when we live
Cultural Knowledge
Localized and personal
Universal in human experience
Deeply intertwined with self and identity
Perceived as devalued in the West and US especially
This perception is a core security danger
Source for much (most?) strategic anti-US feeling
Cultural Informatics
Objective, scientific study of the processes whereby we promote enhanced subjective, emotional experiences in the world
Information --> knowledge --> wisdom
Fundamentally interdisciplinary
Complements humanities computing
Library Becomes Infrastructure
Moving through a neighborhood
When were these houses built? What is their style? Who lived here?
Moving through an ecosystem
What are the plants/animals?
What systems are in play?
Answers to every quantifiable question delivered in real time on the spot
Reading in a Democratic Society
Continuation of reading revolution
1760-1830, before and after
Now requires a cultural informatics
Includes but transcends textual materials
What is the point of health and prosperity?
Emerson’s American Scholar in the 21st century
System Input
Quantitative data -- easiest
States self-organize into databases (“Seeing like a state”)
Linguistic data -- hard
Minimally dozens, if not hundreds
Varying level of documentation
Cultural data -- hardest
Language/Culture clusters: thousands+
Cultural Informatics begins at the limits AND intersection of
manual analytic techniques
generic computational techniques
Cross-trained experts
Serve as connectors between specialists
Have intuitive understanding of not-yet-articulated possibilities from BOTH sides
Cultural Informatics
Aggregation and Visualization
Extraction from many examples
Quantified, targeted generalizations
Focus and customization
Start from a document/object/scene
Customized decision support
Yes/No decisions (~search)
Discursive analysis (~browsing)
How do we do it now? Or do we?
Players -- no real specialists
Faculty in higher education
Librarians
Think tanks
Intelligence Community
Broadcast media
Journalists and professional authors
How do we do it now? Or do we?
Computing and the Humanities
Focus on semi-passive analysis
Emphasis on publication
Social science & empirical data
How well do we work with heterogeneous data?
How well do we work with multiple languages?
Computer and Information Science
How far have we gone in document understanding?
Do we distinguish encyclopedic/semantic data?
How do we do it now? Or do we?
Cultural Grant Agencies: IMLS, NEA, NEH
Governmental libraries: LOC to public libraries
Governmental museums/sites: SI, NPS
Intelligence agencies: CIA, NSA, etc.
NSF: SBE, experiments with DLI, ITR
What do we need to do?
Provide new kinds of training
Cultural Informatics
As self-standing discipline?
As new specialty in History/Anthropology/Classics etc.
As new specialty within Computer Science
As logical extension of Library and Information Science
What do we need to do?
Core cultural informatics experts
50? Able to coordinate many different efforts
History/Information Science
Domain Specific experts
100s/1000s of experts in Area Studies/Lang Tech etc.
Build up to 100? Grad students/postdocs
Research support: $50m/year?
What do we need to do?
Create technological infrastructure
Broaden/expand the evaluation forums
More TREC/ACE/DUC/CLEF/SENSEVAL etc.
Build knowledge resources
Parallel corpora, lexica, portable heuristics
Focus on broad semantic as well as encyclopedic analysis
Homo ignavus (lat.) ~ “bad man” but …
What do we need to do?
First cut: 100 languages in five years
Allow $1,000,000/language: $100m
US Knowledge Sources through 1922 (public domain)
City directories, Census
Newspapers & Periodicals
Encyclopedias, school texts, manuals
Maps, gazetteers
Allow avg $1,000,000/year @ 300 years: $300m
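The two budget lines on this slide are simple products, and a minimal sketch makes the arithmetic explicit. The variable names are hypothetical. Reading "@ 300 years" as 300 years of source material, each costing an average of $1,000,000, is an assumption, and the combined total is computed here for illustration; it is not a figure stated on the slide.

```python
# Figures taken from the slide.
COST_PER_LANGUAGE = 1_000_000         # dollars allowed per language
LANGUAGES = 100                       # "First cut: 100 languages in five years"
COST_PER_YEAR_OF_SOURCES = 1_000_000  # average dollars per year of US sources
YEARS_OF_SOURCES = 300                # assumed: years of material up to 1922

language_budget = COST_PER_LANGUAGE * LANGUAGES
us_sources_budget = COST_PER_YEAR_OF_SOURCES * YEARS_OF_SOURCES

# The sum below is an illustration only, not a number from the talk.
combined = language_budget + us_sources_budget

print(f"Languages:  ${language_budget // 1_000_000}m")
print(f"US sources: ${us_sources_budget // 1_000_000}m")
print(f"Combined:   ${combined // 1_000_000}m")
```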
What do we need to do?
World peace and prosperity are the goal
What US agencies do what?
IMLS, NEH, NEA, LOC, SI, NPS all have roles
But much work must be situated in NSF
Cultural informatics includes scientific and engineering research
NSF should, at the least, incubate these aspects of cultural informatics
How do we know we are there?
Can dynamically plot cultural states across the globe from dozens of language/culture combinations
Can support reading/spatial exploration/object analysis customized for many different categories of user