The Newest AI Computing Tool: People

07/06/07
The problem of categorizing a fast-growing collection of documents on the Web may be solved by the individuals who use it on a daily basis, says a USC Viterbi computer scientist.
By Eric Mankin
ISI computer scientist Kristina Lerman shared her findings at a recent Stanford symposium.

Photo/Eric Mankin
A USC Information Sciences Institute researcher thinks she has found a new source of artificial intelligence computing power to solve difficult IT problems regarding information classification, reliability and meaning.

The tool, according to ISI computer scientist Kristina Lerman, is people: human intelligence at work on the social web, the network of blogs, photo and video-sharing sites, and other meeting places now involving hundreds of thousands of individuals recording observations and sharing opinions and information on a daily basis.

Lerman shared her recent work with others in the growing field of social information processing during a symposium at Stanford University.

She said that extracting “metadata” about transactions (who is talking to whom, who is listening, how conclusions are reached and how they spread) can help researchers deal with problems regarding documents: their accuracy, quality, categorization and terminology.

One benefit, according to Lerman, a research assistant professor in computer science at the USC Viterbi School of Engineering, is automatic determination of the semantics of content from one kind of metadata: tags.

Tags play a crucial role in a long-running project called the Semantic Web.

For nearly a decade, she noted, researchers sought a way to organize data so that someone searching for a specific kind of “check” would not have to weed out unwanted references to symbols, verification procedures, financial documents and political science theories.

Tagging seeks to eliminate ambiguities by affixing “tags,” computer labels that peel apart the multiple meanings of ordinary language into indicators of meaning used to guide computer searches.

But with natural language as complex as it is, making sense of tags is not easy. Attempts to manually attack the vocabulary and build in the intricate interconnections that signal different word meanings have proved to be frustrating.

Lerman hopes she’s onto another way, as hundreds of thousands of users are now online, chattering away on all kinds of topics. This volume of directed discourse provides a new way to extract meaning from tags: statistical models.

The process has been called folksonomy, or informal classification system. Unlike the traditional approach to the Semantic Web, in which a few knowledgeable professionals attempt to agree on a formal classification system which then will be used to annotate data, folksonomy emerges from collective tagging activities of many individuals.
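As a rough illustration of how a classification can emerge from many individual tagging acts, the sketch below pools tag assignments from several users and counts how often each tag is applied to each item. The user names, photo identifiers, tags and the `aggregate_tags` helper are all invented for this example; they are not taken from del.icio.us, Flickr or Lerman's work.

```python
from collections import Counter, defaultdict


def aggregate_tags(assignments):
    """Pool (user, resource, tag) triples into per-resource tag counts."""
    per_resource = defaultdict(Counter)
    for _user, resource, tag in assignments:
        per_resource[resource][tag.lower()] += 1
    return per_resource


# Hypothetical tagging activity by independent users (made-up data).
assignments = [
    ("alice", "photo_17", "beetle"),
    ("bob", "photo_17", "insect"),
    ("carol", "photo_17", "beetle"),
    ("dave", "photo_42", "beetle"),
    ("erin", "photo_42", "vw"),
    ("frank", "photo_42", "vw"),
    ("alice", "photo_42", "car"),
]

folksonomy = aggregate_tags(assignments)

# The most frequent tag per resource acts as an informal, crowd-chosen label.
top = {res: counts.most_common(1)[0][0] for res, counts in folksonomy.items()}
print(top)
```

No one in this toy example agrees on a vocabulary in advance; the labels come from counting what many people chose independently.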

New Web sites aimed at sharing information such as del.icio.us and Flickr organically grow ways for site members to access each other’s holdings. Typically, the members themselves spontaneously create a tagging system, encouraged by the site’s architecture.

The tags emerging from such systems, Lerman and collaborators have found, can be used for broader purposes.

One of Lerman’s initial tagging investigations used the photo-sharing site Flickr, analyzing results returned by a request for images of beetles, which included pictures of insects, Volkswagens and other entries.

By extracting the tags with which Flickr users had described the images and applying a mathematical technique called the expectation-maximization (EM) algorithm, Lerman found it possible to separate pictures of insects from pictures of cars returned by the “beetle” search.
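The article does not say which statistical model the EM algorithm was applied to. As a minimal sketch, assume each image is a bag of tags drawn from one of two hidden groups (a mixture of multinomials). The tag lists, the smoothing constant `alpha` and the seeding rule are all assumptions made for illustration, not details of Lerman's method.

```python
import math
from collections import Counter


def em_tag_clusters(images, n_iter=20, alpha=0.1):
    """EM for a two-component mixture of multinomials over tags (a sketch)."""
    vocab = sorted({t for tags in images for t in tags})
    k = 2

    def smoothed(counts, total):
        # Additive smoothing keeps every tag probability above zero.
        return {t: (alpha + counts.get(t, 0.0)) / (alpha * len(vocab) + total)
                for t in vocab}

    # Assumed deterministic start: seed one group with the first image and
    # the other with the image that shares the fewest tags with it.
    first = set(images[0])
    far = min(range(1, len(images)), key=lambda i: len(first & set(images[i])))
    theta = [smoothed(Counter(s), len(s)) for s in (images[0], images[far])]
    pi = [1.0 / k] * k
    resp = []

    for _ in range(n_iter):
        # E-step: posterior probability of each group for each image.
        resp = []
        for tags in images:
            logp = [math.log(pi[c]) + sum(math.log(theta[c][t]) for t in tags)
                    for c in range(k)]
            m = max(logp)
            w = [math.exp(lp - m) for lp in logp]
            z = sum(w)
            resp.append([x / z for x in w])

        # M-step: re-estimate group weights and per-group tag probabilities.
        pi = [sum(r[c] for r in resp) / len(images) for c in range(k)]
        theta = []
        for c in range(k):
            counts = Counter()
            total = 0.0
            for r, tags in zip(resp, images):
                for t in tags:
                    counts[t] += r[c]
                total += r[c] * len(tags)
            theta.append(smoothed(counts, total))

    return resp, theta


# Made-up tag lists standing in for results of a "beetle" search.
toy_images = [
    ["beetle", "insect", "macro", "nature"],
    ["beetle", "bug", "insect", "green"],
    ["beetle", "macro", "bug", "nature"],
    ["beetle", "vw", "car", "volkswagen"],
    ["beetle", "car", "vintage", "vw"],
    ["beetle", "volkswagen", "vintage", "car"],
]

resp, theta = em_tag_clusters(toy_images)
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
print(labels)
```

On this toy data the first three images should land in one group and the last three in the other, even though every image carries the ambiguous tag “beetle”. The tags that co-occur with it do the separating.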

Lerman has gone beyond tagging, using metadata to acquire more and more accurate information about the content of documents in social networking situations.

“The rise of the social media sites such as blogs, wikis, Digg and Flickr, among others, underscores the transformation of the Web to a participatory medium in which users are collaboratively creating, evaluating and distributing information,” wrote Lerman in a recent paper accepted for publication in Internet Computing, a journal produced by the Institute of Electrical and Electronics Engineers Inc.

“The innovations introduced by social media have led to a new paradigm for interacting with information, what we call ‘social information processing,’ ” she wrote.

In the paper, titled “Social Information Processing in Social News Aggregation,” Lerman showed by tracking stories over time “that social networks play an important role in document recommendation.” In addition to providing a platform for document recommendation, the social Web enables researchers to study collective user behavior quantitatively.

Lerman’s collaborators included ISI graduate students Anon Plangprasopchok and Chio Wong. The research was supported by grants from the National Science Foundation and the Defense Advanced Research Projects Agency.