IBM is betting $100 million that it can whup Google in the search engine realm.
Its new project, dubbed WebFoundation, is an intelligent database that can separate
the wheat from the chaff on the Internet. Its algorithms check for accuracy,
truthfulness, and popularity, translate languages, compare prices, track chat rooms
and more, running on a cluster of 30 dual Xeon processors with 160 TBytes of disk storage.
IBM plans to sell data such as your company's public reputation, as gleaned
from newspapers, TV, radio transcripts, magazines, etc., for around $150k. A commercial
service, Factavia, will launch WebFoundation's capabilities in mid-2004. What has
it learned so far? 30% of the web is porn, and 30% is duplicated data. There are
50 million new or changed pages every day, and 65% of web pages are now written in English,
but by 2010, English will be in the minority.