Tuesday, March 4, 2008

eTech - Practice Makes Perfect

Presentation by Peter Norvig of Google
How billions of examples lead to better models of images and text

How things are traditionally figured out:
Look at the world, think about the data, and come up with a model that describes the world.
The problem is that this is hard, and the model will be wrong.

Instead, let the data do the work.
Computing power makes more complex algorithms possible, because we can cheaply test many candidate algorithms and weed out the bad ones to find the good ones. Example: image resizing.

More data also makes this possible. Example: scene completion.

For finding similar images, do a keyword search, look at the resulting images, and use an algorithm to find similarities among the photos. Use Eigenface and SIFT features to find commonalities in the images. Then rank the found images by what links to what, not by how often they are linked.
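The notes do not say which ranking algorithm was used. As a rough sketch only, ranking "by what links to what" can be read as a PageRank-style score over a graph of images, where a link from a highly ranked image counts for more than several links from obscure ones. The function name, the damping value, and the image names below are all hypothetical.

```python
def rank_by_links(links, damping=0.85, iterations=50):
    """Rank nodes with a PageRank-style power iteration (illustrative assumption).

    links maps each image to the list of images it links to (or is similar to).
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline share regardless of links.
        new = {v: (1 - damping) / n for v in nodes}
        for src in nodes:
            targets = links.get(src, [])
            if targets:
                # A node passes its score on, split evenly among its targets.
                share = damping * score[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A node with no outgoing links spreads its score over everyone.
                share = damping * score[src] / n
                for v in nodes:
                    new[v] += share
        score = new
    return sorted(nodes, key=lambda v: score[v], reverse=True)


# Hypothetical graph: "d" has two inbound links, "star" has only one,
# but star's single link comes from the well-connected "hub".
links = {
    "a": ["hub", "d"],
    "b": ["hub", "d"],
    "c": ["hub"],
    "d": ["hub"],
    "hub": ["star"],
    "star": ["hub"],
}
ranking = rank_by_links(links)
print(ranking[:3])
```

In this toy graph "star" ends up above "d" even though "d" is linked more often, which is the distinction the notes draw between what links to what and how often something is linked.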

For text, grep the data to find words that appear in proximity, or look in structured data, and use probabilistic models to guess the most probable answer. Example: Google Sets.

Engineers later dropped the probabilistic model in favor of a linear model. They have moved away from something they can prove and toward something they can observe working.
They have optimized for translating news.

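Bayesian: we want argmax_c P(c|w), but we model it as argmax_c P(w|c) P(c). Here w is the word as typed and c is a candidate correction. The two are equivalent by Bayes' rule, because P(w) is the same for every c and can be dropped.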
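To make the formula concrete, here is a minimal sketch of a spelling corrector built on it, where w is the word as typed and c is a candidate correction. The prior P(c) comes from word counts in a tiny made-up corpus, and the error model P(w|c) is a crude assumption (0.95 for an exact match, 0.05 for anything one edit away). None of these numbers or names come from the talk.

```python
from collections import Counter

# Toy corpus standing in for a large text collection (assumption).
CORPUS = "the cat sat on the mat the cat ate the hat".split()
WORD_COUNTS = Counter(CORPUS)
TOTAL = sum(WORD_COUNTS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def prior(c):
    """P(c): how often the candidate word appears in the corpus."""
    return WORD_COUNTS[c] / TOTAL


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + ch + b[1:] for a, b in splits if b for ch in LETTERS]
    inserts = [a + ch + b for a, b in splits for ch in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def likelihood(w, c):
    """P(w|c): hypothetical error model with made-up probabilities."""
    if w == c:
        return 0.95
    if w in edits1(c):
        return 0.05
    return 0.0


def correct(w):
    """Return argmax_c P(w|c) P(c) over known words, or w if none is close."""
    candidates = [c for c in WORD_COUNTS if likelihood(w, c) > 0]
    if not candidates:
        return w
    return max(candidates, key=lambda c: likelihood(w, c) * prior(c))


print(correct("cta"), correct("thw"), correct("xat"))
```

Splitting the problem this way keeps the two questions separate: how common a word is (the prior) and how likely a given typo is (the error model). Each can be estimated from its own data.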

see: How to build a spell checker
