Algorithms of the Intelligent Web
Haralambos Marmanis and Dmitry Babenko
  • May 2009
  • ISBN 9781933988665
  • 368 pages
  • printed in black & white

Unequivocally outstanding--this is the best technical book I have read all year.

Robert Hanson, Quality Technology Services


Algorithms of the Intelligent Web, Second Edition is now available. An eBook of this older edition is included at no additional cost when you buy the revised edition!

A limited number of pBook copies of this edition are still available. Please contact Manning Support to inquire about purchasing previous edition copies.

Web 2.0 applications are best known for providing a rich user experience, but the parts you can't see are just as important and impressive. Many Web 2.0 applications use powerful techniques to process information intelligently and offer features based on patterns and relationships in the data that couldn't be discovered manually. Successful examples of these Algorithms of the Intelligent Web include household names like Google AdSense, Netflix, and Amazon. These applications use the internet as a platform that not only gathers data at an ever-increasing pace but also systematically transforms the raw data into actionable information.

Algorithms of the Intelligent Web is an example-driven blueprint for creating applications that collect, analyze, and act on the massive quantities of data users leave in their wake as they use the web. You'll learn how to build Amazon- and Netflix-style recommendation engines, and how the same techniques apply to matching people on social-networking sites. See how click-trace analysis can result in smarter ad rotations. With a plethora of examples and extensive detail, this book shows you how to build Web 2.0 applications that are as smart as your users.

Table of Contents



about this book

1. What is the intelligent web?

1.1. Examples of intelligent web applications

1.2. Basic elements of intelligent applications

1.3. What applications can benefit from intelligence?

1.4. How can I build intelligence in my own application?

1.5. Machine learning, data mining, and all that

1.6. Eight fallacies of intelligent applications

1.7. Summary

1.8. References

2. Searching

2.1. Searching with Lucene

2.2. Why search beyond indexing?

2.4. Improving search results based on user clicks

2.6. Large-scale implementation issues

2.7. Is what you got what you want? Precision and recall

2.8. Summary

2.9. To do

2.10. References

3. Creating suggestions and recommendations

3.1. An online music store: the basic concepts

3.2. How do recommendation engines work?

3.3. Recommending friends, articles, and news stories

3.4. Recommending movies on a site such as

3.5. Large-scale implementation and evaluation issues

3.6. Summary

3.7. To do

3.8. References

4. Clustering: grouping things together

4.1. The need for clustering

4.2. An overview of clustering algorithms

4.4. The k-means algorithm


4.7. Clustering issues in very large datasets

4.8. Summary

4.9. To do

4.10. References

5. Classification: placing things where they belong

5.1. The need for classification

5.2. An overview of classifiers

5.3. Automatic categorization of emails and spam filtering

5.4. Fraud detection with neural networks

5.5. Are your results credible?

5.6. Classification with very large datasets

5.7. Summary

5.8. To do

5.9. References

6. Combining classifiers

6.1. Credit worthiness: a case study for combining classifiers

6.2. Credit evaluation with a single classifier

6.3. Comparing multiple classifiers on the same data

6.4. Bagging: bootstrap aggregating

6.5. Boosting: an iterative improvement approach

6.6. Summary

6.7. To do

6.8. References

7. Putting it all together: an intelligent news portal

7.1. An overview of the functionality

7.2. Getting and cleansing content

7.3. Searching for news stories

7.4. Assigning news categories

7.5. Building news groups with the NewsProcessor class

7.6. Dynamic content based on the user’s ratings

7.7. Summary

7.8. To do

7.9. References

Appendix A: Introduction to BeanShell

Appendix B: Web crawling

Appendix C: Mathematical refresher

Appendix D: Natural language processing

Appendix E: Neural networks


About the book

As you work through the book's many examples, you'll learn about recommendation systems, search and ranking, automatic grouping of similar objects, classification of objects, forecasting models, and autonomous agents. You'll also become familiar with a large number of open-source libraries and SDKs, and freely available APIs from the hottest sites on the internet, such as Facebook, Google, eBay, and Yahoo.

What's inside

  • How to create recommendations just like those on Netflix and Amazon
  • How to implement Google's PageRank algorithm
  • How to discover matches on social-networking sites
  • How to organize the discussions on your favorite news group
  • How to select topics of interest from shared bookmarks
  • How to leverage user clicks
  • How to categorize emails based on their content
  • How to build applications that do targeted advertising
  • How to implement fraud detection

About the reader

To get the most from this book, you should have a good foundation in Java programming and a general understanding of internet technology.

About the authors

Dr. Haralambos (Babis) Marmanis is a pioneer in the adoption of machine learning techniques for industrial solutions, and also a world expert in supply management. He has about twenty years of experience in developing professional software. Currently, he is the director of R&D and chief architect for expense management solutions at Emptoris, Inc. Babis holds a Ph.D. in applied mathematics from Brown University, an M.S. degree in theoretical and applied mechanics from the University of Illinois at Urbana-Champaign, and B.S. and M.S. degrees in civil engineering from the Aristotle University of Thessaloniki in Greece. He was the recipient of the Sigma Xi award for innovative research in 2000, and he is the author of numerous publications in peer-reviewed international scientific journals, conferences, and technical periodicals.

Dmitry Babenko is the lead for the data warehouse infrastructure at Emptoris, Inc. He is a software engineer and architect with 13 years of experience in the IT industry. He has designed and built a wide variety of applications and infrastructure frameworks for banking, insurance, supply-chain management, and business intelligence companies. He received an M.S. degree in computer science from Belarusian State University of Informatics and Radioelectronics.

You don't need a PhD to build an intelligent website--pick up this book instead.

Ajay Bhandari,

Very useful...will bring you up to speed quickly.

Sumit Pal, LeapFrogrx

Excellent...perfect blend of theory and practice.

Carlton Gibson, Noumenal Software

Unlock the future of the web by analyzing what we know today!

Eric Swanson, AAA