Collective Intelligence in Action
Satnam Alag
  • October 2008
  • ISBN 9781933988313
  • 424 pages
  • printed in black & white

Use these CI techniques to extract valuable data from your applications.

FROM THE FOREWORD by Richard MacManus, ReadWriteWeb

There's a great deal of wisdom in a crowd, but how do you listen to a thousand people talking at once? Identifying the wants, needs, and knowledge of internet users can be like listening to a mob.

In the Web 2.0 era, leveraging the collective power of user contributions, interactions, and feedback is the key to market dominance. A new category of powerful programming techniques lets you discover the patterns, interrelationships, and individual profiles (the collective intelligence) locked in the data people leave behind as they surf websites, post blogs, and interact with other users.

Collective Intelligence in Action is a hands-on guidebook for implementing collective-intelligence concepts using Java. It is the first Java-based book to emphasize the underlying algorithms and technical implementation of vital data gathering and mining techniques like analyzing trends, discovering relationships, and making predictions. It provides a pragmatic approach to personalization by combining content-based analysis with collaborative approaches.

Table of Contents

about this book

Part 1 Gathering data for intelligence

1. Understanding collective intelligence

1.1. What is collective intelligence?

1.2. CI in web applications

1.3. Classifying intelligence

1.4. Summary

1.5. Resources

2. Learning from user interactions

2.1. Architecture for applying intelligence

2.2. Basics of algorithms for applying CI

2.3. Forms of user interaction

2.4. Converting user interaction into collective intelligence

2.5. Summary

2.6. Resources

3. Extracting intelligence from tags

3.1. Introduction to tagging

3.2. How to leverage tags

3.3. Extracting intelligence from user tagging: an example

3.4. Scalable persistence architecture for tagging

3.5. Building tag clouds

3.6. Finding similar tags

3.7. Summary

3.8. Resources

4. Extracting intelligence from content

4.1. Content types and integration

4.3. Extracting intelligence step by step

4.4. Simple and composite content types

4.5. Summary

4.6. Resources

5. Searching the blogosphere

5.1. Introducing the blogosphere

5.2. Building a framework to search the blogosphere

5.3. Implementing the base classes

5.4. Integrating Technorati

5.5. Integrating Bloglines

5.6. Integrating providers using RSS

5.7. Summary

5.8. Resources

6. Intelligent web crawling

6.1. Introducing web crawling

6.2. Building an intelligent crawler step by step

6.3. Scalable crawling with Nutch

6.4. Summary

6.5. Resources

Part 2 Deriving intelligence

7. Data mining: process, toolkits, and standards

7.1. Core concepts of data mining

7.2. Using an open source data mining framework: WEKA

7.3. Standard data mining API: Java Data Mining (JDM)

7.4. Summary

7.5. Resources

8. Building a text analysis toolkit

8.1. Building the text analyzers

8.2. Building the text analysis infrastructure

8.3. Use cases for applying the framework

8.4. Summary

8.5. Resources

9. Discovering patterns with clustering

9.1. Clustering blog entries

9.2. Leveraging WEKA for clustering

9.3. Clustering using the JDM APIs

9.4. Summary

9.5. Resources

10. Making predictions

10.1. Classification fundamentals

10.2. Classifying blog entries using WEKA APIs

10.3. Regression fundamentals

10.4. Regression using WEKA

10.5. Classification and regression using JDM

10.6. Summary

10.7. Resources

Part 3 Applying intelligence in your application

11. Intelligent search

11.1. Search fundamentals

11.2. Indexing with Lucene

11.3. Searching with Lucene

11.4. Useful tools and frameworks

11.6. Summary

11.7. Resources

12. Building a recommendation engine

12.1. Recommendation engine fundamentals

12.2. Content-based analysis

12.3. Collaborative filtering

12.4. Real-world solutions

12.5. Summary

12.6. Resources


© 2014 Manning Publications Co.

About the book

Following a running example in which you harvest and use information from blogs, you learn to develop software that you can embed in your own applications. The code examples are immediately reusable and give the Java developer a working collective intelligence toolkit.

Along the way, you work with a number of APIs and open-source toolkits, including Lucene for text analysis and search, Nutch for web crawling, and WEKA and the Java Data Mining (JDM) standard for applying machine learning algorithms.

What's inside

  • Architecture for embedding intelligence in your application
  • Developing metadata about the user and content
  • Gathering intelligence from tagging and building tag clouds
  • Introduction to intelligent web crawling and Nutch
  • Harvesting information from the blogosphere
  • Building a text analysis toolkit that leverages Lucene
  • Business intelligence and data mining for recommendations and promotions
  • Leveraging the open-source data mining toolkit WEKA and the Java Data Mining (JDM) standard
  • Incorporating intelligent search in your application
  • Building a recommendation engine: finding related users and content
  • Real-world case studies of personalization at Amazon, Google News, and Netflix

About the reader

This book assumes you have a basic level of Java coding skills.

About the author

Satnam Alag, PhD, is currently the Vice President of Engineering at NextBio, a vertical search engine and Web 2.0 collaboration application for the life sciences community. He is a seasoned software professional with over fifteen years of experience in machine learning and over a decade of experience in commercial software development and management. Dr. Alag worked as a consultant with Johnson & Johnson's BabyCenter, where he helped develop its personalization engine. Before that, he was the Chief Software Architect at Rearden Commerce, and he began his career at GE R&D. He is a Sun Certified Enterprise Architect (SCEA) for the Java Platform. Dr. Alag earned his PhD in engineering from UC Berkeley, where his dissertation focused on probabilistic reasoning and machine learning. He has published numerous peer-reviewed articles.


It's technical, it's theoretical, but most importantly, it's practical.

Taran Rampersand

Harness the untapped power of your imagination.

John Tyler, UBS Investment Bank

Learn practical, hands-on, machine learning.

Robi Sen, Twin Technologies

This is the right book on collective intelligence. I wish I'd had it a few years ago.

Jérôme Bernard, Elastic Grid LLC

I recommend this book for any developer of social networking sites.

Sopan Shewale, TWIKI.NET - Enterprise WIKI