Use these CI techniques to extract valuable data from your applications.
There's a great deal of wisdom in a crowd, but how do you listen to a thousand people talking at once? Identifying the wants, needs, and knowledge of internet users can be like listening to a mob.
In the Web 2.0 era, leveraging the collective power of user contributions, interactions, and feedback is the key to market dominance. A new category of powerful programming techniques lets you discover the patterns, interrelationships, and individual profiles (the collective intelligence) locked in the data people leave behind as they surf websites, post blogs, and interact with other users.
Collective Intelligence in Action is a hands-on guidebook for implementing collective-intelligence concepts using Java. It is the first Java-based book to emphasize the underlying algorithms and technical implementation of vital data-gathering and data-mining techniques such as analyzing trends, discovering relationships, and making predictions. It provides a pragmatic approach to personalization by combining content-based analysis with collaborative approaches.
about this book
Part 1 Gathering data for intelligence
1. Understanding collective intelligence
1.1. What is collective intelligence?
1.2. CI in web applications
1.3. Classifying intelligence
2. Learning from user interactions
2.1. Architecture for applying intelligence
2.2. Basics of algorithms for applying CI
2.3. Forms of user interaction
2.4. Converting user interaction into collective intelligence
3. Extracting intelligence from tags
3.1. Introduction to tagging
3.2. How to leverage tags
3.3. Extracting intelligence from user tagging: an example
3.4. Scalable persistence architecture for tagging
3.5. Building tag clouds
3.6. Finding similar tags
4. Extracting intelligence from content
4.1. Content types and integration
4.2. The main CI-related content types
4.3. Extracting intelligence step by step
4.4. Simple and composite content types
5. Searching the blogosphere
5.1. Introducing the blogosphere
5.2. Building a framework to search the blogosphere
5.3. Implementing the base classes
5.4. Integrating Technorati
5.5. Integrating Bloglines
5.6. Integrating providers using RSS
6. Intelligent web crawling
6.1. Introducing web crawling
6.2. Building an intelligent crawler step by step
6.3. Scalable crawling with Nutch
Part 2 Deriving intelligence
7. Data mining: process, toolkits, and standards
7.1. Core concepts of data mining
7.2. Using an open source data mining framework: WEKA
7.3. Standard data mining API: Java Data Mining (JDM)
8. Building a text analysis toolkit
8.1. Building the text analyzers
8.2. Building the text analysis infrastructure
8.3. Use cases for applying the framework
9. Discovering patterns with clustering
9.1. Clustering blog entries
9.2. Leveraging WEKA for clustering
9.3. Clustering using the JDM APIs
10. Making predictions
10.1. Classification fundamentals
10.2. Classifying blog entries using WEKA APIs
10.3. Regression fundamentals
10.4. Regression using WEKA
10.5. Classification and regression using JDM
Part 3 Applying intelligence in your application
11. Intelligent search
11.1. Search fundamentals
11.2. Indexing with Lucene
11.3. Searching with Lucene
11.4. Useful tools and frameworks
11.5. Approaches to intelligent search
11.7. Resources
12. Building a recommendation engine
12.1. Recommendation engine fundamentals
12.2. Content-based analysis
12.3. Collaborative filtering
12.4. Real-world solutions
© 2014 Manning Publications Co.
About the book
Following a running example in which you harvest and use information from blogs, you learn to develop software that you can embed in your own applications. The code examples are immediately reusable and give the Java developer a working collective intelligence toolkit.
Along the way, you work with a number of APIs and open-source toolkits, including text analysis and search using Lucene, web crawling using Nutch, and applying machine learning algorithms using WEKA and the Java Data Mining (JDM) standard.
- Architecture for embedding intelligence in your application
- Developing metadata about the user and content
- Gathering intelligence from tagging and building tag clouds
- Introduction to intelligent web crawling and Nutch
- Harvesting information from the blogosphere
- Building a text analysis toolkit that leverages Lucene
- Business intelligence and data mining for recommendations and promotions
- Leveraging the open-source data mining toolkit WEKA and the Java Data Mining (JDM) standard
- Incorporating intelligent search in your application
- Building a recommendation engine: finding related users and content
- Real-world case studies of personalization at Amazon, Google News, and Netflix
About the author
Satnam Alag, PhD, is currently the Vice President of Engineering at NextBio, a vertical search engine and Web 2.0 collaboration application for the life sciences community. He is a seasoned software professional with over fifteen years of experience in machine learning and over a decade of experience in commercial software development and management. Dr. Alag worked as a consultant with Johnson & Johnson's BabyCenter, where he helped develop its personalization engine. Prior to that, he was the Chief Software Architect at Rearden Commerce, and he began his career at GE R&D. He is a Sun Certified Enterprise Architect (SCEA) for the Java Platform. Dr. Alag earned his PhD in engineering from UC Berkeley, where his dissertation was in the area of probabilistic reasoning and machine learning. He has published numerous peer-reviewed articles.
It's technical, it's theoretical, but most importantly, it's practical.
Harness the untapped power of your imagination.
Learn practical, hands-on machine learning.
This is the right book on collective intelligence. I wish I'd had it a few years ago.
I recommend this book for any developer of social networking sites.