Table of Contents – Collective Intelligence in Action

Copyright

Brief Table of Contents

Table of Contents

Foreword

Preface

Acknowledgments

About this book

Part 1. Gathering data for intelligence

Chapter 1. Understanding collective intelligence

1.1. What is collective intelligence?

1.2. CI in web applications

1.2.1. Collective intelligence from the ground up: a sample application

1.2.2. Benefits of collective intelligence

1.2.3. CI is the core component of Web 2.0

1.2.4. Harnessing CI to transform from content-centric to user-centric applications

1.3. Classifying intelligence

1.3.1. Explicit intelligence

1.3.2. Implicit intelligence

1.3.3. Derived intelligence

1.4. Summary

1.5. Resources

Chapter 2. Learning from user interactions

2.1. Architecture for applying intelligence

2.1.1. Synchronous and asynchronous services

2.1.2. Real-time learning in an event-driven system

2.1.3. Polling services for non–event-driven systems

2.1.4. Advantages and disadvantages of event-based and non–event-based architectures

2.2. Basics of algorithms for applying CI

2.2.1. Users and items

2.2.2. Representing user information

2.2.3. Content-based analysis and collaborative filtering

2.2.4. Representing intelligence from unstructured text

2.2.5. Computing similarities

2.2.6. Types of datasets

2.3. Forms of user interaction

2.3.1. Rating and voting

2.3.2. Emailing or forwarding a link

2.3.3. Bookmarking and saving

2.3.4. Purchasing items

2.3.5. Click-stream

2.3.6. Reviews

2.4. Converting user interaction into collective intelligence

2.4.1. Intelligence from ratings via an example

2.4.2. Intelligence from bookmarking, saving, purchasing items, forwarding, click-stream, and reviews

2.5. Summary

2.6. Resources

Chapter 3. Extracting intelligence from tags

3.1. Introduction to tagging

3.1.1. Tag-related metadata for users and items

3.1.2. Professionally generated tags

3.1.3. User-generated tags

3.1.4. Machine-generated tags

3.1.5. Tips on tagging

3.1.6. Why do users tag?

3.2. How to leverage tags

3.2.1. Building dynamic navigation

3.2.2. Innovative uses of tag clouds

3.2.3. Targeted search

3.2.4. Folksonomies and building a dictionary

3.3. Extracting intelligence from user tagging: an example

3.3.1. Items related to other items

3.3.2. Items of interest for a user

3.3.3. Relevant users for an item

3.4. Scalable persistence architecture for tagging

3.4.1. Reviewing other approaches

3.4.2. Recommended persistence architecture

3.5. Building tag clouds

3.5.1. Persistence design for tag clouds

3.5.2. Algorithm for building a tag cloud

3.5.3. Implementing a tag cloud

3.5.4. Visualizing a tag cloud

3.6. Finding similar tags

3.7. Summary

3.8. Resources

Chapter 4. Extracting intelligence from content

4.1. Content types and integration

4.1.1. Classifying content

4.1.2. Architecture for integrating content

4.2. The main CI-related content types

4.2.1. Blogs

4.2.2. Wikis

4.2.3. Groups and message boards

4.3. Extracting intelligence step by step

4.3.1. Setting up the example

4.3.2. Naïve analysis

4.3.3. Removing common words

4.3.4. Stemming

4.3.5. Detecting phrases

4.4. Simple and composite content types

4.5. Summary

4.6. Resources

Chapter 5. Searching the blogosphere

5.1. Introducing the blogosphere

5.1.1. Leveraging the blogosphere

5.1.2. RSS: the publishing format

5.1.3. Blog-tracking companies

5.2. Building a framework to search the blogosphere

5.2.1. The searcher

5.2.2. The search parameters

5.2.3. The query results

5.2.4. Handling the XML response

5.2.5. Exception handling

5.3. Implementing the base classes

5.3.1. Implementing the search parameters

5.3.2. Implementing the result objects

5.3.3. Implementing the searcher

5.3.4. Parsing XML response

5.3.5. Extending the framework

5.4. Integrating Technorati

5.4.1. Technorati search API overview

5.4.2. Implementing classes for integrating Technorati

5.5. Integrating Bloglines

5.5.1. Bloglines search API overview

5.5.2. Implementing classes for integrating Bloglines

5.6. Integrating providers using RSS

5.6.1. Generalizing the query parameters

5.6.2. Generalizing the blog searcher

5.6.3. Building the RSS 2.0 XML parser

5.7. Summary

5.8. Resources

Chapter 6. Intelligent web crawling

6.1. Introducing web crawling

6.1.1. Why crawl the Web?

6.1.2. The crawling process

6.1.3. Intelligent crawling and focused crawling

6.1.4. Deep crawling

6.1.5. Available crawlers

6.2. Building an intelligent crawler step by step

6.2.1. Implementing the core algorithm

6.2.2. Being polite: following the robots.txt file

6.2.3. Retrieving the content

6.2.4. Extracting URLs

6.2.5. Making the crawler intelligent

6.2.6. Running the crawler

6.2.7. Extending the crawler

6.3. Scalable crawling with Nutch

6.3.1. Setting up Nutch

6.3.2. Running the Nutch crawler

6.3.3. Searching with Nutch

6.3.4. Apache Hadoop, MapReduce, and Dryad

6.4. Summary

6.5. Resources

Part 2. Deriving intelligence

Chapter 7. Data mining: process, toolkits, and standards

7.1. Core concepts of data mining

7.1.1. Attributes

7.1.2. Supervised and unsupervised learning

7.1.3. Key learning algorithms

7.1.4. The mining process

7.2. Using an open source data mining framework: WEKA

7.2.1. Using the WEKA application: a step-by-step tutorial

7.2.2. Understanding the WEKA APIs

7.2.3. Using the WEKA APIs via an example

7.3. Standard data mining API: Java Data Mining (JDM)

7.3.1. JDM architecture

7.3.2. Key JDM objects

7.3.3. Representing the dataset

7.3.4. Learning models

7.3.5. Algorithm settings

7.3.6. JDM tasks

7.3.7. JDM connection

7.3.8. Sample code for accessing DME

7.3.9. JDM models and PMML

7.4. Summary

7.5. Resources

Chapter 8. Building a text analysis toolkit

8.1. Building the text analyzers

8.1.1. Leveraging Lucene

8.1.2. Writing a stemmer analyzer

8.1.3. Writing a TokenFilter to inject synonyms and detect phrases

8.1.4. Writing an analyzer to inject synonyms and detect phrases

8.1.5. Putting our analyzers to work

8.2. Building the text analysis infrastructure

8.2.1. Building the tag infrastructure

8.2.2. Building the term vector infrastructure

8.2.3. Building the Text Analyzer class

8.2.4. Applying the text analysis infrastructure

8.3. Use cases for applying the framework

8.4. Summary

8.5. Resources

Chapter 9. Discovering patterns with clustering

9.1. Clustering blog entries

9.1.1. Defining the text clustering infrastructure

9.1.2. Retrieving blog entries from Technorati

9.1.3. Implementing the k-means algorithms for text processing

9.1.4. Implementing hierarchical clustering algorithms for text processing

9.1.5. Expectation maximization and other examples of clustering high-dimension sparse data

9.2. Leveraging WEKA for clustering

9.2.1. Creating the learning dataset

9.2.2. Creating the clusterer

9.2.3. Evaluating the clustering results

9.3. Clustering using the JDM APIs

9.3.1. Key JDM clustering-related classes

9.3.2. Clustering settings using the JDM APIs

9.3.3. Creating the clustering task using the JDM APIs

9.3.4. Executing the clustering task using the JDM APIs

9.3.5. Retrieving the clustering model using the JDM APIs

9.4. Summary

9.5. Resources

Chapter 10. Making predictions

10.1. Classification fundamentals

10.1.1. Learning decision trees by example

10.1.2. Naïve Bayes’ classifier

10.1.3. Belief networks

10.2. Classifying blog entries using WEKA APIs

10.2.1. Building the dataset for classifying blog entries

10.2.2. Building the classifier class

10.3. Regression fundamentals

10.3.1. Linear regression

10.3.2. Multi-layer perceptron (MLP)

10.3.3. Radial basis functions (RBF)

10.4. Regression using WEKA

10.5. Classification and regression using JDM

10.5.1. Key JDM supervised learning–related classes

10.5.2. Supervised learning settings using the JDM APIs

10.5.3. Creating the classification task using the JDM APIs

10.5.4. Executing the classification task using the JDM APIs

10.5.5. Retrieving the classification model using the JDM APIs

10.5.6. Retrieving the classification model using the JDM APIs

10.6. Summary

10.7. Resources

Part 3. Applying intelligence in your application

Chapter 11. Intelligent search

11.1. Search fundamentals

11.1.1. Search architecture

11.1.2. Core Lucene classes

11.1.3. Basic indexing and searching via example

11.2. Indexing with Lucene

11.2.1. Understanding the index format

11.2.2. Modifying the index

11.2.3. Incremental indexing

11.2.4. Accessing the term frequency vector

11.2.5. Optimizing indexing performance

11.3. Searching with Lucene

11.3.1. Understanding Lucene scoring

11.3.2. Querying Lucene

11.3.3. Sorting search results

11.3.4. Querying on multiple fields

11.3.5. Filtering

11.3.6. Searching multiple indexes

11.3.7. Using a HitCollector

11.3.8. Optimizing search performance

11.4. Useful tools and frameworks

11.4.1. Luke

11.4.2. Solr

11.4.3. Compass

11.4.4. Hibernate search

11.5. Approaches to intelligent search

11.5.1. Augmenting search with classifiers and predictors

11.5.2. Clustering search results

11.5.3. Personalizing results for the user

11.5.4. Community-based search

11.5.5. Linguistic-based search

11.5.6. Data search

11.6. Summary

11.7. Resources

Chapter 12. Building a recommendation engine

12.1. Recommendation engine fundamentals

12.1.1. Introducing the recommendation engine

12.1.2. Item-based and user-based analysis

12.1.3. Computing similarity using content-based and collaborative techniques

12.1.4. Comparison of content-based and collaborative techniques

12.2. Content-based analysis

12.2.1. Finding similar items using a search engine (Lucene)

12.2.2. Building a content-based recommendation engine

12.2.3. Related items for document clusters

12.2.4. Personalizing content for a user

12.3. Collaborative filtering

12.3.1. k-nearest neighbor

12.3.2. Packages for implementing collaborative filtering

12.3.3. Dimensionality reduction with latent semantic indexing

12.3.4. Implementing dimensionality reduction

12.3.5. Probabilistic model–based approach

12.4. Real-world solutions

12.4.1. Amazon item-to-item recommendation

12.4.2. Google News personalization

12.4.3. Netflix and the BellKor Solution for the Netflix Prize

12.5. Summary

12.6. Resources

Index

List of Figures

List of Tables

List of Listings