Comparative Study of Pattern Recognition, Neural Networks, and Statistical Regression Approaches to Information Retrieval

Aitao Chen. Comparative Study of Pattern Recognition, Neural Networks, and Statistical Regression Approaches to Information Retrieval. PhD Dissertation. Advisor: William S. Cooper. University of California, Berkeley. 1998.


This dissertation presents several new retrieval methods that combine Bayes' theorem with probability density estimation techniques. The new methods estimate the probability of relevance from a small set of statistical features characterizing document-query pairs, such as query length, within-document term frequency, and the number of matching terms between a document and a query.

The central task of computing the probability of relevance in the proposed methods is to infer the density functions of the feature vector in the relevant and irrelevant classes from training examples. Both parametric and non-parametric methods are employed to estimate the density function from training examples.
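As an illustration of the idea described above, the following is a minimal sketch, not the dissertation's implementation. It assumes a two-dimensional feature vector, synthetic training data, and a hand-picked kernel bandwidth; the function names (`gaussian_density`, `kernel_density`, `probability_of_relevance`) are hypothetical. It shows one parametric estimate (a Gaussian fitted per class) and one non-parametric estimate (a Gaussian kernel), each plugged into Bayes' theorem.

```python
import numpy as np


def gaussian_density(x, mean, cov):
    """Parametric estimate: multivariate normal density evaluated at x."""
    d = mean.shape[0]
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm)


def kernel_density(x, samples, bandwidth):
    """Non-parametric estimate: average of Gaussian kernels centred on samples."""
    d = samples.shape[1]
    sq_dist = np.sum((samples - x) ** 2, axis=1)
    norm = (2 * np.pi * bandwidth ** 2) ** (d / 2)
    return float(np.mean(np.exp(-0.5 * sq_dist / bandwidth ** 2)) / norm)


def probability_of_relevance(p_x_rel, p_x_irr, prior_rel):
    """Bayes' theorem for two classes:
    P(R|x) = p(x|R) P(R) / (p(x|R) P(R) + p(x|not R) P(not R))."""
    num = p_x_rel * prior_rel
    den = num + p_x_irr * (1.0 - prior_rel)
    return num / den if den > 0 else 0.0


def fit_gaussian(samples):
    """Sample mean and covariance of one class."""
    return samples.mean(axis=0), np.cov(samples, rowvar=False)


# Synthetic training examples (made up for illustration only):
# each row is a feature vector for one document-query pair.
rng = np.random.default_rng(0)
relevant = rng.normal(loc=[2.0, 3.0], scale=1.0, size=(200, 2))
irrelevant = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(800, 2))
prior_rel = len(relevant) / (len(relevant) + len(irrelevant))

mean_r, cov_r = fit_gaussian(relevant)
mean_i, cov_i = fit_gaussian(irrelevant)


def score_parametric(x):
    return probability_of_relevance(
        gaussian_density(x, mean_r, cov_r),
        gaussian_density(x, mean_i, cov_i),
        prior_rel,
    )


def score_kernel(x, bandwidth=0.5):  # bandwidth is an assumed value
    return probability_of_relevance(
        kernel_density(x, relevant, bandwidth),
        kernel_density(x, irrelevant, bandwidth),
        prior_rel,
    )


x = np.array([2.0, 3.0])
print("parametric:", score_parametric(x))
print("kernel:    ", score_kernel(x))
```

Fitting a separate covariance matrix to each class, as in this sketch, yields a quadratic decision boundary; sharing one covariance matrix across both classes would yield a linear one. The kernel estimate visits every training example for each scored pair, which gives a sense of why such methods are costly on large collections.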

A two-layer neural network is presented. It takes as input a feature vector representing a document-query pair and returns the probability of relevance. Simple and complex network designs are compared for retrieval performance, and the results show that the more complex designs do not significantly outperform the simplest design.

The performance of seven retrieval methods is compared: linear discriminant, quadratic discriminant, k-nearest neighbor, kernel method, neural network, linear regression, and logistic regression. All seven methods are trained on a common training set and then applied to two large test sets, the TREC-5 test set and the TREC-6 test set.

The experimental results suggest that the seven retrieval methods may be divided into two groups. The first group consists of the logistic regression, linear regression, linear discriminant, and neural network retrieval methods, whereas the second group consists of the quadratic discriminant, k-nearest neighbor, and the kernel method. The retrieval methods within the first group perform approximately equally well on the test sets. Furthermore, any method in the first group outperforms any method in the second group. In addition to being less effective in retrieval, both the kernel method and the k-nearest neighbor method are computationally intensive.

Last updated: September 20, 2016