Introduction To Pattern Recognition And Machine Learning

Pattern recognition (PR) is a classical area; important topics covered in books on PR include the representation of patterns, classification, and clustering. There are different paradigms for pattern recognition, including the statistical and the structural paradigms. The structural or linguistic paradigm was studied in the early days using formal language tools; logic and automata have been used in this context. In linguistic PR, patterns can be represented as sentences in a logic: each pattern is represented using a set of primitives or sub-patterns and a set of operators. Further, a class of patterns is viewed as being generated by a grammar; in other words, a grammar generates a collection of sentences or strings, where each string corresponds to a pattern. The classification model is therefore learnt using a grammatical inference procedure: the sentences corresponding to the patterns in a class are used to learn the grammar for that class. A major problem with the linguistic approach is that it is suited only to structured patterns, and the models learnt cannot tolerate noise.

In contrast, the statistical paradigm has gained a great deal of momentum in the past three to four decades. Here, patterns are viewed as vectors in a multi-dimensional space, and some of the optimal classifiers are based on Bayes rule. The vectors corresponding to the patterns in a class are viewed as being generated by an underlying probability density function; Bayes rule converts the prior probabilities of the classes into posterior probabilities using the likelihood values of the given pattern under each class. Estimation schemes are therefore used to obtain the probability density function of a class from the vectors corresponding to the patterns in that class. Several other classifiers also work with the vector representation of patterns. We deal with statistical pattern recognition in this book.
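The conversion of priors into posteriors described above can be sketched in a few lines. The class labels, priors, and likelihood values below are illustrative assumptions, not data from any particular problem:

```python
# Minimal sketch of the Bayes rule computation:
# posterior(class | x) is proportional to prior(class) * likelihood(x | class).

def posteriors(priors, likelihoods):
    """Convert class priors into posteriors, given the likelihood of one pattern under each class."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # the normalizing constant p(x)
    return {c: joint[c] / evidence for c in joint}

# Two hypothetical classes with equal priors; class "a" explains the pattern better,
# so it receives the larger posterior probability.
post = posteriors({"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.1})
print(post)  # roughly {'a': 0.9, 'b': 0.1}
```

The classifier that assigns a pattern to the class with the largest posterior is the Bayes classifier; in practice the likelihoods come from the estimated class-conditional densities.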

Some of the simplest classification and clustering algorithms are based on matching, or similarity, between vectors. Typically, two patterns are similar if the distance between the corresponding vectors is small; the Euclidean distance is popularly used. Well-known algorithms including the nearest neighbor classifier (NNC), the K-nearest neighbor classifier (KNNC), and the K-Means clustering algorithm are based on such distance computations. However, it is well understood in the literature that the distance between two vectors may not be meaningful when the vectors lie in high-dimensional spaces, which is the case in several state-of-the-art application areas; this is because, as the dimensionality increases, the distance between a vector and its nearest neighbor tends toward the distance between that vector and its farthest neighbor. This prompts the need to reduce the dimensionality of the vectors. We deal with the representation of patterns, the different types of components of vectors, and the associated similarity measures in Chapters 2 and 3.
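The distance-based classification idea above can be illustrated with a minimal nearest neighbor classifier. The training vectors and labels are made up for the example:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nnc(train, query):
    """Nearest neighbor classifier (NNC): the query receives the label
    of the closest training vector. `train` is a list of (vector, label) pairs."""
    return min(train, key=lambda pair: euclidean(pair[0], query))[1]

# Illustrative two-dimensional training set with two classes.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((5.0, 5.0), "B")]
print(nnc(train, (0.2, 0.1)))  # A
print(nnc(train, (4.5, 5.2)))  # B
```

The KNNC generalizes this by taking a majority vote among the K closest training vectors, which makes the decision less sensitive to a single noisy neighbor.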

Machine learning (ML) has also been around for a while; early efforts concentrated on logic- or formal language-based approaches. Bayesian methods have gained prominence in ML in the recent decade; they have been applied in both classification and clustering. Some simple and effective classification schemes are based on simplifying the Bayes classifier under acceptable assumptions. The Bayes classifier and its simplified version, the Naive Bayes classifier, are discussed in Chapter 4. Traditionally there has been a contest between frequentist approaches, like the maximum-likelihood approach, and the Bayesian approach. In maximum-likelihood approaches the underlying density is estimated under the assumption that the unknown parameters are deterministic; the Bayesian schemes, on the other hand, assume that the parameters characterizing the density are unknown random variables. To make the estimation schemes simpler, Bayesian methods exploit the notion of a conjugate pair: if, for a given prior density, the posterior has the same functional form as the prior, then the prior and the class density form a conjugate pair. One of the most exploited pairs in the context of clustering is the Dirichlet prior with the Multinomial class density. For a variety of such conjugate pairs it is possible to show that, when the datasets are large, there is virtually no difference between the maximum-likelihood and the Bayesian estimates. So, it is important to examine the role of Bayesian methods in Big Data applications.
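The Dirichlet-Multinomial conjugate pair mentioned above admits a very compact posterior update: with a Dirichlet(alpha) prior over category probabilities and multinomial counts observed, the posterior is again a Dirichlet with parameters alpha + counts, which is what makes the pair conjugate. The alpha values and counts below are illustrative assumptions:

```python
def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts -> Dirichlet(alpha + counts)."""
    return [a + n for a, n in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: each parameter divided by the parameter sum."""
    s = sum(alpha)
    return [a / s for a in alpha]

alpha = [1.0, 1.0, 1.0]   # symmetric prior over three categories
counts = [30, 60, 10]     # observed category counts
post = dirichlet_posterior(alpha, counts)
print(post)                # [31.0, 61.0, 11.0]
print(dirichlet_mean(post))  # posterior mean estimate of the category probabilities
```

Note that as the counts grow, the posterior mean approaches counts/N, the maximum-likelihood estimate, which illustrates the point above about large datasets.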

Some of the most popular classifiers are based on support vector machines (SVMs), boosting, and Random Forests. These are discussed in Chapter 5, which deals with classification. In large-scale applications like text classification, where the dimensionality is large, linear SVMs and Random Forest-based classifiers are popularly used, and these classifiers are well understood in terms of their theoretical properties. There are several applications where each pattern belongs to more than one class; soft classification schemes are required to deal with such applications, and we discuss them in Chapter 6. Chapter 7 deals with several classical clustering algorithms, including the K-Means algorithm and spectral clustering. The so-called topic models have become popular in the context of soft clustering; we deal with them in Chapter 8.

Social networks are an important application area related to PR and ML. Most of the earlier work has dealt with the structural aspects of social networks, which are based on their link structure. Currently there is interest in also using the text associated with the nodes of a social network along with the link information. We deal with this application in Chapter 9.

This book deals with the material at an early graduate level. Beginners are encouraged to read our introductory book Pattern Recognition: An Algorithmic Approach, published by Springer in 2011, before reading this book.

M. Narasimha Murty
V. Susheela Devi
Bangalore, India