A New Local PCA SOM Algorithm

Dong Huang, Zhang Yi and Xiaorong Pu

Computational Intelligence Laboratory, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China.
E-mail: donnyhuang, zhangyi, puxiaor.

Abstract

This paper proposes a Local PCA-SOM algorithm. The new competition measure is computationally efficient, and implicitly incorporates both the Mahalanobis distance and the reconstruction error. Unlike previous models, no matrix inversion or PCA decomposition is needed for each data input. Moreover, the local data distribution is stored entirely in the covariance matrix rather than in a pre-defined number of principal components, so no a priori information about the optimal principal subspace is required. Experiments on both synthetic data and a pattern learning task demonstrate the performance of the proposed method.
Key words: Neural Networks; Unsupervised Learning; Local Principal Component Analysis; Self-Organizing Mapping

1 This work was supported by the National Science Foundation of China under Grant 60471055 and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20040614017.

Preprint submitted to Elsevier Science, 28 September 2007.

1 Introduction

Principal component analysis (PCA) has been widely used in dimension reduction of multi-variate data, data compression, pattern recognition and statistical analysis. A PCA algorithm approximates the original high-dimensional pattern space by a low-dimensional subspace spanned by the principal eigenvectors of the data covariance matrix. In this way, the data distribution can be represented and reconstructed by the principal eigenvectors and their corresponding eigenvalues. However, PCA is globally linear and inefficient for data distributions with non-linear dependencies [1]. Several extensions have been suggested to overcome this problem. These extensions fall into two main categories: global nonlinear approaches and locally linear descriptions. The former includes Principal Curves [2] and Kernel PCA [3]. We focus on the latter, in which the data distribution is represented by a collection of Local PCA units [7,8].
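As a concrete illustration of this global PCA procedure (a minimal sketch in our own notation, not taken from the paper), the following Python fragment estimates the covariance matrix of a data set, extracts the top-p principal eigenvectors, and reconstructs the data from them:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))  # correlated sample data

    c = X.mean(axis=0)                     # data mean
    C = np.cov(X, rowvar=False)            # d x d sample covariance matrix

    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # sort descending
    p = 2
    W = eigvecs[:, order[:p]]              # top-p principal eigenvectors (d x p)

    Y = (X - c) @ W                        # project onto the principal subspace
    X_hat = c + Y @ W.T                    # reconstruct from p components
    mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    print(f"mean squared reconstruction error with p={p}: {mse:.4f}")

When the data have strong non-linear structure, no single choice of W makes this reconstruction error small, which motivates the locally linear descriptions discussed next.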
When describing a data distribution with a mixture model, the question arises as to what kind of units the mixture should contain. In Gaussian mixture models [7], the local iso-density surface of a uniform Gaussian unit is a sphere, whereas a local PCA unit corresponds to a multivariate Gaussian with an ellipsoidal iso-density surface. Despite its greater complexity, a local PCA unit is preferable to a sphere unit for the following reasons. A single ellipsoid can describe a local structure for which many spheres would be needed. Furthermore, data distributions are usually constrained locally to subspaces of fewer dimensions than the space of the training data, so there are directions in which the distribution has locally zero variance (or almost zero variance, because of noise). An ellipsoidal unit can extend its minor components into these additional noise dimensions. The drawback is that the computational cost increases over-proportionally with the number of principal components needed.
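The difference between the two unit types can be made concrete with the squared Mahalanobis distance, whose iso-distance surfaces are exactly the ellipsoidal iso-density surfaces of a multivariate Gaussian. The following sketch (our illustration; the variables are not from the paper) contrasts it with the Euclidean distance of a spherical unit:

    import numpy as np

    def euclidean_sq(x, c):
        # spherical unit: iso-distance surfaces are spheres
        diff = x - c
        return diff @ diff

    def mahalanobis_sq(x, c, C):
        # ellipsoidal unit: iso-distance surfaces follow the covariance C
        diff = x - c
        return diff @ np.linalg.solve(C, diff)

    c = np.zeros(2)
    C = np.array([[9.0, 0.0],
                  [0.0, 0.25]])           # ellipsoid elongated along the first axis
    x = np.array([3.0, 0.0])              # lies along the major axis
    y = np.array([0.0, 3.0])              # same Euclidean distance, along the minor axis

    print(euclidean_sq(x, c), euclidean_sq(y, c))            # 9.0 9.0: the sphere cannot distinguish them
    print(mahalanobis_sq(x, c, C), mahalanobis_sq(y, c, C))  # 1.0 36.0: the ellipsoid can

Note that evaluating the Mahalanobis distance directly involves a matrix inversion (or a linear solve) per input, which is one of the costs the competition measure proposed in this paper avoids.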
In the recent work NGAS-PCA [4], the local PCA model is based on the soft competition scheme of the Neural Gas algorithm and RRLSA [10] online PCA learning in each local unit. The novelty of NGAS-PCA is that its distance measure for neuron competition is a combination of a normalized Mahalanobis distance and the squared reconstruction error. Note that storing the principal basis vectors for every unit involves rather complicated updating procedures, as in NGAS-PCA and ASSOM [9]. Another recent model, PCA-SOM [5], proposes an alternative: storing the local information in the covariance matrix. This is derived directly from statistical theory and has great advantages over the ASSOM model in both computational burden and reliability of the result. However, PCA-SOM uses only the reconstruction error in the principal subspace for its neuron competition. For this reason, PCA-SOM still needs the PCA decomposition, or an update with a predefined number of principal components, each time a training datum is presented.
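One plausible form of such a combined measure (our reading of the description above; the exact normalization used in [4] may differ) evaluates the Mahalanobis distance inside the p-dimensional principal subspace and adds the squared reconstruction error of the residual, scaled by an estimate of the residual variance:

    import numpy as np

    def combined_distance(x, c, W, lam, sigma2):
        # W: (d, p) principal eigenvectors of the unit
        # lam: (p,) corresponding eigenvalues
        # sigma2: scalar residual variance in the minor directions
        diff = x - c
        y = W.T @ diff                  # coordinates in the principal subspace
        maha = np.sum(y ** 2 / lam)     # normalized Mahalanobis part
        residual = diff - W @ y         # component orthogonal to the subspace
        recon = residual @ residual     # squared reconstruction error
        return maha + recon / sigma2

The winning neuron is the one minimizing this measure. Computing it requires keeping W and lam explicitly up to date for every unit, which illustrates why NGAS-PCA and ASSOM need the complicated updating procedures mentioned above.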
In this paper, we propose a new computationally efficient Local PCA algorithm that combines the advantages of NGAS-PCA and PCA-SOM. Each unit is associated with its mean vector and covariance matrix. The new competition measure implicitly incorporates the reconstruction error and the distance between the input data and the unit center. In the proposed algorithm, the extra updating step for the principal subspaces is eliminated from the data distribution learning process. In addition, the local distribution of the input data is stored entirely in the covariance matrix instead of a predefined number of principal components. One potential application of the proposed model is nonlinear pattern learning and recalling [11]. In most models of associative memory, memories are stored as attractive fixed points at discrete locations in state space. Discrete attractors may not be appropriate for patterns with continuous variability [13], for example, the images of a three-dimensional object from different viewpoints [12]. After the training process, the data distribution is represented by a collection of local linear units. No a priori information about the optimal principal subspace is needed for the pattern representation.

The rest of this paper is organized as follows. In Section 2, the proposed competition measure and the neuron updating method are formulated. Section 3 presents the simulation results and discussions.
Section 4 briefly reviews the NGAS-PCA and PCA-SOM algorithms, and their relations to the proposed method are also discussed in detail. Finally, the paper is concluded in Section 5.

2 The Local PCA model

In this section, we present the detailed formulation of the Local PCA model. The following notations are used throughout the paper. The input data set is $X = \{x_i \mid x_i \in \mathbb{R}^d,\ i = 1, \dots, M\}$, where $d$ is the data dimension. Each local unit $j$ has a center $c_j$ and a covariance matrix $C_j$, where $j = 1, \dots, N$. The principal subspace of $C_j$ is characterized by $W^{(p)} = [w_1, \dots, w_p]$.
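In code, this notation corresponds to the following minimal data structures (a sketch under our own naming conventions; the initialization shown is arbitrary and only meant to fix shapes):

    import numpy as np

    d, M, N, p = 3, 1000, 4, 2

    rng = np.random.default_rng(1)
    X = rng.normal(size=(M, d))          # input data set: M points x_i in R^d

    centers = rng.normal(size=(N, d))    # unit centers c_j, j = 1, ..., N
    covs = np.stack([np.eye(d)] * N)     # unit covariance matrices C_j (d x d each)

    def principal_subspace(C, p):
        # W^(p) = [w_1, ..., w_p]: the top-p eigenvectors of C_j
        eigvals, eigvecs = np.linalg.eigh(C)
        order = np.argsort(eigvals)[::-1]
        return eigvecs[:, order[:p]]

    W_p = principal_subspace(covs[0], p)  # principal subspace of the first unit

Because C_j stores the complete local second-order statistics, W^(p) can be extracted for any p after training; it does not have to be fixed in advance.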