Transcript of: Selecting Representatives and Being Represented in Unsupervised Learning - AP & LLE (Xiangliang Zhang, KAUST)
Selecting Representatives and Being Represented in Unsupervised Learning - AP & LLE
张响亮
Xiangliang Zhang
King Abdullah University of Science and Technology
CNCC, Oct 25, 2018
Hangzhou, China
Outline
• Affinity Propagation (AP) [Frey and Dueck, Science, 2007]
• Locally Linear Embedding (LLE) [Roweis and Saul, Science, 2000]
Affinity Propagation
[Frey and Dueck, Science 2007]
Affinity Propagation
[Frey and Dueck, NIPS 2005]
We describe a new method that, for the first time to our knowledge, combines the advantages of model-based clustering and affinity-based clustering.
Model-based clustering: a mixture model $p(x) = \sum_k \pi_k \, p(x \mid \theta_k)$, where $\pi_k$ is the mixing coefficient and $p(x \mid \theta_k)$ is the $k$-th component.
Clustering: group the similar points together
K-means: minimize
$\varepsilon = \sum_{m=1}^{k} \sum_{x_i \in C_m} (x_i - \mu_m)^2$
where $\mu_m = \frac{1}{|C_m|} \sum_{x_i \in C_m} x_i$

K-medoids / K-medians: minimize
$\varepsilon = \sum_{m=1}^{k} \sum_{x_i \in C_m} (x_i - \mu_m)^2$
where $\mu_m \in \{ x_i \mid x_i \in C_m \}$ (the center is one of the data points)
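A minimal NumPy sketch (not from the slides) contrasting the two center updates; the function names are illustrative and clusters are assumed non-empty.

```python
import numpy as np

def kmeans_update(X, labels, k):
    """K-means: each center is the mean of its cluster (need not be a data point)."""
    return np.array([X[labels == m].mean(axis=0) for m in range(k)])

def kmedoids_update(X, labels, k):
    """K-medoids: each center is the cluster member with the smallest within-cluster cost."""
    centers = []
    for m in range(k):
        members = X[labels == m]
        # total squared distance from each member to all other members of the cluster
        cost = ((members[:, None, :] - members[None, :, :]) ** 2).sum(-1).sum(1)
        centers.append(members[cost.argmin()])
    return np.array(centers)
```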
Inspired by greedy k-medoids
Data to cluster:
The likelihood of a point $x_i$ belonging to the cluster with a given center
The Bayesian prior probability that a point is a cluster center
The responsibility of the $k$-th component for generating $x_i$
Assign $x_i$ to center $s_i$
Choose a new center
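The responsibility takes the standard mixture-model form (the exact likelihood used in the NIPS'05 formulation may differ):

$r_{ik} = \dfrac{\pi_k \, p(x_i \mid \theta_k)}{\sum_{j} \pi_j \, p(x_i \mid \theta_j)}$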
Understanding the process
Message sent from $x_i$ to each center/exemplar: preference to be with each exemplar
Hard decision for cluster centers/exemplars
Introduce “availabilities”: messages sent from exemplars to other points, providing soft evidence of how available each exemplar is to serve as a center for each point
The method presented in NIPS'05
• Responsibilities are computed using likelihoods and availabilities
• Availabilities are computed using responsibilities, recursively
Affinities
Interpretation by Factor Graph
$c_i$ is the index of the exemplar for point $x_i$
Objective function: maximize $S(c_1, \ldots, c_N) = \sum_{i} s(i, c_i)$
Constraints:
Each cluster is non-empty, with a single exemplar
An exemplar must select itself as its own exemplar
Input and Output of AP in Science07
Input: pairwise similarities $s(i,k)$; the self-similarity $s(k,k)$ is the preference (prior) for point $k$ to be an exemplar
Output: a set of exemplars and the assignment of every point to one of them
AP: a message passing algorithm
Iterations of message passing in AP (a sequence of figures showing the messages and the emerging exemplars over successive iterations)
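A minimal NumPy sketch of the two update rules (responsibilities and availabilities, with damping); the function name, damping value, and iteration count are illustrative choices of this sketch, not taken from the slides.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, n_iter=200):
    """S: NxN similarity matrix; the diagonal S[k, k] holds the preference of point k."""
    N = S.shape[0]
    R = np.zeros((N, N))  # responsibilities r(i, k)
    A = np.zeros((N, N))  # availabilities  a(i, k)

    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [ a(i,k') + s(i,k') ]
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[np.arange(N), idx]
        AS[np.arange(N), idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(N), idx] = S[np.arange(N), idx] - second
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))   for i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())        # keep r(k,k) in the column sums
        col = Rp.sum(axis=0)
        A_new = np.minimum(0, col[None, :] - Rp)
        np.fill_diagonal(A_new, col - R.diagonal())
        A = damping * A + (1 - damping) * A_new

    # points with positive a(k,k) + r(k,k) are chosen as exemplars
    exemplars = np.where(np.diag(A + R) > 0)[0]
    # simplified assignment: each point goes to its most similar exemplar
    labels = S[:, exemplars].argmax(axis=1) if len(exemplars) else None
    return exemplars, labels
```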
Summary AP
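As a usage note (not part of the slides), scikit-learn provides an implementation; a minimal example, where the preference value is an illustrative choice:

```python
from sklearn.cluster import AffinityPropagation
import numpy as np

X = np.random.RandomState(0).randn(60, 2)            # toy data
ap = AffinityPropagation(damping=0.5, preference=-10, random_state=0).fit(X)
print(ap.cluster_centers_indices_)                    # indices of the chosen exemplars
print(ap.labels_)                                     # exemplar assignment for each point
```

Lowering the preference yields fewer exemplars (clusters); raising it yields more.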
Extensive study of AP
Outline
• Affinity Propagation (AP) [Frey and Dueck, Science, 2007]
• Locally Linear Embedding (LLE) [Roweis and Saul, Science, 2000]
Locally Linear Embedding (LLE)
[Roweis and Saul, Science, 2000]
Saul and Roweis. Think globally, fit locally: unsupervised learning of low dimensional manifolds. JMLR 2003
LLE - motivations
Inspired by MDS
Multidimensional Scaling (MDS): find an embedding of objects in a low-dimensional space that preserves pairwise distances
Given: pairwise similarities (distances). To find: the embedding
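A minimal NumPy sketch of classical MDS (not from the slides), assuming a full matrix D of pairwise distances is available:

```python
import numpy as np

def classical_mds(D, d=2):
    """Embed N objects in d dimensions from an NxN matrix of pairwise distances D."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:d]          # d largest eigenvalues
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))
```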
Can we eliminate the need to estimate pairwise distances between widely separated data points?
LLE – general idea
• Locally, on a fine enough scale, everything looks linear
• Represent an object as a linear combination of its neighbors
• Assumption: the same linear representation will hold in the low-dimensional space
• Find a low-dimensional embedding which minimizes reconstruction loss
LLE – matrix representation
1. Select K nearest neighbors
2. Reconstruct $x_i$ by its K nearest neighbors
Find W by minimizing
$\varepsilon(W) = \sum_i \left\| x_i - \sum_j w_{ij} x_j \right\|^2$
LLE – matrix representation
3. Need to solve the system Y = WY

Find the embedding vectors $Y_i$ by minimizing:
$\varepsilon(Y) = \sum_i \left\| Y_i - \sum_j w_{ij} Y_j \right\|^2 = \sum_{i} \sum_{j} M_{ij} \, (Y_i \cdot Y_j)$
where $M = (I - W)^T (I - W)$
s.t. $\sum_{i=1}^{N} Y_i = 0$ (centered on the origin)
and $\frac{1}{N} \sum_{i=1}^{N} Y_i Y_i^T = I$ (with unit covariance)
LLE – algorithm summary
1. Find k nearest neighbors in X space
2. Solve for reconstruction weights W
3. Compute embedding coordinates Y using weights W:
• Create a sparse matrix $M = (I - W)^T (I - W)$
• Set Y to be the eigenvectors corresponding to the bottom d non-zero eigenvalues of M
$w = \dfrac{C^{-1} \mathbf{1}}{\mathbf{1}^T C^{-1} \mathbf{1}}$ (so the weights sum to one), where $C_{jk} = (x - \eta_j)^T (x - \eta_k)$ and $\eta_j$ is one of x's K nearest neighbors
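A compact NumPy sketch of the three steps (illustrative; the parameter names and regularization constant are choices of this sketch, and a production version would use sparse matrices):

```python
import numpy as np

def lle(X, K=10, d=2, reg=1e-3):
    """Locally Linear Embedding of the rows of X into d dimensions."""
    N = X.shape[0]
    # 1. K nearest neighbors of each point (excluding the point itself)
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    nbrs = np.argsort(dist, axis=1)[:, 1:K + 1]

    # 2. Reconstruction weights: w = C^{-1} 1 / (1^T C^{-1} 1) for each point
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[nbrs[i]] - X[i]                    # neighbors shifted to the origin
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(K)       # regularize C for numerical stability
        w = np.linalg.solve(C, np.ones(K))
        W[i, nbrs[i]] = w / w.sum()

    # 3. Bottom eigenvectors of M = (I - W)^T (I - W)
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    eigvals, eigvecs = np.linalg.eigh(M)
    # discard the constant eigenvector (eigenvalue ~ 0), keep the next d
    return eigvecs[:, 1:d + 1]
```

scikit-learn's sklearn.manifold.LocallyLinearEmbedding offers an equivalent, more robust implementation.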
Continuing, SNE
Allows “many-to-one” mappings, in which a single ambiguous object really belongs in several disparate locations in the low-dimensional space, while LLE makes a one-to-one mapping.
$p_{j|i}$ is the asymmetric probability that point $i$ would pick point $j$ as its neighbor (Gaussian neighborhood in the original space)
$q_{j|i}$ is the induced probability that point $i$ picks point $j$ as its neighbor (Gaussian neighborhood in the low-dimensional space)
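For reference, the standard SNE definitions (Hinton and Roweis; not spelled out on the slide):

$p_{j|i} = \dfrac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$, $\quad q_{j|i} = \dfrac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$

with cost $C = \sum_i \mathrm{KL}(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \dfrac{p_{j|i}}{q_{j|i}}$.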
Continuing, t-SNE
• uses a Student-t distribution (heavier tail) rather than a Gaussian to compute the similarity between two points in the low-dimensional space
• symmetrized version of the SNE cost function with simpler gradients
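In the standard t-SNE formulation (van der Maaten and Hinton, 2008; not on the slide), the low-dimensional similarity and the symmetrized input probability are

$q_{ij} = \dfrac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}$, $\quad p_{ij} = \dfrac{p_{j|i} + p_{i|j}}{2N}$,

and the cost is $\mathrm{KL}(P \,\|\, Q)$.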
LLE - follow-up work
LLE output strongly depends on the selection of k
Jing Chen and Yang Liu. Locally linear embedding: a survey. Artificial Intelligence Review (2011)
[Chen and Liu, 2011]
[Ting and Jordan, 2018]
Thank you for your attention!
Lab of Machine Intelligence and kNowledge Engineering (MINE): http://mine.kaust.edu.sa/