Tuesday, 4 November 2014

The Gibbs Sampler

In the previous blog post, all the details needed to implement a collapsed CRP Gibbs sampler were presented explicitly.
In this post, a C++ implementation will be presented.
The post will be relatively short, as all the necessary technical details were given previously.
The code of the C++ implementation can be found here. Since I will be using the package a lot, I will commit regularly to fix bugs or add features.
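Before the results, it may help to see what the core of such a sampler looks like. The sketch below is a minimal one-dimensional illustration, not the repository code: it assumes Gaussian clusters with a known variance and a conjugate Normal prior on the cluster means, and every name in it (CrpGibbs, logPredictive, the prior parameters) is made up for the example.

```cpp
// crp_gibbs_sketch.cpp -- a minimal, illustrative collapsed CRP Gibbs sweep.
// NOT the repository code: assumes 1-D data, Gaussian clusters with known
// variance sigma2, and a Normal(mu0, tau2) prior on each cluster mean, so
// the posterior predictive is available in closed form.
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

struct CrpGibbs {
    double alpha = 1.0;              // CRP concentration parameter
    double sigma2 = 1.0;             // known within-cluster variance (assumed)
    double mu0 = 0.0, tau2 = 100.0;  // prior on the cluster means (assumed)
    std::vector<int> counts;         // points currently in each cluster
    std::vector<double> sums;        // sum of the points in each cluster

    // Log posterior-predictive density of x under cluster k;
    // k == counts.size() stands for a brand-new, empty cluster.
    double logPredictive(double x, size_t k) const {
        int n = k < counts.size() ? counts[k] : 0;
        double s = k < counts.size() ? sums[k] : 0.0;
        double prec = 1.0 / tau2 + n / sigma2;           // posterior precision
        double mean = (mu0 / tau2 + s / sigma2) / prec;  // posterior mean
        double var = sigma2 + 1.0 / prec;                // predictive variance
        double d = x - mean;
        return -0.5 * (std::log(2.0 * 3.141592653589793 * var) + d * d / var);
    }

    // One sweep: reassign every point given all the others (collapsed Gibbs).
    void sweep(const std::vector<double>& x, std::vector<int>& z,
               std::mt19937& rng) {
        for (size_t i = 0; i < x.size(); ++i) {
            --counts[z[i]];                    // remove point i from its cluster
            sums[z[i]] -= x[i];

            // CRP prior times predictive likelihood, in log space for
            // stability; clusters that have emptied out keep weight zero
            // (cf. the zeros that persist in the table further down).
            std::vector<double> logp(counts.size() + 1);
            for (size_t k = 0; k < counts.size(); ++k)
                logp[k] = (counts[k] > 0 ? std::log((double)counts[k]) : -1e300)
                          + logPredictive(x[i], k);
            logp.back() = std::log(alpha) + logPredictive(x[i], counts.size());

            // Normalize and sample the new assignment.
            double mx = *std::max_element(logp.begin(), logp.end());
            std::vector<double> w(logp.size());
            for (size_t k = 0; k < w.size(); ++k) w[k] = std::exp(logp[k] - mx);
            std::discrete_distribution<int> pick(w.begin(), w.end());

            int k = pick(rng);
            if (k == (int)counts.size()) {     // open a new cluster
                counts.push_back(0);
                sums.push_back(0.0);
            }
            z[i] = k;
            ++counts[k];
            sums[k] += x[i];
        }
    }
};

int main() {
    std::mt19937 rng(42);
    std::vector<double> x = {0.1, 0.3, -0.2, 5.2, 5.0, 4.8, 9.9, 10.1};
    CrpGibbs g;
    g.alpha = 2.0;
    g.counts = {(int)x.size()};                // start with one big cluster
    g.sums = {0.0};
    for (double v : x) g.sums[0] += v;
    std::vector<int> z(x.size(), 0);
    for (int it = 0; it < 100; ++it) g.sweep(x, z, rng);
    return 0;
}
```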
The video displays the behaviour of the algorithm for different values of alpha and different numbers of iterations.
More specifically, the hyper-parameter alpha is set to 20 and the datapoints are given during initialization through a .txt file in the form [a,b], making it easy to test different datasets. The data were generated from 4 different random processes and the task is to find the mixture model that best fits them. The algorithm is set to 100 iterations and the output visualizes the cluster assignments of the data until convergence to 4 clusters. The colormap represents the group to which each datapoint belongs and the black circles represent the mean of every cluster.
As can be seen, the algorithm converges to three or four clusters, which is a reasonable classification given that the data came from 4 different random processes, two of which produce neighbouring datapoints. It must be noted that Octave is used to play back the recorded cluster allocations for the visualization; the sampling itself, even at 100 iterations, takes about 1.5 seconds.
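The exact file layout is not shown here, but assuming one [a,b] pair per line, a loader could look like the following hypothetical sketch (the real package's parser may differ):

```cpp
// Hypothetical loader for the "[a,b]" text format mentioned above,
// assuming one pair per line.
#include <fstream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

std::vector<std::pair<double, double>> loadPoints(const std::string& path) {
    std::vector<std::pair<double, double>> pts;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        char lb, comma, rb;                  // the '[', ',' and ']' delimiters
        double a, b;
        std::istringstream ss(line);
        if (ss >> lb >> a >> comma >> b >> rb)
            pts.emplace_back(a, b);          // e.g. "[1.25,-0.7]" -> (1.25, -0.7)
    }
    return pts;
}
```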

Finally, the cluster convergence is also given in the form of a table below. Each row corresponds to one iteration of the algorithm and shows the progression of the cluster allocations. Even within 20 iterations the algorithm correctly converges to 4 clusters, as can be seen below.
The dataset is the same as the one used in the video example. It must be noted that the hyper-parameter alpha plays an important role in the number of clusters the model converges to. Since in the Dirichlet process alpha serves as a precision (inverse variance) parameter, smaller values lead to a smaller number of clusters, which leads to higher inter-cluster variance, whereas larger values of alpha lead to more clusters with smaller inter-cluster variance.


Cluster allocation | Number of clusters
15-17-8-14-3-21-22-6-3-20-1 | 11
12-9-4-16-1-41-22-2-1-21-1 | 11
12-1-5-8-0-51-19-8-0-17-9 | 9
7-1-6-10-0-58-19-8-0-13-8 | 9
7-1-9-9-0-53-19-10-0-15-7 | 9
4-1-8-10-0-69-15-11-0-11-1 | 9
14-1-6-13-0-55-15-18-0-8 | 8
13-1-5-15-0-48-13-28-0-7 | 8
20-1-6-10-0-47-12-29-0-5 | 8
23-1-6-14-0-44-12-25-0-5 | 8
21-1-9-21-0-43-6-24-0-5 | 8
18-1-9-21-0-46-1-29-0-5 | 8
17-1-5-23-0-53-2-26-0-3 | 8
21-1-9-23-0-51-0-23-0-2 | 7
25-1-3-26-0-54-0-21 | 6
28-1-2-28-0-57-0-14 | 6
32-1-0-25-0-58-0-14 | 5
36-1-0-26-0-53-0-14 | 5
41-1-0-27-0-58-0-3 | 5
43-1-0-27-0-59 | 4
As can be seen, smaller values of alpha lead to a smaller number of clusters, with the elements highly concentrated in a small number of clusters. More videos of the sampler on different datasets will follow, further displaying the sampler's behaviour.
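This effect of alpha is easy to check by simulating the CRP prior on its own. The sketch below is a standalone toy, not part of the package; it draws table assignments for 130 customers (the row sums in the table above suggest the dataset has 130 points) and reports the average number of occupied tables for a few values of alpha:

```cpp
// Standalone simulation of the Chinese restaurant process prior (no
// likelihood), showing how alpha controls the number of clusters.
#include <cstdio>
#include <random>
#include <vector>

int crpClusterCount(int n, double alpha, std::mt19937& rng) {
    std::vector<double> w;                        // customers per table
    for (int i = 0; i < n; ++i) {
        w.push_back(alpha);                       // weight of opening a new table
        std::discrete_distribution<int> pick(w.begin(), w.end());
        int k = pick(rng);
        w.pop_back();                             // restore the real counts
        if (k == (int)w.size()) w.push_back(1.0); // customer opens a new table
        else w[k] += 1.0;                         // customer joins table k
    }
    return (int)w.size();
}

int main() {
    std::mt19937 rng(1);
    for (double alpha : {0.5, 2.0, 20.0}) {
        long total = 0;
        for (int r = 0; r < 1000; ++r) total += crpClusterCount(130, alpha, rng);
        // Theory: E[K] = sum_i alpha/(alpha+i) ~ alpha * log(1 + n/alpha)
        std::printf("alpha = %5.1f -> average clusters = %.1f\n",
                    alpha, total / 1000.0);
    }
    return 0;
}
```

Note that under the prior alone, alpha = 20 already favours tens of tables for 130 points; in the runs above it is the likelihood that pulls the posterior back down to the 4 clusters that actually generated the data.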
