Tuesday, 4 November 2014

The Gibbs Sampler

The previous blog post explicitly presented the details needed to implement a collapsed CRP Gibbs sampler.
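For reference, the update such a sampler performs is the standard collapsed CRP conditional. The notation below is assumed for this recap and is not necessarily the one used in the previous post: $z_i$ is the cluster label of point $x_i$, $n_{-i,k}$ is the number of other points in cluster $k$, $\mathbf{x}_{-i,k}$ are those points, and $\alpha$ is the CRP concentration parameter. The cluster parameters are integrated out, so only predictive densities appear:

$$
p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{x}) \propto n_{-i,k}\, p(x_i \mid \mathbf{x}_{-i,k}),
\qquad
p(z_i = k_{\text{new}} \mid \mathbf{z}_{-i}, \mathbf{x}) \propto \alpha\, p(x_i)
$$

Here $p(x_i \mid \mathbf{x}_{-i,k})$ is the posterior predictive of cluster $k$ and $p(x_i)$ is the prior predictive used for opening a new cluster. A larger $\alpha$ makes new clusters more likely.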
This post presents an implementation in C++.
The post is relatively short, since all the necessary technical details were given in the previous one.
The code of the C++ implementation can be found here. Since I will use the package a lot, I will commit regularly to fix bugs or add extra features.
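As a rough illustration only, here is a minimal sketch of one collapsed Gibbs sweep in C++. It is not the code from the repository. The model is an assumption made for this example: 2-D points, a Gaussian likelihood with known isotropic variance `sigma2`, and a Gaussian prior `N(mu0, tau2 I)` on each cluster mean. All names (`Point`, `Cluster`, `Model`, `log_predictive`, `gibbs_sweep`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical types for this sketch (not taken from the original code).
struct Point { double x, y; };                              // one 2-D data point
struct Cluster { int n = 0; double sx = 0.0, sy = 0.0; };   // size and coordinate sums

// Assumed conjugate model: x | mu ~ N(mu, sigma2 * I), mu ~ N(mu0, tau2 * I).
// The cluster means mu are integrated out ("collapsed").
struct Model { double alpha, sigma2, tau2; Point mu0; };

// Log density of the isotropic 2-D Gaussian N(m, var * I) at p.
double log_gauss2(const Point& p, double mx, double my, double var) {
    const double kPi = 3.14159265358979323846;
    const double dx = p.x - mx, dy = p.y - my;
    return -std::log(2.0 * kPi * var) - (dx * dx + dy * dy) / (2.0 * var);
}

// Posterior predictive log density of p for a cluster with statistics c.
// With c.n == 0 this is the prior predictive used for a new cluster.
double log_predictive(const Point& p, const Cluster& c, const Model& m) {
    const double v = 1.0 / (1.0 / m.tau2 + c.n / m.sigma2);     // posterior variance of mu
    const double mx = v * (m.mu0.x / m.tau2 + c.sx / m.sigma2);  // posterior mean of mu
    const double my = v * (m.mu0.y / m.tau2 + c.sy / m.sigma2);
    return log_gauss2(p, mx, my, v + m.sigma2);
}

// One sweep over all points. z[i] is the cluster index of point i and
// clusters[k] holds the statistics of cluster k (no cluster is ever empty).
void gibbs_sweep(const std::vector<Point>& pts, std::vector<int>& z,
                 std::vector<Cluster>& clusters, const Model& m, std::mt19937& rng) {
    assert(pts.size() == z.size());
    for (std::size_t i = 0; i < pts.size(); ++i) {
        // 1. Remove point i from its current cluster.
        const int k = z[i];
        clusters[k].n -= 1;
        clusters[k].sx -= pts[i].x;
        clusters[k].sy -= pts[i].y;
        if (clusters[k].n == 0) {
            // Drop the empty cluster: move the last cluster into slot k
            // and relabel the points that referred to the last slot.
            const int last = static_cast<int>(clusters.size()) - 1;
            clusters[k] = clusters[last];
            clusters.pop_back();
            for (int& zj : z) if (zj == last) zj = k;
        }
        // 2. Unnormalised log-probabilities: n_k * predictive for existing
        //    clusters, alpha * prior predictive for a new one (last entry).
        std::vector<double> logp(clusters.size() + 1);
        for (std::size_t c = 0; c < clusters.size(); ++c)
            logp[c] = std::log(static_cast<double>(clusters[c].n)) +
                      log_predictive(pts[i], clusters[c], m);
        logp.back() = std::log(m.alpha) + log_predictive(pts[i], Cluster{}, m);
        // 3. Exponentiate relative to the maximum for numerical stability, then sample.
        const double top = *std::max_element(logp.begin(), logp.end());
        for (double& lp : logp) lp = std::exp(lp - top);
        std::discrete_distribution<int> pick(logp.begin(), logp.end());
        const std::size_t knew = static_cast<std::size_t>(pick(rng));
        // 4. Add point i to the chosen (possibly new) cluster.
        if (knew == clusters.size()) clusters.push_back(Cluster{});
        Cluster& c = clusters[knew];
        c.n += 1;
        c.sx += pts[i].x;
        c.sy += pts[i].y;
        z[i] = static_cast<int>(knew);
    }
}
```

A driver would start with all points in a single cluster, call `gibbs_sweep` for the desired number of iterations, and record `z` after each sweep for plotting. Removing empty clusters immediately keeps labels contiguous, so `clusters.size()` is always the current number of clusters.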
The videos display the behaviour of the algorithm for different values of alpha and different numbers of iterations.
More specifically, the hyperparameter alpha is set to 20, and the data points are read during initialization from a .txt file in the form [a,b], which makes it easy to test different datasets. The data were generated by 4 different random processes, and the task is to find the mixture model that best fits them. The algorithm runs for 100 iterations, and the output visualizes the cluster assignments of the data until convergence to 4 clusters. A colormap indicates the cluster each point belongs to, and the black circles mark the mean of each cluster.
As can be seen, the algorithm converges to three or four clusters. This is a reasonable result, given that the data came from 4 different random processes, two of which produce neighbouring data points. Note that the visualization in Octave plays back the recorded sampler output so that the cluster allocation is shown correctly; the sampling itself takes only about 1.5 seconds, even for 100 iterations.
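The input format described above can be read with a few lines of standard C++. The sketch below shows a hypothetical reader, `read_points`, together with a helper, `cluster_means`, that computes the per-cluster means drawn as black circles. Neither function comes from the original code, and the one-pair-per-line layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical 2-D point type for this sketch.
struct Point { double x, y; };

// Reads one "[a,b]" pair per line (assumed layout) from any input stream,
// e.g. std::ifstream or std::istringstream. Lines that cannot be read as
// "[" followed by two comma-separated numbers are skipped.
std::vector<Point> read_points(std::istream& in) {
    std::vector<Point> pts;
    std::string line;
    while (std::getline(in, line)) {
        Point p{0.0, 0.0};
        if (std::sscanf(line.c_str(), " [ %lf , %lf", &p.x, &p.y) == 2) pts.push_back(p);
    }
    return pts;
}

// Mean of each of the K clusters, given labels z[i] in [0, K).
// Clusters with no points keep the mean (0, 0).
std::vector<Point> cluster_means(const std::vector<Point>& pts,
                                 const std::vector<int>& z, int K) {
    assert(pts.size() == z.size());
    std::vector<Point> mean(K, Point{0.0, 0.0});
    std::vector<int> count(K, 0);
    for (std::size_t i = 0; i < pts.size(); ++i) {
        assert(z[i] >= 0 && z[i] < K);
        mean[z[i]].x += pts[i].x;
        mean[z[i]].y += pts[i].y;
        ++count[z[i]];
    }
    for (int k = 0; k < K; ++k)
        if (count[k] > 0) { mean[k].x /= count[k]; mean[k].y /= count[k]; }
    return mean;
}
```

For example, `std::ifstream in("data.txt"); auto pts = read_points(in);` (the file name is hypothetical) loads a dataset. Passing the labels from the last sweep to `cluster_means` then gives the points to draw as cluster centres.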