Thursday, 9 October 2014

Model description - Warning: long post!


Post #3

Introduction

In this post I will describe the basic structure of the filter that I plan on implementing in the resampling step of FastSLAM. The post is structured as follows:
First, a brief introduction to Dirichlet processes and to the basic finite and infinite mixture models built on them is given. Then the basic inference methods on finite and infinite mixture models are presented. Finally, the relation that connects such models to my research topic is analysed.
Reading material regarding DPs can be found here and here. Wikipedia has useful but long articles on the Dirichlet distribution and the Dirichlet process and their respective properties (exchangeability, aggregation, etc.).

The Dirichlet process, a quick introduction.

Dirichlet distribution: Let q = (q1, ..., qk) be a random pmf, i.e. qi ≥ 0 and q1 + ... + qk = 1. In addition, suppose that α = (α1, ..., αk) is a vector of positive reals of length k, with α0 = α1 + ... + αk defining the sum of all αi. Then q is said to have a Dirichlet distribution with parameter α, denoted Q ~ Dir(α), if f(q;α) = 0 whenever q is not a pmf, and when q is a pmf:

f(q;α) = (Γ(α0) / (Γ(α1)···Γ(αk))) · q1^(α1−1) ··· qk^(αk−1)

The parameter vector α affects the density of the distribution. For a parameter vector consisting only of 1's, the distribution reduces to the uniform distribution over the simplex. For α values >> 1 the density moves towards the center of the simplex, whereas for 0 < α < 1 the density moves towards the vertices of the simplex.
The scatter plots show random samples from a Dirichlet distribution for different values of the α vector.
α=[1,1,1] - Uniform distribution

α= [.1,.1,.1]- Distribution skewed towards the vertices

α=[2,5,10]-Distribution skewed towards parameters Q2,Q3

There are many ways one can sample from a Dirichlet distribution; I will only enumerate them here, as they are covered extensively in the cited bibliography:

  • Pólya's urn
  • Chinese restaurant process
  • Stick-breaking process
  • Gamma distribution random variables
The gamma random variables method was the one used to create the scatter plots of the Dirichlet density above.
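As a rough illustration of the gamma method (a minimal NumPy sketch, not the exact code behind the plots):

import numpy as np

def sample_dirichlet(alpha, n_samples):
    # Draw independent Gamma(alpha_i, 1) variables and normalize each row
    # to sum to 1; the normalized vector is a Dirichlet(alpha) sample.
    g = np.random.gamma(shape=alpha, scale=1.0, size=(n_samples, len(alpha)))
    return g / g.sum(axis=1, keepdims=True)

# e.g. the three alpha vectors used for the scatter plots above
for alpha in ([1.0, 1.0, 1.0], [0.1, 0.1, 0.1], [2.0, 5.0, 10.0]):
    samples = sample_dirichlet(np.array(alpha), 1000)
    print(samples.mean(axis=0))  # empirical means approach alpha_i / alpha_0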


To shift from a Dirichlet distribution to a Dirichlet process, consider the following. Suppose we have a bag of dice, where every die has a probability mass function (pmf) giving the probability of rolling a one, a two, etc. Due to the reality of manufacturing, the pmf of every die will be close to, but not identical to, a uniform pmf. If we pick a die at random from the bag, we draw a specific pmf, which is a realization of a Dirichlet distribution: a single point in the previous plots. If we let the number of sides of a die, that is the number of possible labels in a distribution, go to infinity, we can model this random distribution of distributions over an infinite label space using a Dirichlet process.

Using simple plate notation a Dirichlet process can be modeled as:
Dirichlet Process
where G follows a Dirichlet process, denoted G ~ DP(α, G0), with α the concentration parameter and G0 the base distribution.
Dirichlet process: Let α be a positive, real-valued scalar and let G0 be a non-atomic probability distribution over a support set A. If G ~ DP(α, G0), then for any finite partition
A1, A2, ..., An of A: (G(A1), ..., G(An)) ~ Dirichlet(αG0(A1), ..., αG0(An)).
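To make the definition concrete, the stick-breaking process mentioned above gives a direct way to draw a truncated sample from a DP. A minimal sketch, assuming a standard normal base distribution and a fixed truncation level:

import numpy as np

def stick_breaking_dp(alpha, base_sampler, truncation=100):
    # Break a unit-length stick: beta_k ~ Beta(1, alpha), and the k-th
    # weight is beta_k times whatever is left of the stick after k-1 breaks.
    betas = np.random.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights = betas * remaining
    atoms = base_sampler(truncation)  # atom locations are i.i.d. draws from G0
    return weights, atoms

# a truncated draw G ~ DP(alpha=2, G0=N(0,1))
weights, atoms = stick_breaking_dp(2.0, lambda n: np.random.normal(0.0, 1.0, n))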

Finite and Infinite mixture models:

Finite mixture models assume that the data come from a mixture of a finite number of distributions. Therefore the basic plate graph of an FMM is the following:
Finite mixture model for K labels

 The model can be described as:

π ~ Dirichlet(α/K, ..., α/K)
cn ~ Multinomial(π)
ηk ~ G0
yn | c, η ~ F(· | ηc(n))


The α and π parameters define the mixture weight for a given class, while the base distribution G0 generates the parameters of the cluster to which a datum will be assigned. Note that the distribution over labels is discrete, so there exists a probability > 0 that two points will belong to the same cluster. The data are further distributed within their clusters according to a distribution F. The F(· | ηc(n)) part of the model defines the distribution at cluster k, where ηc(n) holds the parameters (sufficient statistics) of that distribution, e.g. for a Gaussian these would be the mean and variance.
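As an illustration, generating data from this finite model might look as follows (a hedged sketch: the Gaussian choices for G0 and F and all hyperparameter values are assumptions made for the example):

import numpy as np

K, N, alpha = 5, 200, 3.0
pi = np.random.dirichlet([alpha / K] * K)   # mixture weights ~ Dirichlet(alpha/K, ..., alpha/K)
eta = np.random.normal(0.0, 10.0, size=K)   # per-cluster parameters ~ G0, here N(0, 100)
c = np.random.choice(K, size=N, p=pi)       # labels ~ Multinomial(pi)
y = np.random.normal(eta[c], 1.0)           # data ~ F(. | eta_{c_n}), here unit-variance Gaussians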

Infinite mixture models assume that the data come from a mixture of an infinite number of distributions. Since the number of components is now infinite, the K discrete labels lose their meaning, as the label space becomes continuous. The model then becomes the following:
Infinite mixture model
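In the same notation as the finite case, the Dirichlet process mixture model is typically written as:

G ~ DP(α, G0)
ηn | G ~ G
yn | ηn ~ F(· | ηn)

Because a draw G from a Dirichlet process is discrete with probability one, several of the ηn coincide, and these shared values are what induce the clustering.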
Inference on Dirichlet process mixture models can be performed via two basic approaches.
  1. MCMC (Markov Chain Monte Carlo) methods
  2. Variational Inference
Both approaches have a wide variety of implemented methods, with two extensively cited papers being the following:
Most of my research is currently focused on MCMC solutions for Dirichlet process mixture inference, but I plan to investigate variational methods in the future.

Research question, revisited.

A quick recap of the research question will shed more light on how the Dirichlet process should be used in the context of particle resampling.
The basic motive is to "smart-up" the resampling phase, making it more deliberate about the particles it outputs without giving up the randomness of the process. The problem definition and basic motivation are presented next.

Problem Definition:

Given a set of particles along with their significance weights, cluster them into groups of similar particles and then sample from the inferred distribution, taking into account the inter-cluster as well as the intra-cluster significance of the particles.
The process can be split into three basic steps (a code sketch follows the list):
  1. Given the current predicted particle distribution, output the relevant particle clusters, K in number.
  2. Given the K clusters, model the inter-cluster distribution using the particle weights.
  3. Sample from the created model.
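A minimal sketch of steps 2 and 3, assuming the cluster labels from step 1 are already available (the function and variable names are illustrative, not from an existing implementation):

import numpy as np

def hierarchical_resample(particles, weights, labels, n_out):
    # Step 2: the inter-cluster distribution weights each cluster by the
    # sum of the significance weights of its member particles.
    clusters = np.unique(labels)
    cluster_w = np.array([weights[labels == k].sum() for k in clusters])
    cluster_w = cluster_w / cluster_w.sum()
    # Step 3: sample a cluster, then a particle within it, both
    # proportionally to the significance weights.
    out = []
    for _ in range(n_out):
        k = np.random.choice(clusters, p=cluster_w)
        idx = np.flatnonzero(labels == k)
        w = weights[idx] / weights[idx].sum()
        out.append(particles[np.random.choice(idx, p=w)])
    return np.array(out)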

Problem analysis:

In the resampling step of a particle filter, we resample using the posterior belief of the particle positions and their significance weights. The basic idea behind this approach is to use the particle distribution, after the significance weights are calculated, as input to a Gibbs sampler that performs inference over the position distribution of the particles. The clustered data can then be used to update a Dirichlet distribution over every cluster, with a parent Dirichlet distribution on top whose alpha parameters are proportional to the sum of the particle weights belonging to each cluster. I have performed this step on stationary data and it seems to produce valid results. Particle poses will not be taken into account.
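One possible reading of that parent-level construction (a sketch under my stated assumption that the alphas are exactly the scaled weight sums):

import numpy as np

def parent_dirichlet_alphas(weights, labels, scale=1.0):
    # Parent Dirichlet over clusters: alpha_k proportional to the summed
    # significance weights of the particles assigned to cluster k.
    clusters = np.unique(labels)
    return scale * np.array([weights[labels == k].sum() for k in clusters])

# the expected inter-cluster weights are then alphas / alphas.sum()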

Note that, using the Gibbs sampler, the data can be modeled without predefining the number of clusters. Instead, you give the sampler a distribution from which it would be reasonable for the data to be drawn. Note that the sampler is very flexible, as it can also correctly cluster data that deviate strongly from the belief distribution. Such an example can be found here, where Anne used the sampler to cluster lines under an initial Gaussian belief.
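For a flavour of what the collapsed CRP Gibbs sampler does at each step, here is a schematic sketch of the per-point label update; marginal_lik is an assumed, user-supplied predictive likelihood (available in closed form for conjugate models):

import numpy as np

def crp_gibbs_update(i, y, labels, alpha, marginal_lik):
    # Temporarily remove point i, then score every existing cluster by
    # (cluster size) * (predictive likelihood of y[i] given its members),
    # and a potential new cluster by alpha * (prior predictive likelihood).
    labels[i] = -1
    ks = sorted(k for k in set(labels.tolist()) if k != -1)
    scores = []
    for k in ks:
        members = np.flatnonzero(labels == k)
        scores.append(len(members) * marginal_lik(y[i], y[members]))
    scores.append(alpha * marginal_lik(y[i], y[np.array([], dtype=int)]))
    probs = np.array(scores) / np.sum(scores)
    choice = np.random.choice(len(probs), p=probs)
    labels[i] = ks[choice] if choice < len(ks) else (max(ks) + 1 if ks else 0)
    return labels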

My current target is to have a MATLAB implementation as soon as possible and then migrate the method to ROS and the TurtleBot simulation.


Collapsed CRP Gibbs sampler inference results over a small dataset
Finally, a screenshot of a collapsed CRP sampler in action; the data are divided into 5 clusters.
