=======================
Gaussian Mixture Models
=======================

A Gaussian mixture model is a probabilistic model that assumes that data are generated from a finite mixture of Gaussians with unknown parameters. The model likelihood can be written as:

.. math::

    p(x|\theta) = \sum_{i=1}^{K}{\pi_i \mathcal{N}(x|\mu_i, \Sigma_i)}

where :math:`p(x|\theta)` is the probability density of :math:`x` given the parameters :math:`\theta:=\{\pi_i, \mu_i, \Sigma_i\}_{i=1}^K`, :math:`K` denotes the number of mixture components, :math:`\pi_i` denotes the weight of the :math:`i`-th component (the weights are non-negative and sum to one), and :math:`\mathcal{N}(x|\mu_i, \Sigma_i)` denotes a multivariate normal distribution with mean vector :math:`\mu_i` and covariance matrix :math:`\Sigma_i`.
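
To make the formula concrete, the following minimal sketch evaluates the mixture density in plain Python. It is restricted to the one-dimensional case for brevity (so each :math:`\Sigma_i` reduces to a scalar variance), does not use Shogun, and the function names and parameter values are hypothetical choices made only for illustration.

.. code-block:: python

    import math

    def normal_pdf(x, mu, var):
        """Density of a univariate normal with mean mu and variance var."""
        return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

    def mixture_pdf(x, weights, means, variances):
        """p(x | theta) = sum_i pi_i * N(x | mu_i, sigma_i^2)."""
        return sum(w * normal_pdf(x, m, v)
                   for w, m, v in zip(weights, means, variances))

    # Illustrative two-component mixture (values are arbitrary assumptions).
    weights = [0.3, 0.7]
    means = [-2.0, 1.0]
    variances = [0.5, 1.5]
    density_at_zero = mixture_pdf(0.0, weights, means, variances)

Because the weights sum to one and each component is a normalized density, the mixture itself integrates to one.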

The expectation-maximization (EM) algorithm is used to learn the parameters of the model by iteratively maximizing a lower bound on the log-likelihood; it generally converges to a local rather than a global maximum.
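
For this model, the standard textbook form of the EM updates can be written as follows. This is a sketch in notation introduced only for illustration (:math:`x_1, \dots, x_N` for the training points and :math:`\gamma_{ni}` for the responsibilities); the Shogun implementation may differ in details such as initialization and covariance regularization. The E-step computes the responsibility of component :math:`i` for point :math:`x_n`:

.. math::

    \gamma_{ni} = \frac{\pi_i \mathcal{N}(x_n|\mu_i, \Sigma_i)}{\sum_{j=1}^{K}{\pi_j \mathcal{N}(x_n|\mu_j, \Sigma_j)}}

The M-step then re-estimates the parameters from these responsibilities:

.. math::

    N_i = \sum_{n=1}^{N}{\gamma_{ni}}, \qquad \pi_i = \frac{N_i}{N}

    \mu_i = \frac{1}{N_i}\sum_{n=1}^{N}{\gamma_{ni} x_n}

    \Sigma_i = \frac{1}{N_i}\sum_{n=1}^{N}{\gamma_{ni} (x_n - \mu_i)(x_n - \mu_i)^\top}

Both steps are repeated until the log-likelihood stops improving.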

See Chapter 20 in :cite:`barber2012bayesian` for a detailed introduction.

-------
Example
-------

We start by creating CDenseFeatures (here 64-bit floats, also known as RealFeatures):

.. sgexample:: gmm.sg:create_features

We initialize :sgclass:`CGMM`, passing the desired number of mixture components.

.. sgexample:: gmm.sg:create_gmm_instance

We provide training features to the :sgclass:`CGMM` object, train it using the EM algorithm, and sample data points from the trained model.

.. sgexample:: gmm.sg:train_sample
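
Conceptually, sampling from a mixture is a two-stage process: first pick a component according to the weights :math:`\pi_i`, then draw from that component's Gaussian. The plain-Python sketch below illustrates the idea for the one-dimensional case. It is not the Shogun implementation, and the function name, seed, and parameter values are hypothetical.

.. code-block:: python

    import math
    import random

    def sample_mixture(weights, means, variances, n, seed=0):
        """Draw n points from a 1-D Gaussian mixture by ancestral sampling."""
        rng = random.Random(seed)  # fixed seed so the sketch is reproducible
        samples = []
        for _ in range(n):
            # Stage 1: choose a component index with probability pi_i.
            i = rng.choices(range(len(weights)), weights=weights)[0]
            # Stage 2: draw from the chosen Gaussian (gauss takes a std. dev.).
            samples.append(rng.gauss(means[i], math.sqrt(variances[i])))
        return samples

    samples = sample_mixture([0.3, 0.7], [-2.0, 1.0], [0.5, 1.5], n=1000)

With enough samples, the empirical mean approaches the weighted mean of the component means, :math:`\sum_i \pi_i \mu_i`.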

We extract parameters such as :math:`\pi`, :math:`\mu_i` and :math:`\Sigma_i` for any component from the trained model.

.. sgexample:: gmm.sg:extract_params

We obtain the log-likelihood of a data point belonging to each cluster and of it being generated by this model.

.. sgexample:: gmm.sg:cluster_output
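
The sketch below shows one way such quantities can be computed in principle for a one-dimensional mixture, using the log-sum-exp trick for numerical stability. It is an assumption-laden illustration in plain Python: the function names are hypothetical, and the exact definition and ordering of the values returned by :sgclass:`CGMM` may differ.

.. code-block:: python

    import math

    def log_normal_pdf(x, mu, var):
        """Log-density of a univariate normal with mean mu and variance var."""
        return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

    def cluster_log_likelihoods(x, weights, means, variances):
        """Return per-component terms log(pi_i N(x|mu_i, var_i)) and log p(x)."""
        joint = [math.log(w) + log_normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        # log-sum-exp: subtract the largest term before exponentiating.
        peak = max(joint)
        total = peak + math.log(sum(math.exp(j - peak) for j in joint))
        return joint, total

    joint, total = cluster_log_likelihoods(0.0, [0.3, 0.7], [-2.0, 1.0], [0.5, 1.5])
    # Posterior probability of each component given x (sums to one).
    posterior = [math.exp(j - total) for j in joint]

Working in the log domain avoids underflow for points that lie far from every component.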

We can also use the Split-Merge Expectation-Maximization (SMEM) algorithm :cite:`ueda2000smem` for training.

.. sgexample:: gmm.sg:training_smem

----------
References
----------
:wiki:`Mixture_model`

:wiki:`Expectation–maximization_algorithm`

.. bibliography:: ../../references.bib
    :filter: docname in docnames
