General Probabilistic Modelling

Background

Obtaining a comprehensive mathematical representation of the complete structure of a set of data remains a significant challenge, but an effective model can be both a valuable aid to understanding the data and an essential component of more sophisticated analysis systems. In contrast to predictive modelling, for example, this setting typically has no distinguished "dependent" variable. Instead, potential dependencies between all variables may be of interest, and an appropriately specified model would ideally capture, at least to some extent, the structural features of the entire data set. In machine learning terminology, this type of general modelling approach is referred to as "unsupervised learning".
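
To make the contrast concrete (the notation here is ours and purely illustrative): a predictive model targets one conditional distribution of a designated output y, whereas a general probabilistic model targets the joint density of all D variables, from which any conditional of interest can in principle be derived.

```latex
\begin{aligned}
\text{predictive:} \quad & p(y \mid \mathbf{x}) \\
\text{general:} \quad & p(\mathbf{x}) = p(x_1, x_2, \ldots, x_D) \\
\text{any conditional:} \quad & p(x_i \mid \mathbf{x}_{\setminus i}) = \frac{p(\mathbf{x})}{\int p(\mathbf{x})\, \mathrm{d}x_i}
\end{aligned}
```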

A model that "represents" the data to an appropriate level of accuracy can be exploited effectively within many practical information-processing applications. Examples include novelty detection, the estimation of missing or corrupt data, the principled fusion of multiple models, and the implementation of dynamically reconfigurable pattern recognition systems.
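
As an illustration of two of these applications, the sketch below uses the simplest possible density model, a single multivariate Gaussian. This is an assumption made purely for brevity; a richer model such as a mixture would normally be substituted. Novelty is flagged when a point's log-density falls below a threshold, and missing entries are estimated by their conditional mean given the observed entries. The function names and the choice of threshold are hypothetical.

```python
import numpy as np


def fit_gaussian(X):
    """Maximum-likelihood mean and covariance of the rows of X (shape N x D)."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)  # bias=True: divide by N (ML estimate)
    return mu, cov


def log_density(x, mu, cov):
    """Log of the multivariate Gaussian density N(x | mu, cov)."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (mu.size * np.log(2.0 * np.pi) + logdet + maha)


def is_novel(x, mu, cov, threshold):
    """Hypothetical novelty rule: flag x if its log-density is below the threshold."""
    return log_density(x, mu, cov) < threshold


def impute_missing(x, mu, cov):
    """Replace NaN entries of x by their conditional mean given the observed entries."""
    x = np.array(x, dtype=float)  # copy so the caller's array is untouched
    miss = np.isnan(x)
    obs = ~miss
    if not obs.any():
        x[miss] = mu[miss]  # nothing observed: fall back to the marginal mean
    elif miss.any():
        # E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o)
        s_oo = cov[np.ix_(obs, obs)]
        s_mo = cov[np.ix_(miss, obs)]
        x[miss] = mu[miss] + s_mo @ np.linalg.solve(s_oo, x[obs] - mu[obs])
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=1000)
    # Two strongly correlated variables: x2 is roughly 2 * x1
    X = np.column_stack([z, 2.0 * z + 0.1 * rng.normal(size=1000)])
    mu, cov = fit_gaussian(X)
    # Illustrative threshold: the 1st percentile of training log-densities
    threshold = np.quantile([log_density(row, mu, cov) for row in X], 0.01)
    print(is_novel(np.array([1.0, 2.0]), mu, cov, threshold))   # consistent with the correlation
    print(is_novel(np.array([1.0, -2.0]), mu, cov, threshold))  # breaks the correlation
    print(impute_missing(np.array([1.0, np.nan]), mu, cov))      # second entry inferred from the first
```

Because the model captures the dependency between the two variables, a point that looks unremarkable in each coordinate separately can still be flagged as novel, and a missing value can be inferred from the observed one.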

Expertise

At Vector Anomaly we advocate a statistical approach to the problem of data representation, and aim to derive a faithful model of the probability distribution of the data. This approach is a flexible one, and can be applied both to descriptive/interpretive applications and to "black-box" processing systems. As a result, we are able to model a range of dependencies and features, e.g. co-linearities (principal components) and clusters (mixture models), all within a unified probabilistic framework.
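
As a minimal sketch of the "clusters (mixture models)" case, the following fits a one-dimensional Gaussian mixture by expectation-maximisation. It is an illustrative assumption, not a description of our production methods: the function name gmm_em_1d, the quantile-based initialisation, the fixed iteration count and the variance floor are all hypothetical choices.

```python
import numpy as np


def gmm_em_1d(x, k, n_iter=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture to samples x by expectation-maximisation."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Heuristic initialisation (an assumption): means at evenly spaced data quantiles,
    # every variance equal to the overall data variance, equal mixing weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities p(component j | x_i), computed in log space for stability
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        # M-step: re-estimate weights, means and variances from the responsibilities
        nk = resp.sum(axis=0)
        nk_safe = np.maximum(nk, 1e-12)  # avoid division by zero for an empty component
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk_safe
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk_safe
        variances = np.maximum(variances, var_floor)  # guard against collapse onto one point
    return weights, means, variances


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated clusters
    x = np.concatenate([rng.normal(-5.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
    weights, means, variances = gmm_em_1d(x, k=2)
    print(weights, means, variances)
```

The fixed iteration count is a simplification; a fuller implementation would monitor the log-likelihood for convergence and would also need a principled way of choosing k.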

Much of our earlier work has focussed on this type of modelling, including, for example, our popular probabilistic PCA mixture model.
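
For a single probabilistic PCA model, the maximum-likelihood parameters are available in closed form from the eigendecomposition of the sample covariance: the noise variance is the average of the discarded eigenvalues, and the columns of W are the leading eigenvectors scaled by the square roots of (eigenvalue minus noise variance). The sketch below implements that standard solution with the arbitrary rotation set to the identity. It is illustrative only and does not reproduce the mixture model, which combines several such models; the names ppca_ml and ppca_covariance are hypothetical.

```python
import numpy as np


def ppca_ml(X, q):
    """Closed-form maximum-likelihood probabilistic PCA with q latent dimensions.

    Model: x = W z + mu + eps, with z ~ N(0, I_q) and eps ~ N(0, sigma2 * I_D).
    The rotational ambiguity in W is resolved by taking the identity rotation.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 0 < q < d:
        raise ValueError("q must satisfy 0 < q < D")
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)       # ML sample covariance (divide by N)
    evals, evecs = np.linalg.eigh(S)             # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]   # reorder to descending
    sigma2 = evals[q:].mean()                    # average of the discarded eigenvalues
    # Scale each leading eigenvector by sqrt(lambda_j - sigma2)
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def ppca_covariance(W, sigma2):
    """Model covariance C = W W^T + sigma2 * I implied by the fitted parameters."""
    return W @ W.T + sigma2 * np.eye(W.shape[0])


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    W_true = rng.normal(size=(5, 2))
    Z = rng.normal(size=(5000, 2))
    X = Z @ W_true.T + 0.1 * rng.normal(size=(5000, 5)) + 3.0
    mu, W, sigma2 = ppca_ml(X, q=2)
    print(mu, sigma2)
```

When q matches the latent dimensionality of the data, the implied covariance closely matches the sample covariance while needing fewer free parameters than a full covariance matrix once q is much smaller than D.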

Figure: Schematic illustration of Bayesian application of "Ockham's razor".
Figure: Robust probabilistic modelling via a variational mixture approach.
Figure: Simple Bayesian hierarchical graphical model.

Key Technology

  • General probability density estimation, from simple univariate distributions to multivariate mixture models.
  • Bayesian methodology to constrain model complexity and maximise fidelity.
  • Use of advanced estimation algorithms, exploiting "expectation-maximisation" and "variational inference" techniques.
  • Linear latent variable models for continuous data, including factor analysis and probabilistic PCA.
  • Mixture extensions of single models (including PCA) for greater flexibility.
  • Nonlinear latent variable models.
  • Latent category models for discrete data.
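
To illustrate how Bayesian methodology can penalise unnecessary model complexity (the "Ockham's razor" effect), the toy sketch below compares the marginal likelihood, or evidence, of two models for a binary sequence: a fixed fair coin with no free parameters, and a coin whose bias has a uniform Beta(1, 1) prior that is integrated out analytically. The example and all function names are our own illustrative assumptions, and equal prior odds on the two models are assumed.

```python
import math


def log_beta_fn(a, b):
    """Log of the Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_evidence_fixed(heads, tails, p=0.5):
    """Log-probability of a specific sequence under a model with no free parameters."""
    return heads * math.log(p) + tails * math.log(1.0 - p)


def log_evidence_beta(heads, tails, a=1.0, b=1.0):
    """Log marginal likelihood of the same sequence with bias theta ~ Beta(a, b) integrated out."""
    return log_beta_fn(a + heads, b + tails) - log_beta_fn(a, b)


def preferred_model(heads, tails):
    """Return the model with the higher evidence, assuming equal prior odds."""
    fixed = log_evidence_fixed(heads, tails)
    flexible = log_evidence_beta(heads, tails)
    return "fixed" if fixed >= flexible else "flexible"


if __name__ == "__main__":
    for h, t in [(5, 5), (9, 1), (50, 50), (70, 30)]:
        print(h, t, preferred_model(h, t))
```

Balanced counts favour the simpler fixed model even though the flexible model could fit them equally well, because the flexible model spreads its prior probability over many outcomes; clearly unbalanced counts favour the flexible model.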