General Probabilistic Modelling
Background
Obtaining a comprehensive mathematical representation of the complete structure of a data set remains a significant challenge, but an effective model can be both a valuable aid to understanding the data and an essential component of more sophisticated analysis systems. In contrast to predictive modelling, for example, there are typically no distinguished "dependent" variables. Instead, potential dependencies between all variables may be of interest, and an appropriately specified model would ideally capture, to some extent, structural features across the entire data set. This general modelling approach is referred to as "unsupervised learning" in the machine learning vernacular.
A model which "represents" the data to an appropriate level of accuracy can be exploited effectively in many practical information processing applications. Examples include novelty detection, the estimation of missing or corrupt data, the principled fusion of multiple models, and dynamically reconfigurable pattern recognition systems.
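As an illustration of the first of these applications, the following minimal sketch flags a new observation as novel when its log-density under a fitted model falls below a threshold. It assumes a single univariate Gaussian is an adequate model; the training values, the helper names (`fit_gaussian`, `log_density`, `is_novel`) and the threshold rule are hypothetical choices made for the example, not a description of any particular system.

```python
import math
import statistics


def fit_gaussian(samples):
    """Maximum-likelihood mean and standard deviation of a 1-D sample."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples, mu)  # population (ML) estimate
    return mu, sigma


def log_density(x, mu, sigma):
    """Log of the Gaussian density N(x | mu, sigma^2)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def is_novel(x, mu, sigma, threshold):
    """Flag x as novel when the model assigns it a low log-density."""
    return log_density(x, mu, sigma) < threshold


# Hypothetical training data representing "normal" behaviour.
train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0, 9.9, 10.1]
mu, sigma = fit_gaussian(train)

# Assumed threshold rule: the lowest training log-density minus a margin.
threshold = min(log_density(x, mu, sigma) for x in train) - 1.0

for x in (10.05, 12.0):
    print(x, "novel" if is_novel(x, mu, sigma, threshold) else "normal")
```

In practice the density model would usually be richer than a single Gaussian, and the threshold would be chosen to give an acceptable false-alarm rate on held-out data.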
Expertise
At Vector Anomaly we advocate a statistical approach to the problem of data representation, and aim to derive a faithful model of the probability distribution of the data. This approach is flexible, and can be applied both to descriptive/interpretive applications and to "black-box" processing systems. As a result, we are able to model a range of dependencies and features, e.g. co-linearities (principal components) and clusters (mixture models), all within a unified probabilistic framework.
Much of our earlier work has focussed on this type of modelling, including our popular probabilistic PCA mixture model.
Key Technology
- General probability density estimation, from simple univariate distributions to multivariate mixture models.
- Bayesian methodology to constrain model complexity and maximise fidelity.
- Use of advanced estimation algorithms, exploiting "expectation-maximisation" and "variational inference" techniques.
- Linear latent variable models for continuous data, including factor analysis and probabilistic PCA.
- Mixture extensions of single models (including PCA) for greater flexibility.
- Nonlinear latent variable models.
- Latent category models for discrete data.
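
To make the mixture-model and expectation-maximisation items above concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture. It is a generic textbook formulation written for illustration, not the estimation code behind any of the models listed; the function name `em_gmm_1d`, the quantile-based initialisation, the variance floor and the synthetic data are all assumptions of the example.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Gaussian density N(x | mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture by expectation-maximisation."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: means at evenly spaced quantiles of the data,
    # a shared variance equal to the overall variance, and equal weights.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    centre = sum(data) / n
    overall_var = sum((x - centre) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])

        # M-step: re-estimate the parameters from the weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2
                        for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor guards against collapse

    return weights, means, variances


# Synthetic data: two well-separated clusters (values chosen for the example).
rng = random.Random(0)
data = ([rng.gauss(0.0, 0.5) for _ in range(200)]
        + [rng.gauss(5.0, 0.5) for _ in range(200)])

weights, means, variances = em_gmm_1d(data)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

This sketch uses plain maximum likelihood with a fixed number of components. A Bayesian treatment, as mentioned in the list above, would additionally place priors on the parameters to constrain model complexity.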

