Background
Obtaining a comprehensive mathematical representation of the complete structure
of a set of data remains a significant challenge, but an effective model can be
both a valuable aid to understanding the data and an
essential component of more sophisticated analysis systems. In contrast to
predictive modelling, there is typically no distinguished "dependent" variable.
Instead, potential dependencies between all variables may be of interest, and an
appropriately specified model would ideally capture, to some extent, the
structural features of the entire data set. In machine learning terminology,
this general modelling approach is referred to as "unsupervised learning".
A model which "represents" the data to an appropriate level of
accuracy can be exploited effectively within many practical
information processing applications. Examples include novelty
detection, the estimation of missing or corrupted data, the
principled fusion of multiple models, and the construction of
dynamically reconfigurable pattern recognition systems.
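To illustrate how a density model supports the first two of these applications, here is a minimal sketch under simplifying assumptions (it is not Vector Anomaly's implementation). It fits a single multivariate Gaussian by maximum likelihood, flags a point as novel when its log-density falls below a threshold taken from the training data, and estimates missing entries by their conditional mean given the observed ones. The function names and the 1% threshold are hypothetical; a practical system would use a richer density model.

```python
import numpy as np

def fit_gaussian(X):
    """Maximum-likelihood mean and covariance of the rows of X."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)
    return mu, cov

def log_density(x, mu, cov):
    """Log of the multivariate normal density N(x | mu, cov) at a single point x."""
    d = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def impute_missing(x, mu, cov):
    """Replace NaN entries of x by their conditional mean given the observed entries."""
    x = x.copy()
    m = np.isnan(x)
    o = ~m
    if m.any() and o.any():
        # E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o)
        x[m] = mu[m] + cov[np.ix_(m, o)] @ np.linalg.solve(cov[np.ix_(o, o)], x[o] - mu[o])
    elif m.all():
        x[:] = mu  # nothing observed: fall back to the marginal mean
    return x

# Usage on synthetic, strongly correlated 2-D data.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=2000)
mu, cov = fit_gaussian(X)
train_ll = np.array([log_density(x, mu, cov) for x in X])
threshold = np.quantile(train_ll, 0.01)  # hypothetical 1% false-alarm rate on training data

# [3, -3] runs against the correlation, so its density is very low.
is_novel = log_density(np.array([3.0, -3.0]), mu, cov) < threshold
filled = impute_missing(np.array([1.0, np.nan]), mu, cov)
print(is_novel, filled)
```

Both tasks use the same fitted distribution: novelty detection evaluates it, while imputation conditions it. This is the practical benefit of modelling the full joint density rather than a single dependent variable.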
Expertise
At Vector Anomaly we advocate a statistical approach to
the problem of data representation, and aim to derive a faithful model
of the probability distribution of the data. This approach is
flexible, and can be applied both to descriptive or interpretive applications
and to "black-box" processing systems. In particular, we are able to model
a range of dependencies and features, such as collinearities
(captured by principal components) and clusters (captured by mixture models),
within a single, unified probabilistic framework.
Much of our earlier work has focussed on this type of modelling, including our popular
probabilistic PCA mixture model.
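As a concrete example of treating principal components probabilistically, the sketch below computes the standard closed-form maximum-likelihood solution for a single probabilistic PCA model. This is the building block that a PCA mixture combines. It is an illustrative NumPy sketch, not the company's code; the function name `ppca_ml` and the synthetic data are assumptions. The latent dimension `q` is chosen by hand here and must be smaller than the data dimension.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form maximum-likelihood fit of single-component probabilistic PCA.

    Model: x = W z + mu + eps, with z ~ N(0, I_q) and eps ~ N(0, sigma2 * I_d),
    so the model covariance is C = W W^T + sigma2 * I_d. Requires q < d.
    """
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)        # ML sample covariance
    evals, evecs = np.linalg.eigh(S)               # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]     # reorder to descending
    sigma2 = evals[q:].mean()                      # average variance in the discarded directions
    # W = U_q (Lambda_q - sigma2 I)^{1/2}, taking the arbitrary rotation R = I
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return mu, W, sigma2

# Usage: 3-D data generated from a 1-D latent variable plus small isotropic noise.
rng = np.random.default_rng(1)
W_true = np.array([[2.0], [1.0], [0.5]])
Z = rng.standard_normal((5000, 1))
X = Z @ W_true.T + 0.1 * rng.standard_normal((5000, 3))
mu, W, sigma2 = ppca_ml(X, q=1)
C = W @ W.T + sigma2 * np.eye(3)   # fitted model covariance
print(sigma2, W.ravel())
```

Unlike conventional PCA, the result defines a proper Gaussian density with covariance `C`. That density can be evaluated, compared across models, or used as one component of a mixture. The mixture version has no closed form and is usually fitted with expectation-maximisation.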
 
Key Technology
- General probability density estimation, from simple univariate
distributions to multivariate mixture models.
- Bayesian methodology to constrain model complexity and maximise
fidelity.
- Use of advanced estimation algorithms, exploiting
"expectation-maximisation" and "variational inference" techniques.
- Linear latent variable models for continuous data, including
factor analysis and probabilistic PCA.
- Mixture extensions of single models (including PCA) for greater flexibility.
- Nonlinear latent variable models.
- Latent category models for discrete data.
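The "expectation-maximisation" technique listed above can be illustrated with a textbook EM fit of a full-covariance Gaussian mixture. This is a generic sketch, not Vector Anomaly's methodology, which also involves Bayesian and variational methods. The names `em_gmm` and `log_gauss`, the farthest-point initialisation and the small covariance regulariser `reg` are all illustrative assumptions.

```python
import numpy as np

def log_gauss(X, mu, cov):
    """Row-wise log N(x | mu, cov) for an (n, d) array X."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def em_gmm(X, k, n_iter=100, seed=0, reg=1e-6):
    """Fit a k-component, full-covariance Gaussian mixture to X by EM."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: a random first mean, then the points farthest away.
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        dist = np.min(((X[:, None, :] - X[idx][None]) ** 2).sum(-1), axis=1)
        idx.append(int(np.argmax(dist)))
    means = X[idx]
    covs = np.repeat(np.atleast_2d(np.cov(X, rowvar=False))[None], k, axis=0)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = p(component j | x_i) under current parameters
        log_p = np.column_stack(
            [np.log(weights[j]) + log_gauss(X, means[j], covs[j]) for j in range(k)]
        )
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)  # log p(x_i)
        r = np.exp(log_p - log_norm)
        # M-step: responsibility-weighted maximum-likelihood updates
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    loglik = float(log_norm.sum())  # log-likelihood evaluated at the final E-step
    return weights, means, covs, loglik

# Usage: two well-separated synthetic clusters in 2-D.
rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal([-3.0, 0.0], 0.5, size=(300, 2)),
    rng.normal([3.0, 1.0], 0.5, size=(300, 2)),
])
weights, means, covs, loglik = em_gmm(X, k=2)
print(weights, means, loglik)
```

The E-step is computed in the log domain with `logaddexp` to avoid underflow when a point is far from every component. The small `reg` term keeps each covariance invertible if a component collapses onto very few points. Choosing `k` itself is the kind of complexity question the Bayesian methodology above is meant to address.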