Vector Anomaly :: Data Analysis Evolved
Predictive Modelling

Background

Figure: A sparse binary classifier based on "relevance vectors"

One of the most powerful and potentially valuable data modelling mechanisms is that of automated prediction. The classic problem is, given relevant data with appropriate annotation or labelling, to estimate a model capable of making accurate future predictions of one or more dependent "target" quantities of interest. Those quantities of interest may be continuous real values (a scenario often referred to as "regression" or "interpolation"), or can be discrete labels ("classification" or "pattern recognition"). Both these cases are referred to as "supervised learning" in the "machine learning" vernacular.
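To make the distinction between the two supervised settings concrete, here is a minimal sketch in plain Python. It is illustrative only: the toy data, the function names (`fit_line`, `fit_centroids`, `classify`) and the choice of a straight-line fit and a nearest-centroid rule are all assumptions made for this example, not methods described on this page.

```python
from statistics import mean


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_centroids(xs, labels):
    """Classification: store the mean input of each labelled class."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: mean(values) for label, values in groups.items()}


def classify(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Hypothetical toy data: one input variable, annotated in two ways.
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.1, 3.9, 6.2, 7.8]            # continuous targets -> regression
labels = ["low", "low", "high", "high"]   # discrete labels -> classification

slope, intercept = fit_line(inputs, targets)
predicted_value = slope * 5.0 + intercept   # prediction for a new input

centroids = fit_centroids(inputs, labels)
predicted_label = classify(centroids, 3.6)  # label for a new input

print(predicted_value, predicted_label)
```

In both cases the model is estimated from annotated examples and then applied to an input it has not seen; only the type of the target differs.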

Example predictive modelling applications might include the estimation of chemical concentration from disparate sensor readings, prediction of financial indices based on underlying indicators, recognition of handwritten characters from pixel arrays, or disease susceptibility based on gene microarray outputs.

Deducing meaningful predictive relationships among data variables remains a challenging task. The choice of a particular model type and its configuration, and the implementation of appropriate data pre-processing and parameter estimation algorithms, are all non-trivial. Of crucial importance is the need to manage model complexity, to avoid the all-too-common phenomenon of "over-fitting", in which a model captures the peculiarities of its training data at the expense of accuracy on new data. The principled avoidance of over-fitting is still an open problem. Ultimately, a specification or implementation error in any component in the predictive modelling chain can severely compromise the value of the final results.
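Over-fitting can be seen in a small experiment. The sketch below is an assumption-laden illustration, not a recommended procedure: the synthetic data (a sine curve plus Gaussian noise), the k-nearest-neighbour predictor, and the names `knn_predict`, `mean_squared_error` and `sample` are all hypothetical choices made for this example. It compares the error on the training data with the error on held-out data drawn from the same source.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(train, data, k):
    """Mean squared prediction error of the k-NN model on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable


def sample(n):
    """Hypothetical data source: y = sin(x) plus Gaussian noise."""
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 6.0)
        points.append((x, math.sin(x) + rng.gauss(0.0, 0.3)))
    return points


train, held_out = sample(40), sample(40)

errors = {}
for k in (1, 5):
    errors[k] = (
        mean_squared_error(train, train, k),     # error on data already seen
        mean_squared_error(train, held_out, k),  # error on new data
    )
    print(k, errors[k])
```

With k = 1 the model simply memorises the training set, so its training error is zero, yet it still makes errors on the held-out points because it has also memorised the noise. Measuring error on data not used for fitting is one common way to expose this gap; larger values of k smooth the predictions at the cost of a non-zero training error.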

Expertise

Figure: A single-layer network

There exists an extensive, and growing, collection of established modelling tools, all aimed at solving prediction tasks such as those outlined above. At Vector Anomaly, we are conversant with the most effective of these tools, ranging from long-established statistical estimation techniques through to the latest machine learning technologies: from simple linear regression analysis, to the "neural network" models popular in the 1980s and 90s (which can still have value when applied diligently), and more recently to "Gaussian processes" and the "support vector machine".

In particular, we ourselves originated the concept of "sparse Bayesian" predictive modelling, as well as the extremely popular "relevance vector machine". This invention was the product of a probabilistic approach to predictive modelling (which many tools crucially lack), allied with a belief in the use of Bayesian principles. These two tenets continue to underpin our modelling practices today.

Key Technology

Regression analysis of corrupted and noisy data