Predictive Modelling

Background

[Figure: A sparse binary classifier based on "relevance vectors"]

One of the most powerful and potentially valuable data modelling mechanisms is that of automated prediction. The classic problem is, given relevant data with appropriate annotation or labelling, to estimate a model capable of making accurate future predictions of one or more dependent "target" quantities of interest. Those quantities of interest may be continuous real values (a scenario often referred to as "regression" or "interpolation"), or can be discrete labels ("classification" or "pattern recognition"). Both these cases are referred to as "supervised learning" in the "machine learning" vernacular.

 

Example predictive modelling applications might include the estimation of chemical concentration from disparate sensor readings, prediction of financial indices based on underlying indicators, recognition of handwritten characters from pixel arrays, or disease susceptibility based on gene microarray outputs.

 

Deducing meaningful predictive relationships between data variables remains a challenging task. Choosing a particular model type and its configuration, and implementing appropriate data pre-processing and parameter estimation algorithms, are all non-trivial tasks. Of crucial importance is the need to manage model complexity, to avoid the all-too-common phenomenon of "over-fitting", the principled avoidance of which is still an open problem. Ultimately, a specification or implementation error in any component in the predictive modelling chain can severely compromise the value of the final results.
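The effect of model complexity on over-fitting can be illustrated with a small sketch. It is not drawn from any particular project: the sine-wave data, the noise level and the choice of a k-nearest-neighbour regressor are all assumptions made purely for illustration. A smaller k gives a more flexible model, and with k = 1 the model reproduces its own training targets exactly, so its training error is zero by construction while its error on held-out data is not.

```python
import math
import random


def make_data(n, rng, noise_sd=0.3):
    """Noisy samples of sin(x) on [0, 2*pi] (an assumed toy problem)."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def knn_predict(x, train_x, train_y, k):
    """Average the targets of the k training points nearest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def mean_squared_error(xs, ys, train_x, train_y, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    errors = [(knn_predict(x, train_x, train_y, k) - y) ** 2
              for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(40, rng)
test_x, test_y = make_data(200, rng)

results = {}
for k in (1, 5, 15):
    train_err = mean_squared_error(train_x, train_y, train_x, train_y, k)
    test_err = mean_squared_error(test_x, test_y, train_x, train_y, k)
    results[k] = (train_err, test_err)
    print(f"k={k:2d}  training MSE={train_err:.3f}  held-out MSE={test_err:.3f}")
```

Comparing the two columns for each k shows why training error alone is a poor guide to model choice: the most flexible setting always looks best on the data it was fitted to.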

Expertise

[Figure: A single-layer network]

There exists an extensive, and growing, collection of established modelling tools, all aimed at solving prediction tasks such as those outlined above. At Vector Anomaly, we are conversant with the most effective of these tools, ranging from long-established statistical estimation techniques through to the latest machine learning technologies. Our experience runs from simple linear regression analysis, through the "neural network" models popular in the 1980s and 90s (which can still have value when applied diligently), to the more recent "Gaussian processes" and the "support vector machine".

 

In particular, we ourselves originated the concept of "sparse Bayesian" predictive modelling, as well as the extremely popular "relevance vector machine". This invention was the product of a probabilistic approach to predictive modelling (which many tools crucially lack), allied with a belief in the use of Bayesian principles. These two tenets continue to underpin our modelling practices today.
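As a rough indication of what a probabilistic, Bayesian treatment provides, the sketch below fits a one-parameter linear model and returns a predictive standard deviation alongside each prediction. It is deliberately not the relevance vector machine: it is ordinary conjugate Bayesian linear regression with a single weight, and the prior precision, the noise precision (assumed known) and the data values are all invented for illustration.

```python
import math

NOISE_PRECISION = 25.0  # assumed known: noise standard deviation of 0.2
PRIOR_PRECISION = 1.0   # assumed prior: w ~ N(0, 1)


def posterior_over_slope(xs, ys):
    """Gaussian posterior (mean, precision) over w in y = w*x + noise."""
    precision = PRIOR_PRECISION + NOISE_PRECISION * sum(x * x for x in xs)
    mean = NOISE_PRECISION * sum(x * y for x, y in zip(xs, ys)) / precision
    return mean, precision


def predict(x_new, mean, precision):
    """Predictive mean and standard deviation (the error bar) at x_new."""
    variance = 1.0 / NOISE_PRECISION + x_new * x_new / precision
    return mean * x_new, math.sqrt(variance)


# Invented observations lying roughly along y = 2x.
xs = [0.1, 0.4, 0.5, 0.9]
ys = [0.25, 0.75, 1.05, 1.75]

mean, precision = posterior_over_slope(xs, ys)
for x_new in (0.5, 2.0, 5.0):
    mu, sd = predict(x_new, mean, precision)
    print(f"x={x_new:3.1f}  prediction={mu:6.3f}  +/- {sd:.3f}")

# With a single observation the posterior is less certain,
# so the error bar at the same input is wider.
mean_1, precision_1 = posterior_over_slope(xs[:1], ys[:1])
print(predict(2.0, mean_1, precision_1))
```

The error bar widens as the query point moves away from the observed inputs and as the amount of data shrinks, which is the information a point prediction alone cannot convey.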

Key Technology

[Figure: Regression analysis of corrupted and noisy data]

  • Interpolation, regression and smoothing: estimation of real-valued functional models.
  • Pattern recognition and classification: assignment of data into categories.
  • Utilisation of probabilistic techniques for more meaningful predictions (e.g. interpolated values with error bars, or recognition probabilities). This is essential if predictive technology is to be used reliably for decision making, or combined with other system elements.
  • Exploitation of Bayesian inference techniques to obtain superior models in terms of accuracy, relevance, reliability, parsimony and efficiency. Bayesian techniques are particularly advantageous if data is in short supply or its dimensionality is high.
  • Tailoring of even the most advanced models for specific applications, to take account of non-standard error penalties, impose bespoke misclassification costs, manage distribution of false negatives/positives or compensate for irregular distribution of data.
  • Validation, model diagnostics and rejection options can be incorporated.
  • Expertise in a wide range of effective technologies, including Bayesian inference, neural networks, Gaussian processes, relevance vector machines, radial basis functions and non-parametric methods.
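The points above on bespoke misclassification costs and rejection options can be made concrete with a minimal sketch of a minimum-expected-cost decision rule. The function name `decide`, the cost values and the convention that correct decisions cost nothing are assumptions made for illustration; the rule needs only a class probability from some probabilistic classifier.

```python
def decide(p_positive, cost_fp, cost_fn, cost_reject=None):
    """Choose the action with the lowest expected cost.

    p_positive  -- the model's probability that the item is positive
    cost_fp     -- cost of calling a negative item positive
    cost_fn     -- cost of calling a positive item negative
    cost_reject -- optional fixed cost of declining to decide
    Correct decisions are assumed to cost nothing.
    """
    expected_cost = {
        "positive": (1.0 - p_positive) * cost_fp,
        "negative": p_positive * cost_fn,
    }
    if cost_reject is not None:
        expected_cost["reject"] = cost_reject
    return min(expected_cost, key=expected_cost.get)


# Equal costs: a 30% probability is labelled negative.
print(decide(0.3, cost_fp=1.0, cost_fn=1.0))                    # negative
# A missed positive is ten times as costly: the same item is flagged.
print(decide(0.3, cost_fp=1.0, cost_fn=10.0))                   # positive
# An ambiguous item is referred elsewhere when rejection is cheap.
print(decide(0.5, cost_fp=1.0, cost_fn=1.0, cost_reject=0.2))   # reject
# A confident prediction is still acted upon.
print(decide(0.95, cost_fp=1.0, cost_fn=1.0, cost_reject=0.2))  # positive
```

Because the costs enter only at the decision stage, the same probabilistic model can serve applications with quite different penalties without being re-estimated.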