Visualisation and Data Mining

Background

When analysing systems or phenomena of interest, there is often a considerable quantity of relevant multivariate data available. However, making sense of this data in its raw form, and eliciting some preliminary interpretation, can be a perplexing task. The difficulty is particularly acute when the data is of high dimension (i.e. there are many variables associated with each "data point"). In this context, "many" can be as few as five, whereas in practice the dimensionality of a data set can easily exceed a thousand.

 

Visualisation, "data mining" and exploratory analysis techniques can be extremely valuable mechanisms in the initial stages of data analysis. In their own right, they can contribute informally to a preliminary understanding of the data structure, or may perhaps serve as a helpfully transparent adjunct to other more opaque analyses. In most cases, exploratory techniques and visualisation methods can be very sensible precursors to more advanced subsequent processing (e.g. predictive modelling).

 


Expertise

At Vector Anomaly, we are conversant with a wide range of the more effective methods for data mining and visualisation, including many recent developments in the field. This is also an area where we have made significant contributions of our own, through the development of topographic methods such as NeuroScale, innovation in hierarchical visualisation and, in particular, the derivation of probabilistic PCA.

 

A nonlinear "Sammon" projection exploiting a metric derived from a mixture model.

A schematic of the "NeuroScale" topographic visualisation system.

A dendrogram derived from probabilistic models of hand-written digits.

Key Technology

  • Linear and non-linear projections of data using a wide range of general and specialised methods.
  • Static and dynamic visualisation techniques, manually and automatically generated.
  • Exploitation of probabilistic methods to improve and augment analysis (e.g. to infer missing data or to combine multiple projections).
  • Auto-enhancement for visual detection of variable dependencies or elucidation of clusters.
  • Bespoke approaches tailored to emphasise desired characteristics, for example retaining the variance of key variables, visually separating particular categories of interest, or maintaining specified data metrics.
  • Expertise in a wide range of conventional and state-of-the-art technologies, including latent variable projections, probabilistic PCA, generative topographic mapping (GTM), Gaussian process latent variable models (GPLVM), discriminant analysis, projection pursuit, multidimensional scaling (MDS) and NeuroScale.
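
As a simple illustration of the linear projections mentioned in the list above, the following is a minimal sketch of a two-dimensional visualisation by standard principal component analysis, written with NumPy. It is not a description of any Vector Anomaly system: the function name `pca_project` and the synthetic data are assumptions made purely for the example, and plain (non-probabilistic) PCA stands in for the more specialised methods listed.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each variable
    # SVD of the centred data: the rows of Vt are the principal
    # directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Synthetic 10-dimensional data (an assumption for the example):
# each variable is given a different scale so that some directions
# carry more variance than others.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) * np.linspace(3.0, 0.5, 10)

Y = pca_project(X)  # 200 points, now in 2 dimensions, ready to plot
```

The two columns of `Y` could then be passed to any scatter-plot routine. A linear projection of this kind is often a reasonable first look at high-dimensional data before trying nonlinear methods such as MDS or a topographic map.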