One of the great challenges in systems biology is the coupling of mathematical models and experimental data. Constructing a quantitative model takes many passes through the iterative cycle "experiment -> data -> model -> experiment". For each new model, parameters that cannot be measured directly must be estimated from the available experimental data. In the first few iterations the model is crude and parameter estimation is generally not critical. Such models have limited predictive value, but can still be used to guide new experiments. However, systems biology is now entering a stage in which data are becoming abundant and models more complicated, and in which models are expected to give realistic quantitative predictions.

Fitting a mathematical model to experimental data and designing new experiments, e.g. to discriminate between rival models, involves a series of mathematical and computational challenges: (i) a priori parameter identification: proving that the parameters of the model could be identified if continuous and error-free data were available for the experimental observables; (ii) the actual optimization procedure, which minimizes a chosen measure or fitness function with global, local or hybrid search methods; (iii) a posteriori parameter identification: the statistical analysis of the parameters obtained at the minimum; (iv) optimal experimental design. The current methods in systems biology for steps (ii)-(iv) are based on Maximum Likelihood Estimation, i.e. the observations are assumed to have a joint probability density function, and on the assumptions that the observations are independent and that the data contain normally distributed errors. In this case the maximum likelihood solution is the least squares solution (i.e. the measure in (ii) is the sum of squared residuals).
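The link between maximum likelihood and least squares can be made explicit with a short derivation. As a sketch, assume (the notation is introduced here for illustration and is not part of the workshop text) n observations y_i taken at times t_i, a model prediction f(t_i, θ) with parameter vector θ, and independent normal errors that share a common variance σ²:

```latex
\begin{align}
y_i &= f(t_i,\theta) + \varepsilon_i, \qquad
  \varepsilon_i \sim \mathcal{N}(0,\sigma^2)\ \text{independent}, \quad i = 1,\dots,n, \\
L(\theta) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{\bigl(y_i - f(t_i,\theta)\bigr)^2}{2\sigma^2}\right), \\
-\log L(\theta) &= \frac{n}{2}\log\bigl(2\pi\sigma^2\bigr)
  + \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(t_i,\theta)\bigr)^2 .
\end{align}
```

The first term of the negative log-likelihood does not depend on θ, so maximizing L(θ) is equivalent to minimizing the sum of squared residuals. The common variance is an extra simplifying assumption made here; if each observation had its own known variance, the same argument would give a weighted sum of squares instead.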

However, in practice the assumption of independent, normally distributed observations is often not valid. The data used to fit the models may be gene-expression data (cDNA, SAGE, Affymetrix), proteomics data (MS based), metabolomics data (NMR or MS based) or spectroscopic data (UV, NIR, Raman). The instruments used to generate these data each have their own characteristics, resulting in heteroscedastic and colored instrumental error. The sampling process also contributes to the error distribution in a nonhomogeneous way.
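To illustrate why heteroscedastic error matters for the fit, the following minimal sketch compares ordinary and weighted least squares on synthetic data. Everything in it is a hypothetical illustration rather than a method from the workshop: the straight-line model, the helper name fit_line, and the assumption that the error standard deviation grows in proportion to t and is known.

```python
import random


def fit_line(t, y, sigma=None):
    """Fit y = a + b*t by least squares.

    If sigma is given, each point is weighted by 1/sigma_i**2 (weighted
    least squares); otherwise all points get equal weight (ordinary
    least squares). Returns the tuple (a, b).
    """
    w = [1.0] * len(t) if sigma is None else [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    swt = sum(wi * ti for wi, ti in zip(w, t))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swtt = sum(wi * ti * ti for wi, ti in zip(w, t))
    swty = sum(wi * ti * yi for wi, ti, yi in zip(w, t, y))
    # Closed-form solution of the 2x2 (weighted) normal equations.
    b = (sw * swty - swt * swy) / (sw * swtt - swt ** 2)
    a = (swy - b * swt) / sw
    return a, b


# Synthetic data (assumed for illustration): noise grows with t.
random.seed(0)
true_a, true_b = 1.0, 2.0
t = [0.5 * k for k in range(1, 21)]
sigma = [0.1 * ti for ti in t]
y = [true_a + true_b * ti + random.gauss(0.0, si) for ti, si in zip(t, sigma)]

ols = fit_line(t, y)          # treats all points as equally reliable
wls = fit_line(t, y, sigma)   # down-weights the noisier points

print("ordinary least squares (a, b):", ols)
print("weighted least squares (a, b):", wls)
```

The weighted fit is the maximum likelihood estimate only under the stated assumptions (independent normal errors with known standard deviations). Correlated (colored) errors would need a full error covariance matrix, which this sketch does not cover.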

The focus of this workshop is to develop a more general strategy for steps (ii)-(iv) that takes into account the heteroscedastic structure of the experimental error, where the probability density function may not even be available in closed form, as well as certain requirements on the models, such as robustness. Topics to be discussed during the workshop are: (i) (design of) biological experiments, (ii) multivariate data analysis, (iii) parameter identification, (iv) model discrimination and experimental design.

The aim of this workshop is to bring together scientists who work on the above subjects but come from different disciplinary backgrounds in statistics, biology and mathematics. The workshop will thus provide a forum for interaction and for progress towards an integrative approach.
