The size and variety of machine-readable data sets have increased dramatically, and the problem of “data explosion” has become apparent. At the same time, recent developments in computing have provided the basic infrastructure for fast data access as well as many advanced computational methods for extracting information from large quantities of data. These developments have created a new range of problems and challenges for data analysts, as well as new opportunities for intelligent systems in data analysis, and have led to the emergence of the field of Intelligent Data Analysis (IDA).
IDA is an interdisciplinary field concerned with the effective analysis of data, drawing techniques from diverse fields including artificial intelligence, databases, high-performance computing, pattern recognition and statistics. These fields often complement each other: for example, many statistical methods, particularly those for large data sets, rely on computation, but brute computing power is no substitute for statistical knowledge.
In response to the challenge of extracting useful information from large quantities of on-line data from emerging areas such as bioinformatics and systems biology, many interesting IDA systems and applications have been built and a better understanding of the IDA principles has been obtained over the last decade or so. However, there remain many important but difficult issues in the field.
The goal of this workshop is to bring together researchers from statistics, machine learning, computer science, pattern recognition, bioinformatics, systems biology and other areas to discuss important issues in IDA, review current progress in the field, and identify challenging and fruitful areas for further research. The workshop is intended to stimulate interaction between these areas, so that more powerful tools can emerge for extracting knowledge from data and a better understanding of the process of intelligent data analysis can develop. In particular, we would like to focus on the following key issues from both application and theoretical viewpoints:

Synergy between disciplines: IDA lies at the interface between statistics and computing, so progress in these two fields will have an important impact on its development. A recent meeting of the Royal Statistical Society asked the specific question “Is statistics being left behind by computing”. So are these two fields really out of sync? And what are the key contributions of each?

Strategies: There is a strategic aspect to data analysis, beyond the tactical choice of this or that test, visualisation or variable. Analysts often bring exogenous knowledge about data to bear when they decide how to analyze it. The question of how data analysis may be carried out effectively should lead us to look closely not only at the individual components of the data analysis process, but also at the process as a whole, asking what would constitute a sensible analysis strategy.

Data quality: Real-world data are often noisy, incomplete and inconsistent, and these problems are not always easy to handle. Research on data quality has attracted significant attention from different communities and progress has been made, but further work is urgently needed to develop practical and effective methods for managing different kinds of data quality problems in large databases.

Novel methods: Recent developments in bioinformatics and systems biology demand new algorithms and solutions for many challenging problems. Examples include the analysis of very high-dimensional but small-sample microarray data, the integration of a variety of data for constructing biological networks and pathways, and the development of scalable distributed algorithms for extracting scientific knowledge from multiple databases and resources on the internet. In addition, evaluating how a system performs may need to go beyond traditional statistical and computational methods, e.g. into biological validation or wet lab experiments.
Since this is an interdisciplinary meeting, we believe that initiating fruitful interactions across disciplines is the key to running the workshop successfully. To this end we plan several ice-breaking activities, such as arranging introductory tutorial-style presentations to familiarize researchers with concepts from the various fields, and inviting data owners to describe challenging biomedical applications, particularly those related to high-throughput bioinformatics. Workshop participants will also be invited beforehand to tackle some challenging biological data analysis problems, e.g. in DNA or protein microarray analysis, so that various approaches can be discussed and compared during the workshop.