To Explain or To Predict?
Abstract
Statistical modelling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In information systems, as in many other disciplines, statistical modelling is used almost exclusively for causal explanation, on the assumption that models with high explanatory power inherently have high predictive power. Explanation and prediction are commonly conflated, yet the distinction between them must be understood if scientific knowledge is to advance. While this distinction has been recognized in the philosophy of science, the statistical and data mining literature lacks a thorough discussion of the many differences that arise when modelling for an explanatory versus a predictive goal. The purpose of this talk is to clarify the distinction between explanatory and predictive modelling, to emphasize the importance of both for scientific research, and to reveal the practical implications of the distinction at each step of the modelling process.