What’s the difference between variability, uncertainty, and error? What should be displayed when presenting predictions and simulation results?
A quick cut, attempting a reply as pithy as the question itself:
Variability is something you observe across individuals or within individuals over time.
This informs your uncertainty, which is your subjective relationship to a decision ex ante.
Error is your objective assessment of your decision (action) ex post.
Know your audience. If you’re trying to convince modelers that you have the right model, displaying uncertainty of estimates on top of variability is fine. If you’re trying to influence a decision taken by non-modelers, either variability or uncertainty is appropriate (and not overwhelming). How to choose in that case? As an example, consider a population typical estimate of X (5% RSE) and a between-subject variability of 45 %CV (23% RSE). Getting the prediction right at the individual level is strongly influenced by the 45% BSV, so show that.
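To put rough numbers on that example, here is a minimal Python sketch comparing a 90% interval for the typical value (uncertainty) with a 90% interval for individuals (variability). The value X = 10, the normal approximation for the estimate, and the log-normal form of the BSV are assumptions made for illustration only; the 23% RSE on the BSV itself is ignored.

```python
import math
from statistics import NormalDist

# Hypothetical inputs: only the relative numbers come from the example above.
typical_value = 10.0   # assumed value of "X", arbitrary units
rse_typical = 0.05     # 5% RSE on the typical value
bsv_cv = 0.45          # 45 %CV between-subject variability

z = NormalDist().inv_cdf(0.95)  # quantile for a two-sided 90% interval

# Uncertainty: where the typical value plausibly lies (normal approximation).
se = rse_typical * typical_value
uncertainty_interval = (typical_value - z * se, typical_value + z * se)

# Variability: where individuals plausibly lie, assuming log-normal BSV.
# For a log-normal distribution, omega^2 = ln(CV^2 + 1).
omega = math.sqrt(math.log(bsv_cv ** 2 + 1))
variability_interval = (
    typical_value * math.exp(-z * omega),
    typical_value * math.exp(z * omega),
)


def width(interval):
    """Return the width of a (lower, upper) interval."""
    return interval[1] - interval[0]


print("uncertainty interval:", uncertainty_interval)
print("variability interval:", variability_interval)
print("width ratio:", width(variability_interval) / width(uncertainty_interval))
```

Under these assumptions the individual-level interval is several times wider than the interval on the typical value, which is the reason to show the BSV to a non-modeler audience.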
If you have a data set that’s independent of the model’s estimates, that’s a reasonable moment to bring in error and bias calculations.
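As one possible form of those error and bias calculations, the sketch below computes the mean prediction error (a common bias measure) and the root mean squared error against an independent data set. The function name and the numbers are hypothetical; other metrics may suit a given analysis better.

```python
import math


def prediction_errors(observed, predicted):
    """Return (mean error, RMSE) of predictions against observations."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    mean_error = sum(errors) / n                      # bias: signed average error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # overall error magnitude
    return mean_error, rmse


# Hypothetical independent observations and the model's predictions for them.
observed = [10.0, 12.0, 8.0, 11.0]
predicted = [11.0, 13.0, 9.0, 12.0]

bias, rmse = prediction_errors(observed, predicted)
print("bias:", bias, "rmse:", rmse)
```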
In stats terms: variability is a characteristic of a response (or a variable) within a population. It measures how much the values of this variable differ across the population. For the same individual, if a variable is measured multiple times, the variability reflects the change in its values over time for that subject; this is also called within-subject variation. When observations are summarized statistically, variation should always be presented numerically. There are different ways to present it: standard deviation, %CV and confidence intervals are the common, well-accepted ones. Variation in a variable can have many causes. The main objective of ANOVA (analysis of variance) is to determine whether one or more factors/covariates are sources of the variation in a variable.
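For concreteness, the three summaries named above can be computed along these lines in Python. The data are made up, and the confidence interval shown is for the mean using a normal approximation (an assumption; a t-quantile is more usual for a sample this small).

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical measurements of one variable across five subjects.
values = [4.0, 5.0, 6.0, 5.0, 5.0]

m = mean(values)
sd = stdev(values)            # sample standard deviation (n - 1 denominator)
cv_percent = 100.0 * sd / m   # coefficient of variation, in percent

# Approximate 95% confidence interval for the mean (normal approximation).
z = NormalDist().inv_cdf(0.975)
half_width = z * sd / math.sqrt(len(values))
ci_mean = (m - half_width, m + half_width)

print("mean:", m)
print("SD:", sd)
print("%CV:", cv_percent)
print("95% CI of the mean:", ci_mean)
```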
The term uncertainty is often associated with the likelihood of an event happening after a decision is made. It is often measured using probability. If the probability of an event happening is 50%, then it is uncertain to me because the probability of it not happening is also 50%.
The term error is often associated with the accuracy of an estimate relative to the true value. When a model is mis-specified for the data, the estimates of the parameters in the model can be very precise (i.e., small standard errors) if the sample size is big, yet the estimated values are usually far from the true values. I have seen this illustrated by shooting arrows at a target. When all the arrows hit the center of the target, there is no error and high precision. When all the arrows land together on the left side of the target, there is high precision but a big error.
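A small simulation can make the "precise but wrong" case concrete. This is only a sketch under made-up assumptions: the data come from a curve with a linear coefficient of 2 plus a quadratic term, and the fitted model is a straight line through the origin that leaves the quadratic term out. All names and values are hypothetical.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000              # large sample, so standard errors are small
true_slope = 2.0        # the "true value" the estimate is compared with

# Simulated data: y = 2x + 3x^2 + noise.
xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
ys = [true_slope * x + 3.0 * x * x + rng.gauss(0.0, 0.5) for x in xs]

# Mis-specified model: y = b * x (quadratic term omitted), fitted by least squares.
sxx = sum(x * x for x in xs)
b_hat = sum(x * y for x, y in zip(xs, ys)) / sxx

# Standard error of b_hat as the mis-specified model would report it.
residual_var = sum((y - b_hat * x) ** 2 for x, y in zip(xs, ys)) / (n - 1)
se_b = math.sqrt(residual_var / sxx)

error = b_hat - true_slope

print("estimate:", b_hat)
print("standard error (precision):", se_b)
print("error vs. true value:", error)
```

The reported standard error is small while the estimate sits far from 2, which corresponds to the arrows clustered tightly on one side of the target.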
Thanks! These are 3 great answers. I’m not sure they are saying the same thing. This is fundamental to communication. Even more important, it is fundamental to describing the reliability of models. Is this something that is up to each modeler or are there best practices? If not, should there be?