When is a result worth noting? A quick thought on pharmacometrics and multiplicity

I was asked to get a conversation started between pharmacometricians and biostatisticians on multiplicity. Before I do, a caveat: this is one of those subjects with enough subtlety that reasonable people can disagree.

Assumption: Whether from an experiment or an observational study, data analysis is performed in order to infer something about the universe from the summarization of data. What does it mean to infer? I looked up some synonyms of infer on Thesaurus.com and was given the following: ascertain, assume, construe, deduce, derive, figure out, glean, interpret, presume, presuppose, reckon, speculate, surmise, believe, collect, conjecture, draw, figure, gather, induce, intuit, judge, reason, suppose, think, understand, arrive at. Notice that most of these words imply a little bit of uncertainty. To infer means to make a guess. To infer well is to make a good guess.

Statistical inference uses probability statements such as frequentist p-values and confidence intervals or Bayesian posterior probabilities, credible intervals, or Bayes factors to try to make a good guess. Sometimes the question of interest is something like “Do A and B differ?” But it does not have to be. It could be something like “Is there a strong relationship between C and D?” Or it could be, “Will this model make a good prediction about an untested patient’s drug concentration?” Whether or not we are using formal statistical inference techniques, if we look at data and draw some sort of conclusion, then there is always uncertainty attached to that conclusion.

It makes sense, then, that the more questions I try to answer, the higher the likelihood that at least one of my answers is incorrect. This is multiplicity.
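To put a rough number on that intuition: if each of m independent questions is answered with a test run at the usual 5% level, the chance of at least one false positive is 1 − (1 − 0.05)^m. A minimal sketch (the independence assumption is mine, for illustration; real tests on the same data are usually correlated, which changes the exact number but not the direction):

```python
# Familywise error rate under independence: the probability that at least
# one of m tests at level alpha gives a false positive when every null is true.
def fwer(alpha, m):
    return 1 - (1 - alpha) ** m

for m in (1, 5, 10, 20):
    print(f"{m:2d} tests -> P(at least one false positive) = {fwer(0.05, m):.3f}")
# 1 test: 0.050;  5 tests: 0.226;  10 tests: 0.401;  20 tests: 0.642
```

By 10 questions you are already more likely than not close to a coin flip on whether at least one answer is a false alarm.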

That is, if we are in the business of taking data and inferring something about the universe, then we are going to make some incorrect inferences. We can try to control this, either formally (through multiplicity adjustment techniques) or informally, by using appropriate statistical inference techniques.
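As an illustration of what a formal adjustment looks like, here is a sketch of the Bonferroni correction, the simplest such technique (the p-values below are made up for the example):

```python
# Bonferroni correction: test each of m hypotheses at level alpha/m.
# This keeps the familywise error rate at or below alpha, regardless of
# how the tests are correlated, at the cost of reduced power per test.
def bonferroni_reject(p_values, alpha=0.05):
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

p_values = [0.003, 0.021, 0.048, 0.260]   # hypothetical results of 4 tests
print(bonferroni_reject(p_values))        # only p <= 0.05/4 = 0.0125 survives
# [True, False, False, False]
```

Note that two of the hypothetical p-values sit below the naive 0.05 line but do not survive the adjustment; less conservative variants (Holm, false discovery rate methods) exist, but the trade-off is the same in kind.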

How should a statistician, pharmacometrician, and/or clinician think about multiplicity? I think it depends on the situation, but the two most important things are 1) to be aware that it exists and 2) to truly understand what your chosen statistical inference techniques actually mean. There are too many otherwise brilliant scientists who do not truly understand the inference technique being used and thus cannot properly weigh the evidence being presented. After you take stock of these two things, what you do about multiplicity depends on the situation. Sometimes one can and should control the overall type 1 error for the multiple endpoints one is interested in. This, I would think, is most useful in the “confirm” phase (stealing from Sheiner). In the “learn” phase there is less need for this, but do not assume that you can therefore ignore multiplicity. It is everywhere.

I am truly interested in any of your thoughts on the subject. Feel free to shred me (I can take it), or expand on this if you would like.


Thank you, Brian. I remember a comment from my boss on the first popPK report I worked on 12 years ago: there were only 100 subjects, but 30 covariates were evaluated. What is the overall type 1 error rate? Of course, he was a statistician. I agree with you on the two things to consider. I would like to add one more: the primary objective of the PopPK/PD analysis, estimation or hypothesis testing. Many times the project starts with estimation in mind, but by the time the report is written, it has somehow become hypothesis testing.
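One can put a rough answer on that boss's question with a simulation. The sketch below uses pure noise and the hypothetical numbers from the anecdote (100 subjects, 30 covariates, each screened at alpha = 0.05 against an outcome that truly has no covariate effects); the Fisher z-transform stands in for an exact correlation test. Under independence the analytic answer is 1 − 0.95^30 ≈ 0.79, and the simulation should land nearby:

```python
import math
import random

def corr(x, y):
    # Pearson correlation of two equal-length lists.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n_subj=100, n_cov=30, n_sims=1000, seed=1):
    # Fraction of simulated studies in which at least one pure-noise
    # covariate looks "significant" at the two-sided 5% level.
    rng = random.Random(seed)
    crit = 1.959964  # two-sided 5% normal quantile
    hits = 0
    for _ in range(n_sims):
        y = [rng.gauss(0, 1) for _ in range(n_subj)]
        for _ in range(n_cov):
            x = [rng.gauss(0, 1) for _ in range(n_subj)]
            # Under H0, atanh(r) * sqrt(n - 3) is approximately N(0, 1).
            z = math.atanh(corr(x, y)) * math.sqrt(n_subj - 3)
            if abs(z) > crit:
                hits += 1
                break
    return hits / n_sims

print(f"analytic : {1 - 0.95 ** 30:.3f}")   # about 0.785
print(f"simulated: {simulate():.3f}")
```

So with 30 noise covariates and no adjustment, roughly four studies in five would report at least one "significant" covariate that is pure chance.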


Good points to raise, Brian, and well stated.

As a Bayesian statistician I can get away with making glib statements about p-values, but I can’t ignore the multiplicity problem… It seems to me there are a couple of positions that define the ends of the spectrum:

  1. Scepticism - there’s no effect. I need to be very careful about finding something that’s not there.
  2. Pokemon - there are lots of effects to find in this data. I need to collect them all.

The Sceptic creeps forward in model building, preferring parsimony and sparse / simple / “defensible” models. The Pokemon modeller tests as many models and effects as possible, relying on significance tests to drive towards the “best” model.

Of course, there’s probably a sweet spot in between. We need to weigh up whether effects we’ve found in the data are in some sense “reasonable”. Bayesians do this formally through prior distributions or Bayes Factors. Frequentists protect against spurious findings by adjusting alpha at each stage to ensure overall Type I error control (scepticism). Both should be looking at the effects in the final model and deciding whether they are predictive (would they hold up for a future set of data drawn under similar circumstances) and reasonable given the observed set of data (sample space).

Pragmatically, I think the Pokemon modeller needs to take a step back from time to time and assess whether what they have found is sensible, and whether their hunt for the lowest objective function value is leading them into the wrong gym. Similarly, the Sceptic needs to come back to G. E. P. Box’s premise that “All models are wrong, but some models are useful” and recognise when a good, descriptive model is being useful.

Ultimately I hope that together statisticians and modellers can keep each other honest, rather than taking adversarial stances. Hopefully that will prevent the Squirtle being thrown out with the bath water…



I love the fact that you have used Pokémon to make a point. Fifteen years ago, when my oldest son was 5, he used to sit on my lap and play Pokémon; however, he could not read yet, so I read for him. The thing is, he would play the game while I was at work and would not save it, and all of our progress was lost. This frustrated me so much that my wife bought me my own Pokémon game (the Red version), and yes, I caught them all. I am now doing my best at Pokémon Go. :slight_smile:

This has nothing to do with the issue. Just a memory from the past.

In my mind, the bad stuff comes from those who want to model but just do not have a good feeling for probability. Bayesian thinking, whether formally applied or informally implied, is important. I have challenged scientists with the question of why not include shoe size in a model. The reason is that we know shoe size is not an important covariate, so why include it? An informal Bayesian thought goes like the following: 1) we are pretty sure that there is no effect of this covariate; 2) thus, if I do find an effect for that covariate, it is likely only a type 1 error; 3) so, if a positive effect on that covariate is likely a type 1 error, why even examine it? Regardless of the inference system used, the believability of a result is predicated on past knowledge. Although an ardent hater of Bayesian approaches in statistics, R. A. Fisher, in explaining p-values, says the following:

“If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 per cent point), or one in a hundred (the 1 per cent point). Personally, the writer prefers to set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach this level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.”

Notice the last sentence, where he says “rarely fails to give…” This means that Fisher was not convinced until he saw replication of results, which means that he was informally weighing evidence as a Bayesian would.

OK, so I didn’t know what a Squirtle was and had to ask my son (a political science type, and apparently a gamer as well). He responded with the following.

One problem not addressed is the Ditto problem. Ditto is powerful because it can imitate other Pokémon, and you don’t know whether you have caught one until you maybe pick up one that would otherwise not be defensible to try to catch. That being said, while we shouldn’t throw the Squirtle out with the bathwater, we must also be careful of missing big Ditto events that remain otherwise unseen. Certainly this is more true in the social sciences than in the hard sciences, but it is still worth considering.

So then I had to google Ditto. Most posts indicate an intense desire to find one. Others seem to think they are not so important or perhaps just a novelty and not genuinely useful.

It seems to be random. I’ve caught 6 so far. All from different places. Home in bed, rural gas station, state park and three downtown. I’d stop focusing on trying “hunt” for one. Just play as you normally do and I’m sure you’ll get one. It’s too stressful to worry about when there is no real way to track them.

5 so far. One outside my house, the others just around town whilst grinding my way through Pidgeys and Rattattas. You’ll get one eventually, and they’re a novelty rather than genuinely useful (I reported experiments with a decent lvl29 in one of the recent ditto posts a day or two ago). Good luck.

So are we all gamers at heart? What is it that they know and we’re still trying to figure out?


One way to think about this as far as multiplicity is concerned (if you find a Ditto useless) is to look for Squirtles (interesting results that were not primary), but recognize that you may be fooled and have only a Ditto.