When I moved to the University of Arizona, I quickly discovered something that I’ll politely call “heterogeneity” in Institutional Review Board (IRB) policies and practices. All of a sudden, some of the rather vanilla human subjects research practices I had been doing for years, with IRB approval, were now forbidden. It turns out that I had been exerting “undue influence” on human subjects all along by (wait for it) telling them how much I proposed to pay them. (Good thing there is no IRB jail for miscreants like me.)
I’ve since learned that there is a scholarly literature about the tendencies of IRBs to vary in how they decide the same cases. For example, Green and colleagues (2006) described their experience getting IRB approval for an observational study in 43 Department of Veterans Affairs medical centers. They explain:
The study was designed to be qualified under U.S. government regulations for expedited review. One site exempted it from review (although it did not qualify for exemption), 10 granted expedited review, 31 required full review, and one rejected it as being too risky to be permitted. Twenty-three required inapplicable sections in the consent form and five required HIPAA … consent from physicians although no health information was asked of them. … Twelve sites requested, and two insisted upon, provisions that directly increased the risk to participants. Seventy-six percent of sites required at least one resubmission, and 15 percent of sites required three or more (up to six) resubmissions.
How can such disparate results come out of IRBs all applying the same federal Common Rule? And does that sort of variability justify the 4,680 hours of staff time that Green and colleagues report spending to navigate this regulatory maze? As I explain below the fold, this incident is not isolated.
Similarly, in a 2003 JAMA article, McWilliams and colleagues described the IRB reviews of their multi-center study.
Evaluation of the risk of the same genetic epidemiology study by 31 IRBs ranged from minimal to high, resulting in 7 expedited reviews (23%) and 24 full reviews (77%). The number of consents required by the IRBs ranged from 1 to 4; 15 IRBs (48%) required 2 or more consents, while 10 (32%) did not require assent for children.
Additionally, check out Ravina and colleagues’ 2010 description of their efforts to get a phase III drug trial approved at 52 sites. They report spending over $100,000 in staff time, primarily on going back and forth with local IRBs that wanted to add their own boilerplate to the consent forms.
For other research in this vein, see Silverman et al. (2001), Mansbach et al. (2007), and Vick et al. (2005). It seems that the leading journal for each medical specialty has one of these articles. These multi-center trials provide gorgeous natural experiments with real stakes, where we can observe what multiple agencies do when confronted with the exact same proposal. They are a gold mine for scholars of administrative law. How better to study the exercise of discretion? I applaud scholars like Michelle Meyer, who recognize that IRBs are really regulatory agencies and analyze them in that light.
I’m not aware of any literature that really gets to the root cause of this variation. There seems to be variation in local policies, such as implementation guidelines and boilerplate informed consent language. Indeed, on a particular policy question that I am now studying, I have found an entire gamut, with some schools forbidding precisely what others mandate (and everything in between), with practically nobody in the normative literature arguing for either of those polar extremes. This sort of heterogeneity may reflect the fact that for local implementing regulations, there is no single set of Model Rules that can serve as a focal point, like the ABA Model Rules of Professional Conduct that the states adopt with slight modifications for lawyers or the various laws proposed by the Uniform Law Commission. (To put it differently, we do have a Common Rule, but it is apparently not specific enough to preempt the need for local IRBs to make lots of policies of their own.) For low-level regulations, it seems like we sorely need a default rule that local IRBs can depart from only when they make a quite deliberate decision to do so.
There also seems to be variation in how IRB staffers exercise their professional discretion about whether to exempt a study from further review, and on related threshold questions. This heterogeneity may reflect the fact that there is no standardized career path for IRB administrators (they range from ministers to nurses to student affairs professionals to lawyers), and that many line administrators have no training beyond a bachelor’s degree in a random field. It is perhaps not surprising that heterogeneity in backgrounds leads to heterogeneity in outcomes. (Maybe we need more lawyers in these jobs? One might suppose their training to be relevant to making and applying policy, and to balancing things like the speech rights of investigators against the interests of subjects.)
Is this sort of variation a good thing? One could talk about the need for local IRBs to reflect local values, and invoke the “laboratory of democracy” idea from the theory of federalism. (Of course, to sustain the “laboratory” analogy, we would have to be able to see and understand what each one is doing, and evaluate whether it is working.) Should we think of each university as a little sovereign, reflecting localized concerns? Heimer and Petty have argued that research “ethics are contextual and subject to social construction.” Are human subjects in Idaho more susceptible to risks than those in Arizona?
On the other hand, I’m reminded of the debates about prosecutorial discretion. Sixty years ago, Herbert Wechsler wrote:
A society that holds, as we do, to belief in law, cannot regard with unconcern the fact that prosecuting agencies can exercise so large an influence on dispositions that involve the penal sanction, without reference to any norms but those that they may create for themselves . . . .
[To] a large extent we have, in this important sense, abandoned law.
In a paper called The Black Box, Marc Miller and Ron Wright tend to disagree with the last sentence, arguing that beneath the apparent randomness there is actually order. Drawing on some really interesting datasets from local prosecutors’ offices, they argue that “the internal office policies and practices of thoughtful chief prosecutors can produce the predictable and consistent choices, respectful of statutory and doctrinal constraints, that lawyers expect from traditional legal regulation.” Somebody needs to do the same for IRB regulation: get inside the black box and figure out what, if anything, really explains all this variation in outcomes for identical cases. Are there in fact internal constraints?
I would love to hear from you in the comments, especially if you have observed or experienced variation in IRB policies or practices.
Hi Chris. Thanks for the shout-out. FWIW, my own hypothesis (which I note in The Heterogeneity Problem, though I don’t give it sustained attention) is that much heterogeneity in IRB decision-making reflects underlying heterogeneity in preferences regarding risks, expected benefits, and the reasonableness of tradeoffs between them. People, including experts in research ethics, will and do disagree about whether paying subjects undermines the voluntariness of their consent or, on the other hand, is an appropriate way of avoiding exploitation (to touch on the example with which you opened your post). Although it may not always seem so to researchers, human beings comprise IRBs, and so long as human beings differ in their preferences for risk-taking, altruism, and so on, we shouldn’t be terribly surprised by heterogeneous IRB decisions. Even decisions that might seem more purely administrative (e.g., decisions about whether a study qualifies for expedited review or exempt status) involve decisions about the probability and magnitude of risk; my educated hunch is that variation in such decisions has at least as much to do with heterogeneity in IRB member preferences and experiences as with heterogeneity in IRB member education, training or resources.
Thanks for the comment, Michelle. I note that you spoke in terms of IRB “members,” while I focused on IRB administrators. Is that an intentional difference in emphasis? My sense is that a lot of the variation described in these articles is due to discretionary staff decisions. I also wonder if we could reduce the degree of heterogeneity by creating more precise rules (to reduce discretion) and by creating a more homogeneous pool of decision makers, through selection and training. It can’t all be “I know it when I see it.”
I will also mention that your comment is suggesting heterogeneity in individual preferences, not anything like heterogeneity in institutional preferences, of the sort that would justify a federalist-type approach.
I thought that Haidt’s book, The Righteous Mind, was a great choice at PRIM&R this year because I agree with Meyer’s hypothesis that there is heterogeneity in how to prioritize ethical principles. Laura Stark’s book also describes the way new members tend to conform to a Board’s conventions rather than the other way around. I feel that to be true in my experience as a human subjects protection scientist. (My own research/education was in brain and cognitive sciences, so I’m sympathetic to the interdisciplinary approach to these boards.)
I wonder whether there are local context effects at play, too. Do studies that seem “different” from those normally reviewed by your board get elevated? I’ve seen that happen in a medical IRB, where a lot of social/behavioral projects get increased scrutiny because the members and administrators are less familiar with the methods and because there aren’t local precedents established (which the Stark book discusses).
I would love to do some behavioral research on context effects on assessing risk, etc. Anyone want to collaborate?