Legal Scales: An Empirical Methods Question

By Scott Burris

The most important topic we did NOT address in our PHLR methods book was valid methods for rating laws for characteristics like “stringency.” I am not aware of any general work on this. Nonetheless, it is not uncommon for researchers to create scales purporting to measure the distribution of some characteristic(s) over a group of laws. This often seems to be done by some facially plausible means (e.g., penalties) or through a Delphi or similar expert process. For example, Woodruff and colleagues [1] developed a stringency scale for tanning laws that distributes characteristics of laws (age covered, standards for eye protection) on 2-5 point scales based on a priori judgments. Chriqui and colleagues used an expert advisory committee to rate the strength of clean indoor air laws [2]. For alcohol control policies, Nelson et al. used a Delphi method [3]. As a field, however, I can’t see that we have much consensus on how to create and validate such scales. So, to start a discussion, here are some thoughts on a basic typology of scales, with some possible measures and examples off the top of my head:


| Approach | Possible Measures | Examples |
|---|---|---|
| Assessment based on apparent features of the legal text | Magnitude of penalty | Fine; imprisonment |
| | Comprehensiveness of coverage of categories of actors engaged in the regulated activity | Distracted driving: all drivers, novice drivers, bus drivers |
| | Comprehensiveness of coverage of specific behaviors constituting or relating to the regulated activity | Distracted driving: all device use; manual use; texting |
| | Procedural efficiency | Number of distinct steps required to enforce or comply with the law |
| | Legal assessments | Consistency with other requirements (e.g., constitutionality or preemption) |
| | Qualitative assessments | “Clarity” or “specificity” of rule |
| Assessment based on evidence, expert knowledge or prediction of the implementation of the law by legal agents | Incentives for enforcement | Resources; required reporting of enforcement actions/outcomes |
| | Social marketing investment | MADD social marketing against drunk driving |
| | “Technical” feasibility of enforcement | Consistency with mechanisms/methods already in use; cost; procedural complexity |
| | “Social” feasibility of enforcement | Normative consistency with current practices or values;* political constraints |
| | “Legal” feasibility of enforcement | Probability of legal challenge; procedural complexity |
| | Qualitative assessments | “Clarity” or “specificity” of rule as perceived by enforcers |
| Assessment based on evidence, expert knowledge or prediction of the reaction of regulated parties to the law | Likelihood that regulated parties will learn of the law | Social marketing; publicity; high enforcement levels |
| | Consistency with general theories of compliance | Deterrence, legitimacy, procedural fairness, expected utility of compliance |
| | Social support for compliance | Consistency of required behavior with current norms* |
| | Feasibility of compliance | Availability of technology |
| | Qualitative assessments | “Clarity” or “specificity” of rule as perceived by regulated parties |
| Hybrid methods | | |

* The normativity of the required behavior or enforcement mechanism would not be a stable measure, since we would expect passage and enforcement of the law to change norms over time (e.g., drunk driving)
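To make the first approach in the table concrete, an a priori points-based stringency score for distracted-driving laws might be sketched as follows. This is only an illustration of the general technique, not anyone's published instrument; all feature names, levels, and point values are hypothetical:

```python
# Sketch of an a priori ("apparent features") stringency score.
# Ordinal point values are assigned a priori to coded features of each
# law; all levels and weights below are hypothetical.
FEATURE_POINTS = {
    "actors_covered": {"none": 0, "novice_drivers": 1, "bus_drivers": 2, "all_drivers": 3},
    "behaviors_covered": {"none": 0, "texting": 1, "manual_use": 2, "all_device_use": 3},
    "penalty": {"none": 0, "fine": 1, "fine_and_points": 2, "imprisonment": 3},
}

def stringency_score(coded_law: dict) -> int:
    """Sum the a priori points assigned to each coded feature of a law."""
    return sum(FEATURE_POINTS[feature][level] for feature, level in coded_law.items())

# Two hypothetical statutes, coded on the three features.
law_a = {"actors_covered": "all_drivers", "behaviors_covered": "texting", "penalty": "fine"}
law_b = {"actors_covered": "novice_drivers", "behaviors_covered": "all_device_use", "penalty": "none"}

print(stringency_score(law_a))  # 3 + 1 + 1 = 5
print(stringency_score(law_b))  # 1 + 3 + 0 = 4
```

The weakness this post is probing is visible even in the sketch: the point values encode untested judgments, e.g., that covering all drivers (3 points) matters three times as much as covering only novices (1 point).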


Has my admittedly quick search for methods guidance on this missed some good sources? How do this rough typology and its examples strike you? I’d be very happy to get some comments and suggestions.


1. Woodruff SI, Pichon LC, Hoerster KD, Forster JL, Gilmer T, Mayer JA. Measuring the stringency of states’ indoor tanning regulations: instrument development and outcomes. Journal of the American Academy of Dermatology. 2007;56(5):774-780.

2. Chriqui JF, Frosh M, Brownson RC, et al. Application of a rating system to state clean indoor air laws (USA). Tobacco Control. 2002;11(1):26-34.

3. Nelson TF, Xuan Z, Babor TF, et al. Efficacy and the strength of evidence of U.S. alcohol control policies. American Journal of Preventive Medicine. 2013;45(1):19-28.


Temple University Center for Public Health Law Research

Based at the Temple University Beasley School of Law, the Center for Public Health Law Research supports the widespread adoption of scientific tools and methods for mapping and evaluating the impact of law on health. It works by developing and teaching public health law research and legal epidemiology methods (including legal mapping and policy surveillance); researching laws and policies that improve health, increase access to care, and create or remove barriers to health (e.g., laws or policies that create or remove inequity); and communicating and disseminating evidence to facilitate innovation.

5 thoughts on “Legal Scales: An Empirical Methods Question”

  1. This is a very useful typology of approaches to how one might code laws. I would just make a rudimentary comment about validation. For internal validation of any such scale, of course, an analyst would want to use multiple coders and test for interrater reliability. One particular concern here is that disagreements between coders are often resolved by the unblinded author, which can introduce bias.

    To assess external validity, one would want to somehow correlate the words in the law to some behavior on the ground. What may seem like an important difference to a law professor could be trivial in the real world. I could imagine experiments in which actual police officers were asked to respond to the different statutory texts, or where arrest records were queried in one jurisdiction versus another. These investigations could be interesting in themselves, but they could also be used to validate the analysts’ own coding schemes more broadly. (A similar methodology — investing hugely in a sub-sample — is used in survey research to check for response bias.)
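The interrater-reliability check described in the first comment is often run with a statistic such as Cohen’s kappa, which corrects raw agreement for agreement expected by chance. A minimal pure-Python sketch, with two hypothetical coders rating ten statutes as “strong” or “weak” (toy data only):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal counts.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    return (observed - expected) / (1 - expected)

# Toy codes: two coders rating ten statutes, disagreeing on two.
r1 = ["strong", "strong", "weak", "weak", "strong", "weak", "strong", "weak", "strong", "weak"]
r2 = ["strong", "strong", "weak", "strong", "strong", "weak", "strong", "weak", "weak", "weak"]
print(round(cohens_kappa(r1, r2), 2))  # 0.6: raw agreement 0.8 vs. 0.5 by chance
```

A kappa of 0.6 despite 80% raw agreement shows why chance correction matters when categories are few.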

  2. All of the approaches Scott lists are what, as a scientist, I would call theory-based: they are implicitly grounded in concepts about (or in) the law and how it might work. There is a whole alternative approach, most refined in the field of psychometrics, in which the latent constructs, and the question of which individual measures are best combined (and weighted) into a useful scale representing those constructs, are determined empirically by statistical analyses of how the various particular measures “hang together” or not. In practice, of course, good measurement/index/scale development combines deductive (theory) and inductive (data) thinking and analysis. But the field of PHLR needs some folks to take the ideas in Nunnally’s classic book “Psychometric Theory” and adapt and apply them to the development of measures of law.
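One standard psychometric check on whether candidate items “hang together” is Cronbach’s alpha, the internal-consistency statistic developed in the tradition this comment points to. A minimal sketch, with hypothetical item scores for four jurisdictions (all data invented for illustration):

```python
def cronbachs_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item,
    same respondents/jurisdictions in the same order)."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Total scale score for each jurisdiction.
    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Three hypothetical scale items scored for four jurisdictions.
items = [
    [1, 2, 3, 4],  # e.g., penalty magnitude
    [1, 2, 3, 4],  # e.g., coverage of actors
    [2, 1, 4, 3],  # e.g., coverage of behaviors (noisier)
]
alpha = cronbachs_alpha(items)
print(round(alpha, 2))  # 0.89
```

Items that covary strongly push alpha toward 1; items that do not “hang together” drag it down, flagging candidates for dropping or re-weighting.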

  3. In public health law, we tend to use theory-based scales without always explicitly stating as much. For example, measuring penalties presumes deterrence theory is at work (e.g., the greater the penalty, the “better” the law), and measuring the degree to which a law prevents exposure to some product (e.g., food advertising) presumes the SEM model is at play. So, I agree with Alex. Perhaps in thinking about scales, Scott, you might expand your discussion beyond validating them to also testing them for reliability and generalizability. And, within validity, both internal and external validity need to be considered. Your discussion above focuses on external validity: how do we know if the scales we generate to measure public health laws really measure anything important at all? The way we can get at this is to propose a measurement scale and then test it, using statistical techniques like those Alex describes. External validity is important, but we also need to ensure we’re creating scales with internal validity that are able to be replicated. In a study on methamphetamine published in the Journal of Drug Issues, my colleagues and I learned some things about our measurement model, designed through induction, only by testing it. This is yet another reason why public health law research calls for interdisciplinary teams, as you have pointed out often; it’s unusual for someone to have the skills both to do the legal analysis and the statistical analysis required for this kind of work.

  4. The discussion about stringency in public health laws recalls the debates on impaired driving laws and alcohol and drug regulation. Theory does not always conform to actual circumstances. For example, consider the use of interlock devices and how a state might best draft the statutes to require and monitor the installation and use of interlocks. My conclusion is that the language of the statute means little until one determines the actual level of enforcement of the statute. Different law enforcement agencies report differences in the level of enforcement of interlock orders. Civil code (and administrative) enforcement appears to be less predictable than criminal code enforcement. Perhaps an answer to Scott’s question about stringency depends upon whether one is examining actual or theoretical value.

  5. I agree with both Jean and Alex regarding the value of structural equation modelling (SEM). SEM helps test the strength of relationship between hard-to-quantify latent constructs and measurable (but not obviously related) variables. Visual models of suspected relationships can be built and easily tested with SEM. As Alex notes, SEM is a more scientific approach than simply ‘building your own index’. It also is much less expensive and less time-consuming than Delphi methods.
    Following up on Scott’s points, we might also look for measures of “strength of enforcement.” A law can have all of the best characteristics but fail to improve outcomes if it is weakly enforced. Staffing, expenditures, and actual fines collected are concrete measures of the effort that is put behind enforcement. Linking resources to a specific law is not easy, but even a broad measure of the entire environmental health department budget should give a rough idea of the ability of that department to enforce laws and statutes.
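The concrete enforcement measures suggested in this comment (budget, staffing, fines collected) could be combined into a rough “strength of enforcement” index by standardizing each measure and averaging. A minimal sketch; all per-jurisdiction figures below are hypothetical:

```python
import statistics

def zscores(xs):
    """Standardize a list of values to mean 0, population SD 1."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return [(x - m) / s for x in xs]

# Hypothetical enforcement inputs for four jurisdictions.
budget = [1.2, 0.8, 2.5, 1.0]   # department budget, millions of dollars
inspectors = [10, 6, 22, 9]     # full-time enforcement staff
fines = [40, 15, 90, 30]        # fines actually collected, thousands

# Enforcement-strength index: mean of the standardized measures,
# weighting each measure equally (itself an a priori judgment).
index = [sum(triple) / 3 for triple in
         zip(zscores(budget), zscores(inspectors), zscores(fines))]
# The third jurisdiction, highest on all three inputs, tops the index.
```

Equal weighting is the simplest choice; the psychometric techniques discussed above could instead derive the weights empirically.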
