Say Whaaat? The Sixth Circuit Debates “Corpus Linguistics” as a Tool for Statutory Interpretation

A seemingly routine Sixth Circuit appeal involving the interpretation of the federal Employee Retirement Income Security Act statute (ERISA) recently sparked an interesting debate between two Circuit Judges, Amul Thapar and Jane Stranch, on the use of “corpus linguistics” in statutory interpretation. In Wilson v. Safelite Group, Inc., the court affirmed summary judgment for the defendant employer on federal preemption grounds. But, as explained below, the decision is noteworthy less for the result and Judge Stranch’s majority opinion than for the two separate concurrences from Judges Thapar and Stranch. In those concurrences, Judges Thapar and Stranch explore the use of “corpus linguistics” as an interpretive tool and perhaps have begun a jurisprudential discussion that will spread throughout the federal courts.

What is “corpus linguistics”?

Before discussing the three separate writings in Wilson, here’s a brief introduction to “corpus linguistics.” To paraphrase the description in Judge Thapar’s concurrence, the process begins with searchable online databases that contain millions of examples of word usage from a variety of “ordinary” sources — e.g., spoken words, works of fiction, magazines, newspapers, and academic works. Lawyers and judges can search these databases to find specific examples of how a given word or phrase was used at a given period in time. Lawyers and judges then can apply those examples of word usage to interpret the statutory text at issue.

For instance (as described by Judge Thapar), the U.S. Supreme Court’s decision in Smith v. United States, 508 U.S. 223 (1993), turned on the interpretative question whether exchanging a firearm for drugs qualified as “use” of that firearm for purposes of a certain statute criminalizing the “use” of a firearm “during and in relation to a … drug trafficking crime.” A search of one of the currently available databases, conducted many years after the decision, found 159 instances in which the verb “use” was followed by a noun indicating a weapon (such as “gun”), and not a single instance in which the word “use” meant exchange or barter in that context. Applying corpus linguistics, a lawyer could argue (and a judge might conclude) that the statute should be interpreted consistent with those search results.
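
To make the mechanics concrete, the kind of search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five sentences are invented for this example and are not drawn from any real database, and the names TOY_CORPUS, WEAPON_NOUNS, and concordance are hypothetical. Real corpora expose their own search interfaces, and nothing here reproduces the 159-instance search described in the concurrence.

```python
import re
from dataclasses import dataclass


@dataclass
class Sample:
    year: int     # year the text was published or spoken
    source: str   # genre label, e.g. "newspaper" or "fiction"
    text: str     # the sentence itself


# Hypothetical, hand-written sentences for illustration only.
# They are NOT taken from any real corpus.
TOY_CORPUS = [
    Sample(1985, "newspaper", "Police said the suspect used a gun during the robbery."),
    Sample(1988, "fiction", "She had never used a rifle before that winter."),
    Sample(1990, "magazine", "He uses his pistol only at the shooting range."),
    Sample(1991, "newspaper", "The company used a new accounting method."),
    Sample(1999, "spoken", "Nobody is using a gun here, okay?"),
]

USE_FORMS = {"use", "uses", "used", "using"}
WEAPON_NOUNS = {"gun", "rifle", "pistol", "firearm", "weapon"}


def concordance(corpus, start_year, end_year, window=4):
    """Return samples in the date range where a form of "use" is
    followed, within `window` words, by a weapon noun."""
    hits = []
    for sample in corpus:
        if not (start_year <= sample.year <= end_year):
            continue
        words = [w.lower() for w in re.findall(r"[A-Za-z']+", sample.text)]
        for i, word in enumerate(words):
            if word not in USE_FORMS:
                continue
            following = words[i + 1 : i + 1 + window]
            if WEAPON_NOUNS.intersection(following):
                hits.append(sample)
                break  # count each sample at most once
    return hits


if __name__ == "__main__":
    # Restrict the search to a chosen period, as a researcher might
    # when asking how a word was used around a statute's enactment.
    for hit in concordance(TOY_CORPUS, 1980, 1995):
        print(hit.year, hit.source, "|", hit.text)
```

On this toy data, the search for 1980 through 1995 returns the three weapon sentences and skips the accounting one. Note what the sketch does not do: it finds candidate lines, but a person still has to read each one and decide what sense of “use” it reflects.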

So, what role did the corpus-linguistics analysis play in Wilson?

Judge Stranch’s majority opinion

In Wilson, the plaintiff sued his former employer in federal court, asserting state law claims for breach of contract and negligent misrepresentation in connection with the defendant employer’s deferred compensation plan. The district court granted the defendant employer’s summary judgment motion, ruling that those state law claims were preempted by ERISA because in relevant part the deferred compensation plan met the statutory definition of an “employee pension benefit plan” under 29 U.S.C. § 1002(2).

According to Judge Stranch’s majority opinion, the “starting point is the language of the statute,” although the opinion added that the statutory language must be interpreted in light of the “structure, history, and purpose” of the statutory scheme — not in a “vacuum.”

The key was the interpretation of the word “results” and the phrase “for periods extending to the termination of covered employment or beyond” in ERISA’s § 1002(2). That section defines an “employee pension benefit plan” as any “plan, fund, or program” that “results in a deferral of income by employees for periods extending to the termination of covered employment or beyond.”

Based on the “ordinary” dictionary definition of “results” and the text of other ERISA sections, the court rejected the plaintiff’s argument that, for purposes of § 1002(2), “results in a deferral of income” means the plan “[must require] a deferral of income.” The court reasoned that “results in” and “requires” are not synonymous.

Likewise, the court rejected the plaintiff’s argument that the phrase “for periods extending to the termination of covered employment or beyond” means an employee must defer income until termination. Based on the statutory language, the court ruled that the plaintiff’s reading would not give effect to the word “periods.” Instead, the court concluded that deferrals can occur for various “periods,” both before and after termination.

Judge Thapar’s concurrence

Concurring in part and in the judgment, Judge Thapar wrote that he agreed with the majority’s textual analysis, and that the text of ERISA’s § 1002(2) is clear, as many “tried-and-true tools of interpretation confirm.” Judge Thapar continued that corpus linguistics also confirms the result, and urged that courts should “consider adding this tool to their belts.”

Judge Thapar reasoned that words in a statute often have many “permissible” meanings, and it is the role of the courts to construe those words according to their “ordinary” meaning at the time Congress enacted the statute. According to Judge Thapar, corpus linguistics is “one tool” of statutory interpretation, but not the “whole toolbox.” For example, he noted that the majority opinion relied on a dictionary definition of “results,” and stated that corpus linguistics may be most valuable as a “cross-check” on other interpretative tools, particularly in those “difficult cases where statutes split and dictionaries diverge.”

With respect to the interpretive question at issue in Wilson, Judge Thapar wrote that corpus linguistics “confirms” the majority’s textual analysis. Database searching for the time period immediately before and during ERISA’s enactment overwhelmingly refuted the plaintiff’s suggested reading of § 1002(2); there was not a single database result that supported the plaintiff’s interpretation of “results,” and only one example that arguably could have been read to support his interpretation of “extending to.”

Otherwise, much of Judge Thapar’s concurrence addressed the “concerns” raised by Judge Stranch in her separate concurrence, discussed below.

Judge Thapar concluded that, in Wilson, corpus linguistics served as a “method to check” the panel’s “work,” but that in a case where the “ordinary” meaning of the words in a statute is “debatable” the analysis could be dispositive — corpus linguistics “can help courts as they roll up their sleeves and grapple with a term’s ordinary meaning.”

Judge Stranch’s concurrence (and Judge Thapar’s “rebuttal”)

As noted above, Judge Stranch wrote separately to “express some concerns” about Judge Thapar’s “endorsement” of corpus linguistics. The first concern was a “practical problem”: How is a judge to “make sense” of dozens, hundreds, or thousands of database examples of a term’s usage? How should a judge determine which results are relevant and which are irrelevant? Should a judge simply take the most frequently used meaning as the “ordinary” meaning? Illustratively, Judge Stranch asked whether it matters — for purposes of interpreting the word “results” in ERISA’s § 1002(2) — how that term was used in a “book about farm animal management in 1976,” or in an “article from Sports Illustrated about New York’s cool spring weather in 1964.”

Next, Judge Stranch reasoned that the use of corpus linguistics is a “difficult and complex exercise” that should be left to “trained lexicographers” — i.e., those who author dictionaries, which already serve as a frequent interpretative tool — or to “qualified experts” such as professors of applied linguistics.

Judge Thapar’s concurrence included his “rebuttal” to Judge Stranch’s concerns. Briefly, on the question of determining relevant results, Judge Thapar responded that the “entire practice of law” and “certainly the practice of interpretation” involve “judgment calls about whether a particular source is relevant.” Similarly, on the question whether the most frequently used meaning should be considered the “ordinary” meaning, Judge Thapar said that judges “still will need to exercise judgment” — “sometimes the most frequent use of a word will line up with its ordinary meaning,” but “sometimes it will not.” And, with regard to dictionaries, Judge Thapar argued that the usage examples in a dictionary often come from a time before the dictionary is published, and that corpus linguistics offers a “broader picture” of how words were used at the time Congress passed a given statute.

Ultimately, Judge Stranch conceded that she was not suggesting corpus linguistics never could assist judges in the “difficult project” of statutory interpretation. But, she warned, corpus linguistics “brings us no closer to an objective method” of statutory interpretation. Instead, Judge Stranch would continue to focus statutory interpretation on “historic and common-sense considerations,” including the “text, structure, history, and purpose” of the statute at issue.

The implications of Wilson — the genesis of corpus linguistics in federal court?

At this point, the debate summarized above gives us more questions (and discussion topics) than answers. For starters, one can only guess what role corpus linguistics will come to play in federal court. As of this writing, a Westlaw search shows that, in the two months since the Wilson decision issued, not a single federal court decision has cited it. Likewise, it appears that before Wilson no federal court had used corpus linguistics in statutory interpretation, at least in a published opinion. And in state courts nationwide, both before and after Wilson, corpus linguistics has seen only limited action and application (mostly in Utah, where at least one of the databases is compiled).

However, Wilson already has been cited in at least three amicus briefs, including in the high-profile case New York State Rifle & Pistol Association, Inc. v. The City of New York that currently is pending in the Supreme Court. In addition, a recent Third Circuit opinion — written by Circuit Judge Thomas Hardiman — included a brief corpus-linguistics analysis (see Caesars Entm’t Corp. v. International Union of Operating Engineers Local 68 Pension Fund, 932 F.3d 91 (3d Cir. 2019)). And if, as Supreme Court Justice Elena Kagan remarked in a 2015 lecture, “we’re all textualists now,” then it may be that this interpretative “tool” will become a fixture in our “toolbox.”

As a technical matter, is corpus linguistics simply a tool to discern the “ordinary” meaning of plain and unambiguous language in a statute, as the panel in Wilson analyzed ERISA’s § 1002(2)? Or is the corpus-linguistics analysis better suited to resolving ambiguity where statutory terms are reasonably susceptible to more than one meaning? That distinction may not matter; the analysis of a given statutory term based on the results in a corpus-linguistics database presumably wouldn’t differ based on a threshold determination regarding the term’s clarity or ambiguity. That said, an additional interpretive tool may be more useful when a court is faced with an ambiguous statute or statutory term than when it serves as a “cross-check” to confirm the meaning of plain language.

Regardless, practically speaking, the real question is whether the use of corpus linguistics in statutory interpretation will increase over time — in the Sixth Circuit and other federal courts. Going forward, only time will tell how prominent a role the corpus-linguistics analysis will have generally or in any given case.