
Silver & Selzer

November 3, 2024 9:06am in California


This post is in response to my friend Stan Veuger's tweet, which came as part of a discussion following my tweet about what I think is a weird combination of results in Nate Silver's model following the Selzer poll release yesterday. I originally wrote it intending to post it as a Twitter thread, but it was too long. And now it's not nicely formatted, so you gotta really wanna read it. But here it is.

To explain in words, my criticism of Silver’s model is that it manages to spit out what I think is a sizable increase in the Iowa margin but with what he calls a minimal increase in the Harris win probability. I find that odd. I think the following points are uncontroversial:

(a)   If Harris wins all three Blue Wall states, she’ll win the election.

(b)   The margins for the three Blue Wall states are highly correlated, such that if Harris wins any one of them, she is likely to win all three (ditto for Trump tbc).

(c)    Strong performance in Iowa would require Harris to perform well with constituencies that are also present in the Blue Wall states (esp WI & MI), making it more likely she would win those states.

I think that (a), (b), and (c) together imply that Silver’s result for the Harris Iowa win probability should lead to a non-minimal increase in the Harris overall election win probability. It has nothing at all to do with the tipping point stuff (tho, eg, if Harris won NV+IA+PA+MI she would not need WI, so it’s not impossible for IA to be the tipper). Rather, it has to do with the correlations between the Blue Wall states and Iowa discussed in (b) and (c).

People’s skeptical responses to me have involved some version of the following two claims:

(A)   There have been lots of polls of the Blue Wall states, so one Iowa poll can’t be expected to move the Blue Wall state distribution much.

(B)   There have been few Iowa polls, so one poll *can* move the Iowa distribution.

The problem with this response is that if the Iowa forecast is really high variance due to so few polls, then one poll *shouldn’t* move it much. Beyond that, though, if something happens that shifts your view about the probable *outcome* in Iowa by a lot, and if you think the Iowa outcome and the Blue Wall state outcomes are linked reasonably well, then you simply cannot get the Silver combination of results. The point here is that whatever high-variance issue is involved with the small number of Iowa polls should prevent you from finding much change IN IOWA FIRST. If you find a *big* change in Iowa, you’ve already decided that the event that caused it was a big deal relative to the Iowa variance at issue. People making the (A)+(B) argument are confusing states’ margin of victory—a raw quantity whose meaning depends on the underlying variance—with states’ *probability* of victory, which is computed after taking that variance into account.
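To make the point concrete, here is a toy normal-normal Bayesian update in Python. To be clear, this is not Silver's model, and every number in it is invented; I've just calibrated the prior so the baseline win probability is about 0.09. The sketch shows that the posterior win probability already prices in how noisy a single poll is, so a big move in that probability means the model judged the poll to be strong evidence relative to the noise:

```python
# Toy normal-normal update (NOT Silver's model); units are percentage
# points of margin, and all numbers are made up for illustration.
from scipy.stats import norm

def win_prob(prior_mean, prior_sd, poll, poll_sd):
    """Posterior P(margin > 0) after observing one noisy poll of the margin."""
    w = prior_sd**2 / (prior_sd**2 + poll_sd**2)   # weight placed on the poll
    post_mean = (1 - w) * prior_mean + w * poll
    post_var = prior_sd**2 * poll_sd**2 / (prior_sd**2 + poll_sd**2)
    return norm.sf(0, loc=post_mean, scale=post_var**0.5)

print(norm.sf(0, loc=-4, scale=3))  # prior win prob: ~0.09
print(win_prob(-4, 3, 3, 6))        # ~0.17: a noisy poll, but the win prob still ~doubles
print(win_prob(-4, 3, 3, 20))       # ~0.10: a *very* noisy poll barely moves it
```

If the single poll really is treated as very noisy, the win probability barely budges; getting a move like 0.09 to 0.17 requires treating the poll as informative.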

Now for some algebra to illustrate, using a toy model. Tbc, I don’t claim this is Silver’s model. But I think the discussion below clarifies the points.

Denote the ultimate Wisconsin margin (ie, Harris vote share less Trump vote share in Wisconsin) as MW and denote the ultimate Iowa margin as MIA.

Let XW and XIA be some aggregation of polling margins for the Wisconsin and Iowa polls (eg, XW and XIA could be simple means of the relevant state’s margin across polls, could be weighted by a pollster quality metric, etc).

Let A and D be random variables, and let b, c, e, and f all be fixed scalars. Suppose

(1)   MW = A + bXW + cXIA

(2)   MIA = D + eXW + fXIA

i.e., the vector of ultimate Harris margins is an affine transformation of the Wisconsin and Iowa polling averages. Because polling averages can be observed, they can’t be the source of randomness in the model. In the simple model of (1) & (2), the randomness has to come from (A, D).

To be clear, A and D also can subsume polling information about other states, which I’ll treat as fixed for the purposes of this discussion anyway, so they don’t need to be centered on 0. I will, though, assume that (A, D) has a joint-normal distribution conditional on the values of XIA and XW. Because any margin is bounded in [-1, 1], this can’t be correct, but it’s just for ease of exposition—we never see results that extreme at the state level, and it would just add notation to account for truncation of what must be the far-out tails of the (A, D) distribution. Beyond that the normality assumption is for computational convenience; the general point will hold for any unimodal/inverse-U shaped distribution.

A critical point to understand is that, given the conditional normality of (A, D), its variance is unrelated to the variance in the Iowa or Wisconsin polling terms.

But people are right that with so many fewer polls in Iowa than Wisconsin, Iowa polling aggregates must be noisier than Wisconsin polling aggregates. Let’s address that point up front by insisting that XW and XIA be standardized. Let XW* and XIA* be the unstandardized versions of these aggregates, and let the coefficients on the version of (1) and (2) that would involve unstandardized values be b*, c*, e*, and f*. Now multiply and divide XW* by its standard deviation, so that XW*=SW(XW*/SW), where SW is the standard deviation of XW*, and similarly for Iowa. Now write b=SWb*, c=SIc*, e=SWe*, and f=SIf*. The result is the pair of equations in (1) and (2), but with standardized polling aggregates.
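For readers who prefer code to algebra, here is a small Monte Carlo sketch of the toy model in (1) and (2). Every parameter value is invented for illustration; nothing is estimated from actual polls or taken from Silver:

```python
# Monte Carlo sketch of equations (1) and (2); all numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Loadings on the *standardized* polling aggregates (XW, XIA).
b, c = 0.020, 0.008   # Wisconsin margin:  MW  = A + b*XW + c*XIA
e, f = 0.008, 0.020   # Iowa margin:       MIA = D + e*XW + f*XIA

# Joint-normal non-polling shocks (A, D). Their correlation matters for
# joint events like sweeping the Blue Wall, not for single-state win probs.
SA, SD, rho = 0.025, 0.025, 0.5
cov = [[SA**2, rho * SA * SD], [rho * SA * SD, SD**2]]
A, D = rng.multivariate_normal([0.0, -0.03], cov, size=n).T

def win_probs(xw, xia):
    MW = A + b * xw + c * xia     # equation (1)
    MIA = D + e * xw + f * xia    # equation (2)
    return (MW > 0).mean(), (MIA > 0).mean()

print(win_probs(0.0, 0.0))  # baseline: WI roughly tied, IA well behind
print(win_probs(0.0, 1.0))  # after a one-sd bump in the Iowa aggregate
```

With c/f = 0.4 here, a one-sd Iowa polling bump moves both states' win probabilities substantially; as the algebra below shows, shrinking c relative to f is essentially the only way to insulate Wisconsin.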

Now, Harris wins Iowa if and only if the event MIA>0 occurs. The probability of this event given the state polling averages is

(3)   P[MIA>0|XW=xw, XIA=xia] = P[D > -exw - fxia] = P[Z > (-exw - fxia)/SD],

where Z is a standard normal random variable and SD is the standard deviation of D. (The middle term in (3) is the centered version of MIA, and the last one is the standardized version.)

Silver reported yesterday that the probability in (3) rose from 0.09 to 0.17 after he included the Selzer poll in his model. Given the normal distribution assumption on D (joint normality implies marginal normality), a probability of 0.09 implies that the term zthreshold=(-exw - fxia)/SD must initially have equaled 1.34. A probability of 0.17 implies that zthreshold must have fallen to 0.95. Thus, Selzer’s poll caused the mean of the ultimate Iowa margin to shift right by 0.39 units of standard deviation (tbc, this is the sd *conditional* on the poll aggregates, i.e., the sd of MIA involved here is the sd of D).
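Those threshold values are easy to verify with scipy's inverse survival function, which returns the z such that P[Z > z] equals a given probability:

```python
# Back out the z-thresholds implied by the reported Iowa win probabilities.
from scipy.stats import norm

z_before = norm.isf(0.09)   # ~1.34
z_after = norm.isf(0.17)    # ~0.95
print(z_before, z_after, z_before - z_after)   # a shift of ~0.39 sd units
```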

My criticism of Silver’s model can be described as the combination of two things: first, it manages to spit out a sizable, 0.39-sd-unit increase in the Iowa margin, but, second, it does so with what he calls a minimal increase in the Harris win probability. Let’s see what’s required for this minimal increase.

Given that we know that zthreshold=(-exw - fxia)/SD fell by 0.39, and also that e, f, xw, and SD were fixed, it must be the case that following the Selzer poll, Delta(xia) = (0.39/f)SD.

Now consider the win probability for Wisconsin. Repeating the analysis for Iowa, but now using equation (1) for the Wisconsin margin, we see that in general,

(4)   P[MW>0|XW=xw, XIA=xia] = P[A>(-bxw - cxia)] = P[Z>(-bxw - cxia)/SA],

where SA is the standard deviation of the random term A.

One thing we know about Wisconsin is that Silver’s model suggests it was essentially tied before the Selzer poll. Thus, the Harris win probability for Wisconsin was basically 0.5. This implies that zthresholdW=(-bxw - cxia)/SA was approximately 0 before the Selzer poll.

Now, we know from above that the effect of the Selzer poll was to cause Delta(xia)=(0.39/f)SD. To find the effect of this change on the Harris win probability for Wisconsin, we just plug that into zthresholdW.

So, we have

Delta(zthresholdW)=(-c/SA)*Delta(xia)=(-c/SA)*(0.39/f)SD.

This may be rewritten as

(5)   Delta(zthresholdW) = -0.39(c/f)(SD/SA).

Recall from above that we standardized the state polling aggregates, and that for Iowa we did so by writing (1) and (2) in terms of c=c*SI and f=f*SI, where the parameters with the asterisks are the underlying ones. The Iowa polling standard deviation drops out of equation (5) because (c/f)= (c*SI)/(f*SI)=c*/f*.

It follows that the effect of the Selzer poll update on the Wisconsin win threshold, zthresholdW, is unrelated to the standard deviation of the Iowa polling aggregate. What matters is only the ratio (c*/f*) and the ratio (SD/SA).

Suppose for a moment that SD=SA. This does not mean assuming that the Wisconsin and Iowa poll averages have equal precision, because the terms A and D are the non-polling random part of the state margins. Under that assumption, c*/f* is the whole ballgame.

This term is the ratio of the impact of Iowa polling on the expected Wisconsin margin, to the impact of Iowa polling on the expected Iowa margin. One would naturally expect the numerator term to be smaller, so r=c*/f* should be less than 1. The key question is how much less. With a value of r=0.5, the win probability for Wisconsin rises from the assumed baseline of 50% to 58% (matching the 8-point rise in the *Iowa* win probability). With a value of r=0.25, the Wisconsin win probability becomes 54%, ie, half the rise observed for Iowa. Because we’re smack in the middle of the normal pdf when the win probability is 0.5, the Wisconsin win probability is close to linear in r, as the ChatGPT-produced graph below shows.
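For anyone who wants to reproduce that graph, here is a short Python sketch under the assumptions above (SD=SA, a 50% Wisconsin baseline, and the 0.39-sd Iowa shift):

```python
# Wisconsin win probability as a function of r = c*/f*, assuming SD = SA
# and zthresholdW = 0 (a 50% baseline) before the Selzer poll.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

r = np.linspace(0, 1, 101)
wi_win_prob = norm.cdf(0.39 * r)      # P[Z > -0.39*r] = Phi(0.39*r)

for rv in (0.25, 0.5, 1.0):
    print(rv, norm.cdf(0.39 * rv))    # ~0.54, ~0.58, ~0.65

plt.plot(r, wi_win_prob)
plt.xlabel("r = c*/f*")
plt.ylabel("Wisconsin win probability")
plt.show()
```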

Of course, the non-poll components A and D could have different standard deviations, and that could dampen the Wisconsin-Iowa win-probability relationship.

But the key point to understand is that the Wisconsin win probability can be insulated from the Iowa win probability in this situation only if (c*/f*)(SD/SA) is small. If that happens because c*/f* is small, then the model is building in an assumption that even after standardization to eliminate differences in precision, Iowa polls are relatively uninformative about Wisconsin results. If it happens because SD/SA is small, then the model would be building in the assumption that Iowa’s nonpolling component is more precise than Wisconsin’s. Neither seems like an assumption anyone would defend.
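To put a rough number on it: suppose, purely hypothetically, that "minimal" meant a one-point rise in the Wisconsin win probability, from 0.50 to 0.51 (Silver did not report a Wisconsin figure that I can quote here). Still assuming SD=SA, the implied r is tiny:

```python
# Hypothetical: what r = c*/f* would a 0.50 -> 0.51 move in the Wisconsin
# win probability imply, given the 0.39-sd shift in the Iowa threshold?
from scipy.stats import norm

z_new = norm.isf(0.51)      # new Wisconsin threshold, ~ -0.025
implied_r = -z_new / 0.39
print(implied_r)            # ~0.06: Iowa polls treated as nearly uninformative about WI
```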

To conclude, my point is not that the Selzer poll is “right”, nor that it should have a big impact on any particular model as such. Rather, my point is that IF the Selzer poll has a big impact on your model’s predictions about IOWA, then it is hard to see how it could also have a “minimal” impact on your model’s predictions about the Blue Wall states, and therefore on Harris’s win probability, UNLESS you are assuming something that I think most people would doubt.

Comments About Attempts to Obtain Data from District Court AO12 Forms

May 10, 2021 2:39pm Eastern Time


(This post is drawn from an earlier version of my essay, "Free PACER". Due to length limits, I had to remove this material from the version of that essay that would ultimately be published. I have posted it here to provide this information publicly and also to have a citable source.)


A final example involves a data source I have not discussed previously in this essay—the AO12 form. Both the Constitution and federal statutory law require that federal jury pools be selected in a way that fairly represents the local community.[1] Federal law further requires the courts to collect data about the demographic composition of the jury “wheels” from which prospective jurors are selected; each district court records information on a form designated the AO12 form.[2] Professors Mary Rose, Raul Casarez, and Carmen Gutierrez recently “wrote to all federal district courts to ask them for their AO12 forms,” describing “our interest in understanding the racial and ethnic composition of modern U.S. federal jury pools.”[3] Here is how the courts responded:[4]

We heard back from 23 districts…. Fourteen districts…sent their AO12 forms (or they sent the internal reports used to produce the AO12 form, called “JS12” reports). Another nine areas responded but did not provide information: six wrote to inform us that we had read the law on “public inspection” incorrectly and that they did not have to provide the data….;[5] one required a court order, which we requested but to which we received no response; and two others stated that they had no forms in their possession. There was no particular geographical pattern to responses. For example, both the districts of Northern Illinois and Northern Indiana provided data, but the districts of Southern Illinois and Southern Indiana refused…. [The] form from Eastern Wisconsin had over two-thirds (69 – 77%) of the race and ethnicity data listed as missing…. [O]ther areas also sent AO12 forms that had missing data.

 

I regret to say that I had a similar experience after I responded to a particular district court’s request for help with its jury plan. Members of the Jury Committee of the court in question had noted that its AO12 data showed a notable discrepancy between Census data on the demographic composition of the district and the jury wheel. They asked whether, given my expertise with statistics, I could help the court identify a solution to this discrepancy, which I gladly agreed to do pro bono. I then became interested in doing a national study of jury wheel representativeness. With the generous assistance of Penn Law’s Biddle Law Library and its research assistants, I sought to collect data on jury composition from districts around the country. The research assistants found that no district court posts its AO12 form publicly.

Our attempts to obtain them from various district courts ultimately yielded AO12 reports for one or more divisions from just 10 of the 94 districts, despite follow-up contacts made the following month. I was informed by multiple sources in a position to know that the AO circulated a memorandum informing Clerks that they had no obligation to provide me the data in question. And on June 23, I am informed, the Clerk of Court for the District of Delaware wrote to our team that their decision to refuse our data request came “After careful review and consideration of your request, and consultation with the [AO] Office of General Council [sic].”[6]

For obvious reasons, I abandoned my research project. I don’t know why some at the AO harbor such apparent hostility to public access for data about our jury system’s functioning. Maybe it’s out of fear that courts might be embarrassed by public awareness of jury pool non-representativeness.[7] But the solution to unrepresentative jury pools should be Jury Plan reforms—not aggressive data sequestration.[8]



[1] U.S. Const. amend. VI; Taylor v. Louisiana, 419 U.S. 522, 527 (1975); 28 U.S.C. § 1861 (“It is the policy of the United States that all litigants in Federal courts entitled to trial by jury shall have the right to grand and petit juries selected at random from a fair cross section of the community in the district or division wherein the court convenes”).

[2] Mary R. Rose, Raul S. Casarez, and Carmen M. Gutierrez, Jury Pool Underrepresentation in the Modern Era: Evidence from Federal Courts, 15 J. Empirical Leg. Stud. 378, 385 (2018) (“federal law (28 U.S.C. § 1869(h)) requires federal courts to gather records about the racial, Hispanic origin, and gender profile of jury wheels—both the master and qualified wheels—recording them on so-called AO12 forms”).

[3] Id.

[4] Id.

[5] [Gelbach note: The letter cited 28 U.S.C. § 1868, according to which the courts are directed to preserve “all records and papers” related to the jury wheel for at least four years to allow “public inspection for the purpose of determining the validity of the selection of any jury.” 28 U.S.C. § 1868; Rose, Casarez, and Gutierrez, supra note 2, at 385.]

[6] Spreadsheet contents provided to author by University of Pennsylvania Law Student Zachary Manning; on file with author.

[7] Professors Rose, Casarez and Gutierrez report that their results using the data they could acquire “show that, by far, some amount of underrepresentation in jury pools is the norm”—reason to doubt the proposition that jury pools are fairly and reasonably composed. Rose, Casarez & Gutierrez, supra note 2, at 378.

[8] I gladly recognize the welcome contrast provided by the recent substantial efforts of the Eastern District of Pennsylvania, led by Chief Judge Juan R. Sánchez, in proactively designing and implementing reforms to improve jury pool representativeness. Chief Judge Juan R. Sánchez, A Plan of Our Own: The Eastern District of Pennsylvania’s Initiative to Increase Jury Diversity, 91 Temple L. Rev. Online 1 (2019).

Comments About Two 2015 Workshops on Federal Court Data 

May 5, 2021 5:00pm Eastern Time


(This post is drawn from an earlier version of my essay, "Free PACER". Due to length limits, I had to remove this material from the version of that essay that would ultimately be published. I have posted it here to provide this information publicly and also to have a citable source.)


In 2015 I attended two workshops about access to federal court data. The first was one I organized at Penn Law, which was funded by the National Science Foundation.[1] The workshop participants included many academics who work with federal court data; staff researchers and/or attorneys from federal agencies including the Federal Judicial Center (FJC), the Administrative Office of the United States Courts (AO), and the Administrative Conference of the United States; a representative from the National Center for State Courts;[2] and Michael Lissner, Executive Director of the Free Law Project.

There was broad but non-unanimous support for expanding public access to federal court data. The only academic who expressed opposition was a scholar with experience working with the Judiciary. According to my notes, this scholar observed that “sometimes studying things in great detail doesn’t yield useful insights”.[3] Additionally, this scholar expressed doubt that non-FJC researchers would be likely to answer questions of policy interest any better than the FJC’s own researchers.[4] To be sure, the observation that studying things in great detail doesn’t always yield useful insights is unremarkable. On the other hand, logical thinking indicates that not studying things in detail never yields useful insights. And in my experience, it is also true that studying things in great detail sometimes does yield useful insights. That is, after all, why the FJC’s able researchers spend so much time and hard work on the studies they do conduct.

At the same time, the FJC is a small agency, with a smaller set of staff who focus on research. At the workshop, we were told that there were no more than about ten FJC staff who do quantitative research using federal court data.[5] The notion that a handful of researchers is enough to answer all policy-relevant questions involving federal court data is risible. My conversations over the years with FJC staff members themselves indicate that they welcome outsiders’ involvement in quantitative research using federal court data. I regard FJC researchers as colleagues in the broad sense—colleagues with more data access but less freedom to formulate research agendas of their own choosing.


The AO generously sent three representatives to the Penn Law workshop. These representatives were very helpful in describing institutional and technical facts that were unfamiliar to many if not all of the scholars present, and they were very open to the idea of creating a multi-court fee waiver exemption process (indeed, although I do not recall precisely, it is entirely possible that the suggestion was theirs). The AO representatives were also skeptical about the economic, bureaucratic, and technological feasibility of opening up PACER. Had the Judiciary taken a different view, one would expect that to have surfaced in a different attitude from the AO representatives at a public event. The AO is the agency that administers the United States courts, and its employees’ job is to facilitate the courts’ activities. They can be expected to support and facilitate public access to data if the Judiciary supports that institutionally, and to oppose such access if the Judiciary does not support it.


That brings me to the second conference on federal court data in the Fall of 2015, the Federal Courts Civil Data Project Roundtable, hosted by the ABA Standing Committee on the American Judicial System and the Duke Law Center for Judicial Studies. This conference was held in Washington, D.C., and included several academics, practicing attorneys from various areas of the law, representatives from the FJC and the AO, and sitting judges from a variety of federal courts.


As I recall, there was broad support there as well for opening up access to federal court data, with two primary exceptions. One was the AO. One of the AO representatives explained his opposition to freer access by suggesting that it could endanger or otherwise lead to pressure on federal judges, because some of what they must properly do is controversial. Given the events of the last several years—such as the incursion into the U.S. Capitol on January 6, 2021, and the shooting of a district court judge’s family members[6]—I take very seriously the need for measures to keep judges safe from angry and/or misinformed members of the public. But it is unclear why scholarly access to federal court data would increase such dangers. Individual cases can already be accessed inexpensively, and the news media cover controversial cases anyway. What dangers exist most likely already exist. Still, if this concern is a binding one, the Judiciary could mitigate it substantially if not totally by making scholarly access to data contingent on approval and institutional affiliation as discussed above.


The other exception I remember to the broad support for liberalizing federal court data access was an Article III judge who spoke forcefully against it. This judge, whom I know from prior communications to be a devoted public servant, pointed to the example of a colleague on the bench who was the subject of a news story about his crowded docket. Information about his docket had been made public due to the 1990 Civil Justice Reform Act, which mandated reporting of all motions pending longer than six months, and all cases pending longer than three years.[7] As I recall the story, what had happened was that when the subject of the article was appointed, his colleagues dumped their most complex and/or long-running cases on him, so that his docket was full of cases and motions that could not be expeditiously adjudicated despite the judge’s best efforts. According to the judge who spoke at the conference, the news story was both unfair and professionally embarrassing to his colleague. The judge at the conference expressed sincere and profound concern that the judiciary would be unduly pressured in this and other ways by the public availability of additional data that could be searched and filtered easily.


To be sure, there is some evidence that the Civil Justice Reform Act does distort how judges approach their dockets. In a fascinating recent study, Professors Miguel de Figueiredo, Alexandra Lahav and Peter Siegelman present evidence indicating that judges “close substantially more cases and decide more motions in the week immediately before [the CJRA six-month list] is compiled.”[8] Although they determined that average motion processing time was lower, by between 10 and 30 days, “duration is actually lengthened for some motions (those for which the deadline is least pressing)”.[9] Another study of the same issue, by Jonathan Petkun, uses a more rigorous statistical approach known as regression discontinuity design, together with a larger and better data set.[10] In line with de Figueiredo, Lahav and Siegelman, Petkun finds that motion and case dispositions speed up thanks to the timing of the six-month list, though some of his results differ. These studies provide evidence that publicity affects judicial behavior.


To an economist like me, who believes that people usually respond to incentives, these findings aren’t surprising. At the same time, Article III judges have substantial Constitutional protections precisely to allow them to buck public pressure and embarrassment in the service of judicial independence. Most obviously, they have life tenure, and their salaries cannot be reduced.[11] Their jobs carry considerable prestige, and no small amount of power; Chief Justice Stone once remarked, “the only protection against unwise decisions, and even judicial usurpation, is careful scrutiny of their action and fearless comment upon it.”[12] I oppose unfair treatment of judges as much as anyone else, but it should go without saying that unfair treatment is not the same as informed public scrutiny. Citizens should have confidence that the federal judiciary will not be deterred from doing its job with integrity by the availability of such scrutiny, and for this reason the Judiciary should welcome rather than avoid it.


[1] Increasing Access to Federal Court Data Workshop held at Penn Law in Fall 2015, NSF Grant No. 1551564, https://tinyurl.com/yxcnzgzd.

[2] I invited several federal judges, but unfortunately none was able to attend. I do not attribute this to anything other than the fact that the conference came together on something of a short timeline; many judges are likely to be booked far in advance.

[3] Jonah B. Gelbach, Final Report and Summary of Workshop on Increasing Access to Federal Court Data, NSF Grant No. 1551564 (on file with author) (also noting that as of 2015 there were roughly 2 million PACER user accounts). Other information about the grant, including the required Project Outcomes Report, may be viewed at https://tinyurl.com/yxcnzgzd

[4] Id.

[5] Further, we were told that even the FJC has only limited access to PACER data without bureaucratic approval. Id. at 11.

[6] See Nicole Hong, William K. Rashbaum and Mihir Zaveri, “‘Anti-Feminist’ Lawyer Is Suspect in Killing of Son of Federal Judge in N.J.,” N.Y. Times, July 20, 2020, https://www.nytimes.com/2020/07/20/nyregion/esther-salas.html (reporting the shooting death of the son, and serious wounding of the husband, of Judge Esther Salas of the United States District Court for the District of New Jersey).

[7] This requirement is now codified at 28 U.S.C. § 476.

[8] Miguel F. P. de Figueiredo, Alexandra D. Lahav & Peter Siegelman, The Six-Month List and the Unintended Consequences of Judicial Accountability, 105 Cornell L. Rev. 363, 364 (2020).

[9] Id.

[10] See Jonathan Petkun, Can (and Should) Judges Be Shamed? Evidence from the “Six-Month List”, March 2020 (available at https://jbpetkun.github.io/pages/working_papers/SixMonthList_WorkingDraft_20200327.pdf).

[11] Bankruptcy judges do not have such protections. However, the advent of the private service “AACER”—the Automatic Access to Court Electronic Records—has already made bankruptcy data available for public bulk searches. See https://www.aacer.com/. AACER charges for use, however, so it is not freely available to the public.

[12] Schultze, supra note 5 (quoting ALPHEUS THOMAS MASON, HARLAN FISKE STONE: PILLAR OF THE LAW 398 (1956)).