Media Accuracy

How accurate is the news about policy? Is it more accurate in some policy areas than others? How does coverage (and misinformation) vary across news outlets? Do media consumers get the information they need to have informed policy preferences?

These are the questions motivating the data presented here. The goal is to disseminate measures of both the volume and accuracy of media coverage of public policy, across a broad range of topics and news sources. Our measures are based on computer-automated content analyses of millions of newspaper articles, television transcripts, and social media posts. Analyses focus on media coverage of government spending in five spending domains: defense, welfare, health, education, and the environment.

Our approach to capturing media accuracy is relatively straightforward: we compare the “signal” in media coverage to actual budgetary policy change. When media coverage matches policy change, media accuracy is high. When media coverage does not match policy change, media accuracy is low. On the pages that follow, there are examples of both.

The data and resources on this page are drawn from Information and Democracy: Public Policy in the News, a book I wrote with Christopher Wlezien (UT), published in 2022 by Cambridge University Press. We outline the ideas behind this project below, and discuss some of the advantages and disadvantages of our approach. But you can look at results immediately, tracing the accuracy of 17 major US newspapers and 6 television networks over the past 20-30 years, by clicking on the following policy domains:

Defense | Welfare | Health | Education | Environment

Background

The rise of online and social media news has led to heightened concerns about the accuracy of media coverage. This has been especially true following the 2016 and 2020 Presidential elections. Concerns about the accuracy of media are not new, however – they have been around for as long as society has valued the need for informed citizens. Indeed, these concerns are evident as early as Plato’s Republic and in the Federalist Papers, and they continue in nearly every policy debate in the post-war era. Scholars, journalists, and citizens alike have questioned the accuracy of media coverage of the Gulf War, the War on Terror, the Affordable Care Act, Deferred Action for Childhood Arrivals, welfare policy, crime policy, Supreme Court decisions, and much, much more. That said, current anxieties about media accuracy are also growing beyond this historical context, fueled by a novel technological environment that facilitates targeted and user-disseminated content.

It is not all that surprising, then, that we find trust in media at an all-time low. This may be driven in part by polarization in the sources that partisans use to learn about national politics. But are media truly leading citizens away from the truth? Or does the highly politicized climate in the U.S. just make it feel that way? There is ample evidence that Americans respond, often sensibly, to government decisions and policy change. And there clearly are media sources that are more accurate and others that are less so, depending on the topic or policy domain.

We have developed large-scale automated content-analytic methods allowing us to capture both the volume and accuracy of news coverage of policy. This approach is outlined in detail in Information and Democracy (Cambridge University Press), and this website provides easily interpretable measures that distinguish more and less accurate media coverage, from the 1990s to the present, across a broad range of media outlets and policy areas.

Relying on content drawn from full-text news indices alongside trends in budgetary policy from the Office of Management and Budget (OMB), we have produced estimates of the volume and accuracy of coverage across 17 major daily newspapers and all major US television networks. We have done so for five policy domains: defense, welfare, health, education, and environmental policies. And this site provides a simple interface to explore the nature and quality of the news content Americans consume.

Our aim is to offer a straightforward resource for those interested in media accuracy, focused not only on recent political events but rather on actions taken over many decades by the federal government. (We do not consider actions taken by state and local governments, which may be of special relevance to newspaper coverage; see our Explainer for a discussion of this and a number of other issues in the construction and interpretation of our measures.) We hope that these data will enhance our understanding of media coverage in both the past and present, facilitate further research on the subject, both substantive and methodological, and stimulate discussion amongst practitioners and audiences about the role that accurate media play in the successful functioning of representative democracy.

How do we measure accuracy?

A detailed account of our methodology is included in Information and Democracy. (We do, however, rescale our measures here, a process we describe below.) Analyses included in Information and Democracy can be replicated using data archived at the Harvard Dataverse.

Data

Newspapers

Our full-text newspaper corpus is drawn from Lexis-Nexis using the Web Services Kit (WSK), which was available for the several years over which we gathered data for this project. We focused on a set of 17 major daily newspapers selected based on availability and circulation, with some consideration given to regional coverage. The list of papers on which our analyses are based is as follows: Arizona Republic, Arkansas Democrat-Gazette, Atlanta Journal-Constitution, Boston Globe, Chicago Tribune, Denver Post, Houston Chronicle, Los Angeles Times, Minneapolis Star-Tribune, New York Times, Orange County Register, Philadelphia Inquirer, Seattle Times, St. Louis Post-Dispatch, Tampa Bay Tribune, USA Today, and Washington Post. These are 17 of the highest-circulation newspapers in the US, three of which claim national audiences, and seven of which cover large regions in the northeastern, southern, midwestern, and western parts of the country.

We begin our data-gathering in fiscal year (FY) 1980, but at that time only the New York Times and Washington Post are available. The Los Angeles Times, Chicago Tribune, and Houston Chronicle enter in 1985, and other papers enter the dataset through the late 1980s and 1990s. We have 16 newspapers by 1994, and the full set of 17 by 1999. We then have access to all papers up to the end of 2018. Information and Democracy considers all available data for all sources. Here, in order to make results more directly comparable, we consider all available data from FY1995 to FY2018. (This means that all newspapers are considered over the same time period, except the Arizona Republic, which is available only from FY1999.)

We do not collect these newspapers in their entirety, but rather focus on content related to each of our five policy domains. We do so using a search in the Lexis-Nexis system that combines assigned subject codes and full-text keywords. (Search details are included in Information and Democracy.) Our searches were intentionally broad, capturing some irrelevant content but also the vast majority of relevant content. We did this because, as we shall see, our focus is not on entire articles but rather on relevant sentences that we extract from this downloaded content.
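To make this two-stage strategy concrete, here is a minimal sketch of the kind of broad query we have in mind. The subject codes, keywords, and query syntax below are illustrative placeholders, not the actual search terms (those are documented in Information and Democracy):

```python
# Purely illustrative: these subject codes and keywords are placeholders,
# not the search terms actually used in the project.
def broad_domain_query(subject_codes, keywords):
    """Build an expansive boolean query: match on assigned subject codes
    OR on full-text keywords, accepting some irrelevant articles in order
    to capture nearly all relevant ones."""
    subjects = " OR ".join(f'SUBJECT("{code}")' for code in subject_codes)
    terms = " OR ".join(keywords)
    return f"({subjects}) OR BODY(({terms}) AND (budget OR spending))"

# Hypothetical example for the defense domain:
print(broad_domain_query(["DEFENSE SPENDING", "ARMED FORCES"],
                         ["defense", "military", "Pentagon"]))
```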

Television

Our corpus of television news broadcasts is also extracted from Lexis-Nexis, again using the WSK. Television transcripts are stored in Lexis-Nexis in a somewhat different format than newspaper articles. In some cases, content is stored at the story level, like newspapers; in other cases, content is stored at the show level, i.e., there is a single transcript for an entire half-hour program. This makes a subject-focused search across networks rather complex: for the broadcast networks we extract just parts of a show, and for the cable networks we extract the entire show. Because we eventually focus on relevant sentences, however, our approach to television transcripts can be broad: we download all transcripts, story- or show-level, and extract relevant sentences afterwards. For the three major broadcasters, ABC, CBS and NBC, we download all available content from 1990 onwards in any morning or evening news broadcast or major “newsmagazine” program.

The cable news networks, CNN, MSNBC and Fox, do not have directly comparable flagship news programs, so we cannot extract equivalent programming from the cable networks. We instead download all available content, drop infrequent programs, and keep the major recurring programs.

Government Spending

Government spending is based on appropriations (spending commitments) in each of the five policy domains, as reported in the Historical Tables produced by the Office of Management and Budget (OMB). In some cases our domain-level spending corresponds exactly to major categories in OMB data; in other cases we make small (but straightforward) adjustments, as in Wlezien (2004) and Soroka and Wlezien (2010), and as sketched in the example following the list below.

Defense: “National Defense”

Welfare: “Income Security” excluding “General Retirement and Disability Insurance,” “Federal Employee Retirement and Disability,” and “Unemployment Compensation”

Health: “Health”

Education: “Education,” excluding “Training and Employment”

Environment: “Environment”

For more information see the Historical Tables, Budget of the United States Government.
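As an illustration of how these adjustments work, the sketch below computes the welfare series from hypothetical OMB subfunction totals. The dollar values are invented; only the category labels follow the list above:

```python
# Hypothetical OMB Historical Tables rows (values invented, in $M),
# keyed by (fiscal_year, category).
appropriations = {
    (2017, "Income Security"): 500_000,
    (2017, "General Retirement and Disability Insurance"): 6_000,
    (2017, "Federal Employee Retirement and Disability"): 100_000,
    (2017, "Unemployment Compensation"): 30_000,
}

def welfare_spending(year, approps):
    """Welfare = 'Income Security' minus the three excluded subcategories."""
    excluded = ("General Retirement and Disability Insurance",
                "Federal Employee Retirement and Disability",
                "Unemployment Compensation")
    total = approps[(year, "Income Security")]
    return total - sum(approps[(year, cat)] for cat in excluded)

print(welfare_spending(2017, appropriations))  # 364000
```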

Measures

Volume

Volume is the average number of sentences about spending in each policy domain per year. We rely on a hierarchical dictionary approach to extract sentences about spending in each policy domain. As noted above, this approach is outlined in more detail in Information and Democracy, as well as in several project papers including “Tracking the Coverage of Policy in Mass Media,” “Dictionaries, Supervised Learning, and Media Coverage of Public Policy,” and “Mass Media as a Source of Public Responsiveness.”
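As a rough sketch of what this extraction step looks like, the snippet below keeps sentences that mention both a domain term and a spending term. The dictionaries shown are simplified stand-ins for the hierarchical dictionaries described in the book and papers:

```python
import re

# Simplified stand-ins for the project's hierarchical dictionaries.
DOMAIN_TERMS = {"defense": ["defense", "military", "pentagon"]}
SPENDING_TERMS = ["spending", "budget", "appropriation", "funding"]

def extract_spending_sentences(text, domain):
    """Return sentences mentioning both the policy domain and spending."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    keep = []
    for s in sentences:
        low = s.lower()
        if any(t in low for t in DOMAIN_TERMS[domain]) and \
           any(t in low for t in SPENDING_TERMS):
            keep.append(s)
    return keep

article = ("Congress passed the bill. The Pentagon budget will grow next year. "
           "Critics disagreed.")
print(extract_spending_sentences(article, "defense"))
# ['The Pentagon budget will grow next year.']
```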

Accuracy

Starting with the same sentences extracted for the measure of Volume, we use a dictionary to code each sentence as indicating either upward change (scored as +1), downward change (scored as -1), or no change (scored as 0). We then take the sum of all of these codes, across all sentences, for each fiscal year, to produce a measure of the “media policy signal.” In years in which upward change sentences outnumber downward change sentences, the signal is positive. In years in which downward change sentences outnumber upward change sentences, the signal is negative.
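A minimal sketch of this coding-and-summing step, with an invented direction dictionary standing in for the one described in Information and Democracy:

```python
# Invented direction terms; the actual dictionary is documented in the book.
UP_TERMS = ["increase", "boost", "grow", "expand"]
DOWN_TERMS = ["cut", "decrease", "shrink", "slash"]

def code_direction(sentence):
    """Score a spending sentence: +1 upward, -1 downward, 0 no change."""
    low = sentence.lower()
    up = any(t in low for t in UP_TERMS)
    down = any(t in low for t in DOWN_TERMS)
    if up and not down:
        return 1
    if down and not up:
        return -1
    return 0

def media_policy_signal(sentences_by_year):
    """Sum directional codes across all sentences in each fiscal year."""
    return {fy: sum(code_direction(s) for s in sents)
            for fy, sents in sentences_by_year.items()}

signal = media_policy_signal({
    2017: ["The budget will increase.", "Programs face cuts.",
           "No change expected."],
})
print(signal)  # {2017: 0} -- one up and one down sentence cancel out
```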

Accuracy is then based on a model regressing this media signal on lagged changes in spending (at t-1), current changes in spending (at t), and future changes in spending (at t+1). This modeling decision is based on analyses in Information and Democracy suggesting that media coverage reflects, to varying degrees, spending in the previous, current, and future fiscal years — keeping in mind that decisions on the future (usually) are made in the current year.

We standardize all measures of spending change by domain, and all media signals by domain and outlet; and we sum the coefficients for lagged, current, and future spending to arrive at our measure of accuracy. That measure captures the estimated impact of a standard-deviation change in each spending measure on standardized units of the media signal. This allows us to compare estimates across domains, media, and outlets, where the levels and variation in spending and media coverage differ dramatically. The value 0 indicates no impact of spending on media coverage, and thus inaccurate coverage. Positive values indicate correspondence between spending change and the media signal, and thus accuracy, where the larger the value the greater the accuracy. Negative values indicate that the media signal moves against, not with, spending change. This would of course be highly inaccurate coverage.
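To fix ideas, here is a sketch of the full accuracy computation on synthetic data, including the division by 5 described just below. The regression setup follows the description above, while the data and variable names are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic annual spending changes, and a media signal that (noisily)
# tracks lagged, current, and future spending change.
d_spend = rng.normal(size=32)
signal = (0.5 * np.roll(d_spend, 1) + 0.8 * d_spend
          + 0.3 * np.roll(d_spend, -1) + rng.normal(scale=0.5, size=32))

def standardize(x):
    return (x - x.mean()) / x.std()

# Regress the standardized signal on standardized spending change at
# t-1, t, and t+1 (trimming the first and last years).
y = standardize(signal)[1:-1]
X = np.column_stack([
    np.ones(len(y)),            # intercept
    standardize(d_spend)[:-2],  # spending change at t-1
    standardize(d_spend)[1:-1], # spending change at t
    standardize(d_spend)[2:],   # spending change at t+1
])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

accuracy_raw = coefs[1:].sum()     # summed spending coefficients
accuracy_index = accuracy_raw / 5  # the rescaled index used on this site
print(accuracy_raw, accuracy_index)
```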

Our measure of accuracy has no upper or lower bound, but the observed range in our analyses is from roughly -1 to +4.5. In Information and Democracy, we consider these summed coefficients as they are. For the purposes of this site, we divide these scores by 5 to produce a more easily understood index for which 0 is entirely inaccurate and +1 is highly accurate.

The value +1 on our Accuracy scale is purposefully a little higher than we observe in our current data (where our highest observed value is roughly 0.86). This allows for the possibility of higher accuracy scores calculated in future estimates.

(Note that in diagnostic analyses we examine these accuracy estimates alongside the R-squared for each model, which indicates the proportion of variance in media coverage that is “explained” by spending coefficients. R-squareds do not distinguish between positive and negative correlations, so they are of limited use as a measure of accuracy.)

How should we interpret measures of accuracy?

The one-sentence description of the measure is as follows: “accuracy” captures the degree to which media coverage of changes in spending corresponds with actual changes in spending by the federal government in the previous, current, and upcoming fiscal years. This is, we believe, a relatively straightforward working definition of accuracy.

There are however a good number of measurement-related decisions required to get to this kind of accuracy measure. And there are real advantages of — but also limitations to — this approach to measuring the accuracy of media coverage. We consider some of these below.

Federal Policy. Our policy measure focuses on federal policy change. This means that the version of “policy” to which we compare our media signal does not account for spending at the state or local level. If local- or state-level budgetary policy differs from federal policy, and if media coverage focuses on local- or state-level policy, we may observe inaccuracy that is more about the level of government that media focus on than it is about inaccurate reporting of federal policy change. We expect this to be a minor issue for national television broadcasters, and for newspapers with large national audiences as well, but it may be more impactful for more regionally-focused newspapers.

Spending Change. Our measures of policy and media coverage focus on changes in spending. This means that our account of accuracy does not consider policy changes that aren’t reflected in spending (although we do attempt to account for environmental regulation in our measures in Information and Democracy). Put differently, our measures allow us to look at the accuracy of coverage about budgetary policy change. In the large domains we consider here, this is much – but not all – policy.

Accuracy Over Time. Our calculation of accuracy captures average levels of accuracy over decades, not the accuracy of media sources on any specific single piece of legislation. This approach to accuracy does not consider, for instance, whether media accurately reported the changes implemented in Obamacare. It does however capture the degree to which, over several decades, media coverage reflected annual changes in health care spending by the federal government.

Sources of (In)accuracy. Accuracy (and inaccuracy) in media coverage may be the consequence of independent reporting by individual media outlets, but there are other sources. A news outlet’s use of syndicated work by affiliated outlets, or the use of news agencies/wire services, may determine reporting accuracy, as might direct reporting of information provided by government agencies or elected officials. We cannot easily distinguish the initial sources of accurate or inaccurate information in media outlets. We can however capture the accuracy of information that finds its way into news coverage, regardless of its source. Insofar as one outlet is more accurate than another, then, we can likely attribute that difference to some combination of journalistic and editorial decisions.

Social Media. We do not examine the accuracy of content on social media here. We offer a preliminary analysis of Facebook content in Information and Democracy. And a good amount of news coverage in social media comes from the “legacy” outlets we consider here.

Congruence vs Correspondence. Our approach to measuring accuracy focuses on the degree to which spending and media coverage of spending are correlated. This is what might be called correspondence between policy and coverage. This is not the same thing as what is referred to as congruence, the degree to which policy and coverage reflect the same level of spending. Put differently: our measure captures over-time parallelism between change in budgetary policy and the media policy signal, but it ignores whether actual levels of spending are accurately captured in media content. We do this for a number of substantive and methodological reasons, outlined in Information and Democracy.

Standardized Variables. Our measure of accuracy relies on standardized variables for spending and the media policy signal. This means that our measures of accuracy “factor out” differences in the volume of spending across domains, as well as differences in the volume of content across both policy domains and media outlets. This has some consequences for media outlets’ accuracy scores, of course. A media outlet scores better on our accuracy measure with a moderate amount of highly accurate coverage than with a large amount of only moderately accurate coverage. Our graphics show accuracy alongside volume, however, so we can consider a combination of accuracy and volume there. But it is worth noting that our accuracy scores will, on their own, privilege accuracy over volume.

In sum, there are limitations to what we can show with a single measure of media accuracy. In this case, we have focused on all media content about federal budgetary policy in five major spending domains, over time. We regard this as a valuable measure, and as a useful starting point for further work on media accuracy — in other policy domains, focused on spending or otherwise, zeroing in on individual policies, considering social media and other news outlets, and in other countries as well.