Media Observatory: Methodology


Much of our work at the Media Observatory involves the content analysis of news stories, drawn either from newspapers or from television transcripts. In 2008, we began regularly using automated procedures; before then, we relied entirely on human coding. Papers and information on our automated content-analytic software are available at lexicoder.com.

Brief descriptions of our approaches to manual and automated coding are as follows:

Manual Coding

Coding is performed by a team of coders, randomly assigned and rotated through newspapers during the study period (for instance, over the election campaign). The vast majority of codes are purely objective. We code, for instance, the first three parties mentioned in each article, in the order in which they are mentioned; we do the same for leaders, as well as for issues. We can then track the total number of times that parties, leaders or issues are mentioned in election stories. We can also track 'first mentions' - the number of times a given party, leader or issue is mentioned first in election articles. We believe these 'first mentions' are a particularly valuable indication of the most prominent parties, leaders and issues at any given time during the campaign.

We also categorize each story as either (1) primarily horserace- or campaign-focused, or (2) primarily issue-focused. This gives us a sense of how much coverage focuses on polling results and stories from the campaign trail, versus policy issues. In addition, we use a set of 'tone' codes - positive or negative - for parties and leaders. These codes necessarily involve more subjective judgment. To ensure as much reliability as possible, all Observatory coders go through practice coding sessions. In addition, newspapers are randomly selected for double-coding (separate coding by two different coders) during the campaign, and the resulting codes are compared as a check on consistency. The goal, of course, is to minimize the amount of subjectivity involved - to have each story end up with exactly the same codes, no matter who is coding it.
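To illustrate what such a double-coding check involves, here is a minimal sketch in Python using simple percent agreement; the statistic, function name, and codes below are illustrative only, not necessarily the measure we report:

    # Minimal sketch of a double-coding consistency check.
    # Simple percent agreement is shown for illustration; the
    # Observatory's actual agreement statistic is not specified here.

    def percent_agreement(codes_a, codes_b):
        """Share of stories on which two coders assigned the same code."""
        assert len(codes_a) == len(codes_b)
        matches = sum(a == b for a, b in zip(codes_a, codes_b))
        return matches / len(codes_a)

    # Hypothetical tone codes from two coders for the same five stories:
    coder_1 = ["neutral", "positive", "neutral", "negative", "neutral"]
    coder_2 = ["neutral", "positive", "negative", "negative", "neutral"]

    print(percent_agreement(coder_1, coder_2))  # 0.8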

Achieving this takes some time, of course; but we also employ a relatively well-defined set of decision rules for coding tone. Tone is based on the entire article - it captures the overall tone toward parties or leaders across the full piece. Most importantly, the default 'tone' for all party and leader mentions is neutral, and a mention has to be very clearly positive or negative in order to be coded as such. This means that our coding may miss some of the more nuanced positive and negative mentions in articles. The tone we do capture is relatively clear, however - readily apparent to all readers, and also relatively reliable coding-wise. And even as we miss some of the more subtle positive or negative tone in articles, our overall results should nevertheless capture the average tone for parties and leaders over the campaign.

Predictably, most party or leader mentions that are clearly positive or negative occur in editorial or opinion pieces; much front-section news is relatively neutral. "Net tone" is measured by (1) coding the tone (positive, negative, neutral) for every mention of a party or leader in the stories we code; (2) taking the % of party/leader mentions that are positive, and subtracting the % of party/leader mentions that are negative. The measure thus runs from -100% (all party/leader mentions are negative) to +100% (all party/leader mentions are positive). Many mentions are simply neutral, but the relative weight of positive versus negative mentions can be quite telling.
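To make the arithmetic concrete, here is a minimal sketch of the net tone calculation in Python; the mention codes below are hypothetical:

    # Illustrative net-tone calculation from manually coded mentions.
    # Each mention is coded "positive", "negative", or "neutral",
    # exactly as described above; the data are hypothetical.

    def net_tone(mentions):
        """(% positive mentions) - (% negative mentions), from -100 to +100."""
        n = len(mentions)
        pct_pos = 100 * mentions.count("positive") / n
        pct_neg = 100 * mentions.count("negative") / n
        return pct_pos - pct_neg

    party_mentions = ["neutral", "positive", "negative", "neutral",
                      "positive", "neutral", "neutral", "negative", "negative"]
    print(round(net_tone(party_mentions), 1))  # -11.1: slightly net-negative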

Automated Coding

Topics & Leader/Party Mentions:

For topics, we use a simple text-based search to identify articles mentioning a battery of topic keywords. If an article mentions keywords for a given topic more than once, we assume the article has at least some content dealing with that topic. Note that using this method, an article can address several different topics. We use a very similar procedure for leader and party mentions. We count not just the number of mentions of all major parties and leaders, but also the placement (counting characters from the beginning of the article) of each mention. This allows us to look at the first party or leader mentioned in an article, which we believe can capture some valuable information about the way in which the campaign is being framed – as a reaction to what one particular leader is saying, for instance.
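The following is a minimal sketch of this kind of keyword search and mention-placement procedure in Python; the article text, keyword lists, and names are illustrative stand-ins for our actual dictionaries:

    # Sketch of keyword-based topic detection and mention placement.
    # Keywords and names are illustrative, not the actual lists.

    article = ("Smith opened the debate on health care funding, and Jones "
               "responded that health care costs were rising. Smith also "
               "raised the economy.")

    topic_keywords = {"health": ["health care"], "economy": ["economy"]}
    leaders = ["Smith", "Jones"]

    # An article is assumed to address a topic if its keywords appear
    # more than once; an article can address several topics at once.
    text = article.lower()
    topics = [t for t, kws in topic_keywords.items()
              if sum(text.count(k) for k in kws) > 1]

    # Record the character position of each leader mention; the mention
    # with the smallest offset is the article's 'first mention'.
    positions = {name: article.find(name) for name in leaders
                 if name in article}
    first_mention = min(positions, key=positions.get)

    print(topics)         # ['health']
    print(positions)      # {'Smith': 0, 'Jones': 52}
    print(first_mention)  # 'Smith'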

Coding Tone:

The analysis of “net tone” is automated using a relatively simple, but we believe powerful, “bag-of-words” approach, weighing the frequency of positive and negative words in an article. That is, the tone of coverage – or the semantic polarity of a text – is determined by the relative frequency of a pre-defined list of sentiment-bearing words in a full-text analysis. Sentiment classification is much more challenging than identifying subjects, of course, since tone tends to depend on the relations between words. Nevertheless, we have found a good degree of reliability, at least with election-period newspaper content, using a simple count of the number of positive and negative words in the same sentence as, for instance, party or leader names.

Since the 1960s, scholars have been developing lexicons in which words are labeled with affect, in order to categorize the positive and negative connotations they carry. The best known of these is the General Inquirer (GI), which combines the Harvard IV-4 psychosocial and Lasswell value dictionaries. Our own lexicon of positive and negative words is based on a combination of GI categories and several other lexicons developed by psychologists and computational linguists. Though automated, the calculation of “net tone” is exactly as it was for the human-coded data in past elections. “Net tone” is measured as follows: ((# positive words co-occurring with leader/party names) - (# negative words co-occurring with leader/party names)) / (total # of words in sentences containing tone words alongside leader/party names).

In principle, the measure runs from -100% (all tone words alongside party/leader names are negative) to +100% (all tone words alongside party/leader names are positive). Many mentions are simply neutral, however, and any sentence with tone tends to have just one or two “toned” words. Even so, the relative weight of positive versus negative mentions can be quite telling.
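The sketch below illustrates the sentence-level calculation in Python, using a toy lexicon in place of the full dictionary; all words, names, and text are illustrative:

    # Sketch of the sentence-level "net tone" calculation. The word
    # lists here are toy stand-ins for the full lexicon (which combines
    # General Inquirer categories with other published word lists).

    import re

    positive_words = {"strong", "win", "support"}   # toy lexicon
    negative_words = {"weak", "lose", "scandal"}
    leader_names = {"smith", "jones"}

    text = ("Smith had a strong week and may win the riding. "
            "The scandal left Jones scrambling. "
            "Parliament resumes on Monday.")

    pos = neg = total = 0
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        if not leader_names & set(words):
            continue  # sentence mentions no leader or party; skip it
        p = sum(w in positive_words for w in words)
        n = sum(w in negative_words for w in words)
        if p or n:  # only sentences with any tone enter the denominator
            pos, neg, total = pos + p, neg + n, total + len(words)

    net_tone = 100 * (pos - neg) / total if total else 0.0
    print(round(net_tone, 1))  # 6.7: slightly net-positive

Here the first sentence contributes two positive words, the second one negative word, and the third is skipped entirely because it mentions no leader; the difference is then divided by the total word count of the toned, leader-mentioning sentences.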