Are Presidential Debates Becoming More Toxic?

Brief Report   July 27, 2024 by Clint McKenna

After the first presidential debate of 2024, some observers noted the stark lack of civility in how the candidates spoke to each other. Popular media outlets noted the same trait during the 2020 debates [1, 2]. This is indicative of the broader problem of affective polarization, in which people view members of the opposing party with more emotional dislike, mistrust, and hostility [3]. Have the debates actually become more toxic over time? This report examines the language used in the presidential debates over the last 20 years using Google's Perspective API, a machine learning text model developed to quantify toxic conversation online.

Toxicity Over Time by Party

To begin, we look at differences over time by political party, starting with 2004. We first scored each of the statements made at the presidential debates and grouped them by each party's affiliate during a given year. A single statement's toxicity rate should be interpreted as the probability that the statement contains toxic language. A greater value does not mean that the language is necessarily more toxic in severity when compared to a statement with lower values.
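For readers unfamiliar with the scoring step, here is a minimal sketch of how one statement could be sent to the Perspective API and how its toxicity probability could be read back. It is an illustration and not the code used for this report: the helper names `build_request`, `extract_score`, and `score_statement` are hypothetical, and the API key is a placeholder you would need to supply.

```python
import json
import urllib.request

# Public endpoint of the Perspective API's comment analyzer.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text, attributes=("TOXICITY",)):
    """Build the JSON body for one statement (hypothetical helper)."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {name: {} for name in attributes},
    }


def extract_score(response, attribute="TOXICITY"):
    """Read the summary score, a probability between 0 and 1."""
    return response["attributeScores"][attribute]["summaryScore"]["value"]


def score_statement(text, api_key):
    """Send one statement to the API; needs network access and a valid key."""
    request = urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=json.dumps(build_request(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_score(json.load(reply))
```

A returned value of 0.40 would be read as a 40% likelihood that the statement contains toxic language, not as a measure of how severe that language is.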

Bar graph depicting the rates of toxicity in presidential debates since 2004. Rates for Republican and Democrat speakers are roughly similar, slightly higher among Republicans, and increasing over time for both.

The averages presented here are the means of each party's statements in each year. As one can see, the first debate in 2024 (.194) has a noticeably higher rate of toxic language than 2020 (.076), and more than twice the rate of any prior year. Overall, the two parties were roughly similar, with Republican speakers (.085) scoring slightly higher than Democrat speakers (.074).
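As a sketch of the averaging step, assuming each scored statement is stored as a (year, party, score) record, the per-party yearly means could be computed as follows. The records are invented for illustration and are not values from this report's data; the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Invented example records: (year, party, toxicity score). Not real data.
scored = [
    (2004, "Republican", 0.05),
    (2004, "Republican", 0.07),
    (2004, "Democrat", 0.04),
    (2024, "Republican", 0.30),
    (2024, "Democrat", 0.10),
    (2024, "Democrat", 0.20),
]


def mean_by_year_and_party(records):
    """Average the scores of all statements sharing a year and party."""
    groups = defaultdict(list)
    for year, party, score in records:
        groups[(year, party)].append(score)
    return {key: mean(values) for key, values in groups.items()}


averages = mean_by_year_and_party(scored)
for (year, party), value in sorted(averages.items()):
    print(year, party, round(value, 3))
```

Each mean is still an average of probabilities, so a higher group value indicates that toxic statements were more likely, not that individual statements were more severe.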

One downside of this method is that some comments may be miscategorized as toxic, for example when a speaker quotes another person's insult or makes an offhand comment. For instance, this moment is rated at a 40% toxicity likelihood because George W. Bush says "put a head fake on us," making a joke about where the speaker appeared in the town hall.

Toxicity by Policy

Are some policies more prone to toxicity than others? We were interested in probing further, as statements about current events may inherently use more aggressive language at some time points than at others. For instance, much of the post-9/11 foreign policy discussion involved language about violence, and there was more focus on the pandemic in 2020. Some statements may also contain harsher language for other reasons. For example, consider the following two quotes, both opposing abortion:

But that does not mean that we will cease to protect the rights of the unborn. Of course, we have to come together. Of course, we have to work together, and, of course, it’s vital that we do so and help these young women who are facing such a difficult decision, with a compassion, that we'll help them with the adoptive services, with the courage to bring that child into this world and we’ll help take care of it. John McCain (2008)
So that means he can take the life of the baby in the ninth month and even after birth, because some states, Democrat-run, take it after birth. Again, the governor – former governor of Virginia: put the baby down, then we decide what to do with it. So he’s in – he's willing to, as we say, rip the baby out of the womb in the ninth month and kill the baby. Nobody wants that to happen. Democrat or Republican, nobody wants it to happen. Donald Trump (2024)

Both concern the same issue, but Trump's quote (toxicity score = .361) is quite a bit more visceral than McCain's quote (taken from a longer statement, toxicity score = .114). Unfortunately, many statements made in presidential debates do not cleanly fall into one policy category, so it is difficult to quantify differences over the course of time. Nevertheless, taking more lengthy statements that did not contain multiple policy issues or personal attacks, we looked next at toxicity by the most common policy category within a given year.

Grouped bar chart of the three most discussed policy issues in the debates by year. All policies for each year were Economy, Foreign Policy, Healthcare, or Social Issues.

Among these more substantial statements, both foreign policy and social issues increased each year. Healthcare in 2020 (mostly COVID-related) and the economy in 2024 were also more toxic than any year prior.

What about positive language?

Aside from toxicity, we were interested in looking at differences in positive language over time. The Perspective API offers some experimental bridging attributes that attempt to capture language aimed at decreasing divisiveness or conflict [4]. We begin by depicting the average scores for each of these bridging attributes over time.

Positive bridging attribute averages over time: Affinity, Compassion, Curiosity, Nuance, Personal Story, Reasoning, and Respect. The averages for each value .

Most of the attributes decrease from the previous year. There is an interesting quirk where 2024 had higher values of these bridging attribute scores, despite also being the most toxic year so far. Why is this so? To investigate the robustness of these experimental measures, here is a visualization showing the distribution of scores on these bridging attributes over time.

Positive bridging attributes individually displayed as a distribution of all scores, over time. The distributions of most attributes do not follow a normal distribution and show no discernible pattern over time.

As you can see, some of the attributes do not have normally distributed scores from year to year. Nuance and reasoning have bimodal distributions in most years, for instance. While instances of positive attributes have decreased over the years, the sharp drops in 2016 and 2020 may have more to do with the nature of the debate style in these years. The Trump-Clinton and Trump-Biden debates had more interruptions, crosstalk, and short bursts of dialogue than debates in previous years. It should also be noted that these Perspective API measures were likely intended to be scaled to thousands of online comments, while we are working with the limited sample of only general election presidential debates.
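One way to inspect the shape of a score distribution without plotting is to count scores in fixed-width bins. The sketch below does this for a single attribute in a single year. The scores are invented, the function name is hypothetical, and the bin count of 5 is an arbitrary choice for illustration.

```python
def histogram(scores, bins=5):
    """Count scores in equal-width bins spanning 0 to 1."""
    counts = [0] * bins
    for score in scores:
        # A score of exactly 1.0 belongs in the last bin.
        index = min(int(score * bins), bins - 1)
        counts[index] += 1
    return counts


# Invented scores clustered near 0.1 and 0.9, i.e. a bimodal shape.
example_scores = [0.05, 0.10, 0.12, 0.15, 0.50, 0.85, 0.90, 0.92, 0.95]
counts = histogram(example_scores)
for i, count in enumerate(counts):
    print(f"{i / 5:.1f}-{(i + 1) / 5:.1f} {'#' * count}")
```

Two well-separated peaks, as in this toy example, are what a bimodal attribute looks like; the yearly average alone would hide that shape.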

Conclusion

Natural language processing tools can lend some insight into how presidential debates have grown to contain more toxic language over time. The extent to which this affects the average voter is unclear. It could be that affective polarization is leading to support for candidates who value antagonistic speech patterns, or that elite cues are having downstream effects that lead to more polarization among the population [5]. As debates continue to play a crucial role in shaping public perception and discourse, addressing growing incivility is essential for fostering a healthier democratic process and promoting constructive political dialogue.


Political reports by California Social Labs should not be interpreted to endorse or support any particular political group, candidate, or legislation.

Methodology

Data and code are available at https://github.com/clintmckenna/debate-toxicity

Participants. Data only examined presidential candidates from 2004 to 2024: George W. Bush (2004), John Kerry (2004), Barack Obama (2008, 2012), John McCain (2008), Mitt Romney (2012), Hillary Clinton (2016), Donald Trump (2016, 2020, 2024), and Joe Biden (2020, 2024).

Measures. Perspective API model attributes were used as the outcome variables of interest in all reported findings. The description of all attributes can be seen here. Statements made in debates were gathered and the averages of these attribute ratings were presented (e.g. average toxicity score of each party's candidate per year). For the policy statements, we included only substantially lengthy statements, defined as statements longer than 240 characters. Afterwards, we manually categorized the statements as best belonging to a particular policy, discarding statements that spanned multiple policies, contained personal critiques, or were generally incoherent. As some statement categorization might be contestable, we include the coded text csv in the repository linked above for readers to examine. No statistical models are presented and this report should be seen as merely descriptive in nature.
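The length filter and policy averaging described above can be sketched as follows. This is an illustration under assumptions and not the repository's code: the field names (`text`, `policy`, `year`, `toxicity`) and the example rows are hypothetical, and `policy` is assumed to be `None` for statements discarded during manual coding.

```python
from collections import defaultdict
from statistics import mean

MIN_LENGTH = 240  # statements must be longer than this many characters


def policy_averages(statements):
    """Mean toxicity per (year, policy) among long, single-policy statements."""
    groups = defaultdict(list)
    for row in statements:
        if len(row["text"]) <= MIN_LENGTH:
            continue  # too short to count as a substantial statement
        if row["policy"] is None:
            continue  # discarded during manual coding
        groups[(row["year"], row["policy"])].append(row["toxicity"])
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical rows; the text is padding and the scores are invented.
rows = [
    {"year": 2020, "policy": "Healthcare", "toxicity": 0.10, "text": "x" * 300},
    {"year": 2020, "policy": "Healthcare", "toxicity": 0.20, "text": "x" * 241},
    {"year": 2020, "policy": "Economy", "toxicity": 0.90, "text": "x" * 240},
    {"year": 2020, "policy": None, "toxicity": 0.50, "text": "x" * 400},
]
result = policy_averages(rows)
print(result)
```

In this toy input only the two Healthcare rows survive: the Economy row is exactly 240 characters and so is not longer than the threshold, and the last row has no single policy assigned.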

References

1 Carlisle, M. (2020, September 30). Interruptions and Insults: All About the Very Uncivil Tone of the First Presidential Debate. TIME. https://time.com/5894565/interruptions-insults-presidential-debate/
2 Demsas, J. (2020, October 23). Donald Trump "only" interrupted 34 times in tonight’s debate. Vox. https://www.vox.com/2020/10/23/21529607/biden-trump-debate-won-interrupt-kristen-welker-presidential
3 Iyengar, S., Lelkes, Y., Levendusky, M., Malhotra, N., & Westwood, S. J. (2019). The Origins and Consequences of Affective Polarization in the United States. Annual Review of Political Science, 22(1), 129–146. https://doi.org/10.1146/annurev-polisci-051117-073034
4 Ovadya, A., & Thorburn, L. (2023, October 26). Bridging Systems: Open problems for countering destructive divisiveness across ranking, recommenders, and governance. http://knightcolumbia.org/content/bridging-systems
5 Zaller, J. R. (1992). The nature and origins of mass opinion (pp. xiii, 367). Cambridge University Press. https://doi.org/10.1017/CBO9780511818691