ANALYZING TOPIC DIFFERENCES, WRITING QUALITY, AND RHETORICAL CONTEXT IN COLLEGE STUDENTS’ ESSAYS USING LINGUISTIC INQUIRY AND WORD COUNT (LIWC)

Machine methods for automatically analyzing text have been investigated for decades. Yet the availability and usability of these methods for classifying and scoring specialized essays in small samples, as is typical for ordinary coursework, remains unclear. In this paper we analyzed 156 essays submitted by students in a first-year college rhetoric course. Using cognitive and affective measures within Linguistic Inquiry and Word Count (LIWC), we tested whether machine analyses could i) distinguish among essay topics, ii) distinguish between high and low writing quality, and iii) identify differences due to changes in rhetorical context across writing assignments. The results were positive for all three tests. We consider ways that LIWC may benefit college instructors in assessing student compositions and in monitoring the effectiveness of the course curriculum. We also consider extensions of machine assessments for instructional applications.


Introduction
Machine analysis of text has a long history in computational and applied research. Machine methods vary in their approach, with some systems constructing high-dimensional semantic matrices of correlated words (Latent Semantic Analysis: Landauer, Foltz, & Laham, 1998; HAL: Lund & Burgess, 1996). These methods compare the similarity of target documents to pre-constructed semantic matrices. Other methods use generative statistical algorithms to identify common topics among target documents (Latent Dirichlet Allocation: Blei, Ng, & Jordan, 2003). Yet others use pre-constructed dictionaries to quantify cognitive and emotional characteristics in target documents (Pennebaker, Boyd, Jordan, & Blackburn, 2015). Or, as an alternative to using pre-constructed dictionaries, some methods apply naïve Bayesian methods to construct dictionaries from the target essays themselves (Taraban et al., 2019). Overall, machine tools have potential value in the classroom. Instructors may use machine-assisted assessments to guide instruction, for example, or might use these tools to assess whether students are incorporating elements of the course into their written work. Practically speaking, to the extent that machine tools aid course assessment and grading, there is the potential benefit of freeing instructors' time for attention to other course demands.
The present study uses predefined dictionaries (i.e., word lists) to analyze differences among college student essays. The motivation for this study comes from the work of Pennebaker and King (1999) who proposed that "the way people talk about things reveals important information about them" (p. 1297). According to this thinking, Pennebaker et al. reasoned that it should be possible to construct lists of words that identify specific "beliefs, fears, thinking patterns, social relationships, and personalities" (Pennebaker et al., 2015, p. 1) that characterize individuals based on the words that they use. In order to test this thesis, Pennebaker and colleagues (Pennebaker et al., 2015) developed Linguistic Inquiry and Word Count (LIWC), which is a machine tool used to analyze the semantic content of documents, like essays, editorials, novels, and blog comments. LIWC is based on the analytic assumption that aspects of the semantic content of text can be reliably recovered through algorithmic methods. The influence of LIWC on text analysis has been broad, with applications in clinical (Lord, Sheng, Imel, Baer, & Atkins, 2015; Pennebaker, 2004), academic (Carroll, 2007; Pennebaker, Chung, Lavergne, & Beaver, 2014; Robinson, Navea, & Ickes, 2013), and financial (Robertson & Doig, 2010) domains, among others, and translations of LIWC into Catalan (Massó, Lambert, Penagos, & Saurí, 2013) and Dutch (Boot, Zijlstra, & Geenen, 2017; Van Wissen & Boot, 2017), among other languages.
Highly selective lists that define categories are the core of the LIWC program. These lists were developed over the course of decades using extensive samples of texts (Tausczik & Pennebaker, 2010). The lists are referred to as "dictionaries" (Pennebaker et al., 2015). LIWC works by searching for terms that appear in these dictionaries, which represent both broad categories (like positive emotion, cognition, and biological processes) and specific categories (like anger, sad, family, and health). The LIWC program reports the percentage of words that fall into each of 125 categories. LIWC includes nearly 6,400 words or word stems (Pennebaker et al., 2015) that define these categories. LIWC researchers have used subsets of these variables to create four standardized composite scores based on previously published work. The composite scores are Analytic Thinking, Clout, Authentic, and Tone, and are defined as follows in the LIWC Manual (Pennebaker et al., 2015):
Analytic Thinking: A high number reflects formal, logical, and hierarchical thinking; lower numbers reflect more informal, personal, here-and-now, and narrative thinking.
Clout: A high number suggests that the author is speaking from the perspective of high expertise and is confident; low Clout numbers suggest a more tentative, humble, even anxious style.
Authentic: A higher number is associated with a more honest, personal, and disclosing text; lower numbers suggest a more guarded, distanced form of discourse.
Tone: A high number is associated with a more positive, upbeat style; a low number reveals greater anxiety, sadness, or hostility. A number around 50 suggests either a lack of emotionality or different levels of ambivalence.
In summary, LIWC, as a machine application, can be understood within a text-mining model. The dictionaries are used to extract data from text. The dictionaries are engineered to identify cognitive, affective, and other categories, which supports knowledge discovery. LIWC computes percentages of words and categories, and also composite scores, providing statistical analysis of the presence of words and categories in the text samples.
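The dictionary-matching step summarized above can be illustrated with a minimal sketch. The three category names and their word lists below are hypothetical stand-ins chosen for illustration; they are not the LIWC dictionaries, which are far larger, and the function names are our own. The sketch assumes that a trailing asterisk marks a word stem and that each category score is the percentage of tokens matching that category.

```python
import re

# Hypothetical mini-dictionaries for illustration only; these are NOT the
# LIWC word lists. A trailing "*" marks a word stem.
TOY_DICTIONARIES = {
    "posemo": ["happy", "good", "enjoy*"],
    "negemo": ["sad", "angry", "worr*"],
    "cogproc": ["because", "think*", "reason*"],
}


def matches(token, entry):
    """Return True if a token matches a dictionary entry (whole word or stem)."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_percentages(text, dictionaries=TOY_DICTIONARIES):
    """Return the percentage of tokens in the text that fall into each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in dictionaries}
    result = {}
    for name, entries in dictionaries.items():
        hits = sum(1 for t in tokens if any(matches(t, e) for e in entries))
        result[name] = 100.0 * hits / len(tokens)
    return result


sample = "I think the author is worried because the reasoning is not good"
percentages = category_percentages(sample)
for name, value in percentages.items():
    print(f"{name}: {value:.1f}%")
```

The composite scores used in the present study are derived from combinations of such category percentages and are scaled by the LIWC developers; that scaling is not reproduced in this sketch.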
The purpose of the present study was to examine applications of LIWC in an instructional context. The four composite scores, which were standardized by Pennebaker and colleagues using previous studies, were chosen for the present study based on their appropriateness for small samples. The composite scores are scaled as percentile scores. The target documents were written by college students and were chosen as a convenience sample of data. The results from this study have practical significance, given the ever-growing size of class enrollments and the prospect that applications of LIWC could enhance the quality of instructor performance. The specific research questions are as follows:
1. Do the topics in student essays reliably evoke different patterns of LIWC categories?
2. Are LIWC categories positively associated with high-scoring student essays?
3. Can LIWC categories reliably detect shifts in rhetorical context across writing assignments?
Rhetorical context includes the purpose that an author has in writing and includes consideration of the audience and the message one wants to convey, i.e., the what, why, and for whom the author is writing (Lunsford, 2016).
Although LIWC has been applied in academic settings, to our knowledge, the composite scores have not been applied to college essays to examine the present questions. Therefore the present study has the potential to make a valuable contribution to the literature.

Methods -Overview
The present study consists of analyses of essays written by students at the middle (Case Study 1) and end (Case Study 2) of an "Essentials of College Rhetoric" first-year writing course at a public research university in the southwestern United States. The writing assignments were part of the normal course curriculum. Students received credit according to the course syllabus. The course instructor graded the student essays as a normal part of the course using the rubric provided to students. The instructor's grades were subsequently used in the present study to address research question number 2. The instructor evaluated the papers without prior knowledge of the current research.
In order to conduct the analyses reported here, student essays were input into the LIWC2015 software using a comma-separated format (.csv) in an Excel file. The categories selected for analysis via LIWC were the four composite categories (Analytic Thinking; Clout; Authentic; Tone). The output from LIWC was exported in an Excel file to carry out the statistical analyses described below.

Methods -Case Study 1
In Case Study 1, students were assigned a paper concerning rhetorical analysis. Students self-selected one of five topics (Food Culture, MSG, Organic Farming, Sexual Harassment, and Vocational Education). The writing prompt for the essay was as follows:

Select a text from the options provided by your teacher and write a rhetorical analysis of this [text]. Using a variety of rhetorical terms and concepts, assess the effectiveness of the author's claims and overarching argument, as well as their various choices and strategies throughout the text.

Results and Discussion -Case Study 1
Eighty-three essays submitted by the full course enrollment were available for analysis. As only four essays addressed Vocational Education, these essays were eliminated from further analysis. The mean length of the remaining 79 essays was 1475 words (standard deviation = 321). In order to address the first research question about whether topics in student essays reliably evoke different patterns of LIWC categories, an analysis of mean percentile differences using the GLM procedure in IBM SPSS Version 24 (https://www.ibm.com) was conducted with Topic (between-subjects) and LIWC categories (within-subjects) as independent variables and percentile scores as the dependent variable. The interaction between Topic and LIWC categories was the crucial effect in this analysis, as it would indicate that the mean percentile scores for LIWC categories varied depending on the specific topic that students were targeting in their essays. The results showed significant differences for Topic [F(3,75) = 12.09, p < .001], for LIWC Categories [F(3,225) = 602.92, p < .001], and for the LIWC by Topic interaction [F(9,225) = 8.39, p < .001]. These effects are summarized in Figure 1. The main effect for Topic shows that mean percentile scores, averaged over LIWC categories, differed reliably among topics. The main effect for LIWC Categories shows that mean percentile scores, averaged over topics, differed reliably among the four LIWC categories. More importantly, the significant interaction of Topic with LIWC Categories indicates that specific essay topics evoked different patterns of LIWC categories. In order to further examine these differences in LIWC patterns, simple-effects tests were conducted using percentile scores separately for each LIWC category and with topic as the independent variable.
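As a rough illustration of a simple-effects test for a single LIWC category, the following sketch computes a one-way analysis of variance with topic as the factor. It is not the SPSS GLM procedure used in the study, and the percentile scores are invented toy values, not data from the essays; the function name is likewise an assumption for illustration.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of lists of scores."""
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    k = len(groups)
    grand_mean = sum(all_scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented percentile scores for one composite category, grouped by topic
toy_scores_by_topic = {
    "Food Culture": [61, 62, 63],
    "MSG": [62, 63, 64],
    "Organic Farming": [65, 66, 67],
    "Sexual Harassment": [62, 63, 64],
}
f_stat, df_b, df_w = one_way_anova_f(list(toy_scores_by_topic.values()))
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

A significant F for a given category would then be followed by pairwise comparisons among topics to locate the differences.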
As summarized in Table 1, Analytic Thinking was significantly stronger for Organic Farming compared to the other topics; Clout was significantly stronger for Food Culture and Sexual Harassment compared to MSG and Organic Farming. There were no significant differences for Authentic. Tone was significantly stronger for Food Culture and Organic Farming compared to MSG and Sexual Harassment. These patterns suggest that the topic of organic farming evoked significantly more analytic thinking from students compared to the other topics and that students were more confident discussing food culture and sexual harassment. The emotional tone was neutral when discussing food culture and organic farming, but significantly negative when discussing MSG and sexual harassment. The significant variation in LIWC categories depending on essay topic indicates that LIWC is able to capture cognitive and affective differences in student writing as a function of the specific topic that is being addressed. These differences could also be informative to an instructor, who could assess whether students were responding as intended to the course assignments and could decide whether to retain or delete writing assignments depending on the analysis of student writing. In the present case, for instance, an instructor may want students to express strong analytic thinking regardless of topic, or to adopt a positive attitude in their discussion of all topics.
The second research question addressed whether LIWC categories are positively associated with high-scoring student essays, as graded by the course instructor. Because the assignment was explicitly about rhetorical analysis, we expected percentile scores for Analytic Thinking to be significantly correlated with instructor-assigned grades. It was also possible that other facets of the essay, like the confidence with which a student wrote, the level of disclosure, and the affective tone, could influence the instructor's grade. Nearly half (n=38) of the 79 essays received a grade of A. Therefore, to keep sample sizes similar, the grades were divided into two categories: A, which was the highest possible grade, and Less-than-A. Point-biserial correlations were separately conducted for each LIWC category and grade and are summarized in Table 2. The correlation between Analytic Thinking and Grade was positive and significant; the remaining correlations were not significant. The instructions to students regarding the composition of their essays emphasized an assessment of the author's claims, arguments, choices, and strategies. The significant correlation for Analytic Thinking indicates that the instructor was sensitive to the quality of the critical analysis that was evident in the essays. Additional composition instructions, like expressing an affective response to the author's position, could possibly show significant correlations with other LIWC categories, like Tone. A limitation on how strongly LIWC categories can correlate with grades in the present analysis is that instructors typically take other factors into account in grading, like organization, complexity, coherence, employment of text analysis tools, and formatting. These factors are outside the scope of the LIWC categories.
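The point-biserial correlation between a dichotomous grade (A versus Less-than-A) and a LIWC percentile score can be sketched as follows. The data are invented toy values and the function name is an assumption for illustration; the formula is the standard one, which is equivalent to a Pearson correlation with the grade coded 1 or 0.

```python
import math


def point_biserial(binary, scores):
    """Point-biserial r between a 0/1 variable and a continuous variable."""
    n = len(scores)
    group1 = [s for b, s in zip(binary, scores) if b == 1]
    group0 = [s for b, s in zip(binary, scores) if b == 0]
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    mean_all = sum(scores) / n
    # Population standard deviation of all scores (divide by n, not n - 1)
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    p = len(group1) / n  # proportion of cases coded 1
    q = 1.0 - p          # proportion of cases coded 0
    return (mean1 - mean0) / sd * math.sqrt(p * q)


# Toy data: 1 = grade of A, 0 = Less-than-A; scores are invented percentiles
toy_grades = [1, 1, 0, 0]
toy_analytic = [90, 80, 70, 60]
r_pb = point_biserial(toy_grades, toy_analytic)
print(f"r_pb = {r_pb:.3f}")
```

A positive value indicates that essays graded A tended to have higher scores on the LIWC category in question.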

Methods -Case Study 2
Participants in this case study were identical to those in Case Study 1. The assigned essay was the last writing assignment in the course. The prompt for the essay was as follows:

This essay is meant to make an argument that you've accomplished the goals of this course. Write a formal, academic argument essay in which you make clear claims about what you have achieved in this course in relation to the course goals. Support those claims with specific evidence from your work in the class and an explanation of that evidence.
Because this essay required students to reflect upon their own writing, it provided a promising contrast to the rhetorical analysis essay in Case Study 1 in which students were expected to be objective and largely detached from their topic of analysis.

Results and Discussion -Case Study 2
In order to address the third research question about whether LIWC categories reliably detect shifts in rhetorical context across writing assignments, it was necessary to match student essays across the two assignments. There were 77 cases in which students completed both essays. The mean length of rhetorical analysis papers was 1486 words (standard deviation = 337) and of self-assessment papers 1224 words (standard deviation = 312). The word count of the two essays correlated at 0.40 (p < .001, two-tailed), indicating that students were relatively consistent in the length of the essays they wrote. Forty-four percent of students received an A on the rhetorical analysis essay and 66% received an A on the self-assessment essay. The correlation between grades was 0.30 (p = .007), indicating that students were somewhat consistent in the grades they received on the two essays.
In order to examine shifts in LIWC categories across the two essays, an analysis of mean differences using the GLM procedure in IBM SPSS Version 24 (https://www.ibm.com) was conducted with Essay and LIWC categories (within-subjects) as independent variables and percentile scores as the dependent variable. The crucial effect again was the interaction of Essay and LIWC categories because a significant interaction effect would show that the percentile scores for LIWC categories shifted significantly with the shift in rhetorical context. The results showed significant differences for Essay [F(1,76)   The significant Essay by LIWC interaction indicated that percentile differences for the rhetorical analysis essay versus the self-assessment essay were not the same across the LIWC categories. In order to examine the percentile differences, paired t-tests were conducted for each LIWC category, as summarized in Table 3. The difference in mean percentile scores was significant for each LIWC category, as shown by the significant p-values in Table 3. The direction of the significant shifts also varied, as shown in the Mean-Percentile-Difference column in Table 3 and visually in Figure 2. Students were somewhat more analytic when writing the rhetorical analysis essay, but decidedly more honest, personal, and disclosing (Authentic), positive and upbeat (Tone), and tentative, humble, and anxious (Clout), based on the definitions in the introduction, in composing their self-assessment essays. Addressing the second research question about whether LIWC categories are positively associated with high-scoring student essays, correlation analyses for each LIWC variable with essay grade showed a significant correlation only for Analytic Thinking, as shown in Table 4, which replicated the correlation pattern for Case Study 1.
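The paired comparison of one LIWC category across the two essays can be sketched as below. The scores are invented toy percentiles for four hypothetical students, not the study data, and the function name is an assumption for illustration. The sketch returns the mean difference, the t statistic, and the degrees of freedom; obtaining a p-value would additionally require the t distribution, which is omitted here.

```python
import math


def paired_t(first, second):
    """Return (mean difference, t statistic, degrees of freedom) for paired scores."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (divide by n - 1)
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t_stat = mean_diff / math.sqrt(var_diff / n)
    return mean_diff, t_stat, n - 1


# Invented percentile scores for the same four students on the two essays
toy_rhetorical_analysis = [82, 84, 86, 88]
toy_self_assessment = [80, 80, 80, 80]
mean_diff, t_stat, df = paired_t(toy_rhetorical_analysis, toy_self_assessment)
print(f"mean difference = {mean_diff:.2f}, t({df}) = {t_stat:.3f}")
```

The sign of the mean difference gives the direction of the shift between the two rhetorical contexts.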

General Discussion
The results from the two case studies provided affirmative responses to the three primary research questions. Specifically, i) the topics in student essays reliably evoked different patterns of LIWC categories, ii) LIWC categories were positively associated with high-scoring student essays, and iii) LIWC categories reliably signaled shifts in rhetorical context across writing assignments. These results support the underlying premise in the work of Pennebaker and King (1999) that text and other communications convey more than the explicit message, and that important information can be gained from analyzing individuals' communications. The success of the present analyses provides sound encouragement to researchers and instructors to further the examination and development of machine tools that are applicable to classroom instruction and that could benefit student development of writing skills.
Thinking about how the LIWC program works may provide some insight into why some analyses may be more difficult than others for LIWC. Given that LIWC relies on an extensive repertoire of dictionaries, detecting cognitive and affective patterns across papers that address different topics is rather straightforward. Different topics draw on distinct ideas and issues, which should match more closely to some dictionaries than to others. Using LIWC to assess writing quality, however, is a bit more tenuous. Instructors use a multitude of factors to score essays, such as organization, flow, and clarity, which are beyond the scope of LIWC dictionaries. Detecting shifts in rhetorical context is similar in difficulty to distinguishing between different topics inasmuch as rhetorical shifts often involve shifts in topics.
Of the three tests, the detection of shifts in rhetorical contexts provided the strongest support for LIWC and the most incisive analyses. As depicted by Ted Major in Figure 3, rhetorical context can be understood in terms of four elements (Lunsford, 2016). At the center is the text (the essay, paper, blog, or tweet) that conveys the message. To convey the message effectively, there is an author who has full control of what is composed. The author needs a clear purpose and to be cognizant of the audience. These factors are important in the comparison of the essays on rhetorical analysis (Case Study 1) and those on self-assessment of learning and achievements in an end-of-the-course essay (Case Study 2). Figure 2 and Table 3 highlight shifts in cognitive and affective processing in students' compositions for the two writing assignments. Analytic Thinking differed significantly between the two essays. The essay prompts provided by the instructor required students to be analytical in both essays, and the students followed those instructions, but applied Analytic Thinking more explicitly when composing a critical analysis. The shift in rhetorical context (rhetorical analysis vs self-assessment) is strikingly clear in the shifts in the remaining LIWC categories. Regarding Clout, students were confident when conducting a critical analysis, but more humble and tentative when self-assessing. Regarding Authentic, students were guarded and distant in rhetorical analysis, but notably self-disclosing and personal in self-assessment. Regarding Tone, students were emotionally neutral for rhetorical analysis but emotionally upbeat and positive in considering course learning and achievement. The correlations between LIWC categories and essay grades suggested a strong sensitivity of the instructor to analytic thinking when grading both essays and perhaps less cognizance or reflection on the significant shifts in other factors, like Clout and Tone.
The present study clearly demonstrates the ability of LIWC to find patterns within and across student essays, but also raises questions about the relevance and value of LIWC to classroom instruction. Answers to these questions cannot be addressed well from a distance, as in the present study. Rather, when LIWC is implemented in a specific course, the implications of the LIWC results need to be assessed by the instructors themselves-by those making the assignments and assessing the students. Nevertheless, examples do come to mind. For instance, a correlation suggesting that the instructor is evaluating students largely on the basis of their positive assessments (Tone) of the course could signal the instructor to grade more holistically. As another example, instructors may want to develop autobiographical writing in students, in which case they could use the LIWC Authentic category to assess self-disclosure in students' writing. As a third example, LIWC outputs analyses for each of the 125 variables, including the composite variables used here, for each essay. Therefore, instructors can rank order students on any one, or combination, of those variables, for student assessment, curriculum assessment, or other purposes.
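The rank ordering described in the third example can be sketched briefly. The student identifiers and scores below are invented, and the column labels are assumed to correspond to the composite variables in an exported LIWC output file; an instructor would substitute the labels that appear in their own output. The function name is hypothetical.

```python
def rank_students(rows, variables):
    """Sort rows from highest to lowest mean score over the chosen variables."""
    def mean_score(row):
        return sum(row[v] for v in variables) / len(variables)
    return sorted(rows, key=mean_score, reverse=True)


# Invented output rows; "student" IDs and all scores are hypothetical
toy_rows = [
    {"student": "S1", "Analytic": 91.2, "Authentic": 20.5},
    {"student": "S2", "Analytic": 75.0, "Authentic": 48.1},
    {"student": "S3", "Analytic": 88.4, "Authentic": 12.3},
]

by_analytic = [r["student"] for r in rank_students(toy_rows, ["Analytic"])]
by_authentic = [r["student"] for r in rank_students(toy_rows, ["Authentic"])]
print(by_analytic)
print(by_authentic)
```

Passing more than one variable ranks students on the unweighted mean of those variables, which is only one of several reasonable ways to combine them.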
A limitation of the LIWC approach is the reliance on pre-defined dictionaries and categories for classification. Specifically, the dictionaries are constructed to identify and quantify specific categories. Although the LIWC categories may have general utility, classroom assignments are idiosyncratic to the course and instructor. Applicable to the current study, the dictionaries are susceptible to missing relevant categorical information in a target set of essays because the categories of interest may not be well represented by the LIWC categories. One way to counter this shortcoming is to take advantage of the option within LIWC to upload and apply specialized dictionaries. An instructor could, for instance, upload key terms and concepts from the course into a LIWC dictionary and then analyze students' essays against the standard and specialized LIWC dictionaries. Further, machine methods provide alternatives to LIWC's fixed dictionaries. Naïve Bayes methods, for example, provide for the construction of dictionaries tailored to specific texts (Taraban et al., 2019), like the rhetorical analysis essays in the current study. These dictionaries can stand alone or they could be integrated with LIWC dictionaries in order to increase the classification capabilities of LIWC.

Conclusions
The present studies demonstrate the capacity of LIWC to distinguish among student essays according to cognitive and affective variables and writing quality. LIWC also affords instructors the ability to rank order students with respect to their performance on any of the 125 variables that are output by LIWC. These, and other possible extensions of LIWC, provide instructors with the means to assess and reflect on their own performance and students' performance, and to use LIWC analyses to monitor and guide curriculum implementation and revision.
Machine tools currently provide new and exciting methods for instructors to more fully and effectively connect with students. These possibilities deserve researchers' and teachers' attention.