Creating and Testing Specialized Dictionaries for Text Analysis

Authors

  • Roman Taraban, Texas Tech University, USA
  • Jessica Pittman, Texas Tech University, USA
  • Taleen Nalabandian, Texas Tech University, USA
  • Winson Fu Zun Yang, Texas Tech University, USA
  • William M. Marcy, Texas Tech University, Lubbock, USA
  • Srivinasa Murthy Gunturu, Tata Consultancy Services, Chennai, India

DOI:

https://doi.org/10.29038/eejpl.2019.6.1.rta

Keywords:

text analysis, machine learning, LIWC, naïve Bayes.

Abstract

Practitioners in many domains (e.g., clinical psychologists, college instructors, researchers) collect written responses from clients. A well-developed method that has been applied to texts from sources like these is the computer application Linguistic Inquiry and Word Count (LIWC). LIWC uses the words in texts as cues to a person’s thought processes, emotional states, intentions, and motivations. In the present study, we adopt analytic principles from LIWC and develop and test an alternative approach to text analysis using naïve Bayes methods. We further show how output from the naïve Bayes analysis can be used to mark up student work in order to provide immediate, constructive feedback to students and instructors.
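
As a rough illustration of the general idea, the sketch below trains a small multinomial naïve Bayes model on labeled texts, classifies a new text, and brackets the words that most favor one category, which is one way such output could drive markup. It is not the procedure used in the study: the category labels, the training sentences, the Laplace smoothing, the 0.5 threshold, and the names `train`, `classify`, and `mark_up` are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: (text, category) pairs invented for illustration.
TRAINING = [
    ("I feel sad and worried about the exam", "emotion"),
    ("I was happy and proud after the talk", "emotion"),
    ("I think the result follows because the data agree", "reasoning"),
    ("We conclude the method works because errors decrease", "reasoning"),
]

def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labeled_texts):
    """Collect per-category word counts, document counts, and the vocabulary."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def word_log_prob(word, label, word_counts, vocab):
    """Laplace-smoothed log P(word | label)."""
    counts = word_counts[label]
    return math.log((counts[word] + 1) / (sum(counts.values()) + len(vocab)))

def classify(text, model):
    """Return the category with the highest naive Bayes log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                score += word_log_prob(word, label, word_counts, vocab)
        scores[label] = score
    return max(scores, key=scores.get)

def mark_up(text, target, other, model, threshold=0.5):
    """Bracket tokens whose log-likelihood ratio favors `target` over `other`."""
    word_counts, _, vocab = model
    marked = []
    for token in text.split():
        words = tokenize(token)
        word = words[0] if words else ""
        if word in vocab:
            ratio = (word_log_prob(word, target, word_counts, vocab)
                     - word_log_prob(word, other, word_counts, vocab))
            if ratio > threshold:
                token = f"[{token}]"
        marked.append(token)
    return " ".join(marked)

model = train(TRAINING)
print(classify("I feel happy and proud", model))
print(mark_up("I feel proud because the data agree", "emotion", "reasoning", model))
```

In this sketch the per-word log-likelihood ratios play the role of a dictionary learned from labeled examples rather than compiled by hand, and the bracketed words show the kind of cue that could be highlighted in a student's text.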

Published

2019-06-30

Issue

Vol 6 No 1 (2019)

How to Cite

Taraban, R., Pittman, J., Nalabandian, T., Yang, W. F. Z., Marcy, W. M., & Gunturu, S. M. (2019). Creating and Testing Specialized Dictionaries for Text Analysis. East European Journal of Psycholinguistics, 6(1), 65-75. https://doi.org/10.29038/eejpl.2019.6.1.rta