Dočekal: Experimental Semantics: Data Science and Programming in R

The course begins with core descriptive statistics in R, including measures of central tendency and dispersion, quantiles, and common visualizations such as histograms, boxplots, and scatterplots. It then moves to count-data analysis, covering frequency and contingency tables, bar plots, and corpus-based methods, including queries in Corpus Query Language (CQL), with a case study on the distribution of color terms in the British National Corpus (BNC) (Gries, 2008; Lijffijt & Gries, 2012). Students also learn to prepare reproducible reports in R Markdown by combining code with narrative text and exporting to HTML, PDF, and Word formats. Building on these foundations, the course introduces experimental methods in linguistics, from design and data collection to analysis and visualization in R, together with platforms for creating and sharing experiments (e.g., L-Rex and PCIbex).
In the second part, attention shifts to regression modeling: first, linear regression and its interpretation, diagnostics, and visualization, illustrated by studies on polarity items and inferential judgments (Denić et al., 2021; Chemla et al., 2011; Szabolcsi et al., 2008); then, logistic regression for binary outcomes, including interpretation of model parameters and a comparison of frequentist and Bayesian approaches.

Readings

  • Baayen, R. H. (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press.
  • Denić, M., Homer, V., Rothschild, D., & Chemla, E. (2021). The influence of polarity items on inferential judgments. Cognition, 215, 104791.
  • Geurts, B., & van der Slik, F. (2005). Monotonicity and processing load. Journal of Semantics, 22(1), 97-117.
  • Gries, S. Th. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403-437. https://doi.org/10.1075/ijcl.13.4.02gri
  • Chemla, E., Homer, V., & Rothschild, D. (2011). Modularity and intuitions in formal semantics: The case of polarity items. Linguistics and Philosophy, 34(6), 537-570.
  • Lijffijt, J., & Gries, S. Th. (2012). Correction to “Dispersions and adjusted frequencies in corpora.” International Journal of Corpus Linguistics, 17(1), 147-149. https://doi.org/10.1075/ijcl.17.1.08lij
  • Levshina, N. (2015). How to Do Linguistics with R. Amsterdam: John Benjamins Publishing Company.
  • Szabolcsi, A., Bott, L., & McElree, B. (2008). The effect of negative polarity items on inference verification. Journal of Semantics, 25(4), 411-450.
  • Wickham, H., & Grolemund, G. (2017). R for Data Science. Sebastopol: O’Reilly.