This project formed the basis for my own PhD thesis on the English dative alternation in varieties of English. Broadly speaking, the project married the spirit of the Probabilistic Grammar Framework (which is the methodological outcome of the usage-based turn in linguistic theory, positing that grammatical knowledge is experience-based and partially probabilistic) to research along the lines of the English World-Wide Paradigm (which is concerned with the sociolinguistics of, and linguistic variation across, post-colonial English-speaking communities around the world). Because variation is a core explanandum in current linguistic theorizing, the project and my own thesis contribute to the development of usage-based theoretical linguistics by adopting a variational, large-scale comparative, and sociolinguistically responsible perspective.

In my work, I analysed the factors driving the variation between the ditransitive dative (e.g. Mary gives John an apple) and the prepositional variant (e.g. Mary gives an apple to John) across nine varieties of English. Data for each of the nine varieties were sampled from the International Corpus of English (ICE) and the Corpus of Global web-based English (GloWbE). The nine varieties comprise four traditionally labelled ‘native’ varieties (British, Irish, New Zealand and Canadian English) and five ‘non-native’ varieties (Hong Kong, Singapore, Indian, Philippine and Jamaican English).

Employing different statistical techniques, such as mixed-effects modeling, conditional random forests, cluster analysis, and multidimensional scaling, I show that the constraints driving this variation are largely stable across varieties, i.e. their effect direction is the same regardless of location. For instance, the longer one constituent is (in number of letters) relative to the other constituent, the more likely one of the variants becomes. However, the size of that increase or decrease in likelihood differs across varieties. That is, the effect that a constraint has on the variation differs in degree between localities. Furthermore, some effects also differ in strength depending on the register in which the dative variant is used (e.g. newspaper vs. spoken dialogue) and seem to be lexically dependent. My findings thus not only speak to the cognitive nature of this variation but also highlight that any such cognitive processes have to be grounded in the social reality that leads to their entrenchment (termed ‘cognitive indigenization’, see our publication here).

(For more information, check out the project’s website here.)

Project members
Prof. Dr. Benedikt Szmrecsanyi (PI)
Dr. Jason Grafmiller
Benedikt Heller (PhD fellow)
Melanie Röthlisberger (PhD fellow)

Comparative sociolinguistics

One of the aims of the project was to investigate to what extent speakers from different regional backgrounds differ in their probabilistic grammar. In order to quantify this difference, we used dialectometric techniques inspired by Comparative Sociolinguistics. More specifically, we compared the varieties’ probabilistic grammars along three lines: (1) the number of shared significant and non-significant factors in a regression model, (2) the effect size of constraints as given by the coefficient estimates, and (3) the importance of constraints. We derived the importance values from conditional random forests (see my publications and presentations for more details). For each of the three lines of comparison, we used the values obtained as input for a distance matrix and then reduced the number of dimensions to two or three with multidimensional scaling (MDS). In cases where we had three dimensions, we plotted the varieties in a three-dimensional cube. Using Beamer for our presentation slides (and the animation library in LaTeX), we created a rotating three-dimensional cube in which the distance between varieties corresponds to their probabilistic distance with regard to one of the three lines of comparison.
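To make the distance-matrix and MDS step more concrete, here is a minimal sketch in Python. It is not the code used in the project: the coefficient values are invented for illustration, the choice of Euclidean distance is an assumption, and the function names (distance_matrix, double_center, top_eigenpairs, classical_mds) are hypothetical helpers. It implements classical (Torgerson) MDS with a simple power iteration so that it needs only the standard library; in practice a statistics package would normally do this.

```python
import math

# Hypothetical coefficient estimates per variety (three constraints each).
# These numbers are invented for illustration; they are NOT project results.
coefs = {
    "BrE": [0.9, -1.2, 0.4],
    "IrE": [1.0, -1.1, 0.5],
    "HKE": [0.4, -0.6, 1.1],
    "IndE": [0.3, -0.5, 1.3],
}

def distance_matrix(vectors):
    """Pairwise Euclidean distances (an assumed metric) between vectors."""
    return [[math.dist(a, b) for b in vectors] for a in vectors]

def double_center(dist):
    """Convert distances to the centred inner-product matrix B = -1/2 J D^2 J."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals column means (symmetry)
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]

def top_eigenpairs(mat, k, iters=500):
    """Leading k eigenpairs of a symmetric matrix: power iteration + deflation."""
    n = len(mat)
    work = [row[:] for row in mat]
    pairs = []
    for comp in range(k):
        # Deterministic, non-constant start vector.
        vec = [math.sin(i + 1 + comp) for i in range(n)]
        for _ in range(iters):
            new = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in new))
            if norm < 1e-12:  # nothing left to extract
                break
            vec = [x / norm for x in new]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        # Rayleigh quotient gives the eigenvalue estimate.
        val = sum(vec[i] * work[i][j] * vec[j]
                  for i in range(n) for j in range(n))
        # Deflate: remove the component just found.
        for i in range(n):
            for j in range(n):
                work[i][j] -= val * vec[i] * vec[j]
        pairs.append((val, vec))
    return pairs

def classical_mds(dist, k=2):
    """Embed n objects in k dimensions from their distance matrix."""
    pairs = top_eigenpairs(double_center(dist), k)
    return [[math.sqrt(max(val, 0.0)) * vec[i] for val, vec in pairs]
            for i in range(len(dist))]

dist = distance_matrix(list(coefs.values()))
for name, point in zip(coefs, classical_mds(dist, k=2)):
    print(name, [round(x, 3) for x in point])
```

Calling classical_mds(dist, k=3) instead yields the three coordinates per variety that a cube plot would need. Varieties with similar coefficient profiles end up close together in the resulting space.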

Figure: Rotating 3D cube plotting nine varieties of English in probabilistic space. The three axes were labelled according to the clustering of the varieties (color-coded based on hierarchical clustering with Ward’s method).