Theory-Driven Statistics for the Digital Humanities: Presenting Pitfalls and a Practical Guide by the Example of the Reformation

Ramona Roller

Journal of Cultural Analytics (2022)


The Digital Humanities face the problem of multiple hypothesis testing: ever more hypotheses are tested until a desired pattern is found. This practice is prone to mistaking random patterns for real ones. Instead, we should reduce the number of hypothesis tests and test only meaningful ones. We address this problem by using theory to generate hypotheses for statistical models. We illustrate our approach with the example of the European Reformation, where we use a logistic regression model to test a theory on the role of opinion leaders in the adoption of Protestantism. Given our specific setting, including the choice of data and the operationalisation of variables, we do not find enough evidence to claim that opinion leaders contributed to the adoption of Protestantism via personal visits and letters. To falsify or support a theory, it has to be tested in different settings. Our approach helps the Digital Humanities bridge the gap between the qualitative and quantitative camps, advance the understanding of structures resulting from human activity, and increase scientific credibility.
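
The claim that repeated testing tends to mistake random patterns for real ones can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the article: it assumes that all tests are independent, that every null hypothesis is true, and that each test uses a significance level of 0.05. The function name `family_wise_error_rate` is a hypothetical helper introduced here.

```python
def family_wise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among `num_tests` tests.

    Assumptions (illustrative, not from the article): the tests are
    independent and every null hypothesis is true, so each test yields
    a false positive with probability `alpha`.
    """
    # P(no false positive in any test) = (1 - alpha) ** num_tests
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # The chance of a spurious "finding" grows quickly with the number of tests.
    for m in (1, 10, 50):
        print(f"{m:>3} tests: {family_wise_error_rate(m):.3f}")
```

Under these assumptions, ten tests already carry roughly a 40 percent chance of at least one spurious significant result, which is why restricting the analysis to a few theory-derived hypotheses matters.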