A Research Platform for Data-Driven Democracy Studies in Switzerland
This project is related to our research line: Online Social Networks
Duration: 24 months (2018 - 2020)
Funding source: Swiss Data Science Center (SDSC)
Both the social sciences and the humanities are currently shifting from classical research methodologies (such as surveys or close reading) towards data science techniques. However, the emerging research areas of social data science and digital humanities are still impeded by a lack of easily accessible, structured data. At the same time, large amounts of valuable records are held in archives and libraries, but often in formats that are not suitable for data-driven research. Efforts to digitize and structure these records are often undertaken in an improvised and isolated way; in other words, the wheel is reinvented for every such project.
Addressing these issues, the key goals of this project are to (i) develop a scalable and reusable data processing chain to extract structured information from archival records, (ii) apply it to a large corpus of scanned proceedings of the Swiss parliament, made available by the Swiss Federal Archive and spanning 125 years of Swiss history, and (iii) develop user-friendly, interactive data analysis and visualization tools to promote the use of the resulting data set by political scientists and the public. Working towards these three goals, the first milestone of the project will be the development of an automated method to cluster raw archival records based on document structure. Combining manual semantic annotation of training data with machine learning techniques will then make it possible to extract structured information from the scanned records, which constitutes the second milestone of the project. The third milestone will be the vectorization of text components based on neural network topic models, and the identification of named entities with the help of heuristic disambiguation techniques. This enables the construction of a knowledge graph which, in the context of the parliamentary proceedings addressed in this project, links entities such as members of parliament, political parties and parliamentary groups, committees, Swiss cantons and cities, policy topics, and legislative processes. Building on this knowledge graph, the two final milestones of the project address the development of a Web-based visualization and analysis tool, and the documentation and dissemination of our database to interested researchers and the public.
Figure: Schematic cutout of the knowledge graph, showing subsets of node and relationship types.
Blue: MPs; green: parties; red: speeches; purple: interventions. MPs are members of parties, are speakers of or mentioned in speeches, and are main- or co-sponsors of interventions; speeches refer to interventions.
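To make the structure in the figure more concrete, the following is a minimal Python sketch of how the node and relationship types named in the caption could be represented and queried. It is an illustration only, not the project's actual data model: the class names (`Node`, `KnowledgeGraph`), the relationship labels (such as `MEMBER_OF` or `REFERS_TO`) and all sample identifiers are hypothetical and chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Node types as listed in the figure caption.
NODE_TYPES = {"MP", "Party", "Speech", "Intervention"}

# Allowed (source type, relationship, target type) triples, following the
# caption. The relationship labels themselves are hypothetical names.
SCHEMA = {
    ("MP", "MEMBER_OF", "Party"),
    ("MP", "SPEAKER_OF", "Speech"),
    ("MP", "MENTIONED_IN", "Speech"),
    ("MP", "MAIN_SPONSOR_OF", "Intervention"),
    ("MP", "CO_SPONSOR_OF", "Intervention"),
    ("Speech", "REFERS_TO", "Intervention"),
}


@dataclass(frozen=True)
class Node:
    """A typed node; `key` is a placeholder identifier, not real data."""
    node_type: str
    key: str

    def __post_init__(self):
        if self.node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {self.node_type}")


class KnowledgeGraph:
    """Stores directed, typed edges and rejects edges outside SCHEMA."""

    def __init__(self):
        # Maps (source node, relationship) to the set of target nodes.
        self._out = defaultdict(set)

    def add_edge(self, source, relation, target):
        triple = (source.node_type, relation, target.node_type)
        if triple not in SCHEMA:
            raise ValueError(f"edge not allowed by schema: {triple}")
        self._out[(source, relation)].add(target)

    def neighbors(self, source, relation):
        # Return a copy so callers cannot mutate the internal state.
        return set(self._out.get((source, relation), set()))


# Hypothetical sample data: one MP, one party, one speech, one intervention.
mp = Node("MP", "mp-1")
party = Node("Party", "party-a")
speech = Node("Speech", "speech-1")
intervention = Node("Intervention", "intervention-1")

graph = KnowledgeGraph()
graph.add_edge(mp, "MEMBER_OF", party)
graph.add_edge(mp, "SPEAKER_OF", speech)
graph.add_edge(mp, "MAIN_SPONSOR_OF", intervention)
graph.add_edge(speech, "REFERS_TO", intervention)

# Two-hop query: interventions referred to in speeches given by this MP.
discussed = {
    target
    for s in graph.neighbors(mp, "SPEAKER_OF")
    for target in graph.neighbors(s, "REFERS_TO")
}
```

Checking each edge against an explicit schema keeps the graph consistent with the relationship types shown in the figure; a production system would more likely rely on a dedicated graph database, which this sketch does not attempt to model.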
The significance of the project for data science is twofold. First, a structured database of Swiss parliamentary proceedings hosted on the SDSC platform will open new avenues for data-driven research in political science, social science and digital humanities, and will provide a valuable ground truth data set, for example for opinion mining. Second, the data processing chain will be developed in such a way that it can be reused in other contexts. Examples outside political science include the processing and analysis of medical records in health sciences, the digitization of scientific articles in scientometrics, and the mining of newspaper archives in the digital humanities.