Welcome to the Chair of Systems Design
Our research can be best described as data-driven modelling of complex systems, with particular emphasis on social, socio-technical, and socio-economic systems. We are a truly interdisciplinary team of about 15 people from various disciplines (statistical physics, applied mathematics, computer science, social science, engineering). And, yes, we do all the cool stuff: from big data analysis to multilayer network models, from social software engineering to predictions of scientific success, not to forget our research on polarization in political systems, cooperation in animal societies, and life cycles of R&D networks. Just click through our publications, funded projects, teaching, or media coverage.
We welcome applications for one open PhD position in data-driven modeling of social systems. We offer excellent working conditions in a lively interdisciplinary team as well as a competitive salary.
More information on this position and how to apply is available here.
How can we quantify the significance of links in relational data?
In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method on two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to inferring significant links from relational data, with interesting perspectives for mining data on social systems.
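To give a feel for why such ensembles are analytically tractable, here is a minimal sketch of the simplest case, assumed here to have no edge propensities: each pair (i, j) gets Ξ_ij = k_out(i) · k_in(j) "edge stubs", the m observed edges are drawn without replacement from all ΣΞ stubs, and the number of edges between i and j is then hypergeometric. A link counts as significant if its observed multiplicity is unlikely under this null model. The function name `link_pvalue` and the choice to allow self-loops are illustrative assumptions; the paper's framework also handles non-uniform propensities, which this sketch omits.

```python
from math import comb


def link_pvalue(adj, i, j):
    """Upper-tail p-value of the observed edge count adj[i][j] under a
    hypergeometric ensemble with xi_ij = k_out(i) * k_in(j).

    Sketch only: no propensities, self-loops allowed.
    adj: square list of lists with non-negative integer edge counts.
    """
    n = len(adj)
    m = sum(sum(row) for row in adj)                      # total number of edges
    k_out = [sum(row) for row in adj]
    k_in = [sum(adj[r][c] for r in range(n)) for c in range(n)]
    xi = [[k_out[r] * k_in[c] for c in range(n)] for r in range(n)]
    big_m = sum(map(sum, xi))                             # all stubs: sum of xi
    x_ij = xi[i][j]                                       # stubs of pair (i, j)
    observed = adj[i][j]
    # P(X >= observed) for X ~ Hypergeometric(population big_m, successes x_ij, draws m)
    tail = sum(comb(x_ij, k) * comb(big_m - x_ij, m - k)
               for k in range(observed, min(x_ij, m) + 1))
    return tail / comb(big_m, m)
```

For instance, `link_pvalue(adj, 0, 1)` gives the probability of seeing at least `adj[0][1]` edges from 0 to 1 by chance alone; small values point to a significant link. When testing many pairs at once, a multiple-testing correction would be advisable.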
In our recent preprint, we perform a large-scale analysis of R&D networks using a data-driven modeling approach. We study how firms select partners for R&D collaborations both empirically, by analyzing a large data set of R&D alliances over 25 years, and theoretically, by using an agent-based model of alliance formation. Using the weighted k-core decomposition method, we derive a centrality-based career path for each firm. By analyzing coreness differences between firms and their partners, we identify a change in the way firms select partners.
We use the agent-based model to test whether this change in behavior can be attributed to strategic considerations, and we find that the observed behavior can be well reproduced without such considerations. In this way, we challenge the role of strategies in explaining macro patterns of collaboration.
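The weighted k-core decomposition mentioned above can be sketched as iterative pruning: at each level, nodes whose weighted degree in the remaining graph does not exceed the current threshold are removed and receive that threshold as their coreness. As an assumption, the sketch uses the geometric mean of degree k and strength s, sqrt(k · s), one weighting proposed in the literature; the preprint's exact variant may differ, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def weighted_coreness(edges):
    """Generalized core decomposition with weighted degree sqrt(k * s),
    where k = number of remaining neighbors and s = remaining strength.

    Sketch only (assumed weighting). edges: iterable of (u, v, w) with w > 0,
    treated as undirected, no self-loops. Returns node -> coreness.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    alive = set(adj)
    coreness = {}
    level = 0.0
    while alive:
        # Weighted degree of each surviving node, counting only surviving neighbors.
        wdeg = {}
        for u in alive:
            weights = [w for v, w in adj[u].items() if v in alive]
            wdeg[u] = math.sqrt(len(weights) * sum(weights))
        # The threshold never decreases; nodes at or below it are pruned.
        level = max(level, min(wdeg.values()))
        for u in [u for u in alive if wdeg[u] <= level]:
            coreness[u] = level
            alive.discard(u)
    return coreness
```

Coreness differences between a firm and its partner then follow as `core[u] - core[v]` for each alliance `(u, v)`, where `core = weighted_coreness(edges)`.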
In this paper, we propose an efficient and practical method to identify valid bug reports, i.e., reports that (a) refer to an actual software bug, (b) are not duplicates, and (c) contain enough information to be processed right away.
How big is the risk that a few initial node failures in a network amplify into large cascades? Predicting the final cascade size is critical for ensuring the functioning of the system as a whole.
To make this prediction, we often compute the average cascade size using local tree approximations or mean-field approximations. Yet, as we demonstrate in our recent work, in finite networks this average need not even be a likely outcome. Instead, we find broad and even bimodal cascade size distributions.
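The gap between an average and the full distribution is easy to explore numerically. The sketch below is an illustration, not the model from our work: it runs a Watts-type fractional threshold cascade (a node fails once at least a fraction theta of its neighbors has failed) on many finite Erdős–Rényi random graphs and records the relative final cascade size of each run. The model choice, the function names, and all parameter values are assumptions.

```python
import random
from collections import Counter


def erdos_renyi(n, p, rng):
    """Adjacency lists of an undirected G(n, p) random graph."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def cascade_size(adj, theta, seeds):
    """Final number of failed nodes; a node fails once at least a
    fraction theta of its neighbors has failed."""
    failed = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_failed = []
        for u in frontier:
            # Only neighbors of newly failed nodes can change state.
            for v in adj[u]:
                if v in failed:
                    continue
                share = sum(w in failed for w in adj[v]) / len(adj[v])
                if share >= theta:
                    failed.add(v)
                    newly_failed.append(v)
        frontier = newly_failed
    return len(failed)


def cascade_size_samples(n, p, theta, n_seeds, trials, seed=0):
    """Relative final cascade sizes over independent graph and seed draws."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(trials):
        adj = erdos_renyi(n, p, rng)
        seeds = rng.sample(range(n), n_seeds)
        sizes.append(cascade_size(adj, theta, seeds) / n)
    return sizes


sizes = cascade_size_samples(n=100, p=0.04, theta=0.18, n_seeds=2, trials=200)
print("mean relative size:", sum(sizes) / len(sizes))
print(sorted(Counter(round(s, 1) for s in sizes).items()))  # coarse histogram
```

Comparing the printed mean with the coarse histogram shows whether the mean corresponds to a typical run; depending on the parameters, the histogram can be broad or split into small and near-global cascades.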
We are proud that our KDD 2017 paper on the analysis of time-stamped and sequential network data has been covered by ETH News. Our work casts a critical light on the pervasive use of network analysis methods in various contexts, including infrastructure systems, information systems, and health. We provide a novel data mining framework that helps overcome limitations of existing network-based techniques for time series data, improving our ability to model and analyze complex systems.
The article can be found here.
We are happy to announce that our work "When is a network a network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks" has been accepted for publication as a research paper at KDD'17. A short promotional video is available on the KDD YouTube channel.
How do economic actors or scientists choose their collaboration partners? On the one hand, one could argue that scientists as decision makers are quite different from firms. On the other hand, to reproduce macroscopic structures such as a collaboration network, we may not need to include all the microscopic details that distinguish economic from social agents.
In our article, we adopt a data-driven modeling approach to calibrate and validate a previously proposed agent-based model that abstracts from these microscopic details to capture only the essential features of the decision-making process. The model is characterized by five parameters that relate to the strategies adopted by economic actors or scientists when choosing their collaboration partners. Our results shed new light on the long-standing question about the role of endogenous and exogenous factors in the formation of collaboration networks.
We have recently developed a new model to reproduce the size distribution of R&D alliances among firms. Our model can be used not only for agent-based simulations; it is also analytically tractable. In addition, we have tested it against a data set listing 15,000 firms engaging in 15,000 R&D alliances over 26 years. Interested? Then take a look at our paper.
The "discovery" of regularities and correlations in big data cannot replace the scientific clarification of the hidden causal effects. Hence, to make real use of big data, social scientists are indispensable for making computational science a social one.
Read more about this on the ETH Zukunftsblog.
Ever wondered how to apply regression to multiplex networks? In our preprint, we introduce a new statistical method to investigate the impact of dyadic relations on complex networks generated from repeated interactions. The method is based on generalised hypergeometric ensembles (gHypEs), a class of statistical network ensembles we have developed recently.
We represent different types of known relations between system elements as weighted graphs, each forming one layer of a multiplex network. With our method, we can regress the interaction counts (the dependent variables) on the relational layers (the independent variables) and thus estimate the influence of each layer. Moreover, we can test the statistical significance of the relations as explanatory variables for the observed interactions.
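The preprint fits the gHypE likelihood itself. As a simpler, swapped-in stand-in that conveys the regression idea, the sketch below fits a Poisson log-linear model, which corresponds to a multinomial approximation of the ensemble (edges sampled with replacement). It assumes that the combinatorial weights Ξ_ij enter as an offset and that the layers act multiplicatively on the edge propensities, Ω_ij = Π_l (R^l_ij)^{β_l}, so that each layer contributes a term β_l · log R^l_ij. The function name and this functional form are assumptions, not the preprint's implementation.

```python
import numpy as np


def multiplex_poisson_regression(counts, xi, layers, n_iter=50, tol=1e-10):
    """Poisson log-linear regression of interaction counts on relational layers.

    Sketch only (approximation, not the gHypE likelihood).
    counts: (n, n) array of observed interaction counts A_ij
    xi:     (n, n) array of positive combinatorial weights, e.g. k_out_i * k_in_j
    layers: list of (n, n) arrays with strictly positive relation weights R^l
    Returns (coefficients, standard errors); index 0 is the intercept.
    """
    n = counts.shape[0]
    mask = ~np.eye(n, dtype=bool)                   # ignore self-loops
    y = counts[mask].astype(float)
    offset = np.log(xi[mask])
    X = np.column_stack([np.ones(y.size)] + [np.log(R[mask]) for R in layers])
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.sum() / np.exp(offset).sum())  # start with matched totals
    for _ in range(n_iter):
        mu = np.exp(offset + X @ beta)
        info = X.T @ (mu[:, None] * X)              # Fisher information X^T W X
        step = np.linalg.solve(info, X.T @ (y - mu))  # Newton-Raphson update
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(offset + X @ beta)
    cov = np.linalg.inv(X.T @ (mu[:, None] * X))
    return beta, np.sqrt(np.diag(cov))
```

The ratio `beta[l] / se[l]` gives a Wald z-value per layer, a rough analogue of the significance test described above; the gHypE-based test in the preprint accounts for the ensemble's combinatorics exactly.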
Graph- and network-analytic methods are widely applied to data that capture relations between elements. Despite this popularity, we still lack principled methods to decide when network abstractions are justified and when they are not.
A new data mining framework developed at our chair can be used to decide when it is justified to build a network abstraction of sequential data on pathways and temporal networks. Building on principled model selection and statistical inference techniques, it further allows us to infer optimal higher-order network models, which capture both temporal and topological characteristics of sequential data.
The methods proposed in this work have been implemented in the open-source Python package pathpy, which is available on GitHub.
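pathpy implements the full multi-order model selection. Purely as a self-contained illustration of the underlying idea, and not pathpy's API, the sketch below tests whether a second-order Markov model explains observed paths significantly better than a first-order (network) model, using a likelihood ratio test. Both models are fitted by maximum likelihood to the same transitions, and the extra degrees of freedom are counted on the first-order topology, in the spirit of, but not necessarily identical to, the KDD'17 paper. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def transition_log_likelihood(paths, order):
    """MLE log-likelihood of every transition p[i-1] -> p[i] with i >= 2,
    conditioned on the last `order` nodes (order 1 or 2), so that both
    models are evaluated on exactly the same transitions."""
    counts = defaultdict(Counter)
    for p in paths:
        for i in range(2, len(p)):
            counts[tuple(p[i - order:i])][p[i]] += 1
    ll = 0.0
    for successors in counts.values():
        total = sum(successors.values())
        ll += sum(c * math.log(c / total) for c in successors.values())
    return ll


def extra_dof(paths):
    """Extra free parameters of the second-order model, counted on the
    first-order topology: sum over nodes v of (indeg(v)-1) * (outdeg(v)-1)."""
    edges = {(p[i], p[i + 1]) for p in paths for i in range(len(p) - 1)}
    indeg, outdeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return sum((indeg[v] - 1) * (outdeg[v] - 1) for v in indeg if outdeg[v] > 0)


def chi2_sf(x, k):
    """Chi-square survival function (k > 0 dof), via the series of the
    regularized lower incomplete gamma function P(k/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = k / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1))  # n = 0 term
    if term == 0.0:                      # underflow: deep in one of the tails
        return 0.0 if z > a else 1.0
    total, n = term, 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return min(1.0, max(0.0, 1.0 - total))


def second_order_test(paths):
    """Likelihood ratio test: second-order vs. first-order Markov model."""
    stat = 2.0 * (transition_log_likelihood(paths, 2)
                  - transition_log_likelihood(paths, 1))
    dof = extra_dof(paths)
    p_value = chi2_sf(stat, dof) if dof > 0 else 1.0
    return stat, dof, p_value


# Paths through c "remember" where they came from: a -> c -> d and b -> c -> e.
paths = [list("acd"), list("bce")] * 20
stat, dof, p = second_order_test(paths)
print(f"LR statistic {stat:.1f}, dof {dof}, p = {p:.2g}")
```

In this toy example, a first-order network model can only say that c leads to d or e with equal probability, while the second-order model knows which one follows; a small p-value indicates that the plain network abstraction discards relevant information.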