gambit - An Open Source Name Disambiguation Tool for Version Control Systems

Authors: Christoph Gote and Christian Zingg

2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) (2021)


Abstract

Name disambiguation is a complex but highly relevant challenge when analysing real-world user data, such as data from version control systems. We propose gambit, a rule-based disambiguation tool that relies only on name and email information. We evaluate its performance against two commonly used algorithms with similar characteristics on manually disambiguated ground-truth data from the Gnome GTK project. Our results show that gambit significantly outperforms both algorithms, achieving an F1 score of 0.985.
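The abstract does not spell out gambit's rules, so what follows is only a minimal Python sketch of what rule-based disambiguation on name and email data can look like. It also computes a pairwise F1 score of the kind commonly used to compare such algorithms against ground truth. The normalization steps, the three matching rules, the `min_len` threshold, and all function names (`normalize`, `email_base`, `same_author`, `disambiguate`, `pairwise_f1`) are hypothetical illustrations, not gambit's actual rule set or the paper's evaluation protocol. The example aliases are invented.

```python
import itertools
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and drop non-alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def email_base(email: str) -> str:
    """Normalized local part of an email address (the text before '@')."""
    return normalize(email.split("@", 1)[0])


def same_author(a: tuple[str, str], b: tuple[str, str], min_len: int = 3) -> bool:
    """Hypothetical matching rules for (name, email) aliases; NOT gambit's rules."""
    name_a, base_a = normalize(a[0]), email_base(a[1])
    name_b, base_b = normalize(b[0]), email_base(b[1])
    # Rule 1: a's name equals b's name or b's email local part (e.g. "jdoe").
    if len(name_a) >= min_len and name_a in (name_b, base_b):
        return True
    # Rule 2: b's name equals a's email local part.
    if len(name_b) >= min_len and name_b == base_a:
        return True
    # Rule 3: identical email local parts; min_len guards very short ones.
    return len(base_a) >= min_len and base_a == base_b


def disambiguate(aliases: list[tuple[str, str]]) -> list[int]:
    """Merge matching aliases into author IDs using union-find."""
    parent = list(range(len(aliases)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(len(aliases)), 2):
        if same_author(aliases[i], aliases[j]):
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(aliases))]


def pairwise_f1(predicted: list[int], truth: list[int]) -> float:
    """F1 over alias pairs: a pair is positive if both aliases share an ID."""
    tp = fp = fn = 0
    for i, j in itertools.combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        tp += same_pred and same_true
        fp += same_pred and not same_true
        fn += same_true and not same_pred
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example: three aliases of one developer, one of another.
aliases = [
    ("Jane Doe", "jane.doe@example.org"),
    ("jane doe", "jdoe@users.example.com"),
    ("jdoe", "jdoe@gnome.example"),
    ("John Smith", "john@example.org"),
]
truth = [0, 0, 0, 1]
predicted = disambiguate(aliases)
print(predicted, pairwise_f1(predicted, truth))
```

Here the first two aliases merge on the normalized name and the second and third on the shared email local part, so union-find links all three transitively. Scoring over pairs rather than individual aliases means that a single wrong merge of two large clusters is penalized heavily, which is one reason pairwise precision and recall are a common choice for this task.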