Categorizing bugs with social networks: A case study on four open source software communities
Authors: Marcelo Serrano Zanetti, Ingo Scholtes, Claudio Juan Tessone and Frank Schweitzer
ICSE '13 Proceedings of the 35th International Conference on Software Engineering (2013)
Efficient bug triaging procedures are an important precondition for successful collaborative software engineering projects. Triaging bugs can become a laborious task, particularly in open source software (OSS) projects with a large base of comparatively inexperienced part-time contributors. In this paper, we propose an efficient and practical method to identify valid bug reports which a) refer to an actual software bug, b) are not duplicates and c) contain enough information to be processed right away. Our classification is based on nine measures to quantify the social embeddedness of bug reporters in the collaboration network. We demonstrate its applicability in a case study, using a comprehensive data set of more than 700,000 bug reports obtained from the BUGZILLA installation of four major OSS communities, for a period of more than ten years. For those projects that exhibit the lowest fraction of valid bug reports, we find that the bug reporters’ position in the collaboration network is a strong indicator for the quality of bug reports. Based on this finding, we develop an automated classification scheme that can easily be integrated into bug tracking platforms and analyze its performance in the considered OSS communities. A support vector machine (SVM) to identify valid bug reports based on the nine measures yields a precision of up to 90.3% with an associated recall of 38.9%. With this, we significantly improve the results obtained in previous case studies for an automated early identification of bugs that are eventually fixed. Furthermore, our study highlights the potential of using quantitative measures of social organization in collaborative software engineering. It also opens a broad perspective for the integration of social network analysis in the design of support infrastructures.
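To make the idea of scoring a reporter's position in the collaboration network more concrete, the following is a minimal sketch in Python, not the authors' implementation. The abstract does not name the nine measures, so degree and local clustering coefficient are used here purely as illustrative assumptions. The function names, the input format (pairs of user names) and the sample data are hypothetical, and the SVM step itself is omitted. The last function computes precision and recall from their standard definitions, which is how figures such as 90.3% and 38.9% are read: most reports flagged as valid really are valid, while many valid reports are not flagged.

```python
from collections import defaultdict


def build_graph(interactions):
    """Build an undirected collaboration graph from (user_a, user_b) pairs.

    Assumption: each pair stands for some interaction between two users
    on a bug report; the abstract does not specify how links are defined.
    """
    adj = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree(adj, node):
    """Number of distinct collaborators of a node (0 if unknown)."""
    return len(adj.get(node, ()))


def clustering(adj, node):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    nbrs = list(adj.get(node, ()))
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (truthy = valid bug report)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical interactions between four users.
    sample = [("alice", "bob"), ("bob", "carol"),
              ("alice", "carol"), ("carol", "dave")]
    graph = build_graph(sample)
    for user in sorted(graph):
        print(user, degree(graph, user), round(clustering(graph, user), 3))
```

In a setup like the one the abstract describes, per-reporter values of this kind would form the feature vector passed to a classifier, and the predicted labels would then be scored with `precision_recall`.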