Quantifying and suppressing ranking bias

26 March 2017

Every day, scholars and online users explore available knowledge through recommender systems based on ranking algorithms. This challenges us to design more sophisticated filtering and ranking procedures that avoid biases which can systematically hide relevant content.

In this work, we tackle this issue by quantifying and suppressing biases of indicators of scientific impact. We use a large citation dataset from Microsoft Academic Graph and a new statistical framework based on the Mahalanobis distance to show that the rankings by well-known indicators, including relative citation count and Google's PageRank score, are significantly biased by paper field and age. We propose a general normalization procedure motivated by the z-score which produces much less biased rankings when applied to citation count and PageRank score.
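To illustrate the idea behind the z-score-motivated normalization, here is a minimal sketch (not the authors' actual code): each paper's raw indicator is rescaled by the mean and standard deviation of the papers in its own field-and-age group, so that papers are ranked relative to their peers rather than across fields of very different citation cultures. The field labels, publication years, and citation counts below are purely illustrative.

```python
import numpy as np

def zscore_normalize(scores, groups):
    """Normalize each score by the mean and std of its (field, age) group.

    Illustrative sketch: groups is a list of hashable labels, one per paper,
    e.g. (field, publication_year) tuples.
    """
    scores = np.asarray(scores, dtype=float)
    normalized = np.empty_like(scores)
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        vals = scores[idx]
        mu, sigma = vals.mean(), vals.std()
        # Degenerate groups (zero variance) are mapped to 0.
        normalized[idx] = (vals - mu) / sigma if sigma > 0 else 0.0
    return normalized

# Toy example: two fields with very different typical citation counts.
citations = [10, 20, 30, 100, 200, 300]
groups = [("bio", 2015)] * 3 + [("phys", 2015)] * 3
z = zscore_normalize(citations, groups)
```

After normalization, a paper that is one standard deviation above its group's mean gets the same score regardless of field, which is what removes the field and age bias from the resulting ranking.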


We provide a simple and quick tutorial on how we quantify ranking bias. The source code and tutorial can be found in this GitHub project. Enjoy!