Quantifying ranking bias

A quick example on a synthetic network

We have recently developed a statistical framework to quantify biases in rankings [1]. The framework rests on two simple ideas: first, we define an unbiased selection process; second, we use the Mahalanobis distance [2] to compare the observed rankings against what the unbiased selection process would produce. This tutorial presents the framework and gives an intuition for how it works.

We first grow a synthetic network using the linear preferential attachment rule [3]. We choose this rule because it is one of the simplest that makes the final degree of a node strongly correlated with its age: the older the node, the higher its degree (the age of a node is defined by the time at which it entered the network). We then rank the nodes by degree and by PageRank score, and use our framework to show that the top 10% of each ranking is biased by the age of the nodes.

  1. G. Vaccario, M. Medo, N. Wider, M.S. Mariani: Quantifying and suppressing ranking bias in a large citation network [Journal of Informetrics]
  2. Mahalanobis distance
  3. Barabasi-Albert model
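
To make the two ideas concrete, here is a minimal sketch that does not use the quantbias or mhd modules. It assumes that the unbiased selection process draws the top 10% uniformly at random without replacement, groups 1000 nodes into 10 age bins, estimates the mean and covariance of the bin counts by simulation, and then measures the Mahalanobis distance of two selections from that expectation. The number of bins, the number of simulated draws and all names below are illustrative assumptions, not the framework's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes, n_bins = 1000, 10            # assumed: 10 age groups of 100 nodes each
top_size = n_nodes // 10              # size of the top-10% selection
age_bin = np.arange(n_nodes) // (n_nodes // n_bins)   # bin 0 holds the oldest nodes

def bin_counts(selected):
    """Number of selected nodes that fall into each age bin."""
    return np.bincount(age_bin[selected], minlength=n_bins)

# Assumed unbiased selection: draw the top set uniformly without replacement.
samples = np.array([bin_counts(rng.choice(n_nodes, top_size, replace=False))
                    for _ in range(2000)])

# The counts always sum to top_size, so one bin is redundant; dropping the
# last bin keeps the covariance matrix invertible.
mean = samples[:, :-1].mean(axis=0)
cov = np.cov(samples[:, :-1], rowvar=False)

def mahalanobis(counts):
    """Mahalanobis distance of a bin-count vector from the unbiased expectation."""
    diff = counts[:-1] - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

unbiased_pick = rng.choice(n_nodes, top_size, replace=False)
biased_pick = np.arange(top_size)     # only the oldest nodes: an extreme age bias

d_unbiased = mahalanobis(bin_counts(unbiased_pick))
d_biased = mahalanobis(bin_counts(biased_pick))
print(f"unbiased selection: {d_unbiased:.2f}   age-biased selection: {d_biased:.2f}")
```

A selection that looks like a typical unbiased draw stays close to the expectation, while a selection concentrated in one age bin lies far from it. That gap is what the distance is meant to capture.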
In [1]:
import numpy as np
import igraph
import matplotlib
import matplotlib.pyplot as plt

from quantbias import *
from mhd import *
from IPython.display import *  # the wildcard import already provides HTML

The synthetic network

We grow a directed scale-free network using the generators pre-defined by igraph. In particular, we use the linear preferential attachment rule as the growth mechanism.

In [2]:
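no_of_nodes = 1000
no_of_links = 4  # outgoing links created by each new node
cit_net = igraph.Graph.Barabasi(no_of_nodes, no_of_links, outpref=False, directed=True, power=1, zero_appeal=1)
print(cit_net.summary())  # one-line summary: node and edge counts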
IGRAPH D--- 1000 3990 -- 
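
To see why this growth rule ties degree to age, the following sketch grows a small network in pure Python. It is a simplified stand-in for the igraph generator, not a reimplementation of it: each new node picks up to m existing nodes with probability proportional to their in-degree plus zero_appeal, and repeated picks are merged into a single link. The function name grow_network and the choice to compare the oldest and youngest 10% of nodes are illustrative assumptions.

```python
import random

def grow_network(n_nodes, m, zero_appeal=1, seed=0):
    """Grow a directed network by linear preferential attachment.

    Returns the in-degree of every node; the list index is the arrival
    order, so index 0 is the oldest node.
    """
    rng = random.Random(seed)
    in_degree = [0]                       # start from a single node
    for new in range(1, n_nodes):
        # Attachment weight of each existing node: in-degree + zero_appeal.
        weights = [k + zero_appeal for k in in_degree]
        # Sampling is with replacement; the set merges repeated targets.
        targets = set(rng.choices(range(new), weights=weights, k=m))
        for t in targets:
            in_degree[t] += 1
        in_degree.append(0)               # the new node starts with no incoming links
    return in_degree

in_degree = grow_network(1000, 4)
oldest = sum(in_degree[:100]) / 100       # mean in-degree of the oldest 10%
youngest = sum(in_degree[-100:]) / 100    # mean in-degree of the youngest 10%
print(f"mean in-degree, oldest 10%: {oldest:.2f}, youngest 10%: {youngest:.2f}")
```

Older nodes have had more time to collect links, and every link they collect makes them more likely to be chosen again, so a ranking by in-degree favors them.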

To visualize the network we use the plotting features of igraph. First, we compute the node positions with the Fruchterman-Reingold layout.

In [3]:
layout = cit_net.layout('fr')

We scale the size of the nodes according to their in-degree

In [4]:
visual_style = {}  # plotting options for igraph.plot; not defined in the cells shown here
visual_style["vertex_size"] = [4 + 2*np.log(k + 1) for k in cit_net.degree(mode="IN")]

and color the nodes from light (older nodes) to dark (younger nodes) purple.

In [5]:
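min_light = 0.2
palette = [plt.cm.Purples]
visual_style["vertex_color"] = []  # must exist before colors are appended
for i in range(no_of_nodes):
    # node index i is the arrival order: older nodes (small i) get a lighter color
    color = palette[0](min_light + (1 - min_light)*i/no_of_nodes)
    visual_style["vertex_color"].append("rgb" + str(tuple(x*255 for x in color[:3])))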

Finally, we shrink the edge arrows and set the size of the canvas in order to make the image of the network more presentable.

In [6]:
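visual_style["edge_arrow_size"] = 0.2
visual_style["bbox"] = (900, 900)
visual_style["layout"] = layout  # use the Fruchterman-Reingold layout computed above
# filename (the output path) is not defined in the cells shown here and must be set beforehand
igraph.plot(cit_net, filename, **visual_style)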