Higher-order analysis of temporal network data

Higher-order analysis of temporal network data

An educational tutorial using the free python module pyTempNets

Ingo Scholtes
Chair of Systems Design
ETH Zürich

May 2015 (Updated: August 2015)

In this tutorial, I will introduce some basic concepts of the analysis of temporal network data using higher-order networks, which have been introduced in the following recent publications:

  • I Scholtes, N Wider, A Garas: Higher-Order Aggregate Networks in the Analysis of Temporal Networks: Path Structures and centralities, [1508.06467], Aug. 2015
  • I Scholtes, N Wider, R Pfitzner, A Garas, CJ Tessone, F Schweitzer: Causality-driven slow-down and speed-up of diffusion in non-Markovian temporal neworks, Nature Communications, 5, Sept. 2014
  • R Pfitzner, I Scholtes, A Garas, CJ Tessone, F Schweitzer: Betweenness preference: Quantifying correlations in the topological dynamics of temporal networks, Phys Rev Lett, 110(19), 198701, May 2013

A key point of these works is that we highlight the importance of non-Markovian characteristics in time-stamped relational data. In a nutshell, we show that the ordering of links in time-stamped networks matters. We further introduce higher-order networks, a generalization of the common network perspective that allows to study the influence of order correlations in temporal networks. This new perspective on time-stamped network data provides interesting new opportunities not only to better understand dynamical processes on temporal networks, but also for the definition of novel measures for centrality or new community detection algorithms that take into account bot the topological and temporal dimension of complex systems.

In the following, I illustrate some basic concepts of non-Markovian characteristics in temporal networks. In particular, I will showcase the use of pyTempNet, a free python module which simplifies the analysis and visualization of time-stamped relational data, as well as the simulation of dynamical processes on top of temporal networks. All of the methods and concepts introduced in the publications listed above are fully implemented in pyTempNet thus making it particularly easy to apply the higher-order network framework to your data.

Installation of pyTempNets is easy. You will need a working installation of python3 properly set up. You will also have to have the modules numpy, scipy, matplotlib and igraph installed. This can be done easily by installing the Anaconda python 3.4 distribution, which you can get from here. Once you have installed Anaconda, you will just need to additionally set up igraph by invoking the following command:

> pip install python-igraph

You are then ready to install pyTempNets directly from gitHub by typing:

> pip install git+git://github.com/IngoScholtes/pyTempNets.git

After this, you should be ready to run this notebook.

Let us start with a new ipython notebook and let us import some basic modules required in the following tutorial. We first import the basic pyTempNet module as follows:

In [1]:
import pyTempNet as tn

Since time-aggregated networks are internally represented as igraph objects (which we can plot and analyze using standard igraph functions) we will also need to include igraph.

In [2]:
import igraph

Finally, we will want to plot some figures and do some basic calculations using numpy and matplotlib.

In [3]:
import numpy as np
import matplotlib.pyplot as plt

You won't have to care too much about the following imports and definitions, as they will simply allow us to directly embed igraph plots, Latex output and videos into the output of an ipython notebook.

In [4]:
from IPython.display import *
from IPython.display import HTML
import base64

from wand.image import Image

from subprocess import call

def showVideo(filename):
    VIDEO_TAG = """<video controls>
 <source src="data:video/x-m4v;base64,{0}" type="video/mp4">
 Your browser does not support the video tag.
</video>"""
    bytes = open(filename, "rb").read()
    return HTML(VIDEO_TAG.format(base64.b64encode(bytes).decode('utf-8')))

def compileAndShowTex(file):
    call("cd " + file.split('/')[0] + " && pdflatex " + file.split('/')[1], shell=True)
    with Image(filename=file.replace('.tex', '.pdf'), resolution=200) as img:
        img.resize(400, 400)
        img.save(filename=file.replace('.tex', '.png'))
        display(Image(filename=file.replace('.tex', '.png')))

Two simple examples

Let us introduce some basic concepts using two simple examples for temporal networks. We first create an empty temporal network object, which we can then use to add time-stamped links one by one. In each call, we pass the source and target node, as well as an integer time stamp indicating when the link was present.

In [5]:
t1 = tn.TemporalNetwork()
t1.addEdge('a', 'c', 1)
t1.addEdge('c', 'e', 2)

t1.addEdge('b', 'c', 3)
t1.addEdge('c', 'd', 4)

t1.addEdge('b', 'c', 5)
t1.addEdge('c', 'e', 6)

t1.addEdge('a', 'c', 7)
t1.addEdge('c', 'd', 8)

A nice way to visualize such networks are so-called time-unfolded notations, in which we unfold time into a spatial (vertical) dimension. pyTempNet allows you to directly generate tikz code for such a representation. We just call the following function:

In [6]:
tn.Visualizer.exportTikzUnfolded(t1, 'tutorial/t1.tex')

This will produce a LaTeX file which we can simply compile to obtain a PDF figure (this comes in handy for illustrative figures in manuscripts). We can either manually compile the file and convert the resulting PDF figure to a PNG, or we can do this automatically using our little helper function (which requires imagemagick and ghostscript as well as imagemagick's python interface Wand to be installed on your system).

With this helper function, we can easily display the resulting figure in ipython as follows:

In [7]:
compileAndShowTex('tutorial/t1.tex')

In this representation, each of the five nodes is represented by multiple temporal copies. Two subsequent temporal copies are connected whenever a link existed at the respective time stamp. This representation makes it easy to study so-called time-respecting paths.

Apart from time-unfolded representations, pyTempNet can also easily be used to produce videos of temporal networks. For this we can simply export a sequence of PNG movie frames using the following built-in function:

In [8]:
tn.Visualizer.exportMovie(t1,'tutorial\\video.mp4', fps=1)
2015-08-19 12:58:23 [Severity.INFO]	Extracting two-paths for delta = 1...
2015-08-19 12:58:23 [Severity.INFO]	finished.
2015-08-19 12:58:23 [Severity.INFO]	Constructing first-order aggregate network ...
2015-08-19 12:58:23 [Severity.INFO]	finished.
2015-08-19 12:58:23 [Severity.WARNING]	No visual style specified, setting to defaults
2015-08-19 12:58:23 [Severity.INFO]	Encoding video ...
2015-08-19 12:58:24 [Severity.INFO]	finished.

This will internally generate a sequence of numbered frames within a subdirectory called frames. It will then automatically convert the resulting frames to an mp4 video using - for instance - the tool imagemagick. For this, make sure to install imagemagick for your platform.

We can now show the resulting video of the temporal network using the helper function defined above. Just click the play button.

In [9]:
showVideo('tutorial/video.mp4')
Out[9]:

Let us now study the extraction of so-called time-respecting paths in temporal networks. This crucial concept is the temporal equivalent to paths in static network. Different from static networks, links on a time-respecting path have to respect causality, i.e. in order for a path (u,v,t_1) -> (v,w,t_2) of two time-stamped links to connect nodes u and w, the first link has to be present before the second link, i.e. t_1<t_2. pyTempNet can automatically extract time-respecting paths of length two for you. Sometimes, you may also want to include a limit in the waiting time for time-respecting paths. I.e. rather than simply requiring t_1<t_2, you may want to impose that t_2-t_1 <= delta for some maximum waiting time delta.

The pyTempnet function extractTwoPaths can be used to extract time-respecting paths of length two. If no delta is specified, delta = 1 is assumed, i.e. only consecutive links will be considered to contribute to time-respecting paths.

Note that we can specify an arbitrary maximum time difference delta by using the function setMaxTimeDiff.

In [10]:
t1.setMaxTimeDiff(delta=1)
t1.extractTwoPaths()
2015-08-19 12:58:24 [Severity.INFO]	Extracting two-paths for delta = 1...
2015-08-19 12:58:24 [Severity.INFO]	finished.

Depending on your data set, this function may require a few seconds (if you have several hundred thousand or even Millions of time-stamped links). Once it has been completed, we can start analyzing causality structures and non-Markovian characteristics in the temporal network. If you omit the explicit call above, the call will automatically be made whenever time-respecting paths are first needed.

Based on the statistics of time-respecting paths, we can calculate betweenness preference, a measure that captures to what extent nodes mediate time-respecting paths between particular pairs of their neighbours in the static network. Being an information-theoretic measure, it can be interpreted in terms of the number of bits of information that are lost, when we aggregate all time-stamped links around a node (thus obtaining a time-aggregated network which misses the information on the ordering of links). For details on the definition and interpretation of the measure, I have to refer to the publications mentioned above.

With pyTempNet we can calculate the betweenness preference of a node using the following function. In this particular case, the betweenness preference of node c should be zero, as there is no preference for node c to mediate time-respecting paths between any particular pair of nodes (see also temporal network above).

In [11]:
print('I^b(S;D) = ', tn.Measures.BetweennessPreference(t1,'c'))
I^b(S;D) =  0.0

We can now calculate different time-aggregated representations of our temporal network. The simplest and most commonly used one is the plain time-aggregated network, in which all time-stamps are discarded, while link weights indicate the number or frequency of interactions between nodes.

We can obtain an igraph object representing the (first-order) aggregate network as follows. This igraph object can be plotted using basic igraph functions.

In [12]:
# We can compute and plot the first-order aggregate network
g1 = t1.igraphFirstOrder()

visual_style = {}
visual_style["bbox"] = (600, 400)
visual_style["margin"] = 60
visual_style["vertex_size"] = 80
visual_style["vertex_label_size"] = 24
visual_style["vertex_color"] = "lightblue"
visual_style["edge_curved"] = 0.2
visual_style["edge_width"] = 1
visual_style["edge_arrow_size"] = 2

visual_style["layout"] = g1.layout_auto()
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_label"] = g1.es["weight"]

igraph.plot(g1, 'tutorial/t1_G1.png', **visual_style)
display(Image(filename='tutorial/t1_G1.png'))

In the Nature Communications and EPJ B articles listed above, we introduced so-called higher-order networks which not only capture the frequency but also the ordering of time-stamped links. The idea is that each link in the time-stamped network is represented by a node in the second-order network, while each time-respecting path of length two is represented by a link. Again, details on this abstraction can be found in the papers above.

Here, we can compute (and plot) the second-order network for the temporal network above as follows:

In [13]:
g2 = t1.igraphSecondOrder()

visual_style["layout"] = g2.layout_auto()
visual_style["vertex_label"] = g2.vs["name"]
visual_style["edge_label"] = g2.es["weight"]
igraph.plot(g2, 'tutorial/t1_G2.png', **visual_style)
display(Image(filename='tutorial/t1_G2.png'))
2015-08-19 12:58:24 [Severity.INFO]	Constructing second-order aggregate network ...
2015-08-19 12:58:24 [Severity.INFO]	finished.

In the example above, we see that a total of four time-respecting paths of length two (represented by the four links) exist in the temporal network, each of them occurring exactly one time (thus the link weights of one).

Let us now see what happens if we simply flip the order of two time-stamped links in the temporal network. We do this by defining the following new temporal network (which is identical to the previous one except for one reordering).

In [14]:
t2 = tn.TemporalNetwork()
t2.addEdge('a', 'c', 1)
t2.addEdge('c', 'e', 2)

t2.addEdge('b', 'c', 3)
t2.addEdge('c', 'd', 4)

t2.addEdge('b', 'c', 5)
t2.addEdge('c', 'd', 6)

t2.addEdge('a', 'c', 7)
t2.addEdge('c', 'e', 8)

Again, we can produce a tikz figure showing the time-unfolded representation of this temporal network (which I again manually compiled and converted to a PNG file).

In [15]:
tn.Visualizer.exportTikzUnfolded(t2, 'tutorial/t2.tex')
compileAndShowTex('tutorial/t2.tex')

We now again extract all time-respecting paths of length two and then visualize the first-order network. Not surprisingly, this is exactly the same as before, since we only changed the ordering of time-stamped links.

In [16]:
t2.setMaxTimeDiff(delta=1)
t2.extractTwoPaths()

g1 = t2.igraphFirstOrder()

visual_style["layout"] = g1.layout_auto()
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_label"] = g1.es["weight"]
igraph.plot(g1, 't2_G1.png', **visual_style)
display(Image(filename='t2_G1.png'))
2015-08-19 12:58:26 [Severity.INFO]	Extracting two-paths for delta = 1...
2015-08-19 12:58:26 [Severity.INFO]	finished.
2015-08-19 12:58:26 [Severity.INFO]	Constructing first-order aggregate network ...
2015-08-19 12:58:26 [Severity.INFO]	finished.

As we will see now, there is an important change in terms of causality structures. Now, node c has a preference to mediate time-respecting paths between the pairs of nodes a and e as well as b and d respectively. Furthermore, knowing the "source" of a time-respecting path through c completely determines the target (source a determining target e and source b determining target d). Our uncertainity of two equally likely choices is reduced to one, which corresponds to a betweenness preference of one bit.

In [17]:
print('I^b(S;D) = ', tn.Measures.BetweennessPreference(t2,'c'))
I^b(S;D) =  1.0

We can also output the corresponding (unnormalized) betweenness preference matrix (again see paper for details). The entries provide us with the number of different time-respecting paths through node c. The first row corresponds to node a, the second row corresponds to node b, the first column corresponds to node e, the second column corresponds to node d. Here, the entries reveal that there are two time-respecting paths a -> c -> e (first row) and two time-respecting paths b -> c -> d (second row). Time-respecting paths a -> c -> d and b -> c -> e (off-diagonal zero entries) are absent.

In [18]:
print('I^b(S;D) = ', tn.Utilities.BWPrefMatrix(t2,'c'))
I^b(S;D) =  [[ 2.  0.]
 [ 0.  2.]]

The changes in the statistics of time-respecting paths that are due to the reordering of the time-stamped edges is captured by the fact that the second-order network is different from the one before (even though the first-order network is the same!).

In [19]:
g2 = t2.igraphSecondOrder()

visual_style["layout"] = g2.layout_auto()
visual_style["vertex_label"] = g2.vs["name"]
visual_style["edge_label"] = g2.es["weight"]
igraph.plot(g2, 'tutorial/t2_G2.png', **visual_style)
display(Image(filename='tutorial/t2_G2.png'))
2015-08-19 12:58:26 [Severity.INFO]	Constructing second-order aggregate network ...
2015-08-19 12:58:26 [Severity.INFO]	finished.

In the second-order network above, we see that now each of the two time-respecting paths (a,c) -> (c,e) and (b,c) -> (c,d)occurs twice, while the theoretically possible time-respecting paths (a,c) -> (c,d) and (b,c) -> (c,e) (which we would actually expect to occur with the same probability than the others) are absent!

You see that this change in the causal structures of the temporal network are exclusively driven by the ordering of time-stamped edges and it is these phenomena that we will now study in more realistic scenarios.

Reading from files: A larger synthetic example

We now consider a larger (synthetic) example which we read from a TEDGE file, a simple file format which contains a list of time-stamped links. TEDGE files have the format

`source target time
a b 4
b c 7
u v 3
u v 3
...`

The separation character (here: space) can be arbitrary and can be specified upon reading. Furthermore, the ordering of columns can be arbitrary and thus a header line indicating which column refers to source, target and timestamp of an edge is required. As you can see in the example above, time-stamped edges do not necessarily have to be ordered according to time. In addition to simple integer time stamps, actual (arbitrary) string timestamps like 2015-05-23 09:23:22 can be used. In this case a format string indicating the meaning of the timestamp has to be included.

Here, we use a simple example of integer time stamps, reading a file of time-stamped edges that were synthetically generated to include order correlations.

In [20]:
t = tn.readFile('data/tutorial/example.tedges', sep=' ')
2015-08-19 12:58:26 [Severity.INFO]	Reading time-stamped links ...
2015-08-19 12:58:26 [Severity.INFO]	finished.
2015-08-19 12:58:26 [Severity.INFO]	Building index data structures ...
2015-08-19 12:58:26 [Severity.INFO]	finished.
2015-08-19 12:58:26 [Severity.INFO]	Sorting time stamps ...
2015-08-19 12:58:26 [Severity.INFO]	finished.

We can actually print a summary of its basic properties by using the Summary() function.

In [21]:
print(t.Summary())
Nodes:			30
Time-stamped links:	13200
Links/Nodes:		440.0
Observation period:	[0, 13199]
Observation length:	13199
Time stamps:		13200
Avg. inter-event dt:	1.0
Min/Max inter-event dt:	1/1
Max Time Diff (delta):	1
Two-paths:		not calculated
First-order network:	not constructed
Second-order network:	not constructed

Let us now extract time-respecting paths of length two and visualize the first-order network.

In [22]:
t.setMaxTimeDiff(delta=1)
t.extractTwoPaths()

g1 = t.igraphFirstOrder()

visual_style = {}
visual_style["bbox"] = (600, 600)
visual_style["margin"] = 60
visual_style["edge_width"] = [x/100 for x in g1.es()["weight"]]
visual_style["vertex_size"] = 20
visual_style["vertex_label_size"] = 12
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_arrow_size"] = 0.3
visual_style["layout"] = g1.layout_auto()

igraph.plot(g1, 'tutorial/example_g1.png', **visual_style)
display(Image(filename='tutorial/example_g1.png'))
2015-08-19 12:58:26 [Severity.INFO]	Extracting two-paths for delta = 1...
2015-08-19 12:58:26 [Severity.INFO]	finished.
2015-08-19 12:58:26 [Severity.INFO]	Constructing first-order aggregate network ...
2015-08-19 12:58:26 [Severity.INFO]	finished.

Importantly, the weighted time-aggregated network does not show any specific structure. However, due to the specific ordering of time-stamped edges in the temporal network, the second-order aggregate network can look completely different, meaning that only very specific time-respecting paths actually occur. Here, the second-order network shows three pronounced temporal communities that are not visible in the first-order network.

In [23]:
g2 = t.igraphSecondOrder()

visual_style = {}
visual_style["edge_arrow_size"] = 0.01
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 5

igraph.plot(g2, 'tutorial/example_g2.png', **visual_style)
display(Image(filename='tutorial/example_g2.png'))
2015-08-19 12:58:26 [Severity.INFO]	Constructing second-order aggregate network ...
2015-08-19 12:58:26 [Severity.INFO]	finished.

Analyzing real-world data: Diffusion dynamics in temporal networks

Let us now come to a more interesting case, analyzing real-world relational data sets from a number of different contexts. The first data set that we will study covers time-stamped antenna-antenna interactions in an ant colony and was originally published in:

  • B Blonder, A Dornhaus: Time-Ordered Networks Reveal Limitations to Information Flow in Ant Colonies, PlosOne, 2011

Here, we read data on one particular colony (details in our papers above) as a temporal network.

In [24]:
t = tn.readFile('data/tutorial/ants-1-1_agg_6s_scc.tedges', sep=' ')
print(t.Summary())
2015-08-19 12:58:30 [Severity.INFO]	Reading time-stamped links ...
2015-08-19 12:58:30 [Severity.INFO]	finished.
2015-08-19 12:58:30 [Severity.INFO]	Building index data structures ...
2015-08-19 12:58:30 [Severity.INFO]	finished.
2015-08-19 12:58:30 [Severity.INFO]	Sorting time stamps ...
2015-08-19 12:58:30 [Severity.INFO]	finished.
Nodes:			61
Time-stamped links:	982
Links/Nodes:		16.098360655737704
Observation period:	[0, 239]
Observation length:	239
Time stamps:		201
Avg. inter-event dt:	1.195
Min/Max inter-event dt:	1/11
Max Time Diff (delta):	1
Two-paths:		not calculated
First-order network:	not constructed
Second-order network:	not constructed

Let us first consider the (first-order) time-aggregated network.

In [25]:
g1 = t.igraphFirstOrder()

visual_style = {}
visual_style["bbox"] = (600, 600)
visual_style["margin"] = 60
visual_style["vertex_size"] = 30
visual_style["vertex_label_size"] = 8
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_arrow_size"] = 0.3
visual_style["layout"] = g1.layout_auto()

igraph.plot(g1, 'tutorial/ants_g1.png', **visual_style)
display(Image(filename='tutorial/ants_g1.png'))
2015-08-19 12:58:30 [Severity.INFO]	Extracting two-paths for delta = 1...
2015-08-19 12:58:30 [Severity.INFO]	finished.
2015-08-19 12:58:30 [Severity.INFO]	Constructing first-order aggregate network ...
2015-08-19 12:58:30 [Severity.INFO]	finished.

Above, we have briefly argued about the time-respecting paths that we would expect, considering that specific order correlations (i.e. betweennness preferences of nodes) are absent in the temporal network. Based on the first-order aggregate network we can actually compute the expected second-order network, under the assumption of a null model in which all interactions are independent from each other. For this, we can use the following pyTempNet function:

In [26]:
g2n = t.igraphSecondOrderNull()

visual_style = {}
visual_style["edge_arrow_size"] = 0.01
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 10
visual_style["layout"] = g2n.layout_auto()

igraph.plot(g2n, 'tutorial/ants_g2n.png', **visual_style)
display(Image(filename='tutorial/ants_g2n.png'))
2015-08-19 12:58:31 [Severity.INFO]	Constructing second-order aggregate network ...
2015-08-19 12:58:31 [Severity.INFO]	finished.

We see that this expected network is rather densely connected, i.e. we expect many time-respecting paths of length two to exist between nodes. We can now compare this to the actual second-order network.

Here, we see that - due to the specific ordering of time-stamped links, the actual second-order network can look completely different, meaning that only very specific time-respecting paths actually occur. Here, the second-order network is much less connected than the expected one shown above.

In [27]:
g2 = t.igraphSecondOrder()

igraph.plot(g2, 'tutorial/ants_g2.png', **visual_style)
display(Image(filename='tutorial/ants_g2.png'))

This (visually) different second-order network shown above is a sign for the presence of non-Markovian characteristics which change causality in the temporal network. We can quantify the presence of these characteristics by calculating the entropy growth rate ratio between a first- and a second-order Markov model of the time-stamped edge sequence (see details in publications above). This value captures to what extend the actual temporal network differs from a Markovian case, in which the next edge in a time-stamped edge sequence is independent from the previous one.

Here, we observe a value that is strictly smaller than one, which signifies that the data set indeed included non-Markovian characteristics.

In [28]:
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
Entropy growth rate ratio is 0.363798532795

We can next ask the question to what extent these characteristics influence dynamical processes on the temporal network. Here, we consider a diffusion process, modeled by the convergence behavior of a random walk process on the temporal network. With the next two functions, we can compute the time (i.e. number of random walk steps) required by a random walker to converge. Precisely, we measure the average number of steps after which the total variation distance between the visitation probabilities and the stationary distribution is smaller than a given threshold of epsilon.

In [29]:
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-6)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-6)

print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Empirical slow-down factor for diffusion is 2.214887640449438

For small epsilon (i.e. large times t) we empirically observe that non-Markovian characteristics slow down the diffusion process on average by a factor of approx. 2.05 (values will naturally differ for individual runs).

We can actually predict this analytically by means of a spectral analysis of the second-order network. The following function analytically calculates the expected factor by which a diffusion process in a temporal network is slowed down due to non-Markovian characteristics (and the resulting changes in the causality structures of the temporal network).

In [30]:
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
Analytical slow-down factor for diffusion is 2.05252345941

Here we expect diffusion in the temporal network to be slower by a factor of about 2.05 (compared to a Markovian temporal network in which no order correlations change causality). This is well in line with our empirical observation above.

Apart from predicting changes in diffusion speed, we can do some extended spectral analysis of the second-order network based on a normalized Laplacian matrix. This analysis confirms that the actual temporal sequence has a smaller algebraic connectivity than a Markovian temporal network. This indicates that the causal topology of time-respecting paths in the temporal network is less connected than expected at random, thus explaining the observed slow-down of diffusion.

This can intuitively be related to the weaker *causal connectivity& in the real temporal network compared to what we expect at random (compare the second-order networks shown above).

In [31]:
print("Algebraic Connectivity (G2) =", tn.Measures.AlgebraicConn(t))
print("Algebraic Connectivity (G2 null) =", tn.Measures.AlgebraicConn(t, model="NULL"))
Algebraic Connectivity (G2) = 0.0426976073824
Algebraic Connectivity (G2 null) = 0.0856700955263

Let us now move to a second data set, which covers E-Mail exchanges between employees in a manufacturing company. Again, we can read these data asa temporal network.

In [32]:
t = tn.readFile('data/tutorial/manufacturing_30d_agg_3600_scc.tedges', sep=' ')
print(t.Summary())
2015-08-19 12:58:33 [Severity.INFO]	Reading time-stamped links ...
2015-08-19 12:58:33 [Severity.INFO]	finished.
2015-08-19 12:58:33 [Severity.INFO]	Building index data structures ...
2015-08-19 12:58:33 [Severity.INFO]	finished.
2015-08-19 12:58:33 [Severity.INFO]	Sorting time stamps ...
2015-08-19 12:58:33 [Severity.INFO]	finished.
Nodes:			96
Time-stamped links:	3843
Links/Nodes:		40.03125
Observation period:	[36, 717]
Observation length:	681
Time stamps:		221
Avg. inter-event dt:	3.09545454545
Min/Max inter-event dt:	1/64
Max Time Diff (delta):	1
Two-paths:		not calculated
First-order network:	not constructed
Second-order network:	not constructed

The entropy growth rate ratio smaller than one confirms that the temporal network exhibits non-Markovian characteristics that are likely to change causality.

In [33]:
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
2015-08-19 12:58:33 [Severity.INFO]	Extracting two-paths for delta = 1...
2015-08-19 12:58:33 [Severity.INFO]	finished.
2015-08-19 12:58:33 [Severity.INFO]	Constructing second-order aggregate network ...
2015-08-19 12:58:33 [Severity.INFO]	finished.
2015-08-19 12:58:33 [Severity.INFO]	Constructing first-order aggregate network ...
2015-08-19 12:58:33 [Severity.INFO]	finished.
Entropy growth rate ratio is 0.549171568756

Based on spectral properties, we analytically predict these characteristics to slow down diffusion by a factor of about 3.01 (compared to a Markovian temporal network).

In [34]:
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
Analytical slow-down factor for diffusion is 3.01996747563

Let us empirically confirm that this analytical prediction is correct ...

In [35]:
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-12)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-12)

print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Empirical slow-down factor for diffusion is 2.86497461928934

Our third data set covers contacts between medical personnel in a hospital, recorded by means of proximity sensing tags. Again, we read these data asa temporal network.

In [36]:
t = tn.readFile('data/tutorial/Hospital_noADM_agg_300_scc_8_56h.tedges', sep=' ')

print(t.Summary())
2015-08-19 12:58:34 [Severity.INFO]	Reading time-stamped links ...
2015-08-19 12:58:34 [Severity.INFO]	finished.
2015-08-19 12:58:34 [Severity.INFO]	Building index data structures ...
2015-08-19 12:58:34 [Severity.INFO]	finished.
2015-08-19 12:58:34 [Severity.INFO]	Sorting time stamps ...
2015-08-19 12:58:34 [Severity.INFO]	finished.
Nodes:			53
Time-stamped links:	8571
Links/Nodes:		161.71698113207546
Observation period:	[23, 574]
Observation length:	551
Time stamps:		350
Avg. inter-event dt:	1.5787965616
Min/Max inter-event dt:	1/99
Max Time Diff (delta):	1
Two-paths:		not calculated
First-order network:	not constructed
Second-order network:	not constructed

Again, the entropy growth rate ratio smaller than one confirms that the temporal network exhibits non-Markovian characteristics that are likely to change causality.

In [37]:
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
2015-08-19 12:58:34 [Severity.INFO]	Extracting two-paths for delta = 1...
2015-08-19 12:58:34 [Severity.INFO]	finished.
2015-08-19 12:58:34 [Severity.INFO]	Constructing second-order aggregate network ...
2015-08-19 12:58:34 [Severity.INFO]	finished.
2015-08-19 12:58:34 [Severity.INFO]	Constructing first-order aggregate network ...
2015-08-19 12:58:34 [Severity.INFO]	finished.
Entropy growth rate ratio is 0.700588097083

Based on spectral properties, we analytically predict these characteristics to slow down diffusion by a factor of about 5.75.

In [38]:
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
Analytical slow-down factor for diffusion is 5.7450563705

Again, we empirically confirm that this prediction is reasonable ...

In [39]:
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-10)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-10)

print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Empirical slow-down factor for diffusion is 5.407657657657658

We next use the RealityMining data set, covering proximity interactions between students at MIT. We read it as a temporal network.

In [40]:
t = tn.readFile('data/tutorial/RealityMining_agg_300s_scc.tedges', sep=' ')
print(t.Summary())
2015-08-19 12:58:35 [Severity.INFO]	Reading time-stamped links ...
2015-08-19 12:58:35 [Severity.INFO]	finished.
2015-08-19 12:58:35 [Severity.INFO]	Building index data structures ...
2015-08-19 12:58:36 [Severity.INFO]	finished.
2015-08-19 12:58:36 [Severity.INFO]	Sorting time stamps ...
2015-08-19 12:58:36 [Severity.INFO]	finished.
Nodes:			58
Time-stamped links:	24984
Links/Nodes:		430.7586206896552
Observation period:	[0, 6215]
Observation length:	6215
Time stamps:		5961
Avg. inter-event dt:	1.0427852349
Min/Max inter-event dt:	1/185
Max Time Diff (delta):	1
Two-paths:		not calculated
First-order network:	not constructed
Second-order network:	not constructed

And again, the temporal sequence deviates from a Markovian temporal network and non-Markovian characteristics are expected to slow down diffusion by a factor of about 7.77, which we confirm empirically.

In [41]:
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
2015-08-19 12:58:36 [Severity.INFO]	Extracting two-paths for delta = 1...
2015-08-19 12:58:36 [Severity.INFO]	finished.
2015-08-19 12:58:36 [Severity.INFO]	Constructing second-order aggregate network ...
2015-08-19 12:58:36 [Severity.INFO]	finished.
2015-08-19 12:58:36 [Severity.INFO]	Constructing first-order aggregate network ...
2015-08-19 12:58:36 [Severity.INFO]	finished.
Entropy growth rate ratio is 0.392959580242
In [42]:
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
Analytical slow-down factor for diffusion is 7.77413007763
In [43]:
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-6)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-6)

print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Empirical slow-down factor for diffusion is 7.2079942556247

Finally, we also find examples for temporal networks in which non-Markovian characteristics result in a speed-up. For this, we consider a data set of time-stamped passenger flows in the London Tube network.

In [44]:
t = tn.readFile('data/tutorial/tube_flows_scc.tedges', sep=' ')
print(t.Summary())
2015-08-19 12:58:39 [Severity.INFO]	Reading time-stamped links ...
2015-08-19 12:58:40 [Severity.INFO]	finished.
2015-08-19 12:58:40 [Severity.INFO]	Building index data structures ...
2015-08-19 12:58:43 [Severity.INFO]	finished.
2015-08-19 12:58:43 [Severity.INFO]	Sorting time stamps ...
2015-08-19 12:58:43 [Severity.INFO]	finished.
Nodes:			132
Time-stamped links:	225372
Links/Nodes:		1707.3636363636363
Observation period:	[8, 358154]
Observation length:	358146
Time stamps:		225372
Avg. inter-event dt:	1.58913968523
Min/Max inter-event dt:	1/34
Max Time Diff (delta):	1
Two-paths:		not calculated
First-order network:	not constructed
Second-order network:	not constructed

Interestingly, here non-Markovian characteristics result in a speed-up of diffusion expressed by a slow-down factor smaller than one.

In [45]:
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
2015-08-19 12:58:43 [Severity.INFO]	Extracting two-paths for delta = 1...
2015-08-19 12:58:46 [Severity.INFO]	finished.
2015-08-19 12:58:46 [Severity.INFO]	Constructing second-order aggregate network ...
2015-08-19 12:58:46 [Severity.INFO]	finished.
2015-08-19 12:58:46 [Severity.INFO]	Constructing first-order aggregate network ...
2015-08-19 12:58:47 [Severity.INFO]	finished.
Entropy growth rate ratio is 0.282201408877
Analytical slow-down factor for diffusion is 0.230196514165

This sounds counter-intuitive, the mere reordering of time-stamped links results in a speed up. Let us validate this by means of a simulation.

In [46]:
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-6)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-6)
print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Empirical slow-down factor for diffusion is 0.23787772312016864

That's interesting, but what exactly is happening there? We can get a better idea about this, by actually looking at a visualization of a diffusion process. For this purpose, pyTempNet comes with a function that allows to conveniently generate a video of the diffusion in a temporal network. The idea is that the process will be visualized in the first-order time-aggregated network, while the diffusion dynamics can either follow a second-order or a first-order Markov model. By this, we are able to visualize the effect of order correlations on the dynamical process.

In [47]:
g1 = t.igraphFirstOrder()

visual_style = {}
visual_style["edge_arrow_size"] = 0.4
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 10
visual_style["edge_width"] = [np.log(x)/4 for x in g1.es()["weight"]]
visual_style["layout"] = g1.layout_auto()

tn.exportDiffusionVideo(t, 'tutorial/LT_diffusion_t2.mp4', visual_style, steps = 200, initial_index=0, model='SECOND')
2015-08-19 12:58:48 [Severity.INFO]	Calculating diffusion dynamics in non-Markovian temporal network ...
2015-08-19 12:58:48 [Severity.INFO]	Step 0 TVD =0.999208285297
2015-08-19 12:58:53 [Severity.INFO]	Step 50 TVD =0.0784047221785
2015-08-19 12:58:58 [Severity.INFO]	Step 100 TVD =0.00563876981434
2015-08-19 12:59:03 [Severity.INFO]	Step 150 TVD =0.000428278867121
2015-08-19 12:59:08 [Severity.INFO]	finished.
2015-08-19 12:59:08 [Severity.INFO]	Encoding video ...
2015-08-19 12:59:11 [Severity.INFO]	finished.

This function will output the evolution of total variation distance on the console every 50 steps, so that you can check whether the chosen number of steps was sufficient for the diffusion process to converge to a reasonable degree. The function also generates video frames, are internally encoded into a video using imagemagick's convert tool. If you would like, you could actually call the function exportDiffusionMovieFrames and convert the frames into a video yourself with this command:

> convert -delay 10 frames/LT_diffusion_t2_frame* output/LT_t2.mp4

However, here we save us this effort and directly embed the video resulting from the method call above. This video shows how a diffusion process evolves in the temporal network based on the actual two-path statistics in the data.

In [48]:
showVideo('tutorial/LT_diffusion_t2.mp4')
Out[48]:

In the video, we observe a fast diffusion through the network. Let us now compare this to a diffusion process based on the first-order time-aggregated network, i.e. based on the two-path statistics that we would expect. We could do this in the same way as above, i.e. generating a video and embedding it below. However there is a nice shortcut that directly produces a comparison video, in which a diffusion process is running side-by-side in a Markovian and non-Markovian temporal network. This allows you to quickly visually explore how non-Markovian characteristics influence a dynamical process in your data. We can simply do this by the following method call:

In [49]:
tn.exportDiffusionComparisonVideo(t, 'tutorial/diffusion_comparison.mp4', visual_style, steps = 200, initial_index=0)
2015-08-19 13:00:57 [Severity.INFO]	Calculating diffusion dynamics in non-Markovian temporal network
2015-08-19 13:00:57 [Severity.INFO]	Step 0 TVD =0.999208285297
2015-08-19 13:01:02 [Severity.INFO]	Step 50 TVD =0.0784047221785
2015-08-19 13:01:07 [Severity.INFO]	Step 100 TVD =0.00563876981434
2015-08-19 13:01:12 [Severity.INFO]	Step 150 TVD =0.000428278867121
2015-08-19 13:01:17 [Severity.INFO]	finished.
2015-08-19 13:01:17 [Severity.INFO]	Calculating diffusion dynamics in Markovian temporal network ...
2015-08-19 13:01:17 [Severity.INFO]	Step 0 TVD =0.999208285297
2015-08-19 13:01:23 [Severity.INFO]	Step 50 TVD =0.270497339842
2015-08-19 13:01:28 [Severity.INFO]	Step 100 TVD =0.132012191175
2015-08-19 13:01:33 [Severity.INFO]	Step 150 TVD =0.0720745505823
2015-08-19 13:01:38 [Severity.INFO]	finished.
2015-08-19 13:01:38 [Severity.INFO]	Stitching video frames ...
2015-08-19 13:02:21 [Severity.INFO]	finished.
2015-08-19 13:02:21 [Severity.INFO]	Encoding video ...
2015-08-19 13:02:26 [Severity.INFO]	finished.

Neat! Now we can directly watch the diffusion processes side-by-side. Note that in the current implementation, the non-Markovian temporal network is shown on the left, while the Markovian one is shown on the right.

In [51]:
showVideo('tutorial/diffusion_comparison.mp4')
Out[51]: