Tutorial: GraphRicciCurvature

This is a walk through tutorial of GraphRicciCurvature, and a demonstration of how to apply Ricci curvature for various tasks such as community detection. Please make sure you have the latest package to run this tutorial.

  • Try this tutorial with interactive jupyter notebooks:
    • Binder
    • Open In Colab (Faster, but Google account required.)

Preparation:

Load library

[1]:
# # colab setting
!pip install GraphRicciCurvature
!pip install scikit-learn

import networkx as nx
import numpy as np
import math
import importlib

# matplotlib setting
%matplotlib inline
import matplotlib.pyplot as plt

# to print logs in jupyter notebook
import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

# load GraphRicciCuravture package
from GraphRicciCurvature.OllivierRicci import OllivierRicci

# load python-louvain for modularity computation
import community as community_louvain

# for ARI computation
from sklearn import preprocessing, metrics

Requirement already satisfied: GraphRicciCurvature in /usr/local/lib/python3.8/site-packages (0.5.1)
Requirement already satisfied: python-louvain in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (0.14)
Requirement already satisfied: cvxpy in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (1.1.3)
Requirement already satisfied: cython in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (0.29.21)
Requirement already satisfied: numpy in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (1.19.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (20.4)
Requirement already satisfied: pot in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (0.7.0)
Requirement already satisfied: networkit>=6.1 in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (7.0)
Requirement already satisfied: scipy>=1.0 in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (1.5.2)
Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (2.4)
Requirement already satisfied: osqp>=0.4.1 in /usr/local/lib/python3.8/site-packages (from cvxpy->GraphRicciCurvature) (0.6.1)
Requirement already satisfied: ecos>=2 in /usr/local/lib/python3.8/site-packages (from cvxpy->GraphRicciCurvature) (2.0.7.post1)
Requirement already satisfied: scs>=1.1.3 in /usr/local/lib/python3.8/site-packages (from cvxpy->GraphRicciCurvature) (2.1.2)
Requirement already satisfied: six in /usr/local/lib/python3.8/site-packages (from packaging->GraphRicciCurvature) (1.15.0)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.8/site-packages (from packaging->GraphRicciCurvature) (2.4.7)
Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.8/site-packages (from networkx>=2.0->GraphRicciCurvature) (4.4.2)
Requirement already satisfied: future in /usr/local/lib/python3.8/site-packages (from osqp>=0.4.1->cvxpy->GraphRicciCurvature) (0.18.2)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.8/site-packages (0.23.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.8/site-packages (from scikit-learn) (2.1.0)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.8/site-packages (from scikit-learn) (0.16.0)
Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.8/site-packages (from scikit-learn) (1.5.2)
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.8/site-packages (from scikit-learn) (1.19.1)

Load sample graph

  • First, let’s load karate club graph from networkx as an example.
[2]:
G = nx.karate_club_graph()
[3]:
print(nx.info(G))
Name: Zachary's Karate Club
Type: Graph
Number of nodes: 34
Number of edges: 78
Average degree:   4.5882

Ollivier-Ricci curvature

  • Ricci curvature is a geometric property to describe the local shape of an object. In graph, an edge with positive curvature represents an edge within a cluster, while a negative curvature edge tents to be a bridge within clusters.
  • To compute the Ollivier-Ricci curvature of a graph, we can use the class OllivierRicci to load the graph and setup the parameter for the curvature computation.
[4]:
orc = OllivierRicci(G, alpha=0.5, verbose="TRACE")
  • After setup the class orc, we can call compute_ricci_curvature to start the Ricci curvature computation. The graph in the class with results will be updated.
[5]:
orc.compute_ricci_curvature()
G_orc = orc.G.copy()  # save an intermediate result
INFO:Edge weight not detected in graph, use "weight" as default edge weight.
TRACE:Number of nodes: 34
TRACE:Number of edges: 78
TRACE:Start to compute all pair shortest path.
TRACE:0.002087 secs for all pair by NetworKit.
INFO:0.231934 secs for Ricci curvature computation.
  • The computed edge Ricci curvature is then stored in the returned networkx graph G_orc. Let’s check the first five results and plot the histogram of the edge Ricci curvature distributions.
  • The Ricci curvature distributions is shown to be different from graph to graph, and can be act as a graph fingerprint or graph kernel.
[6]:
def show_results(G):
    # Print the first five results
    print("Karate Club Graph, first 5 edges: ")
    for n1,n2 in list(G.edges())[:5]:
        print("Ollivier-Ricci curvature of edge (%s,%s) is %f" % (n1 ,n2, G[n1][n2]["ricciCurvature"]))

    # Plot the histogram of Ricci curvatures
    plt.subplot(2, 1, 1)
    ricci_curvtures = nx.get_edge_attributes(G, "ricciCurvature").values()
    plt.hist(ricci_curvtures,bins=20)
    plt.xlabel('Ricci curvature')
    plt.title("Histogram of Ricci Curvatures (Karate Club)")

    # Plot the histogram of edge weights
    plt.subplot(2, 1, 2)
    weights = nx.get_edge_attributes(G, "weight").values()
    plt.hist(weights,bins=20)
    plt.xlabel('Edge weight')
    plt.title("Histogram of Edge weights (Karate Club)")

    plt.tight_layout()

show_results(G_orc)
Karate Club Graph, first 5 edges:
Ollivier-Ricci curvature of edge (0,1) is 0.110614
Ollivier-Ricci curvature of edge (0,2) is -0.144027
Ollivier-Ricci curvature of edge (0,3) is 0.041481
Ollivier-Ricci curvature of edge (0,4) is -0.114599
Ollivier-Ricci curvature of edge (0,5) is -0.281267
_images/tutorial_11_1.png
[ ]:

Ricci flow

  • Ricci flow is an iterative process that aims to smooth out the curvatures of the input graph by adjusting the edges weight, it stretches edges of large negative Ricci curvature and shrinks edges of large positive Ricci curvature over time.
  • To compute the Ollivier-Ricci flow, simply call function compute_ricci_flow to start the flow process. The iterations of flow is controlled by variable iterations.
[7]:
# Start a Ricci flow with Lin-Yau's probability distribution setting with 4 process.
orf = OllivierRicci(G, alpha=0.5, base=1, exp_power=0, proc=4, verbose="INFO")

# Do Ricci flow for 2 iterations
orf.compute_ricci_flow(iterations=2)
INFO:No ricciCurvature detected, compute original_RC...
INFO:Edge weight not detected in graph, use "weight" as default edge weight.
INFO:0.209314 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 0 ===
INFO:0.205769 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 1 ===
INFO:0.245885 secs for Ricci curvature computation.
INFO:0.682418 secs for Ricci flow computation.
[7]:
<networkx.classes.graph.Graph at 0x135d4c250>
  • After “enough” Ricci flow iterations, the Ricci curvature of the graph will be converged to some value. In our experience, most of graphs need around 20~50 iterations.
  • Now let’s do more Ricci flow to flatten the Ricci curvature and refine the Ricci flow metric.
[8]:
orf.set_verbose("ERROR") # mute logs
orf.compute_ricci_flow(iterations=50)
G_rf = orf.G.copy()
  • After Ricci flow, the edge weights (Ricci flow metrics) changed and the edge Ricci curvatures are converged to \(-0.0115\).
[9]:
show_results(G_rf)
Karate Club Graph, first 5 edges:
Ollivier-Ricci curvature of edge (0,1) is -0.011554
Ollivier-Ricci curvature of edge (0,2) is -0.011540
Ollivier-Ricci curvature of edge (0,3) is -0.011551
Ollivier-Ricci curvature of edge (0,4) is -0.011558
Ollivier-Ricci curvature of edge (0,5) is -0.011562
_images/tutorial_18_1.png
[ ]:

Ricci flow for Community detection

In this section, we will show you how to use Ricci flow metric to detect community. If you wish to know how to use the build-in function directly, please jump to Ricci community section.

Preliminary

Visualized the communities

  • We can apply Ricci flow to detect communities in graph.
  • To visualized the communities in graph, let’s first draw the graph and color the nodes with its community.
[10]:
def draw_graph(G, clustering_label="club"):
    """
    A helper function to draw a nx graph with community.
    """
    complex_list = nx.get_node_attributes(G, clustering_label)

    le = preprocessing.LabelEncoder()
    node_color = le.fit_transform(list(complex_list.values()))

    nx.draw_spring(G,seed=0, nodelist=G.nodes(),
                   node_color=node_color,
                   cmap=plt.cm.rainbow,
                   alpha=0.8)

draw_graph(G_rf)
_images/tutorial_21_0.png

Detect the communities

  • To detect the communities of the graph, we can try a simple but effected method: remove all edges with weight greater than a threshold.
  • By observing the histogram of edge weights, let’s set the threshold to be 1.5 or 1.0, and apply modularity and Adjust Rand Index(ARI) as clustering metrics to evaluate the clustering result.
    • Modularity: A clustering metrics define without ground-truth.
    • ARI: A clustering metrics define with ground-truth.
[11]:
def ARI(G, clustering, clustering_label="club"):
    """
    Computer the Adjust Rand Index (clustering accuracy) of "clustering" with "clustering_label" as ground truth.

    Parameters
    ----------
    G : NetworkX graph
        A given NetworkX graph with node attribute "clustering_label" as ground truth.
    clustering : dict or list or list of set
        Predicted community clustering.
    clustering_label : str
        Node attribute name for ground truth.

    Returns
    -------
    ari : float
        Adjust Rand Index for predicted community.
    """

    complex_list = nx.get_node_attributes(G, clustering_label)

    le = preprocessing.LabelEncoder()
    y_true = le.fit_transform(list(complex_list.values()))

    if isinstance(clustering, dict):
        # python-louvain partition format
        y_pred = np.array([clustering[v] for v in complex_list.keys()])
    elif isinstance(clustering[0], set):
        # networkx partition format
        predict_dict = {c: idx for idx, comp in enumerate(clustering) for c in comp}
        y_pred = np.array([predict_dict[v] for v in complex_list.keys()])
    elif isinstance(clustering, list):
        # sklearn partition format
        y_pred = clustering
    else:
        return -1

    return metrics.adjusted_rand_score(y_true, y_pred)



def my_surgery(G_origin: nx.Graph(), weight="weight", cut=0):
    """A simple surgery function that remove the edges with weight above a threshold

    Parameters
    ----------
    G_origin : NetworkX graph
        A graph with ``weight`` as Ricci flow metric to cut.
    weight: str
        The edge weight used as Ricci flow metric. (Default value = "weight")
    cut: float
        Manually assigned cutoff point.

    Returns
    -------
    G : NetworkX graph
        A graph after surgery.
    """
    G = G_origin.copy()
    w = nx.get_edge_attributes(G, weight)

    assert cut >= 0, "Cut value should be greater than 0."
    if not cut:
        cut = (max(w.values()) - 1.0) * 0.6 + 1.0  # Guess a cut point as default

    to_cut = []
    for n1, n2 in G.edges():
        if G[n1][n2][weight] > cut:
            to_cut.append((n1, n2))
    print("*************** Surgery time ****************")
    print("* Cut %d edges." % len(to_cut))
    G.remove_edges_from(to_cut)
    print("* Number of nodes now: %d" % G.number_of_nodes())
    print("* Number of edges now: %d" % G.number_of_edges())
    cc = list(nx.connected_components(G))
    print("* Modularity now: %f " % nx.algorithms.community.quality.modularity(G, cc))
    print("* ARI now: %f " % ARI(G, cc))
    print("*********************************************")

    return G
[12]:
draw_graph(my_surgery(G_rf, cut=1.5))
*************** Surgery time ****************
* Cut 12 edges.
* Number of nodes now: 34
* Number of edges now: 66
* Modularity now: 0.060343
* ARI now: 0.072402
*********************************************
_images/tutorial_24_1.png
[13]:
draw_graph(my_surgery(G_rf, cut=1.0))
*************** Surgery time ****************
* Cut 39 edges.
* Number of nodes now: 34
* Number of edges now: 39
* Modularity now: 0.692829
* ARI now: 0.321943
*********************************************
_images/tutorial_25_1.png
  • Both of the above cutoffs are not that promising. It seems to be a lot of works to find a good cutoff point, let’s try to automize it by scan through the possible cutoff points and plot the corresponding accuracy.
[14]:
def check_accuracy(G_origin, weight="weight", clustering_label="value", plot_cut=True):
    """To check the clustering quality while cut the edges with weight using different threshold

    Parameters
    ----------
    G_origin : NetworkX graph
        A graph with ``weight`` as Ricci flow metric to cut.
    weight: float
        The edge weight used as Ricci flow metric. (Default value = "weight")
    clustering_label : str
        Node attribute name for ground truth.
    plot_cut: bool
        To plot the good guessed cut or not.

    """
    G = G_origin.copy()
    modularity, ari = [], []
    maxw = max(nx.get_edge_attributes(G, weight).values())
    cutoff_range = np.arange(maxw, 1, -0.025)

    for cutoff in cutoff_range:
        edge_trim_list = []
        for n1, n2 in G.edges():
            if G[n1][n2][weight] > cutoff:
                edge_trim_list.append((n1, n2))
        G.remove_edges_from(edge_trim_list)

        # Get connected component after cut as clustering
        clustering = {c: idx for idx, comp in enumerate(nx.connected_components(G)) for c in comp}

        # Compute modularity and ari
        modularity.append(community_louvain.modularity(clustering, G, weight))
        ari.append(ARI(G, clustering, clustering_label=clustering_label))

    plt.xlim(maxw, 0)
    plt.xlabel("Edge weight cutoff")
    plt.plot(cutoff_range, modularity, alpha=0.8)
    plt.plot(cutoff_range, ari, alpha=0.8)

    if plot_cut:
        good_cut = -1
        mod_last = modularity[-1]
        drop_threshold = 0.01  # at least drop this much to considered as a drop for good_cut

        # check drop from 1 -> maxw
        for i in range(len(modularity) - 1, 0, -1):
            mod_now = modularity[i]
            if mod_last > mod_now > 1e-4 and abs(mod_last - mod_now) / mod_last > drop_threshold:
                if good_cut != -1:
                    print("Other cut:%f, diff:%f, mod_now:%f, mod_last:%f, ari:%f" % (
                        cutoff_range[i + 1], mod_last - mod_now, mod_now, mod_last, ari[i + 1]))
                else:
                    good_cut = cutoff_range[i + 1]
                    print("*Good Cut:%f, diff:%f, mod_now:%f, mod_last:%f, ari:%f" % (
                        good_cut, mod_last - mod_now, mod_now, mod_last, ari[i + 1]))
            mod_last = mod_now

        plt.axvline(x=good_cut, color="red")
        plt.legend(['Modularity', 'Adjust Rand Index', 'Good cut'])
    else:
        plt.legend(['Modularity', 'Adjust Rand Index'])
[15]:
check_accuracy(G_rf,clustering_label="club",plot_cut=False)
_images/tutorial_28_0.png
  • By the figure above, the best accuracy we can have is ~0.55 with ARI when cutoff set to ~1.05. Let’s try to cut the graph with that weight:
[16]:
draw_graph(my_surgery(G_rf, cut=1.05))
*************** Surgery time ****************
* Cut 35 edges.
* Number of nodes now: 34
* Number of edges now: 43
* Modularity now: 0.542141
* ARI now: 0.557313
*********************************************
_images/tutorial_30_1.png
  • The result seems to be better, but it’s still far from the optimal result, do we have some better way to fine-tune the community result?
[ ]:

Optimal transportation and Communities

  • The key of Ollivier-Ricci curvature is optimal transportation. The above example demonstrate the result of Ricci flow with Lin and Yau’s setting of probability distribution for optimal transportation while uniformly distribute probabilities to neighbors. For node \(x\) and it’s neighbor set \(\pi(x)\), they define the probability distribution with \(\alpha\in[0,1]\) as the followings:

    \begin{equation*} m^{\alpha}_x(x_i)= \begin{cases} \alpha & \mbox{ if } x_i = x\\ 1-\alpha & \mbox{ if } x_i \in \pi(x)\\ 0 & \mbox{ otherwise } \end{cases} \end{equation*}
  • We proposed a more generalized setting of probability distributions as the following, where \(C=\sum_{x_i \in \pi(x)} b^{-d(x,x_i)^p}\):

\begin{equation*} m^{\alpha,p}_x(x_i)= \begin{cases} \alpha & \mbox{ if } x_i = x\\ \frac{1-\alpha}{C}\cdot b^{-d(x,x_i)^p} & \mbox{ if } x_i \in \pi(x)\\ 0 & \mbox{ otherwise } \end{cases} \end{equation*}
  • This setting allows us to flavor the probability distribution by the edge weight and come with better result for community detection. If we take \(p=0\), then the distribution fall back to Lin and Yau’s setting, if we set \(b=\exp\) and \(p=2\), then this distribution is similar to “heat diffusion”. It turns out that for most of the case, \((b,p)=(\exp,1)\) and \((b,p)=(\exp,2)\) performs the best for community detection task.

Parameters

alpha: The parameter for the probability distribution, range from [0 ~ 1]. It means the share of mass to leave on the original node. E.g. \(x \rightarrow y\), alpha = 0.4 means 0.4 for \(x\), 0.6 to evenly spread to \(x\)’s nbr. Default: 0.5

base: Base variable for weight distribution. Default: math.e

exp_power: Exponential power for weight distribution. Default: 0

method: Transportation method. [“OTD”, “ATD”, “Sinkhorn”]. Default: Sinkhorn

- "OTD" for Optimal Transportation Distance.
- "ATD" for Average Transportation Distance.
- "Sinkhorn" for OTD approximated Sinkhorn distance (faster).
  • Let’s retry the karate club example with different settings:
[17]:
orf2 = OllivierRicci(G, alpha=0.5, base=math.e, exp_power=1, verbose="ERROR")
orf2.compute_ricci_flow(iterations=50)
G_rf2 = orf2.G.copy()
[18]:
show_results(G_rf2)
Karate Club Graph, first 5 edges:
Ollivier-Ricci curvature of edge (0,1) is -0.019430
Ollivier-Ricci curvature of edge (0,2) is -0.012186
Ollivier-Ricci curvature of edge (0,3) is -0.016822
Ollivier-Ricci curvature of edge (0,4) is -0.020180
Ollivier-Ricci curvature of edge (0,5) is -0.020298
_images/tutorial_36_1.png
[19]:
check_accuracy(G_rf2, clustering_label="club")
*Good Cut:1.108158, diff:0.013447, mod_now:0.342205, mod_last:0.355652, ari:0.590553
_images/tutorial_37_1.png
  • The check_accuracy function above detects the possible good cut point by detecting the drop of the modularity from edge weight 1 -> max edge weight. Notice that in most of the case, the trend of ARI and modularity is similar in cutoff range grater than 1, this gives us a clue when community ground-truth is not available, we can always find the plateau of modularity curve and decide a cutoff point.
  • By check_accuracy, we acquired two cut points at 1.11 and 3.88, in most of the case, the smallest cut point is usually a good clustering, but it might cut the clustering too loose sometimes (because of the community hierarchy), so it’s always a good idea to try different cut point to find the best result.
  • Let’s check the result with cut point 3.88 first, it yields a higher ARI with ~0.78, this is almost the best result we can get for the karate club graph for most of the community detection algorithm.
[20]:
draw_graph(my_surgery(G_rf2, cut=3.88))
*************** Surgery time ****************
* Cut 10 edges.
* Number of nodes now: 34
* Number of edges now: 68
* Modularity now: 0.499999
* ARI now: 0.771626
*********************************************
_images/tutorial_39_1.png
  • Here is the result if we choose the second highest plateau (cut point 1.11) of the modularity curve as the cutoff point, we basically separate the community into a hierarchical clustering.
[21]:
draw_graph(my_surgery(G_rf2, cut=1.11))
*************** Surgery time ****************
* Cut 15 edges.
* Number of nodes now: 34
* Number of edges now: 63
* Modularity now: 0.355652
* ARI now: 0.590553
*********************************************
_images/tutorial_41_1.png

Ricci community

Here let’s demonstrate how to use the function ricci_community to automatically detect cutoff point and community.

General setting

[22]:
from urllib.request import urlopen
from GraphRicciCurvature.util import cut_graph_by_cutoff
[23]:
G_football_raw = nx.read_gexf(urlopen("https://raw.githubusercontent.com/saibalmars/RicciFlow-SampleGraphs/master/football_rf_sinkhorn_e2_20.gexf"))
G_football = G_football_raw.copy()

# Clean up the pre-computed Ricci curvature and Ricci flow to start from the begining.
def clean_graph(G):
    for n1, n2 in G.edges():
        del G[n1][n2]["ricciCurvature"]
        del G[n1][n2]["original_RC"]
    for n in G.nodes():
        del G.nodes[n]["ricciCurvature"]

clean_graph(G_football)
  • Since communities are detected using Ricci flow, let set up a orc_football class and do 10 iterations of Ricci flow.
[24]:
orc_football = OllivierRicci(G_football,verbose="INFO")
[25]:
orc_football.compute_ricci_flow(iterations=10)
INFO:No ricciCurvature detected, compute original_RC...
INFO:1.205510 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 0 ===
INFO:1.212575 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 1 ===
INFO:1.183238 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 2 ===
INFO:1.465902 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 3 ===
INFO:1.468783 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 4 ===
INFO:1.356177 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 5 ===
INFO:1.317229 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 6 ===
INFO:1.304953 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 7 ===
INFO:1.145310 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 8 ===
INFO:1.251217 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 9 ===
INFO:1.455200 secs for Ricci curvature computation.
INFO:14.684404 secs for Ricci flow computation.
[25]:
<networkx.classes.graph.Graph at 0x13605e4f0>
  • After Ricci flow computation, let’s call ricci_community to detect the cutpoint and community clustering by the modularity detection method mentioned before.
  • The function will return a tuple of (cutpoint, dict_of_community).
  • Since the football graph have a simpler community structure, the community clustering can be easily captured with just 10 iterations of Ricci flow. Notice that some more complex graph might take more iterations or even surgery to capture the clustering.
[26]:
cc = orc_football.ricci_community()
INFO:Ricci flow detected, start cutting graph into community...
INFO:Communities detected: 12
  • Let’s draw the detected Ricci community.
[27]:
draw_graph(cut_graph_by_cutoff(orc_football.G,cutoff=cc[0]),clustering_label="value")
_images/tutorial_52_0.png
[28]:
check_accuracy(orc_football.G,clustering_label="value")
*Good Cut:1.834345, diff:0.018870, mod_now:0.803453, mod_last:0.822323, ari:0.889343
Other cut:2.109345, diff:0.025725, mod_now:0.777728, mod_last:0.803453, ari:0.846507
Other cut:2.209345, diff:0.037256, mod_now:0.740472, mod_last:0.777728, ari:0.806941
Other cut:2.384345, diff:0.031135, mod_now:0.712070, mod_last:0.743205, ari:0.578407
Other cut:2.534345, diff:0.040856, mod_now:0.671214, mod_last:0.712070, ari:0.578407
Other cut:2.584345, diff:0.031765, mod_now:0.639449, mod_last:0.671214, ari:0.498880
Other cut:2.659345, diff:0.024489, mod_now:0.630462, mod_last:0.654950, ari:0.498880
Other cut:2.684345, diff:0.020132, mod_now:0.610330, mod_last:0.630462, ari:0.377298
Other cut:2.709345, diff:0.015452, mod_now:0.594878, mod_last:0.610330, ari:0.377298
Other cut:2.809345, diff:0.021080, mod_now:0.573799, mod_last:0.594878, ari:0.377298
Other cut:2.884345, diff:0.136805, mod_now:0.477028, mod_last:0.613833, ari:0.251405
_images/tutorial_53_1.png

Fine tuning

  • Let’s try ricci_community on a graph with less obvious community structure.
[29]:
G_polbooks_raw = nx.read_gexf(urlopen("https://raw.githubusercontent.com/saibalmars/RicciFlow-SampleGraphs/master/polbooks_rf_sinkhorn_e2_20.gexf"))
G_polbooks = G_polbooks_raw.copy()

# Clean up the pre-computed Ricci curvature and Ricci flow to start from the begining.
clean_graph(G_polbooks)
[30]:
orc_polbooks = OllivierRicci(G_polbooks,verbose="INFO")
  • So instead of call compute_ricci_flow, we could also call ricci_community directly, it will automatically compute Ricci flow with 20 iterations.
[31]:
cc = orc_polbooks.ricci_community()
INFO:Ricci flow not detected yet, run Ricci flow with default setting first...
INFO:No ricciCurvature detected, compute original_RC...
INFO:0.287963 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 0 ===
INFO:0.274402 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 1 ===
INFO:0.368255 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 2 ===
INFO:0.304456 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 3 ===
INFO:0.323258 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 4 ===
INFO:0.292736 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 5 ===
INFO:0.258472 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 6 ===
INFO:0.295724 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 7 ===
INFO:0.287693 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 8 ===
INFO:0.271760 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 9 ===
INFO:0.293024 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 10 ===
INFO:0.302731 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 11 ===
INFO:0.279108 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 12 ===
INFO:0.300799 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 13 ===
INFO:0.256582 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 14 ===
INFO:0.257300 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 15 ===
INFO:0.268453 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 16 ===
INFO:0.255423 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 17 ===
INFO:0.269776 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 18 ===
INFO:0.301950 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 19 ===
INFO:0.286648 secs for Ricci curvature computation.
INFO:6.499143 secs for Ricci flow computation.
INFO:Ricci flow detected, start cutting graph into community...
INFO:Communities detected: 8
  • The community is less obvious, so with the good cut we can only get ARI with ~0.37
[32]:
check_accuracy(orc_polbooks.G,clustering_label="value")
*Good Cut:2.385628, diff:0.018401, mod_now:0.559422, mod_last:0.577823, ari:0.374930
Other cut:2.435628, diff:0.013358, mod_now:0.546065, mod_last:0.559422, ari:0.473447
Other cut:2.460628, diff:0.034689, mod_now:0.511375, mod_last:0.546065, ari:0.473447
Other cut:2.485628, diff:0.030596, mod_now:0.480779, mod_last:0.511375, ari:0.473447
Other cut:2.510628, diff:0.012063, mod_now:0.468716, mod_last:0.480779, ari:0.473447
Other cut:2.535628, diff:0.023313, mod_now:0.445402, mod_last:0.468716, ari:0.473447
Other cut:2.560628, diff:0.022216, mod_now:0.423186, mod_last:0.445402, ari:0.473447
Other cut:3.660628, diff:0.060068, mod_now:0.493394, mod_last:0.553462, ari:0.628270
Other cut:4.060628, diff:0.008335, mod_now:0.469584, mod_last:0.477918, ari:0.564356
Other cut:4.135628, diff:0.009010, mod_now:0.457678, mod_last:0.466688, ari:0.564356
Other cut:4.410628, diff:0.016393, mod_now:0.438085, mod_last:0.454478, ari:0.564356
_images/tutorial_60_1.png
[ ]:

  • For graph with weaker community structure, we suggest the followings to get the best community result:
    • Compute with more Ricci flow iterations: for some graph, it might take more iterations for Ricci flow to capture the community structure. For example, karate_club graph need 50+ iterations to capture the community structure.
    • Tuning the exp_power: bigger exp_power means we put more focus on neighbors with larger weights for optimal transportation, exp_power=0 means we treat every neighbors as equal.
  • There is currently no theoretical support to suggest what is the best setting, but we see some good results with 20 iterations and exp_power=2 for both synthetic and real-world graphs.
Tuning Ricci flow iterations
  • Let’s first try to do more iterations for this graph to see what we can get.
  • You will see that the curve is somewhat compressed in the x direction.
[33]:
orc_polbooks.set_verbose("ERROR")
orc_polbooks.compute_ricci_flow()
[33]:
<networkx.classes.graph.Graph at 0x135f377f0>
[34]:
check_accuracy(orc_polbooks.G,clustering_label="value")
*Good Cut:2.575244, diff:0.084283, mod_now:0.573150, mod_last:0.657433, ari:0.384757
Other cut:2.600244, diff:0.010622, mod_now:0.562528, mod_last:0.573150, ari:0.529909
Other cut:2.625244, diff:0.012090, mod_now:0.550438, mod_last:0.562528, ari:0.529909
Other cut:2.650244, diff:0.013006, mod_now:0.537432, mod_last:0.550438, ari:0.529909
Other cut:2.675244, diff:0.026541, mod_now:0.510892, mod_last:0.537432, ari:0.529909
Other cut:2.700244, diff:0.032492, mod_now:0.478400, mod_last:0.510892, ari:0.518298
Other cut:2.775244, diff:0.011332, mod_now:0.475610, mod_last:0.486943, ari:0.628270
Other cut:3.300244, diff:0.045066, mod_now:0.498860, mod_last:0.543927, ari:0.628270
Other cut:3.825244, diff:0.005538, mod_now:0.488423, mod_last:0.493961, ari:0.564356
Other cut:3.900244, diff:0.006708, mod_now:0.481714, mod_last:0.488423, ari:0.564356
Other cut:3.950244, diff:0.010208, mod_now:0.471506, mod_last:0.481714, ari:0.564356
Other cut:3.975244, diff:0.005457, mod_now:0.466049, mod_last:0.471506, ari:0.564356
Other cut:4.400244, diff:0.009465, mod_now:0.450717, mod_last:0.460182, ari:0.564356
Other cut:4.425244, diff:0.006387, mod_now:0.444330, mod_last:0.450717, ari:0.564356
_images/tutorial_65_1.png
[ ]:

Tuning exp_power
  • Let’s try to use exp_power=1 to capture a better result. You can try other value to check the difference of ARI and modularity curve.
[35]:
orc_polbooks1 = OllivierRicci(G_polbooks,exp_power=1,verbose="ERROR")
cc1 = orc_polbooks1.ricci_community()
[36]:
check_accuracy(orc_polbooks1.G,clustering_label="value")
*Good Cut:4.401536, diff:0.069523, mod_now:0.466563, mod_last:0.536086, ari:0.651740
Other cut:4.676536, diff:0.007997, mod_now:0.454704, mod_last:0.462700, ari:0.564356
Other cut:4.876536, diff:0.008469, mod_now:0.437881, mod_last:0.446350, ari:0.564356
Other cut:5.051536, diff:0.004324, mod_now:0.425006, mod_last:0.429329, ari:0.564356
_images/tutorial_69_1.png
[ ]: