Overview

API: GraphRicciCurvature

FormanRicci

A class to compute the Forman-Ricci curvature of a given NetworkX graph.

class GraphRicciCurvature.FormanRicci.FormanRicci(G, verbose='ERROR')[source]
__init__(G, verbose='ERROR')[source]

A class to compute Forman-Ricci curvature for all nodes and edges in G.

Parameters:
  • G (NetworkX graph) – A given NetworkX graph, unweighted graph only for now, edge weight will be ignored.
  • verbose ({"INFO","DEBUG","ERROR"}) –
    Verbose level. (Default value = “ERROR”)
    • ”INFO”: show only iteration process log.
    • ”DEBUG”: show all output logs.
    • ”ERROR”: only show log if error happened.
compute_ricci_curvature()[source]

Compute Forman-ricci curvature for all nodes and edges in G. Node curvature is defined as the average of all it’s adjacency edge.

Returns:G – A NetworkX graph with “formanCurvature” on nodes and edges.
Return type:NetworkX graph

Examples

To compute the Forman-Ricci curvature for karate club graph:

>>> G = nx.karate_club_graph()
>>> frc = FormanRicci(G)
>>> frc.compute_ricci_curvature()
>>> frc.G[0][1]
{'formanCurvature': 0}

OllivierRicci

A class to compute the Ollivier-Ricci curvature of a given NetworkX graph.

class GraphRicciCurvature.OllivierRicci.OllivierRicci(G: <sphinx.ext.autodoc.importer._MockObject object at 0x7f3da2f8c470>, weight='weight', alpha=0.5, method='Sinkhorn', base=2.718281828459045, exp_power=2, proc=2, chunksize=None, shortest_path='all_pairs', cache_maxsize=1000000, nbr_topk=1000, verbose='ERROR')[source]

A class to compute Ollivier-Ricci curvature for all nodes and edges in G. Node Ricci curvature is defined as the average of all it’s adjacency edge.

__init__(G: <sphinx.ext.autodoc.importer._MockObject object at 0x7f3da2f8c470>, weight='weight', alpha=0.5, method='Sinkhorn', base=2.718281828459045, exp_power=2, proc=2, chunksize=None, shortest_path='all_pairs', cache_maxsize=1000000, nbr_topk=1000, verbose='ERROR')[source]

Initialized a container to compute Ollivier-Ricci curvature/flow.

Parameters:
  • G (NetworkX graph) – A given directional or undirectional NetworkX graph.
  • weight (str) – The edge weight used to compute Ricci curvature. (Default value = “weight”)
  • alpha (float) – The parameter for the discrete Ricci curvature, range from 0 ~ 1. It means the share of mass to leave on the original node. E.g. x -> y, alpha = 0.4 means 0.4 for x, 0.6 to evenly spread to x’s nbr. (Default value = 0.5)
  • method ({"OTD", "ATD", "Sinkhorn"}) –

    The optimal transportation distance computation method. (Default value = “Sinkhorn”)

    Transportation method:
    • ”OTD” for Optimal Transportation Distance,
    • ”ATD” for Average Transportation Distance.
    • ”Sinkhorn” for OTD approximated Sinkhorn distance. (faster)
  • base (float) – Base variable for weight distribution. (Default value = math.e)
  • exp_power (float) – Exponential power for weight distribution. (Default value = 2)
  • proc (int) – Number of processor used for multiprocessing. (Default value = cpu_count())
  • chunksize (int) – Chunk size for multiprocessing, set None for auto decide. (Default value = None)
  • shortest_path ({"all_pairs","pairwise"}) – Method to compute shortest path. (Default value = all_pairs)
  • cache_maxsize (int) – Max size for LRU cache for pairwise shortest path computation. Set this to None for unlimited cache. (Default value = 1000000)
  • nbr_topk (int) – Only take the top k edge weight neighbors for density distribution. Smaller k run faster but the result is less accurate. (Default value = 1000)
  • verbose ({"INFO", "TRACE","DEBUG","ERROR"}) –
    Verbose level. (Default value = “ERROR”)
    • ”INFO”: show only iteration process log.
    • ”TRACE”: show detailed iteration process log.
    • ”DEBUG”: show all output logs.
    • ”ERROR”: only show log if error happened.
compute_ricci_curvature()[source]

Compute Ricci curvature of edges and nodes. The node Ricci curvature is defined as the average of node’s adjacency edges.

Returns:G – A NetworkX graph with “ricciCurvature” on nodes and edges.
Return type:NetworkX graph

Examples

To compute the Ollivier-Ricci curvature for karate club graph:

>>> G = nx.karate_club_graph()
>>> orc = OllivierRicci(G, alpha=0.5, verbose="INFO")
>>> orc.compute_ricci_curvature()
>>> orc.G[0][1]
{'weight': 1.0, 'ricciCurvature': 0.11111111071683011}
compute_ricci_curvature_edges(edge_list=None)[source]

Compute Ricci curvature for edges in given edge lists.

Parameters:edge_list (list of edges) – The list of edges to compute Ricci curvature, set to [] to run for all edges in G. (Default value = [])
Returns:output – A dictionary of edge Ricci curvature. E.g.: {(node1, node2): ricciCurvature}.
Return type:dict[(int,int), float]
compute_ricci_flow(iterations=20, step=1, delta=0.0001, surgery=(<function OllivierRicci.<lambda>>, 100))[source]

Compute the given Ricci flow metric of each edge of a given connected NetworkX graph.

Parameters:
  • iterations (int) – Iterations to require Ricci flow metric. (Default value = 100)
  • step (float) – Step size for gradient decent process. (Default value = 1)
  • delta (float) – Process stop when difference of Ricci curvature is within delta. (Default value = 1e-4)
  • surgery ((function, int)) – A tuple of user define surgery function that will execute every certain iterations. (Default value = (lambda G, *args, **kwargs: G, 100))
Returns:

G – A graph with weight as Ricci flow metric.

Return type:

NetworkX graph

Examples

To compute the Ollivier-Ricci flow for karate club graph:

>>> G = nx.karate_club_graph()
>>> orc_OTD = OllivierRicci(G, alpha=0.5, method="OTD", verbose="INFO")
>>> orc_OTD.compute_ricci_flow(iterations=10)
>>> orc_OTD.G[0][1]
{'weight': 0.06399135316908759,
 'ricciCurvature': 0.18608249978652802,
 'original_RC': 0.11111111071683011}
ricci_community(cutoff_step=0.025, drop_threshold=0.01)[source]

Detect community clustering by Ricci flow metric. The communities are detected by the modularity drop while iteratively remove edge weight (Ricci flow metric) from large to small.

Parameters:
  • cutoff_step (float) – The step size to find the good cutoff points.
  • drop_threshold (float) – At least drop this much to considered as a drop for good_cut.
Returns:

  • cutoff (float) – Ricci flow metric weight cutoff for detected community clustering.
  • clustering (dict) – Detected community clustering.

Examples

To compute the Ricci community for karate club graph:

>>> G = nx.karate_club_graph()
>>> orc = OllivierRicci(G, alpha=0.5, verbose="INFO")
>>> orc.compute_ricci_flow(iterations=50)
>>> cc = orc.ricci_community()
>>> print("The detected community label of node 0: %s" % cc[1][0])
The detected community label of node 0: 0
ricci_community_all_possible_clusterings(cutoff_step=0.025, drop_threshold=0.01)[source]

Detect community clustering by Ricci flow metric (all possible clustering guesses). The communities are detected by Modularity drop while iteratively remove edge weight (Ricci flow metric) from large to small.

Parameters:
  • cutoff_step (float) – The step size to find the good cutoff points.
  • drop_threshold (float) – At least drop this much to considered as a drop for good_cut.
Returns:

cc – All detected cutoff and community clusterings pairs. Clusterings are detected by detected cutoff points from large to small. Usually the last one is the best clustering result.

Return type:

list of (float, dict)

Examples

To compute the Ricci community for karate club graph:

>>> G = nx.karate_club_graph()
>>> orc = OllivierRicci(G, alpha=0.5, verbose="INFO")
>>> orc.compute_ricci_flow(iterations=50)
>>> cc = orc.ricci_community_all_possible_clusterings()
>>> print("The number of possible clusterings: %d" % len(cc))
The number of possible clusterings: 3
set_verbose(verbose)[source]

Set the verbose level for this process.

Parameters:verbose ({"INFO", "TRACE","DEBUG","ERROR"}) –
Verbose level. (Default value = “ERROR”)
  • ”INFO”: show only iteration process log.
  • ”TRACE”: show detailed iteration process log.
  • ”DEBUG”: show all output logs.
  • ”ERROR”: only show log if error happened.

util

GraphRicciCurvature.util.cut_graph_by_cutoff(G_origin, cutoff, weight='weight')[source]

Remove graph’s edges with “weight” greater than “cutoff”.

Parameters:
  • G_origin (NetworkX graph) – A graph with weight as Ricci flow metric to cut.
  • cutoff (float) – A threshold to remove all edges with “weight” greater than it.
  • weight (str) – The edge weight used as Ricci flow metric. (Default value = “weight”)
Returns:

G – A graph with edges cut by given cutoff value.

Return type:

NetworkX graph

GraphRicciCurvature.util.get_rf_metric_cutoff(G_origin, weight='weight', cutoff_step=0.025, drop_threshold=0.01)[source]

Get good clustering cutoff points for Ricci flow metric by detect the change of modularity while removing edges.

Parameters:
  • G_origin (NetworkX graph) – A graph with “weight” as Ricci flow metric to cut.
  • weight (str) – The edge weight used as Ricci flow metric. (Default value = “weight”)
  • cutoff_step (float) – The step size to find the good cutoff points.
  • drop_threshold (float) – At least drop this much to considered as a drop for good_cut.
Returns:

good_cuts – A list of possible cutoff point, usually we use the first one as the best cut.

Return type:

list of float

GraphRicciCurvature.util.set_verbose(verbose='ERROR')[source]

Set up the verbose level of the GraphRicciCurvature.

Parameters:verbose ({"INFO", "TRACE","DEBUG","ERROR"}) –
Verbose level. (Default value = “ERROR”)
  • ”INFO”: show only iteration process log.
  • ”TRACE”: show detailed iteration process log.
  • ”DEBUG”: show all output logs.
  • ”ERROR”: only show log if error happened.

Tutorial: GraphRicciCurvature

This is a walk through tutorial of GraphRicciCurvature, and a demonstration of how to apply Ricci curvature for various tasks such as community detection. Please make sure you have the latest package to run this tutorial.

  • Try this tutorial with interactive jupyter notebooks:
    • Binder
    • Open In Colab (Faster, but Google account required.)

Preparation:

Load library

[1]:
# # colab setting
!pip install GraphRicciCurvature
!pip install scikit-learn

import networkx as nx
import numpy as np
import math
import importlib

# matplotlib setting
%matplotlib inline
import matplotlib.pyplot as plt

# to print logs in jupyter notebook
import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

# load GraphRicciCuravture package
from GraphRicciCurvature.OllivierRicci import OllivierRicci

# load python-louvain for modularity computation
import community as community_louvain

# for ARI computation
from sklearn import preprocessing, metrics

Requirement already satisfied: GraphRicciCurvature in /usr/local/lib/python3.8/site-packages (0.5.1)
Requirement already satisfied: python-louvain in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (0.14)
Requirement already satisfied: cvxpy in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (1.1.3)
Requirement already satisfied: cython in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (0.29.21)
Requirement already satisfied: numpy in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (1.19.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (20.4)
Requirement already satisfied: pot in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (0.7.0)
Requirement already satisfied: networkit>=6.1 in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (7.0)
Requirement already satisfied: scipy>=1.0 in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (1.5.2)
Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.8/site-packages (from GraphRicciCurvature) (2.4)
Requirement already satisfied: osqp>=0.4.1 in /usr/local/lib/python3.8/site-packages (from cvxpy->GraphRicciCurvature) (0.6.1)
Requirement already satisfied: ecos>=2 in /usr/local/lib/python3.8/site-packages (from cvxpy->GraphRicciCurvature) (2.0.7.post1)
Requirement already satisfied: scs>=1.1.3 in /usr/local/lib/python3.8/site-packages (from cvxpy->GraphRicciCurvature) (2.1.2)
Requirement already satisfied: six in /usr/local/lib/python3.8/site-packages (from packaging->GraphRicciCurvature) (1.15.0)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.8/site-packages (from packaging->GraphRicciCurvature) (2.4.7)
Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.8/site-packages (from networkx>=2.0->GraphRicciCurvature) (4.4.2)
Requirement already satisfied: future in /usr/local/lib/python3.8/site-packages (from osqp>=0.4.1->cvxpy->GraphRicciCurvature) (0.18.2)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.8/site-packages (0.23.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.8/site-packages (from scikit-learn) (2.1.0)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.8/site-packages (from scikit-learn) (0.16.0)
Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.8/site-packages (from scikit-learn) (1.5.2)
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.8/site-packages (from scikit-learn) (1.19.1)

Load sample graph

  • First, let’s load karate club graph from networkx as an example.
[2]:
G = nx.karate_club_graph()
[3]:
print(nx.info(G))
Name: Zachary's Karate Club
Type: Graph
Number of nodes: 34
Number of edges: 78
Average degree:   4.5882

Ollivier-Ricci curvature

  • Ricci curvature is a geometric property to describe the local shape of an object. In graph, an edge with positive curvature represents an edge within a cluster, while a negative curvature edge tents to be a bridge within clusters.
  • To compute the Ollivier-Ricci curvature of a graph, we can use the class OllivierRicci to load the graph and setup the parameter for the curvature computation.
[4]:
orc = OllivierRicci(G, alpha=0.5, verbose="TRACE")
  • After setup the class orc, we can call compute_ricci_curvature to start the Ricci curvature computation. The graph in the class with results will be updated.
[5]:
orc.compute_ricci_curvature()
G_orc = orc.G.copy()  # save an intermediate result
INFO:Edge weight not detected in graph, use "weight" as default edge weight.
TRACE:Number of nodes: 34
TRACE:Number of edges: 78
TRACE:Start to compute all pair shortest path.
TRACE:0.002087 secs for all pair by NetworKit.
INFO:0.231934 secs for Ricci curvature computation.
  • The computed edge Ricci curvature is then stored in the returned networkx graph G_orc. Let’s check the first five results and plot the histogram of the edge Ricci curvature distributions.
  • The Ricci curvature distributions is shown to be different from graph to graph, and can be act as a graph fingerprint or graph kernel.
[6]:
def show_results(G):
    # Print the first five results
    print("Karate Club Graph, first 5 edges: ")
    for n1,n2 in list(G.edges())[:5]:
        print("Ollivier-Ricci curvature of edge (%s,%s) is %f" % (n1 ,n2, G[n1][n2]["ricciCurvature"]))

    # Plot the histogram of Ricci curvatures
    plt.subplot(2, 1, 1)
    ricci_curvtures = nx.get_edge_attributes(G, "ricciCurvature").values()
    plt.hist(ricci_curvtures,bins=20)
    plt.xlabel('Ricci curvature')
    plt.title("Histogram of Ricci Curvatures (Karate Club)")

    # Plot the histogram of edge weights
    plt.subplot(2, 1, 2)
    weights = nx.get_edge_attributes(G, "weight").values()
    plt.hist(weights,bins=20)
    plt.xlabel('Edge weight')
    plt.title("Histogram of Edge weights (Karate Club)")

    plt.tight_layout()

show_results(G_orc)
Karate Club Graph, first 5 edges:
Ollivier-Ricci curvature of edge (0,1) is 0.110614
Ollivier-Ricci curvature of edge (0,2) is -0.144027
Ollivier-Ricci curvature of edge (0,3) is 0.041481
Ollivier-Ricci curvature of edge (0,4) is -0.114599
Ollivier-Ricci curvature of edge (0,5) is -0.281267
_images/tutorial_11_1.png
[ ]:

Ricci flow

  • Ricci flow is an iterative process that aims to smooth out the curvatures of the input graph by adjusting the edges weight, it stretches edges of large negative Ricci curvature and shrinks edges of large positive Ricci curvature over time.
  • To compute the Ollivier-Ricci flow, simply call function compute_ricci_flow to start the flow process. The iterations of flow is controlled by variable iterations.
[7]:
# Start a Ricci flow with Lin-Yau's probability distribution setting with 4 process.
orf = OllivierRicci(G, alpha=0.5, base=1, exp_power=0, proc=4, verbose="INFO")

# Do Ricci flow for 2 iterations
orf.compute_ricci_flow(iterations=2)
INFO:No ricciCurvature detected, compute original_RC...
INFO:Edge weight not detected in graph, use "weight" as default edge weight.
INFO:0.209314 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 0 ===
INFO:0.205769 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 1 ===
INFO:0.245885 secs for Ricci curvature computation.
INFO:0.682418 secs for Ricci flow computation.
[7]:
<networkx.classes.graph.Graph at 0x135d4c250>
  • After “enough” Ricci flow iterations, the Ricci curvature of the graph will be converged to some value. In our experience, most of graphs need around 20~50 iterations.
  • Now let’s do more Ricci flow to flatten the Ricci curvature and refine the Ricci flow metric.
[8]:
orf.set_verbose("ERROR") # mute logs
orf.compute_ricci_flow(iterations=50)
G_rf = orf.G.copy()
  • After Ricci flow, the edge weights (Ricci flow metrics) changed and the edge Ricci curvatures are converged to \(-0.0115\).
[9]:
show_results(G_rf)
Karate Club Graph, first 5 edges:
Ollivier-Ricci curvature of edge (0,1) is -0.011554
Ollivier-Ricci curvature of edge (0,2) is -0.011540
Ollivier-Ricci curvature of edge (0,3) is -0.011551
Ollivier-Ricci curvature of edge (0,4) is -0.011558
Ollivier-Ricci curvature of edge (0,5) is -0.011562
_images/tutorial_18_1.png
[ ]:

Ricci flow for Community detection

In this section, we will show you how to use Ricci flow metric to detect community. If you wish to know how to use the build-in function directly, please jump to Ricci community section.

Preliminary

Visualized the communities
  • We can apply Ricci flow to detect communities in graph.
  • To visualized the communities in graph, let’s first draw the graph and color the nodes with its community.
[10]:
def draw_graph(G, clustering_label="club"):
    """
    A helper function to draw a nx graph with community.
    """
    complex_list = nx.get_node_attributes(G, clustering_label)

    le = preprocessing.LabelEncoder()
    node_color = le.fit_transform(list(complex_list.values()))

    nx.draw_spring(G,seed=0, nodelist=G.nodes(),
                   node_color=node_color,
                   cmap=plt.cm.rainbow,
                   alpha=0.8)

draw_graph(G_rf)
_images/tutorial_21_0.png
Detect the communities
  • To detect the communities of the graph, we can try a simple but effected method: remove all edges with weight greater than a threshold.
  • By observing the histogram of edge weights, let’s set the threshold to be 1.5 or 1.0, and apply modularity and Adjust Rand Index(ARI) as clustering metrics to evaluate the clustering result.
    • Modularity: A clustering metrics define without ground-truth.
    • ARI: A clustering metrics define with ground-truth.
[11]:
def ARI(G, clustering, clustering_label="club"):
    """
    Computer the Adjust Rand Index (clustering accuracy) of "clustering" with "clustering_label" as ground truth.

    Parameters
    ----------
    G : NetworkX graph
        A given NetworkX graph with node attribute "clustering_label" as ground truth.
    clustering : dict or list or list of set
        Predicted community clustering.
    clustering_label : str
        Node attribute name for ground truth.

    Returns
    -------
    ari : float
        Adjust Rand Index for predicted community.
    """

    complex_list = nx.get_node_attributes(G, clustering_label)

    le = preprocessing.LabelEncoder()
    y_true = le.fit_transform(list(complex_list.values()))

    if isinstance(clustering, dict):
        # python-louvain partition format
        y_pred = np.array([clustering[v] for v in complex_list.keys()])
    elif isinstance(clustering[0], set):
        # networkx partition format
        predict_dict = {c: idx for idx, comp in enumerate(clustering) for c in comp}
        y_pred = np.array([predict_dict[v] for v in complex_list.keys()])
    elif isinstance(clustering, list):
        # sklearn partition format
        y_pred = clustering
    else:
        return -1

    return metrics.adjusted_rand_score(y_true, y_pred)



def my_surgery(G_origin: nx.Graph(), weight="weight", cut=0):
    """A simple surgery function that remove the edges with weight above a threshold

    Parameters
    ----------
    G_origin : NetworkX graph
        A graph with ``weight`` as Ricci flow metric to cut.
    weight: str
        The edge weight used as Ricci flow metric. (Default value = "weight")
    cut: float
        Manually assigned cutoff point.

    Returns
    -------
    G : NetworkX graph
        A graph after surgery.
    """
    G = G_origin.copy()
    w = nx.get_edge_attributes(G, weight)

    assert cut >= 0, "Cut value should be greater than 0."
    if not cut:
        cut = (max(w.values()) - 1.0) * 0.6 + 1.0  # Guess a cut point as default

    to_cut = []
    for n1, n2 in G.edges():
        if G[n1][n2][weight] > cut:
            to_cut.append((n1, n2))
    print("*************** Surgery time ****************")
    print("* Cut %d edges." % len(to_cut))
    G.remove_edges_from(to_cut)
    print("* Number of nodes now: %d" % G.number_of_nodes())
    print("* Number of edges now: %d" % G.number_of_edges())
    cc = list(nx.connected_components(G))
    print("* Modularity now: %f " % nx.algorithms.community.quality.modularity(G, cc))
    print("* ARI now: %f " % ARI(G, cc))
    print("*********************************************")

    return G
[12]:
draw_graph(my_surgery(G_rf, cut=1.5))
*************** Surgery time ****************
* Cut 12 edges.
* Number of nodes now: 34
* Number of edges now: 66
* Modularity now: 0.060343
* ARI now: 0.072402
*********************************************
_images/tutorial_24_1.png
[13]:
draw_graph(my_surgery(G_rf, cut=1.0))
*************** Surgery time ****************
* Cut 39 edges.
* Number of nodes now: 34
* Number of edges now: 39
* Modularity now: 0.692829
* ARI now: 0.321943
*********************************************
_images/tutorial_25_1.png
  • Both of the above cutoffs are not that promising. It seems to be a lot of works to find a good cutoff point, let’s try to automize it by scan through the possible cutoff points and plot the corresponding accuracy.
[14]:
def check_accuracy(G_origin, weight="weight", clustering_label="value", plot_cut=True):
    """To check the clustering quality while cut the edges with weight using different threshold

    Parameters
    ----------
    G_origin : NetworkX graph
        A graph with ``weight`` as Ricci flow metric to cut.
    weight: float
        The edge weight used as Ricci flow metric. (Default value = "weight")
    clustering_label : str
        Node attribute name for ground truth.
    plot_cut: bool
        To plot the good guessed cut or not.

    """
    G = G_origin.copy()
    modularity, ari = [], []
    maxw = max(nx.get_edge_attributes(G, weight).values())
    cutoff_range = np.arange(maxw, 1, -0.025)

    for cutoff in cutoff_range:
        edge_trim_list = []
        for n1, n2 in G.edges():
            if G[n1][n2][weight] > cutoff:
                edge_trim_list.append((n1, n2))
        G.remove_edges_from(edge_trim_list)

        # Get connected component after cut as clustering
        clustering = {c: idx for idx, comp in enumerate(nx.connected_components(G)) for c in comp}

        # Compute modularity and ari
        modularity.append(community_louvain.modularity(clustering, G, weight))
        ari.append(ARI(G, clustering, clustering_label=clustering_label))

    plt.xlim(maxw, 0)
    plt.xlabel("Edge weight cutoff")
    plt.plot(cutoff_range, modularity, alpha=0.8)
    plt.plot(cutoff_range, ari, alpha=0.8)

    if plot_cut:
        good_cut = -1
        mod_last = modularity[-1]
        drop_threshold = 0.01  # at least drop this much to considered as a drop for good_cut

        # check drop from 1 -> maxw
        for i in range(len(modularity) - 1, 0, -1):
            mod_now = modularity[i]
            if mod_last > mod_now > 1e-4 and abs(mod_last - mod_now) / mod_last > drop_threshold:
                if good_cut != -1:
                    print("Other cut:%f, diff:%f, mod_now:%f, mod_last:%f, ari:%f" % (
                        cutoff_range[i + 1], mod_last - mod_now, mod_now, mod_last, ari[i + 1]))
                else:
                    good_cut = cutoff_range[i + 1]
                    print("*Good Cut:%f, diff:%f, mod_now:%f, mod_last:%f, ari:%f" % (
                        good_cut, mod_last - mod_now, mod_now, mod_last, ari[i + 1]))
            mod_last = mod_now

        plt.axvline(x=good_cut, color="red")
        plt.legend(['Modularity', 'Adjust Rand Index', 'Good cut'])
    else:
        plt.legend(['Modularity', 'Adjust Rand Index'])
[15]:
check_accuracy(G_rf,clustering_label="club",plot_cut=False)
_images/tutorial_28_0.png
  • By the figure above, the best accuracy we can have is ~0.55 with ARI when cutoff set to ~1.05. Let’s try to cut the graph with that weight:
[16]:
draw_graph(my_surgery(G_rf, cut=1.05))
*************** Surgery time ****************
* Cut 35 edges.
* Number of nodes now: 34
* Number of edges now: 43
* Modularity now: 0.542141
* ARI now: 0.557313
*********************************************
_images/tutorial_30_1.png
  • The result seems to be better, but it’s still far from the optimal result, do we have some better way to fine-tune the community result?
[ ]:

Optimal transportation and Communities

  • The key of Ollivier-Ricci curvature is optimal transportation. The above example demonstrate the result of Ricci flow with Lin and Yau’s setting of probability distribution for optimal transportation while uniformly distribute probabilities to neighbors. For node \(x\) and it’s neighbor set \(\pi(x)\), they define the probability distribution with \(\alpha\in[0,1]\) as the followings:

    \begin{equation*} m^{\alpha}_x(x_i)= \begin{cases} \alpha & \mbox{ if } x_i = x\\ 1-\alpha & \mbox{ if } x_i \in \pi(x)\\ 0 & \mbox{ otherwise } \end{cases} \end{equation*}
  • We proposed a more generalized setting of probability distributions as the following, where \(C=\sum_{x_i \in \pi(x)} b^{-d(x,x_i)^p}\):

\begin{equation*} m^{\alpha,p}_x(x_i)= \begin{cases} \alpha & \mbox{ if } x_i = x\\ \frac{1-\alpha}{C}\cdot b^{-d(x,x_i)^p} & \mbox{ if } x_i \in \pi(x)\\ 0 & \mbox{ otherwise } \end{cases} \end{equation*}
  • This setting allows us to flavor the probability distribution by the edge weight and come with better result for community detection. If we take \(p=0\), then the distribution fall back to Lin and Yau’s setting, if we set \(b=\exp\) and \(p=2\), then this distribution is similar to “heat diffusion”. It turns out that for most of the case, \((b,p)=(\exp,1)\) and \((b,p)=(\exp,2)\) performs the best for community detection task.
Parameters

alpha: The parameter for the probability distribution, range from [0 ~ 1]. It means the share of mass to leave on the original node. E.g. \(x \rightarrow y\), alpha = 0.4 means 0.4 for \(x\), 0.6 to evenly spread to \(x\)’s nbr. Default: 0.5

base: Base variable for weight distribution. Default: math.e

exp_power: Exponential power for weight distribution. Default: 0

method: Transportation method. [“OTD”, “ATD”, “Sinkhorn”]. Default: Sinkhorn

- "OTD" for Optimal Transportation Distance.
- "ATD" for Average Transportation Distance.
- "Sinkhorn" for OTD approximated Sinkhorn distance (faster).
  • Let’s retry the karate club example with different settings:
[17]:
orf2 = OllivierRicci(G, alpha=0.5, base=math.e, exp_power=1, verbose="ERROR")
orf2.compute_ricci_flow(iterations=50)
G_rf2 = orf2.G.copy()
[18]:
show_results(G_rf2)
Karate Club Graph, first 5 edges:
Ollivier-Ricci curvature of edge (0,1) is -0.019430
Ollivier-Ricci curvature of edge (0,2) is -0.012186
Ollivier-Ricci curvature of edge (0,3) is -0.016822
Ollivier-Ricci curvature of edge (0,4) is -0.020180
Ollivier-Ricci curvature of edge (0,5) is -0.020298
_images/tutorial_36_1.png
[19]:
check_accuracy(G_rf2, clustering_label="club")
*Good Cut:1.108158, diff:0.013447, mod_now:0.342205, mod_last:0.355652, ari:0.590553
_images/tutorial_37_1.png
  • The check_accuracy function above detects the possible good cut point by detecting the drop of the modularity from edge weight 1 -> max edge weight. Notice that in most of the case, the trend of ARI and modularity is similar in cutoff range grater than 1, this gives us a clue when community ground-truth is not available, we can always find the plateau of modularity curve and decide a cutoff point.
  • By check_accuracy, we acquired two cut points at 1.11 and 3.88, in most of the case, the smallest cut point is usually a good clustering, but it might cut the clustering too loose sometimes (because of the community hierarchy), so it’s always a good idea to try different cut point to find the best result.
  • Let’s check the result with cut point 3.88 first, it yields a higher ARI with ~0.78, this is almost the best result we can get for the karate club graph for most of the community detection algorithm.
[20]:
draw_graph(my_surgery(G_rf2, cut=3.88))
*************** Surgery time ****************
* Cut 10 edges.
* Number of nodes now: 34
* Number of edges now: 68
* Modularity now: 0.499999
* ARI now: 0.771626
*********************************************
_images/tutorial_39_1.png
  • Here is the result if we choose the second highest plateau (cut point 1.11) of the modularity curve as the cutoff point, we basically separate the community into a hierarchical clustering.
[21]:
draw_graph(my_surgery(G_rf2, cut=1.11))
*************** Surgery time ****************
* Cut 15 edges.
* Number of nodes now: 34
* Number of edges now: 63
* Modularity now: 0.355652
* ARI now: 0.590553
*********************************************
_images/tutorial_41_1.png

Ricci community

Here let’s demonstrate how to use the function ricci_community to automatically detect cutoff point and community.

General setting
[22]:
from urllib.request import urlopen
from GraphRicciCurvature.util import cut_graph_by_cutoff
[23]:
G_football_raw = nx.read_gexf(urlopen("https://raw.githubusercontent.com/saibalmars/RicciFlow-SampleGraphs/master/football_rf_sinkhorn_e2_20.gexf"))
G_football = G_football_raw.copy()

# Clean up the pre-computed Ricci curvature and Ricci flow to start from the begining.
def clean_graph(G):
    for n1, n2 in G.edges():
        del G[n1][n2]["ricciCurvature"]
        del G[n1][n2]["original_RC"]
    for n in G.nodes():
        del G.nodes[n]["ricciCurvature"]

clean_graph(G_football)
  • Since communities are detected using Ricci flow, let set up a orc_football class and do 10 iterations of Ricci flow.
[24]:
orc_football = OllivierRicci(G_football,verbose="INFO")
[25]:
orc_football.compute_ricci_flow(iterations=10)
INFO:No ricciCurvature detected, compute original_RC...
INFO:1.205510 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 0 ===
INFO:1.212575 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 1 ===
INFO:1.183238 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 2 ===
INFO:1.465902 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 3 ===
INFO:1.468783 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 4 ===
INFO:1.356177 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 5 ===
INFO:1.317229 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 6 ===
INFO:1.304953 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 7 ===
INFO:1.145310 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 8 ===
INFO:1.251217 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 9 ===
INFO:1.455200 secs for Ricci curvature computation.
INFO:14.684404 secs for Ricci flow computation.
[25]:
<networkx.classes.graph.Graph at 0x13605e4f0>
  • After Ricci flow computation, let’s call ricci_community to detect the cutpoint and community clustering by the modularity detection method mentioned before.
  • The function will return a tuple of (cutpoint, dict_of_community).
  • Since the football graph have a simpler community structure, the community clustering can be easily captured with just 10 iterations of Ricci flow. Notice that some more complex graph might take more iterations or even surgery to capture the clustering.
[26]:
cc = orc_football.ricci_community()
INFO:Ricci flow detected, start cutting graph into community...
INFO:Communities detected: 12
  • Let’s draw the detected Ricci community.
[27]:
draw_graph(cut_graph_by_cutoff(orc_football.G,cutoff=cc[0]),clustering_label="value")
_images/tutorial_52_0.png
[28]:
check_accuracy(orc_football.G,clustering_label="value")
*Good Cut:1.834345, diff:0.018870, mod_now:0.803453, mod_last:0.822323, ari:0.889343
Other cut:2.109345, diff:0.025725, mod_now:0.777728, mod_last:0.803453, ari:0.846507
Other cut:2.209345, diff:0.037256, mod_now:0.740472, mod_last:0.777728, ari:0.806941
Other cut:2.384345, diff:0.031135, mod_now:0.712070, mod_last:0.743205, ari:0.578407
Other cut:2.534345, diff:0.040856, mod_now:0.671214, mod_last:0.712070, ari:0.578407
Other cut:2.584345, diff:0.031765, mod_now:0.639449, mod_last:0.671214, ari:0.498880
Other cut:2.659345, diff:0.024489, mod_now:0.630462, mod_last:0.654950, ari:0.498880
Other cut:2.684345, diff:0.020132, mod_now:0.610330, mod_last:0.630462, ari:0.377298
Other cut:2.709345, diff:0.015452, mod_now:0.594878, mod_last:0.610330, ari:0.377298
Other cut:2.809345, diff:0.021080, mod_now:0.573799, mod_last:0.594878, ari:0.377298
Other cut:2.884345, diff:0.136805, mod_now:0.477028, mod_last:0.613833, ari:0.251405
_images/tutorial_53_1.png
Fine tuning
  • Let’s try ricci_community on a graph with less obvious community structure.
[29]:
G_polbooks_raw = nx.read_gexf(urlopen("https://raw.githubusercontent.com/saibalmars/RicciFlow-SampleGraphs/master/polbooks_rf_sinkhorn_e2_20.gexf"))
G_polbooks = G_polbooks_raw.copy()

# Clean up the pre-computed Ricci curvature and Ricci flow to start from the begining.
clean_graph(G_polbooks)
[30]:
orc_polbooks = OllivierRicci(G_polbooks,verbose="INFO")
  • So instead of call compute_ricci_flow, we could also call ricci_community directly, it will automatically compute Ricci flow with 20 iterations.
[31]:
cc = orc_polbooks.ricci_community()
INFO:Ricci flow not detected yet, run Ricci flow with default setting first...
INFO:No ricciCurvature detected, compute original_RC...
INFO:0.287963 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 0 ===
INFO:0.274402 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 1 ===
INFO:0.368255 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 2 ===
INFO:0.304456 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 3 ===
INFO:0.323258 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 4 ===
INFO:0.292736 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 5 ===
INFO:0.258472 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 6 ===
INFO:0.295724 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 7 ===
INFO:0.287693 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 8 ===
INFO:0.271760 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 9 ===
INFO:0.293024 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 10 ===
INFO:0.302731 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 11 ===
INFO:0.279108 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 12 ===
INFO:0.300799 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 13 ===
INFO:0.256582 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 14 ===
INFO:0.257300 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 15 ===
INFO:0.268453 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 16 ===
INFO:0.255423 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 17 ===
INFO:0.269776 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 18 ===
INFO:0.301950 secs for Ricci curvature computation.
INFO: === Ricci flow iteration 19 ===
INFO:0.286648 secs for Ricci curvature computation.
INFO:6.499143 secs for Ricci flow computation.
INFO:Ricci flow detected, start cutting graph into community...
INFO:Communities detected: 8
  • The community is less obvious, so with the good cut we can only get ARI with ~0.37
[32]:
check_accuracy(orc_polbooks.G,clustering_label="value")
*Good Cut:2.385628, diff:0.018401, mod_now:0.559422, mod_last:0.577823, ari:0.374930
Other cut:2.435628, diff:0.013358, mod_now:0.546065, mod_last:0.559422, ari:0.473447
Other cut:2.460628, diff:0.034689, mod_now:0.511375, mod_last:0.546065, ari:0.473447
Other cut:2.485628, diff:0.030596, mod_now:0.480779, mod_last:0.511375, ari:0.473447
Other cut:2.510628, diff:0.012063, mod_now:0.468716, mod_last:0.480779, ari:0.473447
Other cut:2.535628, diff:0.023313, mod_now:0.445402, mod_last:0.468716, ari:0.473447
Other cut:2.560628, diff:0.022216, mod_now:0.423186, mod_last:0.445402, ari:0.473447
Other cut:3.660628, diff:0.060068, mod_now:0.493394, mod_last:0.553462, ari:0.628270
Other cut:4.060628, diff:0.008335, mod_now:0.469584, mod_last:0.477918, ari:0.564356
Other cut:4.135628, diff:0.009010, mod_now:0.457678, mod_last:0.466688, ari:0.564356
Other cut:4.410628, diff:0.016393, mod_now:0.438085, mod_last:0.454478, ari:0.564356
_images/tutorial_60_1.png
[ ]:

  • For graph with weaker community structure, we suggest the followings to get the best community result:
    • Compute with more Ricci flow iterations: for some graph, it might take more iterations for Ricci flow to capture the community structure. For example, karate_club graph need 50+ iterations to capture the community structure.
    • Tuning the exp_power: bigger exp_power means we put more focus on neighbors with larger weights for optimal transportation, exp_power=0 means we treat every neighbors as equal.
  • There is currently no theoretical support to suggest what is the best setting, but we see some good results with 20 iterations and exp_power=2 for both synthetic and real-world graphs.
Tuning Ricci flow iterations
  • Let’s first try to do more iterations for this graph to see what we can get.
  • You will see that the curve is somewhat compressed in the x direction.
[33]:
orc_polbooks.set_verbose("ERROR")
orc_polbooks.compute_ricci_flow()
[33]:
<networkx.classes.graph.Graph at 0x135f377f0>
[34]:
check_accuracy(orc_polbooks.G,clustering_label="value")
*Good Cut:2.575244, diff:0.084283, mod_now:0.573150, mod_last:0.657433, ari:0.384757
Other cut:2.600244, diff:0.010622, mod_now:0.562528, mod_last:0.573150, ari:0.529909
Other cut:2.625244, diff:0.012090, mod_now:0.550438, mod_last:0.562528, ari:0.529909
Other cut:2.650244, diff:0.013006, mod_now:0.537432, mod_last:0.550438, ari:0.529909
Other cut:2.675244, diff:0.026541, mod_now:0.510892, mod_last:0.537432, ari:0.529909
Other cut:2.700244, diff:0.032492, mod_now:0.478400, mod_last:0.510892, ari:0.518298
Other cut:2.775244, diff:0.011332, mod_now:0.475610, mod_last:0.486943, ari:0.628270
Other cut:3.300244, diff:0.045066, mod_now:0.498860, mod_last:0.543927, ari:0.628270
Other cut:3.825244, diff:0.005538, mod_now:0.488423, mod_last:0.493961, ari:0.564356
Other cut:3.900244, diff:0.006708, mod_now:0.481714, mod_last:0.488423, ari:0.564356
Other cut:3.950244, diff:0.010208, mod_now:0.471506, mod_last:0.481714, ari:0.564356
Other cut:3.975244, diff:0.005457, mod_now:0.466049, mod_last:0.471506, ari:0.564356
Other cut:4.400244, diff:0.009465, mod_now:0.450717, mod_last:0.460182, ari:0.564356
Other cut:4.425244, diff:0.006387, mod_now:0.444330, mod_last:0.450717, ari:0.564356
_images/tutorial_65_1.png
[ ]:

Tuning exp_power
  • Let’s try to use exp_power=1 to capture a better result. You can try other value to check the difference of ARI and modularity curve.
[35]:
orc_polbooks1 = OllivierRicci(G_polbooks,exp_power=1,verbose="ERROR")
cc1 = orc_polbooks1.ricci_community()
[36]:
check_accuracy(orc_polbooks1.G,clustering_label="value")
*Good Cut:4.401536, diff:0.069523, mod_now:0.466563, mod_last:0.536086, ari:0.651740
Other cut:4.676536, diff:0.007997, mod_now:0.454704, mod_last:0.462700, ari:0.564356
Other cut:4.876536, diff:0.008469, mod_now:0.437881, mod_last:0.446350, ari:0.564356
Other cut:5.051536, diff:0.004324, mod_now:0.425006, mod_last:0.429329, ari:0.564356
_images/tutorial_69_1.png
[ ]:

GraphRicciCurvature

Binder Open In Colab Build Status Documentation Status License

This work computes the Ollivier-Ricci Curvature [1], Ollivier-Ricci Flow [2] [3] and Forman-Ricci Curvature (or Forman curvature) [4], and Ricci community [3] detected by Ollivier-Ricci flow metric.

Ricci flow on manifold karate club demo

Curvature is a geometric property to describe the local shape of an object. If we draw two parallel paths on a surface with positive curvature like a sphere, these two paths move closer to each other while for a negatively curved surface like a saddle, these two paths tend to be apart.

In [Ni], we observe that the edge Ricci curvature plays an important role in the graph structure. An edge with positive curvature represents an edge within a cluster, while a negatively curved edge tent to be a bridge within clusters. Also, negatively curved edges are highly related to graph connectivity, with negatively curved edges removed from a connected graph, the graph soon become disconnected.

Ricci flow is a process to uniformized the edge Ricci curvature of the graph. For a given graph, the Ricci flow gives a “Ricci flow metric” on each edge as edge weights, such that under these edge weights, the Ricci curvature of the graph is mostly equal everywhere. In [Ni3], this “Ricci flow metric” is shown to be able to detect communities.

Both Ricci curvature and Ricci flow metric can act as a graph fingerprint for graph classification. The different graph gives different edge Ricci curvature distributions and different Ricci flow metric.

Video demonstration of Ricci flow for community detection:

Ricci community

Package Requirement

  • NetworkX (Based Graph library)
  • NetworKit (Shortest path algorithm)
  • CVXPY (LP solver for Optimal transportation)
  • NumPy (CVXPY support)
  • POT (For approximate Optimal transportation distance)
  • python-louvain (For faster modularity computation)

Installation

Installing via pip

pip3 install [--user] GraphRicciCurvature
  • From version 0.4.0, in order to support larger graph, we switch to NetworKit’s pairwise bidirectional dijkstra algorithm for density distribution (NetworKit>6.0 is required). If the installation of NetworKit failed, please refer to [NetworKit’ Installation instructions]. In most of the cast build this package from source is recommended.

Upgrading via pip

To run with the latest code for the best performance, upgrade GraphRicciCurvature to the latest version with pip:

pip3 install [--user] --upgrade GraphRicciCurvature

Getting Started

  • See the jupyter notebook tutorial on [nbviewer] for a walk through for the basic usage of Ricci curvature, Ricci flow, and Ricci flow for community detection.
  • Or you can run it in directly on [binder] (no account required) or [Google colab] (Faster but Google account required).
  • Check the Documentations.
  • Try out sample graphs with precomputed Ricci curvature/flow.

Simple Example

import networkx as nx
from GraphRicciCurvature.OllivierRicci import OllivierRicci
from GraphRicciCurvature.FormanRicci import FormanRicci

print("\n- Import an example NetworkX karate club graph")
G = nx.karate_club_graph()

print("\n===== Compute the Ollivier-Ricci curvature of the given graph G =====")
# compute the Ollivier-Ricci curvature of the given graph G
orc = OllivierRicci(G, alpha=0.5, verbose="INFO")
orc.compute_ricci_curvature()
print("Karate Club Graph: The Ollivier-Ricci curvature of edge (0,1) is %f" % orc.G[0][1]["ricciCurvature"])

print("\n===== Compute the Forman-Ricci curvature of the given graph G =====")
frc = FormanRicci(G)
frc.compute_ricci_curvature()
print("Karate Club Graph: The Forman-Ricci curvature of edge (0,1) is %f" % frc.G[0][1]["formanCurvature"])

# -----------------------------------
print("\n=====  Compute Ricci flow metric - Optimal Transportation Distance =====")
G = nx.karate_club_graph()
orc_OTD = OllivierRicci(G, alpha=0.5, method="OTD", verbose="INFO")
orc_OTD.compute_ricci_flow(iterations=10)
print("\n=====  Compute Ricci community - by Ricci flow =====")
clustering = orc_OTD.ricci_community()

More example in example.py.

Reference

[1]Ni, C.-C., Lin, Y.-Y., Gao, J., Gu, X., and Saucan, E. 2015. Ricci curvature of the Internet topology (Vol. 26, pp. 2758–2766). Presented at the 2015 IEEE Conference on Computer Communications (INFOCOM), IEEE. [arXiv]
[2]Ni, C.-C., Lin, Y.-Y., Gao, J., and Gu, X. 2018. Network Alignment by Discrete Ollivier-Ricci Flow, Graph Drawing 2018, [arXiv]
[3](1, 2) Ni, C.-C., Lin, Y.-Y., Luo, F. and Gao, J. 2019. Community Detection on Networks with Ricci Flow, Scientific Reports, [arXiv]
[4]Sreejith, R. P., Karthikeyan Mohanraj, Jürgen Jost, Emil Saucan, and Areejit Samal. 2016. Forman Curvature for Complex Networks. Journal of Statistical Mechanics: Theory and Experiment 2016 (6). IOP Publishing: 063206. [arXiv]

Contact

Please contact [Chien-Chun Ni].

Cite

If you use this code in your research, please considering cite our paper:

@article{ni2019community,
  title={Community detection on networks with ricci flow},
  author={Ni, Chien-Chun and Lin, Yu-Yao and Luo, Feng and Gao, Jie},
  journal={Scientific reports},
  volume={9},
  number={1},
  pages={1--12},
  year={2019},
  publisher={Nature Publishing Group}
}

Indices and tables