I recently took a class on graph mining. In this post, I am aggregating all the lecture topics that make a complete four months worth of material. Hope this serves as a good archive on hand picked topics around several different areas of graph mining.

Thanks to Dr. Kamesh Madduri for this course!

## Introduction

Slides from Georgia Tech Network Science class

Link1 and Link2

Chapters 1 and 2 of Networks, Markets, Crowds book

Brandes et al., What is Network Science?

Estrada et al., Complex Networks: an invitation

Lecture blurbs of Barabasi’s Network Science class

Mislove et al., Measurement and Analysis of Online Social Networks, Proc. IMC 2007

Broder et al., Graph structure in the web, Proc. WWW 2000

Meusel et al., Graph structure in the web – revisited, Proc. WWW 2014

## Centrality

Centrality chapters in Network Analysis: Methodological foundations book
Link1 and Link2

Freeman’s paper on centrality in social networks

PageRank papers : Link1 and Link2

Gleich, PageRank beyond the web, 2014

The Wikipedia page on PageRank has some clarifications about the formula to use

Hoeffding’s inequality (used to analyze randomized algorithms)

Brandes’ Betweenness centrality algorithm paper

Faust, Centrality in affiliation networks

Papers/slides on Harmonic centrality by Boldi and Vigna : Link1 and Link2

## Community detection

Fortunato, Community detection in graphs, 2010

The paper that introduces the modularity measure
Newman, Fast algorithm for detecting community structure in networks, 2003

Newman presents the divisive spectral approach to modularity maximization
Newman, Modularity and community structure in networks, 2006

The agglomerative clustering based CNM algorithm for modularity maximization
Clauset, Newman, Moore, Finding community structure in very large networks,2004

Introduces the Louvain algorithm, a “multilevel” method for modularity maximization
Blondel et al., Fast unfolding of communities in complex networks, 2008

A fast and simple algorithm for community detection. Doesn’t explicitly try to maximize modularity.
Raghavan, Albert, Kumara, Near linear time algorithm to detect community structure in large-scale networks, 2007

Comparative evaluation of community structure in a collection of large-scale networks
Leskovec et al., Community Structure in large networks: natural cluster sizes and the absence of well-defined clusters, 2008

Various implementations for the partitioning and community detection problems were compared here. Note that graph partitioning and community detection are two different problems.
The 10th DIMACS Implementation Challenge: Graph Partitioning and Graph Clustering

Winner of the DIMACS challenge for community detection. If you had to pick one implementation for non-overlapping community detection, this would be it.
Ovelgonne and Geyer-Schulz, An ensemble learning strategy for community detection

Review paper on overlapping community detection.
Xie et al., Overlapping community detection in networks: the state of the art and comparative study, 2011

One possible algorithm for overlapping community detection.
Yang and Leskovec, Defining and evaluating network communities based on ground-truth, Proc. ICDM 2012.

A very nice exposition of this problem. An extension of the above paper.
Yang, Community Structure of Large Networks, PhD dissertation, 2014.

## Miscellaneous analysis algorithms

Assortativity

Newman, Mixing patterns in networks, 2003

Latapy, Main-memory triangle counting for very large sparse power-law graphs, 2008

Seshadhri et al., Wedge sampling for computing clustering coefficients and triangle counts on large graphs, 2013.

Tsourakakis et al., DOULION: Counting Triangles in Massive Graphs with a coin, 2009.

Degeneracy/k-core decomposition

Batagelj and Zaversnik’s (Pajek) 2002 k-core algorithm paper:

Sariyuce et al., Streaming algorithms for k-core decomposition, 2013

Rossi et al., Fast maximum clique algorithms for large graphs, 2014

Cohen, Cohesive subgraphs for social network analysis, 2008

Sariyuce et al., Finding the Hierarchy of Dense Subgraphs using Nucleus decompositions, 2014

## Network Models

Watts and Strogatz, The Collective dynamics of small-world networks, 1998

Model for small-world phenomenon :
Barabasi and Albert, Emergence of scaling in random networks, 1999

The preferential attachment paper :
Kleinberg, The small-world phenomenon: an algorithmic perspective, 2000

Another model explaining small-world nature :
Kleinberg et al., The web as a graph: measurements, models, and methods, 2000

Web graph modeling :
Doyle et al., The “robust yet fragile” nature of the internet, 2005

Observations about the internet router-level topology
Aiello, Chung, Lu, A random graph model for power law graphs, 1999

Some results about power-law random graphs
Robins et al., An Introduction to exponential random graph models for social networks, 2007

Newman, Structure and function of complex networks, 2003

Chapter 4 on Random graphs from Foundations of Data Science book by Hopcroft and Kannan, 2014

## Link prediction

Kleinberg, The link prediction problem for social networks, 2004.

Lu, Zhou, Link prediction in complex networks: a survey, 2010.

Al Hasan, Zaki, A survey of link prediction in social networks 2010.

Wang et al., Human mobility, social ties, and link prediction, Proc. KDD 2011.

Sadilek et al., Finding your friends and following them to where you are, Proc. WSDM 2012.

WTF: the Who to Follow service at Twitter, Proc. WWW 2014 :
Link1 and Link2

Chakrabarti et al., Joint inference of multiple label types in large networks, 2014.

Schifanella et al., Folks in folksonomies: social link prediction with shared metadata, 2010.

Dong et al., Link prediction and recommendation across heterogeneous social networks, Proc. ICDM 2012.

Cannistraci et al., From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks, 2014

## Subgraph mining

The Wikipedia page on network motifs:

Milo et al., Network motifs: Simple building blocks of complex networks, 2002

Yan, Han, gspan: Graph-based substructure pattern mining, 2002

Kuramochi and Karypis, Finding frequent patterns in a large sparse graph, 2005

Gibson, Kumar, Tomkins, Discovering large dense subgraphs in massive graphs, 2005.

Alon, Yuster, Zwick, Color-coding, 1995.

Ray, Holder, Choudhury, Frequent subgraph discovery in large attributed streaming graphs, 2014.

Pattabiraman et al., Fast algorithms for the maximum clique problem on massive graphs, 2013.

Yang et al., Schemaless and structureless graph querying, 2014.

## Applications

Merritt, Clauset, Social network dynamics in a massive online game: network turnover, non-densification, and team engagement in Halo Reach, 2013.

Chapter 6 of Networks, Crowds, and Markets book

Gale, Kariv, Financial networks.

Zhu, Wilkinson, Shreiber, Pan, Large-scale parallel collaborative filtering for the Netflix prize, Proc. AAIM 2008.

Koren, Bell, Volinsky, Matrix factorization techniques for recommender systems, IEEE Computer, 2009.

Natarajan, Dhillon, Inductive matrix completion for predicting gene-disease associations, 2014.

Singh-Blom et al., Prediction and Validation of Gene-Disease Associations Using Methods Inspired by Social Network Analyses, 2013.

## Graph storage and analysis frameworks

Bronson et al., TAO: Facebook’s distributed graph data store for the social graph, Proc. ATC 2013.

Curtiss et al., Unicorn: a system for searching the social graph, Proc. VLDB 2013.

Gonzales et al., PowerGraph: distributed graph-parallel computation on natural graphs, Proc. OSDI 2012.

Gonzales et al., GraphX: Graph processing in a distributed dataflow framework, Proc. OSDI 2014.

## Graph partitioning

Buluc et al., Recent advances in graph partitioning, 2015

The 10th DIMACS Implementation Challenge: Graph Partitioning and Graph Clustering

## Graph compression & visualization

Boldi, Vigna, The WebGraph Framework I: Compression Techniques, Proc. WWW 2014.

Buehrer, Chellapilla, A scalable pattern mining approach to web graph compression with communities, Proc. WSDM 2008

Hu, Efficient and High-quality force-directed graph drawing.

Ahmed et al., Visualisation and Analysis of the Internet Movie Database, Proc. APVIS 2007

Herr, Ke, Hardy, Borner, Movies and Actors: Mapping the Internet Movie Database, Proc. IV 2007

Freebase US film data analytics using GraphLab