# GraphMR: Distributed Graph Match Method using MapReduce

This work is a distributed graph match method on large data sets. This work was submitted to IEEE Transactions on Kowledge and Data Engineering (TKDE).

## Data Generation

This work includes a random graph generator tool. The graph generator is an implementation of the random graph model (Erdős and Rényi 1959) in MapReduce. A graph is constructed in a distributed way across a number of cluster nodes. Therefore, we can get large graphs in a scalable manner from this generator. Besides, It enables users to specify various parameters as follows:

• -directed or undirected graphs
• -the number of vertices
• -average of degree
• -the type of attributed graphs
• -vertices only has attributes
• -vertices and edges has attributes
• -the number of distinct vertex attributes
• -the number of distinct edge attributes

### Random Graph Model

There are two standard models for generating random graphs. One is G(n, m) model, where n is the number of vertices and m is the number of edges. This model was proposed by Erdős and Rényi . In order to generate a random graph, this model randomly chooses m edges among nC2 possible edges. Another one is G(n,p), where n is the number of vertices and p is probability of an edge between two vertices among all possible edges. This model was first introduced by E. Gilbert . Our random graph generator provides both models.

## Usages

### How to Build

```\$ tar xzvf graphmr-0.1-SNAPSHOT.tar.gz
\$ cd graphmr-0.1-SNAPSHOT
\$ mvn assembly:assembly -DskipTests
\$ ls -l target/graphmr-0.1-SNAPSHOT.jar
target/graphmr-0.1-SNAPSHOT.jar```

### Options

The following command prints out various options.

```\$ hadoop jar graphmr-0.1-SNAPSHOT.jar gen
usage: Usage: graphmr.generator.GraphGenerator
-a                the number of distinct vertex attributes
-d,--directed          directed graph or undirected (default: undirected)
--debug             debug mode (default: disabled:)
-e                the number of distinct edge attributes
--expo         a exponent value- this option is only used in zipf
-k                average of degree
-l,--labeled      none - no label, v - only vertex label, ve -
vertex and edge labels (default)
--labeldist    random - uniform distribution (default), zipf -
zipf distribution
-m                the number of map tasks
--model        which model: erdos (Erdos and Reiny model),
gilbert (a variation of Erdos and Reiny Model),
and zipf
-o,--out          output filename
-p,--prob         a probability - this option is only used in
gilbert
-r                the number of reduce tasks
-t,--text              the program outputs vertices as text format```

## Examples

generating a undirected graph with 10000 vertices and 5 average degree.

`# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -k 5 10000 -o out`

### generating a undirected graph with 10000 vertices and 10 distinct vertex

`# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -k 5 -a 10 10000 -o out`

generating a undirected graph with 10000 vertices, 10 distinct vertex attributes, and 20 distinct edge attributes.

`# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -k 5 -a 10 -e 20 10000 -o out`

generating a directed graph with 10000 vertices

`# hadoop jar graphmr-0.1-SNAPSHOT.jar gen -d -o out 10000`