# EDGE

Official PyTorch implementation of ["Efficient and Degree-Guided Graph Generation via Discrete Diffusion Modeling"](https://arxiv.org/pdf/2305.04111.pdf).

Our code is developed based on https://github.com/ehoogeboom/multinomial_diffusion. We use the evaluation modules provided by https://github.com/uoguelph-mlrg/GGM-metrics and https://github.com/hheidrich/CELL.

## Environment requirements

```
dgl
prettytable
scikit-learn
tensorboard
tensorflow
tensorflow-gan
torch
torch-geometric
tqdm
wandb
```

plus the dependencies of https://github.com/uoguelph-mlrg/GGM-metrics, https://github.com/hheidrich/CELL, and https://github.com/ehoogeboom/multinomial_diffusion.

## Training your degree sequence model

See `node.ipynb`. Once the model is trained, it is saved to the `./graphs` directory.

## Training script

🌟 **IMPORTANT note on running EDGE on your own datasets**: do not use a large number of diffusion steps for small graphs (fewer than 100 nodes). For such small graph datasets, try diffusion steps in {8, 16, 32, 64}.

### 1. Training template for generic graph datasets

By default, we use an empirical degree sampler, which randomly takes a degree sequence from the training data as $d^0$ to perform degree guidance. You can replace the keyword `empirical` with `neural` in the option `--empty_graph_sampler` if you have trained a neural degree sampler. (An illustrative sketch of the empirical sampler is given at the end of this README.)

```
#!/bin/bash
python train.py \
    --epochs 50000 \
    --num_generation 64 \
    --diffusion_dim 64 \
    --diffusion_steps 128 \
    --device cuda:1 \
    --dataset Ego \
    --batch_size 8 \
    --clip_value 1 \
    --lr 1e-4 \
    --optimizer adam \
    --final_prob_edge 1 0 \
    --sample_time_method importance \
    --check_every 500 \
    --eval_every 500 \
    --noise_schedule linear \
    --dp_rate 0.1 \
    --loss_type vb_ce_xt_prescribred_st \
    --arch TGNN_degree_guided \
    --parametrization xt_prescribed_st \
    --empty_graph_sampler empirical \
    --degree \
    --num_heads 8 8 8 8 1
```

### 2. Training template for large network datasets

```
#!/bin/bash
python train.py \
    --epochs 50000 \
    --num_generation 64 \
    --num_iter 256 \
    --diffusion_dim 64 \
    --diffusion_steps 512 \
    --device cuda:0 \
    --dataset polblogs \
    --batch_size 4 \
    --clip_value 1 \
    --lr 1e-4 \
    --optimizer adam \
    --final_prob_edge 1 0 \
    --sample_time_method importance \
    --check_every 50 \
    --eval_every 50 \
    --noise_schedule linear \
    --dp_rate 0.1 \
    --loss_type vb_ce_xt_prescribred_st \
    --arch TGNN_degree_guided \
    --parametrization xt_prescribed_st \
    --degree \
    --num_heads 8 8 8 8 1
```

Evaluation is run every `eval_every` epochs. You can also re-evaluate a specific checkpoint using the script below.

## Evaluation script

```
python evaluate.py \
    --run_name 2023-05-29_18-29-35 \
    --dataset polblogs \
    --num_samples 8 \
    --checkpoints 5500
```

## Results

Training results can be found in `wandb/{dataset_name}/multinomial_diffusion/multistep/{run_name}`.

## Work in progress

We are still working on integrating the following two features into our code:

* [ ] Even faster sampling by incrementally modifying the graph.
* [ ] Attributed graph generation.
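
## Appendix: empirical degree sampling (illustrative)

The `empirical` setting of `--empty_graph_sampler` described above amounts to drawing a degree sequence uniformly at random from the training graphs and using it as $d^0$ for degree guidance. The sketch below illustrates that idea only; the function name, the dense-adjacency data format, and the variable names are hypothetical and do not reflect this repository's actual API.

```python
import random
import torch

def sample_empirical_degree_sequence(train_graphs):
    """Draw d^0 by picking one training graph uniformly at random
    and returning its degree sequence (sorted for readability).

    `train_graphs` is assumed here to be a list of dense adjacency
    matrices (torch.Tensor of shape [N, N]); this is an illustrative
    assumption, not the repository's actual data format.
    """
    adj = random.choice(train_graphs)    # uniform draw from the training data
    degrees = adj.sum(dim=1).long()      # node degrees of the chosen graph
    return torch.sort(degrees, descending=True).values

# Tiny usage example: a triangle and a 3-node path
triangle = torch.tensor([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=torch.float)
path = torch.tensor([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=torch.float)
print(sample_empirical_degree_sequence([triangle, path]))  # e.g. tensor([2, 2, 2])
```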