# ClusterTAD
**Repository Path**: hi-c_baseline/ClusterTAD
## Basic Information
- **Project Name**: ClusterTAD
- **Description**: https://github.com/BDM-Lab/ClusterTAD
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-03-12
- **Last Updated**: 2026-03-12
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
------------------------------------------------------------------------------------------------------------------------------------
# ClusterTAD : An unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data
------------------------------------------------------------------------------------------------------------------------------------
**Bioinformatics, Data Mining, Machine Learning (BDM) Laboratory,**
**University of Missouri, Columbia MO 65211**
----------------------------------------------------------------------
**Developer:**
Oluwatosin Oluwadare
Department of Computer Science
University of Missouri, Columbia
Email: oeow39@mail.missouri.edu
**Contact:**
Jianlin Cheng, PhD
Department of Computer Science
University of Missouri, Columbia
Email: chengji@missouri.edu
--------------------------------------------------------------------
**1. Content of folders:**
-----------------------------------------------------------
* **executable**: latest **ClusterTAD.jar** version can be downloaded from the **release tab**
* examples: contains example data and outputs generated from ClusterTAD for these datasets
* src: ClusterTAD **Java** and **MATLAB** source codes
* TADs: contains identified topological domains for two mESC and mouse cortex cell type using ClusterTAD
**2. Hi-C Data used in this study:**
-----------------------------------------------------------
In our study, we used the normalized Hi-C matrix processed by Bing Ren's Lab in University of Calfornia, San Diego.
Download the normalized Matrix here : http://chromosome.sdsc.edu/mouse/hi-c/download.html
**3. Input matrix file format:**
-----------------------------------------------------------
The input to ClusterTAD is a tab seperated N by N intra-chromosomal contact matrix derived from Hi-C data, where N is the number of equal-sized regions of a chromosome.
**4. Usage:**
-----------------------------------------------------------
**4.1. Java:**
To run the tool, open command line interface and type: **java -jar ClusterTAD.jar Input_Matrix_file Matrix_Resolution**
Parameters are as follow:
* **Input_Matrix_file** : A tab seperated N by N intra-chromosomal Hi-C contact matrix.
* **Matrix_Resolution** : Contact Matrix Resolution.
**4.2. MATLAB:**
Instructions on how to run the MATLAB source code is given here **_/src/MATLAB source code/_**
**5. Output**
-----------------------------------------------------------
ClusterTAD produces 2 folders in Output folder:
**5.1. Clusters:**
* Contains a *.txt* file that contains the cluster assignment for the diagonal for all the K values considered
**5.2. TADs:**
* Contains the *.txt* files listing the TADs extracted from each clustering and reclustering done.
* Contains the Best TAD identified based on the Quality score, labeled as "BestTAD_[nameofinputfile]_K=.txt".
* Contains a *.txt* file which contains a list of the extracted TAD Quality scores, file name = [nameofinputfile]_TAD_QualityScore_List.
**6. Disclaimer**
-----------------------------------------------------------
The executable software and the source code of ClusterTAD is distributed free of charge as it is to any non-commercial users. The authors hold no liabilities to the performance of the program.
**7. Citations**
-----------------------------------------------------------
Oluwadare, Oluwatosin, and Jianlin Cheng. "ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data." BMC bioinformatics 18.1 (2017): 480.
**8. Common questions**
-----------------------------------------------------------
**8.1. What is the format of the domain output genererated?**
The domain extracted in ClusterTAD are presented in the format **_from.id from.cord to.id to.cord_** where:
* **from.id** : start bin id for a domain.
* **from.cord** : coordinate of the start bin id for a domain based on data Resolution
* **to.id** : end bin id for a domain.
* **to.cord** : coordinate of the end bin id for a domain based on data Resolution
-----------------------------------------------------------