# bioqc_geo

**Repository Path**: mirrors_grst/bioqc_geo

## Basic Information

- **Project Name**: bioqc_geo
- **Description**: Supplementary material for "Tissue heterogeneity is prevalent in gene expression studies"
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-12-01
- **Last Updated**: 2026-05-23

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Tissue heterogeneity is prevalent in gene expression studies


> Gregor Sturm, Markus List and Jitao David Zhang. Tissue heterogeneity is prevalent in gene expression studies

The source code in this project can be used to reproduce the results
described in the paper. 

Running the pipeline will generate the figures and supplementary information from the paper. 
The supplementary information is additionally available as a website on 
[grst.github.io/bioqc_geo](https://grst.github.io/bioqc_geo). 

## Running the pipeline
Short version:
```
conda install snakemake
git clone git@github.com:grst/bioqc_geo.git
cd bioqc_geo
snakemake --use-conda
```

For details, see below.

### Prerequisites
This pipeline uses [conda](https://conda.io/miniconda.html) to install all dependencies and
[Snakemake](https://snakemake.readthedocs.io/en/stable/) to orchestrate the analyses.

1. **Download and install [Miniconda](https://conda.io/miniconda.html)**
2. **Install snakemake**
```
conda install snakemake
```

3. **Clone this repo.** We use a [git submodule](https://git-scm.com/docs/git-submodule) to import
the source code for the [immundeconv](https://github.com/grst/immunedeconv) R package.
```
git clone git@github.com:grst/bioqc_geo.git
```

### Run the pipeline
To perform all computations and to generate a HTML report with
[bookdown](https://bookdown.org/yihui/bookdown/) invoke
the corresponding `Snakemake` target:

```
snakemake --use-conda book
```

Make sure to use the `--use-conda` flag to tell Snakemake to download all dependencies from Anaconda.org.

The pipeline will generate a `results` folder which will contain
the rendered supplementary information as PDF and HTML documents, 
the figures a detailed result file with heterogeneity results
for all tested samples. 


### Performance and caching
Building the entire project can take a long time (multiple hours).
You can speed up the build process by enabling parallel processing:

```
snakemake --use-conda --cores 16
```

Up to 16 cores will lead to a speedup, most of the pipeline is sequential,
though.

**Memory requirements**: You need about 4GB of memory per core and at least
16GB of total memory to run the pipeline.

To speed up repetitive builds, `bookdown` will automatically create caches.
To remove all caches and results, use `snakemake wipe`.

### Useful Snakemake targets
Have a look at the `Snakefile`, it is self-explanatory.

A list of the most useful targets
```
snakemake --use-conda book       # generate a HTML-book in `results/book`
snakemake --use conda            # default target (= book)
snakemake clean                  # cleans the HTML book
snakemake wipe                   # cleans everything, including all caches.
```

### Preprocessed data
This pipeline makes use of preprocesse BioQC results.
Downloading the entire GEO and running BioQC on all samples
takes a lot of computational resources. Therefore,
we provide pre-calculated intermediate results, that
are used by this pipeline.

If you are interested in reproducing these files and
building the BioQC-GEO database from scratch, hava a look
at [grst/BioQC_GEO_analysis](https://github.com/grst/BioQC_GEO_analysis).