# minda

**Repository Path**: kylintu/minda

## Basic Information

- **Project Name**: minda
- **Description**: somatic SV position evaluation software
- **Primary Language**: Python
- **License**: BSD-3-Clause
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-16
- **Last Updated**: 2026-01-16

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Minda

###### Note: This tool is under active development.

Minda is a tool for evaluating structural variant (SV) callers that

* standardizes VCF records for compatibility with both germline and somatic SV callers,
* benchmarks against a single VCF input file, or
* benchmarks against an ensemble call set created from multiple VCF input files.

## Installation

Clone the repository and install the dependencies via conda:

```
git clone https://github.com/KolmogorovLab/minda
cd minda
conda env create --name minda --file environment.yml
conda activate minda
./minda.py
```

## Quick Usage

Benchmarking several VCFs against a truth set VCF:

```
./minda.py truthset --base truthset.vcf --vcfs caller_1.vcf caller_2.vcf caller_3.vcf --out_dir minda_out
```

Creating an ensemble from several VCFs and benchmarking against the ensemble calls:

```
./minda.py ensemble --vcfs caller_1.vcf caller_2.vcf caller_3.vcf --out_dir minda_out
```

## Inputs and Parameters

### Required

#### Truthset

```
--out_dir         path to out directory
--base            path of base VCF
--tsv | --vcfs    tsv file path -OR- vcf file path(s)
```

#### Ensemble

```
--out_dir         path to out directory
--tsv | --vcfs    tsv file path -OR- vcf file path(s)
--min_support |   minimum number of callers required to support an ensemble call
--conditions      -OR- specific conditions to support a call
```

### Optional

```
--bed             path to bed file for filtering records with BedTool intersect
--filter          filter records by FILTER column; default="['PASS']"
--min_size        filter records by SVLEN in INFO column
--tolerance       maximum allowable bp distance between base and caller breakpoint; default=500
--sample_name     name of sample
--vaf             filter out records below a given VAF threshold
--multimatch      allow more than one record from the same caller VCF to match a single truthset/ensemble record
```

##### VCF Input

Minda standardizes input VCFs by decomposing every SV into start and end records. Records are handled in one of the following two ways:

1. For records having a CHROM:POS pattern in the `ALT` field, the `#CHROM` and `POS` fields are considered the start. Minda then searches the other records for the end record matching the `ALT` field. Alternatively, the `MATEID` from the `INFO` field may be used to find the end record. If no end record is found, the details from the `ALT` field are used to create one.

2. Minda treats all other records as start records. The corresponding end record uses the start `#CHROM`; its `POS` is either calculated by adding the absolute value of `SVLEN` to the start `POS` or taken from the `END` integer in the `INFO` field.

Minda has been tested on VCFs produced by

* Severus
* SAVANA
* nanomonsv
* Sniffles2
* cuteSV
* SVIM
* GRIPSS
* manta
* SvABA.

If you encounter issues with these or other VCF files, please [let us know](https://github.com/KolmogorovLab/minda/issues).

##### TSV Input

The `--tsv` file has one required column and up to three columns. The columns should be as follows: