# COSGR

**Repository Path**: yuwq/COSGR

## Basic Information

- **Project Name**: COSGR
- **License**: BSD-3-Clause
- **Default Branch**: main

## Statistics

- **Created**: 2024-03-25
- **Last Updated**: 2024-03-25

## COSG in R

Accurate and fast cell marker gene identification with COSG

COSG is a cosine similarity-based method for more accurate and scalable marker gene identification.

* COSG is a general method for cell marker gene identification across different data modalities, e.g., scRNA-seq, scATAC-seq and spatially resolved transcriptome data.
* Marker genes or genomic regions identified by COSG are more indicative and have greater cell-type specificity.
* COSG is ultrafast on large-scale datasets and can identify marker genes for one million cells in less than two minutes.

The method and benchmarking results are described in [Dai et al. (2022)](https://academic.oup.com/bib/advance-article-abstract/doi/10.1093/bib/bbab579/6511197?redirectedFrom=fulltext). The preprint is available on [bioRxiv](https://www.biorxiv.org/content/10.1101/2021.06.15.448484v1).

This is the R version of COSG; the Python version is hosted at https://github.com/genecell/COSG.

### Installation

```
# install.packages('remotes')
remotes::install_github(repo = 'genecell/COSGR')
```

### Usage

Please check out the [vignette](https://github.com/genecell/COSGR/blob/master/vignettes/quick_start.Rmd) and the [PBMC10K tutorial](https://github.com/genecell/COSGR/blob/master/vignettes/pbmc10k_tutorial_cosg.Rmd) to get started.
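As we understand it, COSG scores each gene by its cosine similarity to an "ideal marker" for each cell group, i.e., a vector that is 1 in that group's cells and 0 elsewhere. The base-R sketch below illustrates this idea on a toy matrix. It is a simplified, dense-matrix illustration and not the package's implementation: the function `cosg_sketch()` and the toy data are hypothetical, and the penalty term (squared similarity to the target group, divided by itself plus `mu` times the squared similarities to all other groups, then multiplied by the cosine similarity) reflects our reading of Dai et al. (2022) and may differ in detail from `cosg()`. Use `cosg()` for real analyses.

```r
# Hypothetical, simplified sketch of COSG-style scoring (NOT the COSG package code).
cosg_sketch <- function(expr, groups, mu = 1, n_genes = 5, expressed_pct = NULL) {
  expr <- as.matrix(expr)          # genes x cells, e.g., normalized expression
  groups <- as.factor(groups)      # one group label per cell (column of expr)

  # "Ideal marker" vectors: one 0/1 indicator per group (cells x groups)
  ind <- sapply(levels(groups), function(g) as.numeric(groups == g))

  # Cosine similarity between each gene and each group indicator (genes x groups)
  cos_sim <- (expr %*% ind) / outer(sqrt(rowSums(expr^2)), sqrt(colSums(ind^2)))
  cos_sim[!is.finite(cos_sim)] <- 0  # all-zero genes give 0/0; treat as 0

  # Assumed penalty: cos^2 / (cos^2 + mu * sum of squared cos to other groups) * cos
  sq <- cos_sim^2
  others <- rowSums(sq) - sq       # per gene and group: sum over the other groups
  denom <- sq + mu * others
  score <- ifelse(denom > 0, sq / denom, 0) * cos_sim

  # Optional filter: drop genes detected in too few cells of the target group
  if (!is.null(expressed_pct)) {
    frac <- sweep((expr > 0) %*% ind, 2, colSums(ind), "/")
    score[frac < expressed_pct] <- NA
  }

  # Rank genes within each group; sort() drops NA scores by default
  top <- lapply(colnames(score), function(g) {
    s <- sort(score[, g], decreasing = TRUE)
    names(s)[seq_len(min(n_genes, length(s)))]
  })
  names(top) <- colnames(score)
  list(names = top, scores = score)
}

# Toy data (hypothetical): 5 genes x 12 cells in three groups of four cells
groups <- rep(c("A", "B", "C"), each = 4)
expr <- rbind(
  geneA = c(5, 4, 6, 5, rep(0, 8)),             # expressed only in group A
  rare  = c(9, rep(0, 11)),                     # expressed in a single A cell
  geneB = c(rep(0, 4), 3, 4, 3, 5, rep(0, 4)),  # expressed only in group B
  geneC = c(rep(0, 8), 2, 2, 3, 2),             # expressed only in group C
  house = rep(1, 12)                            # uniform "housekeeping" gene
)

res <- cosg_sketch(expr, groups, mu = 1, n_genes = 2)
res$names$A  # "geneA" "rare"

# Require detection in at least 50% of the target group's cells
res_f <- cosg_sketch(expr, groups, mu = 1, n_genes = 2, expressed_pct = 0.5)
res_f$names$A  # "geneA" "house"

# A larger mu penalizes expression outside the target group more strongly
res10 <- cosg_sketch(expr, groups, mu = 10)
```

In this toy example, raising `mu` shrinks the score of the uniformly expressed `house` gene while leaving the group-exclusive `geneA` unchanged, which is consistent with larger `mu` values favoring more specific markers.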
A quick example using the `pbmc_small` dataset that ships with Seurat:

```
suppressMessages(library(Seurat))
library(COSG)  # the package installed from genecell/COSGR above

data('pbmc_small', package='Seurat')

# Check cell groups:
table(Idents(pbmc_small))
#>
#>  0  1  2
#> 36 25 19

#######
# Run COSG:
marker_cosg <- cosg(
  pbmc_small,
  groups='all',
  assay='RNA',
  slot='data',
  mu=1,
  n_genes_user=100)

#######
# Check the marker genes:
head(marker_cosg$names)
#>       0      1     2
#> 1   CD7 S100A8 MS4A1
#> 2  CCL5   TYMP CD79A
#> 3  GNLY S100A9 TCL1A
#> 4 LAMP1  FCGRT  NT5C
#> 5  GZMA IFITM3 CD79B
#> 6   LCK   LST1 FCER2

head(marker_cosg$scores)
#>           0         1         2
#> 1 0.6391917 0.8954042 0.6922908
#> 2 0.6391267 0.8312083 0.5832425
#> 3 0.6328148 0.8120045 0.5757478
#> 4 0.6164937 0.7755955 0.5533107
#> 5 0.5846589 0.7413060 0.5163446
#> 6 0.5795238 0.7380483 0.5115180

#######
# Run COSG for selected groups only, i.e., '0' and '2':
marker_cosg <- cosg(
  pbmc_small,
  groups=c('0', '2'),
  assay='RNA',
  slot='data',
  mu=1,
  n_genes_user=100)
```

### Tips

1. To identify more specific marker genes, set `mu`, the penalty factor for expression in cells outside the target group, to a larger value, such as `mu=10` or `mu=100`.
2. Set `remove_lowly_expressed=TRUE` to ignore genes that are expressed in only a small fraction of cells in the target group, and use `expressed_pct` to set that fraction (e.g., `expressed_pct=0.1` for 10% of cells). For example:

```
# `seo` is assumed to be a Seurat object with a 'peaks' assay (e.g., scATAC-seq data)
marker_region <- cosg(
  seo,
  groups='all',
  assay='peaks',
  slot='data',
  mu=100,
  n_genes_user=100,
  remove_lowly_expressed=TRUE,
  expressed_pct=0.1
)
```

### Citation

If COSG is useful for your research, please consider citing [Dai, M., Pei, X., Wang, X.-J., 2022. Accurate and fast cell marker gene identification with COSG. Brief. Bioinform. bbab579](https://academic.oup.com/bib/advance-article-abstract/doi/10.1093/bib/bbab579/6511197?redirectedFrom=fulltext).