_lhtk_ / b2dsc

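Project introduction

B2DSC (Get the distribution of sequences on chromosomes, based upon BLASTN) is an application built on Mojolicious, the real-time web framework for Perl. Its main purpose is to show, as an image, how sequences (especially repetitive sequences) are distributed along chromosomes.

For example, the approximate distribution of pSc119.2-1 (CCGTTTTGTGGACTATTACTCACCGCTTTGGGGTCCCATAGCTAT) on the 21 chromosomes of wheat:

[image: example]

Currently only Linux is supported.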

Installation

# Clone directly
git clone https://gitee.com/lhtk/b2dsc.git

Alternatively, download the zip archive and extract it.

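Usage

B2DSC is a simple Mojolicious application, so it is started like any other Mojolicious app.

To start it:

# Example
# Run from the b2dsc root directory

# Development and debugging
morbo -l "http://*:8080" -w ./ script/b2dsc

# Or deploy in a production environment
hypnotoad script/b2dsc
# To stop it, use
hypnotoad -s script/b2dsc

You can then open http://localhost:8080/ in a browser.

For other ways to start the app, see the Mojolicious documentation.

For database configuration, see: Configuration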

Dependencies

May need to be installed

JavaScript libraries used

  • jQuery: A fast, small, and feature-rich JavaScript library (v3.1.1).
  • D3: (or D3.js) is a JavaScript library for visualizing data using web standards. (v4).
  • Spectrum.js: The No Hassle JavaScript Colorpicker (v1.8.0).
  • FileSaver

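Configuration

Configuration files live in the conf directory. At present they cover b2dsc and faidx, and all of them are JSON.

On the first run of the project, the key configuration files are initialized automatically, i.e. xxx.json.raw is copied to xxx.json.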
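The first-run initialization described above can be pictured with the following minimal Python sketch. It is not B2DSC's own code (the application is written in Perl), and `init_configs` is a hypothetical helper. It also assumes every `*.json.raw` file is a candidate, whereas the application only initializes its key files.

```python
import shutil
from pathlib import Path


def init_configs(conf_dir):
    """Copy every *.json.raw under conf_dir to *.json unless the target exists.

    Hypothetical helper: B2DSC itself decides which files count as "key"
    configuration files; this sketch simply treats all of them alike.
    """
    created = []
    for raw in sorted(Path(conf_dir).rglob("*.json.raw")):
        target = raw.with_suffix("")  # "b2dsc.json.raw" -> "b2dsc.json"
        if not target.exists():
            shutil.copyfile(raw, target)
            created.append(target)
    return created
```

Because existing `.json` files are left alone, edits made after the first run are not overwritten by a later call.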

.
├── conf
│   ├── b2dsc
│   │   └── species
│   └── faidx
|........

Adding a database automatically

prepare_data4b2dsc.pl can be used to configure a new database. It additionally depends on the bgzip command from HTSlib.

# Usage examples:
#
# For an uncompressed plain-text FASTA file:
./prepare_data4b2dsc.pl -f IWGSC_RefSeq_V1_chr1A.fsa -n wheat_1A -r -t chrom-id-table.tab -b ~/git/b2dsc
#
# Note: -r deletes the original FASTA file supplied by the user; leave the option out if you want to keep the file.
#
# For an ordinary gzip file:
./prepare_data4b2dsc.pl -f IWGSC_RefSeq_V1_chr1A.fsa.gz -z -n wheat_1A -r -t chrom-id-table.tab -b /home/lhtk/git/b2dsc
#
# For a gzip file built with bgzip:
./prepare_data4b2dsc.pl -f IWGSC_RefSeq_V1_chr1A.fsa.gz -bgz -n wheat_1A -t chrom-id-table.tab -b /home/lhtk/sites/mcg

# For details, see the program's help
./prepare_data4b2dsc.pl -h

B2DSC

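Basic configuration

The basic configuration file is conf/b2dsc/b2dsc.json.

Items:

  • days_kept: number of days temporary files are kept.
    • Current value: 7
  • data_dir: temporary file directory, not accessible from the web.
  • public_tmp_dir: temporary file directory, accessible from the web.
  • min_seq_length: minimum sequence length accepted for blastn alignment.
    • Current value: 10
  • max_num_of_seq: maximum number of valid sequences accepted per job.
    • Current value: 20
    • Set to 0 to remove the limit.
  • threads: number of threads per blastn run.
    • Current value: 2
    • Because my machine is weak.
  • blastn_short: sequences of this length or shorter (≤) are run with -task blastn-short.
    • Current value: 30
  • ofmt: blastn output format; currently only format 6 (tabular) is accepted.
    • Must contain at least: "6 sseqid pident qcovhsp sstrand sstart send"; omitting any one of these specifiers causes an error.
  • ofmt_specifiers: used by the main page of the B2DSC service; normally leave it unchanged.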
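Two of the rules above, the required ofmt specifiers and the blastn_short threshold, can be illustrated with a small Python sketch. Both function names are hypothetical and this is not how B2DSC implements the checks internally; it only restates the documented rules in executable form.

```python
# Specifiers that the ofmt setting must contain, per the list above.
REQUIRED_SPECIFIERS = ("sseqid", "pident", "qcovhsp", "sstrand", "sstart", "send")


def missing_specifiers(ofmt):
    """Return the required specifiers absent from an ofmt string."""
    fields = ofmt.split()
    if not fields or fields[0] != "6":
        raise ValueError("only tabular format 6 is accepted")
    present = set(fields[1:])
    return [s for s in REQUIRED_SPECIFIERS if s not in present]


def blastn_task_args(seq, blastn_short=30):
    """Extra blastn arguments for one query under the blastn_short rule (<=)."""
    return ["-task", "blastn-short"] if len(seq) <= blastn_short else []
```

With the current value of 30, a 30-base query gets `-task blastn-short` and a 31-base query does not.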

Known probes configuration

Configuration file: conf/b2dsc/known_probes.json

It is simple: a single JSON object.

# Explanation
# "name given to the database": "job id"

{
    "example1": "k4example1",
    "example2": "k4example2",
    "example3": "k4example3"
}

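# Explanation
# k4example: keep for 'example'

Analyze the known probe sequences that apply to a species with B2DSC. Once the results are ready, change the job_id, which originally starts with t, so that it starts with k (or any character other than t); the job is then no longer deleted automatically.

Every related id must be changed, including the temporary job folder names and the ids inside done.json.

For example:

  1. Change the job_id, e.g. from t58ef2f09 to k58ef2f09. The matching folder names under both data_dir and public_tmp_dir must be renamed.
  2. Under public_tmp_dir, inside the renamed folder (e.g. k58ef2f09), replace every occurrence of t58ef2f09 in done.json with k58ef2f09.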
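The two renaming steps above could be scripted roughly as follows. This is a sketch under assumptions: `keep_job` is a hypothetical helper, and it assumes the job folder is named exactly after the job id and that done.json sits directly inside the public job folder.

```python
from pathlib import Path


def keep_job(job_id, data_dir, public_tmp_dir, prefix="k"):
    """Rename a finished job so that its id no longer starts with 't'.

    Renames the job folder under both directories, then rewrites the old id
    inside done.json (assumed to sit directly in the public job folder).
    """
    if not job_id.startswith("t"):
        raise ValueError(f"{job_id!r} does not start with 't'")
    if not prefix or prefix.startswith("t"):
        raise ValueError("the new prefix must be non-empty and must not start with 't'")
    new_id = prefix + job_id[1:]
    for base in (Path(data_dir), Path(public_tmp_dir)):
        old = base / job_id
        if old.is_dir():
            old.rename(base / new_id)
    done = Path(public_tmp_dir) / new_id / "done.json"
    if done.is_file():
        text = done.read_text(encoding="utf-8")
        done.write_text(text.replace(job_id, new_id), encoding="utf-8")
    return new_id
```

The returned id is the value to put into known_probes.json for the corresponding database.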

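species

Configuration of the reference genome "databases". The files must be placed in the conf/b2dsc/species directory.

Each database has its own configuration file; for example, example.json configures the example database.

Note: at present a database name may contain only word characters, that is, English letters A-Za-z, digits 0-9, and the underscore _.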
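As a quick illustration of the naming rule, here is a minimal Python check. The helper name is hypothetical, and the pattern spells out the ASCII ranges explicitly because Python's `\w` would also accept non-English letters.

```python
import re

# Letters A-Za-z, digits 0-9, and underscore only, as stated above.
DB_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_db_name(name):
    """True if name consists solely of the allowed word characters."""
    return DB_NAME_RE.fullmatch(name) is not None
```

For instance, `wheat_1A` passes while `wheat-1A` does not.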

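Annotated configuration example:

# Real configuration files must not contain comments!
{
    // blast database, required
    "blastdb": {
        "db_dir": "data/b2dsc/blastdb/example", // directory containing the databases
        // One blast database must be built per chromosome
        "chromosome": {
            // One hash per subgenome. Even if the species has no subgenomes, make one up.
            // Here, for instance, a fictional A subgenome is used.
            "A": {
                // chromosome 1A
                "1": [
                    "1A", // name of the blast database
                    "chr1A" // name of the chromosome sequence the blast database was built from
                ],
                "2": [
                    "2A",
                    "chr2A"
                ]
            },
            "B": {
                "1": [
                    "1B",
                    "chr1B"
                ],
                "2": [
                    "2B",
                    "chr2B"
                ]
            }
        }
    },
    // required
    "chromosome_length": {
        // Likewise, one array per "subgenome".
        "A": [
            10000000, // length of chromosome 1A
            12345678 // length of 2A
        ],
        "B": [
            17638475,
            18234567
        ]
    },
    // Distribution data of unknown bases (N) in the chromosomes, required
    // unknown or unspecified
    "N": {
        "dir": "data/b2dsc/Ns/example", // directory
        "chromosome": {
            // Likewise, one hash per "subgenome".
            "A": {
                "1": "1A.tab", // file holding the data for 1A
                "2": "2A.tab"
            },
            "B": {
                "1": "1B.tab",
                "2": "2B.tab"
            }
        }
    },
    // optional
    // Reference sequence, e.g. a centromere-specific repeat
    // Here the simple repeat enriched at plant telomeres is used as an example
    "refSeq": {
        "id": "telomeric", // sequence id
        "desc": "guanosine-rich (T3AG3) telomeric DNA ", // description
        "seq": "TTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGG", // sequence
        "jid": "k4example", // job id; must not start with t.
        "pident": 85, // filter value used when analyzing the reference sequence
        "qcovhsp": 80 // filter value used when analyzing the reference sequence
    }
}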

See example.json.raw and wheat.json.example for reference.

In summary, there are 4 blocks:

  1. blastdb: required
  2. chromosome_length: required
  3. N: required
  4. refSeq: optional
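A parsed species config can be sanity-checked before use. The sketch below is a hypothetical helper, not part of B2DSC. It uses the key names from the annotated example ("blastdb", "chromosome_length", "N", "refSeq") and assumes, from that example, that each subgenome should list the same chromosomes in every block.

```python
def check_species_config(cfg):
    """Return a list of problems found in a parsed species config (a dict)."""
    problems = []
    for key in ("blastdb", "chromosome_length", "N"):
        if key not in cfg:
            problems.append(f"missing required block: {key}")
    if problems:
        return problems
    db_chroms = cfg["blastdb"].get("chromosome", {})
    n_chroms = cfg["N"].get("chromosome", {})
    lengths = cfg["chromosome_length"]
    for sub, chroms in db_chroms.items():
        # Assumption: one length per chromosome of the subgenome.
        if len(lengths.get(sub, [])) != len(chroms):
            problems.append(f"subgenome {sub}: chromosome_length count differs from blastdb")
        # Assumption: one N file per chromosome, keyed the same way.
        if set(n_chroms.get(sub, {})) != set(chroms):
            problems.append(f"subgenome {sub}: N files do not match blastdb chromosomes")
    # refSeq is optional, but its job id must not start with 't'.
    if cfg.get("refSeq", {}).get("jid", "").startswith("t"):
        problems.append("refSeq.jid must not start with 't'")
    return problems
```

An empty list means none of these particular checks failed; it does not prove that the files referenced by the config exist.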
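
Obtaining the distribution of unknown bases

Unknown bases (unknown or unspecified nucleotides) are usually written as N in FASTA format.

utils/find-N.pl can be used to find the distribution of N in a chromosome sequence.

# Running it without arguments prints the usage help
path/to/find-N.pl
# Here path/to stands for the path to find-N.pl

# Usage example
path/to/find-N.pl -i chr1A.fa -o 1A.tab
# Assuming chr1A.fa is the assembled sequence of wheat chromosome 1A.

The output of utils/find-N.pl looks like this: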

2177	3362
17309	18396
26330	26802
34715	35573
43285	44122
52064	52817

There are only two columns, recording the start and end of each run of N.
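To make the two-column format concrete, here is a minimal Python sketch that finds runs of N in a sequence string. It is not a port of find-N.pl: the coordinate convention (1-based, inclusive) and the treatment of lowercase n as unknown are assumptions, since the README does not state either.

```python
import re


def n_runs(seq):
    """Yield (start, end) for each run of N in seq.

    Assumed convention: 1-based, inclusive coordinates; lowercase 'n' counts.
    """
    for match in re.finditer(r"[Nn]+", seq):
        yield match.start() + 1, match.end()


def to_tab(seq):
    """Format the runs as tab-separated lines, one run per line."""
    return "\n".join(f"{start}\t{end}" for start, end in n_runs(seq))
```

For the toy sequence `ACNNNGTnnA`, the runs cover positions 3 to 5 and 8 to 9.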

Faidx

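Each reference genome has its own configuration file, placed in the conf/faidx directory.

Annotated configuration example:

# Real configuration files must not contain comments!
{
    "dir": "data/b2dsc/fasta/example", // directory of the FASTA files
    "chromosome": {
        // Likewise, one hash per "subgenome".
        "A": {
            "1": [
                "chr1A.fa", // FASTA sequence file; a bgzip-compressed file also works.
                "chr1A", // sequence id
                10000000 // sequence length
            ],
            "2": [
                "chr2A.fa",
                "chr2A",
                12345678
            ]
        },
        "B": {
            "1": [
                "chr1B.fa",
                "chr1B",
                17638475
            ],
            "2": [
                "chr2B.fa",
                "chr2B",
                18234567
            ]
        }
    }
}

See example.json.raw and wheat.json.example for reference.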
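In the two annotated examples, the sequence lengths in the faidx config equal the chromosome_length values in the species config. Assuming that is intended, and that chromosome key "1" corresponds to the first array element, "2" to the second, and so on, a hypothetical cross-check could look like this:

```python
def length_mismatches(species_cfg, faidx_cfg):
    """List (subgenome, chromosome, species_length, faidx_length) disagreements.

    Assumption: chromosome key "n" maps to index n-1 of chromosome_length.
    """
    mismatches = []
    for sub, chroms in faidx_cfg["chromosome"].items():
        lengths = species_cfg["chromosome_length"].get(sub, [])
        for number, (_fasta, _seq_id, faidx_len) in chroms.items():
            idx = int(number) - 1
            species_len = lengths[idx] if 0 <= idx < len(lengths) else None
            if species_len != faidx_len:
                mismatches.append((sub, number, species_len, faidx_len))
    return mismatches
```

A species length of `None` in the result means the species config has no entry for that chromosome.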

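Security

The author is a self-taught programmer with incomplete training, so security issues may exist.

Copyright

© 2019 lhtk

License

Artistic License 2.0.

Citation

If you use B2DSC in your work, please cite: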

  • Tao Lang, Guangrong Li, Hongjin Wang, Zhihui Yu, Qiheng Chen, Ennian Yang, Shulan Fu, Zongxiang Tang, Zujun Yang. Physical location of tandem repeats in the wheat genome and application for chromosome identification[J]. Planta, 2019, 249(3): 663-675. DOI:10.1007/s00425-018-3033-4

and/or

  • 郎涛. 小麦及其近缘物种串联重复序列的全基因组发掘与染色体区段鉴定[D]. 电子科技大学, 2019, 14-34