
Mambabeing / SnowGraph

forked from 伍仕骏 / SnowGraph 
MulanPSL-2.0

Automatic Construction and Intelligent Q&A of Software Project Knowledge Graphs

Overview of Basic Features

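  • Online Demo: http://106.75.143.22:3000/

  • Select a software project you are interested in and open its knowledge graph home page

    • The first page you see is the introduction page;
    • Click the "use it" tab on the left side of the page to see the list of all software project knowledge graphs currently deployed;
    • Click a project to open its knowledge graph home page;
    • The home page first shows basic statistics of the knowledge graph: how many entities of each type it contains, and how many relationships of each type;
    • The home page visualizes these statistics as a chord diagram: each arc on the circle represents the number of entities of one type, and each chord between two arcs represents the number of relationships between those two entity types.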
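  • Intelligent code search

    • Click the "Intelligent Q&A" tab at the top of the page to open the intelligent code search page;
    • Enter a natural-language query in the search box; the system finds the related code elements (classes, interfaces, methods, etc.) and shows the dependency graph among them;
    • English queries are supported by default; for Chinese knowledge graphs (those whose names end with "-chinese"), Chinese queries are also supported;
    • How code search works: a candidate set of code elements is found by matching the keywords in the query against the identifiers of code elements and the keywords in their comments; then, based on how close the candidates are to each other in the dependency graph, the most suitable subgraph is selected as the search result.
    • Reference: Graph Embedding Based Code Search in Software Project, Internetware'18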
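  • Visual browsing of the knowledge graph

    • Click an entity node, and the right side of the page shows the properties of that entity;
    • Click a property to expand its full text;
    • After selecting an entity node, choose a relationship type of interest at the upper right of the page to browse the other entity nodes linked to it by that relationship type;
    • For example, to check whether a class has related documents, select the "codeMention" relationship type;
    • If there are many new nodes to display, each click on the "expand" button shows the next few of them according to a built-in priority, until all of them have been shown.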
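  • Intelligent document search

    • Click the "Semantic Search" tab at the top of the page to open the intelligent document search page;
    • Enter a natural-language query in the search box, and the system finds the related documents.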
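The two-stage code search principle described above (keyword matching to collect candidates, then choosing candidates that are close together in the dependency graph) can be illustrated with a small plain-Java sketch. This is a simplified stand-in, not the graph-embedding method of the Internetware'18 paper: the class `CodeSearchSketch`, the camelCase tokenizer, the "shares at least one token" candidate rule, the undirected breadth-first distance, the `maxHops` threshold, and the toy elements in `demo()` are all hypothetical choices made for this example.

```java
import java.util.*;

/**
 * Hypothetical two-stage code search sketch (not the Internetware'18 method):
 * 1) keyword matching between the query and identifier/comment tokens,
 * 2) keeping the largest group of candidates that are close in the dependency graph.
 */
public class CodeSearchSketch {

    // Split camelCase identifiers and free text into lowercase tokens.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        String spaced = text.replaceAll("([a-z0-9])([A-Z])", "$1 $2");
        for (String t : spaced.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Stage 1: an element is a candidate if its name or comment shares a token with the query.
    static List<String> candidates(String query, Map<String, String> elements) {
        Set<String> queryTokens = tokenize(query);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : elements.entrySet()) {
            Set<String> overlap = tokenize(e.getKey() + " " + e.getValue());
            overlap.retainAll(queryTokens);
            if (!overlap.isEmpty()) result.add(e.getKey());
        }
        return result;
    }

    // Record a dependency in both directions (distance is treated as undirected here).
    static void link(Map<String, Set<String>> deps, String a, String b) {
        deps.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        deps.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Breadth-first search: hop distance from `start` to every reachable element.
    static Map<String, Integer> bfs(String start, Map<String, Set<String>> deps) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            for (String next : deps.getOrDefault(cur, Collections.<String>emptySet())) {
                if (!dist.containsKey(next)) {
                    dist.put(next, dist.get(cur) + 1);
                    queue.add(next);
                }
            }
        }
        return dist;
    }

    // Stage 2: for each candidate, gather the candidates within maxHops; keep the largest group.
    static List<String> bestGroup(List<String> cands, Map<String, Set<String>> deps, int maxHops) {
        List<String> best = new ArrayList<>();
        for (String c : cands) {
            Map<String, Integer> dist = bfs(c, deps);
            List<String> group = new ArrayList<>();
            for (String other : cands) {
                Integer d = dist.get(other);
                if (d != null && d <= maxHops) group.add(other);
            }
            if (group.size() > best.size()) best = group;
        }
        return best;
    }

    // Toy data invented for illustration: element name -> comment text, plus a few dependencies.
    static List<String> demo() {
        Map<String, String> elements = new LinkedHashMap<>();
        elements.put("IndexWriter", "Adds documents to an index");
        elements.put("Document", "A set of fields");
        elements.put("QueryParser", "Parses a query string");
        elements.put("IndexUtils", "Unrelated helper");

        Map<String, Set<String>> deps = new HashMap<>();
        link(deps, "IndexWriter", "Document");
        link(deps, "QueryParser", "Document");

        List<String> cands = candidates("add document to index", elements);
        return bestGroup(cands, deps, 2);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [IndexWriter, Document]
    }
}
```

In this toy run, `IndexUtils` also matches the keyword "index" but is not connected to the other candidates, so the dependency-closeness stage drops it.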

Build and Deployment

Compiling and Building

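The system depends on:

  1. Java 1.8+ (for automatic knowledge graph construction and for running the backend server)
  2. Node.js (for running the frontend server)
  3. Maven 3.2+ (optional, for building from source)
  4. Python 3 (optional, for preprocessing Word documents)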

You can build from source with Maven:

mvn package

Alternatively, download the precompiled jar here.

Data Preparation

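  1. Java source code

    Extract the entire project's source code into a single folder.

  2. Git repository data

    Provide the project's .git folder.

    (For handling SVN repository data, see the FAQ.)

  3. HTML documents

    Put them all in one folder. For .docx documents, you can use this Python script to convert them to HTML first.

  4. PPTX presentations

    Put them all in one folder.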
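Before running the generator, it can help to confirm that the data folders match the layout described above. The following is a hypothetical stand-alone check, not part of intellide-graph: the class `DataFolderCheck` is invented for illustration, the `src`, `.git`, `html` and `pptx` subfolders under one data root are assumptions borrowed from the sample configuration, and the `.git` test only looks for the standard `HEAD` file and the `objects/` and `refs/` directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.stream.Stream;

/**
 * Hypothetical pre-flight check for the data folders described above.
 * It only inspects the folder layout; it is not part of intellide-graph.
 */
public class DataFolderCheck {

    // True if `dir` exists and contains (at any depth) a file whose name ends with `ext`.
    static boolean containsExtension(Path dir, String ext) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> files = Files.walk(dir)) {
            return files.anyMatch(p -> Files.isRegularFile(p)
                    && p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(ext));
        }
    }

    // A .git directory normally holds a HEAD file and the objects/ and refs/ directories.
    static boolean looksLikeGitDir(Path gitDir) {
        return Files.isRegularFile(gitDir.resolve("HEAD"))
                && Files.isDirectory(gitDir.resolve("objects"))
                && Files.isDirectory(gitDir.resolve("refs"));
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout: <root>/src, <root>/.git, <root>/html, <root>/pptx (as in the sample config).
        Path root = Paths.get(args.length > 0 ? args[0] : "E:/data");
        System.out.println("src contains .java files:  " + containsExtension(root.resolve("src"), ".java"));
        System.out.println(".git looks like a git dir: " + looksLikeGitDir(root.resolve(".git")));
        System.out.println("html contains .html files: " + containsExtension(root.resolve("html"), ".html"));
        System.out.println("pptx contains .pptx files: " + containsExtension(root.resolve("pptx"), ".pptx"));
    }
}
```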

Automatic Knowledge Graph Construction

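  1. Write a YAML configuration file

    Create a .yml file in any directory and configure in it: (1) the output folder of the knowledge graph; (2) which knowledge extraction modules to run; (3) the input data path for each of these modules. A sample configuration: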

    graphDir: E:/graph.db  # output folder of the knowledge graph; for Chinese support, this path must end with "-chinese"
    
    # Run the following data parsing plugins in order
    cn.edu.pku.sei.intellide.graph.extraction.java.JavaExtractor: E:/data/src
    cn.edu.pku.sei.intellide.graph.extraction.git.GitExtractor: E:/data/.git
    cn.edu.pku.sei.intellide.graph.extraction.html.HtmlExtractor: E:/data/html
    cn.edu.pku.sei.intellide.graph.extraction.pptx.PptxExtractor: E:/data/pptx
    
    # Run the following knowledge linking and mining plugins in order
    cn.edu.pku.sei.intellide.graph.extraction.tokenization.TokenExtractor:
    cn.edu.pku.sei.intellide.graph.extraction.code_mention.CodeMentionExtractor:
    cn.edu.pku.sei.intellide.graph.extraction.doc_link.DocLinkExtractor:
  2. Run the following command to generate the knowledge graph automatically:

    java -jar intellide-graph.jar -gen {yml_config_path}

    When it finishes, a knowledge graph in Neo4j graph database format is generated in the output folder specified in the configuration file.
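The sample configuration is a flat mapping from extractor class names to input paths, with `graphDir` as the one special key, and its comments say the plugins run in the listed order. The sketch below shows one way such a file could be read into an insertion-ordered map. It is a hypothetical illustration, not intellide-graph's own config loader: the class `GenConfigSketch` and its methods are invented here, and it handles only this flat `key: value` subset of YAML. Splitting on `": "` (colon plus space) keeps Windows paths such as `E:/data/src` intact, and `isChineseGraph` encodes the "-chinese" suffix rule from the comment on `graphDir`.

```java
import java.util.*;

/**
 * Hypothetical reader for the flat subset of YAML used in the sample generation config:
 * "graphDir: path" plus ordered "extractor.ClassName: inputPath" entries.
 * Illustration only; not intellide-graph's own config loader.
 */
public class GenConfigSketch {

    // Parse "key: value" and "key:" lines into an insertion-ordered map (order = run order).
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or full-line comment
            }
            int comment = line.indexOf(" #");
            if (comment >= 0) {
                line = line.substring(0, comment).trim(); // drop a trailing "  # ..." comment
            }
            // Split on ": " (colon + space) so Windows paths like E:/data/src stay intact.
            int sep = line.indexOf(": ");
            if (sep >= 0) {
                entries.put(line.substring(0, sep).trim(), line.substring(sep + 2).trim());
            } else if (line.endsWith(":")) {
                entries.put(line.substring(0, line.length() - 1).trim(), ""); // plugin without input path
            }
        }
        return entries;
    }

    // The sample config says Chinese support requires graphDir to end with "-chinese".
    static boolean isChineseGraph(String graphDir) {
        return graphDir.replaceAll("[/\\\\]+$", "").endsWith("-chinese");
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "graphDir: E:/graph.db  # output folder",
                "# data parsing plugins, run in order",
                "cn.edu.pku.sei.intellide.graph.extraction.java.JavaExtractor: E:/data/src",
                "cn.edu.pku.sei.intellide.graph.extraction.git.GitExtractor: E:/data/.git",
                "cn.edu.pku.sei.intellide.graph.extraction.tokenization.TokenExtractor:");
        Map<String, String> cfg = parse(sample);
        String graphDir = cfg.remove("graphDir");
        System.out.println("graphDir = " + graphDir + ", chinese = " + isChineseGraph(graphDir));
        for (Map.Entry<String, String> e : cfg.entrySet()) {
            String input = e.getValue().isEmpty() ? "(no input path)" : e.getValue();
            System.out.println("run " + e.getKey() + " <- " + input);
        }
    }
}
```

The insertion-ordered `LinkedHashMap` is what preserves the "run in order" semantics stated in the sample's comments; a plain `HashMap` would lose it.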

Starting the Web Service

  1. Prepare the knowledge graphs

    Put all the knowledge graph folders you want to serve under a single directory, for example:

    E:/graphs/lucene
    E:/graphs/jfreechart
    E:/graphs/poi
    ...

    Create a JSON file in any directory that describes these knowledge graphs, for example:

    [
        {"name": "lucene", "description": "apache-lucene, a java library for text indexing"},
        {"name": "jfreechart", "description": "jfreechart, a java library for drawing diagrams"},
        {"name": "poi", "description": "apache-poi, a java library for editing Microsoft Office files"}
    ]

    Edit the BOOT-INF/classes/application.properties file inside intellide-graph.jar, for example:

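    server.port=8004
    
    graphDir=E:/graphs/
    # temporary file storage path
    dataDir=E:/tmp/
    # knowledge graph description file
    infoDir=E:/graphs/graphs.json

    In .properties files, # marks a comment only as the first non-blank character of a line, so comments go on their own lines rather than after a value.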
  2. Run the following command to start the backend server:

    java -Xms1024m -Xmx4096m -XX:MaxNewSize=2048m -jar intellide-graph.jar -exec

    (PermGen no longer exists on Java 8+, so no -XX:MaxPermSize setting is needed.)
  3. Start the frontend server

    Frontend project: woooking/snowview (intelli-graph branch)

    Configure the backend server URL in src/config.ts

    Install dependencies: npm install

    Start the frontend server: npm start

  4. Open in a browser:

    http://localhost:3000/
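The `application.properties` settings above are easiest to get right with the file format's rules in mind. Assuming the backend reads this file with `java.util.Properties` semantics (the `BOOT-INF/classes` path suggests a Spring Boot jar, whose properties loader follows that format), the sketch below shows that an inline `# ...` after a value is kept as part of the value. It also adds a hypothetical helper, `resolveGraphs`, which assumes each `name` in the JSON description file matches a folder under `graphDir` (as in the `E:/graphs/lucene` example); that mapping is an assumption, not confirmed backend behavior, and the class `BackendConfigSketch` is invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;

public class BackendConfigSketch {

    // Load properties text using standard java.util.Properties parsing rules.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Hypothetical: assume each graph "name" from the JSON description is a folder under graphDir.
    static Map<String, Path> resolveGraphs(Properties props, List<String> names) {
        Path root = Paths.get(props.getProperty("graphDir").trim());
        Map<String, Path> graphs = new LinkedHashMap<>();
        for (String name : names) {
            graphs.put(name, root.resolve(name));
        }
        return graphs;
    }

    public static void main(String[] args) {
        Properties good = load("server.port=8004\n# temporary files\ndataDir=E:/tmp/\ngraphDir=E:/graphs/\n");
        Properties bad = load("dataDir= E:/tmp/  # temporary files\n");
        System.out.println(good.getProperty("dataDir")); // prints E:/tmp/
        System.out.println(bad.getProperty("dataDir"));  // prints E:/tmp/  # temporary files
        System.out.println(resolveGraphs(good, Arrays.asList("lucene", "jfreechart", "poi")));
    }
}
```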
Mulan Permissive Software License,Version 2 (Mulan PSL v2)
January 2020 http://license.coscl.org.cn/MulanPSL2

Your reproduction, use, modification and distribution of the Software shall be subject to Mulan PSL v2 (this License) with the following terms and conditions:

0. Definition

Software means the program and related documents which are licensed under this License and comprise all Contribution(s).

Contribution means the copyrightable work licensed by a particular Contributor under this License.

Contributor means the Individual or Legal Entity who licenses its copyrightable work under this License.

Legal Entity means the entity making a Contribution and all its Affiliates.

Affiliates means entities that control, are controlled by, or are under common control with the acting entity under this License, ‘control’ means direct or indirect ownership of at least fifty percent (50%) of the voting power, capital or other securities of controlled or commonly controlled entity.

1. Grant of Copyright License

Subject to the terms and conditions of this License, each Contributor hereby grants to you a perpetual, worldwide, royalty-free, non-exclusive, irrevocable copyright license to reproduce, use, modify, or distribute its Contribution, with modification or not.

2. Grant of Patent License

Subject to the terms and conditions of this License, each Contributor hereby grants to you a perpetual, worldwide, royalty-free, non-exclusive, irrevocable (except for revocation under this Section) patent license to make, have made, use, offer for sale, sell, import or otherwise transfer its Contribution, where such patent license is only limited to the patent claims owned or controlled by such Contributor now or in future which will be necessarily infringed by its Contribution alone, or by combination of the Contribution with the Software to which the Contribution was contributed. The patent license shall not apply to any modification of the Contribution, and any other combination which includes the Contribution. If you or your Affiliates directly or indirectly institute patent litigation (including a cross claim or counterclaim in a litigation) or other patent enforcement activities against any individual or entity by alleging that the Software or any Contribution in it infringes patents, then any patent license granted to you under this License for the Software shall terminate as of the date such litigation or activity is filed or taken.

3. No Trademark License

No trademark license is granted to use the trade names, trademarks, service marks, or product names of Contributor, except as required to fulfill notice requirements in section 4.

4. Distribution Restriction

You may distribute the Software in any medium with or without modification, whether in source or executable forms, provided that you provide recipients with a copy of this License and retain copyright, patent, trademark and disclaimer statements in the Software.

5. Disclaimer of Warranty and Limitation of Liability

THE SOFTWARE AND CONTRIBUTION IN IT ARE PROVIDED WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL ANY CONTRIBUTOR OR COPYRIGHT HOLDER BE LIABLE TO YOU FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO ANY DIRECT, OR INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING FROM YOUR USE OR INABILITY TO USE THE SOFTWARE OR THE CONTRIBUTION IN IT, NO MATTER HOW IT’S CAUSED OR BASED ON WHICH LEGAL THEORY, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

6. Language

THIS LICENSE IS WRITTEN IN BOTH CHINESE AND ENGLISH, AND THE CHINESE VERSION AND ENGLISH VERSION SHALL HAVE THE SAME LEGAL EFFECT. IN THE CASE OF DIVERGENCE BETWEEN THE CHINESE AND ENGLISH VERSIONS, THE CHINESE VERSION SHALL PREVAIL.

END OF THE TERMS AND CONDITIONS

How to Apply the Mulan Permissive Software License,Version 2 (Mulan PSL v2) to Your Software

To apply the Mulan PSL v2 to your work, for easy identification by recipients, you are suggested to complete following three steps:

  1. Fill in the blanks in following statement, including insert your software name, the year of the first publication of your software, and your name identified as the copyright owner;
  2. Create a file named "LICENSE" which contains the whole context of this License in the first directory of your software package;
  3. Attach the statement to the appropriate annotated syntax at the beginning of each source file.

Copyright (c) [Year] [name of copyright holder]
[Software Name] is licensed under Mulan PSL v2.
You can use this software according to the terms and conditions of the Mulan PSL v2.
You may obtain a copy of Mulan PSL v2 at:
http://license.coscl.org.cn/MulanPSL2
THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
See the Mulan PSL v2 for more details.

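About

This project is led by the Institute of Software Engineering at Peking University. It designs and implements three tools: automatic construction of software project knowledge graphs, intelligent question answering (QA) based on software knowledge graphs, and related-artifact recommendation based on software knowledge graphs.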