# Word-Dict

**Repository Path**: EmpireDesert/Word-Dict

## Basic Information

- **Project Name**: Word-Dict
- **Description**: Build a Chinese word-frequency dictionary using search-engine-style word segmentation
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-09-11
- **Last Updated**: 2021-09-11

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# [Word-Dict](https://github.com/yongzhuo/Word-Dict)

Build a Chinese word-frequency dictionary using search-engine-style word segmentation.

# Usage

* 1. Split the collected corpus (for example, [nlp_chinese_corpus](https://github.com/brightmart/nlp_chinese_corpus)) into individual sentences (keep them short) and put the files in the `corpus` directory (see `corpus_1.txt` for an example).
* 2. Run `python cut_search.py`.
* 3. Handle single-character entries separately so that the result does not get too long.

# Data

* The complete `words_1000w.txt` data is on Baidu Netdisk: [https://pan.baidu.com/s/1I3vydhmFEQ9nuPG2fDou8Q](https://pan.baidu.com/s/1I3vydhmFEQ9nuPG2fDou8Q) (extraction code: rket)
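
The usage steps above (split the corpus into short sentences, segment each one, count frequencies, and treat single characters separately) can be sketched roughly as follows. This is a minimal illustration, not the contents of `cut_search.py`: the function names `split_sentences`, `toy_segment`, and `count_words`, the punctuation set, and the `max_len` default are all assumptions. The toy segmenter uses plain forward maximum matching over a tiny vocabulary as a stand-in; a real search-engine-style segmenter (for instance `jieba.cut_for_search`, if jieba is installed) could be passed in its place.

```python
import re
from collections import Counter

# One sentence = a run of non-terminator characters plus an optional terminator.
# The terminator set is an assumed choice, not taken from the repository.
_SENTENCE = re.compile(r"[^。!?!?;;\n]+[。!?!?;;]?")


def split_sentences(text, max_len=50):
    """Split text into sentences and hard-wrap any longer than max_len."""
    pieces = []
    for sent in _SENTENCE.findall(text):
        sent = sent.strip()
        for i in range(0, len(sent), max_len):
            pieces.append(sent[i:i + max_len])
    return pieces


def toy_segment(sentence, vocab, max_word_len=4):
    """Stand-in segmenter: forward maximum matching against vocab.

    Characters not covered by any vocab word are emitted one by one.
    """
    tokens = []
    i = 0
    while i < len(sentence):
        for n in range(max_word_len, 0, -1):
            cand = sentence[i:i + n]
            if n == 1 or cand in vocab:
                tokens.append(cand)
                i += len(cand)
                break
    return tokens


def count_words(sentences, segment):
    """Count tokens, keeping single characters in a separate counter."""
    words, singles = Counter(), Counter()
    for sent in sentences:
        for tok in segment(sent):
            if not tok.isalnum():  # skip punctuation and whitespace
                continue
            (singles if len(tok) == 1 else words)[tok] += 1
    return words, singles


if __name__ == "__main__":
    vocab = {"今天", "天气", "公园", "我们"}  # hypothetical toy vocabulary
    sents = split_sentences("今天天气好。我们去公园!今天去公园。")
    words, singles = count_words(sents, lambda s: toy_segment(s, vocab))
    print(words.most_common())
    print(singles.most_common())
```

Keeping single characters in their own counter is one possible reading of step 3: they can then be filtered or truncated independently of the multi-character entries.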