lewsn2008/LBTSE

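This repository does not declare an open-source license file (LICENSE); before using it, check the project description and the upstream dependencies of its code.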
README
1.  The document index (Doc.idx) keeps information about each document.
It is a fixed-width ISAM (indexed sequential access method) index, ordered by docID.
The information stored in each entry includes a pointer into the repository,
a document length, and a document checksum.
  The URL index (Url.idx) is used to convert URLs into docIDs.
It is a list of URL checksums with their corresponding docIDs, sorted by
checksum. To find the docID of a particular URL, the URL's checksum is
computed and a binary search is performed on the sorted checksum file to find
its docID.

	./DocIndex
		produces Doc.idx, Url.idx, DocId2Url.idx

2.  Sort the URL index and remove duplicate entries:
	sort Url.idx | uniq > Url.idx.sort_uniq

3. Segment the documents into terms (each document is located through its URL):
	./DocSegment Tianwang.raw.2559638448
		produces Tianwang.raw.2559638448.seg

4. Create the forward index (docid-->termid):
	./CrtForwardIdx Tianwang.raw.2559638448.seg > moon.fidx

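5. Sort the forward index. Because sort orders lines according to the current
locale, check LANG and set it explicitly before sorting:
	set | grep "LANG"
	LANG=en; export LANG;
	sort moon.fidx > moon.fidx.sort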

6. Create the inverted index (termid-->docid):
	./CrtInvertedIdx moon.fidx.sort > sun.iidx
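The two lookups described in step 1, plus the deduplication of step 2, can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the layout DocIndex actually writes: the entry fields (`offset`, `length`, `checksum`), their widths, and the use of 64-bit FNV-1a as the URL checksum are hypothetical choices, since the README does not describe the on-disk format of Doc.idx or Url.idx or name the checksum function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical fixed-width Doc.idx entry. Every record has the same size,
// so the entry for docID n sits at byte offset n * sizeof(DocEntry).
struct DocEntry {
    std::uint64_t offset;    // position of the document in the repository
    std::uint32_t length;    // document length in bytes
    std::uint32_t checksum;  // document checksum
};

// Hypothetical Url.idx entry: (URL checksum, docID), kept sorted by checksum.
struct UrlEntry {
    std::uint64_t url_checksum;
    std::uint32_t doc_id;
};

// Assumed URL checksum: 64-bit FNV-1a (the real tool may use another function).
std::uint64_t UrlChecksum(const std::string& url) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : url) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Doc index lookup: entries are fixed width and ordered by docID,
// so no search is needed, only an index computation.
std::optional<DocEntry> LookupDoc(const std::vector<DocEntry>& doc_idx,
                                  std::uint32_t doc_id) {
    if (doc_id >= doc_idx.size()) return std::nullopt;
    return doc_idx[doc_id];
}

// In-memory analogue of step 2 (`sort Url.idx | uniq`):
// sort the entries by checksum and drop exact duplicates.
void SortUniq(std::vector<UrlEntry>& url_idx) {
    auto key = [](const UrlEntry& e) { return std::make_pair(e.url_checksum, e.doc_id); };
    std::sort(url_idx.begin(), url_idx.end(),
              [&](const UrlEntry& a, const UrlEntry& b) { return key(a) < key(b); });
    url_idx.erase(std::unique(url_idx.begin(), url_idx.end(),
                              [&](const UrlEntry& a, const UrlEntry& b) { return key(a) == key(b); }),
                  url_idx.end());
}

// URL -> docID: compute the checksum, then binary-search the sorted index.
std::optional<std::uint32_t> LookupUrl(const std::vector<UrlEntry>& url_idx,
                                       const std::string& url) {
    const std::uint64_t sum = UrlChecksum(url);
    auto it = std::lower_bound(url_idx.begin(), url_idx.end(), sum,
                               [](const UrlEntry& e, std::uint64_t s) { return e.url_checksum < s; });
    if (it == url_idx.end() || it->url_checksum != sum) return std::nullopt;
    return it->doc_id;
}
```

The fixed width is what makes the document index cheap to query: finding an entry is an offset computation. URL checksums are not dense, so the URL index needs a search instead, which is why it has to be sorted first.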

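The README does not say how DocSegment splits text into terms (step 3). Dictionary-based forward maximum matching (FMM) is one common approach for Chinese text, and the sketch below shows it as a swapped-in illustration, not as DocSegment's actual algorithm. It assumes the text is already decoded into Unicode code points (`std::u32string`); the dictionary `dict` and the limit `max_word_len` are hypothetical inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Forward maximum matching (illustrative, not necessarily what DocSegment does):
// at each position, take the longest dictionary word starting there;
// if no dictionary word matches, emit a single character as a term.
std::vector<std::u32string> SegmentFmm(const std::u32string& text,
                                       const std::unordered_set<std::u32string>& dict,
                                       std::size_t max_word_len) {
    std::vector<std::u32string> terms;
    std::size_t pos = 0;
    while (pos < text.size()) {
        // Longest candidate allowed here (at least one character, so we always advance).
        std::size_t len = std::min(std::max<std::size_t>(max_word_len, 1), text.size() - pos);
        // Shrink the candidate until it is a dictionary word or a single character.
        while (len > 1 && dict.count(text.substr(pos, len)) == 0) {
            --len;
        }
        terms.push_back(text.substr(pos, len));
        pos += len;
    }
    return terms;
}
```

Because the matching is greedy, FMM can mis-split ambiguous character sequences; that is a property of this illustrative technique, not a claim about the output of DocSegment.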

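Steps 4 to 6 turn per-document term lists into posting lists. The forward index records which terms each document contains. Sorting then brings every occurrence of a term together, and one pass over the sorted data groups them into termID-->docID lists. The in-memory sketch below assumes integer termIDs and docIDs and a hypothetical (termID, docID) pair layout. It does not reproduce the text formats of moon.fidx or sun.iidx, which the README does not describe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using DocId = std::uint32_t;
using TermId = std::uint32_t;

// Forward index (docid-->termid): the termIDs occurring in each document.
using ForwardIndex = std::map<DocId, std::vector<TermId>>;
// Inverted index (termid-->docid): sorted, de-duplicated docIDs per term.
using InvertedIndex = std::map<TermId, std::vector<DocId>>;

InvertedIndex Invert(const ForwardIndex& fwd) {
    // Flatten into (termID, docID) pairs with the term as the sort key.
    std::vector<std::pair<TermId, DocId>> pairs;
    for (const auto& [doc, terms] : fwd) {
        for (TermId t : terms) pairs.emplace_back(t, doc);
    }
    // Sorting groups all occurrences of each term together (cf. step 5),
    // and unique() drops a term repeated within the same document.
    std::sort(pairs.begin(), pairs.end());
    pairs.erase(std::unique(pairs.begin(), pairs.end()), pairs.end());

    // One pass over the sorted pairs: each run with the same termID
    // becomes that term's posting list, already ordered by docID.
    InvertedIndex inv;
    for (const auto& [t, doc] : pairs) inv[t].push_back(doc);
    return inv;
}
```

Sorting by termID first is what lets the grouping happen in a single pass; the `sort` in step 5 appears to play the same role for the on-disk files, although the real tools work on text files rather than in memory.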
------------------------
Providing service at http://162.105.80.60/TSE/

TSESearch	CGI program for queries
Snapshot	CGI program for page snapshots
https://gitee.com/lewsn2008/LBTSE.git
git@gitee.com:lewsn2008/LBTSE.git