1. Build the document index and the URL index.

   The document index (Doc.idx) keeps information about each document. It is a fixed-width ISAM (indexed sequential access method) index, ordered by docID. Each entry stores a pointer into the repository, the document length, and a document checksum.

   The URL index (Url.idx) is used to convert URLs into docIDs. It is a list of URL checksums with their corresponding docIDs, sorted by checksum. To find the docID of a particular URL, the URL's checksum is computed and a binary search is performed on the checksum file.

     ./DocIndex

   This produces Doc.idx, Url.idx, and DocId2Url.idx.

2. Sort Url.idx and remove duplicate entries.

     sort Url.idx|uniq > Url.idx.sort_uniq

3. Segment each document into terms (locating each document by its URL).

     ./DocSegment Tianwang.raw.2559638448

   This produces Tianwang.raw.2559638448.seg.

4. Create the forward index (docid-->termid).

     ./CrtForwardIdx Tianwang.raw.2559638448.seg > moon.fidx

5. Set the locale, then sort the forward index.

     # set | grep "LANG"
     LANG=en; export LANG;
     sort moon.fidx > moon.fidx.sort

6. Create the inverted index (termid-->docid).

     ./CrtInvertedIdx moon.fidx.sort > sun.iidx

------------------------
Providing service at http://162.105.80.60/TSE/

TSESearch   CGI program for query
Snapshot    CGI program for page snapshot
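
The URL-to-docID lookup described in step 1 can be sketched as follows. This is a minimal in-memory illustration in Python, not the code used by DocIndex: the function names are hypothetical, CRC-32 stands in for whatever checksum the real tool computes, and the index is held in two parallel lists rather than read from the Url.idx file.

```python
import bisect
import zlib


def url_checksum(url: str) -> int:
    # Assumption: CRC-32 is a stand-in for the checksum used by the real tool.
    return zlib.crc32(url.encode("utf-8"))


def build_url_index(urls):
    """Build a URL index sorted by checksum.

    The docID of each URL is assumed to be its position in `urls`.
    Returns two parallel lists: sorted checksums and their docIDs.
    """
    entries = sorted((url_checksum(u), doc_id) for doc_id, u in enumerate(urls))
    checksums = [checksum for checksum, _ in entries]
    doc_ids = [doc_id for _, doc_id in entries]
    return checksums, doc_ids


def lookup_doc_id(checksums, doc_ids, url):
    """Binary-search the sorted checksums; return the docID or None."""
    target = url_checksum(url)
    i = bisect.bisect_left(checksums, target)
    if i < len(checksums) and checksums[i] == target:
        return doc_ids[i]
    return None
```

Keeping the entries sorted by checksum is what makes the binary search possible, which is why the sort in step 2 comes before any lookup against the URL index.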