Download this project, then run:
cd FTServer
dotnet run -c Release
Press [Ctrl+C] to shut down.
Input a full URL to index that page, then search. To move a page to the front of the results, re-index it.
[Word1 Word2 Word3] => the text contains Word1 and Word2 and Word3
["Word1 Word2 Word3"] => the text contains "Word1 Word2 Word3" as a whole phrase
Search [https] or [http] => returns almost all pages
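The two query semantics above can be illustrated with a minimal matcher. This is a sketch of the semantics only; the class and method names below are hypothetical and are not FTServer's API:

```java
import java.util.Arrays;

public class QuerySemantics {
    // [Word1 Word2 Word3] => the text must contain every word (AND search)
    static boolean matchesAll(String text, String query) {
        return Arrays.stream(query.trim().split("\\s+"))
                     .allMatch(text::contains);
    }

    // ["Word1 Word2 Word3"] => the text must contain the exact phrase
    static boolean matchesPhrase(String text, String quotedQuery) {
        String phrase = quotedQuery.replace("\"", "").trim();
        return text.contains(phrase);
    }

    public static void main(String[] args) {
        String text = "full text search server";
        System.out.println(matchesAll(text, "server full"));        // true: both words appear
        System.out.println(matchesPhrase(text, "\"server full\"")); // false: not a contiguous phrase
    }
}
```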
Results are ordered by the id() number of class PageText, in descending order.
A Page has many PageTexts. If you don't need multiple texts, modify Html.getDefaultTexts(Page) so that it returns only one PageText.
The Page.GetRandomContent() method keeps the search-page content always changing; it does not affect the real PageText order.
Use the ID number to control the order instead of loading all pages into memory, or load the top 100 pages into memory and re-order them as you prefer.
search (... String keywords, long startId, long count)
startId => the ID (the id assigned when the PageText was created) to start from; use startId = Long.MaxValue to read from the top in descending order
count => the number of records to read. This is an important parameter: the search speed depends on count, not on how big the data is.
Set the next page's startId to the last id from the search results minus one; search() returns this value directly:
nextpage_startId = search("keywords", startId, count);
// the 'minus one' has already been done inside search()
...
//read the next page
search("keywords", nextpage_startId, count)
Usually nextpage_startId is posted from the client browser when the user reaches the end of the web page; its default value is nextpage_startId = Long.MaxValue. In JavaScript this big number has to be written as a string ("'" + nextpage_startId + "'"), because it exceeds Number.MAX_SAFE_INTEGER.
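The whole startId/count pagination flow can be sketched with an in-memory stand-in. PageText and the search() below are simplified hypothetical versions, not FTServer's real classes; unlike the snippet above, this search() returns the matching records rather than the next id, so the 'minus one' is done by the caller:

```java
import java.util.ArrayList;
import java.util.List;

public class Pagination {
    record PageText(long id, String text) {}

    // A tiny stand-in index, already in descending id order.
    static final List<PageText> INDEX = List.of(
        new PageText(5, "java search"), new PageText(4, "java server"),
        new PageText(3, "python"),      new PageText(2, "java guide"),
        new PageText(1, "java intro"));

    // Return up to `count` matches with id <= startId, highest id first.
    static List<PageText> search(String keyword, long startId, long count) {
        List<PageText> results = new ArrayList<>();
        for (PageText pt : INDEX) {
            if (pt.id() <= startId && pt.text().contains(keyword)) {
                results.add(pt);
                if (results.size() == count) break; // cost depends on `count`, not index size
            }
        }
        return results;
    }

    public static void main(String[] args) {
        long startId = Long.MAX_VALUE;                           // read from the top
        List<PageText> page1 = search("java", startId, 2);       // ids 5, 4
        long nextpageStartId = page1.get(page1.size() - 1).id() - 1; // the 'minus one'
        List<PageText> page2 = search("java", nextpageStartId, 2);   // ids 2, 1
        System.out.println(page1);
        System.out.println(page2);
    }
}
```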
Open
public Page Html.Get(String url);
and set your private website's text:
Page page = new Page();
page.url = url;
page.title = title;
page.text = bodyText;
page... = ...
return page;
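Filled in, a private-site version of this override might look like the following sketch. Page here is a simplified stand-in for FTServer's class, and the hard-coded strings mark where your own data source (database, files, CMS) would go:

```java
public class PrivateSiteIndexer {
    // Simplified stand-in for FTServer's Page class (url/title/text as above).
    static class Page {
        String url, title, text;
    }

    // A minimal Html.Get replacement for a private website: instead of
    // downloading the URL, fill the Page from your own data source.
    static Page get(String url) {
        Page page = new Page();
        page.url = url;
        page.title = "Private Title for " + url; // fetch from your own source
        page.text = "private body text";         // e.g. database, file, CMS
        return page;
    }

    public static void main(String[] args) {
        Page p = get("https://example.com/doc1");
        System.out.println(p.title);
    }
}
```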
Check and, if needed, raise the OS open-file limits:
[user@localhost ~]$ cat /proc/sys/fs/file-max
803882
[user@localhost ~]$ ulimit -a | grep files
open files (-n) 500000
[user@localhost ~]$ ulimit -Hn
500000
[user@localhost ~]$ ulimit -Sn
500000
[user@localhost ~]$
$ vi /etc/security/limits.conf
* hard nofile 500000
* soft nofile 500000
root hard nofile 500000
root soft nofile 500000
//Open port 5066 for the server
[user@localhost ~]$ firewall-cmd --add-port=5066/tcp --permanent
//Stop the OS file indexer (GNOME Tracker) for better performance
[user@localhost ~]$ tracker daemon -k
//Remove its cache, which has a slow database inside
[user@localhost ~]$ rm -rf .cache/tracker/
Transplanted from the Full Text Search Java JSP Version.