# holocene

**Repository Path**: khazeus/holocene

## Basic Information

- **Project Name**: holocene
- **Description**: holocene for news search
- **Primary Language**: Unknown
- **License**: MulanPSL-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-05-25
- **Last Updated**: 2022-06-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Holocene

## Placement of News Data

The JSONL file containing all crawled news must be placed next to the Holocene root folder, that is, in its parent directory. Relative to the root folder, its path is therefore "../xxx.jsonl".

## Frontend Start

First ensure that Node.js version 12.18.3 is installed. The 'n' version manager can be used to install the correct version.

```
cd holocene-front
npm install
npm run start
```

## Backend Start

```
pip install -r requirements.txt
cd holocene_back
python -u manage.py runserver 8080
```

## Crawler Start

spider_name can be one of APNews, ChinaDaily, FoxNews, GlobalTimes, or USAToday.

```
cd crawlers/Webpage_Crawler
scrapy crawl [spider_name]
```

## Elasticsearch Start

First download the latest Elasticsearch release from the official website https://www.elastic.co/cn/downloads/elasticsearch. Then edit the config/elasticsearch.yml file in the distribution to disable SSL support and all security-related features.

```
cd elasticsearch-[version]
./bin/elasticsearch
```

Wait for Elasticsearch to come online, then insert all the collected data.

```
cd holocene_back
python es_search/es_save.py
```
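
The "wait for Elasticsearch to be online" step can be automated with a small polling helper. The following is a minimal sketch, not part of Holocene: the function names `is_online` and `wait_until_online` are hypothetical, and it assumes Elasticsearch answers plain HTTP on its default port 9200 on the local machine, which should hold once SSL and security are disabled as described above.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_online(url, timeout=2.0):
    """Return True if an HTTP GET to `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed reply: not ready yet.
        return False


def wait_until_online(url, retries=30, delay=2.0, probe=is_online, sleep=time.sleep):
    """Call `probe(url)` up to `retries` times, sleeping `delay` seconds
    between attempts. Return True as soon as the probe succeeds, otherwise
    False once all attempts are used up."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False
```

For example, calling `wait_until_online("http://localhost:9200")` before running `python es_search/es_save.py` would block until the node responds or about a minute has passed. The URL, retry count, and delay are illustrative values and may need adjusting for a given setup.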