# webspider-crawler-practice

**Repository Path**: explorer_ading/webspider-crawler-practice

## Basic Information

- **Project Name**: webspider-crawler-practice
- **Description**: Dedicated to web crawler practice exercises.
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2019-02-21
- **Last Updated**: 2022-07-01

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# webspider-crawler-practice

## Description

Dedicated to web crawler practice exercises.

## webspider

### requests spider

`pip3 install requests bs4`

* [Beautifulsoup4](https://pypi.org/project/beautifulsoup4/): BeautifulSoup does support nested tag search, either by chaining `find()`/`find_all()` calls or with CSS selectors via `select()`. What it lacks is XPath support (use `lxml` for XPath queries).

### selenium

`pip3 install selenium`

* Install chromedriver with `pip3 install chromedriver-binary`, or download it from [here](https://sites.google.com/a/chromium.org/chromedriver/downloads).
* Headless browser driver options:
  * PhantomJS
  * NodeJS
  * HtmlUnit

## scrapy

### python-scrapy

`pip3 install Scrapy`

### Quick start

* Create a project:
  * `scrapy startproject myScrapyProj`
  * `cd myScrapyProj`
  * `scrapy genspider example example.com`, which generates `example.py` in the `myScrapyProj/spiders` folder.
* Run the spider: `scrapy crawl example`

## reference

* [similar-sites](https://www.similarsites.com)
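For the requests spider section, here is a minimal sketch of fetching a page with `requests` and locating nested tags with BeautifulSoup, both by chaining `find_all()`/`find()` and with a CSS selector via `select()`. The helper names (`fetch_html`, `extract_links`, `extract_links_css`), the `div.item` markup in `SAMPLE`, and the User-Agent string are illustrative assumptions, not part of this repository.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second link is nested one level deeper (inside <span>).
SAMPLE = """<html><body>
<div class="item"><a href="/a">First</a></div>
<div class="item"><span><a href="/b">Second</a></span></div>
<div class="other"><a href="/c">Skip</a></div>
</body></html>"""


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with requests and return its decoded text."""
    # Assumed User-Agent; some sites reject the default python-requests one.
    headers = {"User-Agent": "Mozilla/5.0 (webspider-crawler-practice)"}
    resp = requests.get(url, headers=headers, timeout=timeout)
    resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return resp.text


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (text, href) pairs for the first link inside each <div class="item">."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    # Nested lookup by chaining: find each container, then search inside it.
    # find() searches all descendants, so the <a> inside <span> is found too.
    for item in soup.find_all("div", class_="item"):
        link = item.find("a")
        if link is not None and link.get("href"):
            pairs.append((link.get_text(strip=True), link["href"]))
    return pairs


def extract_links_css(html: str) -> list[str]:
    """Same nested lookup as a CSS descendant selector via select()."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("div.item a[href]")]


if __name__ == "__main__":
    print(extract_links(SAMPLE))
    print(extract_links_css(SAMPLE))
```

In practice you would pass the result of `fetch_html(url)` to either extractor; the offline `SAMPLE` string just keeps the parsing step easy to try.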