# webspider-crawler-practice

**Repository Path**: explorer_ading/webspider-crawler-practice

## Basic Information

- **Project Name**: webspider-crawler-practice
- **Description**: Dedicated to web crawler practice exercises.
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2019-02-21
- **Last Updated**: 2022-07-01

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# webspider-crawler-practice

## Description
Dedicated to web crawler practice exercises.


## webspider 
### requests spider 
`pip3 install requests bs4`

* [Beautiful Soup 4](https://pypi.org/project/beautifulsoup4/)
	Beautiful Soup has no XPath support; nested tags are searched by chaining `find`/`find_all` calls or with CSS selectors via `select`.
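
A minimal sketch of the requests + Beautiful Soup workflow. The function names `extract_links` and `fetch_links` and the sample HTML are illustrative assumptions, not code from this repository; the built-in `html.parser` backend is used so no extra parser is required.

```python
import requests
from bs4 import BeautifulSoup


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (text, href) pairs for every <a> tag found inside a <li>."""
    soup = BeautifulSoup(html, "html.parser")
    # The CSS selector "li a[href]" matches anchors at any depth under <li>.
    return [(a.get_text(strip=True), a["href"]) for a in soup.select("li a[href]")]


def fetch_links(url: str) -> list[tuple[str, str]]:
    """Download a page and extract its list links (performs a network request)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return extract_links(response.text)


if __name__ == "__main__":
    # Hypothetical sample markup, parsed offline.
    sample = """
    <ul>
      <li><a href="/a">First</a></li>
      <li><span><a href="/b">Second</a></span></li>
    </ul>
    <a href="/c">Outside the list</a>
    """
    print(extract_links(sample))  # [('First', '/a'), ('Second', '/b')]
```

Keeping the parsing step separate from the download makes it possible to try selectors on saved HTML before hitting a live site.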

### selenium 
`pip3 install selenium`

* install chromedriver
`pip3 install chromedriver-binary`
or download chromedriver from the [ChromeDriver downloads page](https://sites.google.com/a/chromium.org/chromedriver/downloads)

* headless browser driver selections
	* PhantomJS
	* NodeJS
	* HtmlUnit


## scrapy 
### python-scrapy 
`pip3 install Scrapy`

### Quick start
* create a project
`scrapy startproject myScrapyProj`
`cd myScrapyProj`
`scrapy genspider example example.com`, which generates an `example.py` in the `myScrapyProj/spiders` folder.

* how to run
`scrapy crawl example`


## reference
* [similar-sites](https://www.similarsites.com)