# WebCrawler

**Repository Path**: torchcode/spider

## Basic Information

- **Project Name**: WebCrawler
- **Description**: Example projects for learning web crawling
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2017-03-27
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

### A few small web crawler examples  

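Crawling images from mzitu (妹子图):  
Requires:  
requests  
BeautifulSoup  
lxml  
2017-03-01:  
Site crawled: http://www.mzitu.com/all (this URL is no longer accessible, so this version of the crawler no longer works)  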

2017-04-01:  
Changed to scrape images from the site's main page. Two extraction methods are provided:  
1. Regular expressions  
2. The BeautifulSoup module  
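
The two extraction methods can be compared side by side with a minimal sketch. This is not the repository's actual code: `SAMPLE_HTML`, `IMG_SRC_RE`, `image_urls_regex`, and `image_urls_bs4` are hypothetical names, the markup is invented for illustration, and the built-in `html.parser` is assumed in place of lxml so the example runs without extra setup.

```python
import re

# Invented sample markup standing in for a downloaded page (assumption).
SAMPLE_HTML = """
<ul id="pins">
  <li><a href="/1"><img src="http://example.com/a.jpg" alt="first"></a></li>
  <li><a href="/2"><img src="http://example.com/b.jpg" alt="second"></a></li>
</ul>
"""

# Method 1: a regular expression that captures the src attribute of <img> tags.
IMG_SRC_RE = re.compile(r'<img[^>]+src="([^"]+)"')


def image_urls_regex(html):
    """Return every image URL found by the regular expression."""
    return IMG_SRC_RE.findall(html)


def image_urls_bs4(html):
    """Return every image URL found by parsing the page with BeautifulSoup."""
    # Imported here so the regex path works even if bs4 is not installed.
    from bs4 import BeautifulSoup

    # "html.parser" is an assumption; "lxml" can be passed if it is installed.
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.find_all("img", src=True)]
```

In a real crawler the HTML would typically come from something like `requests.get(url).text`. The regex version is short but brittle when attribute quoting or ordering changes, while the parser version tolerates such variations at the cost of an extra dependency.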
  
  
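Scraping jokes (text only) from Qiushibaike (糗事百科):  
Requires:  
Modules:  
pymongo  
BeautifulSoup  
lxml  

Database:  
mongodb  
  
Extracts the text jokes from Qiushibaike and saves them to MongoDB, with deduplication so that the same joke is not stored twice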
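
One way to deduplicate on save is sketched below. This is an illustration under stated assumptions, not the repository's implementation: it assumes a joke is identified by the SHA-1 hash of its whitespace-normalized text, stored as the document `_id`, and written with an upsert. The names `joke_key`, `save_jokes`, and `open_collection`, the `content` field, and the database and collection names are all hypothetical.

```python
import hashlib


def joke_key(text):
    """Return a stable key for a joke: SHA-1 of its whitespace-normalized text."""
    normalized = " ".join(text.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def save_jokes(collection, jokes):
    """Store each joke at most once; return how many were newly inserted."""
    inserted = 0
    for text in jokes:
        # Upsert keyed on the content hash: an existing document is left
        # untouched, a missing one is created with the joke text.
        result = collection.update_one(
            {"_id": joke_key(text)},
            {"$setOnInsert": {"content": text}},
            upsert=True,
        )
        if result.upserted_id is not None:
            inserted += 1
    return inserted


def open_collection(uri="mongodb://localhost:27017"):
    """Connect to a local MongoDB (assumed URI, database, and collection names)."""
    from pymongo import MongoClient  # imported here so the helpers above need no driver

    return MongoClient(uri)["qiushibaike"]["jokes"]
```

Calling `save_jokes(open_collection(), jokes)` repeatedly with overlapping lists would then add only the jokes not already stored. Keying on a hash of the text is one design choice; a unique index on a site-provided joke ID would work as well if the pages expose one.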