# Python Crawler

**Repository Path**: Zwyboat/python-crawler

## Basic Information

- **Project Name**: Python Crawler
- **Description**: A simple Python crawler
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-07-12
- **Last Updated**: 2023-07-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: Python

## README

# Python Crawler

Crawler: a program that automatically fetches data from the internet.

![Image description](https://foruda.gitee.com/images/1689124582198617359/8ec9747e_12491811.png "image-20230711130842892.png")

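## requests: a library for downloading web pages

Install: pip install requests

requests is an elegant and simple Python HTTP library, often used in crawlers to download web page content.

How do you send a request?

#### Sending a request: two methods

requests.get/post(url, params, data, headers, timeout, verify, allow_redirects, cookies)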

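url: the URL of the target page to download

params: a dict that sets the query parameters appended to the URL, for example ?id=123&name=xiaoming

data: a dict or string, generally used to submit data with the post method

headers: sets request headers such as User-Agent and Referer

timeout: the timeout, in seconds

verify: True/False, whether to verify the HTTPS certificate. Defaults to True; a path to a certificate bundle can be passed instead when a custom certificate is needed

allow_redirects: True/False, whether requests should follow redirects. Defaults to True

cookies: cookie data to send along with the request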
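
The parameters above can be sketched roughly as follows. This is a minimal illustration, not part of the original project: the URL, the header values, and the helper names `preview_url`, `fetch`, and `submit` are all made up for the example. `preview_url` only builds the request locally, so the effect of `params` on the URL can be inspected without touching the network.

```python
import requests

# Hypothetical target URL and header values, used only for illustration.
URL = "https://example.com/user"
PARAMS = {"id": 123, "name": "xiaoming"}
HEADERS = {"User-Agent": "Mozilla/5.0", "Referer": "https://example.com/"}


def preview_url(url, params):
    """Return the URL requests would build from url + params, without sending anything."""
    return requests.Request("GET", url, params=params).prepare().url


def fetch(url, params=None, headers=None, timeout=10):
    """Download a page with GET and return its text; raise on HTTP error status codes."""
    response = requests.get(
        url,
        params=params,         # appended to the URL as ?key=value&...
        headers=headers,       # e.g. User-Agent, Referer
        timeout=timeout,       # seconds
        verify=True,           # verify the HTTPS certificate (the default)
        allow_redirects=True,  # follow redirects (the default)
    )
    response.raise_for_status()
    return response.text


def submit(url, data, headers=None, timeout=10):
    """Submit form data with POST and return the response text."""
    response = requests.post(url, data=data, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text


# Safe to run offline: this only prepares the request, it does not send it.
print(preview_url(URL, PARAMS))
```

Calling `fetch(URL, params=PARAMS, headers=HEADERS)` would actually send the request; it is left out of the script body here because example.com is only a placeholder.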

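#### BeautifulSoup: parsing web pages

BeautifulSoup is a library for parsing HTML and XML documents that helps extract the data you need from a web page. It provides a range of search and filter methods that make parsing and extracting data more convenient. Beautiful Soup automatically converts input documents to Unicode and output documents to UTF-8. BeautifulSoup supports the HTML parser in the Python standard library.

BeautifulSoup(html, "html.parser") parses the input as HTML.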
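
As a small sketch of the parsing step, the snippet below runs BeautifulSoup over an inline HTML string and pulls out the title and the links. The HTML content and the helper name `extract_title_and_links` are invented for this example; it assumes the library is installed with pip install beautifulsoup4.

```python
from bs4 import BeautifulSoup

# A small inline HTML document (made up for this example), so nothing has to be downloaded.
HTML = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <a href="/a.html" class="link">First</a>
    <a href="/b.html" class="link">Second</a>
  </body>
</html>
"""


def extract_title_and_links(html):
    """Parse html with the standard-library parser and return (title, [(text, href), ...])."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text()
    # find_all returns every matching tag; class_ filters on the CSS class.
    links = [(a.get_text(), a["href"]) for a in soup.find_all("a", class_="link")]
    return title, links


title, links = extract_title_and_links(HTML)
print(title)
print(links)
```

In a real crawler the `html` argument would typically be the text of a page downloaded with requests rather than a hard-coded string.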