# Python

**Repository Path**: lwk178/Python

## Basic Information

- **Project Name**: Python
- **Description**: Learning Python
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-10-28
- **Last Updated**: 2023-11-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

> Course link: https://www.bilibili.com/video/BV1Db4y1m7Ho?p=7&spm_id_from=pageDriver&vd_source=8993c1ef5f01eab4a49c039b6280eba2

## I. Creating a Scrapy project

### 1. Create the project
Run `scrapy startproject <project_name>`, for example:
``` shell
scrapy startproject scrapy_01
```

### 2. Create the spider file
Change into the project's `spiders` directory and run `scrapy genspider <spider_name> <domain_or_url_to_crawl>`, for example:
``` shell
scrapy genspider baidu www.baidu.com
```

### 3. Run the spider
Run `scrapy crawl <spider_name>`, for example:
``` shell 
scrapy crawl baidu
```

## II. Scrapy project overview

### 1. Project structure
![img.png](temp/imges/img_1.png)
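### 2. Response attributes and methods
* `response.text`: the response body as a string
* `response.body`: the response body as raw bytes
* `response.xpath()`: parses the content and returns a list of selector objects
* `response.xpath(...).extract()`: extracts the `data` value of every selector in the list
* `response.xpath(...).extract_first()`: extracts the data of the first selector in the list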
### 3. Scrapy architecture components
![img.png](temp/imges/img_2.png)
### 4. How Scrapy works
![img.png](temp/imges/img.png)

### 5. Scrapy shell
Start the shell with `scrapy shell <URL>`, for example `scrapy shell www.baidu.com`.
It is useful for debugging.
![img.png](temp/imges/img_3.png)

## III. Scrapy link extractors
![17010049997711701004998979.png](https://fastly.jsdelivr.net/gh/liweikangL/md_img@main/blog_img/17010049997711701004998979.png)
### 1. Create the project
Run `scrapy startproject <project_name>`, for example:
``` shell
scrapy startproject scrapy_readbook_101
```
### 2. Change into the spiders directory
``` shell
cd scrapy_readbook_101/scrapy_readbook_101/spiders 
```
### 3. Create the spider file
Run `scrapy genspider -t crawl <spider_name> <url_to_crawl>`, for example:
``` shell
scrapy genspider -t crawl read https://www.dushu.com/book/1181.html
```