# crawlHouse

**Repository Path**: ruiqunwang/crawlHouse

## Basic Information

- **Project Name**: crawlHouse
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-09-04
- **Last Updated**: 2021-09-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Crawling with two methods: selenium and AJAX analysis

> Crawl target: http://gs.nnfcxx.com/index.php?s=/home1/index/sel/postid/商品房.html

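## Method 1: selenium

The main steps are:
* Open the project page to be crawled
* Click each building and unit
* Get the data from the content that is loaded dynamically after each click
* Parse the data with XPath
* Save the results to an Excel file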
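
The steps above could be sketched roughly as follows. This is only an illustration and not the code from this repository: the XPath selectors `BUTTON_XPATH` and `ROW_XPATH`, the helper names, and the use of `lxml` and `openpyxl` are all assumptions, since the README does not describe the page structure or the libraries used for parsing and storage.

```python
from typing import Iterable, List

# Hypothetical selectors: the real page structure is not given in this README.
BUTTON_XPATH = "//a[@class='unit']"  # building/unit buttons (assumed)
ROW_XPATH = "//table//tr"            # rows of the dynamically loaded table (assumed)


def clean_cells(cells: Iterable[str]) -> List[str]:
    """Strip whitespace from cell texts and drop empty ones."""
    return [c.strip() for c in cells if c.strip()]


def parse_rows(html: str) -> List[List[str]]:
    """Parse table rows out of a page source with XPath."""
    from lxml import etree  # third-party, imported lazily

    tree = etree.HTML(html)
    rows = (clean_cells(tr.xpath("./td//text()")) for tr in tree.xpath(ROW_XPATH))
    return [row for row in rows if row]


def crawl(url: str) -> List[List[str]]:
    """Open the page, click every building/unit button, collect the rows."""
    from selenium import webdriver  # third-party, imported lazily
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        rows: List[List[str]] = []
        count = len(driver.find_elements(By.XPATH, BUTTON_XPATH))
        for i in range(count):
            # Re-locate the buttons on every pass: references found before
            # a click may be stale once the page has been updated.
            driver.find_elements(By.XPATH, BUTTON_XPATH)[i].click()
            # Wait until at least one table row is present in the page.
            WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.XPATH, ROW_XPATH))
            )
            rows.extend(parse_rows(driver.page_source))
        return rows
    finally:
        driver.quit()


def save_to_excel(rows: List[List[str]], path: str) -> None:
    """Write one worksheet row per parsed table row."""
    from openpyxl import Workbook  # third-party, imported lazily

    workbook = Workbook()
    sheet = workbook.active
    for row in rows:
        sheet.append(row)
    workbook.save(path)
```

With a Chrome driver available, a call such as `save_to_excel(crawl(url), "houses.xlsx")` would run the whole pipeline; the output file name is only an example.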

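## Method 2: AJAX analysis

The main steps are:
* Fetch the source of the project page to be crawled
* Extract the ids of the relevant buttons (banids) from the source
* Define the corresponding form data and send a POST request to a fixed URL, with the form data as the POST body
* Read the page source returned by the POST request
* Parse and store the data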
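
A minimal sketch of this flow, using only the Python standard library, might look like the code below. It is not the repository's implementation: the regular expression `BANID_PATTERN`, the form field name `banid`, and the `ajax_url` parameter are hypothetical, because the README names neither the markup that carries the ids nor the fixed POST URL. The page encoding is assumed to be UTF-8.

```python
import re
from typing import Dict, List, Optional
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Hypothetical pattern: assumes each button carries its id as banid="123".
BANID_PATTERN = re.compile(r'banid\s*=\s*["\']?(\d+)')


def fetch(url: str, form: Optional[Dict[str, str]] = None) -> str:
    """GET the URL, or POST the form to it when a form is given."""
    data = urlencode(form).encode("utf-8") if form is not None else None
    # Percent-encode non-ASCII characters in the URL (assumes UTF-8).
    request = Request(quote(url, safe=":/?&=%"), data=data)
    with urlopen(request, timeout=10) as response:
        # Assumes a UTF-8 page; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")


def extract_banids(html: str) -> List[str]:
    """Return the distinct banids in order of first appearance."""
    return list(dict.fromkeys(BANID_PATTERN.findall(html)))


def build_form(banid: str) -> Dict[str, str]:
    """Form data for one building; the field name is an assumption."""
    return {"banid": banid}


def crawl(project_url: str, ajax_url: str) -> Dict[str, str]:
    """Map each banid to the page source returned by its POST request."""
    page = fetch(project_url)
    return {
        banid: fetch(ajax_url, build_form(banid))
        for banid in extract_banids(page)
    }
```

The returned page sources can then be parsed and stored in the same way as in Method 1. Because no browser is started, this approach typically runs faster than driving the page with selenium, at the cost of having to work out the request format first.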