# django_scrapyd

**Repository Path**: tuyutian/django_scrapyd

## Basic Information

- **Project Name**: django_scrapyd
- **Description**: A web crawler built with Django and Scrapyd
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2020-05-16
- **Last Updated**: 2021-09-22

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Install the dependencies:

```
pip install -r requirements.txt
```

Spider jobs are submitted with an `x-www-form-urlencoded` POST request to:

```
http://127.0.0.1:8000/spider/scrapy
```

Runtime environment: scrapyd, scrapydweb, django, logparser, selenium.

On Windows you also need pywin32 and the Chrome browser. `chromedriver.exe` is already included in the project root; chromedriver must match the installed browser version, and the bundled driver is 81.0.4044.69. Other versions can be downloaded from http://npm.taobao.org/mirrors/chromedriver/

Run the following commands, each from the directory indicated:

- In `ContentSpider`: `scrapyd`
- In the project root: `py manage.py runserver 8500`
- In the project root: `scrapydweb`
- In `ContentSpider`: `logparser -dir E:/xxx/scrapy_site/ContentSpider/logs`

There are also two settings files whose configuration items need to be changed. Then start the queue listener `scheduler.py`.

Baidu image storage:

- Domain: https://cdnimg.xxxxx
- bucketName: ixxxx
- Access Key: c222b7a0936049xxxxx
- Secret Key: fe95ef95a3294973xxxxx

Postman request body example (JSON):

```json
{
  "add_time": "2020-04-22",
  "allowed_domains": " ",
  "cate_id": 4,
  "charset": "utf-8",
  "id": 1,
  "list_xpath": ".//ul/li/a/@href",
  "rules": [
    {
      "match": ".//h1[@class=\"main-title\"]/text()",
      "name": "title"
    },
    {
      "key": 1587637191655,
      "match": ".//div[@class=\"date-source\"]/a[@class=\"source ent-source\"]/text()",
      "name": "author",
      "value": ""
    },
    {
      "key": 1587637233828,
      "match": ".//div[@class=\"channel-path\"]/a[2]/text()",
      "name": "tag",
      "value": ""
    },
    {
      "key": 1587637245922,
      "match": ".//div[@id=\"artibody\"]",
      "name": "content",
      "value": ""
    },
    {
      "key": 1587637281180,
      "match": ".//div[@class=\"date-source\"]/span[@class=\"date\"]/text()",
      "name": "create_time",
      "value": ""
    }
  ],
  "spider_name": "新浪体育中超列表爬虫",
  "start_urls": "http://sports.sina.com.cn/csl/",
  "update_time": "2020-04-22",
  "url_contain": " ",
  "url_no_contain": " ",
  "url_type": 1
}
```
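
For reference, the payload above can also be submitted from code instead of Postman. Below is a minimal sketch using Python `requests`; since the endpoint expects `x-www-form-urlencoded` rather than raw JSON, the sketch assumes each top-level field is a form field and that the nested `rules` list is serialized to a JSON string (that encoding, and the trimmed-down rule set, are assumptions, not documented behavior):

```python
import json

import requests

# Spider configuration mirroring the Postman example above.
config = {
    "id": 1,
    "cate_id": 4,
    "charset": "utf-8",
    "spider_name": "新浪体育中超列表爬虫",
    "start_urls": "http://sports.sina.com.cn/csl/",
    "list_xpath": ".//ul/li/a/@href",
    "url_type": 1,
    # The extraction rules are nested; serializing them to a JSON string
    # inside the form body is an assumption about how the API reads them.
    "rules": json.dumps([
        {"name": "title", "match": './/h1[@class="main-title"]/text()'},
        {"name": "content", "match": './/div[@id="artibody"]'},
    ], ensure_ascii=False),
}

# Passing a dict as `data=` makes requests send it x-www-form-urlencoded,
# matching the content type noted in this README.
resp = requests.post("http://127.0.0.1:8000/spider/scrapy", data=config)
print(resp.status_code, resp.text)
```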
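
Since the project drives Chrome through Selenium with the bundled `chromedriver.exe`, fetching a page inside a spider presumably looks something like the sketch below. It is written against Selenium 3.x (which matches the chromedriver 81 era), and the target URL is simply the example list page from the payload above:

```python
from selenium import webdriver

# Headless Chrome keeps the crawl from opening a visible browser window.
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--disable-gpu")

# chromedriver.exe sits in the project root and must match the installed
# Chrome version (81.0.4044.69 is bundled). `executable_path` is the
# Selenium 3.x way of pointing at the driver binary.
driver = webdriver.Chrome(executable_path="chromedriver.exe", options=options)
try:
    driver.get("http://sports.sina.com.cn/csl/")
    html = driver.page_source  # raw HTML handed off to the XPath rules
    print(len(html))
finally:
    driver.quit()
```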
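
The queue listener `scheduler.py` itself is not reproduced in this README. For orientation, a listener of this kind typically pulls jobs off a queue and hands them to Scrapyd over its documented `schedule.json` endpoint; the sketch below is an assumption about that flow, not the project's actual code. The in-process queue, the project name `ContentSpider`, and the spider name are illustrative, and Scrapyd's default port 6800 is assumed:

```python
import queue
import time

import requests

SCRAPYD = "http://127.0.0.1:6800"  # Scrapyd's default bind address/port

# Stand-in for whatever queue scheduler.py actually listens on.
jobs = queue.Queue()
jobs.put({"project": "ContentSpider", "spider": "content", "spider_id": 1})

def run_forever():
    while True:
        try:
            job = jobs.get(timeout=5)
        except queue.Empty:
            time.sleep(1)
            continue
        # schedule.json is Scrapyd's documented API for starting a crawl;
        # fields beyond project/spider are passed to the spider as arguments.
        resp = requests.post(f"{SCRAPYD}/schedule.json", data=job)
        print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}

if __name__ == "__main__":
    run_forever()
```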