# puppeteerUsage

**Repository Path**: node-project-base/puppeteer-usage

## Basic Information

- **Project Name**: puppeteerUsage
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-03-29
- **Last Updated**: 2021-12-08

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
- [Puppeteer demo](#puppeteer-demo)
  - [Installation](#installation)
    - [Then: install the `mongoose`, `puppeteer` and `fs` modules](#then-install-the-mongoose-puppeteer-and-fs-modules)
    - [Modules Introduction](#modules-introduction)
    - [Finally, let's install these modules](#finally-lets-install-these-modules)
  - [Run demo](#run-demo)
  - [Code Thinking Analysis](#code-thinking-analysis)
    - [StepOne: `require modules`](#stepone-require-modules)
    - [StepTwo: `write code for database`](#steptwo-write-code-for-database)
      - [`db.js`](#dbjs)
      - [`shopSchema.js`](#shopschemajs)
      - [`shopModel.js`](#shopmodeljs)
    - [StepThree: `crawling data`](#stepthree-crawling-data)
      - [`index.js`](#indexjs)
- [The End](#the-end)

# Puppeteer demo
Puppeteer demo is a project about `crawling webpage data`. Before using it, you need to do some installation and configuration to make sure you can run the demo, so let's get started.

---

## Installation
First, install [Node.js](https://npm.taobao.org/mirrors/node/v14.16.0/node-v14.16.0-x64.msi) and [MongoDB](https://fastdl.mongodb.org/windows/mongodb-windows-x86_64-4.4.4-signed.msi).

### Then: install the `mongoose`, `puppeteer` and `fs` modules

---

### Modules Introduction
+ `Mongoose:` `mongoose` is a module for operating a `MongoDB database`. With it you can `connect` to the database and `add`, `modify` and `query` the data you want. If you don't know this module, please browse the relevant module documentation: [Mongoose](https://mongoosejs.com/docs/guide.html).

---

+ `Puppeteer:` `puppeteer` is a module for crawling web data. It can be used to crawl not only static web pages but also dynamic ones. Moreover, it is an automated testing tool: as long as the execution steps are written in a script, it executes them automatically, step by step. If you don't know this module, please browse the relevant module documentation: [Puppeteer](https://learnku.com/docs/puppeteer/3.1.0/overview/8537).
  - `Tip:` Use the `cnpm` command to install the `puppeteer` module, because it is large and downloading it with `npm` can be slow.

---

+ `fs:` `fs` is a built-in Node.js module for working with files; with it you can `readFile`, `writeFile` and so on. If you don't know this module, please browse the relevant module documentation: [fs](http://nodejs.cn/api/fs.html#fs_file_system). A minimal sketch that uses these modules together is shown below.
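Before the full demo, here is a minimal sketch (not part of the original project) showing `puppeteer` and `fs` working together: it opens a page, reads its title, and writes it to a file. The URL and output path are placeholders chosen for illustration.

```javascript
// Minimal sketch, not from the original project: assumes puppeteer is installed.
// The URL and output path below are placeholders.
const puppeteer = require('puppeteer')
const fs = require('fs')

;(async () => {
  // Launch a headless browser and open a new page
  const browser = await puppeteer.launch()
  const page = await browser.newPage()

  // Navigate to a placeholder page and read its title
  await page.goto('https://example.com')
  const title = await page.title()

  // Persist the result with fs, then close the browser
  fs.writeFileSync('./title.txt', title)
  await browser.close()
})()
```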
### Finally, let's install these modules

```shell
$ npm install mongoose --save
```

```shell
$ npm install fs
```

(`fs` is built into Node.js, so this second install is optional.)

```shell
$ cnpm install puppeteer --save
```

## Run demo
Make sure the MongoDB service is running, then open a command shell and run

```shell
$ node index.js
```

or

```shell
$ node crawSellData.js
```

## Code Thinking Analysis
---

![流程图](./image/流程图.png)

### StepOne: `require modules`

`index.js:`

```javascript
// require modules
const puppeteer = require('puppeteer')
const fs = require('fs')
const shopModel = require('./database/model/shopModel')
```

### StepTwo: `write code for database`

#### `db.js`

```javascript
const mongoose = require('mongoose')

// Connect to the local MongoDB instance and use the "mall" database
mongoose.connect('mongodb://localhost:27017/mall', err => {
  if (err) throw err
  console.log('database connect success')
})

module.exports = mongoose
```

#### `shopSchema.js`

```javascript
const mongoose = require('../connect/db')
const Schema = mongoose.Schema

// Shape of one crawled page of shop data
const shopSchema = new Schema({
  type: String,
  page: Number,
  shopData: Array
})

module.exports = shopSchema
```

#### `shopModel.js`

```javascript
const mongoose = require('../connect/db')
const shopSchema = require('../schema/shopSchema')

// Model bound to the "shopData" collection
const shopModel = mongoose.model('shopData', shopSchema)

module.exports = shopModel
```
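Not part of the original demo, but as a quick sanity check that the model works, here is a minimal sketch of reading documents back out of the `shopData` collection with `shopModel.find()`; the `type: 'sell'` filter simply mirrors the document that `index.js` writes in StepThree.

```javascript
// Minimal sketch, not from the original project: read saved documents back out.
// Requiring shopModel also loads db.js, which opens the MongoDB connection.
const shopModel = require('./database/model/shopModel')

shopModel.find({ type: 'sell' })
  .then(docs => {
    // Each document holds one crawled page of shop data
    docs.forEach(doc => console.log(doc.page, doc.shopData.length))
  })
  .catch(err => { throw err })
```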
### StepThree: `crawling data`

#### `index.js`

```javascript
// Require puppeteer module
const puppeteer = require('puppeteer')
// Require fs module
const fs = require('fs')
// Require shopModel module
const shopModel = require('./database/model/shopModel')

// Create an asynchronous function and execute it immediately
;(async () => {
  // Use puppeteer.launch to create a browser and open Chromium
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: { width: 1920, height: 1080 },
    executablePath: 'D:\\chromium\\win64-856583\\chrome-win\\chrome.exe',
    args: [`--window-size=1920,1080`]
  })

  // Create a page through the browser
  const page = await browser.newPage()

  // Navigate to the page you want to get data from
  await page.goto('https://list.mogu.com/book/clothing/50240?acm=3.mce.1_10_1ko4s.132244.0.jjCFbst1u2nSl.pos_871-m_482170-sd_119&ptp=31.v5mL0b._head.0.4vUAFJwz')

  // Get an element from the page
  const target = await page.$('.wall_nav_box a:nth-child(2)')

  // Click the element after a one-second delay
  await target.click({ delay: 1000 })

  // Scroll the page down to load more items
  await page.mouse.wheel({ deltaY: 1000 })

  // Wait until the 40th goods item has rendered
  await page.waitForSelector('.goods_list_mod .goods_item:nth-child(40) .fill_img')

  // Collect the data inside the page context
  await page.$$eval('.goods_list_mod .goods_item', elements => {
    let data = []

    // Helper to read a value from an element: after obtaining the element,
    // return the requested attribute (e.g. src) if asked for, otherwise the text content
    function getElementValue (element, target, item, operation) {
      // Get the element based on the parameters
      let v = element.getElementsByClassName(target).item(item)
      // Check whether the element was obtained
      if (v === null) return
      if (operation === 'src') {
        return v.getAttribute('src')
      }
      return v.textContent
    }

    // Traverse the elements
    elements.map((el, index) => {
      data.push({
        "product_id": index + 1,
        // Get element textContent
        "title": getElementValue(el, 'title', 0),
        // Get element attribute
        "imagePath": getElementValue(el, 'fill_img', 0, 'src'),
        // Get element textContent
        "price": getElementValue(el, 'price_info', 0),
        // Get element textContent
        "origin_price": getElementValue(el, 'org_price', 0),
        // Get element textContent
        "favorite": getElementValue(el, 'fav_num', 0)
      })
    })

    return data
  }).then(res => {
    // Convert the received array into a string and write it to the sellData.json file
    fs.writeFile('./json/sellData.json', JSON.stringify(res), err => {
      if (err) throw err
      console.log('file write success')

      // Read the sellData.json file back and save its contents to the database
      fs.readFile('./json/sellData.json', 'utf-8', (err, res) => {
        if (err) throw err
        // Create a document in the shopData collection
        shopModel.create({
          "type": "sell",
          "page": 1,
          "shopData": JSON.parse(res)
        }).then(res => console.log(res))
          .catch(err => { throw err })
      })
    })
    // console.log(res.length)
  }).catch(err => { throw err })
})()
```

# The End