# CNVDSpider **Repository Path**: ty001007/CNVDSpider ## Basic Information - **Project Name**: CNVDSpider - **Description**: 使用js爬取CNVD漏洞库共享数据Crawl CNVD shared vulnerabilities with js - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 2 - **Forks**: 0 - **Created**: 2021-09-29 - **Last Updated**: 2024-12-29 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # CNVDSpider Crawl CNVD shared vulnerabilities with js 完整教程查看[博客](https://www.jianshu.com/p/1d0f634f0c86) 写论文需要用到[CNVD漏洞库](https://www.cnvd.org.cn/)的数据,然而,该页面有反爬机制,无法抓取全部数据,因此,使用js绕过反爬,实现效果如下: ![CNVD共享漏洞爬虫效果](https://upload-images.jianshu.io/upload_images/5714082-d401b7faeba1bea9.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240) 可以直接到GitHub查看[完整代码](https://github.com/you8023/CNVDSpider),欢迎留言点赞打赏提issue点star ## 环境 * windows 10 * Chrome浏览器 * Sublime Text 3代码编辑器 ## 前期准备 注册该网页账号并登陆即可 ## 需求分析 1. 首先,我们需要该漏洞库的全部漏洞数据,但是,使用python书写爬虫会被反爬机制识别到,从而无法自动大量下载数据 2. 这里,发现该网页有共享的[xml数据](https://www.cnvd.org.cn/shareData/list?max=10&offset=50) ![共享漏洞](https://upload-images.jianshu.io/upload_images/5714082-df53e0ce9e594274.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240) 因此,我们考虑从这里做文章 3. 然鹅,一个个点击下载也十分耗时,因此,考虑使用js脚本进行下载 4. 这里有两个思路: * 一是分别控制脚本挨个点击链接并翻页 * 二是直接请求每个链接获得数据 5. 这里采用第二种思路,通过查看链接发现其均为`https://www.cnvd.org.cn/shareData/download/` + `一个数字`的形式,因此,直接使用循环遍历请求即可 ## 代码编写 确定了思路之后,直接开始编写代码,但是遇到了一个问题,就是浏览器无法通过js请求直接保存为本地文件,这里借鉴了一篇[博客](https://www.cnblogs.com/hapday/p/6292957.html),使用`FileSaver.js`这个脚本来实现js下载文件到本地 ### FileSaver.js 该脚本代码如下: ``` /* FileSaver.js * A saveAs() FileSaver implementation. * 1.3.2 * 2016-06-16 18:25:19 * * By Eli Grey, http://eligrey.com * License: MIT * See https://github.com/eligrey/FileSaver.js/blob/master/LICENSE.md */ /*global self */ /*jslint bitwise: true, indent: 4, laxbreak: true, laxcomma: true, smarttabs: true, plusplus: true */ /*! @source http://purl.eligrey.com/github/FileSaver.js/blob/master/FileSaver.js */ var saveAs = saveAs || (function(view) { "use strict"; // IE <10 is explicitly unsupported if (typeof view === "undefined" || typeof navigator !== "undefined" && /MSIE [1-9]\./.test(navigator.userAgent)) { return; } var doc = view.document // only get URL when necessary in case Blob.js hasn't overridden it yet , get_URL = function() { return view.URL || view.webkitURL || view; } , save_link = doc.createElementNS("http://www.w3.org/1999/xhtml", "a") , can_use_save_link = "download" in save_link , click = function(node) { var event = new MouseEvent("click"); node.dispatchEvent(event); } , is_safari = /constructor/i.test(view.HTMLElement) || view.safari , is_chrome_ios =/CriOS\/[\d]+/.test(navigator.userAgent) , throw_outside = function(ex) { (view.setImmediate || view.setTimeout)(function() { throw ex; }, 0); } , force_saveable_type = "application/octet-stream" // the Blob API is fundamentally broken as there is no "downloadfinished" event to subscribe to , arbitrary_revoke_timeout = 1000 * 40 // in ms , revoke = function(file) { var revoker = function() { if (typeof file === "string") { // file is an object URL get_URL().revokeObjectURL(file); } else { // file is a File file.remove(); } }; setTimeout(revoker, arbitrary_revoke_timeout); } , dispatch = function(filesaver, event_types, event) { event_types = [].concat(event_types); var i = event_types.length; while (i--) { var listener = filesaver["on" + event_types[i]]; if (typeof listener === "function") { try { listener.call(filesaver, event || filesaver); } catch (ex) { throw_outside(ex); } } } } , auto_bom = function(blob) { // prepend BOM for UTF-8 XML and text/* types (including HTML) // note: your browser will automatically convert UTF-16 U+FEFF to EF BB BF if (/^\s*(?:text\/\S*|application\/xml|\S*\/\S*\+xml)\s*;.*charset\s*=\s*utf-8/i.test(blob.type)) { return new Blob([String.fromCharCode(0xFEFF), blob], {type: blob.type}); } return blob; } , FileSaver = function(blob, name, no_auto_bom) { if (!no_auto_bom) { blob = auto_bom(blob); } // First try a.download, then web filesystem, then object URLs var filesaver = this , type = blob.type , force = type === force_saveable_type , object_url , dispatch_all = function() { dispatch(filesaver, "writestart progress write writeend".split(" ")); } // on any filesys errors revert to saving with object URLs , fs_error = function() { if ((is_chrome_ios || (force && is_safari)) && view.FileReader) { // Safari doesn't allow downloading of blob urls var reader = new FileReader(); reader.onloadend = function() { var url = is_chrome_ios ? reader.result : reader.result.replace(/^data:[^;]*;/, 'data:attachment/file;'); var popup = view.open(url, '_blank'); if(!popup) view.location.href = url; url=undefined; // release reference before dispatching filesaver.readyState = filesaver.DONE; dispatch_all(); }; reader.readAsDataURL(blob); filesaver.readyState = filesaver.INIT; return; } // don't create more object URLs than needed if (!object_url) { object_url = get_URL().createObjectURL(blob); } if (force) { view.location.href = object_url; } else { var opened = view.open(object_url, "_blank"); if (!opened) { // Apple does not allow window.open, see https://developer.apple.com/library/safari/documentation/Tools/Conceptual/SafariExtensionGuide/WorkingwithWindowsandTabs/WorkingwithWindowsandTabs.html view.location.href = object_url; } } filesaver.readyState = filesaver.DONE; dispatch_all(); revoke(object_url); } ; filesaver.readyState = filesaver.INIT; if (can_use_save_link) { object_url = get_URL().createObjectURL(blob); setTimeout(function() { save_link.href = object_url; save_link.download = name; click(save_link); dispatch_all(); revoke(object_url); filesaver.readyState = filesaver.DONE; }); return; } fs_error(); } , FS_proto = FileSaver.prototype , saveAs = function(blob, name, no_auto_bom) { return new FileSaver(blob, name || blob.name || "download", no_auto_bom); } ; // IE 10+ (native saveAs) if (typeof navigator !== "undefined" && navigator.msSaveOrOpenBlob) { return function(blob, name, no_auto_bom) { name = name || blob.name || "download"; if (!no_auto_bom) { blob = auto_bom(blob); } return navigator.msSaveOrOpenBlob(blob, name); }; } FS_proto.abort = function(){}; FS_proto.readyState = FS_proto.INIT = 0; FS_proto.WRITING = 1; FS_proto.DONE = 2; FS_proto.error = FS_proto.onwritestart = FS_proto.onprogress = FS_proto.onwrite = FS_proto.onabort = FS_proto.onerror = FS_proto.onwriteend = null; return saveAs; }( typeof self !== "undefined" && self || typeof window !== "undefined" && window || this.content )); // `self` is undefined in Firefox for Android content script context // while `this` is nsIContentFrameMessageManager // with an attribute `content` that corresponds to the window if (typeof module !== "undefined" && module.exports) { module.exports.saveAs = saveAs; } else if ((typeof define !== "undefined" && define !== null) && (define.amd !== null)) { define("FileSaver.js", function() { return saveAs; }); } ``` ### 下载共享漏洞 首先,封装函数以调用`FileSaver.js`: ``` var downloadTextFile = function(mobileCode,a) { if(!mobileCode) { mobileCode = ''; } var file = new File([mobileCode], a+".txt", { type: "text/plain;charset=utf-8" }); saveAs(file); } ``` 然后,因为该页面使用了`jQuery`,因此可以直接使用封装好的`ajax`请求资源链接,书写代码循环遍历漏洞库: ``` var a = 242; var timer = setInterval(function(){ a = a+1; if(a>733){clearInterval(timer)} $.ajax({method:'GET',url:'/shareData/download/'+a,success:function(res){ downloadTextFile(res,a)}} )}, 2000) ``` a为资源链接后面的数字,经过观察,从242开始,到733结束,结束的数字根据最新的漏洞xml链接而定,鼠标放在链接上,页面左下角就会显示链接: ![查看最新的资源链接](https://upload-images.jianshu.io/upload_images/5714082-5e4fe18ae54780bc.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240) 末尾的`2000`表示每隔2秒发送一次请求 ## 运行代码 1. 打开CNVD漏洞库的页面 2. 鼠标右键单击检查 3. 点击`console`控制台 4. 复制上述代码(三段代码合并在一起即可),也可以直接到GitHub下载[完整代码](https://github.com/you8023/CNVDSpider)复制(其中spider.js为完整js代码,filter为后续过滤结果的代码,欢迎留言点赞打赏提issue点star),粘贴到控制台中,按下回车,代码开始运行 5. 静等下载完毕即可,下载的文件存放在浏览器设定的下载路径里 ![运行代码步骤示意图](https://upload-images.jianshu.io/upload_images/5714082-55ac940656d06994.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240) ## 过滤结果 下载完成后,发现有一些资源为空,大小仅有1kb: ![初始结果](https://upload-images.jianshu.io/upload_images/5714082-1eeed168a52ebda1.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240) 因此,书写python将这些结果过滤掉: ``` import os def file_path(path): for (root, dirs, files) in os.walk(path): for file in files: del_small_file(root + '/' + file) def del_small_file(file_name): size = os.path.getsize(file_name) file_size = 2 * 1024 if size < file_size: os.remove(file_name) if __name__ == '__main__': path = r'./CNVD' file_path(path) ``` 其中,path为存放文件的地址 ## 完成结果 至此,CNVD漏洞库爬取完成,耗时大概10分钟,经过过滤,共成功抓取文件311个: ![爬取结果](https://upload-images.jianshu.io/upload_images/5714082-7cc06026fe20ea07.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240) 和网页上的原数据对比: ![CNVD共享数据页面](https://upload-images.jianshu.io/upload_images/5714082-9a9a6755e4633c31.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240) 数目吻合,表明我们已经爬取了该页面的所有共享数据