# WebCrawler-2

**Repository Path**: lanicon/WebCrawler-2

## Basic Information

- **Project Name**: WebCrawler-2
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-03-17
- **Last Updated**: 2021-03-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# WebCrawler

WebCrawler extracts all accessible URLs from a website. It's built using [`.NET Core`](https://www.microsoft.com/net/core) and `.NET Standard 1.4`, so you can host it anywhere (Windows, Linux, Mac).

The crawler does not use regex to find links. Instead, web pages are parsed using [AngleSharp](https://github.com/AngleSharp/AngleSharp), a parser built upon the official W3C specification. This lets the crawler parse pages the way a browser does and handle tricky tags such as `base`.

For HTML files, URLs are extracted from:

- ``
- ``
- `