# HTMLProofer

**Repository Path**: mirrors/HTMLProofer

## Basic Information

- **Project Name**: HTMLProofer
- **Description**: HTMLProofer is a set of tests for validating HTML output. It can check whether image references are valid, whether images have alt tags, whether internal links work, and more.
- **Primary Language**: Ruby
- **License**: MIT
- **Default Branch**: main
- **Homepage**: https://www.oschina.net/p/htmlproofer
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 2
- **Created**: 2021-11-05
- **Last Updated**: 2026-01-31

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# HTMLProofer

If you generate HTML files, _then this tool might be for you_!

## Project scope

HTMLProofer is a set of tests to validate your HTML output. These tests check if your image references are legitimate, if they have alt tags, if your internal links are working, and so on. It's intended to be an all-in-one checker for your output.

In scope for this project is any well-known and widely-used test for HTML document quality. A major use for this project is continuous integration, so we must have reliable results. We usually favor correctness over performance. And, if necessary, we should be able to trace this program's detection of HTML errors back to documented best practices or standards, such as W3C specifications.

**Third-party modules.** We want this product to be useful for continuous integration, so we prefer to avoid subjective tests which are prone to false positive results, such as spell checkers, indentation checkers, etc. If you want to work on these items, please see [the section on custom tests](#custom-tests) and consider adding an implementation as a third-party module.

**Advanced configuration.** Most front-end developers can test their HTML using [our command line program](#using-on-the-command-line). Advanced configuration will require using Ruby.
## Installation

Add this line to your application's Gemfile:

    gem 'html-proofer'

And then execute:

    $ bundle install

Or install it yourself as:

    $ gem install html-proofer

**NOTE:** When installation speed matters, set `NOKOGIRI_USE_SYSTEM_LIBRARIES` to `true` in your environment. This is useful for increasing the speed of your Continuous Integration builds.

## What's tested?

Below is a mostly comprehensive list of checks that HTMLProofer can perform.

### Images

`img` elements:

- Whether all your images have alt tags
- Whether your internal image references are not broken
- Whether external images are showing
- Whether your images are served over HTTP rather than HTTPS

### Links

`a`, `link` elements:

- Whether your internal links are working
- Whether your internal hash references (`#linkToMe`) are working
- Whether external links are working
- Whether your links use HTTPS
- Whether CORS/SRI is enabled

### Scripts

`script` elements:

- Whether your internal script references are working
- Whether external scripts are loading
- Whether CORS/SRI is enabled

### Favicon

- Whether your favicons are valid.

### OpenGraph

- Whether the images and URLs in the OpenGraph metadata are valid.

## Usage

You can configure HTMLProofer to run on:

- a file
- a directory
- an array of directories
- an array of links

It can also run through the command-line.
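The directory-based modes listed above imply a file-discovery step: finding every HTML file under one or more directories. The stdlib-only Ruby sketch below shows one way such a step could work. The helper `collect_html_files` and its `extensions:` keyword are hypothetical and not part of HTMLProofer's API; the gem's own discovery logic may differ in its details.

```ruby
# Hypothetical helper, NOT part of HTMLProofer's API: a sketch of how the
# "directory" and "array of directories" modes could discover files to check.
def collect_html_files(directories, extensions: [".html"])
  # Array() lets callers pass either a single directory or an array of them.
  Array(directories).flat_map do |dir|
    # "**/*" walks the directory recursively (dotfiles are skipped by default).
    Dir.glob(File.join(dir, "**", "*")).select do |path|
      # Keep regular files whose name ends with one of the wanted extensions.
      File.file?(path) && extensions.any? { |ext| path.end_with?(ext) }
    end
  end.sort
end

# Example calls mirroring check_directory and check_directories:
#   collect_html_files("./out")
#   collect_html_files(["./one", "./two"], extensions: [".html", ".html.erb"])
```

Passing an array plays the same role as the array given to `check_directories`; accepting a list of extensions is one plausible way to model the command-line `--extensions` flag.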
### Checking a single file

If you simply want to check a single file, use the `check_file` method:

```ruby
HTMLProofer.check_file("/path/to/a/file.html").run
```

### Checking directories

If you want to check a directory, use `check_directory`:

```ruby
HTMLProofer.check_directory("./out").run
```

If you want to check multiple directories, use `check_directories`:

```ruby
HTMLProofer.check_directories(["./one", "./two"]).run
```

### Checking an array of links

With `check_links`, you can also pass in an array of links:

```ruby
HTMLProofer.check_links(["https://github.com", "https://jekyllrb.com"]).run
```

### Swapping information

Sometimes, the information in your HTML is not the same as how your server serves content. In these cases, you can use `swap_urls` to map the URL in a file to the URL you'd like it to become. For example:

```ruby
HTMLProofer.check_file("/path/to/a/file.html", swap_urls: { %r{^https://placeholder\.com} => "https://website.com" }).run
```

In this case, any link that matches `^https://placeholder.com` will be converted to `https://website.com`.

A similar swapping process can be done for attributes:

```ruby
HTMLProofer.check_file("/path/to/a/file.html", swap_attributes: { "img" => [["data-src", "src"]] }).run
```

In this case, we are telling HTMLProofer that, for any `img` tag detected, it should treat any `data-src` attribute as if it were the `src` attribute. Since the value is an array of arrays, you can pass in as many attribute swaps as you need for each element.

### Using on the command-line

You'll also get a new program called `htmlproofer` with this gem. Terrific!

Pass in options through the command-line as flags, like this:

```bash
htmlproofer --extensions .html.erb ./out
```

Use `htmlproofer --help` to see all command line options.

#### Special cases for the command-line

For options which require an array of input, surround the value with quotes, and don't use any spaces.
For example, to exclude an array of HTTP status codes, you might do:

```bash
htmlproofer --ignore-status-codes "999,401,404" ./out
```

For something like `--ignore-urls`, and other options that require an array of regular expressions, you can pass in a syntax like this:

```bash
htmlproofer --ignore-urls "/www.github.com/,/foo.com/" ./out
```

Since `swap_urls` is a bit special, you'll pass in a pair of `RegEx:String` values. Use the escape sequence `\:` to produce a literal `:`; `htmlproofer` will figure out what you mean.

```bash
htmlproofer --swap-urls "wow:cow,mow:doh" --extensions .html.erb --ignore-urls www.github.com ./out
```

Some configuration options, such as `--typhoeus`, `--cache`, or `--swap-attributes`, require well-formatted JSON.

#### Adjusting for a `baseurl`

If your Jekyll site has a `baseurl` configured, you'll need to adjust the generated URL validation to cope with that. The easiest way is using the `swap_urls` option.

For a `site.baseurl` value of `/BASEURL`, here's what that looks like on the command line:

```bash
htmlproofer --assume-extension .html --swap-urls '^/BASEURL/:/' ./_site
```

or in your `Rakefile`:

```ruby
require "html-proofer"

task :test do
  sh "bundle exec jekyll build"
  # In Ruby, swap_urls takes a hash of Regexp => replacement String
  options = { swap_urls: { %r{^/BASEURL/} => "/" } }
  HTMLProofer.check_directory("./_site", options).run
end
```

### Using through Docker

If you have trouble installing (or don't want to install) Ruby/Nokogumbo, the command-line tool can be run through Docker. See [klakegg/html-proofer](https://hub.docker.com/r/klakegg/html-proofer) for more information.

## Ignoring content

Add the `data-proofer-ignore` attribute to any tag to ignore it from every check.

```html
<a href="https://notreal.io" data-proofer-ignore>Not checked.</a>
```

This can also apply to parent elements, all the way up to the `<html>` tag:

```html
<div data-proofer-ignore>
  <a href="https://notreal.io">Not checked because of parent.</a>
</div>
```

## Ignoring new files

Say you've got some new files in a pull request, and your tests are failing because links to those files are not live yet. One thing you can do is run a diff against your base branch and explicitly ignore the new files, like this:

```ruby
require "html-proofer"

directories = ["content"]
merge_base = %x(git merge-base origin/production HEAD).chomp
# Files added (A) or copied (C) since the merge base, NUL-separated
diffable_files = %x(git diff -z --name-only --diff-filter=AC #{merge_base}).split("\0")
diffable_files = diffable_files.select do |filename|
  # Keep files directly inside one of the listed directories
  next true if directories.include?(File.dirname(filename))

  # ...plus any Markdown source file elsewhere
  filename.end_with?(".md")
end.map { |f| Regexp.new(Regexp.escape(File.basename(f, File.extname(f)))) }

HTMLProofer.check_directory("./output", { ignore_urls: diffable_files }).run
```

## Configuration

The `HTMLProofer` constructor takes an optional hash of additional options:

| Option | Description | Default |
| :--- | :--- | :--- |
| `allow_hash_href` | If `true`, assumes `href="#"` anchors are valid | `true` |
| `allow_missing_href` | If `true`, does not flag `a` tags missing `href`. In HTML5, this is technically allowed, but could also be human error. | `false` |
| `assume_extension` | Automatically add specified extension to files for internal links, to allow extensionless URLs (as supported by most servers) | `.html` |
| `checks` | An array of Strings indicating which checks you want to run | `['Links', 'Images', 'Scripts']` |
| `check_external_hash` | Checks whether external hashes exist (even if the webpage exists) | `true` |
| `check_internal_hash` | Checks whether internal hashes exist (even if the webpage exists) | `true` |
| `check_sri` | Check that `` and `