# htmlunit-neko

**Repository Path**: mirrors_HtmlUnit/htmlunit-neko

## Basic Information

- **Project Name**: htmlunit-neko
- **Description**: HtmlUnit adaptation of NekoHtml
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-08-08
- **Last Updated**: 2026-06-20

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Htmlunit-NekoHtml Parser

[![Maven Central Version](https://img.shields.io/maven-central/v/org.htmlunit/neko-htmlunit)](https://central.sonatype.com/artifact/org.htmlunit/neko-htmlunit)
[![Build Status](https://jenkins.wetator.org/buildStatus/icon?job=HtmlUnit+-+Neko)](https://jenkins.wetator.org/view/HtmlUnit/job/HtmlUnit%20-%20Neko/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Java Version](https://img.shields.io/badge/Java-8%2B-orange.svg)](https://www.oracle.com/java/)

The **Htmlunit-NekoHtml** Parser is an HTML scanner and tag balancer that lets application programmers parse HTML documents and access the information through standard XML interfaces.

The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make when writing HTML documents. NekoHTML adds missing parent elements, automatically closes elements with optional end tags, and can handle mismatched inline element tags.

## Key Features

- ✅ **Error Tolerant** - Handles malformed HTML gracefully
- ✅ **Standards Compliant** - Follows HTML parsing specifications
- ✅ **Well Tested** - Over 8,000 test cases
- ✅ **No External Dependencies** - Pure Java implementation
- ✅ **Java 17 Compatible** - Works with Java 17, 21 and beyond
- ✅ **Android Support** - Runs on Android platforms

The **Htmlunit-NekoHtml** Parser is used by [HtmlUnit](https://htmlunit.sourceforge.io/).
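To make the idea of "closing elements with optional end tags" concrete, here is a minimal toy sketch. It is not the NekoHTML algorithm and uses none of its API: the class `ToyTagBalancer`, the pre-tokenized input, and the single rule that a new `<p>` or `<li>` closes an open one of the same name are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy illustration of tag balancing. Hypothetical; NOT the NekoHTML implementation. */
public class ToyTagBalancer {

    /**
     * Balances a flat token list. Each token is an open tag ("<p>"),
     * a close tag ("</p>"), or plain text.
     */
    public static String balance(List<String> tokens) {
        Deque<String> open = new ArrayDeque<>(); // names of currently open elements
        StringBuilder out = new StringBuilder();

        for (String token : tokens) {
            if (token.startsWith("</")) {
                String name = token.substring(2, token.length() - 1);
                if (!open.contains(name)) {
                    continue; // stray end tag with no matching start tag: drop it
                }
                // Close everything still open inside the element being closed.
                while (!open.isEmpty()) {
                    String top = open.pop();
                    out.append("</").append(top).append('>');
                    if (top.equals(name)) {
                        break;
                    }
                }
            } else if (token.startsWith("<")) {
                String name = token.substring(1, token.length() - 1);
                // Assumed rule: a new <p> or <li> implicitly closes an open one.
                boolean optionalEnd = name.equals("p") || name.equals("li");
                if (optionalEnd && name.equals(open.peek())) {
                    out.append("</").append(open.pop()).append('>');
                }
                open.push(name);
                out.append(token);
            } else {
                out.append(token); // plain text passes through unchanged
            }
        }

        // Close whatever is still open at the end of the input.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <ul><li>one</li><li>two</li></ul>
        System.out.println(balance(Arrays.asList("<ul>", "<li>", "one", "<li>", "two", "</ul>")));
    }
}
```

The real parser works on a character stream and applies far more rules than this sketch; the stack of open elements is only meant to show why a missing `</li>` can be recovered without guessing.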
:heart: [Sponsor this project](https://github.com/sponsors/rbri)

### Project News

**[Developer Blog](https://htmlunit.github.io/htmlunit-blog/)**

[HtmlUnit@mastodon](https://fosstodon.org/@HtmlUnit) | [HtmlUnit@bsky](https://bsky.app/profile/htmlunit.bsky.social) | [HtmlUnit@Twitter](https://twitter.com/HtmlUnit)

#### Version 5

Starting with version 5.0.0, **JDK 17 or higher is required**. If you are still on JDK 8, see [Legacy Support (JDK 8)](#legacy-support-jdk-8) below.

### Latest release

Version 5.1.0 / May 31, 2026

##### Security Advisories

[CVE-2022-29546](https://nvd.nist.gov/vuln/detail/CVE-2022-29546): Fixed in versions 2.61.0+

Htmlunit-NekoHtml Parser suffers from a denial of service vulnerability in versions 2.60.0 and below. Specially crafted input involving processing instructions leads to excessive heap memory consumption.

[CVE-2022-28366](https://nvd.nist.gov/vuln/detail/CVE-2022-28366): Fixed in versions 2.27+

Htmlunit-NekoHtml Parser suffers from a denial of service vulnerability via a crafted processing instruction in versions 2.26 and below.

## Get it!

### Maven

Add to your `pom.xml`:

```xml
<dependency>
    <groupId>org.htmlunit</groupId>
    <artifactId>neko-htmlunit</artifactId>
    <version>5.1.0</version>
</dependency>
```

### Gradle

Add to your `build.gradle`:

```groovy
implementation group: 'org.htmlunit', name: 'neko-htmlunit', version: '5.1.0'
```

## HowTo use

### DOMParser

The DOMParser can be used together with the simple built-in DOM implementation or with your own.

    final String html =
            " <!DOCTYPE html>\n"
            + "<html>\n"
            + "<body>\n"
            + "<h1>NekoHtml</h1>\n"
            + "</body>\n"
            + "</html>";

    final StringReader sr = new StringReader(html);
    final XMLInputSource in = new XMLInputSource(null, "foo", null, sr, null);

    // use the provided simple DocumentImpl
    final DOMParser parser = new DOMParser(HTMLDocumentImpl.class);
    parser.parse(in);

    HTMLDocumentImpl doc = (HTMLDocumentImpl) parser.getDocument();
    NodeList headings = doc.getElementsByTagName("h1");

### SAXParser

Using the SAXParser is straightforward: simply provide your own `org.xml.sax.ContentHandler` implementation.

    final String html =
            " <!DOCTYPE html>\n"
            + "<html>\n"
            + "<body>\n"
            + "<h1>NekoHtml</h1>\n"
            + "</body>\n"
            + "</html>";

    final StringReader sr = new StringReader(html);
    final XMLInputSource in = new XMLInputSource(null, "foo", null, sr, null);

    final SAXParser parser = new SAXParser();

    // MyContentHandler is your own ContentHandler implementation
    ContentHandler myContentHandler = new MyContentHandler();
    parser.setContentHandler(myContentHandler);

    parser.parse(in);

### Features

The behavior of the scanner/parser can be adjusted through a series of feature switches that control how the parser handles various HTML constructs and edge cases.

```java
parser.setFeature(HTMLScanner.PLAIN_ATTRIBUTE_VALUES, true);
```

#### General Processing Features

| Feature | Default | Description |
|---------|---------|-------------|
| **AUGMENTATIONS** | `false` | Include infoset augmentations in the parsing output. When enabled, provides additional metadata about the parsed elements, including location information (line numbers, column numbers, character offsets). |
| **REPORT_ERRORS** | `false` | Enable detailed error reporting during parsing. When enabled, the parser reports syntax errors, malformed markup, and other parsing issues through the configured error reporter. |

#### Script and Style Processing

| Feature | Default | Description |
|---------|---------|-------------|
| **SCRIPT_STRIP_COMMENT_DELIMS** | `false` | Automatically strip HTML comment delimiters (`<!--` and `-->`) from `