1 Star 0 Fork 10

lubinglun/third_party_rust_regex

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
文件
克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README

regex-syntax

This crate provides a robust regular expression parser.

Build status Crates.io Rust

Documentation

https://docs.rs/regex-syntax

Overview

There are two primary types exported by this crate: Ast and Hir. The former is a faithful abstract syntax of a regular expression, and can convert regular expressions back to their concrete syntax while mostly preserving its original form. The latter type is a high level intermediate representation of a regular expression that is amenable to analysis and compilation into byte codes or automata. An Hir achieves this by drastically simplifying the syntactic structure of the regular expression. While an Hir can be converted back to its equivalent concrete syntax, the result is unlikely to resemble the original concrete syntax that produced the Hir.

Example

This example shows how to parse a pattern string into its HIR:

use regex_syntax::Parser;
use regex_syntax::hir::{self, Hir};

let hir = Parser::new().parse("a|b").unwrap();
assert_eq!(hir, Hir::alternation(vec![
    Hir::literal(hir::Literal::Unicode('a')),
    Hir::literal(hir::Literal::Unicode('b')),
]));

Safety

This crate has no unsafe code and sets forbid(unsafe_code). While it's possible this crate could use unsafe code in the future, the standard for doing so is extremely high. In general, most code in this crate is not performance critical, since it tends to be dwarfed by the time it takes to compile a regular expression into an automaton. Therefore, there is little need for extreme optimization, and therefore, use of unsafe.

The standard for using unsafe in this crate is extremely high because this crate is intended to be reasonably safe to use with user supplied regular expressions. Therefore, while there may be bugs in the regex parser itself, they should never result in memory unsafety unless there is either a bug in the compiler or the standard library. (Since regex-syntax has zero dependencies.)

Crate features

By default, this crate bundles a fairly large amount of Unicode data tables (a source size of ~750KB). Because of their large size, one can disable some or all of these data tables. If a regular expression attempts to use Unicode data that is not available, then an error will occur when translating the Ast to the Hir.

The full set of features one can disable are in the "Crate features" section of the documentation.

Testing

Simply running cargo test will give you very good coverage. However, because of the large number of features exposed by this crate, a test script is included in this directory which will test several feature combinations. This is the same script that is run in CI.

Motivation

The primary purpose of this crate is to provide the parser used by regex. Specifically, this crate is treated as an implementation detail of the regex, and is primarily developed for the needs of regex.

Since this crate is an implementation detail of regex, it may experience breaking change releases at a different cadence from regex. This is only possible because this crate is not a public dependency of regex.

Another consequence of this de-coupling is that there is no direct way to compile a regex::Regex from a regex_syntax::hir::Hir. Instead, one must first convert the Hir to a string (via its std::fmt::Display) and then compile that via Regex::new. While this does repeat some work, compilation typically takes much longer than parsing.

Stated differently, the coupling between regex and regex-syntax exists only at the level of the concrete syntax.

马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
1
https://gitee.com/belanLu/third_party_rust_regex.git
git@gitee.com:belanLu/third_party_rust_regex.git
belanLu
third_party_rust_regex
third_party_rust_regex
master

搜索帮助