regex_automata::nfa::thompson

Struct Compiler

pub struct Compiler { /* private fields */ }

A builder for compiling an NFA from a regex’s high-level intermediate representation (HIR).

This compiler provides a way to translate a parsed regex pattern into an NFA state graph. The NFA state graph can either be used directly to execute a search (e.g., with a Pike VM), or it can be further used to build a DFA.
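
To make the idea of executing a search directly on an NFA state graph more concrete, here is a minimal, self-contained sketch. It assumes a toy state representation (the `State` enum, `lowercase_plus`, `add_closure` and `is_full_match` are all hypothetical names, not part of regex_automata) and simulates the graph in lockstep, in the spirit of a Pike VM but without capture groups.

```rust
/// A toy NFA state, loosely following Thompson's construction.
/// These types are hypothetical and are not regex_automata's representation.
#[derive(Debug)]
enum State {
    /// Consume one byte in `lo..=hi`, then move to state `next`.
    ByteRange { lo: u8, hi: u8, next: usize },
    /// Epsilon transitions to every listed state.
    Union(Vec<usize>),
    /// Reaching this state means the pattern matched.
    Match,
}

/// Hand-built state graph for `[a-z]+`; state 0 is the start state.
fn lowercase_plus() -> Vec<State> {
    vec![
        State::ByteRange { lo: b'a', hi: b'z', next: 1 },
        State::Union(vec![0, 2]),
        State::Match,
    ]
}

/// Add `id` and everything reachable from it via epsilon transitions to `set`.
fn add_closure(states: &[State], id: usize, set: &mut Vec<usize>) {
    if set.contains(&id) {
        return;
    }
    set.push(id);
    if let State::Union(alts) = &states[id] {
        for &alt in alts {
            add_closure(states, alt, set);
        }
    }
}

/// Report whether the NFA matches the entire haystack.
/// All active states advance together, one input byte at a time.
fn is_full_match(states: &[State], start: usize, haystack: &[u8]) -> bool {
    let mut current = Vec::new();
    add_closure(states, start, &mut current);
    for &byte in haystack {
        let mut next = Vec::new();
        for &id in &current {
            if let State::ByteRange { lo, hi, next: target } = &states[id] {
                if (*lo..=*hi).contains(&byte) {
                    add_closure(states, *target, &mut next);
                }
            }
        }
        current = next;
    }
    current.iter().any(|&id| matches!(states[id], State::Match))
}
```

With this sketch, `is_full_match(&lowercase_plus(), 0, b"abc")` returns `true`, while the haystacks `b""` and `b"ab1"` return `false`. A DFA could be derived from the same graph by treating each distinct set of active states as a single DFA state.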

This compiler provides APIs for compiling regex patterns either directly from their concrete syntax or from a regex_syntax::hir::Hir.

This compiler has various options that may be configured via thompson::Config.

Note that a compiler is not the same as a thompson::Builder. A Builder provides a lower-level API that is uncoupled from a regex pattern’s concrete syntax or even its HIR. Instead, it permits stitching together an NFA by hand. See its docs for examples.

§Example: compilation from concrete syntax

This shows how to compile an NFA from a pattern string while setting a size limit on how big the NFA is allowed to be (in terms of bytes of heap used).

use regex_automata::{
    nfa::thompson::{NFA, pikevm::PikeVM},
    Match,
};

let config = NFA::config().nfa_size_limit(Some(1_000));
let nfa = NFA::compiler().configure(config).build(r"(?-u)\w")?;

let re = PikeVM::new_from_nfa(nfa)?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let expected = Some(Match::must(0, 3..4));
re.captures(&mut cache, "!@#A#@!", &mut caps);
assert_eq!(expected, caps.get_match());

§Example: compilation from HIR

This shows how to hand assemble a regular expression via its HIR, and then compile an NFA directly from it.

use regex_automata::{nfa::thompson::{NFA, pikevm::PikeVM}, Match};
use regex_syntax::hir::{Hir, Class, ClassBytes, ClassBytesRange};

let hir = Hir::class(Class::Bytes(ClassBytes::new(vec![
    ClassBytesRange::new(b'0', b'9'),
    ClassBytesRange::new(b'A', b'Z'),
    ClassBytesRange::new(b'_', b'_'),
    ClassBytesRange::new(b'a', b'z'),
])));

let config = NFA::config().nfa_size_limit(Some(1_000));
let nfa = NFA::compiler().configure(config).build_from_hir(&hir)?;

let re = PikeVM::new_from_nfa(nfa)?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let expected = Some(Match::must(0, 3..4));
re.captures(&mut cache, "!@#A#@!", &mut caps);
assert_eq!(expected, caps.get_match());

Implementations§

impl Compiler

pub fn new() -> Compiler

Create a new NFA compiler with its default configuration.

pub fn build(&self, pattern: &str) -> Result<NFA, BuildError>

Compile the given regular expression pattern into an NFA.

If there was a problem parsing the regex, then that error is returned.

Otherwise, if there was a problem building the NFA, then an error is returned. This happens only if the compiled regex would exceed the size limit configured on this compiler, or if any part of the NFA would exceed the integer representations used. (For example, too many states might plausibly occur on a 16-bit target.)
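
The two build-time failure modes described above can be sketched with a toy model. Everything here is hypothetical and for illustration only (`ToyBuildError`, `ToyCompiler`, the fixed 8-byte `ToyState`, and the choice of `u16` state identifiers); it does not reflect how regex_automata accounts for memory or represents state IDs.

```rust
/// Hypothetical error type mirroring the two failure modes described above.
#[derive(Debug, PartialEq)]
enum ToyBuildError {
    /// Adding a state pushed heap usage past the configured limit.
    ExceededSizeLimit { limit: usize },
    /// The number of states no longer fits in the state identifier type.
    TooManyStates { max: usize },
}

/// Each toy state is a fixed 8-byte record, so heap usage is easy to estimate.
type ToyState = [u8; 8];

struct ToyCompiler {
    states: Vec<ToyState>,
    size_limit: Option<usize>,
}

impl ToyCompiler {
    fn new(size_limit: Option<usize>) -> ToyCompiler {
        ToyCompiler { states: Vec::new(), size_limit }
    }

    /// Approximate heap usage in bytes: one record per state.
    fn memory_usage(&self) -> usize {
        self.states.len() * std::mem::size_of::<ToyState>()
    }

    /// Add a state and return its identifier.
    fn add_state(&mut self) -> Result<u16, ToyBuildError> {
        // Identifiers are deliberately `u16` to imitate a small target.
        let max_id = usize::from(u16::MAX);
        if self.states.len() > max_id {
            return Err(ToyBuildError::TooManyStates { max: max_id + 1 });
        }
        let id = self.states.len() as u16;
        self.states.push([0; 8]);
        // Check the size limit after every addition so the build fails early.
        match self.size_limit {
            Some(limit) if self.memory_usage() > limit => {
                Err(ToyBuildError::ExceededSizeLimit { limit })
            }
            _ => Ok(id),
        }
    }
}
```

With a limit of `Some(16)`, the first two calls to `add_state` succeed and the third fails with `ExceededSizeLimit`. With no limit, the failure instead comes from running out of `u16` identifiers.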

§Example
use regex_automata::{nfa::thompson::{NFA, pikevm::PikeVM}, Match};

let config = NFA::config().nfa_size_limit(Some(1_000));
let nfa = NFA::compiler().configure(config).build(r"(?-u)\w")?;

let re = PikeVM::new_from_nfa(nfa)?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let expected = Some(Match::must(0, 3..4));
re.captures(&mut cache, "!@#A#@!", &mut caps);
assert_eq!(expected, caps.get_match());

pub fn build_many<P: AsRef<str>>( &self, patterns: &[P], ) -> Result<NFA, BuildError>

Compile the given regular expression patterns into a single NFA.

When matches are returned, the pattern ID corresponds to the index of the pattern in the slice given.
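
As a rough analogy for how pattern IDs relate to slice positions, the following sketch searches for plain literal strings rather than regexes. The function `find_leftmost` is a hypothetical helper, not part of regex_automata, and it does not reflect how the NFA actually searches. It only shows that the reported ID is the index of the pattern in the slice.

```rust
use std::ops::Range;

/// Find the leftmost occurrence of any literal in `patterns`.
/// Returns the index of the matching pattern in the slice, plus the span.
/// When two patterns match at the same offset, the lower index wins.
fn find_leftmost<P: AsRef<str>>(
    patterns: &[P],
    haystack: &str,
) -> Option<(usize, Range<usize>)> {
    let mut best: Option<(usize, Range<usize>)> = None;
    for (id, pattern) in patterns.iter().enumerate() {
        let pattern = pattern.as_ref();
        if let Some(start) = haystack.find(pattern) {
            // Only replace the current best if this match starts strictly earlier.
            let is_better = match &best {
                None => true,
                Some((_, span)) => start < span.start,
            };
            if is_better {
                best = Some((id, start..start + pattern.len()));
            }
        }
    }
    best
}
```

For instance, `find_leftmost(&[" ", "A"], "!A! !A!")` returns `Some((1, 1..2))`: the second pattern, at index 1, matches first in the haystack.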

§Example
use regex_automata::{nfa::thompson::{NFA, pikevm::PikeVM}, Match};

let config = NFA::config().nfa_size_limit(Some(1_000));
let nfa = NFA::compiler().configure(config).build_many(&[
    r"(?-u)\s",
    r"(?-u)\w",
])?;

let re = PikeVM::new_from_nfa(nfa)?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let expected = Some(Match::must(1, 1..2));
re.captures(&mut cache, "!A! !A!", &mut caps);
assert_eq!(expected, caps.get_match());

pub fn build_from_hir(&self, expr: &Hir) -> Result<NFA, BuildError>

Compile the given high level intermediate representation of a regular expression into an NFA.

If there was a problem building the NFA, then an error is returned. This happens only if the compiled regex would exceed the size limit configured on this compiler, or if any part of the NFA would exceed the integer representations used. (For example, too many states might plausibly occur on a 16-bit target.)

§Example
use regex_automata::{nfa::thompson::{NFA, pikevm::PikeVM}, Match};
use regex_syntax::hir::{Hir, Class, ClassBytes, ClassBytesRange};

let hir = Hir::class(Class::Bytes(ClassBytes::new(vec![
    ClassBytesRange::new(b'0', b'9'),
    ClassBytesRange::new(b'A', b'Z'),
    ClassBytesRange::new(b'_', b'_'),
    ClassBytesRange::new(b'a', b'z'),
])));

let config = NFA::config().nfa_size_limit(Some(1_000));
let nfa = NFA::compiler().configure(config).build_from_hir(&hir)?;

let re = PikeVM::new_from_nfa(nfa)?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let expected = Some(Match::must(0, 3..4));
re.captures(&mut cache, "!@#A#@!", &mut caps);
assert_eq!(expected, caps.get_match());

pub fn build_many_from_hir<H: Borrow<Hir>>( &self, exprs: &[H], ) -> Result<NFA, BuildError>

Compile the given high level intermediate representations of regular expressions into a single NFA.

When matches are returned, the pattern ID corresponds to the index of the pattern in the slice given.

§Example
use regex_automata::{nfa::thompson::{NFA, pikevm::PikeVM}, Match};
use regex_syntax::hir::{Hir, Class, ClassBytes, ClassBytesRange};

let hirs = &[
    Hir::class(Class::Bytes(ClassBytes::new(vec![
        ClassBytesRange::new(b'\t', b'\r'),
        ClassBytesRange::new(b' ', b' '),
    ]))),
    Hir::class(Class::Bytes(ClassBytes::new(vec![
        ClassBytesRange::new(b'0', b'9'),
        ClassBytesRange::new(b'A', b'Z'),
        ClassBytesRange::new(b'_', b'_'),
        ClassBytesRange::new(b'a', b'z'),
    ]))),
];

let config = NFA::config().nfa_size_limit(Some(1_000));
let nfa = NFA::compiler().configure(config).build_many_from_hir(hirs)?;

let re = PikeVM::new_from_nfa(nfa)?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let expected = Some(Match::must(1, 1..2));
re.captures(&mut cache, "!A! !A!", &mut caps);
assert_eq!(expected, caps.get_match());

pub fn configure(&mut self, config: Config) -> &mut Compiler

Apply the given NFA configuration options (a thompson::Config) to this compiler.

§Example
use regex_automata::nfa::thompson::NFA;

let config = NFA::config().nfa_size_limit(Some(1_000));
let nfa = NFA::compiler().configure(config).build(r"(?-u)\w")?;
assert_eq!(nfa.pattern_len(), 1);

pub fn syntax(&mut self, config: syntax::Config) -> &mut Compiler

Set the syntax configuration for this compiler using syntax::Config.

This permits setting things like case insensitivity, Unicode mode and multi-line mode.

This syntax configuration only applies when an NFA is built directly from a pattern string. If an NFA is built from an HIR, then all syntax settings are ignored.

§Example
use regex_automata::{nfa::thompson::NFA, util::syntax};

let syntax_config = syntax::Config::new().unicode(false);
let nfa = NFA::compiler().syntax(syntax_config).build(r"\w")?;
// If Unicode were enabled, the number of states would be much bigger.
assert!(nfa.states().len() < 15);

Trait Implementations§

impl Clone for Compiler

fn clone(&self) -> Compiler

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Compiler

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
