regex_automata

Enum MatchKind

Source
#[non_exhaustive]
pub enum MatchKind { All, LeftmostFirst, }
Expand description

The kind of match semantics to use for a regex pattern.

The default match kind is LeftmostFirst, and this corresponds to the match semantics used by most backtracking engines, such as Perl.

§Leftmost first or “preference order” match semantics

Leftmost-first semantics determine which match to report when there are multiple paths through a regex that match at the same position. The tie is essentially broken by how a backtracker would behave. For example, consider running the regex foofoofoo|foofoo|foo on the haystack foofoo. In this case, both the foofoo and foo branches match at position 0. So should the end of the match be 3 or 6?

A backtracker will conceptually work by trying foofoofoo and failing. Then it will try foofoo, find the match and stop there. Thus, the leftmost-first match position is 6. This is called “leftmost-first” or “preference order” because the order of the branches as written in the regex pattern is what determines how to break the tie.

(Note that leftmost-longest match semantics, which break ties by always taking the longest matching string, are not currently supported by this crate. These match semantics tend to be found in POSIX regex engines.)

This example shows how leftmost-first semantics work, and how it even applies to multi-pattern regexes:

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    Match,
};

let re = PikeVM::new_many(&[
    r"foofoofoo",
    r"foofoo",
    r"foo",
])?;
let mut cache = re.create_cache();
let got: Vec<Match> = re.find_iter(&mut cache, "foofoo").collect();
let expected = vec![Match::must(1, 0..6)];
assert_eq!(expected, got);

§All matches

The All match semantics report any and all matches, and generally will attempt to match as much as possible. It doesn’t respect any sort of match priority at all, so things like non-greedy matching don’t work in this mode.

The fact that non-greedy matching doesn’t work generally makes most forms of unanchored non-overlapping searches have unintuitive behavior. Namely, unanchored searches behave as if there is a (?s-u:.)*? prefix at the beginning of the pattern, which is specifically non-greedy. Since it will be treated as greedy in All match semantics, this generally means that it will first attempt to consume all of the haystack and is likely to wind up skipping matches.

Generally speaking, All should only be used in two circumstances:

  • When running an anchored search and there is a desire to match as much as possible. For example, when building a reverse regex matcher to find the start of a match after finding the end. In this case, the reverse search is anchored to the end of the match found by the forward search.
  • When running overlapping searches. Since All encodes all possible matches, this is generally what you want for an overlapping search. If you try to use leftmost-first in an overlapping search, it is likely to produce counter-intuitive results since leftmost-first specifically excludes some matches from its underlying finite state machine.

This example demonstrates the counter-intuitive behavior of All semantics when using a standard leftmost unanchored search:

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    Match, MatchKind,
};

let re = PikeVM::builder()
    .configure(PikeVM::config().match_kind(MatchKind::All))
    .build("foo")?;
let hay = "first foo second foo wat";
let mut cache = re.create_cache();
let got: Vec<Match> = re.find_iter(&mut cache, hay).collect();
// Notice that it completely skips the first 'foo'!
let expected = vec![Match::must(0, 17..20)];
assert_eq!(expected, got);

This second example shows how All semantics are useful for an overlapping search. Note that we use lower level lazy DFA APIs here since the NFA engines only currently support a very limited form of overlapping search.

use regex_automata::{
    hybrid::dfa::{DFA, OverlappingState},
    HalfMatch, Input, MatchKind,
};

let re = DFA::builder()
    // If we didn't set 'All' semantics here, then the regex would only
    // match 'foo' at offset 3 and nothing else. Why? Because the state
    // machine implements preference order and knows that the 'foofoo' and
    // 'foofoofoo' branches can never match since 'foo' will always match
    // when they match and take priority.
    .configure(DFA::config().match_kind(MatchKind::All))
    .build(r"foo|foofoo|foofoofoo")?;
let mut cache = re.create_cache();
let mut state = OverlappingState::start();
let input = Input::new("foofoofoo");
let mut got = vec![];
loop {
    re.try_search_overlapping_fwd(&mut cache, &input, &mut state)?;
    let m = match state.get_match() {
        None => break,
        Some(m) => m,
    };
    got.push(m);
}
let expected = vec![
    HalfMatch::must(0, 3),
    HalfMatch::must(0, 6),
    HalfMatch::must(0, 9),
];
assert_eq!(expected, got);

Variants (Non-exhaustive)§

This enum is marked as non-exhaustive
Non-exhaustive enums could have additional variants added in future. Therefore, when matching against variants of non-exhaustive enums, an extra wildcard arm must be added to account for any future variants.
§

All

Report all possible matches.

§

LeftmostFirst

Report only the leftmost matches. When multiple leftmost matches exist, report the match corresponding to the part of the regex that appears first in the syntax.

Trait Implementations§

Source§

impl Clone for MatchKind

Source§

fn clone(&self) -> MatchKind

Returns a copy of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for MatchKind

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl Default for MatchKind

Source§

fn default() -> MatchKind

Returns the “default value” for a type. Read more
Source§

impl PartialEq for MatchKind

Source§

fn eq(&self, other: &MatchKind) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl Copy for MatchKind

Source§

impl Eq for MatchKind

Source§

impl StructuralPartialEq for MatchKind

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dst: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.