parser - Text Parsing Utilities

Parsing utility module.

This module provides TextParser, which can be used to map a block of text onto the fields of any dataclass.

class ParserFn

Bases: TypedDict

Parser function in a dict compatible with the dataclasses.field() metadata param.

TextParser_fn: Callable[[str], Any]

class TextParser

Bases: ABC

Helper abstract dataclass that parses a text according to the fields’ rules.

To enable text parsing in a dataclass, make it a subclass of TextParser.

The provided parse method is a factory which parses the supplied text and creates an instance with populated dataclass fields. It takes the text as its argument and, for each field in the dataclass, runs the field’s parser function against the whole text. The returned value is then assigned to the field of the new instance. If the field does not have a parser function, its default value or default factory is used instead. If no default is available either, an exception is raised.

This class provides a selection of parser functions and a function to wrap parser functions with generic functions. Parser functions are designed to be passed to the fields’ metadata param. The most commonly used parser function is expected to be the find method, which runs a regular expression against the text to find matches.

Example

The following example demonstrates every parser function available:

from dataclasses import dataclass, field
from enum import Enum
from framework.parser import TextParser

class Colour(Enum):
    BLACK = 1
    WHITE = 2

    @classmethod
    def from_str(cls, text: str):
        match text:
            case "black":
                return cls.BLACK
            case "white":
                return cls.WHITE
            case _:
                return None # unsupported colour

    @classmethod
    def make_parser(cls):
        # make a parser function that finds a match and
        # then makes it a Colour object through Colour.from_str
        return TextParser.wrap(TextParser.find(r"is a (\w+)"), cls.from_str)

@dataclass
class Animal(TextParser):
    kind: str = field(metadata=TextParser.find(r"is a \w+ (\w+)"))
    name: str = field(metadata=TextParser.find(r"^(\w+)"))
    colour: Colour = field(metadata=Colour.make_parser())
    age: int = field(metadata=TextParser.find_int(r"aged (\d+)"))

steph = Animal.parse("Stephanie is a white cat aged 10")
print(steph) # Animal(kind='cat', name='Stephanie', colour=<Colour.WHITE: 2>, age=10)

static wrap(parser_fn: ParserFn, wrapper_fn: Callable) -> ParserFn

Makes a wrapped parser function.

parser_fn is called and, if it returns a non-None value, wrapper_fn is called with that value. Otherwise the wrapped function returns early with None. In pseudo-code:

intermediate_value := parser_fn(input)
if intermediate_value is None then
    output := None
else
    output := wrapper_fn(intermediate_value)

Parameters:
  • parser_fn (ParserFn) – The dictionary storing the parser function to be wrapped.

  • wrapper_fn (Callable) – The function that wraps parser_fn. It is called with the value returned by parser_fn whenever that value is not None.

Returns:

A dictionary for the dataclasses.field metadata argument containing the newly wrapped parser function.

Return type:

ParserFn
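The pseudo-code above can be written as a short Python sketch. This is not the module's implementation: wrap_sketch, first_word and shout are hypothetical names, and for brevity the sketch works on bare callables instead of ParserFn dictionaries.

```python
from typing import Any, Callable, Optional


def wrap_sketch(
    parser: Callable[[str], Any], wrapper: Callable[[Any], Any]
) -> Callable[[str], Any]:
    """Hypothetical stand-in for wrap() that takes and returns bare callables."""

    def wrapped(text: str) -> Optional[Any]:
        intermediate_value = parser(text)
        if intermediate_value is None:
            return None  # nothing was found, so the wrapper is skipped
        return wrapper(intermediate_value)

    return wrapped


def first_word(text: str) -> Optional[str]:
    """Toy parser: return the first whitespace-separated word, or None."""
    parts = text.split()
    return parts[0] if parts else None


shout = wrap_sketch(first_word, str.upper)
print(shout("hello world"))  # HELLO
print(shout(""))             # None
```

Because the wrapper never sees None, a conversion function such as int or an enum factory does not need its own "no match" handling.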

static find(pattern: str | re.Pattern[str], flags: RegexFlag = RegexFlag.NOFLAG, named: bool = False) -> ParserFn

Makes a parser function that finds a regular expression match in the text.

If the pattern has capturing groups, the parser function returns None when no match is found; otherwise it returns a tuple with one value per group, or the value itself when the pattern has exactly one capturing group. If the pattern has no capturing groups, it returns True when the pattern matches and False otherwise.

Parameters:
  • pattern (str | re.Pattern[str]) – The regular expression pattern.

  • flags (RegexFlag) – The regular expression flags. Ignored if the given pattern is already compiled.

  • named (bool) – If set to True only the named capturing groups will be returned, as a dictionary.

Returns:

A dictionary for the dataclasses.field metadata argument containing the find parser function.

Return type:

ParserFn
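The return-value rules described for find can be illustrated with a standalone sketch built on the standard re module. It is an assumption-laden approximation, not the module's code: find_sketch is a hypothetical name, it returns a bare callable instead of a ParserFn dictionary, and it assumes the pattern is applied with search semantics.

```python
import re
from typing import Any, Callable, Union


def find_sketch(
    pattern: Union[str, re.Pattern], flags: int = 0, named: bool = False
) -> Callable[[str], Any]:
    """Hypothetical stand-in for find() that returns a bare callable."""
    # Flags only apply when the pattern still has to be compiled.
    compiled = re.compile(pattern, flags) if isinstance(pattern, str) else pattern

    def _find(text: str) -> Any:
        match = compiled.search(text)  # assumption: search, not fullmatch
        if compiled.groups == 0:
            return match is not None   # no groups: plain True/False
        if match is None:
            return None                # groups, but nothing matched
        if named:
            return match.groupdict()   # named groups only, as a dict
        groups = match.groups()
        return groups[0] if len(groups) == 1 else groups

    return _find


text = "Stephanie is a white cat aged 10"
print(find_sketch(r"^(\w+)")(text))                          # Stephanie
print(find_sketch(r"is a (\w+) (\w+)")(text))                # ('white', 'cat')
print(find_sketch(r"dog")(text))                             # False
print(find_sketch(r"is a (\w+) dog")(text))                  # None
print(find_sketch(r"aged (?P<age>\d+)", named=True)(text))   # {'age': '10'}
```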

static find_int(pattern: str | re.Pattern[str], flags: RegexFlag = RegexFlag.NOFLAG, int_base: int = 0) -> ParserFn

Makes a parser function that converts the match of find() to int.

This function is compatible only with a pattern containing one capturing group.

Parameters:
  • pattern (str | re.Pattern[str]) – The regular expression pattern.

  • flags (RegexFlag) – The regular expression flags. Ignored if the given pattern is already compiled.

  • int_base (int) – The base of the number to convert from.

Raises:

InternalError – If the pattern does not have exactly one capturing group.

Returns:

A dictionary for the dataclasses.field metadata argument containing the find() parser function wrapped by the int built-in.

Return type:

ParserFn
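A rough sketch of how the int_base argument could behave, assuming the matched string is passed straight to the int built-in. find_int_sketch is a hypothetical name, and ValueError stands in for the module's InternalError, which is not available outside the framework.

```python
import re
from typing import Callable, Optional


def find_int_sketch(pattern: str, int_base: int = 0) -> Callable[[str], Optional[int]]:
    """Hypothetical stand-in for find_int() that returns a bare callable."""
    compiled = re.compile(pattern)
    if compiled.groups != 1:
        # The documented function raises InternalError here; ValueError is a stand-in.
        raise ValueError("pattern must have exactly one capturing group")

    def _find_int(text: str) -> Optional[int]:
        match = compiled.search(text)
        if match is None:
            return None
        # With base 0, int() infers the base from a 0x, 0o or 0b prefix.
        return int(match.group(1), int_base)

    return _find_int


print(find_int_sketch(r"aged (\d+)")("Stephanie is a white cat aged 10"))  # 10
print(find_int_sketch(r"mtu (0x[0-9a-f]+)")("mtu 0x5dc"))                  # 1500
print(find_int_sketch(r"id ([0-9a-f]+)", int_base=16)("id ff"))            # 255
```

One consequence of base 0 in Python is that a non-zero decimal string with leading zeros, such as "010", is rejected by int with a ValueError, so passing int_base=10 is the safer choice for zero-padded values.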

classmethod parse(text: str) -> typing_extensions.Self

Creates a new instance of the class from the given text.

A new class instance is created with all the fields that have a parser function in their metadata. Fields without one are ignored and are expected to have a default value; otherwise the class initialization will fail.

A field is populated with the value returned by its corresponding parser function.

Parameters:

text (str) – The text to parse.

Raises:

InternalError – If the parser did not find a match and the field does not have a default value or default factory.

Returns:

A new instance of the class.

Return type:

typing_extensions.Self
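The behaviour described for parse can be approximated with the standard dataclasses helpers. The sketch below is illustrative only: ParseSketch, Port and first_number are hypothetical names, ValueError stands in for InternalError, and the only detail taken from this page is the TextParser_fn metadata key.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any, Dict, Optional


class ParseSketch:
    """Hypothetical mixin that mimics the documented parse() behaviour."""

    @classmethod
    def parse(cls, text: str):
        values: Dict[str, Any] = {}
        for f in fields(cls):
            parser_fn = f.metadata.get("TextParser_fn")
            if parser_fn is None:
                continue  # no parser: leave the field to its default
            value = parser_fn(text)
            if value is not None:
                values[f.name] = value
            elif f.default is MISSING and f.default_factory is MISSING:
                # The documented method raises InternalError here.
                raise ValueError(f"no match and no default for field {f.name!r}")
        return cls(**values)


def first_number(text: str) -> Optional[int]:
    """Toy parser: join all digits found in the text into one integer."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


@dataclass
class Port(ParseSketch):
    mtu: int = field(metadata={"TextParser_fn": first_number})
    label: str = "unnamed"  # no parser function, default is used
    speed: int = field(default=0, metadata={"TextParser_fn": lambda text: None})


print(Port.parse("mtu 1500"))  # Port(mtu=1500, label='unnamed', speed=0)
```

In this sketch a field whose parser returns None falls back to its default when it has one, so only fields with neither a match nor a default cause an error.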

__init__() -> None