edsnlp.pipes.core.normalizer.factory

create_component

Normalisation pipeline. Modifies the NORM attribute along five dimensions:

  • lowercase: lowercasing, using the default NORM.
  • accents: deterministic and fixed-length normalisation of accents.
  • quotes: deterministic and fixed-length normalisation of quotation marks.
  • spaces: "removal" of space tokens (via the tag_ attribute).
  • pollution: "removal" of pollution (via the tag_ attribute).

Parameters

PARAMETER DESCRIPTION
nlp

The pipeline object.

TYPE: PipelineProtocol

name

The component name.

TYPE: str DEFAULT: 'normalizer'

lowercase

Whether to lowercase the text.

TYPE: bool DEFAULT: True

accents

Accents configuration: a boolean to enable or disable the step, or a dictionary of options.

TYPE: Union[bool, Dict[str, Any]] DEFAULT: True

quotes

Quotes configuration: a boolean to enable or disable the step, or a dictionary of options.

TYPE: Union[bool, Dict[str, Any]] DEFAULT: True

spaces

Spaces configuration: a boolean to enable or disable the step, or a dictionary of options.

TYPE: Union[bool, Dict[str, Any]] DEFAULT: True

pollution

Pollution configuration: a boolean to enable or disable the step, or a dictionary of options.

TYPE: Union[bool, Dict[str, Any]] DEFAULT: True
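
As a minimal sketch of how these parameters might be passed, the snippet below builds a configuration using only the documented names and their boolean forms. The registered factory name `"eds.normalizer"` and the `edsnlp.blank("eds")` call are assumptions not stated on this page, and `NORMALIZER_CONFIG` and `build_pipeline` are hypothetical names; the import is kept inside the function so the configuration can be inspected without the library installed.

```python
# Documented parameters, all set to their documented defaults.
NORMALIZER_CONFIG = {
    "lowercase": True,
    "accents": True,
    "quotes": True,
    "spaces": True,
    "pollution": True,
}


def build_pipeline(config=NORMALIZER_CONFIG):
    """Hypothetical helper: create a pipeline with the normalizer added.

    Assumes the component is registered as "eds.normalizer" and that
    edsnlp.blank("eds") returns a pipeline object.
    """
    import edsnlp  # imported lazily; requires EDS-NLP to be installed

    nlp = edsnlp.blank("eds")
    nlp.add_pipe("eds.normalizer", config=dict(config))
    return nlp


# Example: disable accent normalisation while keeping the other steps.
no_accents = {**NORMALIZER_CONFIG, "accents": False}
```

With EDS-NLP installed, `build_pipeline(no_accents)` would be expected to return a pipeline whose NORM attribute keeps accents but is otherwise normalised.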