edsnlp.language

EDSDefaults

Bases: FrenchDefaults

Defaults for the EDSLanguage class. Mostly identical to FrenchDefaults, but without tokenization info.

EDSLanguage

Bases: French

French clinical language. It ships with the EDSTokenizer, which better handles tokenization of French clinical documents.

EDSTokenizer

Bases: Tokenizer

Tokenizer class for French clinical documents. It better handles tokenization around:

- numbers: "ACR5" -> ["ACR", "5"] instead of ["ACR5"]
- newlines: "\n \n \n" -> ["\n", "\n", "\n"] instead of ["\n \n \n"]

It should also be around 5-6 times faster than its standard French counterpart.

Parameters

vocab: Vocab The spaCy vocabulary
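
The two tokenization behaviours described for EDSTokenizer (digits split from letters, each newline as its own token) can be illustrated with a small standard-library sketch. This is not edsnlp's implementation: `TOKEN_PATTERN` and `sketch_tokenize` are hypothetical names introduced here, and the sketch simply discards spaces.

```python
import re
from typing import List

# Hypothetical pattern, NOT taken from edsnlp. Alternatives, in order:
#   runs of letters | runs of digits | a single newline |
#   a single punctuation character | a single underscore
TOKEN_PATTERN = re.compile(r"[^\W\d_]+|\d+|\n|[^\w\s]|_")


def sketch_tokenize(text: str) -> List[str]:
    """Split text so digits separate from letters and each newline is one token.

    Spaces and other non-newline whitespace are dropped in this sketch.
    """
    return TOKEN_PATTERN.findall(text)


print(sketch_tokenize("ACR5"))       # ['ACR', '5']
print(sketch_tokenize("\n \n \n"))   # ['\n', '\n', '\n']
```

Matching letters and digits as separate alternatives is what yields "ACR" and "5" as distinct tokens. Matching a single newline, rather than a run of whitespace, is what keeps each line break apart.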

__call__

Tokenizes the text using the EDSTokenizer.

Parameters

PARAMETER DESCRIPTION
text

TYPE: str

RETURNS DESCRIPTION
Doc
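
A hedged usage sketch follows, assuming both spaCy and edsnlp are installed and that edsnlp registers this language under the code "eds" (an assumption not stated in this section). The helper name `eds_token_texts` is hypothetical. Calling the pipeline on a string runs the tokenizer and returns a Doc, whose tokens are listed here.

```python
from typing import List, Optional


def eds_token_texts(text: str) -> Optional[List[str]]:
    """Return token texts from a blank "eds" pipeline, or None if the
    required packages are not installed."""
    try:
        import spacy
        import edsnlp  # noqa: F401  (makes sure the package is importable)
    except ImportError:
        return None

    # Assumption: "eds" is the language code under which EDSLanguage is registered.
    nlp = spacy.blank("eds")
    doc = nlp(text)  # a blank pipeline only runs the tokenizer
    return [token.text for token in doc]


print(eds_token_texts("Score ACR5 à l'examen"))
```

The import guard is only there so that the snippet degrades gracefully where the packages are missing. In a real project the imports would sit at module level.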

create_eds_tokenizer

Creates a factory that returns new EDSTokenizer instances.

RETURNS DESCRIPTION
EDSTokenizer