edsnlp.language
EDSDefaults [source]
Bases: FrenchDefaults
Defaults for the EDSLanguage class. Mostly identical to FrenchDefaults, but without tokenization info.
EDSLanguage [source]
Bases: French
French clinical language. It ships with the EDSTokenizer, which better handles tokenization of French clinical documents.
EDSTokenizer [source]
Bases: Tokenizer
Tokenizer class for French clinical documents. It better handles tokenization around:
- numbers: "ACR5" -> ["ACR", "5"] instead of ["ACR5"]
- newlines: "\n \n \n" -> ["\n", "\n", "\n"] instead of ["\n \n \n"]

It should also be around 5-6 times faster than its standard French counterpart.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| vocab | The spaCy vocabulary. TYPE: Vocab |
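The two splitting rules described for EDSTokenizer (digits separated from letters, one token per newline) can be illustrated with a small standard-library sketch. This is not the EDSTokenizer implementation: `TOKEN_PATTERN` and `sketch_tokenize` are hypothetical names, and the regular expression is an assumption chosen only to reproduce the documented examples. Unlike a real spaCy tokenizer, it returns plain strings and discards spaces.

```python
import re
from typing import List

# Hypothetical pattern, not taken from edsnlp. Alternatives are tried in order:
# a single newline, a run of digits, a run of letters, any other non-space character.
TOKEN_PATTERN = re.compile(r"\n|\d+|[^\W\d_]+|[^\s]")


def sketch_tokenize(text: str) -> List[str]:
    """Split text so that digits separate from letters and each newline stands alone."""
    # Spaces and tabs match no alternative, so findall simply skips them.
    return TOKEN_PATTERN.findall(text)


print(sketch_tokenize("ACR5"))      # ['ACR', '5']
print(sketch_tokenize("\n \n \n"))  # ['\n', '\n', '\n']
```

A single compiled pattern keeps the sketch short; the real tokenizer additionally has to build a Doc and track trailing whitespace for each token.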
__call__ [source]
Tokenizes the text using the EDSTokenizer
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| text | The text to tokenize. |
| RETURNS | DESCRIPTION |
|---|---|
| Doc | The tokenized document. |
create_eds_tokenizer [source]
Creates a factory that returns new EDSTokenizer instances
| RETURNS | DESCRIPTION |
|---|---|
| EDSTokenizer | New instance produced by the factory. |
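The "factory that returns new instances" pattern can be sketched without spaCy installed. The sketch assumes the factory receives the pipeline object and reads its vocabulary, which is how spaCy v3 tokenizer factories are generally written. `FakeVocab`, `FakePipeline`, `SketchTokenizer` and `create_sketch_tokenizer` are hypothetical stand-ins, not edsnlp or spaCy classes.

```python
from dataclasses import dataclass, field


@dataclass
class FakeVocab:
    """Stand-in for a spaCy Vocab (hypothetical, for illustration only)."""
    strings: set = field(default_factory=set)


@dataclass
class FakePipeline:
    """Stand-in for a pipeline object exposing a `vocab` attribute."""
    vocab: FakeVocab = field(default_factory=FakeVocab)


class SketchTokenizer:
    """Stand-in for EDSTokenizer: it only stores the vocabulary it was given."""

    def __init__(self, vocab):
        self.vocab = vocab


def create_sketch_tokenizer():
    """Return a factory; each call of the factory builds a fresh tokenizer."""

    def tokenizer_factory(nlp):
        return SketchTokenizer(nlp.vocab)

    return tokenizer_factory


factory = create_sketch_tokenizer()
nlp = FakePipeline()
first, second = factory(nlp), factory(nlp)
print(first is second)           # False: every call creates a new instance
print(first.vocab is nlp.vocab)  # True: the pipeline's vocabulary is shared
```

Returning a factory rather than an instance lets the tokenizer be built late, once the pipeline and its vocabulary exist.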