Getting started
EDS-NLP is a collaborative NLP framework that aims to extract information from French clinical notes. At its core, it is a collection of components, or pipes, which are either rule-based functions or deep learning modules. These components are organized into a novel, efficient and modular pipeline system, built for hybrid and multitask models. We use spaCy to represent documents and their annotations, and PyTorch as the deep learning backend for trainable components.
EDS-NLP is versatile and can be used on any textual document. The rule-based components are fully compatible with spaCy's pipelines, and vice versa. This library is the product of a collaborative effort, and we encourage further contributions to enhance its capabilities.
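To make the idea of "a pipeline of components" concrete, here is a toy sketch in plain Python. It is not EDS-NLP's implementation: `ToyDoc`, `ToyPipeline` and `covid_matcher` are hypothetical names, and the sketch only shows sequential composition (each component receives a document and returns it annotated), none of the batching, hybrid or multitask machinery of the real pipeline system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyDoc:
    # Hypothetical stand-in for a spaCy Doc: raw text plus extracted entities
    text: str
    ents: List[str] = field(default_factory=list)


Component = Callable[[ToyDoc], ToyDoc]


class ToyPipeline:
    """Hypothetical pipeline: applies its components in the order they were added."""

    def __init__(self) -> None:
        self.components: List[Component] = []

    def add_pipe(self, component: Component) -> None:
        self.components.append(component)

    def __call__(self, text: str) -> ToyDoc:
        doc = ToyDoc(text)
        for component in self.components:
            doc = component(doc)
        return doc


def covid_matcher(doc: ToyDoc) -> ToyDoc:
    # Rule-based component: record every whitespace token equal to "covid"
    doc.ents.extend(tok for tok in doc.text.split() if tok.lower() == "covid")
    return doc


nlp = ToyPipeline()
nlp.add_pipe(covid_matcher)
print(nlp("Pas de covid").ents)  # ['covid']
```

Keeping components as plain callables is what lets rule-based and trainable steps be mixed freely in the same ordered sequence.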
Check out our interactive demo!
Quick start
Installation
You can install EDS-NLP via `pip`. We recommend pinning the library version in your projects, or using a strict package manager like Poetry.
```shell
pip install edsnlp==0.17.2
```

or, if you want to use the trainable components (which rely on PyTorch):

```shell
pip install "edsnlp[ml]==0.17.2"
```
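Since pinning is recommended, it can be useful to check from Python that the environment really contains the pinned release. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `is_pinned_version_installed` is hypothetical and not part of EDS-NLP.

```python
from importlib.metadata import PackageNotFoundError, version


def is_pinned_version_installed(package: str, expected: str) -> bool:
    """Return True only if `package` is installed at exactly version `expected`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        # The package is not installed in the current environment
        return False
    return installed == expected


if __name__ == "__main__":
    print("edsnlp 0.17.2 installed:", is_pinned_version_installed("edsnlp", "0.17.2"))
```

A check like this can run at the start of a batch job so that a silently upgraded library fails loudly instead of changing extraction results.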
A first pipeline
Once you've installed the library, let's begin with a very simple example that extracts mentions of COVID19 from a text and detects whether they are negated.
```python
import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")  # (1)

terms = dict(
    covid=["covid", "coronavirus"],  # (2)
)

# Sentencizer component, needed for negation detection
nlp.add_pipe(eds.sentences())  # (3)
# Matcher component
nlp.add_pipe(eds.matcher(terms=terms))  # (4)
# Negation detection
nlp.add_pipe(eds.negation())

# Process your text in one call!
doc = nlp("Le patient n'est pas atteint de covid")

doc.ents  # (5)
# Out: (covid,)

doc.ents[0]._.negation  # (6)
# Out: True
```
1. `eds` is the name of the language, which defines the tokenizer.
2. This example terminology provides a very simple, and by no means exhaustive, list of synonyms for COVID19.
3. As in spaCy, pipes are added via the `nlp.add_pipe` method.
4. See the matching tutorial for more details.
5. spaCy stores extracted entities in the `Doc.ents` attribute.
6. The `eds.negation` component adds a `negation` custom attribute.
This example is complete and should run as-is.
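Why does negation detection need a sentencizer? Rule-based negation typically looks for cues within a limited scope around the entity, and the sentence is a natural scope boundary. The toy sketch below illustrates this in plain Python; it is not the `eds.negation` algorithm, and the cue list, the naive sentence splitter and the function names are hypothetical simplifications.

```python
import re

# Hypothetical, deliberately tiny list of French negation cues
NEGATION_CUES = ["pas", "aucun", "aucune", "sans", "absence de"]


def split_sentences(text: str) -> list:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def negated_mentions(text: str, term: str, use_sentences: bool = True) -> list:
    """Return (context, is_negated) for each context that mentions `term`.

    Only the first occurrence of `term` per context is checked; it counts as
    negated if a cue appears before it in the same context. With
    use_sentences=False the whole text is treated as a single context.
    """
    contexts = split_sentences(text) if use_sentences else [text]
    results = []
    for context in contexts:
        lowered = context.lower()
        position = lowered.find(term.lower())
        if position == -1:
            continue
        preceding = lowered[:position]
        negated = any(
            re.search(rf"\b{re.escape(cue)}\b", preceding) for cue in NEGATION_CUES
        )
        results.append((context, negated))
    return results


text = "Pas de fièvre. Test positif au covid."
print(negated_mentions(text, "covid"))
# [('Test positif au covid.', False)]
print(negated_mentions(text, "covid", use_sentences=False))
# [('Pas de fièvre. Test positif au covid.', True)]
```

Without sentence boundaries, the "Pas" that negates the fever leaks onto the COVID mention and produces a false negation.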
Tutorials
To learn more about EDS-NLP, we have prepared a series of tutorials that should cover the main features of the library.
Available pipeline components
See the Core components overview for more information.
Component | Description |
---|---|
eds.normalizer | Non-destructive input text normalisation |
eds.sentences | Better sentence boundary detection |
eds.matcher | A simple yet powerful entity extractor |
eds.terminology | A simple yet powerful terminology matcher |
eds.contextual_matcher | A conditional entity extractor |
eds.endlines | An unsupervised model to classify each line ending |
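"Non-destructive" normalisation means matching against a normalised view of the text while still reporting results in terms of the original text. The plain-Python sketch below shows that idea with lowercasing and accent stripping; it is an illustration under our own assumptions, not how `eds.normalizer` is implemented, and `normalize_char` and `find_terms` are hypothetical helpers.

```python
import re
import unicodedata


def normalize_char(ch: str) -> str:
    """Lowercase one character and strip its diacritics, keeping a 1:1 length mapping."""
    decomposed = unicodedata.normalize("NFD", ch.lower())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Fall back to the raw character if normalisation would change the length
    return stripped if len(stripped) == 1 else ch


def find_terms(text: str, terms: list) -> list:
    """Match terms on the normalised text, but return spans of the ORIGINAL text."""
    normalized = "".join(normalize_char(c) for c in text)
    spans = []
    for term in terms:
        norm_term = "".join(normalize_char(c) for c in term)
        for m in re.finditer(rf"\b{re.escape(norm_term)}\b", normalized):
            # Offsets are valid in both strings thanks to the 1:1 mapping
            spans.append((m.start(), m.end(), text[m.start():m.end()]))
    return sorted(spans)


print(find_terms("Fièvre et FIEVRE", ["fievre"]))
# [(0, 6, 'Fièvre'), (10, 16, 'FIEVRE')]
```

Keeping one normalised character per original character is the design choice that makes offsets transferable without an explicit alignment table.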
See the Qualifiers overview for more information.
Component | Description |
---|---|
eds.negation | Rule-based negation detection |
eds.family | Rule-based family context detection |
eds.hypothesis | Rule-based speculation detection |
eds.reported_speech | Rule-based reported speech detection |
eds.history | Rule-based medical history detection |
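Qualifiers do not remove entities; they attach flags to them, and it is up to the downstream analysis to decide which combinations to keep. The example above confirms the `negation` attribute; the other flag names below simply mirror the component names and should be treated as an assumption. The sketch uses a hypothetical `Mention` dataclass instead of real spaCy spans so that it runs without EDS-NLP.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mention:
    # Hypothetical stand-in for an entity carrying qualifier flags
    text: str
    negation: bool = False
    family: bool = False
    hypothesis: bool = False
    reported_speech: bool = False
    history: bool = False


def affirmed_patient_mentions(mentions: List[Mention], exclude_history: bool = False) -> List[Mention]:
    """Keep mentions that are not negated, not about a relative, not hypothetical
    and not reported speech; optionally drop medical-history mentions too."""
    kept = []
    for m in mentions:
        if m.negation or m.family or m.hypothesis or m.reported_speech:
            continue
        if exclude_history and m.history:
            continue
        kept.append(m)
    return kept


mentions = [
    Mention("covid", negation=True),
    Mention("diabète"),
    Mention("cancer", family=True),
    Mention("AVC", history=True),
]
print([m.text for m in affirmed_patient_mentions(mentions)])  # ['diabète', 'AVC']
```

Whether history should be excluded depends on the study question (current condition versus lifetime prevalence), which is why it is a separate switch here.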
See the Miscellaneous components overview for more information.
Component | Description |
---|---|
eds.dates | Date extraction and normalisation |
eds.consultation_dates | Identify consultation dates |
eds.quantities | Quantity extraction and normalisation |
eds.sections | Section detection |
eds.reason | Rule-based hospitalisation reason detection |
eds.tables | Tables detection |
eds.split | Doc splitting |
eds.explode | Explode entities into multiple copies of a document |
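Date extraction and normalisation means finding date mentions and converting them to a canonical form. The sketch below handles only two absolute French formats (`12/03/2021` and `2 avril 2021`) and returns ISO 8601 strings; it is a toy under our own assumptions and does not reflect the patterns, relative dates or output format of `eds.dates`. `extract_dates` and the `MONTHS` table are hypothetical.

```python
import re
from datetime import date

# Hypothetical month table, with and without accents
MONTHS = {
    "janvier": 1, "février": 2, "fevrier": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8, "aout": 8,
    "septembre": 9, "octobre": 10, "novembre": 11, "décembre": 12, "decembre": 12,
}
NUMERIC = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
TEXTUAL = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b", re.IGNORECASE)


def extract_dates(text: str) -> list:
    """Return (raw, ISO 8601) pairs for absolute dates, in order of appearance."""
    found = []
    for m in NUMERIC.finditer(text):
        day, month, year = (int(g) for g in m.groups())
        found.append((m.start(), m.group(), day, month, year))
    for m in TEXTUAL.finditer(text):
        day, month_name, year = m.groups()
        found.append((m.start(), m.group(), int(day), MONTHS[month_name.lower()], int(year)))
    results = []
    for _, raw, day, month, year in sorted(found):
        try:
            results.append((raw, date(year, month, day).isoformat()))
        except ValueError:
            # Skip impossible calendar dates such as 31/02/2021
            continue
    return results


print(extract_dates("Hospitalisé le 12/03/2021, sorti le 2 avril 2021."))
# [('12/03/2021', '2021-03-12'), ('2 avril 2021', '2021-04-02')]
```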
See the NER overview for more information.
Component | Description |
---|---|
eds.covid | A COVID mentions detector |
eds.charlson | A Charlson score extractor |
eds.sofa | A SOFA score extractor |
eds.elston_ellis | An Elston & Ellis code extractor |
eds.emergency_priority | A priority score extractor |
eds.emergency_ccmu | A CCMU score extractor |
eds.emergency_gemsa | A GEMSA score extractor |
eds.tnm | A TNM score extractor |
eds.adicap | An ADICAP code extractor |
eds.drugs | A drug mentions extractor |
eds.cim10 | A CIM10 terminology matcher |
eds.umls | A UMLS terminology matcher |
eds.ckd | CKD extractor |
eds.copd | COPD extractor |
eds.cerebrovascular_accident | Cerebrovascular accident extractor |
eds.congestive_heart_failure | Congestive heart failure extractor |
eds.connective_tissue_disease | Connective tissue disease extractor |
eds.dementia | Dementia extractor |
eds.diabetes | Diabetes extractor |
eds.hemiplegia | Hemiplegia extractor |
eds.leukemia | Leukemia extractor |
eds.liver_disease | Liver disease extractor |
eds.lymphoma | Lymphoma extractor |
eds.myocardial_infarction | Myocardial infarction extractor |
eds.peptic_ulcer_disease | Peptic ulcer disease extractor |
eds.peripheral_vascular_disease | Peripheral vascular disease extractor |
eds.solid_tumor | Solid tumor extractor |
eds.alcohol | Alcohol consumption extractor |
eds.tobacco | Tobacco consumption extractor |
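Several of the extractors above target clinical scores. A common rule-based pattern for such scores is to find the score name, read a nearby number, and reject values outside the score's valid range. The sketch below applies that pattern to the SOFA score (six organ systems, each scored 0 to 4, so 0 to 24); the regex, the 10-character window and `extract_sofa_scores` are hypothetical simplifications, not the `eds.sofa` implementation.

```python
import re

# Simplified pattern: "sofa", then up to 10 non-digit characters, then a 1-2 digit integer
SOFA_PATTERN = re.compile(r"\bsofa\b\D{0,10}?(\d{1,2})\b", re.IGNORECASE)
SOFA_MAX = 24  # 6 organ systems, each scored 0 to 4


def extract_sofa_scores(text: str) -> list:
    """Return SOFA values found in `text`, dropping values outside 0-24."""
    values = (int(m.group(1)) for m in SOFA_PATTERN.finditer(text))
    return [v for v in values if 0 <= v <= SOFA_MAX]


print(extract_sofa_scores("Score SOFA à 8 à J1, puis SOFA : 12. Erreur de saisie : SOFA 30."))
# [8, 12]
```

The range check is a cheap guard against typos and unrelated numbers that happen to follow the score name.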
See the Trainable components overview for more information.
Component | Description |
---|---|
eds.transformer | Embed text with a transformer model |
eds.text_cnn | Contextualize embeddings with a CNN |
eds.span_pooler | A span embedding component that aggregates word embeddings |
eds.ner_crf | A trainable component to extract entities |
eds.extractive_qa | A trainable component for extractive question answering |
eds.span_classifier | A trainable component for multi-class multi-label span classification |
eds.span_linker | A trainable entity linker (i.e. to a list of concepts) |
eds.biaffine_dep_parser | A trainable biaffine dependency parser |
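A span pooler turns the embeddings of the words inside a span into a single vector, which a downstream classifier or linker can consume. The dependency-free sketch below shows mean and max pooling over plain Python lists; the real component works on PyTorch tensors, and the function name `pool_span` and the choice of these two modes are assumptions for illustration only.

```python
from typing import List, Sequence


def pool_span(
    embeddings: Sequence[Sequence[float]], start: int, end: int, mode: str = "mean"
) -> List[float]:
    """Aggregate the word vectors embeddings[start:end] into one span vector."""
    if not 0 <= start < end <= len(embeddings):
        raise ValueError("span must be non-empty and within bounds")
    span = embeddings[start:end]
    dims = list(zip(*span))  # transpose: one tuple of values per embedding dimension
    if mode == "mean":
        return [sum(d) / len(d) for d in dims]
    if mode == "max":
        return [max(d) for d in dims]
    raise ValueError(f"unknown pooling mode: {mode}")


word_vectors = [[1.0, 0.0], [3.0, 2.0], [5.0, -4.0]]
print(pool_span(word_vectors, 0, 2))         # [2.0, 1.0]
print(pool_span(word_vectors, 1, 3, "max"))  # [5.0, 2.0]
```

Mean pooling smooths over all words in the span, while max pooling keeps the strongest signal per dimension; which works better is an empirical question for each task.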
Disclaimer
The performance of an extraction pipeline may depend on the population and the documents considered.
Contributing to EDS-NLP
We welcome contributions! Fork the project and open a pull request. Take a look at the dedicated page for details.
Citation
If you use EDS-NLP, please cite us as below.
```bibtex
@misc{edsnlp,
  author = {Wajsburt, Perceval and Petit-Jean, Thomas and Dura, Basile and Cohen, Ariel and Jean, Charline and Bey, Romain},
  doi = {10.5281/zenodo.6424993},
  title = {EDS-NLP: efficient information extraction from French clinical notes},
  url = {https://aphp.github.io/edsnlp}
}
```