
edsnlp.data.polars

from_polars

The PolarsReader (or edsnlp.data.from_polars) reads a Polars DataFrame or LazyFrame and yields documents. At the moment, only entities and attributes are loaded; relations and events are not supported.

Example

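import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(...)
# df is an existing polars DataFrame or LazyFrame holding the notes
doc_iterator = edsnlp.data.from_polars(df, nlp=nlp, converter="omop")
# Apply the pipeline to each document yielded by the reader
annotated_docs = nlp.pipe(doc_iterator)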

Generator vs list

edsnlp.data.from_polars returns a Stream. To iterate over the documents multiple times efficiently, or to access them by index, convert it to a list:

docs = list(edsnlp.data.from_polars(df, converter="omop"))
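
The trade-off between a lazy stream and a materialized list can be illustrated with plain Python. The sketch below does not use EDS-NLP: LazyDocs is a hypothetical stand-in for any lazily evaluated iterable, and it is only an analogy for why converting to a list avoids repeated work and enables indexing.

```python
class LazyDocs:
    """Hypothetical stand-in for a lazy stream: items are rebuilt on every pass."""

    def __init__(self, texts):
        self.texts = texts
        self.conversions = 0  # counts how many items have been built so far

    def __iter__(self):
        for text in self.texts:
            self.conversions += 1  # simulate the cost of building a document
            yield {"text": text}


stream = LazyDocs(["first note", "second note"])

# Iterating twice over the lazy object repeats the conversion work.
for _ in stream:
    pass
for _ in stream:
    pass
lazy_cost = stream.conversions

# Materializing once pays the cost a single time and allows indexing.
stream.conversions = 0
docs = list(stream)
for _ in docs:
    pass
second = docs[1]
list_cost = stream.conversions
```

In this sketch, two passes over the lazy object build every item twice, while the list is built once and can then be reused and indexed freely.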

Parameters

PARAMETER DESCRIPTION
data

Polars object

TYPE: Union[DataFrame, LazyFrame]

shuffle

Whether to shuffle the data. If "dataset", the whole dataset will be shuffled at the beginning (of every epoch if looping).

TYPE: Literal['dataset', False] DEFAULT: False

seed

The seed to use for shuffling.

TYPE: Optional[int] DEFAULT: None

loop

Whether to loop over the data indefinitely.

TYPE: bool DEFAULT: False

converter

Converter(s) used to turn the rows of the DataFrame (represented as dicts) into Doc objects. Converters are documented on the Converters page.

TYPE: Optional[AsList[Union[str, Callable]]] DEFAULT: None

kwargs

Additional keyword arguments to pass to the converter. These are documented on the Converters page.

DEFAULT: {}

RETURNS DESCRIPTION
Stream
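
How the shuffle, seed, and loop parameters described above interact can be sketched in plain Python. This is only an illustration of the documented semantics, assuming that each epoch reshuffles the whole dataset using a single seeded random generator; iter_rows is a hypothetical helper, not part of EDS-NLP, and the library's actual implementation may differ.

```python
import random
from itertools import islice


def iter_rows(rows, shuffle=False, seed=None, loop=False):
    """Hypothetical helper mimicking the shuffle / seed / loop options."""
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    while True:
        epoch = list(rows)
        if shuffle == "dataset":
            rng.shuffle(epoch)  # reshuffle the whole dataset at each epoch
        yield from epoch
        if not loop:
            break  # a single pass unless looping was requested


rows = [{"note_id": i} for i in range(5)]

# Same seed, same order.
run_a = list(iter_rows(rows, shuffle="dataset", seed=42))
run_b = list(iter_rows(rows, shuffle="dataset", seed=42))

# With loop=True the iterator never ends, so take a bounded slice.
two_epochs = list(
    islice(iter_rows(rows, shuffle="dataset", seed=42, loop=True), 10)
)
```

Because a looping iterator is infinite, it has to be consumed with a bounded construct such as islice rather than list.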

to_polars

edsnlp.data.to_polars writes a list of documents (or a Stream) to a Polars DataFrame.

Example

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe(...)

doc = nlp("My document with entities")

# With the default execute=True, the write runs immediately
df = edsnlp.data.to_polars([doc], converter="omop")

Parameters

PARAMETER DESCRIPTION
data

The data to write (either a list of documents or a Stream).

TYPE: Union[Any, Stream]

dtypes

Dictionary of column names to dtypes. This is passed to the schema parameter of pl.from_dicts.

TYPE: Optional[dict] DEFAULT: None

converter

Converter used to turn the documents into dictionary objects before storing them in the DataFrame. Converters are documented on the Converters page.

TYPE: Optional[Union[str, Callable]] DEFAULT: None

execute

Whether to execute the writing operation immediately or to return a Stream.

TYPE: bool DEFAULT: True

kwargs

Additional keyword arguments to pass to the converter. These are documented on the Converters page.

DEFAULT: {}