edsnlp.pipes.core.endlines.model

EndLinesModel

Model that classifies whether each end of line is a genuine line break or should be replaced by a space.

Parameters

PARAMETER DESCRIPTION
nlp

spaCy nlp pipeline to use for matching.

TYPE: PipelineProtocol

fit_and_predict

Fit the model and predict on the training data.

Parameters

PARAMETER DESCRIPTION
corpus

An iterable of Documents

TYPE: Iterable[Doc]

RETURNS DESCRIPTION
DataFrame

One row per end-of-line prediction.
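A minimal training sketch, assuming edsnlp is installed. Only `EndLinesModel(nlp=...)` and `fit_and_predict(corpus)` come from this page; the helper name `train_endlines_model` is hypothetical, and building the pipeline with `edsnlp.blank("eds")` is an assumption about how the `nlp` object is obtained. The imports are deferred so the snippet can be loaded without the library present.

```python
from typing import Iterable


def train_endlines_model(texts: Iterable[str]):
    """Fit an EndLinesModel on raw texts and return its predictions.

    Hypothetical helper, not part of edsnlp.
    """
    # Imported lazily so this module loads even without edsnlp installed.
    import edsnlp
    from edsnlp.pipes.core.endlines.model import EndLinesModel

    # Assumption: a blank "eds" pipeline is enough to turn texts into Docs.
    nlp = edsnlp.blank("eds")
    docs = list(nlp.pipe(texts))

    model = EndLinesModel(nlp=nlp)
    # Fits on the corpus and returns a DataFrame of predictions.
    return model.fit_and_predict(docs)
```

The returned DataFrame can be inspected to check the predictions before the model is saved.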

predict

Use the model for inference

The df should have the following columns: ["A1","A2","A3","A4","B1","B2","BLANK_LINE"]

Parameters

PARAMETER DESCRIPTION
df

Input DataFrame, with the columns listed above.

TYPE: DataFrame

RETURNS DESCRIPTION
DataFrame

The result is added to the column PREDICTED_END_LINE
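Since `predict` expects a fixed set of columns, it may help to check the input before calling it. The sketch below is one possible check, using only the standard library; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, not part of edsnlp.

```python
from typing import Iterable, List

# Column names listed in the predict documentation.
REQUIRED_COLUMNS = ["A1", "A2", "A3", "A4", "B1", "B2", "BLANK_LINE"]


def missing_columns(columns: Iterable[str]) -> List[str]:
    """Return the required columns absent from `columns`, in documented order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# With a pandas DataFrame this would be called as missing_columns(df.columns).
print(missing_columns(["A1", "A2", "A3", "B1", "BLANK_LINE"]))  # ['A4', 'B2']
```

An empty list means every documented column is present.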

save

Save the model as a pickle file, which the pipeline can load later.

Parameters

PARAMETER DESCRIPTION
path

Path to the .pkl file, by default base_model.pkl.

TYPE: str DEFAULT: 'base_model.pkl'
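To illustrate what saving to a pickle file involves, here is a generic round trip with Python's `pickle` module. It is a sketch, not the library's implementation: `save_model` and `load_model` are hypothetical helpers, and a plain dict stands in for a fitted model.

```python
import os
import pickle
import tempfile
from typing import Any


def save_model(model: Any, path: str = "base_model.pkl") -> None:
    """Serialize `model` to a single pickle file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: str = "base_model.pkl") -> Any:
    """Load a pickled object back; only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for a fitted model in this illustration.
toy_model = {"name": "toy", "threshold": 0.5}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "base_model.pkl")
    save_model(toy_model, path)
    restored = load_model(path)

print(restored == toy_model)  # True
```

Pickle files can execute arbitrary code when loaded, so a saved model should only be read from a trusted location.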