edsnlp.evaluate

evaluate

Evaluate a model on a dataset. This function can be called from the command line or from a script.

By default, the model is loaded from artifacts/model-last, and the results are stored both in artifacts/test_metrics.json and in the model's artifacts/model-last/meta.json file.

Parameters

data

A function that generates samples for evaluation

TYPE: SampleGenerator

model_path

The path to the model to evaluate

TYPE: Path DEFAULT: 'artifacts/model-last'

scorer

A function that computes metrics on the model. You can also pass a dict:

scorer = {
    "ner": NerExactMetric(...),
    ...
}

TYPE: GenericScorer
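Conceptually, a dict scorer like the one above acts as a function that runs each named metric and collects the results under its name. A minimal sketch, where the metric function and its span-tuple inputs are hypothetical stand-ins rather than edsnlp's real metric API:

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[str, int, int]  # (label, start, end) -- illustrative only


def ner_exact_metric(preds: List[Span], golds: List[Span]) -> dict:
    """Toy exact-match NER metric: spans count only if identical."""
    tp = len(set(preds) & set(golds))
    p = tp / len(preds) if preds else 0.0
    r = tp / len(golds) if golds else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


def run_scorer(scorer: Dict[str, Callable], preds, golds) -> Dict[str, dict]:
    """Apply each named metric and collect its results under its name."""
    return {name: metric(preds, golds) for name, metric in scorer.items()}
```

The returned nested dict (one entry per metric name) is the shape that would then be written out as evaluation results.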

task_metadata

Metadata about the evaluation task. This will be stored in the model's meta.json file. It is primarily meant to be parsed by the Hugging Face Hub, but it is also used to remove previous results for the same dataset.

task_metadata={
    "task": {"type": "token-classification"},
    "dataset": {
        "name": "my_dataset",
        "type": "private",
    },
}

TYPE: Dict DEFAULT: {}
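The "remove previous results for the same dataset" behavior can be sketched as follows. This is a hypothetical illustration of deduplicating a results list by dataset name before appending the new entry, not edsnlp's actual code:

```python
def replace_dataset_results(results: list, new_entry: dict) -> list:
    """Drop earlier entries for the same dataset, then append the new one.

    Each entry is assumed to carry a {"dataset": {"name": ...}} key,
    mirroring the task_metadata structure shown above.
    """
    name = new_entry["dataset"]["name"]
    kept = [r for r in results if r["dataset"]["name"] != name]
    return kept + [new_entry]
```

With this shape, re-running the evaluation on `my_dataset` overwrites the old `my_dataset` entry while leaving results for other datasets untouched.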