`edsnlp.core.torch_component`

`TorchComponent` [source]

Bases: BaseComponent, Module, Generic[BatchOutput, BatchInput]

A TorchComponent is a trainable Component that inherits from torch.nn.Module. You can use it either as a torch module inside a more complex neural network, or as a standalone component in a Pipeline.

In addition to the methods of a torch module, a TorchComponent adds a few methods to handle preprocessing and collating features, as well as caching intermediate results for components that share a common subcomponent.

`post_init` [source]

This method completes the attributes of the component by looking at some documents. It is especially useful for building vocabularies or detecting the labels of a classification task.

Parameters

PARAMETER DESCRIPTION

gold_data

The documents to use for initialization.

TYPE: Iterable[Doc]

exclude

The names of components to exclude from initialization. This argument is gradually updated with the names of the components that have been initialized.

TYPE: Set[str]
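As an illustration of the idea, here is a minimal sketch of a `post_init`-style method that detects labels from gold data and records itself in `exclude`. `ToyLabeler` is a hypothetical stand-in, not an edsnlp class, and the documents are represented as plain `(text, label)` tuples instead of `Doc` objects; the skip-if-excluded behavior is an assumption based on the parameter description above.

```python
from typing import Iterable, List, Set, Tuple


class ToyLabeler:
    """Hypothetical component that learns its label set from gold data."""

    def __init__(self, name: str):
        self.name = name
        self.labels: List[str] = []

    def post_init(self, gold_data: Iterable[Tuple[str, str]], exclude: Set[str]):
        # Assumption: a component listed in `exclude` is skipped.
        if self.name in exclude:
            return
        # Mark this component as initialized so it is not processed twice.
        exclude.add(self.name)
        # Detect the labels of the classification task from the gold data.
        self.labels = sorted({label for _, label in gold_data})


gold = [("fever since monday", "symptom"), ("paracetamol 1g", "drug")]
exclude: Set[str] = set()
labeler = ToyLabeler("toy_labeler")
labeler.post_init(gold, exclude)
```

After the call, `labeler.labels` holds the sorted label set and `exclude` contains the component's name.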

`preprocess` [source]

Preprocess the document to extract the features that the neural network and its subcomponents will use to perform their predictions.

Parameters

PARAMETER DESCRIPTION

doc

Document to preprocess

TYPE: Doc

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary (optionally nested) containing the features extracted from the document.

`collate` [source]

Collate the batch of features into a single batch of tensors that can be used by the forward method of the component.

Parameters

PARAMETER DESCRIPTION

batch

Batch of features

TYPE: Dict[str, Any]

RETURNS	DESCRIPTION
`BatchInput`	Dictionary (optionally nested) containing the collated tensors
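To make the collation step concrete, the following sketch pads variable-length feature lists to a common length and builds a mask. It is only an illustration: the feature name `token_ids`, the `mask` key and the `pad_value` argument are hypothetical, and plain Python lists stand in for the tensors a real component would produce.

```python
from typing import Any, Dict, List


def collate(batch: Dict[str, Any], pad_value: int = 0) -> Dict[str, Any]:
    """Pad each sample to the longest one and return a rectangular batch."""
    token_ids: List[List[int]] = batch["token_ids"]
    max_len = max(len(ids) for ids in token_ids)
    return {
        # Rectangular list of ids, padded on the right
        "token_ids": [ids + [pad_value] * (max_len - len(ids)) for ids in token_ids],
        # True for real positions, False for padding
        "mask": [
            [True] * len(ids) + [False] * (max_len - len(ids)) for ids in token_ids
        ],
    }


collated = collate({"token_ids": [[1, 2, 3], [4, 5]]})
```

In an actual component, the padded lists would typically be converted to tensors before being passed to `forward`.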

`batch_to_device` [source]

Move the batch of tensors to the specified device.

Parameters

PARAMETER DESCRIPTION

batch

Batch of tensors

TYPE: BatchInput

device

Device to move the tensors to

TYPE: Optional[Union[str, device]]

RETURNS	DESCRIPTION
`BatchInput`

`forward` [source]

Perform the forward pass of the neural network.

Parameters

PARAMETER DESCRIPTION

batch

Batch of tensors (nested dictionary) computed by the collate method

TYPE: BatchInput

RETURNS	DESCRIPTION
`BatchOutput`

`compute_training_metrics` [source]

Compute post-gather metrics on the batch output. This is a no-op by default. It is useful for computing averages during multi-GPU training or mini-batch accumulation, since the full denominators are not known during the forward pass.
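The denominator issue can be seen with a small numeric sketch. The keys `loss_sum` and `count` are hypothetical names for a summed loss and the number of items it covers; they are not taken from the library.

```python
# Hypothetical outputs of two micro-batches (or two GPUs)
outputs = [
    {"loss_sum": 6.0, "count": 3},
    {"loss_sum": 1.0, "count": 1},
]

# Averaging per-batch means weights every micro-batch equally,
# regardless of how many items it contains.
mean_of_means = sum(o["loss_sum"] / o["count"] for o in outputs) / len(outputs)

# Gathering numerators and denominators first gives the item-level mean.
gathered_mean = sum(o["loss_sum"] for o in outputs) / sum(o["count"] for o in outputs)
```

Here `mean_of_means` is 1.5 while `gathered_mean` is 1.75, which is why the average is best computed once all outputs have been gathered.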

`module_forward` [source]

This is a wrapper around `torch.nn.Module.__call__` that avoids a conflict with the component's own `__call__` method.

`preprocess_batch` [source]

Convenience method to preprocess a batch of documents. Features corresponding to the same path are grouped together in a list, under the same key.

Parameters

PARAMETER DESCRIPTION

docs

Batch of documents

TYPE: Sequence[Doc]

supervision

Whether to extract supervision features or not

TYPE: bool DEFAULT: False

RETURNS	DESCRIPTION
`Dict[str, Sequence[Any]]`	The batch of features
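The grouping of features "under the same key" can be sketched as turning a list of (optionally nested) dictionaries into a dictionary of lists. `group_features` is a hypothetical helper written for this illustration and is not the library's implementation; the feature names are made up.

```python
from typing import Any, Dict, List


def group_features(samples: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Turn a list of (optionally nested) dicts into a dict of lists."""
    if not samples:
        return {}
    grouped: Dict[str, Any] = {}
    for key, first in samples[0].items():
        values = [sample[key] for sample in samples]
        # Recurse into nested dicts so that each leaf path gets its own list
        grouped[key] = group_features(values) if isinstance(first, dict) else values
    return grouped


features = [
    {"tokens": [1, 2, 3], "embedding": {"ids": [7, 8, 9]}},
    {"tokens": [4, 5], "embedding": {"ids": [10, 11]}},
]
batch = group_features(features)
```

Each leaf path (`tokens`, `embedding.ids`) ends up with one list entry per document, which is the shape a `collate` method would then consume.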

`prepare_batch` [source]

Convenience method to preprocess a batch of documents and collate them. Features corresponding to the same path are grouped together in a list, under the same key.

Parameters

PARAMETER DESCRIPTION

docs

Batch of documents

TYPE: Sequence[Doc]

supervision

Whether to extract supervision features or not

TYPE: bool DEFAULT: False

device

Device to move the tensors to

TYPE: Optional[Union[str, device]] DEFAULT: None

RETURNS	DESCRIPTION
`Dict[str, Sequence[Any]]`

`batch_process` [source]

Process a batch of documents using the neural network. This differs from the pipe method in that it does not return an iterator, but executes the component on the whole batch at once.

Parameters

PARAMETER DESCRIPTION

docs

Batch of documents

TYPE: Sequence[Doc]

RETURNS	DESCRIPTION
`Sequence[Doc]`	Batch of updated documents
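The way the methods on this page fit together can be sketched with a toy class that mirrors their names. This is an assumption-laden illustration, not the actual implementation: `ToyComponent` is hypothetical, documents are plain dictionaries, "tensors" are plain lists, and the `length` and `is_long` features are invented for the example.

```python
from typing import Any, Dict, List


class ToyComponent:
    """Toy stand-in mirroring the preprocess/collate/forward/postprocess chain."""

    def preprocess(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        # Extract per-document features
        return {"length": len(doc["text"].split())}

    def collate(self, batch: Dict[str, List[int]]) -> Dict[str, List[int]]:
        # A real component would build tensors here
        return {"length": list(batch["length"])}

    def forward(self, batch: Dict[str, List[int]]) -> Dict[str, List[bool]]:
        # "Prediction": flag documents with more than three words
        return {"is_long": [n > 3 for n in batch["length"]]}

    def postprocess(self, docs, results, inputs):
        # Write the predictions back onto the documents
        for doc, is_long in zip(docs, results["is_long"]):
            doc["is_long"] = is_long
        return docs

    def batch_process(self, docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        inputs = [self.preprocess(doc) for doc in docs]
        # Group features of the same path under the same key
        grouped = {"length": [x["length"] for x in inputs]}
        results = self.forward(self.collate(grouped))
        return self.postprocess(docs, results, inputs)


docs = [{"text": "a short note"}, {"text": "a somewhat longer clinical note"}]
processed = ToyComponent().batch_process(docs)
```

The whole batch goes through each stage once, in contrast with an iterator-based method that would yield documents lazily.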

`postprocess` [source]

Update the documents with the predictions of the neural network. By default, this is a no-op.

Parameters

PARAMETER DESCRIPTION

docs

List of documents to update

TYPE: Sequence[Doc]

results

Batch of predictions, as returned by the forward method

TYPE: BatchOutput

inputs

List of preprocessed features, as returned by the preprocess method

TYPE: List[Dict[str, Any]]

RETURNS	DESCRIPTION
`Sequence[Doc]`

`preprocess_supervised` [source]

Preprocess the document to extract features that will be used by the neural network to perform its training. By default, this returns the same features as the preprocess method.

Parameters

PARAMETER DESCRIPTION

doc

Document to preprocess

TYPE: Doc

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary (optionally nested) containing the features extracted from the document.

`pipe` [source]

Applies the component on a collection of documents. It is recommended to use the Pipeline.pipe method instead of this one to apply a pipeline on a collection of documents, to benefit from the caching of intermediate results.

Parameters

PARAMETER DESCRIPTION

docs

Input docs

TYPE: Iterable[Doc]

batch_size

Batch size to use when grouping documents into batches to be processed at once

TYPE: int DEFAULT: 1
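As a rough sketch of what `batch_size` controls, the helper below splits any iterable into consecutive lists of at most `batch_size` items. `batchify` is a hypothetical name used for this illustration only, and strings stand in for `Doc` objects.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batchify(items: Iterable[T], batch_size: int = 1) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `batch_size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


batches = list(batchify(["doc1", "doc2", "doc3"], batch_size=2))
```

The last batch may be smaller than `batch_size` when the number of documents is not a multiple of it.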
