edsnlp.core.torch_component
TorchComponent [source]
Bases: BaseComponent, Module, Generic[BatchOutput, BatchInput]
A TorchComponent is a Component that can be trained and inherits from torch.nn.Module. You can use it either as a torch module inside a more complex neural network, or as a standalone component in a Pipeline.
In addition to the methods of a torch module, a TorchComponent adds a few methods to handle preprocessing and collating features, as well as caching intermediate results for components that share a common subcomponent.
post_init [source]
This method completes the attributes of the component by inspecting a sample of documents. It is especially useful for building vocabularies or detecting the labels of a classification task.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| gold_data | The documents to use for initialization. |
| exclude | The names of components to exclude from initialization. This argument is progressively updated with the names of the components that have been initialized. |
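
As a rough illustration of the behavior described above, the sketch below shows a toy component that detects its labels from gold documents and records its own name in `exclude`. The class `ToyLabelCollector` and the dictionary-based documents are hypothetical stand-ins chosen to keep the example self-contained; they are not part of edsnlp, and the actual implementation may differ.

```python
from typing import Dict, Iterable, List, Set


class ToyLabelCollector:
    """Hypothetical stand-in for a trainable component (not an edsnlp class)."""

    def __init__(self, name: str):
        self.name = name
        self.labels: List[str] = []

    def post_init(self, gold_data: Iterable[Dict], exclude: Set[str]) -> None:
        # Skip components that were already initialized.
        if self.name in exclude:
            return
        # Record our name so that a shared component is initialized only once.
        exclude.add(self.name)
        # Complete the missing attribute (here, the label set) from the data.
        self.labels = sorted({doc["label"] for doc in gold_data})


# Assumption: documents are plain dictionaries instead of real Doc objects.
gold_docs = [
    {"text": "fever", "label": "symptom"},
    {"text": "aspirin", "label": "drug"},
    {"text": "cough", "label": "symptom"},
]
exclude: Set[str] = set()
component = ToyLabelCollector("classifier")
component.post_init(gold_docs, exclude)
print(component.labels)  # ['drug', 'symptom']
print(exclude)  # {'classifier'}
```

Calling `post_init` a second time with the same `exclude` set returns immediately, which is the point of updating the set in place.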
preprocess [source]
Preprocess the document to extract the features that the neural network and its subcomponents will use to make predictions.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| doc | Document to preprocess |
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary (optionally nested) containing the features extracted from the document. |
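
To make the shape of the returned value concrete, here is a minimal sketch of a preprocessing function that maps words to vocabulary indices and returns a nested dictionary. The vocabulary `VOCAB`, the feature names, and the use of a plain list of words instead of a Doc are all assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical vocabulary; a real component would typically build it in post_init.
VOCAB = {"<unk>": 0, "the": 1, "patient": 2, "coughs": 3}


def preprocess(words: List[str]) -> Dict[str, Any]:
    # Map each word to its index, falling back to the unknown-word index.
    token_ids = [VOCAB.get(word.lower(), VOCAB["<unk>"]) for word in words]
    # Features may be nested, e.g. one sub-dictionary per subcomponent.
    return {"embedding": {"token_ids": token_ids}, "length": len(words)}


features = preprocess(["The", "patient", "sneezes"])
print(features)  # {'embedding': {'token_ids': [1, 2, 0]}, 'length': 3}
```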
collate [source]
Collate the batch of features into a single batch of tensors that can be used by the forward method of the component.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| batch | Batch of features |
| RETURNS | DESCRIPTION |
|---|---|
| BatchInput | Dictionary (optionally nested) containing the collated tensors |
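
The following sketch shows one common way to collate variable-length features: pad every sequence to the longest one and keep a mask of the real positions. It assumes the batch is a dictionary of lists with a hypothetical `token_ids` key, and it uses plain Python lists to stay dependency-free; an actual component would usually convert the result to tensors.

```python
from typing import Any, Dict, List


def collate(batch: Dict[str, List[Any]], pad_value: int = 0) -> Dict[str, Any]:
    sequences = batch["token_ids"]
    max_len = max(len(seq) for seq in sequences)
    # Pad each sequence on the right so that all rows have the same length.
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]
    # The mask marks real tokens (True) versus padding (False).
    mask = [[True] * len(seq) + [False] * (max_len - len(seq)) for seq in sequences]
    return {"token_ids": padded, "mask": mask}


collated = collate({"token_ids": [[1, 2, 3], [4]]})
print(collated["token_ids"])  # [[1, 2, 3], [4, 0, 0]]
print(collated["mask"])  # [[True, True, True], [True, False, False]]
```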
batch_to_device [source]
Move the batch of tensors to the specified device.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| batch | Batch of tensors |
| device | Device to move the tensors to |
| RETURNS | DESCRIPTION |
|---|---|
| BatchInput | |
forward [source]
Perform the forward pass of the neural network.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| batch | Batch of tensors (nested dictionary) computed by the collate method |
| RETURNS | DESCRIPTION |
|---|---|
| BatchOutput | |
compute_training_metrics [source]
Compute post-gather metrics on the batch output. This is a no-op by default. It is useful for computing averages in multi-GPU training or mini-batch accumulation, since the full denominators are not known during the forward pass.
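
The sketch below illustrates why the division is deferred: each micro-batch reports a summed loss and a count, and the average is only computed once everything has been gathered. The function name `aggregate_after_gather` and the `loss_sum`/`count` keys are hypothetical and do not reflect the actual signature of this method.

```python
from typing import Dict, List


def aggregate_after_gather(outputs: List[Dict[str, float]]) -> Dict[str, float]:
    # Sum numerators and denominators separately, then divide once.
    total_loss = sum(output["loss_sum"] for output in outputs)
    total_count = sum(output["count"] for output in outputs)
    return {"loss": total_loss / total_count}


# Two micro-batches of different sizes (3 samples and 1 sample).
micro_batches = [
    {"loss_sum": 6.0, "count": 3},  # per-batch mean: 2.0
    {"loss_sum": 4.0, "count": 1},  # per-batch mean: 4.0
]
metrics = aggregate_after_gather(micro_batches)
print(metrics)  # {'loss': 2.5}
```

Averaging the two per-batch means would give 3.0 instead of 2.5, which over-weights the smaller micro-batch.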
module_forward [source]
This is a wrapper around torch.nn.Module.__call__ that avoids a conflict with the component's __call__ method.
preprocess_batch [source]
Convenience method to preprocess a batch of documents. Features corresponding to the same path are grouped together in a list, under the same key.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| docs | Batch of documents |
| supervision | Whether to extract supervision features or not |
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Sequence[Any]] | The batch of features |
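
The grouping by path can be pictured as turning a list of nested dictionaries into a single nested dictionary of lists. The helper `group_features` below is a hypothetical sketch of that idea, assuming every sample has the same keys; it is not the library's implementation.

```python
from typing import Any, Dict, List


def group_features(samples: List[Dict[str, Any]]) -> Dict[str, Any]:
    grouped: Dict[str, Any] = {}
    for key in samples[0]:
        values = [sample[key] for sample in samples]
        if isinstance(values[0], dict):
            # Recurse so that nested paths are grouped as well.
            grouped[key] = group_features(values)
        else:
            # Leaf feature: collect one value per document under the same key.
            grouped[key] = values
    return grouped


per_doc_features = [
    {"embedding": {"token_ids": [1, 2]}, "length": 2},
    {"embedding": {"token_ids": [3]}, "length": 1},
]
batch = group_features(per_doc_features)
print(batch)  # {'embedding': {'token_ids': [[1, 2], [3]]}, 'length': [2, 1]}
```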
prepare_batch [source]
Convenience method to preprocess a batch of documents and collate them. Features corresponding to the same path are grouped together in a list, under the same key.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| docs | Batch of documents |
| supervision | Whether to extract supervision features or not |
| device | Device to move the tensors to |
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Sequence[Any]] | |
batch_process [source]
Process a batch of documents using the neural network. This differs from the pipe method in that it does not return an iterator, but executes the component on the whole batch at once.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| docs | Batch of documents |
| RETURNS | DESCRIPTION |
|---|---|
| Sequence[Doc] | Batch of updated documents |
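
As an illustration of how the methods documented on this page could fit together for a whole batch, here is a toy component that chains preprocessing, collation, the forward pass, and postprocessing. `ToyLengthTagger` is hypothetical, uses dictionaries instead of Doc objects, and avoids torch to remain self-contained; the real method may perform additional steps such as moving tensors to a device.

```python
from typing import Any, Dict, List


class ToyLengthTagger:
    """Hypothetical component that flags documents longer than two words."""

    def preprocess(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        return {"n_words": len(doc["text"].split())}

    def collate(self, batch: Dict[str, List[Any]]) -> Dict[str, Any]:
        # Nothing to pad here; a real component would build tensors.
        return {"n_words": list(batch["n_words"])}

    def forward(self, batch: Dict[str, Any]) -> Dict[str, Any]:
        return {"is_long": [n > 2 for n in batch["n_words"]]}

    def postprocess(self, docs, results, inputs):
        # Write each prediction back onto its document.
        for doc, is_long in zip(docs, results["is_long"]):
            doc["is_long"] = is_long
        return docs

    def batch_process(self, docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # The whole batch is processed at once and a sequence is returned.
        inputs = [self.preprocess(doc) for doc in docs]
        batch = {"n_words": [features["n_words"] for features in inputs]}
        results = self.forward(self.collate(batch))
        return self.postprocess(docs, results, inputs)


tagged = ToyLengthTagger().batch_process([{"text": "a b c"}, {"text": "a"}])
print([doc["is_long"] for doc in tagged])  # [True, False]
```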
postprocess [source]
Update the documents with the predictions of the neural network. By default, this is a no-op.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| docs | List of documents to update |
| results | Batch of predictions, as returned by the forward method |
| inputs | List of preprocessed features, as returned by the preprocess method |
| RETURNS | DESCRIPTION |
|---|---|
| Sequence[Doc] | |
preprocess_supervised [source]
Preprocess the document to extract features that will be used by the neural network to perform its training. By default, this returns the same features as the preprocess method.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| doc | Document to preprocess |
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary (optionally nested) containing the features extracted from the document. |
pipe [source]
Applies the component to a collection of documents. To apply a full pipeline to a collection of documents, the Pipeline.pipe method is recommended instead, since it benefits from the caching of intermediate results.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| docs | Input docs |
| batch_size | Batch size to use when grouping documents into batches to be processed at once |
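
The role of `batch_size` can be sketched as follows: documents are read lazily, grouped into batches of at most `batch_size`, processed batch by batch, and yielded one at a time. The free function `pipe` and the `batch_process` callback below are hypothetical simplifications, not the component's actual code.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def pipe(
    docs: Iterable[T],
    batch_process: Callable[[List[T]], Sequence[T]],
    batch_size: int = 2,
) -> Iterator[T]:
    iterator = iter(docs)
    while True:
        # Take up to batch_size documents without consuming the whole input.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        # Process the batch at once, then yield the documents one by one.
        yield from batch_process(batch)


seen_batch_sizes: List[int] = []


def record_and_upper(batch: List[str]) -> List[str]:
    # Stand-in for a component's batch processing; records each batch size.
    seen_batch_sizes.append(len(batch))
    return [doc.upper() for doc in batch]


processed = list(pipe(["a", "b", "c"], record_and_upper, batch_size=2))
print(processed)  # ['A', 'B', 'C']
print(seen_batch_sizes)  # [2, 1]
```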