
`edsnlp.processing.spark`

`execute_spark_backend`

This execution mode uses Spark to parallelize the processing of the documents. The documents are first stored in a Spark DataFrame (unless they are already in one) and then processed in parallel by Spark.
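A minimal sketch of running a pipeline with this backend follows. It assumes a recent edsnlp version that exposes `edsnlp.data.from_spark`, `edsnlp.data.to_spark` and `set_processing(backend="spark")`, and a Spark DataFrame with OMOP-style `note_id` and `note_text` columns. The helper name `annotate_notes_with_spark` and the chosen pipes are hypothetical and purely illustrative.

```python
from typing import Any


def annotate_notes_with_spark(note_df: Any) -> Any:
    """Hypothetical helper: run an edsnlp pipeline over a Spark DataFrame of notes.

    Assumes `note_df` is a pyspark DataFrame with OMOP-style
    `note_id` and `note_text` columns.
    """
    # Imported lazily so this module can be loaded without edsnlp installed
    import edsnlp
    import edsnlp.pipes as eds

    # Illustrative pipeline: these pipes are examples, swap in your own
    nlp = edsnlp.blank("eds")
    nlp.add_pipe(eds.sentences())
    nlp.add_pipe(eds.matcher(terms={"covid": ["covid", "coronavirus"]}))

    # Reading from Spark keeps the documents in a Spark DataFrame,
    # so no local docs -> Spark DataFrame conversion is needed
    docs = edsnlp.data.from_spark(note_df, converter="omop")
    docs = docs.map_pipeline(nlp)
    # Ask edsnlp to execute the pipeline with the Spark backend
    docs = docs.set_processing(backend="spark")
    # Convert the annotated documents back into a Spark DataFrame
    return edsnlp.data.to_spark(docs, converter="omop")
```

Because the input comes from `from_spark`, the documents never leave the cluster before processing, which is the favorable case for this backend.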

Beware: if the original reader was not a SparkReader (`edsnlp.data.from_spark`), the local docs → Spark DataFrame conversion may take some time, and the whole process may end up slower than with the multiprocessing backend.
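One way to act on this advice is to choose the backend from the type of the input. The `pick_backend` helper below is a hypothetical heuristic, not part of edsnlp: it returns `"spark"` only when the data is already a (classic) pyspark DataFrame and falls back to `"multiprocessing"` otherwise, treating pyspark as an optional dependency.

```python
from typing import Any


def pick_backend(data: Any) -> str:
    """Hypothetical heuristic: use Spark only when the data already lives in Spark."""
    try:
        from pyspark.sql import DataFrame as SparkDataFrame
    except ImportError:
        # pyspark is not installed, so the data cannot be a Spark DataFrame
        return "multiprocessing"
    if isinstance(data, SparkDataFrame):
        # No local docs -> Spark DataFrame conversion is needed
        return "spark"
    # Local data would first have to be shipped to Spark, which may be slow
    return "multiprocessing"
```

The result could then be passed to `set_processing(backend=pick_backend(data))`; actual timings depend on the cluster and data size, so this rule of thumb is worth benchmarking before relying on it.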