edsnlp.processing.spark
execute_spark_backend [source]
This execution mode uses Spark to parallelize the processing of the documents. The documents are first loaded into a Spark DataFrame (if they are not already stored in one) and then processed in parallel by the Spark executors.
Beware: if the original reader was not a SparkReader (`edsnlp.data.from_spark`), converting the local docs into a Spark DataFrame may take some time, and the whole process may end up slower than the multiprocessing backend.
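As a minimal sketch, the Spark backend is typically combined with the `from_spark` reader so that documents already live in a Spark DataFrame. The pipeline components (`eds.sentences`), the `"omop"` converter, and the existence of a Spark session named `spark` with a DataFrame `df` are assumptions here, not requirements of this backend; adapt them to your own data and pipeline.

```python
import edsnlp

# Build a small pipeline (the components used here are illustrative)
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")

# `df` is assumed to be an existing Spark DataFrame of notes;
# reading it with from_spark avoids the local-to-Spark conversion
# mentioned above.
docs = edsnlp.data.from_spark(df, converter="omop")
docs = docs.map_pipeline(nlp)
docs = docs.set_processing(backend="spark")

# Collect the results back into a Spark DataFrame
result = edsnlp.data.to_spark(docs, converter="omop")
```

If your documents start out as local objects instead of a Spark DataFrame, the multiprocessing backend may be the faster choice overall, as noted above.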