
edsnlp.processing.spark

execute_spark_backend

This execution mode uses Spark to parallelize document processing. The documents are first stored in a Spark DataFrame (unless they already are in one) and then processed in parallel by Spark.

Beware: if the original reader was not a SparkReader (edsnlp.data.from_spark), converting the local docs to a Spark DataFrame might take some time, and the whole process might be slower than using the multiprocessing backend.
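
The advice above can be sketched as follows. This is a minimal illustration rather than the library's reference usage: `pick_backend` and `annotate_with_spark` are hypothetical helper names, `df` is assumed to be a Spark DataFrame whose columns match what the "omop" converter expects, and `nlp` is assumed to be an already configured EDS-NLP pipeline.

```python
def pick_backend(reader_is_spark: bool) -> str:
    """Hypothetical helper encoding the warning above.

    Use the Spark backend only when the documents already live in a
    Spark DataFrame; otherwise prefer multiprocessing, which avoids the
    cost of converting local docs to a Spark DataFrame.
    """
    return "spark" if reader_is_spark else "multiprocessing"


def annotate_with_spark(df, nlp):
    """Run `nlp` over the notes in `df` with the Spark backend.

    Assumptions: `df` is a pyspark DataFrame laid out for the "omop"
    converter, and `nlp` is an EDS-NLP pipeline built elsewhere.
    """
    # Imported lazily so that pick_backend works without edsnlp installed.
    import edsnlp

    # Reading with from_spark means no local-to-Spark conversion is needed.
    docs = edsnlp.data.from_spark(df, converter="omop")
    docs = docs.map_pipeline(nlp)
    docs = docs.set_processing(backend=pick_backend(reader_is_spark=True))
    # Write the annotated documents back to a Spark DataFrame.
    return docs.to_spark(converter="omop")
```

Keeping the backend choice in a separate function makes the trade-off explicit: the Spark backend is a natural fit when the reader is edsnlp.data.from_spark, while data read from local files is usually better served by the multiprocessing backend.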