Name | Modified | Size | Downloads / Week |
---|---|---|---|
6.1.3 source code.tar.gz | 2025-09-01 | 254.0 MB | |
6.1.3 source code.zip | 2025-09-01 | 435.7 MB | |
README.md | 2025-09-01 | 5.4 kB | |
Totals: 3 Items | | 689.7 MB | 2 |
📢 Spark NLP 6.1.3: NerDL Graph Checker, Reader2Doc Enhancements, Ranking Finisher
We are pleased to announce Spark NLP 6.1.3, introducing a new graph validation annotator for NER training, enhancements to Reader2Doc for flexible document handling, and a new ranking finisher for AutoGGUFReranker outputs. This release focuses on improving training robustness, document processing flexibility, and retrieval ranking capabilities.
🔥 Highlights
- New NerDLGraphChecker annotator to validate NER training graphs before training starts.
- Reader2Doc enhancements with options for consolidated output and filtering.
- New AutoGGUFRerankerFinisher for ranking, filtering, and normalizing reranker outputs.
🚀 New Features & Enhancements
Named Entity Recognition (NER)
NerDLGraphChecker: A new annotator that validates whether a suitable NerDL graph is available for a given training dataset before embeddings or training start. This helps avoid wasted computation in custom training scenarios. (Link to notebook)
- Must be placed before embedding or NerDLApproach annotators.
- Requires token and label columns in the dataset.
- Automatically extracts embedding dimensions from the pipeline to validate graph compatibility.
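The idea behind this kind of pre-flight check can be sketched in plain Python. This is an illustration only, not the annotator's implementation. It assumes graph files follow the `blstm_{tags}_{embeddingDim}_{lstmSize}_{chars}.pb` naming pattern of Spark NLP's bundled NerDL graphs, and that a graph is suitable when its embedding dimension matches exactly and its tag and character capacities are at least as large as the dataset requires. The names `parse_graph_name` and `find_suitable_graph` are hypothetical.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed file-name pattern: blstm_{tags}_{embedding_dim}_{lstm_size}_{chars}.pb
GRAPH_NAME = re.compile(r"^blstm_(\d+)_(\d+)_(\d+)_(\d+)\.pb$")


class GraphSpec(NamedTuple):
    tags: int
    embedding_dim: int
    lstm_size: int
    chars: int


def parse_graph_name(file_name: str) -> Optional[GraphSpec]:
    """Return the capacities encoded in a graph file name, or None if it does not match."""
    match = GRAPH_NAME.match(file_name)
    return GraphSpec(*map(int, match.groups())) if match else None


def find_suitable_graph(file_names: Iterable[str], n_tags: int,
                        embedding_dim: int, n_chars: int) -> Optional[str]:
    """Pick the smallest graph that can hold the dataset (hypothetical matching rule)."""
    candidates = []
    for name in file_names:
        spec = parse_graph_name(name)
        if spec is None:
            continue  # not a graph file
        if (spec.embedding_dim == embedding_dim
                and spec.tags >= n_tags
                and spec.chars >= n_chars):
            candidates.append((spec.tags, spec.chars, name))
    # Prefer the graph with the fewest tags, then the fewest characters.
    return min(candidates)[2] if candidates else None


graphs = ["blstm_10_100_128_100.pb", "blstm_10_300_128_100.pb",
          "blstm_38_100_128_200.pb", "notes.txt"]
print(find_suitable_graph(graphs, n_tags=9, embedding_dim=100, n_chars=80))
print(find_suitable_graph(graphs, n_tags=9, embedding_dim=768, n_chars=80))
```

A `None` result is the case worth catching early: failing at this point costs a scan of the labels and tokens rather than a full embedding pass over the dataset.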
Document Processing
Reader2Doc Enhancements: New configuration options provide more control over output formatting:
- outputAsDocument: Concatenates all sentences into a single document.
- excludeNonText: Filters out non-textual elements (e.g., tables, images) from the document.
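The effect of the two options can be modeled on plain Python data. The sketch below is not the Reader2Doc API: the `to_documents` function, the element dictionaries, the `NON_TEXT_TYPES` set, and the single-space separator used when joining are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical element categories treated as non-textual in this sketch.
NON_TEXT_TYPES = {"table", "image"}


def to_documents(elements: List[Dict[str, str]],
                 output_as_document: bool = False,
                 exclude_non_text: bool = False) -> List[str]:
    """Model the two options on a list of {"type": ..., "text": ...} elements."""
    if exclude_non_text:
        # Drop tables, images, and similar elements before building the output.
        elements = [e for e in elements if e["type"] not in NON_TEXT_TYPES]
    texts = [e["text"] for e in elements]
    if output_as_document:
        # Concatenate everything into one document (separator is an assumption).
        return [" ".join(texts)]
    return texts


elements = [
    {"type": "paragraph", "text": "First sentence."},
    {"type": "table", "text": "a | b"},
    {"type": "paragraph", "text": "Second sentence."},
]
print(to_documents(elements, output_as_document=True, exclude_non_text=True))
```

In this sketch filtering runs before concatenation, so excluded elements never reach the merged document.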
Ranking & Retrieval
AutoGGUFRerankerFinisher: A finisher for processing AutoGGUFReranker outputs, adding advanced ranking and filtering capabilities (Link to notebook):
- Top-k document selection.
- Score threshold filtering.
- Min-max score normalization (0–1 range).
- Sorting by relevance score.
- Rank assignment in metadata while preserving document structure.
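The operations listed above can be sketched in plain Python to show how they combine. This is a conceptual illustration, not the finisher's implementation or API: the `finish_ranking` function, its parameters, the dictionary layout, and the order of operations (normalize, then threshold, then sort, then top-k) are assumptions.

```python
from typing import Dict, List, Optional


def finish_ranking(docs: List[Dict],
                   top_k: Optional[int] = None,
                   min_score: Optional[float] = None,
                   normalize: bool = False) -> List[Dict]:
    """Rank documents of the form {"text": str, "score": float} (hypothetical layout)."""
    scores = [d["score"] for d in docs]
    if normalize and scores:
        low, high = min(scores), max(scores)
        span = high - low
        # Min-max scaling to the 0-1 range; equal scores all map to 1.0 (assumption).
        scores = [(s - low) / span if span else 1.0 for s in scores]
    # Copy each document so the input list is left untouched.
    ranked = [{**d, "score": s} for d, s in zip(docs, scores)]
    if min_score is not None:
        ranked = [d for d in ranked if d["score"] >= min_score]
    ranked.sort(key=lambda d: d["score"], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    # Attach a 1-based rank alongside the existing fields.
    return [{**d, "rank": i} for i, d in enumerate(ranked, start=1)]


docs = [
    {"text": "a", "score": 2.0},
    {"text": "b", "score": 8.0},
    {"text": "c", "score": 5.0},
]
for doc in finish_ranking(docs, top_k=2, min_score=0.4, normalize=True):
    print(doc)
```

Because normalization runs first in this sketch, the threshold is expressed on the 0-1 scale rather than in raw reranker scores; a different ordering would change which documents survive.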
🐛 Bug Fixes
None.
❤️ Community Support
- Slack: Live discussion with the Spark NLP community and team
- GitHub: Bug reports, feature requests, and contributions
- Discussions: Share ideas and engage with other community members
- Medium: Spark NLP technical articles
- JohnSnowLabs Medium: Official blog
- YouTube: Spark NLP tutorials and demos
Installation
Python
:::shell
pip install spark-nlp==6.1.3
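After installing, one way to confirm that the pinned version is the one Python actually sees is to query the package metadata. The sketch below uses only the standard library; the `installed_version` helper is a hypothetical name, and the check assumes the distribution is registered as `spark-nlp`, matching the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("spark-nlp")
if found != "6.1.3":
    print(f"Expected spark-nlp 6.1.3, found {found}")
```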
Spark Packages
spark-nlp on Apache Spark 3.0.x–3.4.x (Scala 2.12):
:::shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.1.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.1.3
GPU
:::shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.1.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.1.3
Apple Silicon (M1 & M2)
:::shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.1.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.1.3
AArch64
:::shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.1.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.1.3
Maven
spark-nlp:
:::xml
<dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp_2.12</artifactId>
  <version>6.1.3</version>
</dependency>
spark-nlp-gpu:
:::xml
<dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp-gpu_2.12</artifactId>
  <version>6.1.3</version>
</dependency>
spark-nlp-silicon:
:::xml
<dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp-silicon_2.12</artifactId>
  <version>6.1.3</version>
</dependency>
spark-nlp-aarch64:
:::xml
<dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp-aarch64_2.12</artifactId>
  <version>6.1.3</version>
</dependency>
FAT JARs
- CPU: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-6.1.3.jar
- GPU: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-6.1.3.jar
- M1: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-6.1.3.jar
- AArch64: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-6.1.3.jar
What's Changed
- https://github.com/JohnSnowLabs/spark-nlp/pull/14657
- https://github.com/JohnSnowLabs/spark-nlp/pull/14656
- https://github.com/JohnSnowLabs/spark-nlp/pull/14653
Full Changelog: https://github.com/JohnSnowLabs/spark-nlp/compare/6.1.2...6.1.3