📢 Spark NLP 6.1.1: Enhanced LLM Performance and Expanded Data Ingestion Capabilities
We are thrilled to announce Spark NLP 6.1.1, a focused release that delivers significant performance improvements and enhanced functionality for large language models and universal data ingestion. This release continues our commitment to providing state-of-the-art AI capabilities within the native Spark ecosystem, with optimized inference performance and expanded multimodal support.
🔥 Highlights
- Performance Boost for llama.cpp models: Inference optimizations in `AutoGGUFModel` and `AutoGGUFEmbeddings` deliver improvements for large language model workflows on GPU.
- Multimodal Vision Models Restored: The `AutoGGUFVisionModel` annotator is back with full functionality and the latest SOTA VLMs, enabling sophisticated vision-language processing capabilities.
- Enhanced Table Processing: The new `Reader2Table` annotator streamlines tabular data extraction from multiple document formats with seamless pipeline integration.
- Upgraded OpenVINO backend: We upgraded our OpenVINO backend to 2025.02 and added hyperthreading configuration options to maximize performance on multi-core systems.
🚀 New Features & Enhancements
Large Language Models (LLMs)
- Optimized `AutoGGUFModel` Performance: We improved the inference of llama.cpp models and achieved a 10% performance increase for `AutoGGUFModel` on GPU.
- Restored `AutoGGUFVisionModel`: The multimodal vision model annotator is fully operational again, enabling powerful vision-language processing capabilities. Users can now process images alongside text for comprehensive multimodal AI applications while using the latest SOTA vision-language models.
- Enhanced Model Compatibility: `AutoGGUFModel` can now seamlessly load the language model components from pretrained `AutoGGUFVisionModel` instances, providing greater flexibility in model deployment and usage. (Link to notebook)
- Robust Model Loading: Pretrained AutoGGUF-based annotators now load despite the inclusion of deprecated parameters, ensuring broader compatibility.
- Updated Default Models: All AutoGGUF annotators now use more recent and capable pretrained models:
Annotator | Default pretrained model |
---|---|
AutoGGUFModel | Phi_4_mini_instruct_Q4_K_M_gguf |
AutoGGUFEmbeddings | Qwen3_Embedding_0.6B_Q8_0_gguf |
AutoGGUFVisionModel | Qwen2.5_VL_3B_Instruct_Q4_K_M_gguf |
Document Ingestion
- `Reader2Table` Annotator: This powerful new annotator provides a streamlined interface for extracting and processing tabular data from various document formats (Link to notebook). It offers:
    - Unified API for interacting with Spark NLP readers
    - Enhanced flexibility through reader-specific configurations
    - Improved maintainability and scalability for data loading workflows
    - Support for multiple formats including HTML, Word (.doc/.docx), Excel (.xls/.xlsx), PowerPoint (.ppt/.pptx), Markdown (.md), and CSV (.csv)
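
Before handing a directory of mixed files to a reader, it can help to separate the files whose format is listed above from the rest. The following is a minimal standard-library sketch of such a pre-filter, not part of Spark NLP itself: the helper names `is_reader2table_candidate` and `split_by_support` are hypothetical, and the `.html`/`.htm` extensions are an assumption, since the notes name HTML without giving an extension.

```python
from pathlib import Path

# Extensions for the formats named in the release notes.
# ".html" and ".htm" are assumptions: the notes say "HTML" without an extension.
SUPPORTED_EXTENSIONS = {
    ".html", ".htm",   # HTML (assumed extensions)
    ".doc", ".docx",   # Word
    ".xls", ".xlsx",   # Excel
    ".ppt", ".pptx",   # PowerPoint
    ".md",             # Markdown
    ".csv",            # CSV
}


def is_reader2table_candidate(path: str) -> bool:
    """Return True if the file extension matches a listed format (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(paths):
    """Split paths into (supported, skipped) lists, preserving input order."""
    supported = [p for p in paths if is_reader2table_candidate(p)]
    skipped = [p for p in paths if not is_reader2table_candidate(p)]
    return supported, skipped
```

The check looks only at the file extension, so it says nothing about whether a file actually contains a table or is well-formed.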
Performance Optimizations
- OpenVINO Upgrade: We upgraded the OpenVINO backend to 2025.02 and added comprehensive hyperthreading configuration options, enabling users to optimize performance on multi-core systems by fine-tuning thread allocation and CPU utilization.
🐛 Bug Fixes
None
❤️ Community Support
- Slack: For live discussion with the Spark NLP community and the team.
- GitHub: Bug reports, feature requests, and contributions.
- Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
- Medium: Spark NLP articles.
- JohnSnowLabs official Medium
- YouTube: Spark NLP video tutorials.
Installation
Python

    :::shell
    pip install spark-nlp==6.1.1
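
After installing, one way to confirm that the environment really picked up the pinned release is to read the installed distribution's metadata. This is a small sketch using only the Python standard library; the function names `installed_version` and `matches_release` are hypothetical helpers, not part of Spark NLP.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "spark-nlp") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def matches_release(expected: str = "6.1.1", distribution: str = "spark-nlp") -> bool:
    """Return True only if the distribution is installed at exactly `expected`."""
    return installed_version(distribution) == expected
```

Calling `matches_release()` in a fresh interpreter returns `False` both when the package is missing and when a different version is installed; `installed_version()` distinguishes the two cases.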
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):

    :::shell
    spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.1.1
    pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.1.1

GPU

    :::shell
    spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.1.1
    pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.1.1

Apple Silicon

    :::shell
    spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.1.1
    pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.1.1

AArch64

    :::shell
    spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.1.1
    pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.1.1
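
The four package coordinates above differ only in the artifact name, so launch scripts that target several hardware setups can derive them from one table. The sketch below is illustrative only: the variant keys (`cpu`, `gpu`, `silicon`, `aarch64`) and the helper names are assumptions made for this example, while the group ID, artifact names, Scala suffix, and version are taken from the commands above.

```python
# Artifact names as they appear in the --packages commands above.
# The dictionary keys are labels chosen for this sketch.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def package_coordinate(variant: str = "cpu", version: str = "6.1.1", scala: str = "2.12") -> str:
    """Build the Maven-style coordinate passed to --packages."""
    if variant not in ARTIFACTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(ARTIFACTS)}")
    return f"com.johnsnowlabs.nlp:{ARTIFACTS[variant]}_{scala}:{version}"


def launch_command(tool: str = "pyspark", variant: str = "cpu") -> list:
    """Return the argument list for spark-shell or pyspark with the chosen package."""
    if tool not in ("pyspark", "spark-shell"):
        raise ValueError(f"unsupported tool {tool!r}")
    return [tool, "--packages", package_coordinate(variant)]
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues when the command is assembled programmatically.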
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:

    :::xml
    <dependency>
        <groupId>com.johnsnowlabs.nlp</groupId>
        <artifactId>spark-nlp_2.12</artifactId>
        <version>6.1.1</version>
    </dependency>

spark-nlp-gpu:

    :::xml
    <dependency>
        <groupId>com.johnsnowlabs.nlp</groupId>
        <artifactId>spark-nlp-gpu_2.12</artifactId>
        <version>6.1.1</version>
    </dependency>

spark-nlp-silicon:

    :::xml
    <dependency>
        <groupId>com.johnsnowlabs.nlp</groupId>
        <artifactId>spark-nlp-silicon_2.12</artifactId>
        <version>6.1.1</version>
    </dependency>

spark-nlp-aarch64:

    :::xml
    <dependency>
        <groupId>com.johnsnowlabs.nlp</groupId>
        <artifactId>spark-nlp-aarch64_2.12</artifactId>
        <version>6.1.1</version>
    </dependency>
FAT JARs
- CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-6.1.1.jar
- GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-6.1.1.jar
- M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-6.1.1.jar
- AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-6.1.1.jar
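
For offline or air-gapped clusters, the FAT JAR is usually downloaded once and then referenced locally. As a rough sketch, the URLs listed above follow a single naming pattern that can be reproduced in a few lines; the variant keys and function names here are hypothetical, and the pattern is inferred only from the four 6.1.1 links above, so it may not hold for other releases.

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

# Base location shared by the four FAT JAR links listed above.
BASE_URL = "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars"

# Variant labels are chosen for this sketch; "cpu" has no infix in the file name.
VARIANTS = ("cpu", "gpu", "silicon", "aarch64")


def fat_jar_url(variant: str = "cpu", version: str = "6.1.1") -> str:
    """Build the FAT JAR URL for a variant, following the pattern of the links above."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    prefix = "spark-nlp" if variant == "cpu" else f"spark-nlp-{variant}"
    return f"{BASE_URL}/{prefix}-assembly-{version}.jar"


def jar_file_name(url: str) -> str:
    """Return the file name component of a JAR URL, e.g. for a local download target."""
    return PurePosixPath(urlparse(url).path).name
```

Once downloaded, the local file can be supplied to `spark-shell` or `pyspark` through Spark's `--jars` option instead of `--packages`.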
What's Changed
- https://github.com/JohnSnowLabs/spark-nlp/pull/14641 by @prabod
- https://github.com/JohnSnowLabs/spark-nlp/pull/14640 by @danilojsl
- https://github.com/JohnSnowLabs/spark-nlp/pull/14644 by @DevinTDHA and @C-K-Loan
Full Changelog: https://github.com/JohnSnowLabs/spark-nlp/compare/6.1.0...6.1.1