
📢 Spark NLP 6.1.1: Enhanced LLM Performance and Expanded Data Ingestion Capabilities

We are thrilled to announce Spark NLP 6.1.1, a focused release that delivers significant performance improvements and enhanced functionality for large language models and universal data ingestion. This release continues our commitment to providing state-of-the-art AI capabilities within the native Spark ecosystem, with optimized inference performance and expanded multimodal support.

🔥 Highlights

Performance Boost for llama.cpp models: Inference optimizations in AutoGGUFModel and AutoGGUFEmbeddings speed up large language model workflows on GPU.
Multimodal Vision Models Restored: The AutoGGUFVisionModel annotator is back with full functionality and latest SOTA VLMs, enabling sophisticated vision-language processing capabilities.
Enhanced Table Processing: New Reader2Table annotator streamlines tabular data extraction from multiple document formats with seamless pipeline integration.
Upgraded OpenVINO backend: We upgraded our OpenVINO backend to 2025.02 and added hyperthreading configuration options to maximize performance on multi-core systems.

🚀 New Features & Enhancements

Large Language Models (LLMs)

Optimized AutoGGUFModel Performance: We improved the inference of llama.cpp models and achieved a 10% performance increase for AutoGGUFModel on GPU.
Restored AutoGGUFVisionModel: The multimodal vision model annotator is fully operational again, enabling powerful vision-language processing capabilities. Users can now process images alongside text for comprehensive multimodal AI applications while using the latest SOTA vision-language models.
Enhanced Model Compatibility: AutoGGUFModel can now seamlessly load the language model components from pretrained AutoGGUFVisionModel instances, providing greater flexibility in model deployment and usage. (Link to notebook)
Robust Model Loading: Pretrained AutoGGUF-based annotators now load despite the inclusion of deprecated parameters, ensuring broader compatibility.
Updated Default Models: All AutoGGUF annotators now use more recent and capable pretrained models:

Annotator	Default pretrained model
AutoGGUFModel	Phi_4_mini_instruct_Q4_K_M_gguf
AutoGGUFEmbeddings	Qwen3_Embedding_0.6B_Q8_0_gguf
AutoGGUFVisionModel	Qwen2.5_VL_3B_Instruct_Q4_K_M_gguf
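
As a rough illustration of the defaults above, here is a minimal sketch that assumes a working Spark NLP 6.1.1 installation. The `DEFAULT_MODELS` mapping only restates the table; `build_llm_pipeline` is a hypothetical helper name, and the column names and the `setNPredict` value are illustrative choices rather than recommendations. Calling `pretrained()` without arguments is what selects the annotator's default model.

```python
# Default pretrained models per annotator, restated from the table above.
DEFAULT_MODELS = {
    "AutoGGUFModel": "Phi_4_mini_instruct_Q4_K_M_gguf",
    "AutoGGUFEmbeddings": "Qwen3_Embedding_0.6B_Q8_0_gguf",
    "AutoGGUFVisionModel": "Qwen2.5_VL_3B_Instruct_Q4_K_M_gguf",
}


def build_llm_pipeline(n_predict=32):
    """Hypothetical helper: a two-stage pipeline around the default AutoGGUFModel."""
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import AutoGGUFModel

    # Turn the raw "text" column into Spark NLP document annotations.
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")

    # pretrained() with no arguments downloads the annotator's default model.
    llm = (
        AutoGGUFModel.pretrained()
        .setInputCols(["document"])
        .setOutputCol("completions")
        .setNPredict(n_predict)  # cap on generated tokens; value is illustrative
    )
    return Pipeline(stages=[document, llm])
```

With a session started through `sparknlp.start()`, the returned pipeline can be fitted on a DataFrame that has a `text` column, and the generated text read from `completions.result`.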

Document Ingestion

Reader2Table Annotator: This powerful new annotator provides a streamlined interface for extracting and processing tabular data from various document formats (Link to notebook). It offers:
Unified API for interacting with Spark NLP readers
Enhanced flexibility through reader-specific configurations
Improved maintainability and scalability for data loading workflows
Support for multiple formats including HTML, Word (.doc/.docx), Excel (.xls/.xlsx), PowerPoint (.ppt/.pptx), Markdown (.md), and CSV (.csv)
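
The format list above can be used to pre-sort input files before they reach a reader. The sketch below is plain Python and does not call Reader2Table itself; `SUPPORTED_EXTENSIONS` and `group_by_format` are hypothetical names, and the `.html` extension is an assumption, since the list names HTML without giving an extension.

```python
from pathlib import Path

# Extensions named in the list above. ".html" is an assumption: the source
# says "HTML" without giving an extension.
SUPPORTED_EXTENSIONS = {
    ".html": "HTML",
    ".doc": "Word",
    ".docx": "Word",
    ".xls": "Excel",
    ".xlsx": "Excel",
    ".ppt": "PowerPoint",
    ".pptx": "PowerPoint",
    ".md": "Markdown",
    ".csv": "CSV",
}


def group_by_format(paths):
    """Group file paths by document format, skipping unsupported extensions."""
    groups = {}
    for p in paths:
        # Compare extensions case-insensitively, so "report.XLSX" counts as Excel.
        fmt = SUPPORTED_EXTENSIONS.get(Path(p).suffix.lower())
        if fmt is not None:
            groups.setdefault(fmt, []).append(str(p))
    return groups
```

Grouping by format first makes it straightforward to configure one reader per content type, which fits the reader-specific configuration mentioned above.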

Performance Optimizations

OpenVINO Upgrade: We upgraded the OpenVINO backend to 2025.02 and added hyperthreading configuration options, enabling users to optimize performance on multi-core systems by fine-tuning thread allocation and CPU utilization.

🐛 Bug Fixes

None

❤️ Community Support

Slack: For live discussion with the Spark NLP community and the team.
GitHub: Bug reports, feature requests, and contributions.
Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
Medium: Spark NLP articles.
JohnSnowLabs official Medium
YouTube: Spark NLP video tutorials.

Installation

Python

:::shell
pip install spark-nlp==6.1.1

Spark Packages

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):

:::shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.1.1

GPU

:::shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.1.1

Apple Silicon

:::shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.1.1

AArch64

:::shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.1.1
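
The four `--packages` coordinates above differ only in the artifact name. For scripts that build their own Spark session, a small helper along these lines could pick the right one; `package_coordinate` and `spark_conf` are hypothetical names, while `spark.jars.packages` is the standard Spark setting that corresponds to `--packages`.

```python
VERSION = "6.1.1"
SCALA_VERSION = "2.12"

# Platform key -> artifact name, taken from the commands above.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def package_coordinate(platform="cpu"):
    """Return the Maven coordinate for the given platform key."""
    if platform not in ARTIFACTS:
        raise ValueError(f"unknown platform: {platform!r}")
    return f"com.johnsnowlabs.nlp:{ARTIFACTS[platform]}_{SCALA_VERSION}:{VERSION}"


def spark_conf(platform="cpu"):
    """Return settings that can be passed to SparkSession.builder.config(key, value)."""
    return {"spark.jars.packages": package_coordinate(platform)}
```

For example, `package_coordinate("gpu")` yields the same coordinate used in the GPU commands above.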

Maven

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:

:::xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>6.1.1</version>
</dependency>

spark-nlp-gpu:

:::xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>6.1.1</version>
</dependency>

spark-nlp-silicon:

:::xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
    <version>6.1.1</version>
</dependency>

spark-nlp-aarch64:

:::xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
    <version>6.1.1</version>
</dependency>

FAT JARs

CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-6.1.1.jar
GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-6.1.1.jar
M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-6.1.1.jar
AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-6.1.1.jar

What's Changed

https://github.com/JohnSnowLabs/spark-nlp/pull/14641 by @prabod
https://github.com/JohnSnowLabs/spark-nlp/pull/14640 by @danilojsl
https://github.com/JohnSnowLabs/spark-nlp/pull/14644 by @DevinTDHA and @C-K-Loan

Full Changelog: https://github.com/JohnSnowLabs/spark-nlp/compare/6.1.0...6.1.1

Source: README.md, updated 2025-08-05
