📢 Spark NLP 6.1.1: Enhanced LLM Performance and Expanded Data Ingestion Capabilities

We are thrilled to announce Spark NLP 6.1.1, a focused release that delivers significant performance improvements and enhanced functionality for large language models and universal data ingestion. This release continues our commitment to providing state-of-the-art AI capabilities within the native Spark ecosystem, with optimized inference performance and expanded multimodal support.

🔥 Highlights

  • Performance Boost for llama.cpp models: Inference optimizations in AutoGGUFModel and AutoGGUFEmbeddings speed up large language model workflows on GPU.
  • Multimodal Vision Models Restored: The AutoGGUFVisionModel annotator is back with full functionality and the latest SOTA VLMs, enabling sophisticated vision-language processing.
  • Enhanced Table Processing: New Reader2Table annotator streamlines tabular data extraction from multiple document formats with seamless pipeline integration.
  • Upgraded OpenVINO backend: We upgraded our OpenVINO backend to 2025.02 and added hyperthreading configuration options to maximize performance on multi-core systems.

🚀 New Features & Enhancements

Large Language Models (LLMs)

  • Optimized AutoGGUFModel Performance: We improved the inference of llama.cpp models and achieved a 10% performance increase for AutoGGUFModel on GPU.
  • Restored AutoGGUFVisionModel: The multimodal vision model annotator is fully operational again, enabling powerful vision-language processing capabilities. Users can now process images alongside text for comprehensive multimodal AI applications while using the latest SOTA vision-language models.
  • Enhanced Model Compatibility: AutoGGUFModel can now seamlessly load the language model components from pretrained AutoGGUFVisionModel instances, providing greater flexibility in model deployment and usage. (Link to notebook)
  • Robust Model Loading: Pretrained AutoGGUF-based annotators now load even when the saved model includes deprecated parameters, ensuring broader compatibility.
  • Updated Default Models: All AutoGGUF annotators now use more recent and capable pretrained models:

| Annotator | Default pretrained model |
|---|---|
| AutoGGUFModel | Phi_4_mini_instruct_Q4_K_M_gguf |
| AutoGGUFEmbeddings | Qwen3_Embedding_0.6B_Q8_0_gguf |
| AutoGGUFVisionModel | Qwen2.5_VL_3B_Instruct_Q4_K_M_gguf |
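
As a rough illustration of how these defaults might be used from Python, the sketch below pins the model name explicitly instead of relying on `pretrained()` with no arguments, so a pipeline keeps loading the same model if the defaults change again. It assumes `spark-nlp==6.1.1` and PySpark are installed. The helper name `build_completion_pipeline`, the column names `text`, `document`, and `completions`, and the `n_predict` value are illustrative choices, not part of the release, and `setNPredict` is assumed to behave as in earlier AutoGGUFModel releases.

```python
# Default pretrained model names, copied from the 6.1.1 release notes.
DEFAULT_MODELS = {
    "AutoGGUFModel": "Phi_4_mini_instruct_Q4_K_M_gguf",
    "AutoGGUFEmbeddings": "Qwen3_Embedding_0.6B_Q8_0_gguf",
    "AutoGGUFVisionModel": "Qwen2.5_VL_3B_Instruct_Q4_K_M_gguf",
}


def build_completion_pipeline(n_predict=32):
    """Return a Pipeline: DocumentAssembler -> AutoGGUFModel (hypothetical helper).

    Needs spark-nlp, PySpark, and an active Spark session; loading the
    pretrained model downloads it on first use.
    """
    # Imported lazily so this module can be imported without Spark NLP installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import AutoGGUFModel

    # Turn the raw "text" column into document annotations.
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")

    # Pin the model by name rather than relying on the library default.
    llm = (
        AutoGGUFModel.pretrained(DEFAULT_MODELS["AutoGGUFModel"])
        .setInputCols(["document"])
        .setOutputCol("completions")
        .setNPredict(n_predict)  # cap on the number of generated tokens
    )
    return Pipeline(stages=[document, llm])
```

With a session created by `sparknlp.start(gpu=True)`, calling `build_completion_pipeline().fit(df).transform(df)` on a DataFrame that has a `text` column should add a `completions` column holding the generated text.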

Document Ingestion

  • Reader2Table Annotator: This powerful new annotator provides a streamlined interface for extracting and processing tabular data from various document formats (Link to notebook). It offers:
      • Unified API for interacting with Spark NLP readers
      • Enhanced flexibility through reader-specific configurations
      • Improved maintainability and scalability for data loading workflows
      • Support for multiple formats including HTML, Word (.doc/.docx), Excel (.xls/.xlsx), PowerPoint (.ppt/.pptx), Markdown (.md), and CSV (.csv)

Performance Optimizations

  • OpenVINO Upgrade: We upgraded the OpenVINO backend to 2025.02 and added hyperthreading configuration options, so users can tune thread allocation and CPU utilization on multi-core systems.

🐛 Bug Fixes

None

❤️ Community Support

  • Slack: For live discussion with the Spark NLP community and the team.
  • GitHub: Bug reports, feature requests, and contributions.
  • Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
  • Medium: Spark NLP articles.
  • JohnSnowLabs official Medium
  • YouTube: Spark NLP video tutorials.

Installation

Python

:::shell
pip install spark-nlp==6.1.1

Spark Packages

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):

:::shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.1.1

GPU

:::shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.1.1

Apple Silicon

:::shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.1.1

AArch64

:::shell
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.1.1
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.1.1
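
The four package coordinates above differ only in the artifact suffix. The sketch below builds the coordinate string for a given variant, which can help when the same job script has to run on CPU, GPU, Apple Silicon, and AArch64 machines. The function name `package_coordinate` and the variant labels (`cpu`, `gpu`, `silicon`, `aarch64`) are hypothetical names chosen for this example; only the resulting coordinates come from this release.

```python
GROUP_ID = "com.johnsnowlabs.nlp"
SCALA_VERSION = "2.12"
RELEASE = "6.1.1"

# Variant label (hypothetical) -> artifact suffix used in the coordinates above.
ARTIFACT_SUFFIX = {
    "cpu": "",
    "gpu": "-gpu",
    "silicon": "-silicon",
    "aarch64": "-aarch64",
}


def package_coordinate(variant="cpu", release=RELEASE):
    """Return the Maven coordinate string for one Spark NLP build variant."""
    if variant not in ARTIFACT_SUFFIX:
        known = ", ".join(sorted(ARTIFACT_SUFFIX))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {known}")
    suffix = ARTIFACT_SUFFIX[variant]
    return f"{GROUP_ID}:spark-nlp{suffix}_{SCALA_VERSION}:{release}"
```

The returned string can be passed to `--packages` on the command line, or set as the `spark.jars.packages` configuration value when a Spark session is created from Python.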

Maven

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:

:::xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>6.1.1</version>
</dependency>

spark-nlp-gpu:

:::xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>6.1.1</version>
</dependency>

spark-nlp-silicon:

:::xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
    <version>6.1.1</version>
</dependency>

spark-nlp-aarch64:

:::xml
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
    <version>6.1.1</version>
</dependency>

FAT JARs

What's Changed

Full Changelog: https://github.com/JohnSnowLabs/spark-nlp/compare/6.1.0...6.1.1

Source: README.md, updated 2025-08-05