Experience the power of large language models like never before, unleashing the full potential of Natural Language Processing (NLP) with Spark NLP, the open source library that delivers scalable LLMs. The full code base is open under the Apache 2.0 license, including pre-trained models and pipelines. The only NLP library built natively on Apache Spark. The most widely used NLP library in the enterprise. Spark ML provides a set of machine learning applications that can be built using two main components, estimators and transformers. The estimators have a method that secures and trains a piece of data to such an application. The transformer is generally the result of a fitting process and applies changes to the target dataset. These components have been embedded to be applicable to Spark NLP. Pipelines are a mechanism for combining multiple estimators and transformers in a single workflow. They allow multiple chained transformations along a machine-learning task.
Features
- Text Preprocessing
- Parsing and Analysis
- Sentiment and Classification
- Classification and Question Answering
- Machine Translation and Generation
- Integration and Interoperability (ONNX, OpenVINO)
- Pre-trained Models (36000+ in +200 languages)
- Multi-lingual Support