Showing 15 open source projects for "data extraction"

  • 1
    RAGFlow

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining it with LLMs (large language models) to provide truthful question answering, backed by well-founded citations from data in a variety of complex formats.
    Downloads: 5 This Week
  • 2
    docext

    An on-premises, OCR-free unstructured data extraction

    docext is a document intelligence toolkit that uses vision-language models to extract structured information from documents such as PDFs, forms, and scanned images. The system is designed to operate entirely on-premises, allowing organizations to process sensitive documents without relying on external cloud services. Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual...
    Downloads: 1 This Week
  • 3
    River ML

    Online machine learning in Python

    River is a Python library for online machine learning. It aims to be the most user-friendly library for doing machine learning on streaming data. River is the result of a merger between creme and scikit-multiflow.
    Downloads: 0 This Week
  • 4
    Adversarial Robustness Toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for ML security

    ...ART provides tools that enable developers and researchers to evaluate, defend, certify, and verify machine learning models and applications against the adversarial threats of evasion, poisoning, extraction, and inference. ART supports all popular machine learning frameworks (TensorFlow, Keras, PyTorch, MXNet, scikit-learn, XGBoost, LightGBM, CatBoost, GPy, etc.), all data types (images, tables, audio, video, etc.), and machine learning tasks (classification, object detection, generation, certification, etc.).
    Downloads: 0 This Week
  • 5
    FlexLLMGen

    Running large language models on a single GPU

    FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. The architecture distributes computation and memory usage across the GPU, CPU, and disk in order to maximize the number of tokens processed during inference. ...
    Downloads: 0 This Week
  • 6
    Datapipe

    Real-time, incremental ETL library for ML with record-level depend

    Datapipe is a real-time, incremental ETL library for Python with record-level dependency tracking. Datapipe is designed to streamline the creation of data processing pipelines. It excels in scenarios where data is continuously changing, requiring pipelines to adapt and process only the modified data efficiently. This library tracks dependencies for each record in the pipeline, ensuring minimal and efficient data processing.
    Downloads: 0 This Week
  • 7
    PoseidonQ - AI/ML Based QSAR Modeling

    ML based QSAR Modelling And Translation of Model to Deployable WebApps

    This software was built to make QSAR/QSPR development more efficient and reproducible. It was published in the ACS Journal of Chemical Information and Modeling (https://pubs.acs.org/doi/10.1021/acs.jcim.4c02372). It is simple to use without compromising on the essential features needed to build reliable QSAR models, and it covers the path from generating reliable ML-based QSAR models to developing your own QSAR web app. For feedback or queries, contact kabeermuzammil614@gmail.com. Available on...
    Downloads: 67 This Week
  • 8
    MMOCR

    OpenMMLab Text Detection, Recognition and Understanding Toolbox

    MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and downstream tasks such as key information extraction. It is part of the OpenMMLab project. The toolbox supports a wide variety of state-of-the-art models for each of these tasks. Its modular design lets users define their own optimizers, data preprocessors, and model components such as backbones, necks, and heads, as well as losses. ...
    Downloads: 1 This Week
  • 9
    LSTMs for Human Activity Recognition

    Human Activity Recognition example using TensorFlow on smartphone

    LSTM-Human-Activity-Recognition is a machine learning project that demonstrates how recurrent neural networks can be used to recognize human activities from sensor data. The repository implements a deep learning model based on Long Short-Term Memory (LSTM) networks to classify physical activities using time-series data collected from wearable sensors. The project uses the well-known Human Activity Recognition dataset derived from smartphone accelerometer and gyroscope signals. Through the...
    Downloads: 0 This Week
  • 10
    Pattern

    Web mining module for Python, with tools for scraping

    Pattern is an open-source Python library that provides tools for web mining, natural language processing, machine learning, and network analysis. The project integrates multiple capabilities into a single framework that allows developers to collect, process, and analyze textual data from the web. It includes modules for web scraping and crawling that can retrieve information from sources such as social media platforms, search engines, and online knowledge bases. In addition to data mining features, the library offers natural language processing functionality including part-of-speech tagging, sentiment analysis, and n-gram extraction.
    Downloads: 1 This Week
  • 11
    Texthero

    Text preprocessing, representation and visualization from zero to hero

    Texthero is a Python package for working with text data efficiently. It gives NLP developers a tool to quickly understand any text-based dataset and provides a solid pipeline to clean and represent text data, from zero to hero.
    Downloads: 0 This Week
  • 12
    MachineLearningStocks


    Using Python and scikit-learn to make stock predictions

    MachineLearningStocks is a Python-based template project that demonstrates how machine learning can be applied to predicting stock market performance. The project provides a structured workflow that collects financial data, processes features, trains predictive models, and evaluates trading strategies. Using libraries such as pandas and scikit-learn, the repository shows how historical financial indicators can be transformed into machine learning features. The model attempts to predict...
    Downloads: 0 This Week
  • 13
    Delta ML

    Deep learning based natural language and speech processing platform

    ...It helps you train, develop, and deploy NLP and/or speech models. Use configuration files to easily tune parameters and network structures. What you see in training is what you get in serving: all data processing and feature extraction are integrated into a model graph. Supported tasks include text classification, named entity recognition, question answering, text summarization, etc. Uniform I/O interfaces mean no changes are needed for new models.
    Downloads: 1 This Week
  • 14
    RoboSat

    Semantic segmentation on aerial and satellite imagery

    RoboSat is an end-to-end pipeline written in Python 3 for feature extraction from aerial and satellite imagery. Features can be anything visually distinguishable in the imagery, for example buildings, parking lots, roads, or cars.
    Downloads: 0 This Week
  • 15
    A Python function library for extracting EEG features from EEG time series stored in standard Python and NumPy data structures. Features include classical spectral analysis, entropies, fractal dimensions, DFA, inter-channel synchrony and order, etc.
    Downloads: 0 This Week