Showing 26 open source projects for "data extraction"

  • 1
    audioFlux

    A library for audio and music analysis and feature extraction

    A library for audio and music analysis and feature extraction. It can be used for deep learning, pattern recognition, signal processing, bioinformatics, statistics, finance, etc. audioFlux is a deep learning tool library that supports dozens of time-frequency analysis transformation methods and hundreds of corresponding time-domain and frequency-domain feature combinations. It can be provided to deep learning networks for training and is used to...
    Downloads: 0 This Week
  • 2
    RAGFlow

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLMs (large language models) to provide truthful question-answering capabilities, backed by well-founded citations from data in various complex formats.
    Downloads: 12 This Week
  • 3
    docext

    An on-premises, OCR-free unstructured data extraction

    docext is a document intelligence toolkit that uses vision-language models to extract structured information from documents such as PDFs, forms, and scanned images. The system is designed to operate entirely on-premises, allowing organizations to process sensitive documents without relying on external cloud services. Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual...
    Downloads: 0 This Week
  • 4
    River ML

    Online machine learning in Python

    River is a Python library for online machine learning. It aims to be the most user-friendly library for doing machine learning on streaming data. River is the result of a merger between creme and scikit-multiflow.
    Downloads: 1 This Week
  • 5
    DeepCamera

    Open-Source AI Camera. Empower any camera/CCTV

    ...SharpAI yolov7_reid is an open-source Python application that leverages AI technologies to detect intruders with traditional surveillance cameras. It uses Yolov7 as a person detector, FastReID for person feature extraction, Milvus as the local vector database for self-supervised learning to identify unseen persons, and Labelstudio to host images locally for further use, such as labeling data and training your own classifier. It also integrates with Home-Assistant to empower smart homes with AI technology.
    Downloads: 15 This Week
  • 6
    Smile

    Statistical machine intelligence and learning engine

    Smile is a fast and comprehensive machine learning engine. With advanced data structures and algorithms, Smile delivers state-of-the-art performance. In a third-party benchmark, Smile significantly outperforms R, Python, Spark, H2O, and xgboost. Smile is a couple of times faster than the closest competitor. The memory usage is also very efficient. If we can train advanced machine learning models on a PC, why buy a cluster? Write applications quickly in Java, Scala, or any JVM...
    Downloads: 3 This Week
  • 7
    Adversarial Robustness Toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for ML security

    ...ART provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference. ART supports all popular machine learning frameworks (TensorFlow, Keras, PyTorch, MXNet, scikit-learn, XGBoost, LightGBM, CatBoost, GPy, etc.), all data types (images, tables, audio, video, etc.) and machine learning tasks (classification, object detection, generation, certification, etc.).
    Downloads: 0 This Week
  • 8
    FlexLLMGen

    Running large language models on a single GPU

    FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. The architecture distributes computation and memory usage across the GPU, CPU, and disk in order to maximize the number of tokens processed during inference. ...
    Downloads: 0 This Week
  • 9
    Vearch

    A distributed system for embedding-based vector retrieval

    ...Through the plugin module, a complete default visual search system can be deployed with just one click. Alternatively, you can easily customize your own image, video, or text feature extraction algorithm plugin. Using vearch involves three main steps: first, create a DB and Space; then import your data; finally, search your own dataset.
    Downloads: 0 This Week
  • 10
    Weaviate

    Weaviate is a cloud-native, modular, real-time vector search engine

    Weaviate in a nutshell: Weaviate is a vector search engine and vector database. Weaviate uses machine learning to vectorize and store data, and to find answers to natural language queries. With Weaviate you can also bring your custom ML models to production scale. Weaviate in detail: Weaviate is a low-latency vector search engine with out-of-the-box support for different media types (text, images, etc.). It offers Semantic Search, Question-Answer-Extraction, Classification, Customizable Models (PyTorch/TensorFlow/Keras), and more. ...
    Downloads: 2 This Week
  • 11
    GeoDMA

    Geographic feature extraction and data mining

    GeoDMA is a plugin for TerraView software, used for geographical data mining. With a single image, the user can perform segmentation, attribute extraction, normalization, and classification.
    Downloads: 0 This Week
  • 12
    Datapipe

    Real-time, incremental ETL library for ML with record-level depend

    Datapipe is a real-time, incremental ETL library for Python with record-level dependency tracking. Datapipe is designed to streamline the creation of data processing pipelines. It excels in scenarios where data is continuously changing, requiring pipelines to adapt and process only the modified data efficiently. This library tracks dependencies for each record in the pipeline, ensuring minimal and efficient data processing.
    Downloads: 0 This Week
  • 13
    PoseidonQ - AI/ML Based QSAR Modeling

    ML based QSAR Modelling And Translation of Model to Deployable WebApps

    This software was made with the intention of making QSAR/QSPR development more efficient and reproducible. Published in the ACS Journal of Chemical Information and Modeling (https://pubs.acs.org/doi/10.1021/acs.jcim.4c02372). Simple to use, with no compromise on the essential features necessary to make reliable QSAR models. From generating reliable ML-based QSAR models to developing your own QSAR web app. For any feedback or queries, contact kabeermuzammil614@gmail.com. Available on...
    Downloads: 28 This Week
  • 14
    pattern_classification

    A collection of tutorials and examples for solving machine learning

    ...The project aims to help learners understand the process of building predictive models by presenting structured explanations and practical examples. It includes notebooks and guides that demonstrate data preprocessing, feature extraction, model training, and evaluation techniques used in machine learning workflows. The repository also covers algorithms such as Bayesian classification, logistic regression, neural networks, clustering methods, and ensemble models. In addition to algorithm tutorials, the project contains supplementary resources such as dataset collections, visualization examples, and links to recommended books and talks. ...
    Downloads: 0 This Week
  • 15
    MMOCR

    OpenMMLab Text Detection, Recognition and Understanding Toolbox

    MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the corresponding downstream tasks, including key information extraction. It is part of the OpenMMLab project. The toolbox supports a wide variety of state-of-the-art models for text detection, text recognition, and key information extraction. The modular design of MMOCR enables users to define their own optimizers, data preprocessors, and model components such as backbones, necks, and heads, as well as losses. ...
    Downloads: 0 This Week
  • 16
    LSTMs for Human Activity Recognition

    Human Activity Recognition example using TensorFlow on smartphone

    LSTM-Human-Activity-Recognition is a machine learning project that demonstrates how recurrent neural networks can be used to recognize human activities from sensor data. The repository implements a deep learning model based on Long Short-Term Memory (LSTM) networks to classify physical activities using time-series data collected from wearable sensors. The project uses the well-known Human Activity Recognition dataset derived from smartphone accelerometer and gyroscope signals. Through the...
    Downloads: 0 This Week
  • 17
    Pattern

    Web mining module for Python, with tools for scraping

    Pattern is an open-source Python library that provides tools for web mining, natural language processing, machine learning, and network analysis. The project integrates multiple capabilities into a single framework that allows developers to collect, process, and analyze textual data from the web. It includes modules for web scraping and crawling that can retrieve information from sources such as social media platforms, search engines, and online knowledge bases. In addition to data mining features, the library offers natural language processing functionality including part-of-speech tagging, sentiment analysis, and n-gram extraction.
    Downloads: 0 This Week
  • 18
    Texthero

    Text preprocessing, representation and visualization from zero to hero

    Texthero is a Python package for working with text data efficiently. It gives NLP developers a tool to quickly understand any text-based dataset and provides a solid pipeline to clean and represent text data, from zero to hero.
    Downloads: 0 This Week
  • 19
    MachineLearningStocks

    Using python and scikit-learn to make stock predictions

    MachineLearningStocks is a Python-based template project that demonstrates how machine learning can be applied to predicting stock market performance. The project provides a structured workflow that collects financial data, processes features, trains predictive models, and evaluates trading strategies. Using libraries such as pandas and scikit-learn, the repository shows how historical financial indicators can be transformed into machine learning features. The model attempts to predict...
    Downloads: 0 This Week
  • 20
    Delta ML

    Deep learning based natural language and speech processing platform

    ...It helps you train, develop, and deploy NLP and/or speech models. Use configuration files to easily tune parameters and network structures. What you see in training is what you get in serving: all data processing and feature extraction are integrated into a model graph. Supported tasks include text classification, named entity recognition, question answering, text summarization, etc. Uniform I/O interfaces mean no changes are needed for new models.
    Downloads: 0 This Week
  • 21
    Phenalysis

    Analyze agronomic plant research plots in aerial orthomosaic images.

    A graphical user interface to import, analyze and export plots from orthomosaic images of agronomic trials. Please cite the following reference in your work if you use Phenalysis: Khan Z and Miklavcic SJ (2019) An Automatic Field Plot Extraction Method From Aerial Orthomosaic Images. Front. Plant Sci. 10:683. doi: https://doi.org/10.3389/fpls.2019.00683 This tool is being developed through the sponsorship of the Australian Research Council's Industrial Transformation Research Hub on...
    Downloads: 1 This Week
  • 22
    RoboSat

    Semantic segmentation on aerial and satellite imagery

    RoboSat is an end-to-end pipeline written in Python 3 for feature extraction from aerial and satellite imagery. Features can be anything visually distinguishable in the imagery, for example buildings, parking lots, roads, or cars.
    Downloads: 0 This Week
  • 23
    Convolutional Recurrent Neural Network

    Convolutional Recurrent Neural Network (CRNN) for image-based sequence

    Convolutional Recurrent Neural Network provides an implementation of the Convolutional Recurrent Neural Network (CRNN) architecture, a deep learning model designed for image-based sequence recognition tasks such as optical character recognition and scene text recognition. The architecture combines convolutional neural networks for extracting visual features from images with recurrent neural networks that model sequential dependencies in the extracted features. This hybrid approach allows the...
    Downloads: 0 This Week
  • 24
    android-activity-miner

    Activity-Miner for Android

    A mobile application for creating accelerometer-based activity recognition models directly on the phone. Configuring the segmentation and feature extraction process chain requires expert knowledge. The prototype was developed in 2012 as part of a bachelor thesis at the University of Kassel and was optimized and enhanced for an experiment in 2015.
    Downloads: 0 This Week
  • 25

    Accelerated Feature Extraction Tool

    A fast GPU-accelerated feature extraction software for speech analysis

    A fast feature extraction software tool for speech analysis and processing. It incorporates standard MFCC, PLP, and TRAPS features. The tool is specially designed to process very large audio data sets. It uses GPU acceleration if a compatible GPU is available (CUDA as well as OpenCL; NVIDIA, AMD, and Intel GPUs are supported). The CPU SSE intrinsic instruction set is used when no compatible GPU is present.
    Downloads: 0 This Week