Showing 49 open source projects for "extraction"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    GROBID

    GROBID

    A machine learning software for extracting information

    ...In 2011 the tool has been made available in open source. Work on GROBID has been steady as a side project since the beginning and is expected to continue as such. Header extraction and parsing from article in PDF format. The extraction here covers the usual bibliographical information (e.g. title, abstract, authors, affiliations, keywords, etc.). References extraction and parsing from articles in PDF format, around .87 F1-score against on an independent PubMed Central set of 1943 PDF containing 90,125 references, and around .89 on a similar bioRxiv set of 2000 PDF (using the Deep Learning citation model). ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 2
    audioFlux

    audioFlux

    A library for audio and music analysis, feature extraction

    A library for audio and music analysis, and feature extraction. Can be used for deep learning, pattern recognition, signal processing, bioinformatics, statistics, finance, etc. audioflux is a deep learning tool library for audio and music analysis, feature extraction. It supports dozens of time-frequency analysis transformation methods and hundreds of corresponding time-domain and frequency-domain feature combinations.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 3
    pyAudioAnalysis

    pyAudioAnalysis

    Python Audio Analysis Library: Feature Extraction, Classification

    ...The project provides a collection of tools that allow developers to extract meaningful features from audio files and use those features for classification, segmentation, and analysis. The library supports multiple audio processing workflows, including feature extraction from raw audio signals, training of machine learning models, and automatic audio segmentation. It also includes utilities for visualizing audio features and analyzing patterns within sound recordings, which can be useful in applications such as speech recognition, music classification, and acoustic event detection. Because the library integrates machine learning algorithms with signal processing tools, it enables researchers to develop complete audio analysis pipelines using a single framework.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    docext

    docext

    An on-premises, OCR-free unstructured data extraction

    docext is a document intelligence toolkit that uses vision-language models to extract structured information from documents such as PDFs, forms, and scanned images. The system is designed to operate entirely on-premises, allowing organizations to process sensitive documents without relying on external cloud services. Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 5
    RAGFlow

    RAGFlow

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 6
    Adversarial Robustness Toolbox

    Adversarial Robustness Toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for ML security

    ...ART provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference. ART supports all popular machine learning frameworks (TensorFlow, Keras, PyTorch, MXNet, sci-kit-learn, XGBoost, LightGBM, CatBoost, GPy, etc.), all data types (images, tables, audio, video, etc.) and machine learning tasks (classification, object detection, generation, certification, etc.).
    Downloads: 9 This Week
    Last Update:
    See Project
  • 7
    spaCy models

    spaCy models

    Models for the spaCy Natural Language Processing (NLP) library

    ...The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using. Since its release in 2015, spaCy has become an industry standard with a huge ecosystem. Choose from a variety of plugins, integrate with your machine learning stack and build custom components and workflows.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 8
    River ML

    River ML

    Online machine learning in Python

    River is a Python library for online machine learning. It aims to be the most user-friendly library for doing machine learning on streaming data. River is the result of a merger between creme and scikit-multiflow.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    DeepCamera

    DeepCamera

    Open-Source AI Camera. Empower any camera/CCTV

    ...SharpAI yolov7_reid is an open-source Python application that leverages AI technologies to detect intruders with traditional surveillance cameras. The source code is here It leverages Yolov7 as a person detector, FastReID for person feature extraction, Milvus the local vector database for self-supervised learning to identify unseen persons, Labelstudio to host images locally and for further usage such as label data and train your own classifier. It also integrates with Home-Assistant to empower smart homes with AI technology.
    Downloads: 13 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    Transformers

    Transformers

    State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX

    ...Using pre-trained models can reduce your compute costs, carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities. Text, for tasks like text classification, information extraction, question answering, summarization, translation, text generation, in over 100 languages. Images, for tasks like image classification, object detection, and segmentation. Audio, for tasks like speech recognition and audio classification. Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our model hub. ...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 11
    The SpeechBrain Toolkit

    The SpeechBrain Toolkit

    A PyTorch-based Speech Toolkit

    ...Separation methods such as Conv-TasNet, DualPath RNN, and SepFormer are implemented as well. SpeechBrain provides efficient and GPU-friendly speech augmentation pipelines and acoustic features extraction.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 12
    eos

    eos

    A lightweight 3D Morphable Face Model library in modern C++

    eos is a lightweight 3D Morphable Face Model fitting library that provides basic functionality to use face models, as well as camera and shape fitting functionality. It's written in modern C++11/14. MorphableModel and PcaModel classes to represent 3DMMs, with basic operations like draw_sample(). Supports the Surrey Face Model (SFM), 4D Face Model (4DFM), Basel Face Model (BFM) 2009 and 2017, and the Liverpool-York Head Model (LYHM) out-of-the-box.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    Vearch

    Vearch

    A distributed system for embedding-based vector retrieval

    ...Through the module of the plugin, a complete default visual search system can be deployed just with one click. Otherwise, you can easily customize your own image, video, or text feature extraction algorithm plugin. This GIF provides a clear demonstration of the project vearch usage and its internal structure. The use of vearch is mainly divided into three steps. Firstly, create DB and Space, then import your data, and finally, you can search on your own dataset.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    ktrain

    ktrain

    ktrain is a Python library that makes deep learning AI more accessible

    ktrain is a Python library that makes deep learning and AI more accessible and easier to apply. ktrain is a lightweight wrapper for the deep learning library TensorFlow Keras (and other libraries) to help build, train, and deploy neural networks and other machine learning models. Inspired by ML framework extensions like fastai and ludwig, ktrain is designed to make deep learning and AI more accessible and easier to apply for both newcomers and experienced practitioners. With only a few lines...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    Smile

    Smile

    Statistical machine intelligence and learning engine

    Smile is a fast and comprehensive machine learning engine. With advanced data structures and algorithms, Smile delivers the state-of-art performance. Compared to this third-party benchmark, Smile outperforms R, Python, Spark, H2O, xgboost significantly. Smile is a couple of times faster than the closest competitor. The memory usage is also very efficient. If we can train advanced machine learning models on a PC, why buy a cluster? Write applications quickly in Java, Scala, or any JVM...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 16
    Simd Library

    Simd Library

    C++ image processing and machine learning library with using of SIMD

    The Simd Library is a free open-source image processing and machine learning library, designed for C and C++ programmers. It provides many useful high-performance algorithms for image processing such as pixel format conversion, image scaling and filtration, extraction of statistical information from images, motion detection, object detection and classification, neural networks. The algorithms are optimized with using of different SIMD CPU extensions. In particular, the library supports the following CPU extensions: SSE, AVX, AVX-512, and AMX for x86/x64, and NEON for ARM. The Simd Library has C API and also contains useful C++ classes and functions to facilitate access to C API. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Weaviate

    Weaviate

    Weaviate is a cloud-native, modular, real-time vector search engine

    ...Weaviate in detail: Weaviate is a low-latency vector search engine with out-of-the-box support for different media types (text, images, etc.). It offers Semantic Search, Question-Answer-Extraction, Classification, Customizable Models (PyTorch/TensorFlow/Keras), and more. Built from scratch in Go, Weaviate stores both objects and vectors, allowing for combining vector search with structured filtering with the fault-tolerance of a cloud-native database, all accessible through GraphQL, REST, and various language clients.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 18
    FlexLLMGen

    FlexLLMGen

    Running large language models on a single GPU

    FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. The architecture distributes computation and memory usage across the GPU, CPU, and disk in order to maximize the number of tokens processed during inference. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    tika-python

    tika-python

    Python binding to the Apache Tika™ REST services

    A Python port of the Apache Tika library that makes Tika available using the Tika REST Server. This makes Apache Tika available as a Python library, installable via Setuptools, Pip and easy to install. To use this library, you need to have Java 7+ installed on your system as tika-python starts up the Tika REST server in the background. To get this working in a disconnected environment, download a tika server file (both tika-server.jar and tika-server.jar.md5, which can be found here) and set...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    GeoDMA

    GeoDMA

    Geographic feature extraction and data mining

    GeoDMA is a plugin for TerraView software, used for geographical data mining. With a single image, the user can perform segmentation, attributes extraction, normalization and classification.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    Datapipe

    Datapipe

    Real-time, incremental ETL library for ML with record-level depend

    Datapipe is a real-time, incremental ETL library for Python with record-level dependency tracking. Datapipe is designed to streamline the creation of data processing pipelines. It excels in scenarios where data is continuously changing, requiring pipelines to adapt and process only the modified data efficiently. This library tracks dependencies for each record in the pipeline, ensuring minimal and efficient data processing.
    Downloads: 128 This Week
    Last Update:
    See Project
  • 22
    PoseidonQ  - AI/ML Based QSAR Modeling

    PoseidonQ - AI/ML Based QSAR Modeling

    ML based QSAR Modelling And Translation of Model to Deployable WebApps

    - This Software was made with an intention to make QSAR/QSPR development more efficient and reproducible. - Published in ACS, Journal of Chemical Information and Modeling . Link : https://pubs.acs.org/doi/10.1021/acs.jcim.4c02372 - Simple to use and no compromise on essential features necessary to make reliable QSAR models. - From Generating Reliable ML Based QSAR Models to Developing Your Own QSAR WebApp. For any feedback or queries, contact kabeermuzammil614@gmail.com - Available on...
    Downloads: 34 This Week
    Last Update:
    See Project
  • 23
    pattern_classification

    pattern_classification

    A collection of tutorials and examples for solving machine learning

    ...The project aims to help learners understand the process of building predictive models by presenting structured explanations and practical examples. It includes notebooks and guides that demonstrate data preprocessing, feature extraction, model training, and evaluation techniques used in machine learning workflows. The repository also covers algorithms such as Bayesian classification, logistic regression, neural networks, clustering methods, and ensemble models. In addition to algorithm tutorials, the project contains supplementary resources such as dataset collections, visualization examples, and links to recommended books and talks. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Promptify

    Promptify

    se GPT or other prompt based models to get structured output

    ...Instead of manually crafting prompts for each task, Promptify introduces a unified architecture that combines prompt templates, language model interfaces, and processing pipelines into a single framework. This approach allows developers to perform tasks such as text classification, named entity recognition, question answering, and information extraction using consistent prompt templates. The library supports integration with multiple large language model providers, enabling users to experiment with various models without changing their overall workflow.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    MMOCR

    MMOCR

    OpenMMLab Text Detection, Recognition and Understanding Toolbox

    MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the corresponding downstream tasks including key information extraction. It is part of the OpenMMLab project. The toolbox supports not only text detection and text recognition, but also their downstream tasks such as key information extraction. The toolbox supports a wide variety of state-of-the-art models for text detection, text recognition and key information extraction. The modular design of MMOCR enables users to define their own optimizers, data preprocessors, and model components such as backbones, necks and heads as well as losses. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB