Showing 219 open source projects for "extraction"

  • 1
    Promptify

    Use GPT or other prompt-based models to get structured output

    ...Instead of manually crafting prompts for each task, Promptify introduces a unified architecture that combines prompt templates, language model interfaces, and processing pipelines into a single framework. This approach allows developers to perform tasks such as text classification, named entity recognition, question answering, and information extraction using consistent prompt templates. The library supports integration with multiple large language model providers, enabling users to experiment with various models without changing their overall workflow.
    Downloads: 0 This Week
  • 2
    MMOCR

    OpenMMLab Text Detection, Recognition and Understanding Toolbox

    MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and downstream tasks such as key information extraction. It is part of the OpenMMLab project. The toolbox supports a wide variety of state-of-the-art models for these tasks. Its modular design lets users define their own optimizers, data preprocessors, and model components such as backbones, necks, and heads, as well as losses. ...
    Downloads: 0 This Week
  • 3
    PromethAI

    Open-source framework that gives you AI Agents

    PromethAI-Backend is a backend framework for AI-driven automation and knowledge extraction. It is designed to integrate with large language models (LLMs) to provide AI-enhanced workflows, including content generation, summarization, and data analysis.
    Downloads: 0 This Week
  • 4
    SageMaker Experiments Python SDK

    Experiment tracking and metric logging for Amazon SageMaker notebooks

    ...There is no relationship, such as ordering, between Trial Components. Trial Component: a description of a single step in a machine learning workflow, for example data cleaning, feature extraction, model training, or model evaluation.
    Downloads: 0 This Week
  • 5
    PIFuHD

    High-Resolution 3D Human Digitization from A Single Image

    PIFuHD (Pixel-Aligned Implicit Function for 3D human reconstruction at high resolution) is a method and codebase to reconstruct high-fidelity 3D human meshes from a single image. It extends prior PIFu work by increasing resolution and detail, enabling fine geometry in cloth folds, hair, and subtle surface features. The method operates by learning an implicit occupancy / surface function conditioned on the image and camera projection; at inference time it queries dense points to reconstruct a...
    Downloads: 6 This Week
  • 6
    GXSM

    Scanning Probe Microscopy Controller and Data Visualization Software

    GXSM (Gnome X Scanning Microscopy): a multi-channel image and vector-probe data acquisition and visualization system designed for SPM techniques (STM, AFM, ...), and also for SPA-LEED/LEED/LEEM data analysis. A plug-in interface allows user add-ons for data processing and for special hardware and instrument support. Latest: NC-AFM and related explorative methods such as SQDM can be configured. High-speed external PAC-PLL hardware option with digital DSP link. Based on several hardware options it supports...
    Downloads: 5 This Week
  • 7
    Pentest-Tools

    A collection of custom security tools for quick needs.

    Pentest-Tools is a collection of penetration testing scripts and utilities designed to help security professionals and ethical hackers perform vulnerability assessments. It includes a wide range of tools for tasks like web scraping, reconnaissance, data extraction, and network analysis. The suite is modular, allowing users to choose the tools that best fit their specific pentesting needs, from web application analysis to network penetration testing.
    Downloads: 7 This Week
  • 8
    LSTMs for Human Activity Recognition

    Human Activity Recognition example using TensorFlow on smartphone

    LSTM-Human-Activity-Recognition is a machine learning project that demonstrates how recurrent neural networks can be used to recognize human activities from sensor data. The repository implements a deep learning model based on Long Short-Term Memory (LSTM) networks to classify physical activities using time-series data collected from wearable sensors. The project uses the well-known Human Activity Recognition dataset derived from smartphone accelerometer and gyroscope signals. Through the...
    Downloads: 1 This Week
  • 9
    hui

    hewies user interface - 3D scientific visualisation tool

    A Python project whose goal is to provide a FOSS library to extract, analyse, and visualise data in 3D. The instance will connect to a data source (ODS sheet, CSV, SQL DB, pyodbc), analyse and/or transform the data for presentation to the visualisation functionality, and visualise the data in 3D, likely using third-party FOSS.
    Downloads: 0 This Week
  • 10
    Pattern

    Web mining module for Python, with tools for scraping

    ...In addition to data mining features, the library offers natural language processing functionality including part-of-speech tagging, sentiment analysis, and n-gram extraction. The framework also includes machine learning algorithms that support classification, clustering, and vector space modeling for text analysis tasks. Another component of the library provides tools for analyzing and visualizing networks, making it useful for studying relationships between entities in large datasets.
    Downloads: 0 This Week
  • 11
    mlscraper

    ML-based HTML scraper that learns extraction rules from examples

    ...This approach simplifies web scraping tasks by shifting the focus from rule-writing to example-based training. Internally, the project processes HTML documents, identifies relevant elements in the DOM, and builds extraction logic based on statistical or heuristic analysis of the training samples. The result is a developer-oriented tool that aims to automate common scraping workflows.
    Downloads: 5 This Week
  • 12
    Tensorflow Transformers

    State of the art faster Transformer with Tensorflow 2.0

    Imagine auto-regressive generation being 90x faster. tf-transformers (Tensorflow Transformers) is designed to harness the full power of TensorFlow 2 and is built specifically for Transformer-based architectures. These models can be applied to text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages; to images, for tasks like image classification, object detection, and segmentation; and to audio, for tasks like speech recognition and audio classification. Faster autoregressive decoding, TFLite support, and simple creation of TFRecords. Auto-batching of tf.data.Dataset or tf.ragged tensors. ...
    Downloads: 0 This Week
  • 13
    Scylla

    Intelligent proxy pool for collecting and managing public proxies

    Scylla is an open source proxy pool system designed to collect, validate, and manage large numbers of public proxy servers for use in web scraping and data extraction workflows. It automatically crawls the internet to discover proxy IP addresses and evaluates their availability and reliability before adding them to a usable pool. It includes a JSON API that allows developers and applications to retrieve proxy information programmatically, making it easier to integrate proxy rotation into scraping tools or automation scripts. ...
    Downloads: 9 This Week
  • 14
    VSGAN

    VapourSynth Single Image Super-Resolution Generative Adversarial

    ...The Network will be applied in quadrants of the image to reduce up-front VRAM usage. You can use any RGB video input, including float32 (e.g., RGBS) inputs. Using VapourSynth you can pass a Video directly to VSGAN, without any frame extraction needed. Any edit you make in the VapourSynth script with or without VSGAN can be re-used for any other video. VSGAN is released under the MIT License, ensuring it will stay free, with the ability to be used commercially.
    Downloads: 0 This Week
  • 15
    Hugging Face Transformer

    CPU/GPU inference server for Hugging Face transformer models

    Optimize and deploy Hugging Face Transformer models in production with a single command line. At Lefebvre Dalloz we run semantic search engines in production for the legal domain (in non-marketing language, a re-ranker), and we based ours on Transformer models. In that setup, latency is key to providing a good user experience, and relevancy inference is done online for hundreds of snippets per user query. Most tutorials on Transformer deployment in production are built on PyTorch and FastAPI....
    Downloads: 1 This Week
  • 16
    KoNLPy

    Python package for Korean natural language processing

    KoNLPy is a natural language processing (NLP) library for the Korean language, offering tokenization, morphological analysis, and named entity recognition.
    Downloads: 3 This Week
  • 17
    Bandwidth

    Monitor monthly internet Transmit and Receive bandwidth usage - Linux

    Keep track of bandwidth usage. Allows Linux users to monitor their Transmit and Receive bandwidth usage with a simple text-based menu, via a browser, or from the command line. Some of us are unable to get "unlimited", "all that you can eat" internet packages and are left trying to stay within our Download/Upload limits, whilst paying dearly for the "privilege". Equally, we didn't have the foresight or the money to purchase an SNMP-managed router, so we are unable to strip the traffic...
    Downloads: 0 This Week
  • 18
    aseryla

    Aseryla code repositories

    This project describes a model of how human semantic memory represents information about objects in the world, in text format. It provides a system and a GUI application capable of extracting and managing concepts and relations from English texts. https://aseryla2.sourceforge.io/
    Downloads: 0 This Week
  • 19
    Texthero

    Text preprocessing, representation and visualization from zero to hero

    Texthero is a Python package for working with text data efficiently. It gives NLP developers a tool to quickly understand any text-based dataset, and it provides a solid pipeline to clean and represent text data, from zero to hero.
    Downloads: 0 This Week
  • 20
    Chatette

    A powerful dataset generator for Rasa NLU, inspired by Chatito

    Chatette is a Python-based tool for generating training datasets for Natural Language Understanding (NLU) models, particularly those used with Rasa NLU. It employs a domain-specific language to define templates, enabling the creation of diverse and extensive training examples for intent classification and entity recognition.
    Downloads: 0 This Week
  • 21
    MachineLearningStocks

    Using python and scikit-learn to make stock predictions

    MachineLearningStocks is a Python-based template project that demonstrates how machine learning can be applied to predicting stock market performance. The project provides a structured workflow that collects financial data, processes features, trains predictive models, and evaluates trading strategies. Using libraries such as pandas and scikit-learn, the repository shows how historical financial indicators can be transformed into machine learning features. The model attempts to predict...
    Downloads: 2 This Week
  • 22
    lxspider

    Educational Python web scraping case collection for many sites

    lxSpider is a collection of web scraping examples designed primarily for learning and experimentation with data extraction techniques. It gathers numerous crawler implementations that demonstrate how to collect data from a wide range of websites and online services. It focuses heavily on practical cases that illustrate how different platforms handle requests, authentication parameters, and anti-scraping protections. lxSpider includes examples targeting areas such as e-commerce platforms, social media services, content sites, research databases, and information portals. ...
    Downloads: 0 This Week
  • 23
    Synonyms

    Chinese synonyms, chat robot, intelligent question and answer toolkit

    ...Better Chinese synonyms, chatbot, intelligent question and answer toolkit. Synonyms can be used for many tasks in natural language understanding: text alignment, recommendation algorithms, similarity calculation, semantic shifting, keyword extraction, concept extraction, automatic summarization, search engines, etc. Print synonyms in a friendly way for easy debugging. "Synonyms Cilin" was compiled by Mei Jiaju and others in 1983; the version now widely used is the "Synonyms Cilin Extended Edition" maintained by the Social Computing and Information Retrieval Research Center of Harbin Institute of Technology. ...
    Downloads: 0 This Week
  • 24
    ruia

    Async Python framework for fast and flexible web scraping spiders

    Ruia is an asynchronous web scraping micro-framework built for Python that focuses on simplicity, speed, and flexibility when creating web crawlers. Ruia is powered by Python’s asyncio library along with aiohttp, enabling developers to perform concurrent network requests efficiently and scrape data from websites with minimal overhead. Ruia follows a “write less, run faster” philosophy, emphasizing concise code and streamlined spider development. It provides a structured approach to building...
    Downloads: 7 This Week
  • 25
    CNN for Image Retrieval
    cnn-for-image-retrieval is a research-oriented project that demonstrates the use of convolutional neural networks (CNNs) for image retrieval tasks. The repository provides implementations of CNN-based methods to extract feature representations from images and use them for similarity-based retrieval. It focuses on applying deep learning techniques to improve upon traditional handcrafted descriptors by learning features directly from data. The code includes training and evaluation scripts that...
    Downloads: 0 This Week