Showing 230 open source projects for "extract"

View related business solutions
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • Application Monitoring That Won't Slow Your App Down Icon
    Application Monitoring That Won't Slow Your App Down

    AppSignal's Rust-based agent is lightweight and stable. Already running in thousands of production apps.

    Full APM with errors, performance, logs, and uptime monitoring. 99.999% uptime SLA on the platform itself.
    Start Free
  • 1
    BettaFish

    BettaFish

    Public opinion analysis system

    BettaFish is an open-source, multi-agent public opinion analysis system built to automate the collection, deep analysis, and reporting of social media data at scale through conversational queries. It uses a modular architecture of specialized agents that collaborate to crawl mainstream platforms, extract multimodal content like text and short video, and synthesize insights through both statistical and large language model techniques. With a design that lets users pose questions in natural language and receive structured reports, charts, and visualizations, the system aims to break information cocoons and provide comprehensive views of trends and public sentiment. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Reader 3

    Reader 3

    Quick illustration of how one can easily read books together with LLMs

    This project is a minimalist, self-hosted EPUB reader designed to help users browse and read EPUB books one chapter at a time through a lightweight local server, making it especially easy to extract or work with chapters in external tools like large language models. It was created primarily as a simple demonstration of how to combine local book reading with LLM workflows without heavy dependencies or complicated setup, and it runs with just a small Python script and a basic HTTP server. The interface focuses on clarity and ease of use, offering straightforward navigation of book chapters rather than full-featured e-reading capabilities. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    DeepWiki Open

    DeepWiki Open

    AI-Powered Wiki Generator for GitHub/Gitlab/Bitbucket Repositories

    DeepWiki Open is an open-source, AI-powered wiki generator that automatically creates fully navigable, richly structured wiki documentation for GitHub, GitLab, or Bitbucket repositories by combining code analysis, vector embeddings, retrieval-augmented generation (RAG), and visualization tools. Users can enter a repository URL and the system will clone the project, build semantic embeddings of its codebase, extract architecture and relationships, generate human-readable documentation, and produce visual diagrams to help explain complex code structure. DeepWiki’s output turns raw repositories into interactive, web-style wikis complete with navigable sections, diagrams, and contextual explanations, making it easier for developers and collaborators to understand unfamiliar code. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    pep484 stubs for Django

    pep484 stubs for Django

    PEP-484 stubs for Django

    ...You can show your support by liking the PR. This project does not affect your runtime at all. It only affects mypy type checking process. The current implementation uses Django's runtime to extract information about models, so it might crash if your installed apps or models.py are broken.
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    TimeMixer

    TimeMixer

    Decomposable Multiscale Mixing for Time Series Forecasting

    ...Instead of relying on traditional recurrent or transformer-based architectures, TimeMixer is implemented as a fully multilayer perceptron–based model that performs temporal mixing across different resolutions of the data. The architecture introduces specialized components such as Past-Decomposable-Mixing blocks, which extract information from historical sequences at different scales, and Future-Multipredictor-Mixing modules that combine predictions from multiple forecasting paths. This design allows the model to integrate complementary information across scales and produce more accurate predictions for complex temporal patterns.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Semantra

    Semantra

    Multi-tool for semantic search

    ...The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. The system runs from the command line and automatically launches a local web interface where users can perform interactive searches and examine document passages related to a query. By relying on semantic embeddings and contextual analysis, the tool can identify passages that are relevant even when the query uses different wording than the source documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    newspaper4k

    newspaper4k

    Python library for scraping and analyzing online news articles easily

    ...It is a continuation and active fork of the original newspaper3k library, which had stopped receiving updates, with the goal of keeping the ecosystem maintained while adding improvements and bug fixes. It provides developers with tools to automatically download web pages, extract the main article content, and collect associated metadata such as titles, authors, images, and publication dates. Newspaper4k also includes natural language processing capabilities that can generate summaries and identify keywords from extracted article text. Newspaper4k supports both single-article extraction and full news site processing, allowing users to build sources representing entire publications and iterate through their articles. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    videodl

    videodl

    Lightweight Python tool for downloading videos from many platforms

    ...It supports numerous video platforms across both Chinese and international streaming ecosystems, enabling users to fetch content from many popular services through a unified interface. Videodl works by implementing platform-specific client modules that extract video information and download links from supported services. Videodl can integrate with external command-line utilities to improve downloading performance, handle streaming formats such as HLS, and manage encrypted or segmented media streams. Additional utilities can also enable faster downloads, resume interrupted transfers, and process complex playlist structures.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    spider_collection

    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages. Several scripts also incorporate multi-threading and proxy usage to improve scraping efficiency and help avoid common anti-scraping limitations. In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 10
    owllook

    owllook

    Vertical novel search engine with unified reading and tracking tools

    ...It focuses on providing a simple and comfortable reading experience with features such as searching for books, following updates, bookmarking chapters, and maintaining a personal bookshelf. It aggregates results from multiple search engines and applies parsing rules to extract novel metadata, chapters, and content in a consistent format. Owllook also includes functionality for tracking reading history, displaying rankings based on search activity, and recommending books using a similarity-based approach. Owllook is built using asynchronous technologies to support efficient data retrieval and responsive interactions while reading or searching.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    AI Powered Knowledge Graph Generator

    AI Powered Knowledge Graph Generator

    AI Powered Knowledge Graph Generator

    ...Knowledge graphs organize information as networks of nodes and relationships, allowing applications to analyze connections between concepts, datasets, or real-world entities. By incorporating AI techniques such as natural language processing and semantic reasoning, the project enables systems to automatically extract relationships and insights from large volumes of data. These capabilities make knowledge graph platforms particularly useful for applications such as recommendation engines, enterprise knowledge management, and research data exploration. The system emphasizes structured data modeling and graph-based queries that allow users to explore relationships that would be difficult to identify using traditional relational databases.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    DocETL

    DocETL

    A system for agentic LLM-powered data processing and ETL

    DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL provides a declarative pipeline framework that breaks complex document analysis tasks into manageable operations that can be optimized and orchestrated automatically. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    AudioNotes

    AudioNotes

    Extract audio and video content and organize it into a Markdown note

    AudioNotes is an application (or proof-of-concept) that likely combines audio recording or playback with note-taking or annotation functionality — enabling users to record voice or audio and attach textual or timestamped notes, making it ideal for lectures, interviews, meetings, or personal memos. Such a tool offers a more expressive and flexible way to capture and revisit information: instead of just typed notes or raw audio, users get both audio context and structured notes. As an...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Instructor

    Instructor

    Structured outputs for llms

    Instructor is a tool that enables developers to extract structured data from natural language using Large Language Models (LLMs). Integrating with Python's Pydantic library allows users to define desired output structures through type hints, facilitating schema validation and seamless integration with IDEs. Instructor supports various LLM providers, including OpenAI, Anthropic, Litellm, and Cohere, offering flexibility in implementation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Tarsier

    Tarsier

    Vision utilities for web interaction agents

    At Reworkd, we iterated on all these problems across tens of thousands of real web tasks to build a powerful perception system for web agents... Tarsier! In the video below, we use Tarsier to provide webpage perception for a minimalistic GPT-4 LangChain web agent. Tarsier visually tags interactable elements on a page via brackets + an ID e.g. [23]. In doing this, we provide a mapping between elements and IDs for an LLM to take actions upon (e.g. CLICK [23]). We define interactable elements...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    cognee

    cognee

    Deterministic LLMs Outputs for AI Applications and AI Agents

    ...Any kind of data works; unstructured text or raw media files, PDFs, tables, presentations, JSON files, and so many more. Add small or large files, or many files at once. We map out a knowledge graph from all the facts and relationships we extract from your data. Then, we establish graph topology and connect related knowledge clusters, enabling the LLM to "understand" the data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    AudioMuse-AI

    AudioMuse-AI

    AudioMuse-AI is an Open Source Dockerized environment

    ...AudioMuse-AI integrates with several popular self-hosted music servers including Jellyfin, Navidrome, and Emby, allowing users to extend existing media servers with advanced AI-powered recommendation capabilities. The system uses machine learning and audio analysis tools such as Librosa and ONNX models to extract features directly from audio tracks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Python-Spider

    Python-Spider

    Python3 web crawler practice

    Python-Spider is a repository intended to teach or provide examples for writing web spiders / crawlers in Python — part of a broader learning and resource collection by its author. The code and documentation are oriented toward beginners or intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically. As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe concurrency or scheduling to crawl multiple pages, and techniques to handle common web-scraping issues. For people wanting to get hands-on with building scrapers, collecting data, or learning how to navigate web programming in Python, this repository acts as a didactic reference or starting point. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    BlogWizard

    BlogWizard

    Generate blog articles from video or audio

    BlogWizard is a demo/utility project built on top of Groq’s LLM infrastructure that converts video or audio content into well-structured blog posts, enabling creators to repurpose multimedia content into text — useful for SEO, accessibility, or reaching audiences that prefer reading. The tool uses transcription (e.g. via Whisper) to extract text from audio/video, then runs an LLM-based generation pipeline to transform that content into coherent, readable blog-format posts — with sections, formatting, and possibly metadata. This bridges the gap between modern multimedia content (podcasts, YouTube videos, interviews) and traditional written content, making cross-format publishing more efficient. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    ktrain

    ktrain

    ktrain is a Python library that makes deep learning AI more accessible

    ktrain is a Python library that makes deep learning and AI more accessible and easier to apply. ktrain is a lightweight wrapper for the deep learning library TensorFlow Keras (and other libraries) to help build, train, and deploy neural networks and other machine learning models. Inspired by ML framework extensions like fastai and ludwig, ktrain is designed to make deep learning and AI more accessible and easier to apply for both newcomers and experienced practitioners. With only a few lines...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Vedo

    Vedo

    A python module for scientific analysis of 3D data

    ...Tools to visualize and edit meshes (cutting a mesh with another mesh, slicing, normalizing, moving vertex positions, etc..). Split mesh based on surface connectivity. Extract the largest connected area. Calculate areas, volumes, center of mass, average sizes etc. Calculate vertex and face normals, curvatures, feature edges. Fill mesh holes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Archive Extractor
    ...Alternatively, you can have 7z installed at the path "C:\Program Files\7-Zip" (this is usually set by default as well). Please note that if you only have 7z installed, you will not be able to extract .rar files, but only .zip or .7z files. This tool is primarily designed to extract files from password-protected Rar/Zip/7z archives, although it also works on unprotected archives. You can extract one or more archives of different types at a time. It is important to note that the passwords must be known; this is NOT a "cracking software" or "password recovery tool."
    Downloads: 6 This Week
    Last Update:
    See Project
  • 23
    PdfBooklet
    PdfBooklet is a Python Gtk application which allows to make books or booklets from existing pdf files. It can also adjust margins, rotate, scale, merge files or extract pages.
    Leader badge
    Downloads: 236 This Week
    Last Update:
    See Project
  • 24
    Wapiti

    Wapiti

    Wapiti is a web-application vulnerability scanner

    Wapiti is a vulnerability scanner for web applications. It currently search vulnerabilities like XSS, SQL and XPath injections, file inclusions, command execution, XXE injections, CRLF injections, Server Side Request Forgery, Open Redirects... It use the Python 3 programming language.
    Leader badge
    Downloads: 25 This Week
    Last Update:
    See Project
  • 25
    Video Frame Extractor

    Video Frame Extractor

    Extracts semi-random frames from all MP4 videos

    .... ## How to use: - Place this program in the folder containing your MP4 videos. - Double-click on VideoFrameExtractor.exe to run it. - When prompted, enter the number of frames you want to extract from each video. - Wait for the program to finish processing all videos. - Find your extracted frames in the 'extracted_frames' folder. The frames are extracted at evenly distributed points throughout each video. For example, if you choose 3 frames, they will be taken at the 25%, 50%, and 75% marks of each video. (Source code is included with the program .zip file.)
    Leader badge
    Downloads: 22 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB