Showing 26 open source projects for "extraction"

  • 1
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. ...
    Downloads: 7 This Week
  • 2
    kimuraframework

    AI-first Ruby framework for building fast, flexible web scraping spiders

    Kimurai is an open source web scraping framework written in Ruby that simplifies the process of building automated data extraction tools. It provides a clean domain-specific language that allows developers to define scraping logic and data schemas with minimal boilerplate code. Kimurai can use AI-assisted extraction to identify where data resides in HTML pages, automatically generating selectors that are cached for future use so subsequent scraping runs operate with pure Ruby performance. ...
    Downloads: 1 This Week
  • 3
    Kaldi

    kaldi-asr/kaldi is the official location of the Kaldi project

    ...Kaldi is designed for researchers who need a highly customizable environment to experiment with new algorithms, as well as for practitioners who want robust, production-ready ASR pipelines. It includes extensive tools for data preparation, feature extraction, acoustic and language modeling, decoding, and evaluation. With its modular design, Kaldi allows users to adapt the system to a wide range of languages and domains. As one of the most influential projects in speech recognition, it has become a foundation for much of the modern work in ASR.
    Downloads: 4 This Week
  • 4
    newspaper4k

    Python library for scraping and analyzing online news articles easily

    ...Newspaper4k also includes natural language processing capabilities that can generate summaries and identify keywords from extracted article text. Newspaper4k supports both single-article extraction and full news site processing, allowing users to build sources representing entire publications and iterate through their articles. It maintains compatibility with the original project so that existing code written for newspaper3k can continue working with minimal changes.
    Downloads: 0 This Week
  • 5
    X-osint

    Open source OSINT tool for gathering data on emails, phones, and IPs

    X-osint is an open source intelligence framework designed to collect and analyze publicly available information from multiple sources. It focuses on gathering useful and credible data about entities such as phone numbers, email addresses, and IP addresses using a range of automated OSINT techniques. It provides investigators and researchers with a centralized interface for running information-gathering tasks that would normally require multiple separate tools. X-osint can also perform...
    Downloads: 34 This Week
  • 6
    DotnetSpider

    Lightweight .NET framework for fast web crawling and data scraping

    DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. ...
    Downloads: 4 This Week
  • 7
    EMBA

    The firmware security analyzer

    EMBA is designed as the central firmware analysis tool for penetration testers and product security teams. It supports the complete security analysis process, from firmware extraction through static analysis and dynamic analysis via emulation to the generation of a web report. EMBA automatically discovers possible weak spots and vulnerabilities in firmware. Examples include insecure binaries, outdated software components, potentially vulnerable scripts, and hard-coded passwords. EMBA is a command-line tool that can generate an easy-to-use web report for further analysis. ...
    Downloads: 2 This Week
  • 8
    Papers We Love

    Papers from the computer science community to read and discuss

    Papers We Love (PWL) is a global open source community dedicated to reading, discussing, and sharing influential computer science research papers. The repository serves as a curated directory of academic papers that have shaped the field of computing, providing a centralized location for documents that were previously scattered across various online sources. While licensing restrictions prevent hosting all papers directly, PWL offers links to their original sources and clearly marks hosted...
    Downloads: 1 This Week
  • 9
    GitHound

    Search GitHub for leaked API keys, credentials, and exposed secrets

    GitHound is a reconnaissance and security scanning tool designed to search GitHub for exposed secrets such as API keys, credentials, and other sensitive tokens. It works by combining GitHub search queries (often called “GitHub dorks”) with pattern matching techniques to locate potential secrets across public repositories. Instead of scanning only a limited set of repositories, the tool leverages GitHub’s Code Search API to analyze results from across the entire public GitHub ecosystem,...
    Downloads: 3 This Week
  • 10
    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    Open Semantic Search is an open source research and analytics platform designed for searching, analyzing, and exploring large collections of documents using semantic search technologies. It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources. Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. ...
    Downloads: 2 This Week
  • 11

    UniversalTextExtractor

    Command-line toolset for extracting text from files

    Command-line toolset for extracting text from files (documents, images, archives) into SQLite with OCR support. Simple, expandable, one shell script only.
    Downloads: 0 This Week
  • 12

    Esegui SB

    Flexible video encoding script supporting multiple formats and codecs.

    ...Video Segmentation and Merging: Split videos into segments and merge them back together. Track Disposition Management: Set default tracks for audio, video, and subtitles. Audio Track Extraction: Extract and encode audio tracks independently. Audio Normalization: Adjust audio levels to ensure consistent volume across tracks. The script leverages FFmpeg's built-in libraries and tools for these functions. For detailed instructions and troubleshooting, refer to the provided documentation.
    Downloads: 2 This Week
  • 13
    p3d

    General data-reduction tool for fiber-fed integral-field spectrographs

    p3d is a general data-reduction tool for use with fiber-fed integral-field spectrographs (IFSs), although the spectrum viewer works with spectrum data cubes of any origin. The tool is built around the proprietary software IDL (Harris/EXELIS; see http://www.harrisgeospatial.com), but can be used without a license. Most slow-running loops are implemented in parallelized C (OpenMP).
    Downloads: 8 This Week
  • 14
    Seqs-Extractor
    Seqs Extractor is a tool that can reduce ambiguities in analyses that use the BLAST command line, which is common in next-generation sequencing, transcriptomics, proteomics, etc. It helps extract BLASTed sequences and sequences that contain microsatellites. Seqs Extractor also makes the BLAST command line friendlier to use.
    Downloads: 0 This Week
  • 15
    CNN for Image Retrieval
    cnn-for-image-retrieval is a research-oriented project that demonstrates the use of convolutional neural networks (CNNs) for image retrieval tasks. The repository provides implementations of CNN-based methods to extract feature representations from images and use them for similarity-based retrieval. It focuses on applying deep learning techniques to improve upon traditional handcrafted descriptors by learning features directly from data. The code includes training and evaluation scripts that...
    Downloads: 0 This Week
  • 16
    AICtools

    Workflow and set of tools for Automated Imagery Classification

    AICtools is a GIS workflow and set of tools to facilitate Automated Imagery Classification (AIC) and analysis of surface features over time. It allows bulk processing of large data sets, including automated metadata processing and filtering, compressed archive extraction and file manipulation, raster band compositing, pre-processing, mosaicking, and clipping. It automates a subset of the operations involved in classifying satellite imagery and the associated raster calculations used for time trend analysis.
    Downloads: 0 This Week
  • 17
    MultiPathNet

    A Torch implementation of the object detection network

    MultiPathNet is a Torch-7 implementation of the “A MultiPath Network for Object Detection” paper (BMVC 2016), developed by Facebook AI Research. It extends the Fast R-CNN framework by introducing multiple network “paths” to enhance feature extraction and object recognition robustness. The MultiPath architecture incorporates skip connections and multi-scale processing to capture both fine-grained details and high-level context within a single detection pipeline. This results in improved detection accuracy across various object sizes and categories compared to standard single-path architectures. ...
    Downloads: 0 This Week
  • 18
    swap_digger

    swap_digger is a tool used to automate Linux swap analysis

    swap_digger is a bash script used to automate Linux swap analysis for post-exploitation or forensics purposes. It automates swap extraction and searches for Linux user credentials, web form credentials, web form emails, HTTP basic authentication, WiFi SSIDs and keys, etc.
    Downloads: 0 This Week
  • 19
    Bifrozt

    High interaction honeypot solution for Linux based systems

    NOTICE: The format of this project has changed from an ISO image to Ansible, and the project has moved to GitHub. GitHub link: https://github.com/Bifrozt/bifrozt-ansible
    Downloads: 0 This Week
  • 20

    mwetoolkit

    THIS PROJECT MIGRATED TO https://gitlab.com/mwetoolkit/mwetoolkit3/

    The Multiword Expressions toolkit aids in the automatic identification and extraction of multiword units in running text. These include idioms (kick the bucket), noun compounds (cable car), phrasal verbs (take off, give up), etc. Even though it focuses on multiword expressions, the framework is quite complete and can also be useful in any corpus-based study in computational linguistics. The mwetoolkit can be applied to virtually any text collection, language, and MWE type. ...
    Downloads: 0 This Week
  • 21

    Large Text File converter

    Java-based heavy-duty utility to process large delimited text files

    ...Another strength of this tool is its configurability: its design allows generating as many output files as required from one input file, and validation, extraction, and conversion can be applied at every row of the input file. Use case example: a legacy system is to be replaced with a new, advanced system that has a different DB schema, and the data is provided as 100 GB of delimited text that must be inserted into 10 different tables of the new system's DB after validation, date format conversion, rearrangement, and MD5 hashing.
    Downloads: 0 This Week
  • 22
    Customizable browser-based file editor environment (text and web/WYSIWYG) in PHP (GPL licensed) with loads of features. Tested only in Firefox.
    Downloads: 0 This Week
  • 23
    cydownloader is a shell script that downloads files from rapidshare.com (premium users). cydownloader supports .rar extraction (including password-protected files), md5sum checks, and (simple) priority ranking. It also works as a link checker.
    Downloads: 0 This Week
  • 24
    kavenc is a KDE application which allows easy conversion of videos and audio files, as well as the extraction of audio from video files.
    Downloads: 0 This Week
  • 25
    Extremely customizable bash script for ripping audio CDs. It gets album information from FreeDB. You can use any audio extraction software you like, with any encoder and any related tools (for tagging, ReplayGain, etc.).
    Downloads: 0 This Week