Showing 219 open source projects for "extraction"

  • 1
    FixRes

    Reproduces results of "Fixing the train-test resolution discrepancy"

    FixRes is a lightweight yet powerful training methodology for convolutional neural networks (CNNs) that addresses the common train-test resolution discrepancy problem in image classification. Developed by Facebook Research, FixRes improves model generalization by adjusting training and evaluation procedures to better align input resolutions used during different phases. The approach is simple but highly effective, requiring no architectural modifications and working across diverse CNN...
    Downloads: 0 This Week
  • 2
    CC-Net

    Tools to download and clean up Common Crawl data

    cc_net provides tools to download, segment, clean, and filter Common Crawl to build large-scale text corpora, including monolingual datasets and the multilingual CC-100 collection introduced in the associated paper. It includes pipelines to fetch snapshots, extract text, de-duplicate, identify language, and apply quality filtering based on heuristics and language models. The outputs are intended for pretraining language models and for creating standardized corpora that can be reproduced or...
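    The de-duplication step can be illustrated with a minimal sketch of paragraph-level exact de-duplication. Each paragraph is normalized and hashed, and it is dropped if the same hash was already seen elsewhere in the corpus. The normalization rules (lowercasing, mapping digits to 0, collapsing whitespace), the truncated SHA-1 digest and the function names are illustrative assumptions. They are not cc_net's actual code or API.

```python
import hashlib
import re
from typing import Iterable, List


def normalize(paragraph: str) -> str:
    """Illustrative normalization: lowercase, map every digit to 0, collapse whitespace."""
    text = re.sub(r"\d", "0", paragraph.lower())
    return " ".join(text.split())


def paragraph_hash(paragraph: str) -> bytes:
    """First 8 bytes of the SHA-1 digest of the normalized paragraph."""
    return hashlib.sha1(normalize(paragraph).encode("utf-8")).digest()[:8]


def dedup_documents(docs: Iterable[str]) -> List[str]:
    """Remove every paragraph whose hash was already seen earlier in the corpus."""
    seen = set()
    cleaned = []
    for doc in docs:
        kept = []
        for para in doc.split("\n"):
            if not para.strip():
                continue  # skip empty lines
            digest = paragraph_hash(para)
            if digest not in seen:
                seen.add(digest)
                kept.append(para)
        cleaned.append("\n".join(kept))
    return cleaned


if __name__ == "__main__":
    docs = ["Cookie policy\nArticle text from 2021", "COOKIE   policy\nAnother article"]
    print(dedup_documents(docs))
```

    Keeping only 8 bytes of each digest trades a tiny collision risk for a much smaller in-memory hash set. That matters at Common Crawl scale, where boilerplate such as cookie banners repeats across millions of pages.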
    Downloads: 0 This Week
  • 3
    Tashkeela processed

    Tashkeela dataset cleaned and normalized.

    ...The cleaning process removes XML tags and unusual symbols and fixes diacritic errors. Tokenization then focuses on extracting the Arabic words. The result is a file of space-separated tokens in which words and numbers are split apart, but sequences of punctuation are kept together (e.g., a closing parenthesis followed by a period). Sentences are segmented at common punctuation marks such as periods, commas, question marks and exclamation marks, as well as at line ends. ...
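    The tokenization and sentence-segmentation rules described above can be sketched with regular expressions. The following are assumptions for illustration, not the dataset's actual scripts:

    * the character ranges: basic Arabic letters U+0621 to U+063A and U+0641 to U+064A, plus the harakat U+064B to U+0652
    * the set of boundary characters
    * the function names

    Any character that is not whitespace, an Arabic letter or a digit is treated as punctuation.

```python
import re
from typing import List

# Assumed ranges: basic Arabic letters (U+0621 to U+063A, U+0641 to U+064A)
# and the harakat diacritics (U+064B to U+0652).
LETTERS = "\u0621-\u063A\u0641-\u064A"
HARAKAT = "\u064B-\u0652"
# A token is a run of (diacritized) Arabic letters, a run of digits,
# or a run of any other non-whitespace characters (treated as punctuation).
TOKEN_RE = re.compile(rf"[{LETTERS}{HARAKAT}]+|\d+|[^\s{LETTERS}{HARAKAT}\d]+")
# Sentence boundaries: . , ? ! plus the Arabic comma and question mark.
BOUNDARIES = set(".,?!\u060C\u061F")


def tokenize(line: str) -> List[str]:
    """Split words from numbers; consecutive punctuation stays in one token."""
    return TOKEN_RE.findall(line)


def segment(text: str) -> List[List[str]]:
    """Group tokens into sentences, ending one at boundary punctuation or a line end."""
    sentences = []
    for line in text.splitlines():
        current: List[str] = []
        for token in tokenize(line):
            current.append(token)
            if any(ch in BOUNDARIES for ch in token):
                sentences.append(current)
                current = []
        if current:
            sentences.append(current)
    return sentences
```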
    Downloads: 0 This Week
  • 4
    AICtools

    Workflow and set of tools for Automated Imagery Classification

    AICtools is a GIS workflow and set of tools to facilitate Automated Imagery Classification (AIC) and analysis of surface features over time. It supports bulk processing of large datasets, including automated metadata processing and filtering, compressed-archive extraction and file manipulation, raster band compositing, pre-processing, mosaicking and clipping. It automates a subset of the operations involved in classifying satellite imagery and the associated raster calculations used for time-trend analysis.
    Downloads: 0 This Week
  • 5
    DeText

    A Deep Neural Text Understanding Framework

    DeText is a Deep Text understanding framework for NLP-related ranking, classification, and language generation tasks. It leverages semantic matching using deep neural networks to understand member intents in search and recommender systems. As a general NLP framework, DeText can be applied to many tasks, including search & recommendation ranking, multi-class classification and query understanding tasks.
    Downloads: 1 This Week
  • 6
    DeepCluster

    Deep Clustering for Unsupervised Learning of Visual Features

    ...DeepCluster was one of the early successes in unsupervised visual feature learning, demonstrating that clustering-based reformulation can rival supervised baselines for many downstream tasks. The repository includes code for feature extraction, clustering, training loops, and evaluation benchmarks like linear probes. Because of its simplicity and modular design, DeepCluster has inspired many later methods.
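    DeepCluster alternates between two steps. It clusters the network's features with k-means, then trains the network to predict the resulting cluster assignments as pseudo-labels. The sketch below shows only the clustering half, in pure Python, on small hypothetical feature vectors. The CNN, the training loop and any feature preprocessing the repository applies before clustering are omitted, and the function names are hypothetical. The original method also handles empty clusters specially; here an empty cluster simply keeps its previous centroid.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pseudo_labels(features: List[Sequence[float]], k: int,
                         iters: int = 20, seed: int = 0) -> List[int]:
    """Cluster feature vectors with k-means and return one cluster id per vector."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]  # init from k distinct points
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(f, centroids[c]))
                  for f in features]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [f for f, label in zip(features, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid in this sketch
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    pseudo = kmeans_pseudo_labels(feats, k=2)
    print(pseudo)  # these ids would serve as classification targets in the next epoch
```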
    Downloads: 0 This Week
  • 7
    Delta ML

    Deep learning based natural language and speech processing platform

    ...It helps you train, develop and deploy NLP and/or speech models. Configuration files make it easy to tune parameters and network structures. What you see in training is what you get in serving: all data processing and feature extraction are integrated into the model graph. Supported tasks include text classification, named entity recognition, question answering, text summarization and more. Uniform I/O interfaces mean that new models require no interface changes.
    Downloads: 0 This Week
  • 8
    FavFreak

    Favicon hash–based reconnaissance tool for security research

    FavFreak is an open source reconnaissance tool designed to assist security researchers, bug bounty hunters, and penetration testers in identifying web technologies using favicon hashes. It works by taking one or more URLs as input and automatically retrieving the favicon.ico file associated with each target website. After fetching the favicon, it calculates a hash value and organizes the scanned domains, subdomains, or IP addresses according to these hashes. FavFreak then compares the...
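    The core idea is to group hosts by a hash of the favicon they serve, as in the sketch below. Fetching the favicons is omitted; the input is a hypothetical mapping from host name to favicon bytes. The hash is a stand-in (MD5 from Python's standard library). Favicon fingerprints used with Shodan are commonly MurmurHash3 values computed over the base64-encoded icon, so this sketch's fingerprints will not match those values. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def favicon_fingerprint(icon: bytes) -> str:
    """Stand-in fingerprint: MD5 hex digest of the raw favicon bytes."""
    return hashlib.md5(icon).hexdigest()


def group_by_favicon(favicons: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Map each fingerprint to the sorted hosts that served an identical favicon."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for host, icon in favicons.items():
        groups[favicon_fingerprint(icon)].append(host)
    return {fp: sorted(hosts) for fp, hosts in groups.items()}


if __name__ == "__main__":
    # Hypothetical, already-downloaded favicon bytes keyed by host.
    sample = {
        "a.example.com": b"\x00\x00\x01\x00icon-one",
        "b.example.com": b"\x00\x00\x01\x00icon-one",
        "c.example.com": b"\x00\x00\x01\x00icon-two",
    }
    for fp, hosts in group_by_favicon(sample).items():
        print(fp, hosts)
```

    Hosts that share a fingerprint often run the same product. That is what makes the grouping useful for identifying web technologies during reconnaissance.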
    Downloads: 0 This Week
  • 9
    SkyWater PDK

    Open source process design kit for use with SkyWater Technology

    The SkyWater PDK is the first broadly available open-source process design kit for a commercial-grade CMOS node, enabling researchers, startups, and students to design real ASICs without proprietary NDAs. It provides the essential artifacts for digital and analog flows: SPICE models, DRC/LVS rules, extraction decks, and technology files for open tools like Magic and KLayout. Standard-cell libraries and IO pads are included so digital designers can use open synthesis and place-and-route to reach a manufacturable GDS. Because the PDK is open, it becomes a common target for community reference designs, open tapeouts, and teaching curricula. ...
    Downloads: 8 This Week
  • 10
    PyTorch GAN Zoo

    A mix of GAN implementations including progressive growing

    PyTorch GAN Zoo is a comprehensive open research toolbox designed for experimenting with and developing Generative Adversarial Networks (GANs) using PyTorch. The project provides modular implementations of popular GAN architectures, including Progressive Growing of GANs (PGAN), DCGAN, and an experimental StyleGAN version. It is built to support both researchers and developers who want to train, evaluate, and extend GANs efficiently across diverse datasets such as CelebA-HQ, FashionGen, DTD,...
    Downloads: 0 This Week
  • 11
    DNSGen

    Intelligent DNS permutation tool for subdomain discovery

    DNSGen is an open source DNS name permutation tool designed primarily for security researchers and penetration testers who need to discover potential subdomains during reconnaissance and attack surface mapping. It analyzes existing domain names and generates numerous intelligent variations that may represent valid subdomains within an organization’s infrastructure. These generated permutations help identify hidden or unlisted services that may not appear in standard DNS queries or public...
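    A minimal sketch of permutation generation follows. It assumes three illustrative rules:

    * insert a wordlist entry as an extra label at every position
    * join the entry to the first label with a dash
    * increment or decrement any number already present in the name

    These rules, the wordlist and the function name are hypothetical and do not reproduce DNSGen's actual rule set. The candidates would still need to be resolved to find live hosts.

```python
import re
from typing import Iterable, Set


def permutations(name: str, words: Iterable[str], root: str) -> Set[str]:
    """Generate candidate subdomains of `root` from one known name (illustrative rules)."""
    suffix = "." + root
    if not name.endswith(suffix):
        raise ValueError(f"{name!r} is not under {root!r}")
    sub = name[: -len(suffix)]  # e.g. "api2" for "api2.example.com"
    labels = sub.split(".")
    out: Set[str] = set()
    for word in words:
        # Rule 1: insert the word as an extra label at every position.
        for i in range(len(labels) + 1):
            out.add(".".join(labels[:i] + [word] + labels[i:]) + suffix)
        # Rule 2: join the word to the first label with a dash, on either side.
        for joined in (f"{word}-{labels[0]}", f"{labels[0]}-{word}"):
            out.add(".".join([joined] + labels[1:]) + suffix)
    # Rule 3: step any number in the subdomain part up and down by one.
    for m in re.finditer(r"\d+", sub):
        n = int(m.group())
        for candidate in (n - 1, n + 1):
            if candidate >= 0:
                out.add(sub[: m.start()] + str(candidate) + sub[m.end():] + suffix)
    out.discard(name)  # the known name itself is not a new candidate
    return out
```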
    Downloads: 1 This Week
  • 12
    jieba

    Stuttering Chinese word segmentation

    "Jieba" Chinese word segmentation aims to be the best Python Chinese word segmentation module. It supports four segmentation modes. Precise mode tries to cut the sentence as accurately as possible and is suited to text analysis. Full mode scans out every string in the sentence that can form a word; it is very fast but cannot resolve ambiguity. Search engine mode builds on precise mode by splitting long words again to improve recall, which is suitable...
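    Precise and full modes can be contrasted with a small sketch of the dictionary-based approach jieba describes. It builds a directed acyclic graph (DAG) of every dictionary word starting at each character. It then picks the path with the highest total log-probability by dynamic programming. The toy dictionary and its frequencies are hypothetical, and the function names are not jieba's API. jieba's HMM-based handling of out-of-vocabulary words is omitted, so unknown characters simply become single-character words.

```python
import math
from typing import Dict, List

# Hypothetical toy dictionary: word -> corpus frequency.
FREQ: Dict[str, int] = {
    "我": 50, "来到": 20, "北京": 40, "清华": 10,
    "清华大学": 15, "华大": 2, "大学": 30,
}
LOG_TOTAL = math.log(sum(FREQ.values()))


def build_dag(sentence: str) -> Dict[int, List[int]]:
    """For each start index i, list every end index k with sentence[i:k+1] in FREQ."""
    dag = {}
    for i in range(len(sentence)):
        ends = [k for k in range(i, len(sentence)) if sentence[i:k + 1] in FREQ]
        dag[i] = ends or [i]  # unknown character: fall back to a one-character word
    return dag


def cut_full(sentence: str) -> List[str]:
    """Full mode: every dictionary word found; ambiguity is left unresolved."""
    dag = build_dag(sentence)
    words, covered = [], -1  # covered: furthest index already inside an emitted word
    for i in range(len(sentence)):
        for k in dag[i]:
            # Single characters are emitted only if no earlier word already covers them.
            if k > i or i > covered:
                words.append(sentence[i:k + 1])
        covered = max(covered, max(dag[i]))
    return words


def cut_precise(sentence: str) -> List[str]:
    """Precise mode: the segmentation path with the highest total log-probability."""
    n = len(sentence)
    dag = build_dag(sentence)
    # route[i] = (best log-probability of sentence[i:], end index of its first word)
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(FREQ.get(sentence[i:k + 1], 1)) - LOG_TOTAL + route[k + 1][0], k)
            for k in dag[i]
        )
    words, i = [], 0
    while i < n:
        k = route[i][1]
        words.append(sentence[i:k + 1])
        i = k + 1
    return words
```

    On the toy dictionary and the sentence 我来到北京清华大学, the two modes differ. Full mode returns the overlapping candidates 清华, 清华大学, 华大 and 大学. Precise mode keeps only 清华大学.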
    Downloads: 7 This Week
  • 13
    cocoNLP

    A Chinese information extraction tool

    cocoNLP is a lightweight natural-language processing toolkit geared toward practical information extraction from raw text, especially for Chinese and mixed Chinese–English content. Instead of requiring a heavy pipeline, it focuses on quick wins such as extracting names, places, organizations, emails, phone numbers, and dates directly from unstructured sentences. The project blends pattern-based methods with NLP heuristics, giving developers dependable results for real-world texts like chats, comments, and user-generated content. ...
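    Pattern-based extraction of contact details and dates can be sketched with regular expressions. The patterns below are illustrative assumptions, not cocoNLP's actual rules or API:

    * a simplified email pattern
    * mainland China mobile numbers, with an optional +86 prefix
    * two date formats

    Names, places and organizations would need the NLP side of such a toolkit.

```python
import re
from typing import Dict, List

# Illustrative patterns only; real extractors need many more edge cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 11-digit mobile numbers starting with 1[3-9], optionally prefixed by +86.
PHONE_RE = re.compile(r"(?<!\d)(?:\+?86[- ]?)?(1[3-9]\d{9})(?!\d)")
# Dates written as 2023年5月12日 or 2023-5-12.
DATE_RE = re.compile(r"\d{4}年\d{1,2}月\d{1,2}日|\d{4}-\d{1,2}-\d{1,2}")


def extract(text: str) -> Dict[str, List[str]]:
    """Return every email, phone number (without country code) and date found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),  # findall returns the captured group
        "dates": DATE_RE.findall(text),
    }


if __name__ == "__main__":
    print(extract("请联系张三，邮箱 zhang.san@example.com，电话 +86 13812345678，会议在2023年5月12日。"))
```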
    Downloads: 0 This Week
  • 14
    PyTracking

    Visual tracking library based on PyTorch

    A general Python framework for visual object tracking and video object segmentation, based on PyTorch. It is the official implementation of the RTS (ECCV 2022), ToMP (CVPR 2022), KeepTrack (ICCV 2021), LWL (ECCV 2020), KYS (ECCV 2020), PrDiMP (CVPR 2020), DiMP (ICCV 2019) and ATOM (CVPR 2019) trackers, including complete training code and trained models.
    Downloads: 0 This Week
  • 15
    Snips NLU

    Snips Python library to extract meaning from text

    Snips NLU is a Python natural language understanding library that parses sentences written in natural language and extracts structured information. It is the library that powers the NLU engine used in the Snips Console, which you can use to create private-by-design voice assistants. The project's examples give only a glimpse of the kind of information that can be extracted; the exact output is richer. Behind every chatbot and voice assistant lies a common piece of technology:...
    Downloads: 0 This Week
  • 16
    Image Super-Resolution (ISR)

    Super-scale your images and run experiments with Residual Dense

    The goal of this project is to upscale and improve the quality of low-resolution images. It contains Keras implementations of different Residual Dense Networks for Single Image Super-Resolution (ISR), as well as scripts to train these networks using content and adversarial loss components. Docker scripts and Google Colab notebooks are available to carry out training and prediction. Also, we provide scripts to facilitate training on the cloud with AWS and Nvidia-docker with only a few...
    Downloads: 3 This Week
  • 17
    MCNPydE

    MCNP data extraction and display software library

    MCNPydE is a Python library for extracting data from MCNP output files. It requires Python, Matplotlib and NumPy. It is a data reduction tool that makes MCNP results easier to analyze and view.
    Downloads: 0 This Week
  • 18
    AutoBench

    A utility for extracting data from benchmark sites

    AutoBench extracts the latest CPU, GPU, drive and RAM performance scores and rankings from benchmark sites and saves the output in CSV, XLSX and XLS formats. For each component type (CPU, GPU, drive and RAM), it records the model name and score.
    Downloads: 0 This Week
  • 19
    RoboSat

    Semantic segmentation on aerial and satellite imagery

    RoboSat is an end-to-end pipeline written in Python 3 for feature extraction from aerial and satellite imagery. Features can be anything visually distinguishable in the imagery, for example buildings, parking lots, roads or cars.
    Downloads: 0 This Week
  • 20
    ECommerceCrawlers

    Collection of Python ecommerce and website crawler examples projects

    ...These examples demonstrate how to build and operate web scrapers capable of collecting structured information such as product listings, news content, job postings, social media data, and other publicly available web data. It aims to help developers understand the full workflow of web scraping, including request simulation, data extraction, storage, and handling anti-scraping techniques. It includes crawlers for platforms such as ecommerce marketplaces, blogging platforms, recruitment sites, and social networks, providing real-world practice scenarios. Developers can study the individual project documentation to understand the analysis process.
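    The data-extraction step of such a crawler can be sketched with the standard library's html.parser. The markup is hypothetical: div elements with class "product" wrapping name and price spans. Real sites differ and change often. This sketch covers only parsing, not requests, storage or anti-scraping measures, and the class name is hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class ProductParser(HTMLParser):
    """Collect name/price pairs from hypothetical listing markup:
    <div class="product"><span class="name">...</span><span class="price">...</span></div>
    """

    def __init__(self) -> None:
        super().__init__()
        self.products: List[Dict[str, str]] = []
        self._field: Optional[str] = None  # field that the next text node belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "product" in classes:
            self.products.append({})  # start a new record
        elif self.products and ("name" in classes or "price" in classes):
            self._field = "name" if "name" in classes else "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and data.strip():
            self.products[-1][self._field] = data.strip()


if __name__ == "__main__":
    page = ('<div class="product"><span class="name">Mug</span>'
            '<span class="price">12.50</span></div>'
            '<div class="product"><span class="name">Pen</span>'
            '<span class="price">1.20</span></div>')
    parser = ProductParser()
    parser.feed(page)
    print(parser.products)
```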
    Downloads: 2 This Week
  • 21
    Photon

    Incredibly fast crawler designed for OSINT

    ...Its Python implementation makes it accessible for customization and integration into larger automation frameworks. Despite its speed focus, the tool still provides useful filtering and extraction capabilities for analysts who need structured results. Overall, Photon functions as a lightweight yet powerful reconnaissance spider for web intelligence gathering.
    Downloads: 4 This Week
  • 22
    NeuroNER

    Named-entity recognition using neural networks

    Named-entity recognition (NER) aims to identify entities of interest in text, such as locations, organizations and temporal expressions. Identified entities can be used in various downstream applications such as patient note de-identification and information extraction systems. They can also serve as features for machine learning systems in other natural language processing tasks. NeuroNER leverages the state-of-the-art prediction capabilities of neural networks (a.k.a. "deep learning"). It is cross-platform, open source, freely available and straightforward to use, and it lets users create or modify annotations for a new or existing corpus. ...
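    NER models typically label each token with a scheme such as BIO (Begin, Inside, Outside). Downstream steps like de-identification then need whole entity spans. Below is a minimal sketch of that conversion, assuming BIO tags. The function name and example sentence are hypothetical, and NeuroNER's own output format may differ.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, entity_text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            current_tokens.append(token)  # continue the open entity
            continue
        # Any other tag closes the open entity, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
        if tag.startswith(("B-", "I-")):
            # B- starts an entity; a stray I- (type mismatch) is treated the same way.
            current_type, current_tokens = tag[2:], [token]
        else:  # "O": outside any entity
            current_type, current_tokens = None, []
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


if __name__ == "__main__":
    toks = ["Patient", "seen", "at", "Boston", "General", "Hospital", "on", "May", "5"]
    tags = ["O", "O", "O", "B-ORG", "I-ORG", "I-ORG", "O", "B-DATE", "I-DATE"]
    print(bio_to_entities(toks, tags))
```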
    Downloads: 0 This Week
  • 23
    TextRank

    TextRank implementation for Python 3

    TextRank is an implementation of the TextRank algorithm for extractive text summarization and keyword extraction, inspired by Google’s PageRank.
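    Keyword extraction with TextRank can be sketched as PageRank over a word co-occurrence graph. Words are nodes, an undirected edge links words that appear within a small window of each other, and scores are updated iteratively with damping factor d. TextRank usually restricts candidates to certain parts of speech, so this sketch assumes the input list has already been filtered. The window size, iteration count and function name are illustrative choices, not this project's API.

```python
from collections import defaultdict
from typing import Dict, List, Set


def textrank_keywords(words: List[str], window: int = 2, d: float = 0.85,
                      iterations: int = 50, top_n: int = 3) -> List[str]:
    """Rank candidate words by PageRank over an undirected co-occurrence graph."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for i, w in enumerate(words):
        # Link w to every other word that follows it within the window.
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        # Each node shares its previous score equally among its neighbors.
        scores = {
            w: (1 - d) + d * sum(scores[v] / len(graph[v]) for v in graph[w])
            for w in graph
        }
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


if __name__ == "__main__":
    print(textrank_keywords(["graph", "ranking", "graph", "keyword", "graph", "extraction"]))
```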
    Downloads: 0 This Week
  • 24
    Spatial Media

    Specifications and tools for 360º video and spatial audio

    spatial-media provides tools for working with spherical video and spatial audio metadata so players and platforms can correctly render immersive media. The utilities inject, inspect, and extract metadata in common container formats (MP4/WebM) to signal 360° projection type, stereoscopy mode, and spatial audio layout. Creators use it to prepare 360/VR180 assets for upload so services know whether a video is monoscopic, top-bottom stereo, or side-by-side, and whether ambisonic audio is...
    Downloads: 26 This Week
  • 25
    PySptools

    Hyperspectral algorithms for Python

    ...The functions and classes are organized by topic:
    * abundance maps: FCLS, NNLS, UCLS
    * classification: AbundanceClassification, NormXCorr, KMeans, SAM, SID, SVC
    * detection: ACE, CEM, GLRT, MatchedFilter, OSP
    * distance: chebychev, NormXCorr, SAM, SID
    * endmember extraction: ATGP, FIPPI, NFINDR, PPI
    * material count: HfcVd, HySime
    * noise: Savitzky-Golay, MNF, whiten
    * sigproc: bilateral
    * sklearn: HyperEstimatorCrossVal, HyperSVC and others
    * spectro: convex hull quotient, feature extraction (Tetracorder style), USGS06 library interface
    * util: load_ENVI_file, load_ENVI_spec_lib, corr, cov and others
    The library makes extensive use of the NumPy numeric library and can achieve good speed. ...
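    The SAM entries above refer to the Spectral Angle Mapper. It compares spectra by the angle between them, arccos(x · r / (|x| |r|)), so it is insensitive to overall brightness scaling. Below is a pure-Python sketch of the distance and of a nearest-reference classifier. It is not the pysptools API; the function names and the reference spectra used in the example are hypothetical. Zero-length spectra are not handled.

```python
import math
from typing import Dict, Sequence


def spectral_angle(pixel: Sequence[float], reference: Sequence[float]) -> float:
    """Spectral Angle Mapper distance in radians (0 means identical spectral shape)."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norms = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] so round-off cannot push acos outside its domain.
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def classify(pixel: Sequence[float], references: Dict[str, Sequence[float]]) -> str:
    """Assign the pixel to the reference spectrum with the smallest spectral angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))


if __name__ == "__main__":
    refs = {"vegetation": [1.0, 0.0], "soil": [0.0, 1.0]}
    print(classify([0.9, 0.1], refs))
```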
    Downloads: 0 This Week