Showing 219 open source projects for "extraction"

  • 1
    pyhanlp

    Chinese word segmentation

    ...The project focuses on making HanLP’s capabilities accessible through a Python-friendly API surface, so you can integrate NLP steps into data pipelines, notebooks, and downstream ML or information-extraction code. In practice, it serves as a bridge layer: Python calls are translated into the corresponding HanLP operations, so you can keep your application logic in Python while relying on HanLP’s implementations. It is especially useful when you need a pragmatic “get results quickly” NLP layer for segmentation, tagging, entity extraction, parsing, or keyword-style tasks rather than experimenting with model training from scratch.
    Downloads: 0 This Week
  • 2
    gain

    Asyncio-based Python framework for building fast web crawling spiders

    ...It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and items, allowing them to organize crawling logic and data extraction rules clearly. Gain supports CSS selectors and XPath expressions for parsing page content and extracting specific elements. Gain also allows developers to configure headers, concurrency levels, and proxy settings to control how crawlers interact with target websites. Because it uses asynchronous programming, Gain can handle multiple requests efficiently while minimizing blocking operations.
    Downloads: 2 This Week
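The bounded-concurrency idea described above can be sketched with the standard library alone. This is not Gain's API: `fetch`, `crawl`, and the `concurrency` parameter are hypothetical names, and the network request is simulated with a short sleep so the example runs offline.

```python
import asyncio


async def fetch(url, delay=0.01):
    """Stand-in for an HTTP request (hypothetical); sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def crawl(urls, concurrency=2):
    """Fetch all URLs while allowing at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(url):
        # The semaphore blocks here once `concurrency` fetches are already running.
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    print(asyncio.run(crawl(["a", "b", "c"], concurrency=2)))
```

A semaphore is one simple way to cap concurrency; a real crawler would typically also add timeouts, retries, and per-host rate limits.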
  • 3

    TEES

    Turku Event Extraction System

    Turku Event Extraction System (TEES) is a free and open source natural language processing system developed for the extraction of events and relations from biomedical text. It is written mostly in Python and should work in generic Unix/Linux environments. The TEES source code repository is hosted on GitHub at http://jbjorne.github.com/TEES/, which also has a wiki with more information.
    Downloads: 0 This Week
  • 4
    aeneas

    Automagically synchronize audio and text (aka forced alignment)

    aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment). aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.
    Downloads: 10 This Week
  • 5

    Distant Speech Recognition

    Beamforming and Speech Recognition Toolkit

    BTK contains C++ and Python libraries that implement speech processing and microphone array techniques such as speech feature extraction, speech enhancement, speaker tracking, beamforming, dereverberation, and echo cancellation. The Millennium ASR provides C++ and Python libraries for automatic speech recognition and implements a weighted finite state transducer (WFST) decoder along with training and adaptation methods. These toolkits are meant to facilitate research and development of automatic distant speech recognition.
    Downloads: 0 This Week
  • 6
    Python Audio Tools are a collection of audio handling programs. These include programs for CD extraction, track conversion from one audio format to another, track renaming and retagging, track identification, CD burning from tracks, and more.
    Downloads: 17 This Week
  • 7
    Yet Another Audio Feature Extractor is a toolbox for audio analysis. It is easy to use and efficient at extracting a large number of audio features simultaneously. It supports WAV and MP3 files and can be embedded in C++, Python, or Matlab applications.
    Downloads: 0 This Week
  • 8
    Bifrozt

    High interaction honeypot solution for Linux based systems

    NOTICE: This project has changed format from an ISO image to Ansible and has moved to GitHub. GitHub link: https://github.com/Bifrozt/bifrozt-ansible
    Downloads: 0 This Week
  • 9

    mwetoolkit

    THIS PROJECT MIGRATED TO https://gitlab.com/mwetoolkit/mwetoolkit3/

    The Multiword Expressions toolkit aids in the automatic identification and extraction of multiword units in running text. These include idioms (kick the bucket), noun compounds (cable car), phrasal verbs (take off, give up), etc. Even though it focuses on multiword expressions, the framework is quite complete and can also be useful in any corpus-based study in computational linguistics. The mwetoolkit can be applied to virtually any text collection, language, and MWE type. ...
    Downloads: 0 This Week
  • 10
    QuickNXS

    Polarized ToF reflectivity raw data analysis tool

    Data evaluation tool for the magnetism reflectometer at the Spallation Neutron Source (BL-4A@SNS). Reads raw NeXus files (HDF5) of histogrammed or event-mode data to create reflectivity curves and 2D Q-maps.
    Downloads: 1 This Week
  • 11
    Newspaper3k

    News, full-text, and article metadata extraction in Python 3

    Inspired by requests for its simplicity and powered by lxml for its speed, Newspaper is a Python 3 library for extracting and curating articles. It delivers Instapaper-style article extraction and works in 10+ languages, including English, Chinese, German, and Arabic. If you are certain that an entire news source is in one language, you can use the same API. On Python 3 you must install newspaper3k, not newspaper, which is the Python 2 library. Although installing Newspaper is simple with pip, you will run into fixable issues if you install on Ubuntu. ...
    Downloads: 0 This Week
  • 12

    M0Droid

    A malware detection technique

    This is an Android malware detection technique based on system call extraction. The code is written in Python 2.7 and requires the Android SDK to launch a virtual Android device and communicate with it. The program uses a correlation coefficient to compare the app's signature with the dataset (blacklist).
    Downloads: 0 This Week
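As a rough illustration of the comparison step, the sketch below computes a Pearson correlation coefficient between system-call count vectors. It is not M0Droid's code: the listing does not say which coefficient is used or how signatures are encoded, so the Pearson choice, the vector format, and the names `pearson` and `best_match` are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has no defined correlation; treat as no match
    return cov / math.sqrt(var_x * var_y)


def best_match(signature, blacklist):
    """Return (name, score) of the blacklist entry most correlated with the signature."""
    scored = ((name, pearson(signature, ref)) for name, ref in blacklist.items())
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical counts for three system calls, e.g. (open, read, sendto).
    blacklist = {"sample_a": [10, 0, 4], "sample_b": [0, 5, 1]}
    print(best_match([5, 0, 2], blacklist))
```

A detector built this way would also need a threshold on the score to decide when a match is strong enough to flag an app.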
  • 13
    CUbRIK

    Human-enhanced time-aware multimedia search

    The CUbRIK project provides a modular framework and distributed system architecture for flexible design and implementation of multimedia search applications. The framework supports hybrid workflows that combine automatic computation with CROWD-enabled and GWAP-enabled human computation.
    Downloads: 0 This Week
  • 14

    pyll

    Set of scripts to help on package deploy on WebMethods Integration Srv

    Pyll is a set of tools developed to help with package deployment in a growing environment of 20+ Integration Servers. Currently pyll can deploy more than 100 packages to 20+ Integration Servers in under 2/3 hours (update mode). Pyll works with WebMethods Integration Server versions 7.x and 8.x.
    Downloads: 0 This Week
  • 15
    TextBlob

    TextBlob is a Python library for processing textual data

    Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both. It supports word inflection (pluralization and singularization) and lemmatization, as well as spelling correction. ...
    Downloads: 0 This Week
  • 16
    CAPLET

    GDS visualization and parallelized capacitance extraction

    Project CAPLET is a capacitance extraction toolkit that extracts capacitance at field-solver accuracy. CAPLET can process GDS2 layout files directly into capacitance matrices through both GUI and command-line interfaces. The internal extraction algorithm is specialized for, but not limited to, VLSI interconnect structures, as long as the structure has Manhattan geometry and is embedded in a uniform dielectric material.
    Downloads: 0 This Week
  • 17
    Pumilio
    Pumilio is a web-based sound analysis and archive system for almost any kind of sound file with tools to see the spectrogram of the sound, select regions for further analysis and insertion in a database, filtering, and many other manipulations.
    Downloads: 6 This Week
  • 18

    CDSbank

    multi-sequence extraction, filtering & formatting

    CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in GenBank. CDSbank also stores GenBank feature annotation, a flag to indicate incomplete 5’ and 3’ ends, full taxonomic data, and a heuristic to rank the scientific interest of a species. This rich information allows fully automated data set preparation with a level of sophistication that meets or exceeds manual processing. Defaults ensure ease of use for typical...
    Downloads: 0 This Week
  • 19
    A text mining system for extraction of protein-protein interactions from biomedical text.
    Downloads: 0 This Week
  • 20
    Bit operations on integers for Python - fast C implementation of bit extraction, counting, reversal etc.
    Downloads: 0 This Week
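For readers unfamiliar with the operations named above, here is a pure-Python sketch of bit extraction, counting, and reversal on non-negative integers. It does not reproduce the listed project's C implementation or its API; the function names and signatures are hypothetical.

```python
def extract_bits(value, start, length):
    """Return `length` bits of `value`, starting at bit `start` (bit 0 is least significant)."""
    if value < 0 or start < 0 or length < 0:
        raise ValueError("arguments must be non-negative")
    return (value >> start) & ((1 << length) - 1)


def count_bits(value):
    """Count the set bits (population count) of a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return bin(value).count("1")


def reverse_bits(value, width):
    """Reverse the lowest `width` bits of `value`."""
    if value < 0 or width < 0:
        raise ValueError("arguments must be non-negative")
    result = 0
    for _ in range(width):
        # Shift the result left and append the current lowest bit of value.
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


if __name__ == "__main__":
    print(extract_bits(0b110110, 1, 3))  # bits 1..3 of 0b110110
    print(count_bits(0b110110))
    print(reverse_bits(0b0011, 4))
```

A C extension would perform the same shifts and masks on machine words, which is where the speed advantage over a Python loop comes from.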
  • 21
    GWA Geographical Wifi Analyser
    Open Source Geographical Wifi Analyser Tool
    Downloads: 0 This Week
  • 22

    WebSynonymExtractor

    a synonym extractor based on web-corpora and a multilingual translator

    This project is an approach to synonym extraction that extends WordNet with the synonyms it finds. The Python application is implemented as a pipeline that starts with a web-corpus reader, followed by several workers (tokenizers, lemmatizers, ...), and ends with a result writer. In contrast to state-of-the-art approaches, this implementation is based on single words found on the web, which is used as a corpus, and translated into other languages.
    Downloads: 0 This Week
  • 23

    pyBioImage

    Biological Imaging Software suite

    pyBioImage is a Python-based biological imaging suite tailored to the problem of finding Germinal Center "spots" within multidimensional microscopy images, as described in the research paper "Software tool for 3D extraction of germinal centers" by David N. Olivieri, Merly Escalona and Jose Faro.
    Downloads: 0 This Week
  • 24
    evofetch

    Mining tool for software repositories

    EvoFETCH is a tool to extract information about software entities such as classes, methods, and attributes from software repositories. EvoFETCH is middleware and makes use of other freely available software to perform the extraction and provide query functionality.
    Downloads: 0 This Week
  • 25
    Graph-based Extraction and Summarization - a generic graph-based summarization framework. Basic functionality is provided - third-party modules can be plugged in.
    Downloads: 0 This Week