Showing 35 open source projects for "information extraction"

  • 1
    react-docgen

    A CLI and toolbox to extract information from React component files

    react-docgen is a CLI and toolbox that helps extract information from React components and generate documentation from it. It uses @babel/parser to parse the source into an AST and provides methods to process this AST to extract the desired information. The output (or return value) is a JSON blob or JavaScript object. It provides a default implementation for React components defined via React.createClass, ES2015 class definitions or functions (stateless components). These component definitions...
    Downloads: 0 This Week
  • 2
    OSRFramework

    OSRFramework, the Open Sources Research Framework, is an AGPLv3+ project

    OSRFramework is a GNU AGPLv3+ set of libraries developed by i3visio to perform Open Source Intelligence collection tasks. They include references to a bunch of different applications related to username checking, DNS lookups, information leaks research, deep web search, regular expressions extraction and many others. At the same time, by means of ad-hoc Maltego transforms, OSRFramework provides a way of making these queries graphically as well as several interfaces to interact with like OSRFConsole or a Web interface. If everything went correctly (we hope so!)...
    Downloads: 4 This Week
  • 3
    LangExtract

    A Python library for extracting structured information

    ...LangExtract supports a wide range of models, including Google Gemini, OpenAI GPT, and local LLMs via Ollama, making it adaptable to different deployment environments and compliance needs. The system excels at handling long documents using optimized chunking, multi-pass extraction, and parallel processing to ensure both high recall and structured consistency.
    Downloads: 2 This Week
  • 4
    LibPDF

    A modern PDF library for TypeScript

    ...The library offers full read and write manipulation, including support for encryption with RC4 and modern AES cipher suites, form filling and flattening, digital signature creation and verification, page merging/splitting, rich text extraction with layout information, and font embedding with subsetting.
    Downloads: 3 This Week
  • 5
    ANTLR

    Parser generator to read, process, or translate structured text

    ...Twitter search uses ANTLR for query parsing, with over 2 billion queries a day. The languages for Hive and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.
    Downloads: 2 This Week
  • 6
    java-pdf-table-extractor-lib

    Java Pdf Table extraction library

    The command line application is an example of how to use the Java library. The library is based on the pdfbox library and works by examining the layout of each selected PDF page and looking for table structure patterns. After calling the library (passing the PDF filename and the page range), the result is a List<PdfTextElement>. PdfTextElement is an interface that has two implementations: a basic text element (for text outside the tables) and PdfTextTabulaElement, for table structures. That...
    Downloads: 1 This Week
  • 7
    [ARCHIVAL] The central forum for the MWE community. Share your open-source data sets and MWE extraction tools, exchange ideas on evaluation strategies and further development of the tools, and discuss theoretical definitions and linguistic properties of MWEs.
    Downloads: 1 This Week
  • 8
    OmniPull

    Just pull anything

    OmniPull is a powerful, cross-platform download manager built with Python and PySide6. It provides a modern, intuitive interface for managing downloads with advanced features like multi-threading, queue management, and media extraction.
    Downloads: 26 This Week
  • 9
    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20 AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, enabling efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 4 This Week
  • 10
    RetDec

    RetDec is a retargetable machine-code decompiler based on LLVM

    The decompiler is not limited to any particular target architecture, operating system, or executable file format. Supported file formats: ELF, PE, Mach-O, COFF, AR (archive), Intel HEX, and raw machine code. Supported architectures: 32-bit Intel x86, ARM, MIPS, PIC32, and PowerPC; 64-bit x86-64. Demangling of symbols from C++ binaries (GCC, MSVC, Borland). Reconstruction of functions, types, and high-level constructs. Output in two high-level languages: C and a Python-like language. Generation of call graphs,...
    Downloads: 37 This Week
  • 11
    7-Zip-JBinding

    Java wrapper for 7z archiver engine

    Native (JNI) cross-platform library to extract (password protected, multi-part) 7z Zip Rar Tar Split Lzma Iso HFS GZip Cpio BZip2 Z Arj Chm Lhz Cab Nsis Deb Rpm Wim Udf archives and create 7z, Zip, Tar, GZip & BZip2 from Java.
    Downloads: 30 This Week
  • 12
    Marathon -GUI Test Runner Web, Swing, FX

    Marathon supports testing of Java/Swing and Java/Fx applications.

    Marathon provides an integrated environment for test script creation and execution. Supported FW: Web, Java Swing/Java FX. Currently, Marathon supports JRuby script models for recording the test scripts. Marathon test runner generates Allure test reports. Marathon allows for grouping of test cases. It also has an option of inserting modules while recording (in MarathonITE). It allows the tester to insert a checklist while recording, and also to take a screen capture and annotate it in...
    Downloads: 7 This Week
  • 13
    Laravel Stats Tracker

    Tracker gathers a lot of information from your requests to identify and store sessions and page views. Storing user tracking information in indexed and normalized database tables wastes less disk space and eases the extraction of valuable information about your application and business. As soon as you install and enable it, Tracker will start storing all the information you tell it to; you can then use the Tracker Facade in your application to access everything.
    Downloads: 0 This Week
  • 14
    MITIE

    MITIE: library and tools for information extraction

    This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors. MITIE is built on top of dlib, a high-performance machine-learning library[1], and makes use of several state-of-the-art techniques including the use of distributional word embeddings[2] and Structural Support Vector Machines[3]. ...
    Downloads: 0 This Week
  • 15
    pyhanlp

    Chinese word segmentation

    ...The project focuses on making HanLP’s capabilities accessible through a Python-friendly API surface, so you can integrate NLP steps into data pipelines, notebooks, and downstream ML or information-extraction code. In practice, it serves as a bridge layer: Python calls are translated into the corresponding HanLP operations, so you can keep your application logic in Python while relying on HanLP’s implementations. It is especially useful when you need a pragmatic “get results quickly” NLP layer for segmentation, tagging, entity extraction, parsing, or keyword-style tasks rather than experimenting with model training from scratch.
    Downloads: 0 This Week
  • 16
    TextTeaser

    TextTeaser is an automatic summarization algorithm

    TextTeaser is an automatic text summarization algorithm implemented in Python. It extracts the most important sentences from an article to generate concise summaries that retain the core meaning of the original text. The algorithm uses features such as sentence length, keyword frequency, and position within the document to determine which sentences are most relevant. By combining these features with a simple scoring mechanism, it produces summaries that are both readable and informative....
    Downloads: 2 This Week
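    The feature-based scoring described for TextTeaser can be illustrated with a minimal sketch. This is not TextTeaser's actual code or API: the function names, the stopword list, the ideal sentence length of 20 words, and the equal weighting of the three features are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Assumed, illustrative values; not taken from TextTeaser itself.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
             "that", "for", "on", "with", "as", "are", "was", "at"}
IDEAL_LENGTH = 20  # hypothetical "ideal" sentence length in words


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def score_sentences(text):
    """Return (score, index, sentence) tuples; each score lies in [0, 1]."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s)
                   if w not in STOPWORDS)
    top = max(freq.values()) if freq else 1
    total = len(sentences)
    scored = []
    for index, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        content = [w for w in tokens if w not in STOPWORDS]
        # Keyword frequency: mean relative frequency of content words.
        keyword = (sum(freq[w] for w in content) / (top * len(content))
                   if content else 0.0)
        # Sentence length: closeness to the assumed ideal length.
        length = max(0.0, 1 - abs(IDEAL_LENGTH - len(tokens)) / IDEAL_LENGTH)
        # Position: earlier sentences score higher.
        position = 1 - index / total
        scored.append(((keyword + length + position) / 3, index, sentence))
    return scored


def summarize(text, k=2):
    """Pick the k best-scoring sentences and return them in document order."""
    best = sorted(score_sentences(text), key=lambda t: t[0], reverse=True)[:k]
    return [sentence for _, _, sentence in sorted(best, key=lambda t: t[1])]
```

    Calling summarize(article_text, k=3) would return the three highest-scoring sentences in their original order; a real summarizer would likely tune the feature weights rather than average them equally.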
  • 17
    Matlab/Octave Rotations Library

    Library for working with 3D rotations in Matlab/Octave

    The Matlab/Octave rotations library is a collection of functions, bundled as m-scripts, that address computations and numerical handling of rotations in 3D Euclidean space. The rotation representations that are supported are rotation matrices (Rotmat), Quaternions (Quat), intrinsic ZYX Euler angles (Euler), fused angles (Fused) and tilt angles (Tilt). Operations such as composition, inversion, ZYX yaw extraction, fused yaw extraction, random generation, equality detection, vector rotation...
    Downloads: 0 This Week
  • 18
    Rotations Conversion Library

    Library for working with 3D rotations in C++

    The Rotations Conversion Library (RCL) is a collection of C++ functions that address common computations and numerical handling of rotations in 3D Euclidean space, including support for rotation matrices (`Rotmat`), Quaternions (`Quat`), intrinsic ZYX Euler angles (`Euler`), fused angles (`Fused`) and tilt angles (`Tilt`). In addition to the core competency of being able to convert between each of the representations, operations such as inversion, ZYX yaw extraction, fused yaw extraction,...
    Downloads: 0 This Week
  • 19
    nexmon

    The C-based Firmware Patching Framework for Broadcom/Cypress WiFi Chip

    The C-based Firmware Patching Framework for Broadcom/Cypress WiFi Chips enables Monitor Mode, Frame Injection, and much more. Nexmon is our C-based firmware patching framework for Broadcom/Cypress WiFi chips that enables you to write your own firmware patches, for example, to enable monitor mode with radiotap headers and frame injection. This repository mainly focuses on enabling monitor mode and frame injection on many chips. A real Wi-Fi jammer that allows to overlay ongoing frame...
    Downloads: 6 This Week
  • 20
    GUAJE FUZZY

    Free software for generating understandable and accurate fuzzy systems

    GUAJE stands for Generating Understandable and Accurate fuzzy models in a Java Environment. It is a free software tool (licensed under GPL-v3) that supports the design of interpretable and accurate fuzzy systems by combining several preexisting open source tools and drawing on the main advantages of each. It is a user-friendly portable tool designed and developed to make knowledge extraction and representation for fuzzy systems easier, paying...
    Downloads: 0 This Week
  • 21
    CMIS Input plugin for Pentaho

    Allows querying Content Management Systems that use the CMIS.

    Imagine being able to extract all the metadata of your documents from your Enterprise Content Management System using simple queries in a query language very close to traditional SQL. Imagine using the extracted information for statistical purposes, for creating reports and, more generally, for analysing your document archives in a way that was unthinkable with the tools available until now. All this is possible within the Pentaho Suite, the Open Source Business Intelligence platform, which is useful for the extraction and analysis of structured and semi-structured data. ...
    Downloads: 0 This Week
  • 22
    musicinformationretrieval.com

    Instructional notebooks on music information retrieval

    musicinformationretrieval.com is a collection of instructional materials for music information retrieval (MIR). These materials contain a mix of casual conversation, technical discussion, and Python code. These pages, including the one you're reading, are authored using Colab notebooks.
    Downloads: 0 This Week
  • 23
    cde4php - Cross Database Engine for PHP

    Uniform Database Abstraction for PHP Development

    Debby has replaced CDE in the Tina4Stack; you may want to check it out at http://tina4.com. CDE is a PHP class which implements the general database functions in PHP and provides a common SQL platform for PHP development where developers change their databases but not their code. Supports Firebird, MySQL, Oracle, SQLite, MSSQL (both drivers), CUBRID, ODBC. CDE now supports date uniformity, param passing & BLOB handling across all the databases supported. CDE is not a replacement for PDO,...
    Downloads: 0 This Week
  • 24
    ivantk

    Image-based Vascular Analysis Toolkit

    The Image-based Vascular Analysis Toolkit is a set of multiplatform C++ libraries for vascular analysis of (3D) medical images, typically CT or MRI. It can be considered as an extension of the Insight Toolkit (ITK) for vascular image analysis, with methods for detection, extraction and modeling of vascular structures.
    Downloads: 0 This Week
  • 25
    Customizable browser-based file editor environment (text and web WYSIWYG) in PHP (GPL licensed) with loads of features. (Tested only in Firefox.)
    Downloads: 0 This Week