Showing 48 open source projects for "text extractor"

  • 1
    Video-subtitle-extractor

    A GUI tool for extracting hard-coded subtitles (hardsubs) from videos

    A deep learning-based video subtitle extraction framework, covering subtitle region detection and subtitle content extraction. It extracts hard-coded subtitles (hardsubs) from videos and generates srt files. Text recognition runs locally through OCR, so there is no need to apply for a third-party API or to access online OCR services such as Baidu...
    Downloads: 70 This Week
  • 2
    Article Extractor

    Extract the main article from a given URL with Node.js

    A Node.js library for extracting the main content from web articles, removing unnecessary clutter such as ads and navigation elements.
    Downloads: 0 This Week
  • 3
    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first by avoiding the noise caused by recurring elements (headers, footers, links/blogroll etc.) and second by including information such as author and date in order to make sense of the data. ...
    Downloads: 1 This Week
  • 4
    Scanopy

    Clean network diagrams, One-time setup, zero upkeep

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...
    Downloads: 5 This Week
  • 5
    StyleTTS 2

    Towards Human-Level Text-to-Speech through Style Diffusion

    ...StyleTTS2 supports both single-speaker and multi-speaker configurations, with the ability to sample or transfer styles from reference audio, making it powerful for expressive TTS and character voices. The repository includes training scripts, configuration files, and pre-trained auxiliary modules such as a text aligner, pitch extractor, and PL-BERT-based linguistic encoder.
    Downloads: 1 This Week
  • 6
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the Web using Java regex

    ...Extracts information from the Web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed.
      - Free web spider, parser, extractor, and crawler
      - Extraction of emails, phone numbers, and custom text from the Web
      - Export to Excel file
      - Data saved into Derby and MySQL databases
      - Written in Java, cross-platform
    See also the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/
    Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
    Downloads: 13 This Week
  • 7
    java-pdf-table-extractor-lib

    Java PDF table extraction library

    The command-line application is an example of how to use the Java library. The library is based on the pdfbox library and works by analyzing the layout of each selected PDF page and looking for table structure patterns. Calling the library with the PDF filename and the page range returns a List<PdfTextElement>. PdfTextElement is an interface with two implementations: a basic text element (for text outside the tables) and PdfTextTabulaElement (for table structures). That...
    Downloads: 0 This Week
  • 8
    "A free, open-source PDF editor for basic editing tasks"
    Downloads: 0 This Week
  • 9
    Savvy DOCX Recovery

    Open corrupt Word DOCX files and possibly recover formatting too.

    ...If all else fails, SilverCoder's DocToText is used to extract text. Try also http://wordcorruptdocchecker.codeplex.com/ and https://support.microsoft.com/en-us/kb/2528942 and my other SF projects: Corrupt Extractor for Microsoft Office, Corrupt DOCX Salvager, S2 Recovery Tools for Microsoft Word.
    Downloads: 175 This Week
  • 10
    xfe

    A lightweight file manager for X

    X File Explorer (Xfe) is an MS-Explorer-like file manager for X. It is based on X Win Commander, which was developed by Maxim Baranov. Xfe aims to be the file manager of choice for people who are looking for a fast and light graphical file manager on Unix systems.
    Downloads: 233 This Week
  • 11
    MahaKurawa.My.ID URL Extractor

    A simple tool to extract unique URLs

    MahaKurawa.My.ID URL Extractor is a simple tool that instantly extracts unique URLs from any text content. It is useful when you would rather not identify and copy-paste the URLs from your content one by one.
    Downloads: 0 This Week
  • 12

    Jamilsoft Blade Email Extractor

    A powerful and easy-to-use email extracting software

    Jamilsoft Blade is a powerful and easy-to-use email extracting software that can help you extract email addresses from a variety of sources, including websites, documents, and social media. With Jamilsoft Blade, you can quickly and easily find the email addresses you need, even if they are hidden or obscured.
    Downloads: 0 This Week
  • 13

    Language-Aware String Extractor

    multi-encoding strings(1) replacement with language identification

    Enhanced version of the standard Unix strings(1) program which uses language models for automatic language identification and character-set identification, supporting over 1400 languages, dozens of character encodings, and 4800+ language/encoding pairs.
    Downloads: 1 This Week
  • 14
    pdf-extractor

    Node.js module for rendering PDF pages to images, SVGs, and HTML files

    Pdf-extractor is a wrapper around pdf.js that generates images, SVGs, HTML files, text files, and JSON files from a PDF on Node.js. A DOM Canvas is used to render and export the graphical layer of the PDF. Canvas exports *.png by default but can be extended to export other file types such as .jpg. PDF objects are converted to SVG using the SVGGraphics parser of pdf.js.
    Downloads: 0 This Week
  • 15
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the Web using Java regex

    ...Extracts information from the Web by parsing millions of pages. Data is stored in a Derby or MySQL database and is not lost if the spider is force-closed.
      - Free web spider, parser, extractor, and crawler
      - Extraction of emails, phone numbers, and custom text from the Web
      - Export to Excel file
      - Data saved into Derby database
      - Written in Java, cross-platform
    See also the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/
    Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
    Downloads: 0 This Week
  • 16

    pdf-to-text-fragments

    PDF text extractor for Firefox extensions

    Extract all possible textual information from a PDF file. This is intended mainly for tabular data where positional as well as textual information is required. PDF uses two text string placement operators, Tj and TJ. Tj places equally spaced characters while TJ places variably spaced characters starting from an X, Y coordinate in arbitrary units. A text fragment consists of the X and Y coordinates of the text string along with the text string. A list of text fragments containing all the text...
    Downloads: 0 This Week
  • 17
    DrQA

    Reading Wikipedia to Answer Open-Domain Questions

    DrQA is an open-domain question answering system that reads large text corpora—famously Wikipedia—to answer natural language questions with extractive spans. It follows a two-stage pipeline: a fast document retriever first narrows down candidate articles, and a neural machine reader then predicts the exact answer span from those passages. The retriever relies on classic IR features (like TF-IDF and n-gram statistics) to remain lightweight and scalable to millions of documents. The reader is...
    Downloads: 0 This Week
  • 18
    Flat file extractor (ffe) can be used for reading and parsing different flat file structures and printing them in different formats. ffe is a command-line tool developed in a GNU/Linux environment and distributed under the GPL. The project has moved to https://github.com/igitur/ffe
    Downloads: 1 This Week
  • 19

    Blackberry Backup Contacts Extractor

    Extracts contact information from Blackberry Backup Files (IPD, BBB)

    This program helps you extract your contacts from an existing Blackberry Backup File (with extension .IPD or .BBB), into a text file (with extension .CSV) that you can then open in Microsoft Excel. Please note that this does not work with the latest version of the .BBB files, as Blackberry introduced encryption in those files, making them unreadable. This trial version lets you extract a few contacts from your backup file. If you like the result, you can upgrade to be able to extract...
    Downloads: 0 This Week
  • 20
    Downloads: 0 This Week
  • 21
    Jihosoft iOS 10 Backup Extractor

    Professional software that helps extract data from iTunes backups.

    Jihosoft iOS 10 Backup Extractor is professional software that helps extract data, including lost text messages, contacts, photos, videos, WhatsApp messages, and Viber messages, from iTunes backups. It is fully compatible with iOS 10.2. Users can extract data from an iOS 10.2 backup efficiently and safely without restoring the whole backup to an iPhone.
    Downloads: 0 This Week
  • 22
    Php Email Extractor
    Extracts emails from text sources, removes duplicate emails, and removes unwanted words from emails
    Downloads: 0 This Week
  • 23
    We have implemented a core summarizer of scientific articles written in Spanish, with the following components: a tokenizer, a grammar checker, a clarity checker, a cohesion-coherence checker, a common-topic extractor and an output formatter.
    Downloads: 0 This Week
  • 24
    Extractor Files

    Extractor Files is an index

    Extractor Files is an indexer for searching content in files. It includes a simple text editor and can display a file's contents as well as its metadata.
    Downloads: 1 This Week
  • 25

    Pdf Text Extractor

    A Java Application that extracts text from pdf files.

    A Java application that extracts text from PDF files. The user can select different areas of a PDF file and extract text from those areas. Text can be extracted from a single page or from multiple pages. It can also generate bookmarks based on font heights entered by the user.
    Downloads: 0 This Week