Showing 32 open source projects for "text extractor"

  • 1
    Article Extractor

    Extracts the main article from a given URL with Node.js

    A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.
    Downloads: 2 This Week
  • 2

    text-extractor-extension

    Browser utility for extracting and organizing webpage text quickly.

    ...Additional setup resources and extension workflow examples: https://sites.google.com/view/text-extractor-extension-tool/ https://downloadjennymod.org/
    Downloads: 1 This Week
  • 3
    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first by avoiding the noise caused by recurring elements (headers, footers, links/blogroll etc.) and second by including information such as author and date in order to make sense of the data. ...
    Downloads: 0 This Week
  • 4
    Scanopy

    Clean network diagrams, one-time setup, zero upkeep

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...
    Downloads: 1 This Week
  • 5
    java-pdf-table-extractor-lib

    Java Pdf Table extraction library

    The command-line application is an example of how to use the Java library. The library is based on the pdfbox library and works by analyzing the layout of each selected PDF page and looking for table structure patterns. After calling the library (passing the PDF filename and the page range), the result is a List<PdfTextElement>. PdfTextElement is an interface that has two implementations: a basic text element (for text outside the tables) and PdfTextTabulaElement, for table structures. That...
    Downloads: 0 This Week
  • 6
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the Web using Java regex

    ...Extracts information from the Web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed. Features: free web spider, parser, extractor, and crawler; extraction of emails, phone numbers, and custom text from the Web; export to Excel files; data saved to Derby and MySQL databases; written in Java, cross-platform. See also the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
    Downloads: 6 This Week
  • 7
    StyleTTS 2

    Towards Human-Level Text-to-Speech through Style Diffusion

    ...StyleTTS2 supports both single-speaker and multi-speaker configurations, with the ability to sample or transfer styles from reference audio, making it powerful for expressive TTS and character voices. The repository includes training scripts, configuration files, and pre-trained auxiliary modules such as a text aligner, pitch extractor, and PL-BERT-based linguistic encoder.
    Downloads: 1 This Week
  • 8
    xfe

    A lightweight file manager for X

    X File Explorer (Xfe) is an MS-Explorer-like file manager for X. It is based on X Win Commander, which was developed by Maxim Baranov. Xfe aims to be the file manager of choice for people who are looking for a fast and light graphical file manager on Unix systems.
    Downloads: 124 This Week
  • 9

    Language-Aware String Extractor

    multi-encoding strings(1) replacement with language identification

    Enhanced version of the standard Unix strings(1) program which uses language models for automatic language identification and character-set identification, supporting over 1400 languages, dozens of character encodings, and 4800+ language/encoding pairs.
    Downloads: 0 This Week
  • 10
    pdf-extractor

    Node.js module for rendering pdf pages to images, svgs and HTML files

    Pdf-extractor is a wrapper around pdf.js to generate images, svgs, html files, text files and json files from a pdf on node.js. A DOM Canvas is used to render and export the graphical layer of the pdf. Canvas exports *.png as a default but can be extended to export to other file types like .jpg. Pdf objects are converted to svg using the SVGGraphics parser of pdf.js.
    Downloads: 0 This Week
  • 11
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the Web using Java regex

    ...Extracts information from the Web by parsing millions of pages. Data is stored in a Derby or MySQL database and is not lost if the spider is force-closed. Features: free web spider, parser, extractor, and crawler; extraction of emails, phone numbers, and custom text from the Web; export to Excel files; data saved to a Derby database; written in Java, cross-platform. See also the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
    Downloads: 0 This Week
  • 12

    pdf-to-text-fragments

    PDF text extractor for Firefox extensions

    Extract all possible textual information from a PDF file. This is intended mainly for tabular data where positional as well as textual information is required. PDF uses two text string placement operators, Tj and TJ. Tj places equally spaced characters while TJ places variably spaced characters starting from an X, Y coordinate in arbitrary units. A text fragment consists of the X and Y coordinates of the text string along with the text string. A list of text fragments containing all the text...
    Downloads: 0 This Week
  • 13
    DrQA

    Reading Wikipedia to Answer Open-Domain Questions

    DrQA is an open-domain question answering system that reads large text corpora—famously Wikipedia—to answer natural language questions with extractive spans. It follows a two-stage pipeline: a fast document retriever first narrows down candidate articles, and a neural machine reader then predicts the exact answer span from those passages. The retriever relies on classic IR features (like TF-IDF and n-gram statistics) to remain lightweight and scalable to millions of documents. The reader is...
    Downloads: 0 This Week
  • 14
    Flat file extractor (ffe) can be used for reading and parsing different flat file structures and printing them in different formats. ffe is a command-line tool developed in a GNU/Linux environment and distributed under the GPL. The project has moved to https://github.com/igitur/ffe
    Downloads: 8 This Week
  • 15
    Php Email Extractor
    Extracts emails from text sources, removes duplicate emails, and removes unwanted words from emails.
    Downloads: 0 This Week
  • 16
    We have implemented a core summarizer of scientific articles written in Spanish, with the following components: a tokenizer, a grammar checker, a clarity checker, a cohesion-coherence checker, a common-topic extractor and an output formatter.
    Downloads: 0 This Week
  • 17

    Pdf Text Extractor

    A Java Application that extracts text from pdf files.

    A Java application that extracts text from PDF files. The user can select different areas of a PDF file and extract text from those areas. Text can be extracted from a single page or from multiple pages. The application can also generate bookmarks based on font heights entered by the user.
    Downloads: 0 This Week
  • 18
    Jar Ajar is a JAR-based self-extractor for zip files. Zip up files and package them with descriptive images and text using Jar Ajar's graphical interface. When recipients launch the resulting JAR, Jar Ajar guides users through the unzip process.
    Downloads: 0 This Week
  • 19
    OpenSearchServer Extractor

    A RESTful/JSON web service for text and metadata extraction

    An open source RESTful web service for text and metadata extraction and analysis. oss-text-extractor supports various binary formats: word processor (doc, docx, odt, rtf), spreadsheet (xls, xlsx, ods), presentation (ppt, pptx, odp), publishing (pdf, pub), web (rss, html/xhtml), media (audio, images), and others (vsd, text).
    Downloads: 0 This Week
  • 20
    Products of the project: Java HTMLParser - VietSpider Web Data Extractor - Extractor VietSpider News. Click on "Show project details" to see more features of each product.
    Downloads: 0 This Week
  • 21
    Contacts Email Extractor
    This email harvester and bulk mailer is written in Java and is totally free. Java must be installed to click and run the .jar file. Get email addresses in batches. On Mac OS, use Contacts Extractor.jar with the JavaMail API in a \lib folder next to it.
    Downloads: 0 This Week
  • 22

    htmlpicker

    Picks up text from a web page using an HTML template.

    A Java HTML picker and text extractor that picks up text from a web page using an HTML template. Useful if you regularly have data to extract from the same site. You may use the same URL each time, or you may build URLs that take parameters. These parameters are fetched from a text file.
    Downloads: 0 This Week
  • 23
    This is a library to extract raw Unicode text from any written document (office documents such as PDF, Word, OpenOffice, ...). It should be useful to developers of search engines, text processing, corpus analysis, ....
    Downloads: 0 This Week
  • 24
    Scriptilitious
    ScriptBox for several terminal (Bash/Konsole) utilities. Current utilities include: sitemap-ripper - merges sitemaps, sitmap-maker - makes sitemaps, url-extractor - extracts URLs, url2hyperlink - makes links of URLs, file splitter - splits files
    Downloads: 0 This Week
  • 25
    Java exception extractor. This utility parses all files (either plain text or bzipped) and searches them for various exceptions. It then tries to match the exceptions against grouping rules (regexps). It is also able to group unrecognised exceptions.
    Downloads: 0 This Week
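Several of the projects listed above (for example Php Email Extractor and the Java web spider) describe pulling email addresses out of text with regular expressions and dropping duplicates. The following is a minimal Python sketch of that general idea only; it is not code from any listed project. The pattern `EMAIL_RE` and the function name `extract_emails` are assumptions made for illustration, and the pattern is deliberately simple rather than a full address validator.

```python
import re

# Assumed, deliberately simple pattern for illustration; it does not
# cover every address form permitted by the email standards.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return unique addresses in order of first appearance.

    Duplicates are detected case-insensitively; the first spelling
    encountered is the one that is kept.
    """
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        key = match.lower()
        if key not in seen:
            seen.add(key)
            result.append(match)
    return result


sample = "Contact alice@example.com or Bob@Example.org; again: ALICE@example.com."
emails = extract_emails(sample)
print(emails)
```

Comparing lowercased keys is a design choice here: it treats `ALICE@example.com` and `alice@example.com` as the same address, which may or may not match how a given tool deduplicates.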