Showing 31 open source projects for "text processing"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 1
    html-loader

    html-loader

    HTML Loader

    ...Filter can also be used to extend the supported elements and attributes. By default, the parser in html-loader interprets content inside noscript tags as #text, so processing of content inside this tag will be ignored. A very common scenario is exporting the HTML into their own .html file, to serve them directly instead of injecting with javascript.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    mp-html

    mp-html

    Small program rich text component, supports rendering and editing html

    A powerful applet-rich text component. Small program rich text component supports rendering and editing HTML and supports use on WeChat, QQ, Baidu, Alipay, Toutiao, and uni-app platforms. Displaying dynamic HTML rich text is a necessary requirement for many applications. The applet platform does not support dom operations, making this a problem. The built-in rich-text component supports few tags and blocks all events, making it difficult for practical application. Therefore, there is such a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    SingleFile

    SingleFile

    Web Extension for saving a copy of complete web page in a single file

    Web Extension for Firefox/Chrome/MS Edge and CLI tool to save a faithful copy of an entire web page in a single HTML file. SingleFile is a Web Extension (and a CLI tool) compatible with Chrome, Firefox (Desktop and Mobile), Microsoft Edge, Vivaldi, Brave, Waterfox, Yandex Browser, and Opera. It helps you to save a complete web page into a single HTML file. Wait until the page is fully loaded. Click on the SingleFile button in the extension toolbar to save the page. You can click again on the...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 4
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    More flexibility. More control.

    Generate interest, access liquidity without selling, and execute trades seamlessly. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 5
    unfluff

    unfluff

    Automatically extract body content (and other cool stuff) from HTML

    unfluff is a Node.js library designed to automatically extract the main content from an HTML document — stripping away navigation bars, ads, footers and other boilerplate to leave you with the “body content”, metadata (title, author, date) and other useful fields. It’s a tool very much aimed at content-analysis, web scraping, building datasets, or repurposing article text for downstream processing (like machine-learning or summarization). The API is simple: you feed in raw HTML and it returns a structured object with the extracted text and other fields. It supports caching internal representations to speed up repeated extractions. While its language support is best for English, it is still widely used in web-content-processing pipelines. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Gallop

    Gallop

    A framework for build smooth asynchronous iOS APP

    Gallop is a powerful rich text framework that supports Asynchronous display. It encapsulates CoreText's rich text functions and commonly used image processing capabilities. just need use LWTextStorage object instead of UILabel object and use LWImageStorage object instead of UIImageView object,Gallop will make sure your app scroll smoothly. You can also use Gallop to parse HTML pages and customize machining to parse HTML pages into iOS native pages.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7

    dpanalyzer

    postprocessing tool for Project Gutenberg Distributed Proofreaders

    Specialized tool for PostProcessors of books produced by Project Gutenberg Distributed Proofreaders. Parses the markup structure of a project file out of the formatting rounds; reports about the text structure found, and identifies markup errors. Planned future features: generation of normalized dp output by rejoining split paragraphs and moving around footnotes, renumbering of pages; conversion to basic LaTeX and basic HTML markup for further processing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Jericho HTML Parser is a java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    A java-based parser for parsing/grabbing web sites and other text or XML documents, based on a nondeterministic parser language, creating XML output. Also contains a few utility classes for HTML, CSV and text parsing, and additional character sets.
    Downloads: 0 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 10
    A stand-alone editor using Mediawiki markup language to generate HTML code. You can create and preview pages written using Mediawiki markup (i.e. Wikipedia pages) while off-line.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    The xslt2 script semAuth (semantic authoring) translates a freemind mindmap into an xhtml website and an RDF ontology
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    nanoWIME is a simple, flexible, easy-to-use javascript based WikiMarkup editor.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    HTMLtools includes several Java HTML tools for preparing Web pages. The HTMLtools program automates batch conversion of tab-delimited spreadsheet text files to HTML Web-page files, file & table editing, keyword mapping, templates, and more.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    RTF2HTML is a name for a cross-platform C++ library (DLL, OCX) and command-line utility, which is intended to convert documents from Rich Text Format (e.g. Word, OO Writer) to HTML. Its features are tiny size, speed, low mem usage and compact output.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    A JavaScript library for parsing Creole 1.0 wiki markup.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    A freely-available Markdown text-to-HTML translator, written in C++, intended for integration into C++ programs rather than for use in web applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    HTML Header Tree is a Mediawiki extension which organizes a page with a container tree according to its headers. Containers with specific headers are given specific styles.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    NOTE: unsupported - do you want to maintain this project? contact me! Markdownify is a HTML to Markdown converter written in PHP. See it as the successor to `html2text.php` since it has better design, better performance and less corner cases.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    xBB-code is the PHP library to parse and edit text formatted with BBCode.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    ZML, the Zeitung Markup Language, is a simple CMS for small newspapers. It was specifically designed to publish a student newspaper in print and on the Web. It uses LaTeX and XHTML. So far, it is documented in German only.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Web Content Management Element (WCME) is an editable area on browser page. User can change text and text styles (CSS) then persistently save changed content into web application resources. WCME is written on Javascript with Prototype.js library.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    JLoom is a JSP like template language for text generation - e.g. source code, HTML, XML. JLoom templates are modular encapsulated. Parameters can be any Java type, even Generics or Varargs. There is a plugin for Eclipse and a command line tool.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    RTF to HTML converter for use both with your applications and as a standalone tool. Small and fast. Processes tables better than any other tool I've seen.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24
    Html Optimizer is an optimizer for optimize html files by shrink them.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Use Xilize to create XHTML pages or entire websites with just a plain-text editor. The markup is similar to Textile and extensible via BeanShell. Run as a jEdit plugin, from the command line, or embed in a Java program. Small, fast, easy-to-use.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB