Showing 19 open source projects for "python text parser"

  • 1
    html-loader

    HTML Loader

    ... and attributes. By default, the parser in html-loader interprets content inside noscript tags as #text, so content inside this tag is not processed. A very common scenario is exporting the HTML into its own .html file, to serve it directly instead of injecting it with JavaScript.
    Downloads: 4 This Week
  • 2
    CssSelector Component

    Converts CSS selectors to XPath expressions

    ... to an XPath equivalent. This XPath expression can then be used with other functions and classes that use XPath to find elements in a document. Not all CSS selectors can be converted to XPath equivalents. Several CSS selectors only make sense in the context of a web browser. Pseudo-elements (:before, :after, :first-line, :first-letter) are not supported because they select portions of text rather than elements.
    Downloads: 4 This Week
  • 3
    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make every...
    Downloads: 0 This Week
  • 4

    HTML parser in Delphi

    A Delphi class with functions to read and dissect an HTML file

    THTMLdom is a (Delphi) class with functions to read an HTML source file and dissect it into a tree of THTMLelement. The attributes of the HTML tags are stored in the elements. Functions are provided to select elements on the basis of attribute values or tag names. The structure of the tree can be shown and it can be rendered as plain text. The source is plain Delphi Pascal, requiring a version that supports TDictionary. There is no dependency on third-party units. The file to be parsed must...
    Downloads: 17 This Week
  • 5
    HTMLMinifier

    JavaScript-based HTML compressor/minifier (with Node.js support)

    HTMLMinifier is a highly configurable, well-tested, JavaScript-based HTML minifier. Minifier options like sortAttributes and sortClassName won't impact the plain-text size of the output. However, they form long repetitive chains of characters that should improve the compression ratio of the gzip used in HTTP compression. SVG tags are automatically recognized, and when they are minified, both case-sensitivity and closing slashes are preserved, regardless of the minification settings used for the rest...
    Downloads: 0 This Week
  • 6

    dpanalyzer

    postprocessing tool for Project Gutenberg Distributed Proofreaders

    Specialized tool for PostProcessors of books produced by Project Gutenberg Distributed Proofreaders. Parses the markup structure of a project file out of the formatting rounds; reports about the text structure found, and identifies markup errors. Planned future features: generation of normalized dp output by rejoining split paragraphs and moving around footnotes, renumbering of pages; conversion to basic LaTeX and basic HTML markup for further processing.
    Downloads: 0 This Week
  • 7
    htmlarea

    Small, powerful, full-featured WYSIWYG editor

    HTMLArea 4 is a browser-based WYSIWYG editor that easily replaces the TEXTAREA in your web pages. It is written in JavaScript and is suitable for use in any modern web browser and on any page of your web site. The current version is 4.0-2016-08-29.
    Downloads: 2 This Week
  • 8
    Jericho HTML Parser is a Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
    Downloads: 4 This Week
  • 9

    HTML XHTML Parser + XPath

    Delphi HTML XHTML Parser + XPath

    Delphi HTML Parser. This module lets you work with HTML documents as a DOM tree and use XPath to search for tags. It is a very simple way to parse HTML. Tested with Delphi XE5 and XE6. Usage: add parser.pas to Uses, then:
      begin
        HtmlTxt:= ''; // here your html
        NodeList:= TNodeList.Create;
        ValueList:= TStringList.Create;
        DomTree:= TDomTree.Create;
        DomTreeNode:= DomTree.RootNode;
        If DomTreeNode.RunParse(HtmlTxt) then
        begin
          {your code example
    Downloads: 0 This Week
  • 10
    A Java-based parser for parsing/grabbing web sites and other text or XML documents, based on a nondeterministic parser language and producing XML output. Also contains a few utility classes for HTML, CSV, and text parsing, and additional character sets.
    Downloads: 1 This Week
  • 11
    Wiko, the wiki compiler, compiles wiki-like files into HTML and LaTeX, combining easy wiki syntax, your preferred non-web text editor, and SVN/CVS version control to write static websites, scientific articles, or even blogs.
    Downloads: 0 This Week
  • 12
    Html Assembler
    Html Assembler is a static site generator. It automatically integrates page content such as text and photos into a modifiable page template, creating a complete set of HTML files ready for upload to your site.
    Downloads: 0 This Week
  • 13
    Create or parse ANY Mark-up Language (HTML XML X3D VRML MathML XAML XDP CDA SCORM COLLADA XBRL) file or string into a simple and versatile MLDocument, MLElement, MLParameter hierarchical object model, written in VB 6 (Win32). Alternative to using DOM.
    Downloads: 0 This Week
  • 14
    A JavaScript library for parsing Creole 1.0 wiki markup.
    Downloads: 0 This Week
  • 15
    ZML, the Zeitung Markup Language, is a simple CMS for small newspapers. It was specifically designed to publish a student newspaper in print and on the Web. It uses LaTeX and XHTML. So far, it is documented in German only.
    Downloads: 0 This Week
  • 16
    Markout is a pure-Java lightweight wiki markup parser based on John Gruber's Markdown.
    Downloads: 0 This Week
  • 17
    A Python tool for creating websites or project documentation. Pages can be stored as reST (text) or HTML. With a simple templating and macro system, it can autogenerate index pages and navigation links. It also provides facilities for multiple translations.
    Downloads: 1 This Week
  • 18
    PyBookmark manipulates bookmark files. It can sync files (no server required), merge, sort, remove duplicates, and check links. Its library, pybookmarklib, provides access to these operations, the data structures, and a parser for further extensibility.
    Downloads: 0 This Week
  • 19
    A .NET (C#) program to convert grammar-based text (like code) to colorful, CSS-based HTML. Based on the GOLD parser.
    Downloads: 0 This Week