Showing 7 open source projects for "java html parser"

View related business solutions
  • Our Free Plans just got better! | Auth0 by Okta Icon
    Our Free Plans just got better! | Auth0 by Okta

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your secuirty. Auth0 now, thank yourself later.
    Try free now
  • Bright Data - All in One Platform for Proxies and Web Scraping Icon
    Bright Data - All in One Platform for Proxies and Web Scraping

    Say goodbye to blocks, restrictions, and CAPTCHAs

    Bright Data offers the highest quality proxies with automated session management, IP rotation, and advanced web unlocking technology. Enjoy reliable, fast performance with easy integration, a user-friendly dashboard, and enterprise-grade scaling. Powered by ethically-sourced residential IPs for seamless web scraping.
    Get Started
  • 1
    html-to-markdown

    html-to-markdown

    Convert HTML to Markdown. Even works with entire websites

    Convert HTML into Markdown with Go. It is using an HTML Parser to avoid the use of regexp as much as possible. That should prevent some weird cases and allows it to be used for cases where the input is totally unknown.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    goquery

    goquery

    A little like that j-thing, only in Go

    goquery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/HTML package and the CSS Selector library Cascadia. Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), and detach()) have been left off. Also, because the net/HTML parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    crawley

    crawley

    The unix-way web crawler

    Crawls web pages and prints any link it can find. Fast HTML SAX-parser (powered by golang.org/x/net/html) Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most of useful resources URLs (pics, videos, audios, forms, etc...) Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted) Scan depth (limited by starting host and path, by default - 0) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode - scan HTML comments for URLs...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    bluemonday

    bluemonday

    Fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer

    bluemonday is an HTML sanitizer implemented in Go. It is fast and highly configurable. bluemonday takes untrusted user-generated content as an input, and will return HTML that has been sanitized against an allowlist of approved HTML elements and attributes so that you can safely include the content in your web page. If you accept user-generated content, and your server uses Go, you need bluemonday. It protects sites from XSS attacks. There are many vectors for an XSS attack and the best way...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Deliver secure remote access with OpenVPN. Icon
    Deliver secure remote access with OpenVPN.

    Trusted by nearly 20,000 customers worldwide, and all major cloud providers.

    OpenVPN's products provide scalable, secure remote access — giving complete freedom to your employees to work outside the office while securely accessing SaaS, the internet, and company resources.
    Get started — no credit card required.
  • 5
    Horusec

    Horusec

    Open source tool that improves identification of vulnerabilities

    Horusec is an open source tool that performs a static code analysis to identify security flaws during the development process. Currently, the languages for analysis are C#, Java, Kotlin, Python, Ruby, Golang, Terraform, Javascript, Typescript, Kubernetes, PHP, C, HTML, JSON, Dart, Elixir, Shell, Nginx. The tool has options to search for key leaks and security flaws in all your project's files, as well as in Git history. Horusec can be used by the developer through the CLI and by the DevSecOps...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    protoc-gen-doc

    protoc-gen-doc

    Documentation generator plugin for Google Protocol Buffers

    This is a documentation generator plugin for the Google Protocol Buffers compiler (protoc). The plugin can generate HTML, JSON, DocBook, and Markdown documentation from comments in your .proto files. There is a Docker image available (docker pull pseudomuto/protoc-gen-doc) that has everything you need to generate documentation from your protos. The plugin is invoked by passing the --doc_out, and --doc_opt options to the protoc compiler. The docker image has two volumes: /out and /protos which...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    go_spider

    go_spider

    An awesome Go concurrent Crawler(spider) framework

    An awesome Go concurrent Crawler(spider) framework. The crawler is flexible and modular. It can be expanded to an Individualized crawler easily or you can use the default crawl components only. Spider gets a Request in Scheduler that has url to be crawled. Then Downloader downloads the result(html, json, jsonp, text) of the Request. The result is saved in Page for parsing in PageProcesser. Html parsing is based on goquery package. Json parsing is based on simple JSON package. Jsonp...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next