Showing 16 open source projects for "data formats html/xhtml tidy"

  • 1
    JuniCoder is a Java project that uses Unicode as the base for decoding and encoding formats that invented workarounds to express characters not covered by ASCII. Decoders translate those workarounds to Unicode; encoders translate Unicode back into them.
    Downloads: 0 This Week
  • 2
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 8 This Week
  • 3

    eXtensible Text Framework (XTF)

    Framework for search and display of heterogeneous document collections.

    NOTICE: This code repository is deprecated. Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
    Downloads: 1 This Week
  • 4
    NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
    Downloads: 8 This Week
  • 5
    Regain is a Java search engine based on Jakarta Lucene. It indexes and searches files in many formats (HTML, XML, doc(x), xls(x), ppt(x), oo, PDF, RTF, mp3, mp4, Java). A TagLibrary eases integrating search results into your JSP-based web page.
    Downloads: 8 This Week
  • 6

    HXPath

    XPath HTML parser

    HXPath is a command-line tool for extracting data from HTML documents. HXPath can select subtrees, like the standard xpath tool, but can also read contents and attributes and output them in a bash-friendly format. HTML Tidy and HTTP/HTTPS GET are built in too.
    Downloads: 0 This Week
  • 7
    The aw script lets you browse web sites from the command line by concisely specifying where to look. It can also be used to make excerpts of web sites.
    Downloads: 0 This Week
  • 8
    A tool for autonomous and virtual topical data integration using the focused web-harvesting method.
    Downloads: 0 This Week
  • 9
    TinyURL PHP script, which shortens long URLs into nice small ones.
    Downloads: 0 This Week
  • 10
    This project aims to provide an offline version of Wikipedia, available from the web browser.
    Downloads: 0 This Week
  • 11
    This project is designed to optimize search engine results by managing your web server sitemaps. The software combines both command line processes and a web user interface with a highly configurable architecture.
    Downloads: 0 This Week
  • 12
    Irudiko is a library written in C++ for generating Locality Sensitive Hashing sketches from any textual or web document. Designed mainly to work with HTML pages, it also has optimization support for English or Italian documents.
    Downloads: 0 This Week
  • 13
    bluebery is an easy-to-use SQL/PHP-based content manager that provides PHP libraries and methods to use in your site's pages, with which you can easily access and print desired items, or iterate over items stored through the bluebery web UI.
    Downloads: 0 This Week
  • 14
    The Navigator project aims to support automated gathering of dynamic information from third-party web sites, using their web interfaces to post queries and gather replies. Navigator is written in OS-independent Java.
    Downloads: 0 This Week
  • 15
    Our intent with OCDB is to create an online database of one's CD collection and, going a little further, to yield insightful information on the overlaps between creators, labels, genres, etc.
    Downloads: 0 This Week
  • 16
    ICECrawler is a WWW crawler and map generator intended to help understand and analyze links between websites and web documents.
    Downloads: 0 This Week