Showing 15 open source projects for "data formats html/xhtml tidy"

View related business solutions
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    The database for AI-powered applications.

    MongoDB Atlas is the developer-friendly database used to build, scale, and run gen AI and LLM-powered apps—without needing a separate vector database. Atlas offers built-in vector search, global availability across 115+ regions, and flexible document modeling. Start building AI apps faster, all in one place.
    Start Free
  • Keep company data safe with Chrome Enterprise Icon
    Keep company data safe with Chrome Enterprise

    Protect your business with AI policies and data loss prevention in the browser

    Make AI work your way with Chrome Enterprise. Block unapproved sites and set custom data controls that align with your company's policies.
    Download Chrome
  • 1
    tidytext

    tidytext

    Text mining using tidy tools

    tidytext brings tidy data principles to text mining by converting text into a tidy data frame format. It provides tools for tokenization, sentiment analysis, n‑gram creation, and term‑document matrices, enabling interoperability with dplyr, ggplot2, and other tidyverse workflows.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Pandas Profiling

    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    pandas-profiling generates profile reports from a pandas DataFrame. The pandas df.describe() function is handy yet a little basic for exploratory data analysis. pandas-profiling extends pandas DataFrame with df.profile_report(), which automatically generates a standardized univariate and multivariate report for data understanding. High correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik). Most common categories (uppercase, lowercase,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    LlamaParse

    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured, and semi-structured, to structured data (API's, PDFs, documents, SQL, etc.) Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL db providers.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    Label Studio

    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    ...Support for multiple data types including images, audio, text, HTML, time-series, and video.
    Downloads: 22 This Week
    Last Update:
    See Project
  • Rent Manager Software Icon
    Rent Manager Software

    Landlords, multi-family homes, manufactured home communities, single family homes, associations, commercial properties and mixed portfolios.

    Rent Manager is award-winning property management software built for residential, commercial, and short-term-stay portfolios of any size. The program’s fully customizable features include a double-entry accounting system, maintenance management/scheduling, marketing integration, mobile applications, more than 450 insightful reports, and an API that integrates with the best PropTech providers on the market.
    Learn More
  • 5
    DB-GPT

    DB-GPT

    Revolutionizing Database Interactions with Private LLM Technology

    DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    DocWire SDK

    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20AI driven data processing tool, has received award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Leader badge
    Downloads: 8 This Week
    Last Update:
    See Project
  • 7
    TXM

    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerfull CQP...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 8
    Super-PDF-Editor-Lite

    Super-PDF-Editor-Lite

    World's most comprehensive, powerful, process-based PDF editor

    ...Easy pdf imposition, booklet, n ups pages, and more. OCR performs in pdf files, scanned pdf files and any pdf files. OCR performs in image files, and supports multiple image formats. Auto and manual image enhancement for better OCR accuracy and quality. Supports 165+ languages with three languages data set. Use Multiple Languages at once. International Languages: 127 Languages, High, Medium, and Fast Quality. Scanned Images (jpg, png, gif, tiff, bmp) Multi-Page and TIFF and GIF, Scanned PDFs.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    html2canvas

    html2canvas

    A JavaScript HTML screenshot renderer

    html2canvas is a JavaScript HTML renderer. The script provides you with the tools to take screenshots of webpages directly on the browser. The screenshot is based on the DOM and therefore, it may not be 100% accurate to the real representation, given that it is not an actual screenshot, but a type of screenshot built based on the available data and information of the page. The script renders such page as a canvas image, by reading the DOM and the different styles of the featured elements. It...
    Downloads: 3 This Week
    Last Update:
    See Project
  • D&B Hoovers is Your Sales Accelerator Icon
    D&B Hoovers is Your Sales Accelerator

    For sales teams that want to accelerate B2B sales with better data

    Speed up sales prospecting with the rich audience targeting capabilities of D&B Hoovers so you can spend more sales time closing.
    Learn More
  • 10
    WikiSQL

    WikiSQL

    A large annotated semantic parsing corpus for developing NL interfaces

    A large crowd-sourced dataset for developing natural language interfaces for relational databases. WikiSQL is the dataset released along with our work Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. Regarding tokenization and Stanza, when WikiSQL was written 3-years ago, it relied on Stanza, a CoreNLP python wrapper that has since been deprecated. If you'd still like to use the tokenizer, please use the docker image. We do not anticipate switching...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    DeepLearningProject

    DeepLearningProject

    An in-depth machine learning tutorial

    This tutorial tries to do what most Most Machine Learning tutorials available online do not. It is not a 30 minute tutorial that teaches you how to "Train your own neural network" or "Learn deep learning in under 30 minutes". It's a full pipeline which you would need to do if you actually work with machine learning - introducing you to all the parts, and all the implementation decisions and details that need to be made. The dataset is not one of the standard sets like MNIST or CIFAR, you...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    RDFaMaker is a Java application that enable users to inser and modify semantic content in XHTML pages using RDFa extension
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Tested for Ubuntu Maverick - Create Audiobooks from eBooks, text or pictures. - Read eBooks or text aloud while scrolling through pages
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    POESIA= Public Opensource Environment for a Safer Internet Access an opensource Internet content filter (multimodal, mulitlingual) aimed for protection of youth (in schools...); partly funded by the European Commission
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Thema is an XML based data format (DTD) for thesauri, glossaries, lexicons, conceptual maps etc. up to ontologies. It contains publishing tools to convert into HTML, RDF etc. and to read different formats and is has a connection to the Semantic Web.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next