Showing 16 open source projects for "html source extractor"

  • 1
    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source, cross-platform, Unicode- and XML-based text analysis environment and graphical client that supports Windows, Linux and Mac OS X. It can also be used online as a J2EE standard-compliant web portal (GWT-based) with built-in access control. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP...
    Downloads: 16 This Week
  • 2
    FastQC

    A quality control analysis tool for high throughput sequencing data

    FastQC is a quality control analysis tool designed to spot potential problems in high throughput sequencing datasets. Its goal is to provide a simple way to check the quality of raw sequence data coming from high throughput sequencing pipelines. It does this by running a modular set of analyses on one or more raw sequence files in FASTQ or BAM format. It then produces a report summarizing the results and highlighting any areas where the library may appear unusual. This should then...
    Downloads: 42 This Week
  • 3

    Ghawwas_V4

    An open source system for Arabic corpora processing

    ...Accept Windows and UTF-8 character encoding
    g. Accept TXT, DOC, DOCX, RTF and HTML formats
    h. Export the processing results in CSV file format
    Downloads: 9 This Week
  • 4
    Universal Tag Finder

    A tool for querying HTML content

    Universal Tag Finder is a tool for querying HTML content in the file system. It lets developers find relevant elements during troubleshooting and validation, without resorting to large regular expressions.
    Downloads: 0 This Week
  • 5
    Simple-Scrape is a simple web-scraping library that allows for programmatic access to HTML code. No further techniques are needed and the library is very compact and thus easy to use.
    Downloads: 0 This Week
  • 6

    P-VCD

    Content-Based Video Copy Detection

    Software for Video Copy Detection. The software has been described in some scientific papers, e.g. http://dx.doi.org/10.1109/ICME.2011.6012212 and http://dx.doi.org/10.1007/s11042-011-0915-x. This software is the result of my PhD at the University of Chile and of participation in the TRECVID Content-Based Copy Detection (CCD) evaluation task in 2010 and 2011. More details at http://www-nlpir.nist.gov/projects/tv2011/tv2011.html#ccd and in...
    Downloads: 0 This Week
  • 7
    The National Library of New Zealand's Metadata Extraction Tool automatically extracts preservation-related metadata from digital files, then outputs that metadata in XML format. It can be used through a graphical user interface or a command-line interface. Please take the latest code from 'https://github.com/DIA-NZ/Metadata-Extraction-Tool.git'. The code on SourceForge will no longer be updated, as the project has moved to GitHub.
    Downloads: 10 This Week
  • 8

    HadStat

    HadStat is a cloud service for data analysis using Hadoop MapReduce.

    HadStat is a cloud service that lets you analyze data in the cloud and returns the results as graphs. The service is free; you can redistribute it and/or modify it under the terms of the GNU General Public License. It uses many technologies, including Hadoop MapReduce, HTML, PHP, web service applications, a Linux server, Java and the Eclipse IDE, and offers many indicators: simple moving average (SMA), exponential moving average (EMA), smoothed simple moving average (SMMA), Linear...
    Downloads: 0 This Week
  • 9
    Optex Analyzer is software for analyzing and comparing algorithms that approximately solve optimization problems. It has a GUI that lets you select a set of input files containing raw algorithm results. The analysis is presented as tables and charts.
    Downloads: 0 This Week
  • 10
    Web-as-corpus tools in Java:
    * Simple crawler (with Nutch and Heritrix integration)
    * HTML cleaner to remove boilerplate code
    * Language recognition
    * Corpus builder
    Downloads: 0 This Week
  • 11
    Open, extensible system analysis report tool for Java, based on numerous open source analysis initiatives. The XML/XSL batch-processing framework produces integrated HTML/SVG reports of the system's current state and its development over time.
    Downloads: 0 This Week
  • 12
    The Docgen plugin for Protege provides a quick export of all the content of an ontology (classes, instances and documentation) in various formats (html, pdf, fo...). Images, graphs and URLs are readily included in reports.
    Downloads: 0 This Week
  • 13
    The aim of MIEX (Metadata and Information Extractor from small XML documents) is to create a wrapper for the Stanford Parser, to extract and store metadata (syntactic structures, relationships among words...) from simple XML documents.
    Downloads: 1 This Week
  • 14
    The aim of this project is to highlight the effect of lexical chain scoring metrics and keyword extraction techniques on summary generation. We present our own chain-based keyword extraction system using the WordNet lexical database.
    Downloads: 0 This Week
  • 15
    RelEx is a semantic relationship extractor. It gives subject, object, possessive and other relationships between words in a sentence, along with part-of-speech, noun-number, verb tense, and gender tagging, and Hobbs anaphora (pronoun) resolution.
    Downloads: 0 This Week
  • 16
    A knowledge management system written in Java under JBoss 4.2.3 Server, with RichFaces 3.3.0BETA4. Includes file conversion from HTML to PDF and the rich:editor component without special syntax.
    Downloads: 0 This Week