134 projects for "text processing" with 2 filters applied:

  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Streamline Azure Security with Palo Alto Networks VM-Series Icon
    Streamline Azure Security with Palo Alto Networks VM-Series

    Centrally manage physical and virtualized firewalls with Panorama

    Improve your security posture and reduce incident response time. Use the VM-Series to natively analyze Azure traffic and dynamically drive policy updates based on workload changes.
    Learn more
  • 1
    newspaper4k

    newspaper4k

    Python library for scraping and analyzing online news articles easily

    ...Newspaper4k also includes natural language processing capabilities that can generate summaries and identify keywords from extracted article text. Newspaper4k supports both single-article extraction and full news site processing, allowing users to build sources representing entire publications and iterate through their articles. It maintains compatibility with the original project so that existing code written for newspaper3k can continue working with minimal changes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Browserless

    Browserless

    The headless Chrome/Chromium driver on top of Puppeteer

    ...It provides a high-level API for interacting with headless Chrome, allowing developers to perform operations such as generating PDFs, capturing screenshots, extracting text or HTML, and automating web navigation. The project is designed to act as a production-ready abstraction layer over Puppeteer, offering improved reliability, error handling, and scalability for real-world applications. Browserless includes built-in optimizations such as request blocking, automatic retries, and sensible defaults that improve performance when processing web pages. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    SuperSocket

    SuperSocket

    Extensible socket server application framework for .NET

    ...The framework supports multiple protocols, including TCP, UDP, and WebSocket, and gives developers a modular architecture for protocol parsing, session handling, command processing, and server hosting. It is suitable for chat servers, game servers, IoT gateways, telemetry services, industrial systems, and other real-time network applications. SuperSocket is designed to be flexible enough for custom binary or text protocols while still offering reusable abstractions for common server patterns. It is most useful for .NET teams that need robust networking infrastructure with room for domain-specific protocol logic.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    syslog-ng

    syslog-ng

    Log management solution that improves the performance of SIEM

    syslog-ng is the log management solution that improves the performance of your SIEM solution by reducing the amount and improving the quality of data feeding your SIEM. With syslog-ng Store Box, you can find the answer. Search billions of logs in seconds using full text queries with Boolean operators to pinpoint critical logs. syslog-ng Store Box provides secure, tamper-proof storage and custom reporting to demonstrate compliance. syslog-ng can deliver data from a wide variety of sources to...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 5
    news-please

    news-please

    Python tool for crawling and extracting structured data from news site

    ...It provides an integrated pipeline that crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from a news outlet when given only the root URL of a site. It combines several established technologies and libraries to perform web crawling and content extraction, enabling reliable processing across a wide range of news sources. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Ada Class Library

    Ada Class Library

    Ada Class Library - an object orientated library for Ada.

    Text search and replace. Scripting (small tool programs). CGI scripts. Execution of external programs (incl. I/O redirection). Garbage Collection. Extendended Booch Components. CD-Recorder
    Leader badge
    Downloads: 59 This Week
    Last Update:
    See Project
  • 7
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    iText®, a JAVA PDF library

    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest...
    Leader badge
    Downloads: 202 This Week
    Last Update:
    See Project
  • 9
    Command-line/Ant-task/embeddable text file preprocessor. Macros, flow control, expressions. Recursive directory processing. Extensible in Java to display data from any data sources (as database). Can generate complete homepages (tree of HTML-s, images, etc.)
    Leader badge
    Downloads: 108 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    mzitu

    mzitu

    Python crawler that downloads image galleries and analyzes titles

    ...It focuses on automating the collection of large sets of images by programmatically parsing page content and iterating through gallery entries. mzitu also includes a simple analysis script that processes downloaded folder names to generate statistics and visualizations. Using text segmentation and frequency analysis, the project can create a word cloud representing common keywords found in the dataset. This makes the repository both a scraping example and a small data analysis experiment built around the collected content. Overall, mzitu serves as a learning-oriented implementation of Python web scraping, data processing, and visualization techniques.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    JUSH is a syntax highlighting component written in JavaScript. It highlights HTML, CSS, JS, PHP and SQL code embedded into each other. Beside syntax highlighting, it provides links to the documentation for all supported languages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage by different techinques. A command line executeable is shipped that allows to sort documents by codepage.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 13
    uncaptcha

    uncaptcha

    Defeating Google's audio reCaptcha with 85% accuracy

    uncaptcha is an open-source proof-of-concept system designed to demonstrate vulnerabilities in Google’s audio reCAPTCHA challenges by automatically solving them using speech recognition techniques. The project uses browser automation to navigate to CAPTCHA challenges, extract audio files, and process them through multiple speech-to-text services. By combining outputs from several transcription engines, the system increases the likelihood of correctly identifying the spoken digits or phrases required to solve the challenge. It employs signal processing techniques such as segmenting audio clips into individual components before transcription, which improves accuracy in noisy or complex audio conditions. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    JS Vim

    jsvim 4: xvi online

    JSVim intends to provides Vim to any textarea. The javascript text editor core is now ported using Emscripten and written using web components. Stable archives at http://www.migniot.com/JSVim.html, roadmap available at http://home.migniot.com/smigniot/Zion/JSVim%20Roadmap.html
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Our mission is to develop open source solutions and provides professional support helps small and medium size companies meet the challenges of developing professional Arabic websites in the PHP/MySQL environment based on our experience in Arabic language processing, the library that we develop helps companies save time and increase productivity.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 16
    PDF Clown

    PDF Clown

    General-Purpose PDF Library for Java and .NET

    PDF Clown is a general-purpose Java and .NET library for manipulating PDF files through multiple abstraction layers, rigorously adhering to PDF 1.7 specification (ISO 32000-1). This project aims to provide a universal access to PDF files (creation, reading, editing, rendering...) through an accurate and elegant object-oriented API. * Features: http://pdfclown.org/overview/features/ * Overview: http://pdfclown.org/overview/architecture/ * Website: http://pdfclown.org/ * Blog:...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Jericho HTML Parser is a java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18
    IMPORTANT NOTE: This project has moved to Github: https://github.com/pkozelka/libxml2-pas Pascal units accessing the popular XML API from Daniel Veillard ( http://www.xmlsoft.org ). This should be usable at least from Kylix and Delphi, but hopefully also from other Pascal compilers (like freepascal).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19

    sitecheck

    Modular web site spider for web developers.

    More than just a link checker, sitecheck is a website spider (also known as a crawler) which can assist with SEO by testing an entire site plus both inbound links from search engines and outbound links to other sites for the following issues: looping redirects (HTTP 301/302), broken links (HTTP 404), server errors (HTTP 500), spelling mistakes, low readability scores (using the Flesch Reading Ease test), missing/empty/duplicate meta tags, duplicate content, slow page speed, W3C validation...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20

    Darkbot

    The IRC's Talking Robot

    [ Please read https://sourceforge.net/p/darkbot/news/2014/01/darkbots-revitalization/ ] Darkbot is a portable IRC chat robot written in the C language that can be taught responses to user inquiries, and even have conversations with them. Darkbot was originally created by Jason Hamilton as an aid for help channels on Intenet Relay Chat.
    Leader badge
    Downloads: 8 This Week
    Last Update:
    See Project
  • 21
    FCKeditor

    FCKeditor

    FCKeditor (retired)

    FCKeditor is the previous version of CKEditor and has been discontinued after version 2. The new CKEditor is redesigned from the ground up, offering more WYSIWYG text editing features, enhanced security and better integration. Don’t force yourself with retro FCKeditor. Switch to the new, cool CKEditor at ckeditor.com
    Downloads: 17 This Week
    Last Update:
    See Project
  • 22
    SEO & SEM - Marketing Text Writer

    SEO & SEM - Marketing Text Writer

    Open Source SEO & SEM Text Creation Tools for free Article Writer

    Open Source Tool for Search Engine Optimization (SEO & SEM) used for automatic content processing. These SEO Content Genrators and Article Writers based on Text Writer: https://www.artikelschreiber.com/en/ https://www.unaique.net/en/ https://www.unaique.com/ https://www.artikelschreiben.com/ https://www.buzzerstar.com/ https://googleduplicatecontentsolver.sourceforge.io/ https://inkassos.github.io/inkasso/ https://www.artikelschreiber.com/opensource/ https://www.sebastianenger.com/ https://www.artikelschreiber.com/marketing/review/ https://muckrack.com/markus-muller https://linktr.ee/textgenerator Code Contains: - Perl Source code, language databases and more
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    A tool to help finding the corresponding interwikis the when translating a wikipedia article from a given language to another one.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    JODReports is a solution for generating dynamic documents and reports in Java based on the OpenDocument format (ODF). Templates can be easily composed with a word processor such as OpenOffice.org Writer. Data sources include POJOs and XML.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 25
    Xelem is a compact Java-library to read and write Excel files of type SpreadsheetML. It can produce sophisticated, intricate and complex spreadsheets from within any Java program. And, since the release of xelem.2.0, it can read xml-spreadsheets.
    Downloads: 1 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB