Search Results for "content analysis" - Page 2

Showing 291 open source projects for "content analysis"

  • 1
    Wiseflow

    Enhance any agent's browser use skill

    Wiseflow is an open-source information extraction and knowledge discovery system designed to collect, filter, and organize valuable information from large volumes of online content. The platform continuously monitors specified sources such as websites, social platforms, and other digital channels to identify relevant data according to user-defined interests or topics. By combining web crawling, content parsing, and large language model analysis, the system extracts concise insights from raw information streams and converts them into structured data that can be stored or analyzed. ...
    Downloads: 1 This Week
  • 2
    PocketFlow Tutorial Codebase Knowledge
    ...It supports both GitHub URL crawling and local directory analysis, and can tailor output tutorials to different languages, making it accessible for international developers.
    Downloads: 1 This Week
  • 3
    ai-renamer

    A Node.js CLI that uses Ollama and LM Studio models

    ai-renamer is a Node.js-based command-line tool that uses large language models to automatically rename files based on their content, enabling more meaningful and organized file management. Instead of relying on manual naming or metadata, the tool analyzes the actual content of files, including images, videos, and documents, to generate descriptive and context-aware filenames. It integrates with local and cloud-based AI providers such as Ollama, LM Studio, and OpenAI, allowing users to...
    Downloads: 2 This Week
  • 4
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    ...It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large datasets in a short period of time. Spider also provides mechanisms for subscribing to crawl events so developers can process page data such as URLs, status codes, or HTML content as it is discovered. ...
    Downloads: 2 This Week
  • 5
    Zeek

    Zeek is a powerful network analysis framework

    Zeek has a long history in the open source and digital security worlds. Vern Paxson began developing the project in the 1990s under the name “Bro” as a means to understand what was happening on his university and national laboratory networks. Vern and the project’s leadership team renamed Bro to Zeek in late 2018 to celebrate its expansion and continued development. Zeek is not an active security device, like a firewall or intrusion prevention system. Rather, Zeek sits on a “sensor,” a...
    Downloads: 6 This Week
  • 6
    BruteForceAI

    Advanced LLM-powered brute-force tool combining AI intelligence

    BruteForceAI is an open-source security testing tool that applies large language models to the analysis of login forms and authentication flows in web applications. At a high level, the project uses AI to inspect HTML content, identify the relevant form elements, and automate selector discovery so that a tester does not need to hand-map every field before evaluation. It combines that analysis layer with automated credential testing workflows, framing itself as a more adaptive alternative to older brute-force tooling that depends heavily on manual configuration. ...
    Downloads: 0 This Week
  • 7
    MCP YouTube

    A Model-Context Protocol Server for YouTube

    The YouTube MCP Server uses yt-dlp to download subtitles from YouTube videos and connects to claude.ai via the Model Context Protocol. It enables AI assistants to summarize YouTube videos by accessing their subtitles.
    Downloads: 1 This Week
  • 8
    MinerU

    A high-quality tool for converting PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 9 This Week
  • 9
    Napkin

    An Infinitely Large Napkin

    Napkin (also titled “An Infinitely Large Napkin”) is a lightweight, semi-formal introduction to higher mathematics, aimed at giving readers a bird’s-eye view over various mathematical fields. It is not a polished textbook full of full proofs; rather it offers clean definitions, theorem statements, intuitive motivations, and informal sketches of why things work, with the goal of building conceptual understanding. The coverage spans undergraduate and early graduate topics, designed to show how...
    Downloads: 0 This Week
  • 10
    PDF4QT

    Open source PDF editor

    PDF4QT is an open source PDF editor based on the Qt framework, available for Windows and Linux. It contains a C++ library, applications for viewing/editing PDF documents, and a command line tool. It is a modern solution for viewing/editing/rendering PDF documents, for users and developers alike. For developers, there is a C++ library and a command line tool for use in scripts. For users, there are four applications offering many features. The project is hosted on GitHub and...
    Downloads: 83 This Week
  • 11
    KubeClarity

    KubeClarity is a tool for detection and management of vulnerabilities

    KubeClarity is a tool for detection and management of Software Bill Of Materials (SBOM) and vulnerabilities of container images and filesystems. It scans both runtime K8s clusters and CI/CD pipelines for enhanced software supply chain security. Effective vulnerability scanning requires accurate Software Bill Of Materials (SBOM) detection. KubeClarity includes a CLI that can be run locally and is especially useful for CI/CD pipelines. It allows you to analyze images and directories to generate...
    Downloads: 1 This Week
  • 12
    BrowserOS

    Agentic browser; privacy-first alternative to ChatGPT Atlas

    BrowserOS is an open-source, agentic web browser built on a Chromium base that integrates AI agents directly into the browsing experience. Rather than just doing standard browsing, it places AI intelligence at the core: you can connect your own API keys (e.g., OpenAI, Anthropic, Google Gemini) or run local models (e.g., via Ollama) so that your browsing data and automation stay on your machine; privacy and control are emphasized throughout. The interface remains familiar to users of...
    Downloads: 16 This Week
  • 13
    DeepWiki Open

    AI-Powered Wiki Generator for GitHub/Gitlab/Bitbucket Repositories

    DeepWiki Open is an open-source, AI-powered wiki generator that automatically creates fully navigable, richly structured wiki documentation for GitHub, GitLab, or Bitbucket repositories by combining code analysis, vector embeddings, retrieval-augmented generation (RAG), and visualization tools. Users can enter a repository URL and the system will clone the project, build semantic embeddings of its codebase, extract architecture and relationships, generate human-readable documentation, and...
    Downloads: 3 This Week
  • 14
    Vanilla.PDF

    Cross-platform SDK for creating and modifying PDF documents

    ...The SDK offers full cross-platform support including Windows, Linux, macOS, and Android, with builds available for major compilers and architectures. Vanilla.PDF supports advanced PDF features such as adding CMS (PKCS#7) digital signatures, modifying content streams and metadata, and working with encryption and permissions based on standard PDF security models. It includes tools for parsing PDF internals like cross-reference tables and objects, providing fine-grained document analysis capabilities. The project is unit-tested with continuous integration pipelines, supporting sanitizers for enhanced code quality and stability.
    Downloads: 3 This Week
  • 15
    Archivematica

    Free and open-source digital preservation system

    Archivematica is a web- and standards-based, open-source application which allows your institution to preserve long-term access to trustworthy, authentic, and reliable digital content. Our target users are archivists, librarians, and anyone working to preserve digital objects. You are free to copy, modify, and distribute Archivematica with attribution under the terms of the AGPLv3 license. Archivematica is an open-source application based on recognized standards that makes it possible to...
    Downloads: 1 This Week
  • 16
    LOL HTML

    Low output latency streaming HTML parser/rewriter with CSS API

    Low Output Latency streaming HTML rewriter/parser with CSS-selector based API. It is designed to modify HTML on the fly with minimal buffering. It can quickly handle very large documents, and operate in environments with limited memory resources. The crate serves as a back-end for the HTML rewriting functionality of Cloudflare Workers, but can be used as a standalone library with a convenient API for a wide variety of HTML rewriting/analysis tasks. The parser switches back to the tag scanner...
    Downloads: 7 This Week
  • 17
    Weibo Crawler

    Python crawler for collecting and downloading Sina Weibo user data

    ...It automates the process of gathering posts, user profile details, and engagement metrics from one or more target accounts. weibo-crawler can extract comprehensive information about users, including profile attributes such as nickname, follower count, following count, and account metadata. It also captures detailed data about each post, including the content, publishing time, topics, mentions, likes, reposts, and comments. In addition to textual data, the project can download original media from posts, such as images, videos, and Live Photo content. Collected data can be exported to structured formats such as CSV or JSON or stored in databases for further analysis and research. It supports incremental crawling so users can periodically collect only newly published posts, making it useful for ongoing monitoring or dataset updates.
    Downloads: 0 This Week
  • 18
    Perf Book

    The book "Performance Analysis and Tuning on Modern CPU"

    This project is a practical guide to performance analysis and tuning on modern CPUs, bridging microarchitecture details with hands-on profiling. It explains how caches, TLBs, prefetchers, branch predictors, and out-of-order execution influence real program speed, then connects those concepts to concrete optimization strategies. Readers learn how to design trustworthy benchmarks, avoid measurement traps (warmup, turbo, frequency scaling), and interpret hardware performance counters. The book...
    Downloads: 3 This Week
  • 19
    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    Open Semantic Search is an open source research and analytics platform designed for searching, analyzing, and exploring large collections of documents using semantic search technologies. It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources. Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. ...
    Downloads: 4 This Week
  • 20
    AutoClip

    AI-powered video clipping and highlight generation

    AutoClip is an open-source, AI-powered video processing system designed to automate the extraction of “highlight” segments from full-length videos — ideal for creators who want to generate bite-sized clips, compilations, or highlight reels without manually sifting through hours of footage. The system supports downloading videos from major platforms (e.g. YouTube, Bilibili), or accepting local uploads, and then applies AI analysis to identify segments worth clipping based on content (e.g. high energy moments, speech, or other heuristics). Once highlights are identified, AutoClip can automatically cut those segments and optionally assemble them into a compilation, thus greatly reducing manual video editing effort. It uses a modern web application stack with a front end (React + TypeScript) for user interaction and a back end that handles downloading, processing, clipping, and queue management, allowing real-time progress feedback and easy deployment, e.g. via Docker.
    Downloads: 10 This Week
  • 21
    yek

    Serialize repositories into LLM-ready context w/ smart prioritization

    Yek is a Rust-based CLI tool designed to serialize text-based files from a repository or directory into a single structured output for large language model use. It scans projects using .gitignore rules to exclude irrelevant files and automatically filters out binary or oversized content. Yek prioritizes files based on Git history, placing more important content later in the output to align with how language models process context. Yek supports multiple directories, individual files, and glob patterns, making it flexible for different workflows. It can stream output when piped or save results to a temporary file, depending on usage. ...
    Downloads: 2 This Week
  • 22
    Gitingest

    Create prompt-friendly codebase digests from any Git repository URL

    Gitingest is a developer utility that converts an entire Git repository into a structured, prompt-friendly text digest suitable for use with large language models. It analyzes a repository and produces a consolidated textual representation that includes the file structure and code content in an organized format. This makes it easier to provide meaningful code context when working with AI systems that require compact, readable inputs. Developers can generate these digests from either a local...
    Downloads: 2 This Week
  • 23
    Pix2Text

    Open-Source Python3 tool for recognizing layouts, tables, and math

    An Open-Source Python3 tool for recognizing layouts, tables, math formulas, and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported. Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it can already accomplish Mathpix's core functionality. Pix2Text (P2T) can recognize layouts, tables, images, text, and mathematical...
    Downloads: 9 This Week
  • 24
    ModSecurity

    Cross platform web application firewall (WAF) engine for Apache

    ...It has a robust event-based programming language that provides protection from a range of attacks against web applications and allows for HTTP traffic monitoring, logging and real-time analysis. Libmodsecurity is one component of the ModSecurity v3 project. The library codebase serves as an interface to ModSecurity Connectors taking in web traffic and applying traditional ModSecurity processing. In general, it provides the capability to load/interpret rules written in the ModSecurity SecRules format and apply them to HTTP content provided by your application via Connectors. ...
    Downloads: 9 This Week
  • 25
    Docling

    Get your documents ready for gen AI

    Docling is an open-source document processing toolkit built to prepare diverse content types for modern generative AI and data workflows. The project focuses on converting and parsing many document formats into a unified structured representation that downstream systems can easily consume. It supports advanced PDF understanding, including layout detection, table extraction, and reading order analysis, enabling high-fidelity document intelligence pipelines.
    Downloads: 8 This Week