raw free download - SourceForge

MarkPDFDown

A high-quality PDF to Markdown tool based on large language models

...The project focuses on extracting text, formatting, and structural information from complex PDF documents and transforming that information into clean Markdown that preserves the original hierarchy of headings, paragraphs, tables, and lists. By producing Markdown rather than raw text, the tool makes it easier to integrate documents into knowledge bases, documentation systems, or language model pipelines that rely on structured input. The software is particularly useful for developers working with technical documents, academic papers, or reports that need to be indexed, summarized, or processed by downstream AI systems.

Downloads: 10 This Week

Last Update: 2026-03-06


Instructor Python

Structured outputs for LLMs

Instructor is a Python library that bridges OpenAI responses with structured data validation using Pydantic models. It lets developers specify expected output schemas and ensures that the responses from OpenAI APIs are automatically parsed and validated against those models. This makes integrating LLMs into structured workflows safer and more predictable, especially in production applications.
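The idea of checking a model response against a declared schema can be sketched as follows. This is a minimal illustration and not Instructor's API: it uses standard-library dataclasses as a stand-in for Pydantic models, and `UserInfo` and `parse_response` are hypothetical names introduced only for this example.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class UserInfo:
    # Hypothetical schema, for illustration only.
    name: str
    age: int


def parse_response(raw, schema):
    """Parse a JSON string and check each field against the dataclass types."""
    data = json.loads(raw)
    kwargs = {}
    for f in fields(schema):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        # f.type is the annotated class (str, int) for this simple schema.
        if not isinstance(value, f.type):
            raise ValueError(
                f"field {f.name!r} expected {f.type.__name__}, "
                f"got {type(value).__name__}"
            )
        kwargs[f.name] = value
    return schema(**kwargs)


# Usage: a well-formed response becomes a typed object;
# a malformed one raises ValueError instead of passing silently.
user = parse_response('{"name": "Ada", "age": 36}', UserInfo)
```

A real validation library handles nested models, coercion, and richer error reporting; the sketch only shows why a schema check makes downstream code more predictable.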

Downloads: 1 This Week

Last Update: 2026-04-03


kg-gen

Knowledge Graph Generation from Any Text

...Instead of relying on traditional rule-based extraction techniques, KG-Gen uses language models to identify entities and their relationships, producing higher-quality graph structures from raw text. The framework addresses common problems in automatic knowledge graph construction, particularly sparsity and duplication of entities, by applying a clustering and entity-resolution process that merges semantically similar nodes. This allows the generated graphs to be denser, more coherent, and easier to use for downstream tasks such as retrieval-augmented generation, semantic search, and reasoning systems.
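The effect of merging duplicate nodes can be sketched with a toy example. This is not KG-Gen's method: the description above says the project uses language models and clustering, whereas the sketch assumes a crude string normalization as a stand-in for semantic similarity. The names `normalize` and `resolve_entities` are hypothetical.

```python
def normalize(name):
    # Crude stand-in for semantic similarity: ignore case and punctuation.
    return "".join(ch for ch in name.lower() if ch.isalnum())


def resolve_entities(triples):
    """Map entity names with the same normalized key to one canonical name,
    then drop the duplicate edges that result."""
    canonical = {}

    def canon(name):
        # The first spelling seen for a key becomes the canonical one.
        return canonical.setdefault(normalize(name), name)

    merged = set()
    for subj, rel, obj in triples:
        merged.add((canon(subj), rel, canon(obj)))
    return sorted(merged)


triples = [
    ("OpenAI", "develops", "GPT-4"),
    ("openai", "develops", "GPT4"),
    ("GPT-4", "is_a", "language model"),
]
graph = resolve_entities(triples)
```

Here the first two triples collapse into one edge, so the resulting graph has fewer, better-connected nodes.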

Downloads: 1 This Week

Last Update: 2026-03-09


Engram

A New Axis of Sparsity for Large Language Models

...Engineered with speed and memory efficiency in mind, Engram supports batched indexing, incremental updates, and custom distance metrics so developers can tailor search behaviors to their domain’s needs. In addition to raw similarity search, the project includes tools for clustering, ranking, and filtering results, enabling richer user experiences like “related content”, semantic auto-completion, and contextual filtering.

Downloads: 0 This Week

Last Update: 2026-01-28


llm.c

LLM training in simple, raw C/CUDA

llm.c is a minimalist, systems-level implementation of a small transformer-based language model in C that prioritizes clarity and educational value. By stripping away heavy frameworks, it exposes the core math and memory flows of embeddings, attention, and feed-forward layers. The code illustrates how to wire forward passes, losses, and simple training or inference loops with direct control over arrays and buffers. Its compact design makes it easy to trace execution, profile hotspots, and...

Downloads: 0 This Week

Last Update: 2025-10-15


LLM-Aided OCR Project

Enhances Tesseract OCR output using LLMs (local or API)

...The project addresses common OCR challenges such as distorted text, unusual fonts, historical documents, and complex layouts that often produce inaccurate results with standard OCR pipelines. The system first extracts raw text using OCR engines and then applies language models to analyze and correct recognition errors based on context. This AI-assisted correction process helps reconstruct missing characters, fix formatting mistakes, and produce more coherent text outputs. The project is particularly useful for digitizing historical documents, research papers, and scanned materials where traditional OCR often struggles. ...

Downloads: 0 This Week

Last Update: 2026-03-22


Deep Lake

Data Lake for Deep Learning. Build, manage, and query datasets

...Use one API to upload, download, and stream datasets to/from AWS S3/S3-compatible storage, GCP, Activeloop cloud, or local storage. Store images, audio, and video in their native compression. Deep Lake automatically decompresses them to raw data only when needed, e.g., when training a model. Treat your cloud datasets as if they were a collection of NumPy arrays in your system's memory. Slice them, index them, or iterate through them.
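The "decompress only when needed" behavior can be illustrated with a small sketch. This is not the Deep Lake API: `LazyCompressedDataset` is a hypothetical class, it holds data in memory rather than in cloud storage, and it assumes `zlib` in place of native image, audio, or video codecs.

```python
import zlib


class LazyCompressedDataset:
    """Toy dataset: samples stay compressed until they are accessed."""

    def __init__(self, samples):
        # Each sample is a bytes object, stored in compressed form.
        self._blobs = [zlib.compress(s) for s in samples]

    def __len__(self):
        return len(self._blobs)

    def __getitem__(self, index):
        # A slice returns a list of decompressed samples;
        # an integer index returns a single decompressed sample.
        if isinstance(index, slice):
            return [zlib.decompress(b) for b in self._blobs[index]]
        return zlib.decompress(self._blobs[index])

    def __iter__(self):
        # Decompress one sample at a time while iterating.
        for blob in self._blobs:
            yield zlib.decompress(blob)


ds = LazyCompressedDataset([b"a" * 100, b"b" * 100, b"c" * 100])
first_two = ds[0:2]
```

Only the samples that are indexed, sliced, or reached during iteration are ever decompressed, which is the property the description highlights.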

Downloads: 0 This Week

Last Update: 2026-02-12


Canopy

Retrieval Augmented Generation (RAG) framework

Canopy is an open-source retrieval-augmented generation (RAG) framework developed by Pinecone to simplify the process of building applications that combine large language models with external knowledge sources. The system provides a complete pipeline for transforming raw text data into searchable embeddings, storing them in a vector database, and retrieving relevant context for language model responses. It is designed to handle many of the complex components required for a RAG workflow, including document chunking, embedding generation, prompt construction, and chat history management. Developers can use Canopy to quickly build chat systems that answer questions using their own data instead of relying solely on the pretrained knowledge of the language model. ...
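The pipeline stages named above (chunking, embedding, retrieval, prompt construction) can be sketched in a few lines. This is not Canopy's API: every function name here is hypothetical, the "embedding" is assumed to be a bag-of-words count vector rather than a learned model, and an in-memory list stands in for a vector database.

```python
import math
import re
from collections import Counter


def chunk(text, size=8):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, index, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query, context):
    """Place retrieved chunks ahead of the user's question."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


docs = (
    "Canopy handles document chunking and embedding generation. Retrieved "
    "context is added to the prompt before the model answers."
)
index = [(c, embed(c)) for c in chunk(docs)]
prompt = build_prompt("prompt context", retrieve("prompt context", index))
```

The resulting `prompt` string is what would be sent to a language model, so the answer can draw on the retrieved text rather than only on pretrained knowledge.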

Downloads: 7 This Week

Last Update: 2026-03-10


autollm

Ship RAG based LLM web apps in seconds

...The framework also includes built-in readers for multiple content sources such as PDFs, DOCX files, notebooks, websites, and other document types, which helps shorten the time between raw data and a working knowledge application.

Downloads: 0 This Week

Last Update: 2026-03-10


BIG-bench

Beyond the Imitation Game collaborative benchmark for measuring

...The suite provides a common JSON task format and an evaluation harness so research groups can contribute new tasks and reproduce results consistently. It emphasizes robustness analysis—looking at scale trends, calibration, and areas where models systematically fail—to guide model development beyond raw accuracy. BIG-bench is as much a community process as a dataset, encouraging open sharing of tasks and findings to keep evaluations fresh and comprehensive.

Downloads: 0 This Week

Last Update: 2025-10-09


Search Results for "raw"

Showing 10 open source projects for "raw"

