Search Results for "memory" - Page 2

Showing 53 open source projects for "memory"


Filters: Large Language Models (LLM), Python

1

ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM

...It upgrades the base model with GLM’s hybrid pretraining objective, 1.4 TB bilingual data, and preference alignment, delivering big gains on MMLU, C-Eval, GSM8K, and BBH. The context window extends up to 32K (FlashAttention), and Multi-Query Attention improves speed and memory use. The repo includes Python APIs, CLI and web demos, OpenAI-style/FastAPI servers, and quantized checkpoints for lightweight local deployment on GPUs or CPU/MPS.

Downloads: 4 This Week

Last Update: 2 days ago
See Project
2

Qwen

The official repo of Qwen chat & pretrained large language model

Qwen is a series of large language models developed by Alibaba Cloud, consisting of various pretrained versions like Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B. These models, which range from smaller to larger configurations, are designed for a wide range of natural language processing tasks. They are openly available for research and commercial use, with Qwen's code and model weights shared on GitHub. Qwen's capabilities include text generation, comprehension, and conversation, making it a...

1 Review

Downloads: 6 This Week

Last Update: 2026-03-05
See Project
3

Torch Pruning

DepGraph: Towards Any Structural Pruning

Torch-Pruning is an open-source toolkit designed to optimize deep neural networks by performing structural pruning directly within PyTorch models. The library focuses on reducing the size and computational cost of neural networks by removing redundant parameters and channels while maintaining model performance. It introduces a graph-based algorithm called DepGraph that automatically identifies dependencies between layers, allowing parameters to be pruned safely across complex architectures....

Downloads: 4 This Week

Last Update: 2026-03-05
See Project
4

Deep Lake

Data Lake for Deep Learning. Build, manage, and query datasets

...Deeplake automatically decompresses them to raw data only when needed, e.g., when training a model. Treat your cloud datasets as if they are a collection of NumPy arrays in your system's memory. Slice them, index them, or iterate through them.

Downloads: 5 This Week

Last Update: 2026-02-12
See Project
5

Agentic Context Engine

Make your agents learn from experience

...In this workflow, one component generates solutions, another reflects on outcomes, and a third curates useful knowledge so it can be reused in future interactions. This architecture allows agents to gradually build persistent operational memory without requiring additional training datasets or model retraining.

Downloads: 3 This Week

Last Update: 2026-05-07
See Project
6

Chitu

High-performance inference framework for large language models

...Chitu is designed to scale from small single-machine deployments to large distributed clusters that handle high volumes of concurrent inference requests. The system also includes performance optimizations for large models, including support for quantized formats and efficient computation operators that reduce memory usage and latency. Its architecture aims to support enterprise adoption by ensuring stable long-term operation under production workloads.

Downloads: 3 This Week

Last Update: 2026-06-04
See Project
7

SageAttention

[NeurIPS2025 Spotlight] Quantized Attention

...The system achieves this by using low-precision numerical formats such as INT4, FP8, or INT8 to represent key matrices within the attention computation. These optimizations allow models to perform matrix operations faster and consume less memory during inference. SageAttention is designed to function as a plug-and-play replacement for standard attention implementations, enabling developers to accelerate existing models without modifying their architecture.

Downloads: 2 This Week

Last Update: 2026-03-08
See Project
8

Xtuner

A Next-Generation Training Engine Built for Ultra-Large MoE Models

...The engine supports training models with hundreds of billions of parameters and enables long-context training with sequence lengths reaching tens of thousands of tokens. Its architecture incorporates memory-efficient optimizations that allow researchers to train large models even when computational resources are limited. XTuner is also designed to integrate with modern AI ecosystems, supporting multimodal training, reinforcement learning optimization, and instruction tuning pipelines.

Downloads: 2 This Week

Last Update: 2026-03-04
See Project
9

OpenChronicle

Open-source, local-first memory for any tool-capable LLM agent

OpenChronicle is a knowledge management and storytelling platform designed to organize information into structured timelines and interconnected narratives. It allows users to create chronological records that link events, ideas, and entities in a cohesive format. The system emphasizes visualization and organization of complex information over time. It can be used for research, writing, or personal knowledge tracking. OpenChronicle supports extensibility, enabling customization of how data is...

Downloads: 0 This Week

Last Update: 2026-05-09
See Project
10

Intel LLM Library for PyTorch

Accelerate local LLM inference and finetuning

...The framework provides hardware-aware optimizations and low-precision computation techniques that significantly improve the performance of large language models while reducing memory consumption. IPEX-LLM supports a wide range of popular models, including architectures such as LLaMA, Mistral, Qwen, and other transformer-based systems. The library can integrate with common AI frameworks and serving tools such as Hugging Face Transformers, LangChain, and vLLM, allowing developers to incorporate optimized inference into existing pipelines.

Downloads: 1 This Week

Last Update: 2026-03-04
See Project
11

Gemini Fullstack LangGraph Quickstart

Get started w/ building Fullstack Agents using Gemini 2.5 & LangGraph

...The repository provides both a browser-based chat interface and a command-line script (cli_research.py) for executing research queries directly. For production deployment, the backend integrates with Redis and PostgreSQL to manage persistent memory, streaming outputs, and background task coordination.

Downloads: 2 This Week

Last Update: 2 days ago
See Project
12

whichllm

Find the local LLM that actually runs and performs best

whichllm is a command-line tool for finding local large language models that can realistically run on a user’s hardware. It detects the machine’s available resources, including GPU, CPU, memory, and storage, then recommends models based on practical fit rather than parameter count alone. The project is useful for users who are unsure which local LLM will perform well on their system. It focuses on real, recency-aware benchmarks so recommendations better reflect current model performance. whichllm is especially helpful for developers, AI hobbyists, and researchers comparing local inference options across NVIDIA, AMD, Apple Silicon, and CPU-only environments. ...

Downloads: 0 This Week

Last Update: 4 days ago
See Project
13

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

...It supports bilingual interaction (Chinese and English) and has open-source versions optimized for dialogue and video comprehension. Notably, the Int4 quantized version allows efficient inference on GPUs with only 16GB of memory. The repository offers demos, API servers, fine-tuning examples, and integration with OpenAI API-compatible endpoints, making it accessible for both researchers and developers.

Downloads: 1 This Week

Last Update: 2 days ago
See Project
14

Engram

A New Axis of Sparsity for Large Language Models

...It provides utilities to generate embeddings from text or other structured data, index them using efficient approximate nearest neighbor algorithms, and perform real-time similarity queries even on large corpora. Engineered with speed and memory efficiency in mind, Engram supports batched indexing, incremental updates, and custom distance metrics so developers can tailor search behaviors to their domain’s needs. In addition to raw similarity search, the project includes tools for clustering, ranking, and filtering results, enabling richer user experiences like “related content”, semantic auto-completion, and contextual filtering.

Downloads: 0 This Week

Last Update: 2026-01-28
See Project
15

LLM-Pruner

On the Structural Pruning of Large Language Models

LLM-Pruner is an open-source framework designed to compress large language models through structured pruning techniques while maintaining their general capabilities. Large language models often require enormous computational resources, making them expensive to deploy and inefficient for many practical applications. LLM-Pruner addresses this issue by identifying and removing non-essential components within transformer architectures, such as redundant attention heads or feed-forward...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
16

Cradle framework

The Cradle framework is a first attempt at General Computer Control

...This approach allows agents to interact with any software interface without relying on specialized APIs or predefined automation scripts. The framework integrates reasoning, planning, and memory modules that help the agent understand its environment and execute long sequences of actions. Cradle agents are capable of performing tasks across a wide variety of environments, including computer applications and video games, demonstrating the generality of the approach. The architecture includes modules that allow agents to observe their environment, reflect on past actions, plan future steps, and accumulate useful skills for later tasks.

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
17

MatMul-Free LM

Implementation for MatMul-free LM

...The architecture relies on quantization-aware training and lightweight operations to replace conventional dense matrix multiplications with more efficient alternatives. These optimizations can significantly reduce memory consumption and potentially improve computational efficiency during both training and inference. The repository provides implementations of models at several parameter scales and includes tools for experimenting with the architecture using modern machine learning frameworks.

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
18

MobileLLM

MobileLLM: Optimizing Sub-billion Parameter Language Models

MobileLLM is a lightweight large language model (LLM) framework developed by Facebook Research, optimized for on-device deployment where computational and memory efficiency are critical. Introduced in the ICML 2024 paper “MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases”, it focuses on delivering strong reasoning and generalization capabilities in models under one billion parameters. The framework integrates several architectural innovations—SwiGLU activation, deep and thin network design, embedding sharing, and grouped-query attention (GQA)—to achieve a superior trade-off between model size, inference speed, and accuracy. ...

Downloads: 0 This Week

Last Update: 1 day ago
See Project
19

VibeThinker

Diversity-driven optimization and large-model reasoning ability

VibeThinker is a compact but high-capability open-source language model released by WeiboAI (Sina AI Lab). It contains about 1.5 billion parameters, far smaller than many “frontier” models, yet it is explicitly optimized for reasoning, mathematics, and code generation tasks rather than general open-domain chat. The innovation lies in its training methodology: the team uses what they call the Spectrum-to-Signal Principle (SSP), where a first stage emphasizes diversity of reasoning paths (the...

Downloads: 0 This Week

Last Update: 2025-11-19
See Project
20

VisualGLM-6B

Chinese and English multimodal conversational language model

VisualGLM-6B is an open-source multimodal conversational language model developed by ZhipuAI that supports both images and text in Chinese and English. It builds on the ChatGLM-6B backbone, with 6.2 billion language parameters, and incorporates a BLIP2-Qformer visual module to connect vision and language. In total, the model has 7.8 billion parameters. Trained on a large bilingual dataset — including 30 million high-quality Chinese image-text pairs from CogView and 300 million English pairs...

Downloads: 0 This Week

Last Update: 2 days ago
See Project
21

marqo

Tensor search for humans

A tensor-based search and analytics engine that seamlessly integrates with your applications, websites, and workflows. Marqo is a versatile and robust search and analytics engine that can be integrated into any website or application. Due to horizontal scalability, Marqo provides lightning-fast query times, even with millions of documents. Marqo helps you configure deep-learning models like CLIP to pull semantic meaning from images. It can seamlessly handle image-to-image, image-to-text and...

Downloads: 0 This Week

Last Update: 2026-04-02
See Project
22

Qwen2.5-Omni

Capable of understanding text, audio, vision, video

Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and to generate responses as both text and natural speech in real-time streaming. It uses a “Thinker-Talker” architecture and introduces innovations for aligning modalities over time (for example, synchronizing video and audio), robust speech generation, and low-VRAM/quantized versions that make the model more accessible. It holds...

Downloads: 0 This Week

Last Update: 2025-09-23
See Project
23

Grok-1

Open-source, high-performance Mixture-of-Experts large language model

...The accompanying GitHub repository provides JAX example code for loading and running the model. Due to its substantial size, utilizing Grok-1 requires a machine with significant GPU memory. The repository's MoE layer implementation prioritizes correctness over efficiency, avoiding the need for custom kernels. This is a full repo snapshot ZIP file of the Grok-1 code.

1 Review

Downloads: 31 This Week

Last Update: 2025-02-27
See Project
24

Mixtral offloading

Run Mixtral-8x7B models in Colab or consumer desktops

Mixtral-Offloading is an open-source project designed to enable efficient inference of large Mixture-of-Experts language models such as Mixtral-8x7B on hardware with limited GPU memory. The project implements techniques that allow model components to be dynamically moved between CPU memory and GPU memory during inference, significantly reducing the amount of GPU VRAM required to run the model. This approach takes advantage of the sparse activation properties of mixture-of-experts architectures, where only a subset of expert networks is used for each token during generation. ...

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
25

Ailice

AIlice is a fully autonomous, general-purpose AI agent

AIlice is an open-source autonomous AI agent framework built to function as a general-purpose assistant that can plan, decompose, and execute complex tasks through a structured multi-agent architecture. The project presents itself as a standalone assistant powered by open-source language models, with an internal design that treats user requests almost like executable programs rather than simple chat prompts. Its core IACT architecture allows the system to break large goals into smaller...

Downloads: 0 This Week

Last Update: 2026-03-15
See Project

© 2026 Slashdot Media. All Rights Reserved.
