Python Q-learning free download

Showing 19 open-source projects for "python q learning"

1

DeepSeek R1

Open-source, high-performance AI model with advanced reasoning

... integrates large-scale reinforcement learning (RL) without relying on supervised fine-tuning, enabling the model to develop advanced reasoning capabilities. This approach has resulted in performance comparable to leading models like OpenAI's o1, while maintaining cost-efficiency. To further support the research community, DeepSeek has released distilled versions of the model based on architectures such as LLaMA and Qwen.

1 Review

Downloads: 46 This Week

Last Update: 5 days ago

2

DeepSeek-V3

Powerful AI language model (MoE) optimized for efficiency/performance

... supervised fine-tuning and reinforcement learning to fully realize its capabilities. Evaluations indicate that it outperforms other open-source models and rivals leading closed-source models, achieving this with a training duration of 55 days on 2,048 Nvidia H800 GPUs, costing approximately $5.58 million.
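The training cost and duration quoted above can be sanity-checked with simple arithmetic. The sketch below assumes, beyond what the blurb states, that all 2,048 GPUs ran continuously for the full 55 days and that the ~$5.58 million covers GPU time only. The per-GPU-hour rate it prints is derived from those assumptions, not a figure reported by DeepSeek, and the helper names are hypothetical.

```python
# Back-of-the-envelope check of the DeepSeek-V3 training figures quoted above.
# Assumptions (not stated in the listing): every GPU ran for the full 55 days,
# and the ~$5.58M total covers GPU time only. The hourly rate is derived.

NUM_GPUS = 2_048
TRAINING_DAYS = 55
TOTAL_COST_USD = 5.58e6


def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours for num_gpus running continuously for `days` days."""
    return num_gpus * days * 24


def implied_hourly_rate(total_cost: float, hours: float) -> float:
    """Cost per GPU-hour implied by spreading total_cost over `hours`."""
    return total_cost / hours


if __name__ == "__main__":
    hours = gpu_hours(NUM_GPUS, TRAINING_DAYS)
    rate = implied_hourly_rate(TOTAL_COST_USD, hours)
    print(f"{hours:,.0f} GPU-hours, about ${rate:.2f} per GPU-hour")
```

Running it prints `2,703,360 GPU-hours, about $2.06 per GPU-hour`. If the GPUs were not all busy for the whole period, the true GPU-hour count is lower and the implied rate correspondingly higher.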

1 Review

Downloads: 44 This Week

Last Update: 5 days ago

3

MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with a ChatGPT-Style Training Pipeline

MedicalGPT trains large medical GPT models with a ChatGPT-style training pipeline. It implements each stage of that pipeline: secondary (continued) pretraining, supervised fine-tuning, reward modeling, and reinforcement learning.

Downloads: 12 This Week

Last Update: 2025-02-16

4

Phi-3-MLX

Phi-3.5 for Mac: Locally-run Vision and Language Models

Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.

Downloads: 19 This Week

Last Update: 2025-03-13

5

DB-GPT

Revolutionizing Database Interactions with Private LLM Technology

DB-GPT is an experimental open-source project that uses locally deployed large GPT models to interact with your data and environment. Because the models run locally, your data does not have to leave your environment; the project states that this eliminates the risk of data leakage and keeps your data 100% private and secure.

Downloads: 14 This Week

Last Update: 2025-06-13

6

VALL-E

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

... VALL-E exhibits emergent in-context learning capabilities and can synthesize high-quality personalized speech using only a 3-second enrolled recording of an unseen speaker as an acoustic prompt. Experimental results show that VALL-E significantly outperforms the previous state-of-the-art zero-shot TTS system in speech naturalness and speaker similarity. In addition, VALL-E can preserve the speaker's emotion and the acoustic environment of the acoustic prompt during synthesis.

Downloads: 19 This Week

Last Update: 2023-04-14

7

GPT-NeoX

Implementation of model parallel autoregressive transformers on GPUs

This repository hosts EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models and to accelerate research into large-scale training. For those looking for a TPU-centric codebase, we...

Downloads: 2 This Week

Last Update: 2023-03-23

8

FinGPT

Open-Source Financial Large Language Models!

FinGPT is an open-source large language model tailored specifically for financial tasks. Developed by the AI4Finance Foundation, it is designed to assist with various financial applications, such as forecasting, financial sentiment analysis, and portfolio management. FinGPT has been trained on a diverse range of financial datasets, making it a useful tool for finance professionals who want to apply AI to data-driven decision-making. The model is freely available on platforms like Hugging...

1 Review

Downloads: 36 This Week

Last Update: 2025-03-03

9

Warlock-Studio

AI suite for image and video upscaling and enhancement.

An open-source desktop application for AI-driven media enhancement, integrating state-of-the-art models for upscaling, restoration, and frame interpolation. Version 2.2 marks a major leap forward in stability and reliability, focused on ensuring your processing jobs run smoothly and complete successfully. Key enhancements: Enhanced Stability: Features a new logging system, proactive environment checks, and safe process management to prevent crashes and facilitate debugging. Resilient...

Downloads: 27 This Week

Last Update: 12 hours ago

10

GLM-4-32B-0414

Open Multilingual Multimodal Chat LMs

... with reinforcement learning and human preference alignment for improved instruction-following and function calling. Variants like GLM-Z1-32B-0414 offer deep reasoning and advanced mathematical problem-solving, while GLM-Z1-Rumination-32B-0414 specializes in long-form, complex research-style writing using scaled reinforcement learning and external search tools. Despite its large capacity, the model supports user-friendly local deployment and efficient fine-tuning with available scripts.

Downloads: 2 This Week

Last Update: 2025-06-27

11

chatglm-6b

Bilingual 6.2B parameter chatbot optimized for Chinese and English

ChatGLM-6B is a 6.2 billion parameter bilingual language model developed by THUDM, based on the General Language Model (GLM) framework. It is optimized for natural and fluent dialogue in both Chinese and English, supporting applications in conversational AI, question answering, and assistance. Trained on approximately 1 trillion tokens, the model benefits from supervised fine-tuning, feedback self-training, and reinforcement learning with human feedback to align its outputs with human...

Downloads: 0 This Week

Last Update: 2025-06-27

12

Mistral-7B-Instruct-v0.2

Instruction-tuned 7B model for chat and task-oriented text generation

Mistral-7B-Instruct-v0.2 is a fine-tuned version of the Mistral-7B-v0.2 language model, designed specifically for following instructions in a conversational format. It supports a 32k token context window, enabling more detailed and longer interactions compared to its predecessor. The model is trained to respond to user prompts formatted with [INST] and [/INST] tags, and it performs well in instruction-following tasks like Q&A, summarization, and explanations. It can be used via the official...
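The [INST] and [/INST] prompt format mentioned above can be illustrated with a small string-building helper. This is a minimal sketch: `build_mistral_prompt` is a hypothetical name, and the `<s>`/`</s>` markers and spacing follow the commonly documented Mistral-Instruct layout rather than anything stated in this listing. In real use, the tokenizer's own chat template should be treated as authoritative, since it inserts special tokens as token IDs rather than literal text.

```python
# Minimal sketch of the [INST] ... [/INST] chat layout described above.
# Assumption: "<s>"/"</s>" and the spacing follow the commonly documented
# Mistral-Instruct layout; prefer the tokenizer's chat template in practice.

from typing import List, Tuple


def build_mistral_prompt(history: List[Tuple[str, str]], user_message: str) -> str:
    """Format past (user, assistant) turns plus a new user message.

    Each user turn is wrapped in [INST] ... [/INST]; each completed assistant
    reply is closed with </s> so the model sees where earlier answers ended.
    """
    prompt = "<s>"
    for user_turn, assistant_turn in history:
        prompt += f"[INST] {user_turn} [/INST] {assistant_turn}</s>"
    # The final instruction is left open: the model generates its reply after [/INST].
    prompt += f"[INST] {user_message} [/INST]"
    return prompt


if __name__ == "__main__":
    print(build_mistral_prompt([("What is 2 + 2?", "4")], "And 3 + 3?"))
```

Running it prints `<s>[INST] What is 2 + 2? [/INST] 4</s>[INST] And 3 + 3? [/INST]`, leaving the prompt open for the model to write the next answer.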

Downloads: 0 This Week

Last Update: 2025-06-27

13

DeepSWE-Preview

State-of-the-art RL-trained coding agent for complex SWE tasks

DeepSWE-Preview is a 32.8B parameter open-source coding agent trained solely with reinforcement learning (RL) to perform complex software engineering (SWE) tasks. Built on top of Qwen3-32B, it achieves 59% accuracy on the SWE-Bench-Verified benchmark—currently the highest among open-weight models. The model navigates and edits large codebases using tools like a file editor, bash execution, and search, within the R2E-Gym environment. Its training emphasizes sparse reward signals, test-time...

Downloads: 0 This Week

Last Update: 2025-07-04

14

MiniMax-M1

Open-weight, large-scale hybrid-attention reasoning model

... for very long sequences. Trained using large-scale reinforcement learning on diverse tasks, it excels in mathematics, software engineering, agentic tool use, and long-context understanding benchmarks. It outperforms other open-weight models like DeepSeek R1 and Qwen3-235B on complex reasoning and coding challenges. MiniMax-M1 is available in two versions with 40K and 80K token thinking budgets, offering scalable performance based on your application needs.

Downloads: 0 This Week

Last Update: 2025-07-01

15

Meta-Llama-3-8B-Instruct

Instruction-tuned 8B LLM by Meta for helpful, safe English dialogue

Meta-Llama-3-8B-Instruct is an instruction-tuned large language model from Meta’s Llama 3 family, optimized for safe and helpful English dialogue. It uses an autoregressive transformer architecture with Grouped-Query Attention (GQA) and supports an 8k token context length. Fine-tuned using supervised learning and reinforcement learning with human feedback (RLHF), the model achieves strong results on benchmarks like MMLU, GSM8K, and HumanEval. Trained on over 15 trillion tokens of publicly...

Downloads: 0 This Week

Last Update: 2025-06-27

16

Llama-2-7b-chat-hf

Dialogue-optimized 7B language model for safe and helpful chatting

Llama-2-7b-chat-hf is a fine-tuned large language model developed by Meta, designed specifically for dialogue use cases. With 7 billion parameters and built on an optimized transformer architecture, it uses supervised fine-tuning and reinforcement learning with human feedback (RLHF) to enhance helpfulness, coherence, and safety. It outperforms most open-source chat models and rivals proprietary systems like ChatGPT in human evaluations. Trained on 2 trillion tokens of public text and over 1...

Downloads: 0 This Week

Last Update: 2025-06-27

17

Llama-3.1-8B-Instruct

Multilingual 8B-parameter chat-optimized LLM fine-tuned by Meta

Llama-3.1-8B-Instruct is a multilingual, instruction-tuned language model developed by Meta, designed for high-quality dialogue generation across eight languages, including English, Spanish, French, German, Italian, Portuguese, Hindi, and Thai. It uses a transformer-based, autoregressive architecture with Grouped-Query Attention and supports a 128k token context window. The model was fine-tuned using a combination of supervised fine-tuning (SFT), reinforcement learning with human feedback (RLHF...

Downloads: 0 This Week

Last Update: 2025-06-27

18

Llama-3.3-70B-Instruct

Llama-3.3-70B-Instruct is a multilingual AI model optimized for helpful chat

... and refined using both supervised fine-tuning and reinforcement learning with human feedback. It supports long context windows up to 128k tokens and enables advanced tool use for function calling and integration. Llama-3.3 is distributed under the Llama Community License, allowing commercial use within specific limits, and requires proper attribution and adherence to Meta's Acceptable Use Policy.

Downloads: 0 This Week

Last Update: 2025-06-27

19

Llama-2-70b-chat-hf

Llama-2-70B-Chat is the largest fine-tuned chat LLM in Meta’s open Llama 2 family

Llama-2-70B-Chat is the largest fine-tuned model in Meta’s Llama 2 chat family, optimized for dialogue and aligned using supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). It has 70 billion parameters and uses a transformer architecture with grouped-query attention (GQA) to improve inference scalability. Trained on 2 trillion tokens from publicly available sources and over a million human-annotated examples, the model outperforms most open-source chat models and rivals...
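One concrete way grouped-query attention improves inference scalability is by shrinking the key/value (KV) cache kept for every generated token. The sketch below compares per-token cache size for standard multi-head attention and GQA. The configuration values (80 layers, 64 query heads, 8 key/value heads, head dimension 128, 2-byte entries) are assumptions for illustration and are not stated in this listing; `kv_cache_bytes_per_token` is a hypothetical helper.

```python
# Rough illustration of why grouped-query attention (GQA) helps inference scale:
# it shrinks the key/value cache stored for every token in every sequence.
# Assumed configuration (not stated in the listing): 80 layers, 64 query heads,
# 8 key/value heads, head dimension 128, 16-bit (2-byte) cache entries.

NUM_LAYERS = 80
NUM_QUERY_HEADS = 64
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_value: int) -> int:
    """Bytes of K and V cached per token across all layers (factor 2 = K + V)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    # Standard multi-head attention keeps one K/V head per query head.
    mha = kv_cache_bytes_per_token(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, BYTES_PER_VALUE)
    # GQA shares each K/V head across a group of query heads (64 / 8 = 8 per group).
    gqa = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, BYTES_PER_VALUE)
    print(f"MHA: {mha / 2**10:.0f} KiB/token, GQA: {gqa / 2**10:.0f} KiB/token, "
          f"{mha // gqa}x smaller")
```

Under these assumptions it prints `MHA: 2560 KiB/token, GQA: 320 KiB/token, 8x smaller`. Because the cache is stored per token and per sequence, the saving grows linearly with context length and batch size.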

Downloads: 0 This Week

Last Update: 2025-06-27