Showing 823 open source projects for "python data analysis"

  • 1
    GHAPACK is a suite of tools around the Generalized Hebbian Algorithm. The Generalized Hebbian Algorithm (GHA) is a neural-net-like algorithm that allows the eigen decomposition of a dataset to be "learned" from serially-presented data.
    Downloads: 0 This Week
    See Project
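The core update behind the GHA is compact enough to sketch. Below is a toy, pure-Python version of the single-component case (Oja's rule, which is the first row of Sanger's GHA update): the weight vector is nudged by each serially-presented sample and drifts toward the dataset's top principal direction. This is an illustrative sketch, not GHAPACK's actual API.

```python
import random

def gha_first_component(samples, lr=0.01, epochs=100):
    """Toy single-component GHA (Oja's rule): w += lr * y * (x - y * w),
    where y = w . x. The weight vector converges toward the top
    principal direction of the data, with norm tending to 1."""
    dim = len(samples[0])
    w = [1.0 / dim] * dim
    for _ in range(epochs):
        for x in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

# Serially-presented 2-D data with most variance along the first axis.
random.seed(0)
data = [(random.gauss(0, 3.0), random.gauss(0, 0.5)) for _ in range(200)]
w = gha_first_component(data)
```

After training, `w` should point mostly along the first axis, the direction of greatest variance.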
  • 2
    Weka++ is a collection of machine learning and data mining algorithm implementations ported from Weka (http://www.cs.waikato.ac.nz/ml/weka/) from Java to C++, with enhancements for usability as embedded components.
    Downloads: 0 This Week
    See Project
  • 3
    UGLi ML: The Undirected Graphical Library for Machine Learning. Easy, agile learning and inference from structured/relational data using Markov Networks and Conditional Random Fields.
    Downloads: 0 This Week
    See Project
  • 4
    Cellicone is a project to develop an artificial life organism with the necessary components to make it comparable to biological life as we know it. This includes components ranging from proteins to cells to organs to limbs, and many steps between.
    Downloads: 0 This Week
    See Project
  • 5
** IMPORTANT NOTICE ** 10 Feb 2006: Code is being moved to the SMI subversion repository (http://smi-protege.stanford.edu/svn/owl/trunk/). The project will continue to be open source. Protege-OWL info at: http://protege.stanford.edu/overview/protege-owl.html
    Downloads: 0 This Week
    See Project
  • 6

    Cinefile

    A category-based approach to exploring film data.

    Cinefile is a prototype of a category-based method of database exploration. It allows the user to identify abstract categories of films by providing examples of category members, learns to classify films as belonging or not belonging to those categories, and provides a graphical interface for exploring and comparing categories. Cinefile is designed to work with data retrieved from the Internet Movie Database (imdb.com). This data is used for classification and is the subject of the category...
    Downloads: 0 This Week
    See Project
  • 7
    xlm-roberta-base

    Multilingual RoBERTa trained on 100 languages for NLP tasks

    xlm-roberta-base is a multilingual transformer model trained by Facebook AI on 2.5TB of filtered CommonCrawl data spanning 100 languages. It is based on the RoBERTa architecture and pre-trained using a masked language modeling (MLM) objective. Unlike models like GPT, which predict the next word, this model learns bidirectional context by predicting masked tokens, enabling robust sentence-level representations. xlm-roberta-base is particularly suited for cross-lingual understanding...
    Downloads: 0 This Week
    See Project
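The fill-in-the-blank objective described above can be illustrated with a deliberately tiny stand-in: instead of a transformer, count which word occurs between each (left, right) context pair and "predict" the most frequent one. This only mirrors the bidirectional idea, using both sides of the blank, and is in no way how the model itself works.

```python
from collections import Counter, defaultdict

def train_mlm_toy(sentences):
    """Toy stand-in for masked prediction: for each (left, right) context
    pair, count which word appeared between them."""
    ctx = defaultdict(Counter)
    for s in sentences:
        toks = s.split()
        for i in range(1, len(toks) - 1):
            ctx[(toks[i - 1], toks[i + 1])][toks[i]] += 1
    return ctx

def predict_masked(ctx, left, right):
    """Predict the masked word from its bidirectional context."""
    c = ctx.get((left, right))
    return c.most_common(1)[0][0] if c else None

ctx = train_mlm_toy(["the cat sat", "the cat ran", "a dog sat"])
```

A next-word model sees only the left context ("the ..."); the masked objective conditions on both "the" and "sat" at once.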
  • 8
    twitter-roberta-base-sentiment-latest

    RoBERTa model for English sentiment analysis on Twitter data

    twitter-roberta-base-sentiment-latest is a RoBERTa-based transformer model fine-tuned on over 124 million tweets collected between 2018 and 2021. Designed for sentiment analysis in English, it categorizes tweets as Negative, Neutral, or Positive. The model is optimized using the TweetEval benchmark and integrated with the TweetNLP ecosystem for seamless deployment. Its training emphasizes real-world, social media content, making it highly effective for analyzing informal or noisy text...
    Downloads: 0 This Week
    See Project
  • 9
    mms-300m-1130-forced-aligner

    CTC-based forced aligner for audio-text in 158 languages

    ... to the TorchAudio forced alignment API. Users can integrate it easily through the Python package ctc-forced-aligner, and it supports GPU acceleration via PyTorch. The alignment pipeline includes audio processing, emission generation, tokenization, and span detection, making it suitable for speech analysis, transcription syncing, and dataset creation. This model is especially useful for researchers and developers working with low-resource languages or building multilingual speech systems.
    Downloads: 0 This Week
    See Project
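The essence of forced alignment is a Viterbi search: assign each audio frame to one token so that token order is preserved and the summed emission log-probability is maximal. The sketch below is a simplified monotonic version with no CTC blank token, not the `ctc-forced-aligner` package's actual API.

```python
def forced_align(emissions, tokens):
    """Simplified forced alignment (monotonic Viterbi, no CTC blank).
    emissions: per-frame dicts of token -> log-probability.
    Returns, for each frame, an index into `tokens` (non-decreasing).
    Assumes at least as many frames as tokens."""
    T, N = len(emissions), len(tokens)
    NEG = float("-inf")
    dp = [[NEG] * N for _ in range(T)]   # dp[t][j]: best score with token j at frame t
    back = [[0] * N for _ in range(T)]   # back[t][j]: token index used at frame t-1
    dp[0][0] = emissions[0][tokens[0]]   # first frame must start the first token
    for t in range(1, T):
        for j in range(N):
            stay = dp[t - 1][j]                       # keep emitting the same token
            move = dp[t - 1][j - 1] if j > 0 else NEG  # advance to the next token
            best = stay if stay >= move else move
            if best == NEG:
                continue
            back[t][j] = j if stay >= move else j - 1
            dp[t][j] = best + emissions[t][tokens[j]]
    path = [N - 1]                        # force ending on the last token
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

On a clean 4-frame example where the first two frames favor "a" and the last two favor "b", the alignment comes out as `[0, 0, 1, 1]`.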
  • 10
    Meta-Llama-3-8B-Instruct

    Instruction-tuned 8B LLM by Meta for helpful, safe English dialogue

    ... available data and more than 10 million human-annotated examples, it excludes any Meta user data. The model is released under the Meta Llama 3 Community License, which allows commercial use for organizations with fewer than 700 million MAUs, and imposes clear use, attribution, and redistribution rules. Meta provides safety tools like Llama Guard 2 and Code Shield to help developers implement system-level safety in applications.
    Downloads: 0 This Week
    See Project
  • 11

    text_summurization_abstractive_methods

Multiple implementations of abstractive text summarization

This repo collects multiple implementations of abstractive approaches to text summarization. It is built to run on Google Colab in a single notebook, so you only need an internet connection rather than a powerful machine. All code examples are in Jupyter format, and you don't have to download data to your device because the notebooks connect to Google Drive.
    Downloads: 0 This Week
    See Project
  • 12
    bert-base-chinese

    BERT-based Chinese language model for fill-mask and NLP tasks

    ... applications, including text classification, named entity recognition, and sentiment analysis in Chinese. It uses the same structure as the BERT base uncased English model, but it is trained entirely on Chinese data. While robust, like other large language models, it may reflect or amplify existing biases present in its training data. Due to limited transparency around the dataset and evaluation metrics, users should test it thoroughly before deployment in sensitive contexts.
    Downloads: 0 This Week
    See Project
  • 13
    Llama-2-7b

    7B-parameter foundational LLM by Meta for text generation tasks

    Llama-2-7B is a foundational large language model developed by Meta as part of the Llama 2 family, designed for general-purpose text generation in English. It has 7 billion parameters and uses an optimized transformer-based, autoregressive architecture. Trained on 2 trillion tokens of publicly available data, it serves as the base for fine-tuned models like Llama-2-Chat. The model is pretrained only, meaning it is not optimized for dialogue but can be adapted for various natural language...
    Downloads: 0 This Week
    See Project
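"Autoregressive" here just means tokens are generated one at a time, each conditioned on everything generated so far. A toy greedy-decoding loop makes the control flow concrete; the `next_scores` function is a stand-in for a real model's forward pass and is purely illustrative.

```python
def greedy_decode(next_scores, prompt, max_new_tokens):
    """Autoregressive greedy decoding: repeatedly append the highest-scoring
    next token given the sequence generated so far."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_scores(tokens)   # token -> score for the next position
        tokens.append(max(scores, key=scores.get))
    return tokens

# Stand-in "model": a bigram score table keyed on the last token.
bigram = {"the": {"cat": 0.9, "dog": 0.4},
          "cat": {"sat": 0.8, "ran": 0.5},
          "sat": {"down": 0.7, "up": 0.2}}
out = greedy_decode(lambda t: bigram[t[-1]], ["the"], 3)
```

Here `out` is `["the", "cat", "sat", "down"]`; a real LLM replaces the table lookup with a transformer forward pass over the whole prefix.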
  • 14
    GPT-2

    GPT-2 is a 124M parameter English language model for text generation

    GPT-2 is a pretrained transformer-based language model developed by OpenAI for generating natural language text. Trained on 40GB of internet data from outbound Reddit links (excluding Wikipedia), it uses causal language modeling to predict the next token in a sequence. The model was trained without human labels and learns representations of English that support text generation, feature extraction, and fine-tuning. GPT-2 uses a byte-level BPE tokenizer with a vocabulary of 50,257 and handles...
    Downloads: 0 This Week
    See Project
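The BPE tokenizer mentioned above is trained by repeatedly merging the most frequent adjacent symbol pair until a target vocabulary size is reached. Below is a minimal character-level sketch of that training loop; GPT-2's real tokenizer operates at the byte level and ends up with 50,257 entries, so this toy is for illustration only.

```python
from collections import Counter

def bpe_train(word_freqs, num_merges):
    """Toy BPE training: split words into characters, then repeatedly merge
    the most frequent adjacent pair across the corpus."""
    vocab = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for sym, f in vocab.items():
            for a, b in zip(sym, sym[1:]):
                pairs[(a, b)] += f
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = {}
        for sym, f in vocab.items():     # apply the merge everywhere
            out, i = [], 0
            while i < len(sym):
                if i + 1 < len(sym) and (sym[i], sym[i + 1]) == best:
                    out.append(sym[i] + sym[i + 1]); i += 2
                else:
                    out.append(sym[i]); i += 1
            new_vocab[tuple(out)] = f
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_train({"low": 5, "lower": 2, "lowest": 3}, num_merges=2)
```

On this tiny corpus the first two learned merges are `("l", "o")` and then `("lo", "w")`, so "low" becomes a single symbol.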
  • 15
    starcoder

    Code generation model trained on 80+ languages with FIM support

    ... natural language. While it is not an instruction-tuned model, it can act as a capable technical assistant when prompted appropriately. Developers can use it for general-purpose code generation, with fine control over prefix/middle/suffix tokens. The model has some limitations: generated code may contain bugs or licensing constraints, and attribution must be observed when output resembles training data. StarCoder is licensed under the BigCode OpenRAIL-M license.
    Downloads: 0 This Week
    See Project
  • 16

    Savant

Python Computer Vision & Video Analytics Framework With Batteries Included

    Savant is an open-source, high-level framework for building real-time, streaming, highly efficient multimedia AI applications on the Nvidia stack. It helps to develop dynamic, fault-tolerant inference pipelines that utilize the best Nvidia approaches for data center and edge accelerators. Savant is built on DeepStream and provides a high-level abstraction layer for building inference pipelines. It is designed to be easy to use, flexible, and scalable. It is a great choice for building smart...
    Downloads: 0 This Week
    See Project
  • 17
    Nanonets-OCR-s

    State-of-the-art image-to-markdown OCR model

    Nanonets-OCR-s is an advanced image-to-markdown OCR model that transforms documents into structured and semantically rich markdown. It goes beyond basic text extraction by intelligently recognizing content types and applying meaningful tags, making the output ideal for Large Language Models (LLMs) and automated workflows. The model expertly converts mathematical equations into LaTeX syntax, distinguishing between inline and display modes for accuracy. It also generates descriptive <img> tags...
    Downloads: 0 This Week
    See Project
  • 18
    Llama-3.1-8B-Instruct

    Multilingual 8B-parameter chat-optimized LLM fine-tuned by Meta

    ...), and high-quality human and synthetic safety data. It excels at conversational AI, tool use, coding, and multilingual reasoning, achieving strong performance across a wide range of academic and applied benchmarks. The model is released under the Llama 3.1 Community License, which permits commercial use for organizations with fewer than 700 million monthly active users, provided they comply with Meta’s Acceptable Use Policy.
    Downloads: 0 This Week
    See Project
  • 19
    phi-2

    Small, high-performing language model for QA, chat, and code tasks

    Phi-2 is a 2.7 billion parameter Transformer model developed by Microsoft, designed for natural language processing and code generation tasks. It was trained on a filtered dataset of high-quality web content and synthetic NLP texts created by GPT-3.5, totaling 1.4 trillion tokens. Phi-2 excels in benchmarks for common sense, language understanding, and logical reasoning, outperforming most models under 13B parameters despite not being instruction-tuned or aligned via RLHF. It performs best...
    Downloads: 0 This Week
    See Project
  • 20
    whisper-large-v3-turbo

    Whisper-large-v3-turbo delivers fast, multilingual speech recognition

    Whisper-large-v3-turbo is a high-performance automatic speech recognition (ASR) and translation model developed by OpenAI, based on a pruned version of Whisper large-v3. It reduces decoding layers from 32 to 4, offering significantly faster inference with only minor degradation in accuracy. Trained on over 5 million hours of multilingual data, it handles speech transcription, translation, and language identification across 99 languages. It supports advanced decoding strategies like beam search...
    Downloads: 0 This Week
    See Project
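Beam search, one of the decoding strategies mentioned above, keeps the k best partial hypotheses at each step instead of committing greedily. A toy sketch over a stand-in scoring function (not Whisper's actual decoder):

```python
def beam_search(step_scores, start, steps, beam_width=2):
    """Toy beam search: expand every beam with every candidate token, then
    keep the top-k sequences by summed log-score."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_scores(seq).items():
                candidates.append((seq + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]

def step_scores(seq):
    """Stand-in for model log-probs, keyed on the last token."""
    table = {"s": {"a": -0.2, "b": -0.3},
             "a": {"x": -2.0},
             "b": {"x": -0.1}}
    return table[seq[-1]]

best = beam_search(step_scores, "s", steps=2, beam_width=2)
```

This example is rigged so greedy decoding fails: it would commit to "a" (score -0.2) and end at -2.2 total, while the beam keeps "b" alive and finds the better path at -0.4.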
  • 21
    Llama-2-7b-hf

    Llama-2-7B is a 7B-parameter transformer model for text generation

    Llama-2-7B is a foundational large language model developed by Meta as part of the Llama 2 family, designed for general-purpose text generation tasks. It is a 7 billion parameter auto-regressive transformer trained on 2 trillion tokens from publicly available sources, using an optimized architecture without Grouped-Query Attention (GQA). This model is the pretrained version, intended for research and commercial use in English, and can be adapted for downstream applications such as...
    Downloads: 0 This Week
    See Project
  • 22
    OpenVLA 7B

    Vision-language-action model for robot control via images and text

    ... supports real-world robotics tasks, with robust generalization to environments seen in pretraining. Its actions include delta values for position, orientation, and gripper status, and can be un-normalized based on robot-specific statistics. OpenVLA is MIT-licensed, fully open-source, and designed collaboratively by Stanford, Berkeley, Google DeepMind, and TRI. Deployment is facilitated via Python and Hugging Face tools, with flash attention support for efficient inference.
    Downloads: 0 This Week
    See Project
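Un-normalizing predicted action deltas against robot-specific statistics is a per-dimension affine map. The sketch below assumes simple mean/std statistics per dimension, which is illustrative and not necessarily OpenVLA's exact normalization scheme.

```python
def unnormalize(action, stats):
    """Map normalized action deltas back to robot units, dimension by
    dimension: a_raw = a_norm * std + mean (illustrative scheme only)."""
    return [a * s["std"] + s["mean"] for a, s in zip(action, stats)]

# Hypothetical per-dimension statistics for a 3-dof action.
stats = [{"mean": 0.0, "std": 0.05},   # delta-x (metres)
         {"mean": 0.0, "std": 0.05},   # delta-y (metres)
         {"mean": 0.5, "std": 0.5}]    # gripper open fraction
raw = unnormalize([0.2, -1.0, 1.0], stats)
```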
  • 23
    chronos-t5-small

    Time series forecasting model using T5 architecture with 46M params

    chronos-t5-small is part of Amazon’s Chronos family of time series forecasting models built on transformer-based language model architectures. It repurposes the T5 encoder-decoder design for time series data by transforming time series into discrete tokens via scaling and quantization. With 46 million parameters and a reduced vocabulary of 4096 tokens, this small variant balances performance with efficiency. Trained on both real-world and synthetic time series datasets, it supports...
    Downloads: 0 This Week
    See Project
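The scaling-and-quantization step described above can be sketched directly: divide the series by its mean absolute value, then map each scaled value into one of a fixed number of uniform bins. The bin range and width below are made up for illustration; the real Chronos binning differs in detail.

```python
def tokenize_series(series, vocab_size=4096, low=-15.0, high=15.0):
    """Toy Chronos-style tokenization: mean-abs scaling, then uniform
    quantization of the scaled values into vocab_size bins."""
    scale = sum(abs(x) for x in series) / len(series) or 1.0
    width = (high - low) / vocab_size
    tokens = [min(vocab_size - 1, max(0, int((x / scale - low) / width)))
              for x in series]
    return tokens, scale

def detokenize(tokens, scale, vocab_size=4096, low=-15.0, high=15.0):
    """Invert the quantization: take each bin's centre and undo the scaling."""
    width = (high - low) / vocab_size
    return [(low + (t + 0.5) * width) * scale for t in tokens]

tokens, scale = tokenize_series([1.0, 2.0, 3.0])
recon = detokenize(tokens, scale)
```

The round trip is lossy only up to half a bin width, which is what lets an encoder-decoder language model forecast over the discrete tokens.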