Search Results for "data quality" - Page 5

Showing 133 open source projects for "data quality"

1

Data Science Collected Resources

Carefully curated resource links for data science in one place

...Its goal is to provide learners and practitioners with easy access to high-quality resources related to data science tools, programming languages, cloud platforms, and machine learning techniques. The repository includes links to materials discussing topics such as artificial intelligence research, AWS infrastructure, machine learning algorithms, and data analysis tools. It also contains supplementary documents like cheat sheets and machine learning notes that help readers review important concepts quickly.

Downloads: 0 This Week

Last Update: 2026-03-11
2

Super-PDF-Editor-Lite

World's most comprehensive, powerful, process-based PDF editor

...OCR works on PDF files, including scanned PDFs, and on image files in multiple formats. Automatic and manual image enhancement improves OCR accuracy and quality. Supports 165+ languages with three language data sets, and multiple languages can be used at once. International languages: 127 languages, in High, Medium, and Fast quality. Supported inputs: scanned images (jpg, png, gif, tiff, bmp), multi-page TIFF and GIF, and scanned PDFs.

3 Reviews

Downloads: 4 This Week

Last Update: 2023-02-02
3

AllenNLP

An open-source NLP research library, built on PyTorch

AllenNLP makes it easy to design and evaluate new deep learning models for nearly any NLP problem, along with the infrastructure to easily run them in the cloud or on your laptop. AllenNLP includes reference implementations of high quality models for both core NLP problems (e.g. semantic role labeling) and NLP applications (e.g. textual entailment). AllenNLP supports loading "plugins" dynamically. A plugin is just a Python package that provides custom registered classes or additional...

Downloads: 0 This Week

Last Update: 2022-10-18
4

WaveRNN

WaveRNN Vocoder + TTS

WaveRNN is a PyTorch implementation of DeepMind’s WaveRNN vocoder, bundled with a Tacotron-style TTS front end to form a complete text-to-speech stack. As a vocoder, WaveRNN models raw audio with a compact recurrent neural network that can generate high-quality waveforms more efficiently than many traditional autoregressive models. The repository includes scripts and code for preprocessing datasets such as LJSpeech, training Tacotron to produce mel spectrograms, training WaveRNN on those spectrograms (with optional GTA data), and finally generating audio. A quick_start.py script allows users to immediately synthesize example sentences from a pretrained model and inspect both generated audio and attention plots. ...

Downloads: 0 This Week

Last Update: 2025-11-28
5

EZStacking

EZStacking is a Jupyter notebook generator for machine learning

EZStacking is a Jupyter notebook generator for supervised learning problems using Scikit-Learn pipelines and stacked generalization. EZStacking handles classification and regression problems for structured data. It can also be viewed as a development tool, because a notebook generated with EZStacking contains: an exploratory data analysis (EDA) used to assess data quality; a modelling step producing a reduced-size stacked estimator; and a server returning a prediction, a measure of the quality of the input data, and the execution time.

Downloads: 0 This Week

Last Update: 2022-06-30
6

MAE (Masked Autoencoders)

PyTorch implementation of MAE

MAE (Masked Autoencoders) is a self-supervised learning framework for visual representation learning using masked image modeling. It trains a Vision Transformer (ViT) by randomly masking a high percentage of image patches (typically 75%) and reconstructing the missing content from the remaining visible patches. This forces the model to learn semantic structure and global context without supervision. The encoder processes only the visible patches, while a lightweight decoder reconstructs the...

Downloads: 2 This Week

Last Update: 2025-10-06
7

GANformer

Generative Adversarial Transformers

This is an implementation of the GANformer model, a novel and efficient type of transformer, explored for the task of image generation. The network employs a bipartite structure that enables long-range interactions across the image while maintaining linear computational efficiency, so it can readily scale to high-resolution synthesis. The model iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of...

Downloads: 0 This Week

Last Update: 2023-03-22
8

Kite

Primary Kite repo, private bits replaced with XXXXXXX

The main Kite repo (originally kiteco/kiteco) was intended for private use. It has been lightly adapted for publication here by replacing private information with XXXXXXX. As a result, many components here may not work out of the box. We used a variety of infrastructure, on a mix of cloud platforms, depending on what was most economical, though it was mostly on AWS. You should be able to develop, build, and test Kite entirely on your local machine. However, we do have cloud instances & VMs...

Downloads: 6 This Week

Last Update: 2025-07-16
9

GiantMIDI-Piano

Classical piano MIDI dataset

GiantMIDI-Piano is a large-scale symbolic classical piano music dataset built by applying the piano_transcription system on a vast collection of piano performance recordings. The dataset contains thousands of piano works, spanning a large number of composers and styles, with each piece transcribed into high-precision MIDI files capturing note events, pedal usage, velocities, etc. It provides a resource for music information retrieval (MIR), symbolic music modeling, composer classification,...

Downloads: 5 This Week

Last Update: 2025-12-02
10

Parakeet

PAddle PARAllel text-to-speech toolKIT

PAddle PARAllel text-to-speech toolKIT (supporting Tacotron2, Transformer TTS, FastSpeech2/FastPitch, SpeedySpeech, WaveFlow and Parallel WaveGAN). Parakeet aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It is built on the PaddlePaddle dynamic graph and includes many influential TTS models. To make it easier to use the existing TTS models directly and to develop new ones, Parakeet selects typical models and provides...

Downloads: 4 This Week

Last Update: 2023-03-24
11

XLM (Cross-lingual Language Model)

PyTorch original implementation of Cross-lingual Language Model

XLM (Cross-lingual Language Model) is a family of multilingual pretraining methods that align representations across languages to enable strong zero-shot transfer. It popularized objectives like Masked Language Modeling (MLM) across many languages and Translation Language Modeling (TLM) that jointly trains on parallel sentence pairs to tighten cross-lingual alignment. Using a shared subword vocabulary, XLM learns language-agnostic features that work well for classification and sequence...

Downloads: 0 This Week

Last Update: 2025-10-07
12

OpenAI Glow

Code for "Glow: Generative Flow with Invertible 1x1 Convolutions"

Glow is an open source generative model released by OpenAI that demonstrates flow-based generative modeling techniques. Unlike models that rely on approximate inference, Glow uses invertible transformations to directly learn the data distribution, allowing for exact likelihood computation and efficient sampling. The model is capable of producing high-quality synthetic images while maintaining interpretable latent spaces that enable meaningful manipulation of generated outputs. Glow’s architecture is based on reversible layers and efficient flow operations, which allow large-scale training while keeping memory usage manageable. ...

Downloads: 1 This Week

Last Update: 4 days ago
13

CC-Net

Tools to download and cleanup Common Crawl data

cc_net provides tools to download, segment, clean, and filter Common Crawl to build large-scale text corpora, including monolingual datasets and the multilingual CC-100 collection introduced in the associated paper. It includes pipelines to fetch snapshots, extract text, de-duplicate, identify language, and apply quality filtering based on heuristics and language models. The outputs are intended for pretraining language models and for creating standardized corpora that can be reproduced or...

Downloads: 0 This Week

Last Update: 2025-10-11
14

NLP Best Practices

Natural Language Processing Best Practices & Examples

In recent years, natural language processing (NLP) has seen quick growth in quality and usability, and this has helped to drive business adoption of artificial intelligence (AI) solutions. In the last few years, researchers have been applying newer deep learning methods to NLP. Data scientists started moving from traditional methods to state-of-the-art (SOTA) deep neural network (DNN) algorithms which use language models pretrained on large text corpora.

Downloads: 0 This Week

Last Update: 2022-08-01
15

Image Quality Assessment

Convolutional Neural Networks to predict aesthetic quality of images

Image Quality Assessment is an open-source deep learning project that implements neural models for predicting the aesthetic and technical quality of digital images. The repository provides an implementation inspired by the NIMA (Neural Image Assessment) research approach, which uses convolutional neural networks trained on human-annotated datasets to estimate image quality scores.

Downloads: 0 This Week

Last Update: 2026-03-12
16

textgenrnn

Easily train your own text-generating neural network

With textgenrnn you can easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code. It uses a modern neural network architecture with techniques such as attention-weighting and skip-embedding to accelerate training and improve model quality. Train on and generate text at either the character level or the word level. Configure the RNN size, the number of RNN layers, and whether to use bidirectional RNNs. Train on any generic input text...

Downloads: 0 This Week

Last Update: 2021-11-24
17

EverydayWechat

Python tool that automates WeChat messages, replies, & group utilities

...In addition to personal messaging automation, the project includes a group assistant that can respond to queries and provide useful information within chat groups. These group utilities can retrieve data such as weather conditions, calendar details, garbage classification information, movie box office statistics, delivery tracking updates, and air quality reports.

1 Review

Downloads: 4 This Week

Last Update: 5 days ago
18

MatchZoo

Facilitating the design, comparison and sharing of deep text models

The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, such as document retrieval, question answering, conversational response ranking, and paraphrase identification. With a unified data processing pipeline, simplified model configuration, and automatic hyper-parameter tuning, MatchZoo is flexible and easy to use. Preprocess your input data in three lines of code and keep track of the parameters to be passed into the model. ...

Downloads: 0 This Week

Last Update: 2022-08-03
19

Chatito

Dataset generation for AI chatbots, NLP tasks

Chatito is a tool that helps generate datasets for training and validating chatbot models using a simple domain-specific language (DSL).

Downloads: 0 This Week

Last Update: 2025-01-30
20

PyTorch-BigGraph

Generate embeddings from large-scale graph-structured data

PyTorch-BigGraph (PBG) is a system for learning embeddings on massive graphs—think billions of nodes and edges—using partitioning and distributed training to keep memory and compute tractable. It shards entities into partitions and buckets edges so that each training pass only touches a small slice of parameters, which drastically reduces peak RAM and enables horizontal scaling across machines. PBG supports multi-relation graphs (knowledge graphs) with relation-specific scoring functions,...

Downloads: 0 This Week

Last Update: 2025-10-07
21

OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition

...The toolkit includes ready-made models for neural machine translation, automatic speech recognition, speech synthesis, language modeling, and additional NLP tasks such as sentiment analysis. It supports multi-GPU and multi-node data-parallel training, and integrates with Horovod to scale out across large GPU clusters. Mixed-precision support (float16) is optimized for NVIDIA Volta and Turing GPUs, allowing significant speedups and memory savings without sacrificing model quality. The project comes with configuration-driven training scripts, documentation, and examples that demonstrate how to set up pipelines for tasks.

Downloads: 0 This Week

Last Update: 2025-11-28
22

JuliusModels

Open source speech models for Julius in English and other languages.

Open source speech models for the Julius speech decoder. The aim is to give a wider community of speech recognition enthusiasts access to quality models that they can use in their own projects on different OS platforms (Unix, Windows, etc.). All of the models are based on the HTK modelling software and data sets freely available on the Internet.

Downloads: 2 This Week

Last Update: 2018-05-11
23

Speechalyzer

Process large speech data with respect to transcription, labeling and annotation

Speechalyzer is a tool for the daily work of a 'speech worker'. It is optimized to process large speech data sets with respect to transcription, labeling and annotation. It is implemented as a client-server framework in Java and interfaces with software for speech recognition, synthesis, speech classification and quality evaluation. Its main applications are processing training data for speech recognition and classification models and running benchmarking tests on speech-to-text, text-to-speech and speech classification software systems.

Downloads: 2 This Week

Last Update: 2016-04-27
24

DBpedia Spotlight

DBpedia Spotlight is a tool for automatically annotating

It is a tool for automatically annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia. With a four-step approach, DBpedia Spotlight performs named entity extraction, including entity detection and name resolution. It can also be used for named entity recognition, among other information extraction tasks. It supports reusing, interlinking, and making semantic queries among high-quality open datasets, and extracting meaning from unstructured data.

Downloads: 0 This Week

Last Update: 2023-05-23
25

A Data Generator

A tool to generate synthetic test data useful for record matchers

With the growing amount of information from multiple sources, it has become very hard to relate information to the correct real-life entities. Record matching software tries to solve this with machine learning techniques. To do this effectively, it is necessary to train the record matcher with proper test data that is identical to real-life data. Hence, there is a need for a data generator that creates synthetic data for evaluating the quality and capability of record matching software. A data generator creates qualitative test data that accounts for the various real-life data glitches introduced through means such as human data entry, voice dictation and data scanning. ...

Downloads: 0 This Week

Last Update: 2013-12-08
© 2026 Slashdot Media. All Rights Reserved.