dataset free download

Showing 34 open source projects for "dataset"

1

BIMserver

The open source BIMserver platform

...The main advantage of this approach is the ability to query, merge and filter the BIM model and generate IFC output (i.e. files) on the fly. Thanks to its multi-user support, multiple people can work on their own part of the dataset, while the complete dataset is updated on the fly. Other users can get notifications when the model (or a part of it) is updated.

Downloads: 1 This Week

Last Update: 2025-08-07
See Project
2

Weka

Machine learning software to solve data mining problems

Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code.

51 Reviews

Downloads: 8,877 This Week

Last Update: 2026-01-29
See Project
3

sRNAWorkbench

The UEA sRNA Workbench

A suite of tools for analysing small RNA (sRNA) data from Next Generation Sequencing devices, including expression profiling of known micro RNA (miRNA), identification of novel miRNA in deep-sequencing data, and identification of other interesting landmarks within high-throughput genetic data.

Downloads: 1 This Week

Last Update: 2022-08-29
See Project
4

RadNet Listener

A Java-based listener and support classes to procure and decode RadNet messages from the network transport layer into their instrument-specific datasets to make those dataset members available to software as indexed name-value pairs.

Downloads: 0 This Week

Last Update: 2021-12-13
See Project
5

WhyLogs Java Library

Profile and monitor your ML data pipeline end-to-end

...WhyLogs calculates approximate statistics for datasets of any size up to TB-scale, making it easy for users to identify changes in the statistical properties of a model's inputs or outputs. Using approximate statistics allows the package to run on minimal infrastructure and monitor an entire dataset, rather than miss outliers and other anomalies by only using a sample of the data to calculate statistics.

Downloads: 0 This Week

Last Update: 2023-06-12
See Project
6

MarDRe

MapReduce-based tool to remove duplicate DNA reads

MarDRe is a de novo MapReduce-based parallel tool to remove duplicate and near-duplicate DNA reads through the clustering of single-end and paired-end sequences from FASTQ/FASTA datasets. This tool allows bioinformaticians to avoid analysing unnecessary reads, reducing the time of subsequent procedures with the dataset. MarDRe is the Big Data counterpart of ParDRe (link above), which employs HPC technologies (i.e., hybrid MPI/multithreading) to reduce runtime on multicore systems. Instead, MarDRe takes advantage of the MapReduce programming model to significantly improve ParDRe performance on distributed systems, especially on cloud-based infrastructures. ...

Downloads: 0 This Week

Last Update: 2019-01-23
See Project
7

OYSTER Entity Resolution

OYSTER is an Entity Resolution engine

Entity Resolution is the process by which a dataset is processed and records are identified that represent the same real-world entity. OYSTER (Open sYSTem Entity Resolution) is an entity resolution system that supports probabilistic direct matching, transitive linking, and asserted linking. To facilitate prospecting for match candidates (blocking), the system builds and maintains an in-memory index of attribute values to identities.

2 Reviews

Downloads: 0 This Week

Last Update: 2018-11-11
See Project
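The in-memory blocking index mentioned in the OYSTER description can be pictured with a minimal sketch. This is an illustration only and not OYSTER's code: the class `BlockingIndex`, its methods, and the lower-casing normalization are all hypothetical. It maps each attribute value to the identities that carry it, so a new record is compared only against identities that share at least one value.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a blocking index; not OYSTER's actual classes or API.
final class BlockingIndex {
    // Maps a normalized "attribute=value" key to the identities that contain it.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Normalization (trim + lower-case) is an assumption made for this sketch.
    private static String key(String attribute, String value) {
        return attribute + "=" + value.trim().toLowerCase(Locale.ROOT);
    }

    // Registers one attribute value of an identity in the index.
    void add(String identityId, String attribute, String value) {
        index.computeIfAbsent(key(attribute, value), k -> new LinkedHashSet<>())
             .add(identityId);
    }

    // Returns every identity that shares at least one attribute value with the record.
    Set<String> candidates(Map<String, String> record) {
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, String> e : record.entrySet()) {
            result.addAll(index.getOrDefault(key(e.getKey(), e.getValue()),
                                             Collections.emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        BlockingIndex idx = new BlockingIndex();
        idx.add("id-1", "surname", "Smith");
        idx.add("id-2", "zip", "72201");
        idx.add("id-3", "surname", "Jones");
        // Only id-1 and id-2 share a value with this record; id-3 is never compared.
        System.out.println(idx.candidates(Map.of("surname", "SMITH", "zip", "72201")));
    }
}
```

The candidates returned here would still need a real matching step (probabilistic or rule-based); the index only narrows down which identities are worth comparing.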
8

FlowLayout

Android streaming layout, supports single selection

FlowLayout is an Android UI library that implements a “flow” or “tag cloud” layout where items automatically wrap onto new lines as needed, making it ideal for chips, product tags, and selectable labels. Instead of manually placing views, you feed data through an adapter-style API, so tags can be created dynamically from a list and refreshed when the dataset changes. The library supports selection behavior out of the box, including single-select and multi-select modes, so it can behave like a group of checkable chips without you building the state machinery from scratch. It also provides click and selection listeners that let you react when a user taps a tag or when the selected set changes, which is useful for filters and preference UIs. ...

Downloads: 0 This Week

Last Update: 2026-01-20
See Project
9

Genetic Oversampling Weka Plugin

A Weka Plugin that uses a Genetic Algorithm for Data Oversampling

Weka genetic algorithm filter plugin to generate synthetic instances. This Weka plugin implementation uses a genetic algorithm to create new synthetic instances to address the imbalanced dataset problem. See my master's thesis, available for download, for further details.

1 Review

Downloads: 0 This Week

Last Update: 2017-11-01
See Project
10

Dataset Metadata Collector

A Java web application which converts metadata to RIF-CS format.

The CSIRO Dataset Metadata Collector is a Java web application which reads metadata (in a variety of formats and from a variety of data sources) on datasets and produces corresponding RIF-CS metadata which are added (or updated) in a Repository. This project is supported by the Australian National Data Service (ANDS) through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative, as well as through the CSIRO.

Downloads: 0 This Week

Last Update: 2016-03-10
See Project
11

R2R

For RDF dataset transformation, the R2R Framework specifies a mapping language and an implementation in the form of a Java API. More information at: http://www4.wiwiss.fu-berlin.de/bizer/r2r/

Downloads: 6 This Week

Last Update: 2016-04-19
See Project
12

GA-EoC

GeneticAlgorithm-based search for Heterogeneous Ensemble Combinations

In data classification, no particular classifier performs consistently in every case. This is even worse for datasets that are both high-dimensional and class-imbalanced. To overcome the limitations of class-imbalanced data, we split the dataset using random sub-sampling to balance it. Then, we apply the (alpha,beta)-k feature set method to select a better subset of features and combine their outputs to get a consolidated feature set for classifier training. To enhance classification performance, we propose an ensemble of classifiers that combines the classification outputs of base classifiers using the simple and widely used majority voting approach. ...

Downloads: 0 This Week

Last Update: 2016-04-04
See Project
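The majority voting step named in the GA-EoC description can be sketched in a few lines. This is a generic illustration and not the project's code: the class `MajorityVote`, the use of string labels, and the tie-breaking rule (first label seen wins) are assumptions made here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of plain (unweighted) majority voting over base classifier outputs.
final class MajorityVote {
    // Returns the label predicted by the most base classifiers.
    // Ties go to the label that appeared first (an arbitrary choice for this sketch).
    static String vote(List<String> predictions) {
        if (predictions.isEmpty()) {
            throw new IllegalArgumentException("no predictions to combine");
        }
        // LinkedHashMap keeps first-seen order, which makes the tie-break deterministic.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : predictions) {
            counts.merge(label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // One prediction per base classifier for a single test instance.
        System.out.println(vote(List.of("positive", "negative", "positive")));
    }
}
```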
13

Analyze My Genes

Compare gene analysis results from 23andme with the human genome

This program compares personal gene analysis results from 23andme with extracted databases from the human genome project. A typical example of an extracted database is a dataset which contains all alternative alleles which occur less than 1% of the time.

Downloads: 0 This Week

Last Update: 2016-02-14
See Project
14

DE-HEoC

DE-based Weight Optimisation for Heterogeneous Ensemble

...The average Matthews Correlation Coefficient (MCC) score, calculated over 10-fold cross-validation, has been used as the measure of quality of an ensemble. The DE/rand/1/bin algorithm has been utilised to maximize the average MCC score calculated using 10-fold cross-validation on the training dataset. The voting weights of base classifiers are optimized for the heterogeneous ensemble of classifiers, aiming to attain better generalization performance on testing datasets.

Downloads: 0 This Week

Last Update: 2015-11-24
See Project
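The quality measure named in the DE-HEoC description, the Matthews Correlation Coefficient averaged over folds, can be computed from confusion-matrix counts as in the sketch below. It shows the standard binary MCC formula only; the class `Mcc`, its method names, and the convention of returning 0 when the denominator is zero are assumptions, and the sketch does not include the project's differential evolution weight search.

```java
// Hypothetical sketch: binary MCC from confusion-matrix counts, averaged over folds.
final class Mcc {
    // MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    // Returns 0 when any marginal sum is zero (a common convention, assumed here).
    static double mcc(long tp, long tn, long fp, long fn) {
        double denominator = Math.sqrt(
                (double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        if (denominator == 0.0) {
            return 0.0;
        }
        return ((double) tp * tn - (double) fp * fn) / denominator;
    }

    // Arithmetic mean of per-fold MCC scores, e.g. the 10 folds of a cross-validation.
    static double average(double[] foldScores) {
        if (foldScores.length == 0) {
            throw new IllegalArgumentException("no fold scores");
        }
        double sum = 0.0;
        for (double score : foldScores) {
            sum += score;
        }
        return sum / foldScores.length;
    }

    public static void main(String[] args) {
        // Two made-up folds, given as TP, TN, FP, FN counts.
        double[] scores = { mcc(6, 3, 1, 2), mcc(5, 5, 0, 0) };
        System.out.println(average(scores));
    }
}
```

An optimizer such as DE would treat the averaged score as the fitness of one candidate weight vector.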
15

LightAir Maven Plugin

Generates DbUnit dataset XSD from database in Maven plugin.

Maven plugin to generate XSD for DbUnit flat datasets from existing tables in a database.

Downloads: 0 This Week

Last Update: 2015-09-14
See Project
16

CIG-P

CIG-P is a simple yet flexible data visualization tool

...CIG-P can be used to compare a) different AP-MS datasets of various baits or b) a particular bait under various perturbations (lenticular section CIG-P). The output of CIG-P is a simple and intuitively easy to grasp visualization of a complex dataset. Publication: CIG-P: Circular Interaction Graph for Proteomics http://www.biomedcentral.com/1471-2105/15/344/ Previously known as PIVOT (Protein Interaction Visualization and Observation Tool)

Downloads: 0 This Week

Last Update: 2014-08-19
See Project
17

Natural product likeness calculator

Calculates Natural Product (NP)-likeness of a molecule, i.e. the similarity of the molecule to the structure space covered by known natural products. NP-likeness is a useful criterion to screen compound libraries and to design new lead compounds. Maven dependency: <dependency> <groupId>uk.ac.ebi.cheminformatics</groupId> <artifactId>NP-Likeness</artifactId> <version>2.1</version> </dependency> Required repository: <repositories> ...

Downloads: 0 This Week

Last Update: 2017-06-23
See Project
18

Cost-sensitive Classifiers

Adaboost extensions for cost-sensitive classification

...Minimum expected cost criteria. Input also requires loading an ARFF file and a cost matrix (sample ARFF and cost files are uploaded for users' reference). This extension uses Weka for classification and generates the classification model along with a confusion matrix. For given dataset and cost matrix

1 Review

Downloads: 0 This Week

Last Update: 2014-03-01
See Project
19

TextProcessor

A Java package to preprocess text datasets for posterior text analysis

The TextProcessor Java package is a text processing toolkit, which provides some frequently used text processing functions such as stemming, removing stop-words, generating a term vocabulary, and calculating the term-doc frequency matrix. Basic topic mining models such as LDA and sparse NMF are also supported. The package can also generate feature files from a given text dataset with LDA and LIBSVM format for posterior procedures such as classification or clustering. The toolkit is also being extended for more advanced text analysis tasks based on natural language processing techniques.

Downloads: 0 This Week

Last Update: 2015-11-23
See Project
20

Document Analysis and Exploitation

The Document Analysis and Exploitation Platform is a Drupal based web interface to a cloud enabled Document Analysis resource set.

Downloads: 0 This Week

Last Update: 2017-02-14
See Project
21

LifeMap

LifeMap: Mobility Monitoring Tool

...We open the source code of the adaptive duty cycling component published in [1]. We will gradually open the source code of LifeMap for research communities. A subset of the dataset is available from the CrawDad research community (http://www.crawdad.org/meta.php?name=yonsei/lifemap). [1] Y. Chon, E. Talipov, H. Shin, H. Cha, "Mobility Prediction based Smartphone Energy Optimization for Everyday Location Monitoring," in Proceeding of 9th ACM Conference on Embedded Networked Sensor Systems (SenSys'11), 2011, ACM, Seattle, WA, USA.

1 Review

Downloads: 0 This Week

Last Update: 2013-05-30
See Project
22

SciChart

Interactive Swing based Charting library to display science data

This free charting library supports line plots and bar plots in its initial version. It provides axis sharing, independent rescaling and panning of axes and datasets, basic tooltips, and a legend display component with the ability to select the active dataset. It is designed around the Model View Controller paradigm: the dataset-related API is abstracted in a model, and this model is used by the Swing components. Display-related functionality is limited to the Swing components, with no interaction with the model.

Downloads: 0 This Week

Last Update: 2015-03-15
See Project
23

BlogTEX: Blog posts extraction for TREC.

BlogTEX is an ad-hoc blog post extraction algorithm written in Java for the TREC Blog08 dataset. It includes an optimized sentence model for clearly identifying sentence boundaries in each blog post. Its output can be customized using its config file.

Downloads: 0 This Week

Last Update: 2016-07-25
See Project
24

catamaran-zip

A Java-based JSON service for zip code location lookup and distance calculation, implemented as a web application and related support classes. Includes a zip code dataset that should be loaded into a database.

Downloads: 0 This Week

Last Update: 2015-11-23
See Project
25

FastPval

FastPval is multi-stage p-value computing software that computes empirical p-values from a large set of permuted/resampled background data.

Downloads: 0 This Week

Last Update: 2015-02-01
See Project
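For readers unfamiliar with the term in the FastPval entry, an empirical p-value is the fraction of background (permuted or resampled) statistics that are at least as extreme as the observed one. The sketch below shows the common one-sided estimator with a +1 correction. It is a naive single-pass illustration, not FastPval's multi-stage algorithm, and the class and method names are hypothetical.

```java
// Hypothetical sketch of a one-sided empirical p-value; not FastPval's implementation.
final class EmpiricalPValue {
    // p = (1 + #{background >= observed}) / (1 + N).
    // The +1 terms keep the estimate strictly above zero for a finite background.
    static double pValue(double observed, double[] background) {
        int atLeastAsExtreme = 0;
        for (double b : background) {
            if (b >= observed) {
                atLeastAsExtreme++;
            }
        }
        return (atLeastAsExtreme + 1.0) / (background.length + 1.0);
    }

    public static void main(String[] args) {
        // Made-up background statistics from four permutations.
        double[] background = { 1.0, 2.0, 3.0, 4.0 };
        System.out.println(pValue(3.5, background));
    }
}
```

With N background values the smallest reportable p-value is 1/(N+1), which is why tools in this area need very large background sets to resolve small p-values.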