Showing 34 open source projects for "dataset"

  • 1

    BIMserver

    The open source BIMserver platform

    ...The main advantage of this approach is the ability to query, merge and filter the BIM model and generate IFC output (i.e. files) on the fly. Thanks to its multi-user support, multiple people can work on their own part of the dataset, while the complete dataset is updated on the fly. Other users can get notifications when the model (or a part of it) is updated.
    Downloads: 1 This Week
  • 2

    Weka

    Machine learning software to solve data mining problems

    Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code.
    Downloads: 8,877 This Week
  • 3

    sRNAWorkbench

    The UEA sRNA Workbench

    A suite of tools for analysing small RNA (sRNA) data from Next Generation Sequencing devices, including expression profiling of known micro RNA (miRNA), identification of novel miRNA in deep-sequencing data, and identification of other interesting landmarks within high-throughput genetic data.
    Downloads: 1 This Week
  • 4
    A Java-based listener and support classes to procure and decode RadNet messages from the network transport layer into their instrument-specific datasets to make those dataset members available to software as indexed name-value pairs.
    Downloads: 0 This Week
  • 5

    WhyLogs Java Library

    Profile and monitor your ML data pipeline end-to-end

    ...WhyLogs calculates approximate statistics for datasets of any size up to TB-scale, making it easy for users to identify changes in the statistical properties of a model's inputs or outputs. Using approximate statistics allows the package to run on minimal infrastructure and monitor an entire dataset, rather than miss outliers and other anomalies by only using a sample of the data to calculate statistics.
    Downloads: 0 This Week
  • 6

    MarDRe

    MapReduce-based tool to remove duplicate DNA reads

    MarDRe is a de novo MapReduce-based parallel tool to remove duplicate and near-duplicate DNA reads through the clustering of single-end and paired-end sequences from FASTQ/FASTA datasets. This tool allows bioinformaticians to avoid analysing unnecessary reads, reducing the time of subsequent procedures with the dataset. MarDRe is the Big Data counterpart of ParDRe, which employs HPC technologies (i.e., hybrid MPI/multithreading) to reduce runtime on multicore systems. MarDRe instead takes advantage of the MapReduce programming model to significantly improve on ParDRe's performance on distributed systems, especially on cloud-based infrastructures. ...
    Downloads: 0 This Week
  • 7

    OYSTER Entity Resolution

    OYSTER is an Entity Resolution engine

    Entity Resolution is the process by which a dataset is processed and records are identified that represent the same real-world entity. OYSTER (Open sYSTem Entity Resolution) is an entity resolution system that supports probabilistic direct matching, transitive linking, and asserted linking. To facilitate prospecting for match candidates (blocking), the system builds and maintains an in-memory index of attribute values to identities.
    Downloads: 0 This Week
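The blocking idea in the OYSTER description (an in-memory index from attribute values to identities) can be sketched roughly as follows. This is a minimal illustration, not OYSTER's actual API: the class `BlockingIndexSketch`, its methods, and the normalisation rule are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a blocking index; names are NOT taken from OYSTER.
final class BlockingIndexSketch {
    // Maps a normalised attribute value to the identities that carry it.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Assumed normalisation: trim and lower-case, so trivial variants share a block.
    static String normalise(String value) {
        return value.trim().toLowerCase(Locale.ROOT);
    }

    // Register one attribute value of an identity in the index.
    void add(String identityId, String attributeValue) {
        index.computeIfAbsent(normalise(attributeValue), k -> new LinkedHashSet<>())
             .add(identityId);
    }

    // Return every identity that shares at least one attribute value with the record.
    Set<String> candidates(List<String> recordValues) {
        Set<String> result = new LinkedHashSet<>();
        for (String value : recordValues) {
            result.addAll(index.getOrDefault(normalise(value), Collections.emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        BlockingIndexSketch idx = new BlockingIndexSketch();
        idx.add("id-1", "Smith");
        idx.add("id-2", "smith ");
        idx.add("id-3", "Jones");
        System.out.println(idx.candidates(Arrays.asList("SMITH", "1970-01-01")));
    }
}
```

Only the identities returned by `candidates` would then be passed to the more expensive matching rules, which is the point of blocking.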
  • 8

    FlowLayout

    Android streaming layout, supports single selection

    FlowLayout is an Android UI library that implements a “flow” or “tag cloud” layout where items automatically wrap onto new lines as needed, making it ideal for chips, product tags, and selectable labels. Instead of manually placing views, you feed data through an adapter-style API, so tags can be created dynamically from a list and refreshed when the dataset changes. The library supports selection behavior out of the box, including single-select and multi-select modes, so it can behave like a group of checkable chips without you building the state machinery from scratch. It also provides click and selection listeners that let you react when a user taps a tag or when the selected set changes, which is useful for filters and preference UIs. ...
    Downloads: 0 This Week
  • 9

    Genetic Oversampling Weka Plugin

    A Weka Plugin that uses a Genetic Algorithm for Data Oversampling

    Weka genetic algorithm filter plugin to generate synthetic instances. This Weka plugin implementation uses a genetic algorithm to create new synthetic instances to solve the imbalanced dataset problem. See my master's thesis, available for download, for further details.
    Downloads: 0 This Week
  • 10

    Dataset Metadata Collector

    A Java web application which converts metadata to RIF-CS format.

    The CSIRO Dataset Metadata Collector is a Java web application which reads metadata (in a variety of formats and from a variety of data sources) on datasets and produces corresponding RIF-CS metadata which are added (or updated) in a Repository. This project is supported by the Australian National Data Service (ANDS) through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative, as well as through the CSIRO.
    Downloads: 0 This Week
  • 11
    For RDF dataset transformation, the R2R Framework specifies a mapping language and an implementation in the form of a Java API. More information at: http://www4.wiwiss.fu-berlin.de/bizer/r2r/
    Downloads: 6 This Week
  • 12

    GA-EoC

    GeneticAlgorithm-based search for Heterogeneous Ensemble Combinations

    In data classification, no single classifier performs consistently well in every case. This is even worse for datasets that are both high-dimensional and class-imbalanced. To overcome the limitations of class-imbalanced data, we split the dataset using random sub-sampling to balance the classes. Then, we apply the (alpha,beta)-k feature set method to select a better subset of features and combine their outputs to get a consolidated feature set for classifier training. To enhance classification performance, we propose an ensemble of classifiers that combines the classification outputs of base classifiers using the simple and widely used majority voting approach. ...
    Downloads: 0 This Week
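As a rough illustration of the majority voting step mentioned in the GA-EoC description, the sketch below combines the labels predicted by several base classifiers for one instance. The class name `MajorityVoteSketch` and the tie-breaking rule are assumptions made here; the project description does not say how ties are handled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of plain (unweighted) majority voting; not GA-EoC's code.
final class MajorityVoteSketch {
    // Returns the label predicted by the most base classifiers.
    // Assumed tie-break: the label that reached the top count first wins.
    static String vote(List<String> predictions) {
        if (predictions.isEmpty()) {
            throw new IllegalArgumentException("at least one prediction is required");
        }
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : predictions) {
            int count = counts.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                best = label;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // One predicted label per base classifier for a single instance.
        System.out.println(vote(Arrays.asList("positive", "negative", "positive")));
    }
}
```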
  • 13

    Analyze My Genes

    Compare gene analysis results from 23andme with the human genome

    This program compares personal gene analysis results from 23andme with extracted databases from the human genome project. A typical example of an extracted database is a dataset which contains all alternative alleles which occur less than 1% of the time.
    Downloads: 0 This Week
  • 14

    DE-HEoC

    DE-based Weight Optimisation for Heterogeneous Ensemble

    ...The average Matthews Correlation Coefficient (MCC) score, calculated over 10-fold cross-validation, has been used as the measure of quality of an ensemble. The DE/rand/1/bin algorithm has been utilised to maximize the average MCC score calculated using 10-fold cross-validation on the training dataset. The voting weights of base classifiers are optimized for the heterogeneous ensemble of classifiers, aiming to attain better generalization performance on testing datasets.
    Downloads: 0 This Week
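The quality measure named in the DE-HEoC description, the MCC averaged over cross-validation folds, can be sketched as below. The formula is the standard binary MCC; the class `MccSketch`, the `{TP, TN, FP, FN}` fold layout, and the convention of returning 0 when the denominator is zero are assumptions of this sketch, not details taken from the project.

```java
// Hypothetical sketch of the binary Matthews Correlation Coefficient.
final class MccSketch {
    // MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    // Assumed convention: return 0 when the denominator is zero.
    static double mcc(long tp, long tn, long fp, long fn) {
        double numerator = (double) tp * tn - (double) fp * fn;
        double denominator = Math.sqrt(
                (double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        return denominator == 0.0 ? 0.0 : numerator / denominator;
    }

    // Averages the MCC over folds; each row is assumed to be {TP, TN, FP, FN}.
    static double averageMcc(long[][] folds) {
        if (folds.length == 0) {
            throw new IllegalArgumentException("at least one fold is required");
        }
        double sum = 0.0;
        for (long[] fold : folds) {
            sum += mcc(fold[0], fold[1], fold[2], fold[3]);
        }
        return sum / folds.length;
    }

    public static void main(String[] args) {
        long[][] folds = { {40, 45, 5, 10}, {42, 44, 6, 8} };
        System.out.println(averageMcc(folds));
    }
}
```

In a weight-optimisation setting, a value like `averageMcc` would serve as the fitness that the optimiser tries to maximise.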
  • 15

    LightAir Maven Plugin

    Generates DbUnit dataset XSD from database in Maven plugin.

    Maven plugin to generate XSD for DbUnit flat datasets from existing tables in a database.
    Downloads: 0 This Week
  • 16

    CIG-P

    CIG-P is a simple yet flexible data visualization tool

    ...CIG-P can be used to compare a) different AP-MS datasets of various baits or b) a particular bait under various perturbations (lenticular section CIG-P). The output of CIG-P is a simple and intuitively easy to grasp visualization of a complex dataset. Publication: "CIG-P: Circular Interaction Graph for Proteomics", http://www.biomedcentral.com/1471-2105/15/344/. Previously known as PIVOT (Protein Interaction Visualization and Observation Tool).
    Downloads: 0 This Week
  • 17
    Calculates the Natural Product (NP)-likeness of a molecule, i.e. the similarity of the molecule to the structure space covered by known natural products. NP-likeness is a useful criterion to screen compound libraries and to design new lead compounds. Maven dependency: <dependency> <groupId>uk.ac.ebi.cheminformatics</groupId> <artifactId>NP-Likeness</artifactId> <version>2.1</version> </dependency> Required repository: <repositories> ...
    Downloads: 0 This Week
  • 18

    Cost-sensitive Classifiers

    AdaBoost extensions for cost-sensitive classification

    ...Minimum expected cost criteria. Input also requires loading an ARFF file and a cost matrix (sample ARFF and cost files are uploaded for users' reference). This extension uses Weka for classification and generates the classification model along with a confusion matrix for the given dataset and cost matrix.
    Downloads: 0 This Week
  • 19

    TextProcessor

    A Java package to preprocess text datasets for posterior text analysis

    The TextProcessor Java package is a text processing toolkit, which provides some frequently used text processing functions such as stemming, removing stop-words, generating a term vocabulary, and calculating the term-doc frequency matrix. Basic topic mining models such as LDA and sparse NMF are also supported. The package can also generate feature files from a given text dataset with LDA and LIBSVM format for posterior procedures such as classification or clustering. The toolkit is also being extended for more advanced text analysis tasks based on natural language processing techniques.
    Downloads: 0 This Week
  • 20
    Document Analysis and Exploitation
    The Document Analysis and Exploitation Platform is a Drupal-based web interface to a cloud-enabled Document Analysis resource set.
    Downloads: 0 This Week
  • 21

    LifeMap

    LifeMap: Mobility Monitoring Tool

    ...We have opened the source code of the adaptive duty cycling component published in [1]. We will gradually open the source code of LifeMap to the research community. A subset of the dataset is available from the CrawDad research community (http://www.crawdad.org/meta.php?name=yonsei/lifemap). [1] Y. Chon, E. Talipov, H. Shin, H. Cha, "Mobility Prediction based Smartphone Energy Optimization for Everyday Location Monitoring," in Proceedings of the 9th ACM Conference on Embedded Networked Sensor Systems (SenSys'11), 2011, ACM, Seattle, WA, USA.
    Downloads: 0 This Week
  • 22

    SciChart

    Interactive Swing-based charting library to display science data

    This free charting library supports line plots and bar plots in its initial version. It provides axis sharing; independent rescaling and panning of axes and datasets; basic tooltips; and a legend display component with the ability to select the active dataset. It is designed around the Model-View-Controller paradigm. This means that the dataset-related API is abstracted in a model, which is used by the Swing components. Display-related functionality is limited to the Swing components, with no interaction with the model.
    Downloads: 0 This Week
  • 23
    BlogTEX is an ad hoc blog post extraction algorithm written in Java for the TREC Blog08 dataset. It includes an optimized sentence model for clearly identifying sentence boundaries in each blog post. Its output can be customized using its config file.
    Downloads: 0 This Week
  • 24
    A Java-based JSON service for zip code location lookup and distance calculation. Implemented as a web application and related support classes. Includes a zip code dataset that should be loaded into a database.
    Downloads: 0 This Week
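The listing does not say how this service computes distances. As one plausible approach, the sketch below uses the haversine great-circle formula on latitude/longitude pairs, such as those a zip code lookup might return. The class `ZipDistanceSketch`, the method name, and the Earth radius constant are assumptions made for illustration.

```java
// Hypothetical sketch: great-circle distance between two coordinates.
// The haversine formula is an assumption; the project may use another method.
final class ZipDistanceSketch {
    // Mean Earth radius in kilometres (assumed constant).
    static final double EARTH_RADIUS_KM = 6371.0088;

    // Inputs are in decimal degrees; the result is in kilometres.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2)
                * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // Illustrative coordinates only; a real service would look them up by zip code.
        System.out.println(haversineKm(40.7506, -73.9972, 34.0901, -118.4065));
    }
}
```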
  • 25
    FastPval is multi-stage p-value computing software that computes empirical p-values from a large set of permuted/resampled background data.
    Downloads: 0 This Week
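For readers unfamiliar with empirical p-values, the sketch below shows the basic single-pass calculation against a background of permuted or resampled statistics. It is not FastPval's multi-stage algorithm, which the listing does not describe; the class name and the add-one correction (which keeps the estimate above zero) are assumptions of this sketch.

```java
// Hypothetical single-stage sketch of an empirical p-value; NOT FastPval's method.
final class EmpiricalPValueSketch {
    // One-sided p-value: the share of background values at least as large as
    // the observed statistic, with an add-one correction (assumed here).
    static double empiricalPValue(double observed, double[] background) {
        int atLeastAsExtreme = 0;
        for (double value : background) {
            if (value >= observed) {
                atLeastAsExtreme++;
            }
        }
        return (atLeastAsExtreme + 1.0) / (background.length + 1.0);
    }

    public static void main(String[] args) {
        double[] background = {1.0, 2.0, 3.0, 4.0};
        System.out.println(empiricalPValue(3.0, background));
    }
}
```

This naive version scans the whole background for every query, which is the cost that a staged approach over a very large background set would presumably aim to reduce.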