Showing 67 open source projects for "big data"

  • 1

    MyCAT

    Active, high-performance open source database middleware

    ...Regarded as an enterprise-grade MySQL cluster, MyCAT can take the place of an expensive Oracle cluster. MyCAT is also a new type of database that resembles a SQL Server integrated with memory cache technology, NoSQL technology, and HDFS big data. As a modern enterprise database product, MyCAT combines the traditional database with the new distributed data warehouse. In short, MyCAT is a fresh new database middleware. MyCAT's objective is to smoothly migrate existing stand-alone databases and applications to the cloud at low cost and to relieve the bottleneck caused by the rapid growth of data storage and business scale.
    Downloads: 11 This Week
  • 2

    geometry-api-java

    The Esri Geometry API for Java enables developers to write apps

    The Esri Geometry API for Java can be used to enable spatial data processing in 3rd-party data-processing solutions. Developers of custom MapReduce-based applications for Hadoop can use this API for spatial processing of data in the Hadoop system. The API is also used by the Hive UDFs and could be used by developers building geometry functions for 3rd-party applications such as Cassandra, HBase, Storm, and many other Java-based “big data” applications.
    Downloads: 0 This Week
  • 3

    apache spark data pipeline osDQ

    osDQ subproject dedicated to creating Apache Spark based data pipelines using JSON

    This is an offshoot of the open source data quality (osDQ) project, https://sourceforge.net/projects/dataquality/. This subproject will create an Apache Spark based data pipeline in which JSON-based metadata (a file) is used to run data processing, data pipeline, data quality, data preparation, and data modeling features for big data. It uses the Java API of Apache Spark.
    Downloads: 0 This Week
  • 4

    MarDRe

    MapReduce-based tool to remove duplicate DNA reads

    MarDRe is a de novo MapReduce-based parallel tool to remove duplicate and near-duplicate DNA reads through the clustering of single-end and paired-end sequences from FASTQ/FASTA datasets. This tool allows bioinformatics researchers to avoid analyzing unnecessary reads, reducing the time of subsequent procedures with the dataset. MarDRe is the Big Data counterpart of ParDRe (link above), which employs HPC technologies (i.e., hybrid MPI/multithreading) to reduce runtime on multicore systems. Instead, MarDRe takes advantage of the MapReduce programming model to significantly improve ParDRe performance on distributed systems, especially on cloud-based infrastructures. Written in pure Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for Big Data processing.
    Downloads: 0 This Week
  • 5

    X10

    Performance and Productivity at Scale

    ...Both its modern, type-safe sequential core and simple programming model for concurrency and distribution contribute to making X10 a high-productivity language in the HPC and Big Data spaces. User productivity is further enhanced by providing tools such as an Eclipse-based IDE (X10DT). Implementations of X10 are available for a wide variety of hardware and software platforms ranging from laptops, to commodity clusters, to supercomputers.
    Downloads: 52 This Week
  • 6

    HSRA

    Hadoop spliced read aligner for RNA-seq data

    ...This tool allows bioinformatics researchers to efficiently distribute their mapping tasks over the nodes of a cluster by combining a fast multithreaded spliced aligner (HISAT2) with Apache Hadoop, which is a distributed computing framework for scalable Big Data processing. HSRA currently supports single-end and paired-end read alignments from FASTQ/FASTA datasets. Moreover, our tool uses the Hadoop Sequence Parser (HSP) library (link above) to efficiently read the input datasets stored on the Hadoop Distributed File System (HDFS), being able to process datasets compressed with Gzip and BZip2 codecs.
    Downloads: 0 This Week
  • 7

    KittyORM

    KittyORM is an ORM library designed for use with Android and SQLite.

    KittyORM is an Object-Relational Mapping library designed for use with Android and SQLite. It implements the Data Mapper design pattern, and its main purpose is to simplify interaction with SQLite databases in Android applications. Written in Java 7, it supports Android devices from API level 9. Main features we want to achieve with KittyORM are: * simple and clear API; * high flexibility of working with model POJO files via database mappers that grants you an ability to focus on your...
    Downloads: 0 This Week
  • 8

    Numerics for Chemical Engineering

    Numerical models for chemical and process engineering

    NCE Calculation Framework is a library of routines, models and data applicable to chemical and process engineering calculations, written in Java. -- NEW -- www.chesolver.com *ONLINE CALCULATORS*. A set of solvers to perform calculations consistently on any device, from smartphone to desktop. The project includes the following ready-to-use software, all based on the same core library: * Online Calculators at www.chesolver.com * Extensions for LibreOffice/OpenOffice Calc...
    Downloads: 3 This Week
  • 9

    Easy Machine Learning

    Easy Machine Learning is a general-purpose dataflow-based system

    Machine learning algorithms have become the key components in many big data applications. However, the full potential of machine learning is still far from being realized because using machine learning algorithms is hard, especially on distributed platforms such as Hadoop and Spark. The key barriers come from not only the implementation of the algorithms themselves but also the processing for applying them to real applications which often involve multiple steps and different algorithms. ...
    Downloads: 4 This Week
  • 10

    nervalreports

    A lightweight report creation Java library

    Nerval Reports is a lightweight report creation library focused on minimal computational cost. Ideally, report creation should iterate only once through its data, minimize memory allocation and processor use, and restrict its dependencies to what your specific use needs. Well-known engines like Jasper Reports take a far more expensive approach, in which performance and data reiteration are a big but neglected problem (as is treating report design as a non-programming task). With Nerval Reports, data is sent to the report directly while iterating through it (i.e., when using databases, one creates the report while iterating through the result set instead of building a set of collections to pass to the library), and the design is aimed mainly at programmers. ...
    Downloads: 0 This Week
  • 11

    Voldemort

    A distributed key-value storage system

    Voldemort is a distributed database that’s an open source clone of Amazon’s Dynamo. It automatically replicates data over multiple servers and automatically partitions it so that each server contains only a subset of the total data. It offers many other features, such as pluggable serialization support, data item versioning, and an SSD Optimized Read Write storage engine. Voldemort is not a relational database or an object database. It is essentially a big, distributed, persistent, fault-tolerant hash table. ...
    Downloads: 0 This Week
  • 12

    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit for All of Us

    ...Of course you may specify JASP for advanced data editing and RapidMiner for advanced prediction modeling. DSTK is written in C#, Java and Python to interface with R, NLTK, and Weka. It can be expanded with plugins using R Scripts. We have also created plugins for more statistical functions, and Big Data Analytics with Microsoft Azure HDInsights (Spark Server) with Livy.
    Downloads: 0 This Week
  • 13

    H2O-3

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning

    ...H2O-3 integrates with big data technologies such as Hadoop and Apache Spark, enabling organizations to run machine learning workflows on large-scale data infrastructure. The platform also includes a web-based interface called Flow that allows users to build models interactively through notebooks and visual tools.
    Downloads: 0 This Week
  • 14

    XML2CSV-Generic-Converter

    Flatten XML into CSV to suit your mood

    Java XML to CSV (XML2CSV) generic conversion facility. Flattens one or more similar XML files into CSV projections. I made it in order to extract data from big XML files and gather them in files more easily opened with a spreadsheet because I didn't find anything adapted to my needs over the Internet when I needed to (Java + truly generic + self-contained algorithm + Unix like command line options + efficiency). It is packaged as an auto executable Jar for convenient command line execution but might as well be interfaced directly by a Java class as part of a broader [yet non commercial] software. ...
    Downloads: 0 This Week
  • 15

    owl reasoning over big biomedical data

    An OWL reasoning framework for the analysis of big biomedical data

    A general OWL reasoning framework for the analysis of big biomedical data, implemented as a MapReduce-based property chain reasoning prototype system. The OWL reasoning method is ideally suited to problems involving complex semantic associations because it is able to infer logical consequences from a set of asserted rules or axioms. The MapReduce framework is used to solve the problem of scalability.
    Downloads: 0 This Week
  • 16

    CSV Transformer

    Transforms XML to CSV

    The CSV Transformer is a data processing tool that transforms .xml files to comma-separated values. It was created in a load and performance testing project, where the use case was to transform 2800 configuration files of a big banking application into a single .csv file. With this single file it was possible to compare the whole configuration between two releases using the already available tool CSV Comparator.
    Downloads: 0 This Week
  • 17

    Universal Java Matrix Package

    sparse and dense matrix, linear algebra, visualization, big data

    The Universal Java Matrix Package (UJMP) is an open source Java library which provides sparse and dense matrix classes, as well as a large number of calculations for linear algebra such as matrix multiplication or matrix inverse. Operations such as mean, correlation, standard deviation, replacement of missing values or the calculation of mutual information are supported, too. The Universal Java Matrix Package provides various visualization methods, import and export filters for a large...
    Downloads: 0 This Week
  • 18

    Easy HTTP: Easy Web Service Support

    Classic Web Service Support

    With all of the hoopla over Web Services, you would think that servicing user requests over HTTP was something new. We all may have lots of experience with REST, JSON, XML, SOAP, WSDLs, HTTPS, and even EDI, but at the end of the day, it all comes down to legacies, security, and performance. So while big companies might have billions of dollars to spend re-writing their back end web legacies, those of us who want to seamlessly automate a simple set of CRUD operations to our...
    Downloads: 0 This Week
  • 19

    Chordalysis

    Log-linear analysis (data modelling) for high-dimensional data

    ...However, due to its exponential nature, previous approaches did not allow scale-up to more than a dozen variables. We present here Chordalysis, a log-linear analysis method for big data. Chordalysis exploits recent discoveries in graph theory by representing complex models as compositions of triangular structures, also known as chordal graphs. Chordalysis makes it possible to discover the structure of datasets with thousands of variables on a standard desktop computer. Associated papers at ICDM 2013, ICDM 2014 and SDM 2015 can be found at http://www.francois-petitjean.com/Research/ YourKit supports the Chordalysis open source project with its full-featured Java Profiler. ...
    Downloads: 0 This Week
  • 20

    Classic Monitor

    Android app to monitor a Midnite Solar Classic charge controller

    Now available on Google Play at <https://play.google.com/store/apps/details?id=ca.farrelltonsolar.classic> The source code is now maintained on <https://github.com/graham22/Classic> Classic Monitor is a free status monitor for Midnite Solar's Classic 150, 200, and 250 charge controllers. It is a read-only program; it does not write to the Classic. The software is provided "AS IS", WITHOUT WARRANTY OF ANY KIND, express or implied. Classic Monitor is NOT a product of Midnite Solar,...
    Downloads: 0 This Week
  • 21

    Cascalog

    Data processing on Hadoop without the hassle

    Cascalog is a powerful Clojure (and Java) data processing and querying library built atop Hadoop (via Cascading), providing a high-level, Datalog-inspired abstraction for both big data processing and local computation. Cascalog is hosted at Clojars, and some of its dependencies are hosted at Conjars. Both Clojars and Conjars are Maven repositories that are easy to use with Maven or Leiningen.
    Downloads: 0 This Week
  • 22

    BIRT Report Designer

    Open Source Reporting & Data Visualization Platform

    ...With a flexible Open Data Access framework, developers can write custom data drivers to access data from any source, including Big Data sources like Apache Hadoop, Cassandra, and MongoDB, along with all traditional relational databases, Flat Files, XML data streams, and data stored in proprietary systems. Built for embedding, BIRT includes APIs for data access, chart generation, output formats, content execution, and integration within larger applications.
    Downloads: 2 This Week
  • 23

    giServer

    giServer: the easy-to-use, extensible batch and integration server

    ...Instead of complex XML configuration files, an elaborate GUI for batch job management is included. Possible usage scenarios include automatic processing of incoming data files, Big Data applications, process automation, data mining and aggregation applications, automatic reporting, and processing and analysis of database records.
    Downloads: 0 This Week
  • 24

    Flamingo Project

    Workflow Designer, Hive Editor, Pig Editor, File System Browser

    Flamingo is an open-source Big Data platform that combines an Ajax rich web interface, workflow engine, workflow designer, MapReduce, Hive editor, and Pig editor. 1. An easy tool for big data. 2. Comfortable to use with Hadoop ecosystem projects. 3. Licensed under GPL v3. It supports a Pig IDE, Hive IDE, HDFS browser, scheduler, Hadoop job monitoring, workflow engine, workflow designer, and MapReduce.
    Downloads: 0 This Week
  • 25

    Big Sack

    Big Sack: A lightweight Java Key/Value store with undo and disk cache.

    Big Sack is a Java persistence mechanism that allows storage of key-value pairs following the popular Big Data paradigms. It's a very simple and straightforward way to bridge the gap between in-memory data structures and long-term storage. It offers the convenience of the Java SDK TreeMap and TreeSet classes and is used in the same easy way, but it adds rollback through undo logging to checkpoint data, so data does not wind up in an unknown state regardless of failures. ...
    Downloads: 0 This Week