Showing 61 open source projects for "hadoop"

View related business solutions
  • SKUDONET Open Source Load Balancer Icon
    SKUDONET Open Source Load Balancer

    Take advantage of Open Source Load Balancer to elevate your business security and IT infrastructure with a custom ADC Solution.

    SKUDONET ADC, operates at the application layer, efficiently distributing network load and application load across multiple servers. This not only enhances the performance of your application but also ensures that your web servers can handle more traffic seamlessly.
  • The next chapter in business mental wellness Icon
    The next chapter in business mental wellness

    Entrust your employee well-being to Calmerry's nationwide network of licensed mental health professionals.

    Calmerry is beneficial for businesses of all sizes, particularly those in high-stress industries, organizations with remote teams, and HR departments seeking to improve employee well-being and productivity
  • 1
    Apache HBase

    Apache HBase

    Get random, realtime read/write access to your Big Data

    ... HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options. Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX. Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 2
    ANTLR

    ANTLR

    Parser generator to read, process, or translate structured text

    ... and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 3
    Apache Phoenix

    Apache Phoenix

    Mirror of Apache Phoenix

    Apache Phoenix is a SQL skin over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix enables OLTP and operational analytics in Hadoop for low latency applications by combining the best of both worlds. The power of standard SQL and JDBC APIs with full ACID transaction capabilities and the flexibility of late-bound, schema-on-read capabilities from the NoSQL world by leveraging HBase as its backing store. Apache Phoenix is fully...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    HugeGraph

    HugeGraph

    A graph database that supports more than 100+ billion data

    HugeGraph is a convenient, efficient, and adaptable graph database compatible with the Apache TinkerPop3 framework and the Gremlin query language. HugeGraph supports fast import performance in the case of more than 10 billion Vertices and Edges Graph, millisecond-level OLTP query capability, and can be integrated into big data platforms like Hadoop or Spark for OLAP analysis. The main scenarios of HugeGraph include correlation search, fraud detection, and knowledge graph. Not only supports...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Recruit and Manage your Workforce Icon
    Recruit and Manage your Workforce

    Evolia makes it easier to hire, schedule and track time worked by frontline in medium and large-sized businesses.

    Evolia is a web and mobile platform that connects enterprises with 1000’s of local shift workers and offers free workforce scheduling and time and attendance solutions. Is your business on Evolia?
  • 5
    Apache Drill

    Apache Drill

    Apache Drill is a distributed MPP query layer for self describing data

    Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel. Get faster insights without the overhead (data loading, schema creation and maintenance, transformations, etc.) Analyze the multi-structured and nested data in non-relational datastores directly without transforming or restricting the data. Leverage your existing SQL skillsets and BI tools including Tableau...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Apache Hudi

    Apache Hudi

    Upserts, Deletes And Incremental Processing on Big Data

    Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Genie

    Genie

    Distributed Big Data Orchestration Service

    Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    IoTDB

    IoTDB

    Apache IoTDB

    Apache IoTDB (Database for Internet of Things) is an IoT native database with high performance for data management and analysis, deployable on the edge and the cloud. Due to its light-weight architecture, high performance and rich feature set together with its deep integration with Apache Hadoop, Spark and Flink, Apache IoTDB can meet the requirements of massive data storage, high-speed data ingestion and complex data analysis in the IoT industrial fields. In the scene of factories...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9

    JRecord

    Read Cobol data files in Java

    provide Java Record based IO routines for Fixed Width (including Text, Mainframe, Cobol and Binary) and delimited Flat files via a Record Layout (Cobol, CSV or XML). The source is now available at https://github.com/bmTas/JRecord Projects using JRecord include: * https://github.com/thospfuller/rcoboldi - Cobol File in R * https://github.com/tmalaska/CopybookInputFormat - Cobol files in Hadoop * https://github.com/gss2002/copybook_formatter * https://github.com/gss2002/ftp2hdfs has some...
    Downloads: 29 This Week
    Last Update:
    See Project
  • Enterprise AI Search, Intranet, and Wiki in one platform. Icon
    Enterprise AI Search, Intranet, and Wiki in one platform.

    Your company’s all-in-one solution for trusted information

    Cut through the noise and end information overload with Guru, an all-in-one wiki, intranet, and knowledge base that serves as your company's single source of truth.
  • 10
    OpenTSDB

    OpenTSDB

    A scalable, distributed time series database

    OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable. Store and serve massive amounts of time series data without losing granularity. Generate graphs from the GUI, pull from the HTTP API, choose an open source front-end. OpenTSDB...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    spatial-framework-for-hadoop

    spatial-framework-for-hadoop

    The Spatial Framework for Hadoop allows developers

    The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis. For tools, samples, and tutorials that use this framework, head over to GIS Tools for Hadoop. At the root level of this repository, you can build a single jar with everything in the framework using Apache Ant. Alternatively, you can build a jar at the root level of each framework component. Custom MapReduce jobs that use the Esri Geometry API require...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    geometry-api-java

    geometry-api-java

    The Esri Geometry API for Java enables developers to write apps

    The Esri Geometry API for Java can be used to enable spatial data processing in 3rd-party data-processing solutions. Developers of custom MapReduce-based applications for Hadoop can use this API for spatial processing of data in the Hadoop system. The API is also used by the Hive UDF’s and could be used by developers building geometry functions for 3rd-party applications such as Cassandra, HBase, Storm and many other Java-based “big data” applications.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Open Source Data Quality and Profiling

    Open Source Data Quality and Profiling

    World's first open source data quality & data preparation project

    ..., Meta Data Discovery, Anomaly Discovery, Data Cleansing, Reporting and Analytic. It also had Hadoop ( Big data ) support to move files to/from Hadoop Grid, Create, Load and Profile Hive Tables. This project is also known as "Aggregate Profiler" Resful API for this project is getting built as (Beta Version) https://sourceforge.net/projects/restful-api-for-osdq/ apache spark based data quality is getting built at https://sourceforge.net/projects/apache-spark-osdq/
    Leader badge
    Downloads: 64 This Week
    Last Update:
    See Project
  • 14
    Oryx

    Oryx

    Lambda architecture on Apache Spark, Apache Kafka for real-time

    Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large-scale machine learning. It is a framework for building applications but also includes packaged, end-to-end applications for collaborative filtering, classification, regression and clustering. The application is written in Java, using Apache Spark, Hadoop, Tomcat, Kafka, Zookeeper and more. Configuration uses a single Typesafe Config config file, wherein...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15

    HSRA

    Hadoop spliced read aligner for RNA-seq data

    HSRA is a MapReduce-based parallel tool for mapping reads from RNA sequencing (RNA-seq) experiments. RNA-seq analyses typically begin by mapping reads to a reference genome in order to determine the location from which the reads were originated, which is a very time-consuming step. This tool allows bioinformatics researchers to efficiently distribute their mapping tasks over the nodes of a cluster by combining a fast multithreaded spliced aligner (HISAT2) with Apache Hadoop, which...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16

    MarDRe

    MapReduce-based tool to remove duplicate DNA reads

    ..., MarDRe takes advantage of the MapReduce programming model to significantly improve ParDRe performance on distributed systems, especially on cloud-based infrastructures. Written in pure Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for Big Data processing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    apache spark data pipeline osDQ

    apache spark data pipeline osDQ

    osDQ dedicated to create apache spark based data pipeline using JSON

    ... file Windows : java -cp .\lib\*;osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c .\example\samplerun.json Mac UNIX java -cp ./lib/*:./osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c ./example/samplerun.json For those on windows, you need to have hadoop distribtion unzipped on local drive and HADOOP_HOME set. Also copy winutils.exe from here into HADOOP_HOME\bin
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    X-RIME is a open source project devoted to provide Hadoop based solution for large scale social network analysis.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Easy Machine Learning

    Easy Machine Learning

    Easy Machine Learning is a general-purpose dataflow-based system

    Machine learning algorithms have become the key components in many big data applications. However, the full potential of machine learning is still far from being realized because using machine learning algorithms is hard, especially on distributed platforms such as Hadoop and Spark. The key barriers come from not only the implementation of the algorithms themselves but also the processing for applying them to real applications which often involve multiple steps and different algorithms. Our...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Slimgrid is a Java library for grid computations which is lighter than other ones (JPPF, Hadoop, ...). The main design goals are: minimalism, simplicity, pervasiveness. If you need to grab something which does not require you to comprehend massive and complex API's, do exhaustive configurations and installations, is robust and reliable, uses just one port for all management and communication, then SlimGrid is the right choice. The SlimGrid is built on top of the Apache's ZooKeeper library...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Voldemort

    Voldemort

    A distributed key-value storage system

    Voldemort is a distributed database that’s an open source clone of Amazon’s Dynamo. It automatically replicates data over multiple servers, and automatically partitions them as well so each server only contains a subset of the total data. It offers many other features such as pluggable serialization support, data item versioning and an SSD Optimized Read Write storage engine. Voldemort is not a relational database or an object database. It is essentially a big, distributed, persistent,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    An Apache Zookeeper-based utility for assigning unique, sequential ID numbers in a distributed system (such as a Hadoop Map/Reduce job).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    HareDB HBase Client

    HareDB HBase Client

    GUI Tools for HBase (including PIG and high speed Hive Query)

    Most people are not familiar with command mode. However, there is only command mode in the world of Hadoop and HBase. For the reason above, we are focusing on developing a set of tools, “HBase Client”, which can be used more easily and having a more friendly interface.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24

    RSS Atom Feed Analytics With MapReduce

    This is a data analytics project for RSS feeds using hadoop MapReduce

    This project accepts the output of jatomrss project as the input. It applies the MR logic on the same to perform the analytics
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Hadoop-BAM is a Java library for the manipulation of files in common bioinformatics formats using the Hadoop MapReduce framework with the Picard SAM JDK, and command line tools similar to SAMtools. The file formats currently supported are BAM, SAM, FASTQ, FASTA, QSEQ, BCF, and VCF. For a longer high-level description of Hadoop-BAM, refer to the article "Hadoop-BAM: directly manipulating next generation sequencing data in the cloud" in Bioinformatics Volume 28 Issue 6 pp. 876-877, available...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next