hadoop free download - SourceForge

Showing 61 open source projects for "hadoop"

1

Apache HBase

Get random, realtime read/write access to your Big Data

... HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. It offers a Thrift gateway and a RESTful web service that supports XML, Protobuf, and binary data encoding options; support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX; and convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.

Downloads: 6 This Week

Last Update: 2024-07-24
2

ANTLR

Parser generator to read, process, or translate structured text

... and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.

Downloads: 11 This Week

Last Update: 2024-08-03
3

Apache Phoenix

Mirror of Apache Phoenix

Apache Phoenix is a SQL skin over HBase, delivered as a client-embedded JDBC driver targeting low-latency queries over HBase data. Apache Phoenix enables OLTP and operational analytics in Hadoop for low-latency applications by combining the best of both worlds: the power of standard SQL and JDBC APIs with full ACID transaction capabilities, and the flexibility of late-bound, schema-on-read capabilities from the NoSQL world, leveraging HBase as its backing store. Apache Phoenix is fully...

Downloads: 0 This Week

Last Update: 2024-04-16
4

HugeGraph

A graph database that supports 100+ billion data records

HugeGraph is a convenient, efficient, and adaptable graph database compatible with the Apache TinkerPop3 framework and the Gremlin query language. HugeGraph supports fast import of graphs with more than 10 billion vertices and edges, offers millisecond-level OLTP query capability, and can be integrated into big data platforms like Hadoop or Spark for OLAP analysis. The main scenarios of HugeGraph include correlation search, fraud detection, and knowledge graph. Not only supports...

Downloads: 1 This Week

Last Update: 2024-03-22
5

Apache Drill

Apache Drill is a distributed MPP query layer for self-describing data

Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel. Get faster insights without the overhead (data loading, schema creation and maintenance, transformations, etc.). Analyze multi-structured and nested data in non-relational datastores directly, without transforming or restricting the data. Leverage your existing SQL skillsets and BI tools including Tableau...

Downloads: 0 This Week

Last Update: 2024-05-17
6

Apache Hudi

Upserts, Deletes And Incremental Processing on Big Data

Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides...

Downloads: 0 This Week

Last Update: 2024-06-04
7

Genie

Distributed Big Data Orchestration Service

Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.

Downloads: 0 This Week

Last Update: 2023-08-17
8

IoTDB

Apache IoTDB

Apache IoTDB (Database for Internet of Things) is an IoT native database with high performance for data management and analysis, deployable on the edge and the cloud. Due to its light-weight architecture, high performance and rich feature set together with its deep integration with Apache Hadoop, Spark and Flink, Apache IoTDB can meet the requirements of massive data storage, high-speed data ingestion and complex data analysis in the IoT industrial fields. In the scene of factories...

Downloads: 0 This Week

Last Update: 2024-06-27
9

JRecord

Read Cobol data files in Java

Provides Java record-based I/O routines for fixed-width (including text, mainframe, Cobol and binary) and delimited flat files via a record layout (Cobol, CSV or XML). The source is now available at https://github.com/bmTas/JRecord

Projects using JRecord include:
* https://github.com/thospfuller/rcoboldi - Cobol File in R
* https://github.com/tmalaska/CopybookInputFormat - Cobol files in Hadoop
* https://github.com/gss2002/copybook_formatter
* https://github.com/gss2002/ftp2hdfs has some...

9 Reviews

Downloads: 29 This Week

Last Update: 2024-08-04
10

OpenTSDB

A scalable, distributed time series database

OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable. Store and serve massive amounts of time series data without losing granularity. Generate graphs from the GUI, pull from the HTTP API, choose an open source front-end. OpenTSDB...

Downloads: 0 This Week

Last Update: 2021-12-10
11

spatial-framework-for-hadoop

The Spatial Framework for Hadoop allows developers

The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis. For tools, samples, and tutorials that use this framework, head over to GIS Tools for Hadoop. At the root level of this repository, you can build a single jar with everything in the framework using Apache Ant. Alternatively, you can build a jar at the root level of each framework component. Custom MapReduce jobs that use the Esri Geometry API require...

Downloads: 0 This Week

Last Update: 2023-06-12
12

geometry-api-java

The Esri Geometry API for Java enables developers to write apps

The Esri Geometry API for Java can be used to enable spatial data processing in 3rd-party data-processing solutions. Developers of custom MapReduce-based applications for Hadoop can use this API for spatial processing of data in the Hadoop system. The API is also used by the Hive UDFs and could be used by developers building geometry functions for 3rd-party applications such as Cassandra, HBase, Storm and many other Java-based “big data” applications.

Downloads: 1 This Week

Last Update: 2023-06-12
13

Open Source Data Quality and Profiling

World's first open source data quality & data preparation project

..., Meta Data Discovery, Anomaly Discovery, Data Cleansing, Reporting and Analytics. It also has Hadoop (big data) support to move files to and from a Hadoop grid and to create, load and profile Hive tables. This project is also known as "Aggregate Profiler". A RESTful API for this project (beta version) is being built at https://sourceforge.net/projects/restful-api-for-osdq/ and Apache Spark-based data quality is being built at https://sourceforge.net/projects/apache-spark-osdq/

8 Reviews

Downloads: 64 This Week

Last Update: 2021-01-20
14

Oryx

Lambda architecture on Apache Spark, Apache Kafka for real-time

Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large-scale machine learning. It is a framework for building applications but also includes packaged, end-to-end applications for collaborative filtering, classification, regression and clustering. The application is written in Java, using Apache Spark, Hadoop, Tomcat, Kafka, Zookeeper and more. Configuration uses a single Typesafe Config config file, wherein...

Downloads: 0 This Week

Last Update: 2023-08-16
15

HSRA

Hadoop spliced read aligner for RNA-seq data

HSRA is a MapReduce-based parallel tool for mapping reads from RNA sequencing (RNA-seq) experiments. RNA-seq analyses typically begin by mapping reads to a reference genome in order to determine the location from which the reads originated, which is a very time-consuming step. This tool allows bioinformatics researchers to efficiently distribute their mapping tasks over the nodes of a cluster by combining a fast multithreaded spliced aligner (HISAT2) with Apache Hadoop, which...

Downloads: 0 This Week

Last Update: 2019-01-23
16

MarDRe

MapReduce-based tool to remove duplicate DNA reads

..., MarDRe takes advantage of the MapReduce programming model to significantly improve ParDRe performance on distributed systems, especially on cloud-based infrastructures. Written in pure Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for Big Data processing.

Downloads: 0 This Week

Last Update: 2019-01-23
17

apache spark data pipeline osDQ

osDQ is dedicated to creating Apache Spark-based data pipelines using JSON

... file
Windows: java -cp .\lib\*;osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c .\example\samplerun.json
Mac/UNIX: java -cp ./lib/*:./osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c ./example/samplerun.json
On Windows, you need to have a Hadoop distribution unzipped on a local drive and HADOOP_HOME set. Also copy winutils.exe from here into HADOOP_HOME\bin

Downloads: 0 This Week

Last Update: 2019-01-20
18

X-RIME

X-RIME is an open source project devoted to providing a Hadoop-based solution for large-scale social network analysis.

Downloads: 0 This Week

Last Update: 2018-11-26
19

Easy Machine Learning

Easy Machine Learning is a general-purpose dataflow-based system

Machine learning algorithms have become the key components in many big data applications. However, the full potential of machine learning is still far from being realized because using machine learning algorithms is hard, especially on distributed platforms such as Hadoop and Spark. The key barriers come from not only the implementation of the algorithms themselves but also the processing for applying them to real applications which often involve multiple steps and different algorithms. Our...

Downloads: 0 This Week

Last Update: 2 days ago
20

slimgrid

SlimGrid is a Java library for grid computations that is lighter than others (JPPF, Hadoop, ...). The main design goals are minimalism, simplicity, and pervasiveness. If you need something that does not require you to comprehend massive and complex APIs or do exhaustive configuration and installation, that is robust and reliable, and that uses just one port for all management and communication, then SlimGrid is the right choice. SlimGrid is built on top of Apache's ZooKeeper library...

Downloads: 0 This Week

Last Update: 2018-07-24
21

Voldemort

A distributed key-value storage system

Voldemort is a distributed database that’s an open source clone of Amazon’s Dynamo. It automatically replicates data over multiple servers, and automatically partitions them as well so each server only contains a subset of the total data. It offers many other features such as pluggable serialization support, data item versioning and an SSD Optimized Read Write storage engine. Voldemort is not a relational database or an object database. It is essentially a big, distributed, persistent,...

Downloads: 1 This Week

Last Update: 2020-07-16
22

zk_idgen

An Apache Zookeeper-based utility for assigning unique, sequential ID numbers in a distributed system (such as a Hadoop Map/Reduce job).

1 Review

Downloads: 0 This Week

Last Update: 2017-05-24
23

HareDB HBase Client

GUI Tools for HBase (including PIG and high speed Hive Query)

Most people are not familiar with the command line, yet the world of Hadoop and HBase offers only command-line tools. For this reason, we are focusing on developing a set of tools, “HBase Client”, that is easier to use and has a friendlier interface.

2 Reviews

Downloads: 1 This Week

Last Update: 2017-04-24
24

RSS Atom Feed Analytics With MapReduce

This is a data analytics project for RSS feeds using Hadoop MapReduce

This project accepts the output of the jatomrss project as its input and applies MapReduce logic to it to perform the analytics.

Downloads: 0 This Week

Last Update: 2016-09-24
25

Hadoop-BAM

Hadoop-BAM is a Java library for the manipulation of files in common bioinformatics formats using the Hadoop MapReduce framework with the Picard SAM JDK, and command line tools similar to SAMtools. The file formats currently supported are BAM, SAM, FASTQ, FASTA, QSEQ, BCF, and VCF. For a longer high-level description of Hadoop-BAM, refer to the article "Hadoop-BAM: directly manipulating next generation sequencing data in the cloud" in Bioinformatics Volume 28 Issue 6 pp. 876-877, available...

2 Reviews

Downloads: 0 This Week

Last Update: 2016-04-18