Showing 62 open source projects for "mapreduce"

  • 1
    Apache HBase

    Get random, realtime read/write access to your Big Data

    ... HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. Features include a Thrift gateway and a RESTful web service that support XML, Protobuf, and binary data encoding options; support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX; and convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
  • 2
    PowerJob

    Enterprise job scheduling middleware with distributed computing

    ... stand-alone, broadcast, Map, and MapReduce. In MapReduce mode, distributed computing resources can be utilized. Both job dependency management and data communication between jobs are supported. Developers can write their processors in Java, Shell, or Python, and support for multilingual scheduling via HTTP is planned.
  • 3
    DTail

    DTail is a distributed DevOps tool for tailing, grepping, and catting logs

    DTail (a distributed tail program) is a DevOps tool for engineers, written in Go, for following (tailing), catting, and grepping (with gzip and zstd decompression support) log files on many machines concurrently. An advanced feature of DTail is the ability to execute distributed MapReduce aggregations across many devices. DTail uses the SSH protocol for secure authorization and transport encryption. Furthermore, DTail respects the UNIX file system permission model (traditional on all Linux/UNIX...
  • 4
    Luigi

    Python module that helps you build complex pipelines of batch jobs

    ... jobs, dumping data to/from databases, running machine learning algorithms, or anything else. You can build pretty much any task you want, but Luigi also comes with a toolbox of several common task templates that you can use. It includes support for running Python MapReduce jobs in Hadoop, as well as Hive and Pig jobs. It also comes with file system abstractions for HDFS and local files that ensure all file system operations are atomic.
  • 5

    JRecord

    Read Cobol data files in Java

    ... code that allows FTPing RDW files directly from the mainframe into Hadoop/HDFS, either as a MapReduce job or as a standalone client.
  • 6

    SkePi

    Data-parallel and stream-parallel skeletons implemented in Erlang.

  • 7
    spatial-framework-for-hadoop

    The Spatial Framework for Hadoop allows developers and data scientists to use the Hadoop data processing system for spatial data analysis. For tools, samples, and tutorials that use this framework, head over to GIS Tools for Hadoop. At the root level of this repository, you can build a single jar with everything in the framework using Apache Ant. Alternatively, you can build a jar at the root level of each framework component. Custom MapReduce jobs that use the Esri Geometry API require...
  • 8
    geometry-api-java

    The Esri Geometry API for Java enables developers to write apps

    The Esri Geometry API for Java can be used to enable spatial data processing in third-party data-processing solutions. Developers of custom MapReduce-based applications for Hadoop can use this API for spatial processing of data in the Hadoop system. The API is also used by the Hive UDFs and could be used by developers building geometry functions for third-party applications such as Cassandra, HBase, Storm, and many other Java-based “big data” applications.
  • 9
    Gizmo Microservice Toolkit

    A Microservice Toolkit from The New York Times

    At The New York Times, our development teams have been adopting the Go programming language over the last three years to build better back-end services. In the past I’ve written about using Go for Elastic MapReduce streaming. I’ve also talked about using Go at GothamGo for news analysis and to improve our email and alert systems at the Golang NYC Meetup. We use Go for a wide variety of tasks, but the most common use throughout the company is for building JSON APIs. When we first began building...
  • 10
    rq

    A tool for doing record analysis and transformation

    This is the home of the tool called rq (record query). It is used for performing queries on streams of records in various formats. The goal is to make ad-hoc exploration of data sets easy without having to use more heavyweight tools like SQL, MapReduce, or custom programs. rq fills a niche similar to tools like awk or sed, but works with structured (record) data instead of text. It was created with love out of the best parts of Rust, and is distributed as a dependency-free binary...
  • 11

    HSRA

    Hadoop spliced read aligner for RNA-seq data

    HSRA is a MapReduce-based parallel tool for mapping reads from RNA sequencing (RNA-seq) experiments. RNA-seq analyses typically begin by mapping reads to a reference genome to determine the locations from which the reads originated, a very time-consuming step. This tool allows bioinformatics researchers to efficiently distribute their mapping tasks over the nodes of a cluster by combining a fast multithreaded spliced aligner (HISAT2) with Apache Hadoop, which...
  • 12

    MarDRe

    MapReduce-based tool to remove duplicate DNA reads

    MarDRe is a de novo MapReduce-based parallel tool that removes duplicate and near-duplicate DNA reads by clustering single-end and paired-end sequences from FASTQ/FASTA datasets. This tool lets bioinformaticians avoid analyzing unnecessary reads, reducing the time of subsequent procedures on the dataset. MarDRe is the Big Data counterpart of ParDRe, which employs HPC technologies (i.e., hybrid MPI/multithreading) to reduce runtime on multicore systems. Instead...
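To illustrate the MapReduce pattern behind duplicate-read removal, here is a minimal Python sketch that removes only *exact* duplicates among single-end reads: the map step keys each read by its sequence, and the reduce step keeps one read per key. This is a simplification under stated assumptions, not MarDRe's algorithm (which also clusters near-duplicates and handles paired-end data); the record layout, the function names, and the "keep the highest mean quality" rule are all hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical simplified FASTQ record: (read_id, sequence, quality_string).
Read = Tuple[str, str, str]


def map_read(read: Read) -> Tuple[str, Read]:
    """Map step: key each read by its nucleotide sequence."""
    _, sequence, _ = read
    return sequence, read


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a non-empty quality string (assumes Phred+33)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def reduce_reads(sequence: str, reads: List[Read]) -> Read:
    """Reduce step: among exact duplicates, keep the read with the best mean quality."""
    return max(reads, key=lambda r: mean_phred(r[2]))


def dedup(reads: Iterable[Read]) -> List[Read]:
    # Shuffle: group map outputs by key, as a MapReduce framework would between phases.
    groups: Dict[str, List[Read]] = defaultdict(list)
    for key, value in map(map_read, reads):
        groups[key].append(value)
    return [reduce_reads(key, values) for key, values in groups.items()]


reads = [("r1", "ACGT", "IIII"), ("r2", "ACGT", "####"), ("r3", "TTGA", "IIII")]
print(dedup(reads))  # [('r1', 'ACGT', 'IIII'), ('r3', 'TTGA', 'IIII')]
```

Keying on the raw sequence is what makes exact duplicates meet in the same reducer; detecting near-duplicates requires a more elaborate keying or clustering scheme.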
  • 13

    spark-msna

    Algorithm on Spark for aligning multiple similar DNA/RNA sequences

    The algorithm uses a suffix tree to identify common substrings and a modified Needleman-Wunsch algorithm for pairwise alignments. To improve the efficiency of the pairwise alignments, an unsupervised learning technique based on clustering is used to build a knowledge base that guides them.
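For reference, the pairwise step is based on Needleman-Wunsch global alignment. The following is a minimal Python sketch of the classic, unmodified algorithm with a linear gap penalty; the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, and spark-msna's modifications, suffix-tree seeding, and clustering-based guidance are not reproduced here.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap inserted in b
                              score[i][j - 1] + gap)   # gap inserted in a

    # Traceback from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The table fill costs O(nm) time and memory per pair, which is why multiple-sequence tools try to reduce or guide the number of pairwise alignments.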
  • 14
    EventQL

    Distributed "massively parallel" SQL query engine

    EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs super-fast SQL and MapReduce queries.
  • 15

    ParDRe

    Parallel tool to remove duplicate DNA reads

    ... of cores and, thanks to message-passing technology, it can be executed on clusters. There is also a MapReduce counterpart of ParDRe, called MarDRe (listed above). UPDATE: Since version 2.0.5, ParDRe can also remove only optical duplicates (leaving biologically interesting duplicates) and work with compressed .gz input/output.
  • 16

    RSS Atom Feed Analytics With MapReduce

    This is a data analytics project for RSS feeds using Hadoop MapReduce

    This project takes the output of the jatomrss project as its input and applies MapReduce logic to it to perform the analytics.
  • 17
    MIREX
    MIREX (MapReduce Information Retrieval Experiments) provides solutions to easily and quickly run large-scale information retrieval experiments on a cluster of machines using Hadoop. Version 0.3 has tools for the TREC ClueWeb09 and ClueWeb12 collections.
  • 18

    owl reasoning over big biomedical data

    An OWL reasoning framework for the analysis of big biomedical data

    A general OWL reasoning framework for the analysis of big biomedical data, including an implemented MapReduce-based property chain reasoning prototype system. OWL reasoning is well suited to problems involving complex semantic associations because it can infer logical consequences from a set of asserted rules or axioms. The MapReduce framework is used to address scalability. In our experiment, we focus on the discovery of associations between Traditional Chinese Medicine...
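As an illustration of how a property chain axiom can be evaluated with MapReduce, here is a minimal Python sketch of one rule application: for a chain where p followed by q implies r, triples (x p y) and (y q z) are joined on the shared node y to infer (x r z). The reduce-side join shown is a common textbook approach and an assumption on our part, not necessarily how this prototype works; the function names and the example predicates (the familiar hasParent/hasBrother/hasUncle chain) are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def map_triple(t: Triple, p: str, q: str) -> List[Tuple[str, Tuple[str, str]]]:
    """Map: key triples by the middle node y of the chain x -p-> y -q-> z."""
    s, pred, o = t
    out = []
    if pred == p:
        out.append((o, ("left", s)))   # x -p-> y, keyed by y
    if pred == q:
        out.append((s, ("right", o)))  # y -q-> z, keyed by y
    return out


def reduce_join(values: List[Tuple[str, str]], r: str) -> List[Triple]:
    """Reduce: pair every left x with every right z that meet at the same y."""
    lefts = [v for tag, v in values if tag == "left"]
    rights = [v for tag, v in values if tag == "right"]
    return [(x, r, z) for x, z in product(lefts, rights)]


def apply_chain(triples: List[Triple], p: str, q: str, r: str) -> Set[Triple]:
    """One pass of the rule: (x p y) and (y q z) imply (x r z)."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for t in triples:
        for key, value in map_triple(t, p, q):
            groups[key].append(value)
    inferred: Set[Triple] = set()
    for values in groups.values():
        inferred.update(reduce_join(values, r))
    return inferred


triples = [("ann", "hasParent", "bob"), ("bob", "hasBrother", "carl")]
print(apply_chain(triples, "hasParent", "hasBrother", "hasUncle"))
# {('ann', 'hasUncle', 'carl')}
```

A complete reasoner would add the inferred triples back to the input and repeat the job until no new triples appear.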
  • 19
    ankus

    Data Mining and Machine Learning Algorithms based on MapReduce

    [Features of ankus] ankus is a web-based big data mining project and tool: a MapReduce-based data mining/machine learning algorithm library, a Hadoop-based distributed big data system, and a web-based GUI for easy use. [The ankus project & license] The ankus project consists of three parts released as open source. ankus is dual-licensed under community and commercial licenses; the community license is GPLv3. Some algorithms in the Core Project are not under the OSS...
  • 20
    hmrjp-maven-plugin

    Hadoop MapReduce Maven plugin

    hmrjp-maven-plugin is a Maven plugin that helps with creating, running, and verifying Hadoop MapReduce jobs remotely, just like any other Java project built with Maven.
  • 21
    CliqueSquare

    Distributed RDF Processing over Hadoop

    CliqueSquare is a system for storing and querying large RDF graphs relying on Hadoop’s distributed file system (HDFS) and Hadoop’s MapReduce open-source implementation. It provides a novel partitioning and storage scheme that permits 1-level joins to be evaluated locally using efficient map-only joins. In addition, CliqueSquare is equipped with a unique optimization algorithm based on graphs and cliques capable of generating highly parallelizable flat query plans relying on n-ary equality joins.
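The idea of a map-only join can be sketched as follows: if triples are partitioned so that all triples sharing a subject land on the same node, a star-shaped join on the subject can be answered inside each partition with no shuffle or reduce phase. This minimal Python sketch assumes subject-based hash partitioning only; CliqueSquare's actual partitioning and storage scheme is richer than this, and the function names and sample data are hypothetical.

```python
import zlib
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def partition_by_subject(triples: List[Triple], n_parts: int) -> List[List[Triple]]:
    """Place every triple on the node chosen by a stable hash of its subject."""
    parts: List[List[Triple]] = [[] for _ in range(n_parts)]
    for t in triples:
        parts[zlib.crc32(t[0].encode()) % n_parts].append(t)
    return parts


def local_star_join(part: List[Triple], props: List[str]) -> List[Tuple[str, ...]]:
    """Map-only task: join triples sharing a subject, using only this partition."""
    by_subject: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for s, p, o in part:
        if p in props:
            by_subject[s][p].append(o)
    rows: List[Tuple[str, ...]] = []
    for s, objs in by_subject.items():
        if all(p in objs for p in props):
            # One output row per combination of objects for the requested properties.
            for combo in product(*(objs[p] for p in props)):
                rows.append((s, *combo))
    return rows


def star_query(triples: List[Triple], props: List[str], n_parts: int = 4) -> List[Tuple[str, ...]]:
    rows: List[Tuple[str, ...]] = []
    for part in partition_by_subject(triples, n_parts):
        # Each partition is processed independently: no shuffle, no reduce phase.
        rows.extend(local_star_join(part, props))
    return sorted(rows)


triples = [("alice", "worksAt", "inria"), ("alice", "livesIn", "paris"),
           ("bob", "worksAt", "inria")]
print(star_query(triples, ["worksAt", "livesIn"]))  # [('alice', 'inria', 'paris')]
```

`zlib.crc32` is used instead of Python's built-in `hash` so that the placement of a subject is stable across processes, which a real cluster would need.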
  • 22
    Hadoop-BAM
    Hadoop-BAM is a Java library for the manipulation of files in common bioinformatics formats using the Hadoop MapReduce framework with the Picard SAM JDK, and command line tools similar to SAMtools. The file formats currently supported are BAM, SAM, FASTQ, FASTA, QSEQ, BCF, and VCF. For a longer high-level description of Hadoop-BAM, refer to the article "Hadoop-BAM: directly manipulating next generation sequencing data in the cloud" in Bioinformatics Volume 28 Issue 6 pp. 876-877, available...
  • 23
    MapReduce++
    MapReduce++ is a project for implementing parallel algorithms. It currently has two C++ implementations of the MapReduce abstraction: the MapMP library (multiprocessors) and the MaPI framework (multicomputers).
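The MapReduce abstraction itself is small: a user-supplied mapper turns each input into key/value pairs, the framework groups values by key, and a user-supplied reducer combines each group. The following minimal Python sketch shows those three phases with word count as the example; it is not the MapMP or MaPI API, and the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, Any]


def map_reduce(inputs: Iterable[Any],
               mapper: Callable[[Any], List[Pair]],
               reducer: Callable[[Hashable, List[Any]], Any],
               workers: int = 4) -> Dict[Hashable, Any]:
    # Map phase: apply the mapper to each input split in a pool of worker threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, inputs))
    # Shuffle phase: group intermediate values by key.
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_map(line: str) -> List[Pair]:
    return [(word, 1) for word in line.split()]


def word_count_reduce(word: Hashable, counts: List[Any]) -> int:
    return sum(counts)


print(map_reduce(["a b a", "b c"], word_count_map, word_count_reduce))
# {'a': 2, 'b': 2, 'c': 1}
```

The thread pool only illustrates where parallelism enters; in CPython it gives little speedup for CPU-bound mappers, and a multiprocessor or multicomputer implementation would distribute the map and reduce calls across processes or machines instead.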
  • 24

    Cloud2Sim

    Adaptive and Distributed Architecture for Cloud/MapReduce Simulations

    An Adaptive and Distributed Architecture for Cloud and MapReduce Algorithms and Simulations. Please cite the papers below if you use this project or refer to it in your work. Kathiravelu, P. & L. Veiga (2014). An Adaptive Distributed Simulator for Cloud and MapReduce Algorithms and Architectures. In IEEE/ACM 7th International Conference on Utility and Cloud Computing (UCC 2014), London, UK. pp. 79–88. IEEE Computer Society. Kathiravelu, P. & L. Veiga (2014). Concurrent...
  • 25
    Mr.FSM

    Large-Scale Frequent Subgraph Mining in MapReduce

    This is the program used in the following paper: Wenqing Lin, Xiaokui Xiao, and Gabriel Ghinita. Large-Scale Frequent Subgraph Mining in MapReduce. In Proceedings of the 30th IEEE International Conference on Data Engineering (ICDE), pages 844-855, 2014. Please cite the paper if you choose to use the program. If you have any problems, please report them to {wlin1 at ntu dot edu dot sg}.