Showing 16 open source projects for "hadoop"

  • 1
    Luigi

    Python module that helps you build complex pipelines of batch jobs

    Luigi is a Python (3.6, 3.7, 3.8, 3.9 tested) package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command line integration, and much more. The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes. You want to chain many tasks and automate them, and failures will happen. These tasks can be anything, but are typically long-running things like Hadoop...
    Downloads: 4 This Week
  • 2
    Apache Drill

    Apache Drill is a distributed MPP query layer for self-describing data

    Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel. Get faster insights without the overhead (data loading, schema creation and maintenance, transformations, etc.). Analyze multi-structured and nested data in non-relational datastores directly, without transforming or restricting the data. Leverage your existing SQL skillsets and BI tools including Tableau...
    Downloads: 1 This Week
  • 3
    ANTLR

    Parser generator to read, process, or translate structured text

    ... and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.
    Downloads: 2 This Week
  • 4
    SageMaker Spark

    A Spark library for Amazon SageMaker

    ... trained models, and, if you have your own ML algorithms built into SageMaker-compatible Docker containers, you can use SageMaker Spark to train and infer on DataFrames with your own algorithms, all at Spark scale. SageMaker Spark depends on hadoop-aws-2.8.1. To run Spark applications that depend on SageMaker Spark, you need to build Spark with Hadoop 2.8. However, if you are running Spark applications on EMR, you can use Spark built with Hadoop 2.7.
    Downloads: 0 This Week
  • 5
    XGBoost

    Scalable and Flexible Gradient Boosting

    ... can be used for Python, Java, Scala, R, C++ and more. It can run on a single machine, Hadoop, Spark, Dask, Flink and most other distributed environments, and is capable of solving problems beyond billions of examples.
    Downloads: 1 This Week
  • 6
    SlimGrid is a Java library for grid computations that is lighter than the alternatives (JPPF, Hadoop, ...). The main design goals are minimalism, simplicity, and pervasiveness. If you need something that does not require you to comprehend massive and complex APIs or perform exhaustive configuration and installation, that is robust and reliable, and that uses just one port for all management and communication, then SlimGrid is the right choice. SlimGrid is built on top of Apache's ZooKeeper library...
    Downloads: 0 This Week
  • 7
    An Apache ZooKeeper-based utility for assigning unique, sequential ID numbers in a distributed system (such as a Hadoop MapReduce job).
    Downloads: 0 This Week
  • 8
    Hadoop-BAM is a Java library for the manipulation of files in common bioinformatics formats using the Hadoop MapReduce framework with the Picard SAM JDK, and command line tools similar to SAMtools. The file formats currently supported are BAM, SAM, FASTQ, FASTA, QSEQ, BCF, and VCF. For a longer high-level description of Hadoop-BAM, refer to the article "Hadoop-BAM: directly manipulating next generation sequencing data in the cloud" in Bioinformatics Volume 28 Issue 6 pp. 876-877...
    Downloads: 0 This Week
  • 9

    CSVTOHIVE

    Generate Hive Scripts Automatically from CSV Files

    Generates Hive scripts automatically from CSV files. 1. Copies the CSV files to the Hadoop file system. 2. Generates CREATE statements to create tables. 3. Generates .hive files in the same folder as the CSV files, and also generates run.sh with all files consolidated. Just switch to the folder where the .hive scripts reside and run run.sh (./run.sh). The tool also sets execute permissions on the .hive and run.sh scripts, so you can execute run.sh directly.
    Downloads: 0 This Week
  • 10
    Aspose for Hadoop

    This project holds the source code for the Aspose for Hadoop project.

    The Aspose for Hadoop project enables Apache Hadoop / MapReduce developers to work with various binary file formats. Developers can create binary sequence files and convert them into text sequence files.
    Downloads: 0 This Week
  • 11
    BIRT Report Designer

    Open Source Reporting & Data Visualization Platform

    ... With a flexible Open Data Access framework, developers can write custom data drivers to access data from any source, including Big Data sources like Apache Hadoop, Cassandra, and MongoDB, along with all traditional relational databases, flat files, XML data streams, and data stored in proprietary systems. Built for embedding, BIRT includes APIs for data access, chart generation, output formats, content execution, and integration within larger applications.
    Downloads: 22 This Week
  • 12
    R Hadoop for Big Data

    Download Free Associated R open source script files for big data analy

    Download free associated R open source script files for big data analysis with Hadoop and R. These are R script source files from Ram Venkat, from a past Meetup we did at http://www.meetup.com/R-Matlab-Users/events/85160532/ There is also a long video and a PDF of the PowerPoint presentation slides, with R files, at: http://quantlabs.net/blog/2012/11/how-to-use-hadoop-and-r-for-big-data-parallel-processing-free-download-pdf/ Download source files from http://quantlabs.net/blog/2012/11/download-free...
    Downloads: 0 This Week
  • 13

    oozie-workflow-checker

    Validation of complex Apache Oozie Hadoop workflows

    Library for validating complex Oozie workflows (http://oozie.apache.org/). Two usage scenarios: 1) Execute a workflow with specified parameters and get the list of passed nodes as the result. Sample in WorkflowDirProcessorIntegrationTest. Note: of all workflow functions, only "wf:conf" is currently supported. 2) Check that called actions exist, or build the full call tree in XML format. Sample in OozieWorkflowCheckerTest: You can override properties from "config-default.xml" and "job.properties" by file...
    Downloads: 0 This Week
  • 14

    ForeIndex

    Distributed Index with Apache Hadoop, Apache Lucene and Apache Tika

    This is a distributed index framework that uses Apache Hadoop, Apache Lucene and Apache Tika to index large volumes of data.
    Downloads: 0 This Week
  • 15
    This software lets a user generate test data, which is useful for testing Hadoop or other data processing clusters.
    Downloads: 0 This Week
  • 16
    Framework for developing simple evolutionary algorithm / island model programs in a distributed environment, using the MapReduce programming model based on Hadoop.
    Downloads: 0 This Week
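
The CSVTOHIVE entry above describes generating CREATE statements from CSV files. As a rough illustration of that one step only, here is a minimal Python sketch. It is not the tool's actual code: the function name `hive_create_statement`, the choice to type every column as STRING, and the header-name normalisation are all assumptions made for this example.

```python
import csv
import io


def hive_create_statement(table_name, csv_text, delimiter=","):
    """Build a Hive CREATE TABLE statement from the header row of CSV text.

    Hypothetical helper: every column is typed STRING (an assumption;
    the real tool may choose types differently).
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    # Normalise header names into simple identifiers (illustrative choice).
    columns = [name.strip().lower().replace(" ", "_") for name in header]
    column_defs = ",\n".join(f"  `{col}` STRING" for col in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS `{table_name}` (\n"
        f"{column_defs}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        f"FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    )


# Made-up sample data, used only to show the call.
sample = "id,First Name,signup date\n1,Ada,2012-11-01\n"
print(hive_create_statement("customers", sample))
```

Copying the files to HDFS and writing the .hive and run.sh files, which the entry also mentions, are left out of this sketch.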