File Date Author Commit
 core 2013-11-11 asposemarketplace asposemarketplace [0c59c9] Committing first version of Aspose for Hadoop p...
 README.md 2013-11-11 asposemarketplace asposemarketplace [0c59c9] Committing first version of Aspose for Hadoop p...

Read Me

Aspose For Hadoop

The Aspose for Hadoop project enables Hadoop developers to work with binary file formats. Hadoop / MapReduce developers can use it to create binary sequence files and convert them into text sequence files. The extracted text can then be analyzed in MapReduce algorithms.

Packages

com.aspose.hadoop.core

  Provides wrapper classes around Aspose for Java that parse binary formats into text. The package also includes a couple of classes that override Hadoop input formats so that they can be used to create binary sequence files.

com.aspose.hadoop.examples

  Provides mapper examples for converting binary sequence file(s) into text sequence file(s). Each mapper example handles a particular set of binary formats, as explained in the next section.

Mapper Examples Flow and Usage

CreateBinarySequence

  Picks up a set of files from an HDFS directory, creates binary sequence file(s) from them, and stores the binary sequence file(s) in an HDFS directory.
  Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreateBinarySequence <HDFS input directory> <HDFS output directory>
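
  The mapper reads its input from HDFS, so local files have to be staged there first. The following is a minimal sketch, assuming the commands run from [HADOOP_HOME] and that aspose-hadoop.jar sits in that directory. The function name `create_binary_sequence`, the `HADOOP` override variable, and the example paths are hypothetical and not part of the project.

```shell
# Hypothetical helper, not part of the project.
# HADOOP defaults to bin/hadoop, i.e. the current directory is assumed
# to be [HADOOP_HOME]; override it to point at another hadoop launcher.
create_binary_sequence() {
  local_dir=$1   # local directory holding the source files
  hdfs_in=$2     # HDFS directory to copy them to (assumed not to exist yet)
  hdfs_out=$3    # HDFS directory for the binary sequence file(s)

  # Stage the local directory in HDFS.
  "${HADOOP:-bin/hadoop}" fs -put "$local_dir" "$hdfs_in" || return 1

  # Pack the staged files into binary sequence file(s).
  "${HADOOP:-bin/hadoop}" jar aspose-hadoop.jar \
    com.aspose.hadoop.examples.CreateBinarySequence "$hdfs_in" "$hdfs_out"
}
```

  Example call (paths are placeholders): `create_binary_sequence ./docs /user/me/docs /user/me/binseq`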

CreateDocumentTextSequence

  Picks up binary sequence file(s) generated from documents (MS Word / OpenOffice documents) in an input HDFS directory, parses text from the documents, and creates text sequence file(s) that are stored in an output HDFS directory.
  Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreateDocumentTextSequence <HDFS input directory> <HDFS output directory>
  Tip: Put your documents in an HDFS directory and use the CreateBinarySequence mapper to generate binary sequence file(s). Then supply that mapper's output directory as the input directory here.
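
  The tip above describes a two-step flow in which the output directory of CreateBinarySequence becomes the input directory of the text mapper. A minimal sketch of that chaining, assuming the documents are already in HDFS and the commands run from [HADOOP_HOME]; the function name `documents_to_text` and the `HADOOP` override variable are hypothetical:

```shell
# Hypothetical helper, not part of the project.
# HADOOP defaults to bin/hadoop (current directory assumed to be [HADOOP_HOME]).
documents_to_text() {
  hdfs_docs=$1     # HDFS directory holding the source documents
  hdfs_binseq=$2   # HDFS directory for the intermediate binary sequence file(s)
  hdfs_text=$3     # HDFS directory for the final text sequence file(s)

  # Step 1: documents -> binary sequence file(s).
  "${HADOOP:-bin/hadoop}" jar aspose-hadoop.jar \
    com.aspose.hadoop.examples.CreateBinarySequence "$hdfs_docs" "$hdfs_binseq" || return 1

  # Step 2: binary sequence file(s) -> text sequence file(s).
  # The output directory of step 1 is the input directory here.
  "${HADOOP:-bin/hadoop}" jar aspose-hadoop.jar \
    com.aspose.hadoop.examples.CreateDocumentTextSequence "$hdfs_binseq" "$hdfs_text"
}
```

  The second job starts only if the first one exits successfully, so a failed conversion does not leave the text mapper reading a missing or partial directory.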

CreateSpreadSheetTextSequence

  Picks up binary sequence file(s) generated from spreadsheets (MS Excel / OpenOffice spreadsheets) in an input HDFS directory, parses text from the spreadsheets, and creates text sequence file(s) that are stored in an output HDFS directory.
  Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreateSpreadSheetTextSequence <HDFS input directory> <HDFS output directory>
  Tip: Put your spreadsheets in an HDFS directory and use the CreateBinarySequence mapper to generate binary sequence file(s). Then supply that mapper's output directory as the input directory here.

CreatePresentationTextSequence

  Picks up binary sequence file(s) generated from presentations (MS PowerPoint PPTX presentations) in an input HDFS directory, parses text from the presentations, and creates text sequence file(s) that are stored in an output HDFS directory.
  Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreatePresentationTextSequence <HDFS input directory> <HDFS output directory>
  Tip: Put your PPTX presentations in an HDFS directory and use the CreateBinarySequence mapper to generate binary sequence file(s). Then supply that mapper's output directory as the input directory here.

CreateEmailTextSequence

  Picks up binary sequence file(s) generated from emails (msg files) in an input HDFS directory, parses text from the msg files, and creates text sequence file(s) that are stored in an output HDFS directory.
  Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreateEmailTextSequence <HDFS input directory> <HDFS output directory>
  Tip: Put your msg files in an HDFS directory and use the CreateBinarySequence mapper to generate binary sequence file(s). Then supply that mapper's output directory as the input directory here.
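
  All four text mappers share the same command shape and differ only in the class name. As an illustration, a small lookup can pick the class for a given kind of input. This is a sketch only: the function name `mapper_for` and the kind labels (`document`, `spreadsheet`, `presentation`, `email`) are hypothetical, while the class names are the ones listed above.

```shell
# Hypothetical helper, not part of the project: prints the example mapper
# class for a kind of input, or fails for a kind this README does not list.
mapper_for() {
  case $1 in
    document)     echo com.aspose.hadoop.examples.CreateDocumentTextSequence ;;
    spreadsheet)  echo com.aspose.hadoop.examples.CreateSpreadSheetTextSequence ;;
    presentation) echo com.aspose.hadoop.examples.CreatePresentationTextSequence ;;
    email)        echo com.aspose.hadoop.examples.CreateEmailTextSequence ;;
    *)            echo "unknown input kind: $1" >&2; return 1 ;;
  esac
}
```

  It could then be used as: `bin/hadoop jar aspose-hadoop.jar "$(mapper_for email)" <HDFS input directory> <HDFS output directory>`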

Aspose Pty Ltd: www.aspose.com
