Aspose for Hadoop Code
This project holds the source code for the Aspose for Hadoop project.
Brought to you by: asposemp
File | Date | Commit |
---|---|---|
core | 2013-11-11 | [0c59c9] Committing first version of Aspose for Hadoop p... |
README.md | 2013-11-11 | [0c59c9] Committing first version of Aspose for Hadoop p... |
The Aspose for Hadoop project enables Hadoop developers to work with binary file formats. Hadoop/MapReduce developers can use it to create binary sequence files and convert them into text sequence files. The extracted text can then be analyzed with MapReduce algorithms.
com.aspose.hadoop.core
Provides wrapper classes around Aspose for Java that parse binary formats into text. The package also includes a few classes that override Hadoop input formats so that they can be used to create binary sequence files.
com.aspose.hadoop.examples
Provides mapper examples for converting binary sequence file(s) into text sequence file(s). Each mapper example handles a particular set of binary formats, as explained in the next section.
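Because each example mapper handles one family of formats, it helps to know which mapper a given file belongs to before running a job. The sketch below maps a file extension to the example class to run. The `MapperChooser` class is hypothetical and not part of the project, and the extension lists are an assumption drawn from the formats this README names (MS Word / OpenOffice documents, MS Excel / OpenOffice spreadsheets, PPTX presentations, MSG emails).

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper (not shipped with Aspose for Hadoop): pick the example
 * mapper that matches a file's extension. Extension lists are assumptions
 * based on the formats described in the README.
 */
public class MapperChooser {

    private static final String PKG = "com.aspose.hadoop.examples.";

    // Lower-case extension -> example mapper that handles that format family.
    private static final Map<String, String> MAPPER_BY_EXTENSION = Map.of(
            "doc", PKG + "CreateDocumentTextSequence",
            "docx", PKG + "CreateDocumentTextSequence",
            "odt", PKG + "CreateDocumentTextSequence",
            "xls", PKG + "CreateSpreadSheetTextSequence",
            "xlsx", PKG + "CreateSpreadSheetTextSequence",
            "ods", PKG + "CreateSpreadSheetTextSequence",
            "pptx", PKG + "CreatePresentationTextSequence",
            "msg", PKG + "CreateEmailTextSequence");

    /** Returns the fully qualified mapper class for a file name, if its format is covered. */
    public static Optional<String> mapperFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to go on
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(MAPPER_BY_EXTENSION.get(ext));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"report.DOCX", "budget.ods", "deck.ppt", "notes"}) {
            System.out.println(name + " -> " + mapperFor(name).orElse("(not covered)"));
        }
    }
}
```

Note that `deck.ppt` is reported as not covered, since only PPTX presentations are listed for the presentation mapper.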
CreateBinarySequence
Picks up the files in an HDFS input directory, creates binary sequence file(s) from them, and stores the binary sequence file(s) in an HDFS output directory.
Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreateBinarySequence <HDFS input directory> <HDFS output directory>
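Conceptually, a binary sequence file holds one record per input file: a key identifying the file and a value carrying its raw, unparsed bytes. The sketch below illustrates that layout only. It assumes a file-name key and a whole-file byte value (the project's actual key/value types are not documented here), uses a local directory in place of HDFS, and uses plain `java.io` streams instead of Hadoop's `SequenceFile.Writer` so it runs without a cluster. `BinarySequenceSketch` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a "binary sequence": one (file name, raw bytes) record
 * per input file. Plain java.io stands in for Hadoop's SequenceFile format.
 */
public class BinarySequenceSketch {

    /** Reads every regular file in a directory (local stand-in for the HDFS input directory). */
    public static Map<String, byte[]> readDirectory(Path dir) throws IOException {
        Map<String, byte[]> files = new TreeMap<>(); // sorted by name for a stable order
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)) { // subdirectories are skipped
                    files.put(p.getFileName().toString(), Files.readAllBytes(p));
                }
            }
        }
        return files;
    }

    /** Serializes (name, bytes) pairs as [UTF name][int length][raw bytes] records. */
    public static byte[] write(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                out.writeUTF(e.getKey());          // key: original file name
                out.writeInt(e.getValue().length); // value length in bytes
                out.write(e.getValue());           // value: binary content, not parsed
            }
        }
        return buffer.toByteArray();
    }

    /** Reads the records back in the order they were written. */
    public static Map<String, byte[]> read(byte[] sequence) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(sequence))) {
            while (in.available() > 0) {
                String name = in.readUTF();
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                files.put(name, value);
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        byte[] sequence = write(readDirectory(dir));
        System.out.println("Packed " + read(sequence).size() + " file(s) into "
                + sequence.length + " bytes");
    }
}
```

Keeping the bytes unparsed at this stage means the expensive format-specific parsing happens later, inside the text-sequence mappers.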
CreateDocumentTextSequence
Picks up binary sequence file(s) generated from documents (MS Word / OpenOffice documents) in an input HDFS directory, parses the text from the documents, and creates text sequence file(s) that are stored in an output HDFS directory.
Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreateDocumentTextSequence <HDFS input directory> <HDFS output directory>
Tip: Put your documents in an HDFS directory and run the CreateBinarySequence mapper on it to generate binary sequence file(s). Then supply that mapper's output directory as the input directory here.
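The per-record work of a text-sequence mapper can be pictured as: for each (file name, binary content) record, extract the text and emit (file name, text). The sketch below shows that shape with a pluggable extractor. In the real project the extractor would call Aspose for Java (for documents, Aspose.Words); here a UTF-8 decoder stands in so the code runs on its own. `TextSequenceSketch` is hypothetical, and skipping records that fail to parse, rather than failing the job, is an assumed design choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: convert (file name, bytes) records into (file name, text)
 * records using a caller-supplied extractor in place of an Aspose API call.
 */
public class TextSequenceSketch {

    public static Map<String, String> toTextSequence(
            Map<String, byte[]> binarySequence, Function<byte[], String> extractor) {
        Map<String, String> textSequence = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> record : binarySequence.entrySet()) {
            try {
                textSequence.put(record.getKey(), extractor.apply(record.getValue()));
            } catch (RuntimeException e) {
                // Assumption: one corrupt or unsupported file should not fail the whole job.
                System.err.println("Skipping " + record.getKey() + ": " + e.getMessage());
            }
        }
        return textSequence;
    }

    public static void main(String[] args) {
        Map<String, byte[]> binary = new LinkedHashMap<>();
        binary.put("memo.docx", "Quarterly memo".getBytes(StandardCharsets.UTF_8));
        binary.put("broken.doc", new byte[0]);

        // Stand-in extractor: decodes UTF-8 and rejects empty input.
        Function<byte[], String> fakeExtractor = bytes -> {
            if (bytes.length == 0) {
                throw new IllegalArgumentException("empty file");
            }
            return new String(bytes, StandardCharsets.UTF_8);
        };
        System.out.println(toTextSequence(binary, fakeExtractor));
    }
}
```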
CreateSpreadSheetTextSequence
Picks up binary sequence file(s) generated from spreadsheets (MS Excel / OpenOffice spreadsheets) in an input HDFS directory, parses the text from the spreadsheets, and creates text sequence file(s) that are stored in an output HDFS directory.
Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreateSpreadSheetTextSequence <HDFS input directory> <HDFS output directory>
Tip: Put your spreadsheets in an HDFS directory and run the CreateBinarySequence mapper on it to generate binary sequence file(s). Then supply that mapper's output directory as the input directory here.
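Spreadsheet text has a grid structure that line-oriented MapReduce jobs can exploit if it is preserved. The sketch below shows one possible layout: one line per row, cells separated by tabs. This layout is an assumption for illustration; the project's spreadsheet wrapper may format cell text differently, and in practice the cell values would come from Aspose.Cells rather than a hard-coded list. `SheetFlattener` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: flatten worksheet rows into tab-separated text,
 * one line per row. Cell values would normally come from Aspose.Cells.
 */
public class SheetFlattener {

    /** One line per row, cells joined by tabs; null cells become empty strings. */
    public static String flatten(List<List<String>> rows) {
        StringBuilder text = new StringBuilder();
        for (List<String> row : rows) {
            String line = row.stream()
                    // Collapse embedded tabs/newlines so a cell cannot break the row/column layout.
                    .map(cell -> cell == null ? "" : cell.replaceAll("[\\t\\r\\n]+", " "))
                    .collect(Collectors.joining("\t"));
            text.append(line).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Region", "Sales"),
                Arrays.asList("North", "1200"),
                Arrays.asList("South", null));
        System.out.print(flatten(rows));
    }
}
```

With one row per line, a downstream mapper can split each line on tabs to recover individual cells.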
CreatePresentationTextSequence
Picks up binary sequence file(s) generated from presentations (MS PowerPoint PPTX presentations) in an input HDFS directory, parses the text from the presentations, and creates text sequence file(s) that are stored in an output HDFS directory.
Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreatePresentationTextSequence <HDFS input directory> <HDFS output directory>
Tip: Put your PPTX presentations in an HDFS directory and run the CreateBinarySequence mapper on it to generate binary sequence file(s). Then supply that mapper's output directory as the input directory here.
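A presentation's text is naturally grouped by slide. The sketch below shows one way to turn per-slide text frames into text-sequence records, keyed as `deck.pptx#3` so later analysis can work at slide granularity. Emitting one record per slide is an assumption made for illustration (the project's mapper may emit one record per file), and the slide texts would come from Aspose.Slides in practice. `SlideRecords` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: build one text record per non-empty slide.
 * Slide text frames would normally be read with Aspose.Slides.
 */
public class SlideRecords {

    public static Map<String, String> perSlide(String fileName, List<List<String>> slides) {
        Map<String, String> records = new LinkedHashMap<>();
        for (int i = 0; i < slides.size(); i++) {
            // Join the non-blank text frames of one slide, one frame per line.
            StringBuilder text = new StringBuilder();
            for (String frame : slides.get(i)) {
                if (frame != null && !frame.isBlank()) {
                    if (text.length() > 0) {
                        text.append('\n');
                    }
                    text.append(frame.strip());
                }
            }
            if (text.length() > 0) {
                records.put(fileName + "#" + (i + 1), text.toString()); // 1-based slide number
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<List<String>> slides = Arrays.asList(
                Arrays.asList("Roadmap", "2014 plans"),
                Arrays.asList(" "),
                Arrays.asList("Questions?"));
        perSlide("deck.pptx", slides).forEach((key, text) -> System.out.println(key + " => " + text));
    }
}
```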
CreateEmailTextSequence
Picks up binary sequence file(s) generated from emails (MSG files) in an input HDFS directory, parses the text from the MSG files, and creates text sequence file(s) that are stored in an output HDFS directory.
Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreateEmailTextSequence <HDFS input directory> <HDFS output directory>
Tip: Put your MSG files in an HDFS directory and run the CreateBinarySequence mapper on it to generate binary sequence file(s). Then supply that mapper's output directory as the input directory here.
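For emails, keeping a few header fields alongside the body lets later MapReduce jobs filter or group messages, for example by sender. The sketch below shows one possible plain-text layout for a parsed MSG email. The field names and order are assumptions, not the project's documented output, and the values would be read from the message with Aspose.Email in practice. `EmailText` is a hypothetical name.

```java
/**
 * Hypothetical sketch: lay out a parsed email as header lines, a blank line,
 * then the body. Missing or empty header values are omitted.
 */
public class EmailText {

    public static String toText(String from, String to, String subject, String body) {
        StringBuilder text = new StringBuilder();
        appendField(text, "From", from);
        appendField(text, "To", to);
        appendField(text, "Subject", subject);
        text.append('\n').append(body == null ? "" : body); // blank line separates headers from body
        return text.toString();
    }

    private static void appendField(StringBuilder text, String name, String value) {
        if (value != null && !value.isEmpty()) {
            text.append(name).append(": ").append(value).append('\n');
        }
    }

    public static void main(String[] args) {
        System.out.println(toText("alice@example.com", "bob@example.com", "Q3 figures", "See attached."));
    }
}
```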
Aspose Pty Ltd: www.aspose.com