DocFetcher / Forum / General Discussion: [code] Support for maff format (Mozilla archive format) 1/3

This is a code contribution that adds support for the MAFF format (Mozilla Archive Format).

https://en.wikipedia.org/wiki/Mozilla_Archive_Format

A MAFF file is a ZIP archive in which the contents of the saved web page are stored.

http://maf.mozdev.org/maff-specification.html
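A minimal sketch of walking that layout with java.util.zip. It assumes the layout described in the specification linked above, where each saved page is stored in its own top-level folder whose main file is named index.* (with metadata in index.rdf). The class name MaffIndexLister and the accepted extensions are illustrative assumptions; this is not code from the attached MaffParser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: lists the main page files inside a MAFF archive. */
public final class MaffIndexLister {

    // Assumption: "<top-level folder>/index.<html|htm|xhtml>"; index.rdf and nested files are ignored.
    private static final Pattern INDEX_ENTRY =
            Pattern.compile("[^/]+/index\\.(html?|xhtml)", Pattern.CASE_INSENSITIVE);

    private MaffIndexLister() {}

    /** True if the ZIP entry name looks like the main file of one saved page. */
    static boolean isIndexEntry(String entryName) {
        return INDEX_ENTRY.matcher(entryName).matches();
    }

    /** Reads the archive and returns the names of all index entries, in archive order. */
    static List<String> listIndexEntries(InputStream in) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && isIndexEntry(entry.getName())) {
                    result.add(entry.getName());
                }
            }
        }
        return result;
    }
}
```

Matching on entry names only keeps the check cheap: the page contents are not read until a parser actually needs them.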

A few files are added or edited:

This is the code for the parser itself:
src/net/sourceforge/docfetcher/model/parse/MaffParser.java
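For readers without the attachment, here is a rough, hypothetical sketch of what text extraction from a MAFF file can look like. It is not the attached MaffParser and does not use DocFetcher's parser API; the class name, the UTF-8 assumption and the naive regex-based tag stripping are all illustrative choices (a real parser would use a proper HTML parser and honour the page's charset). Requires Java 9+ for readAllBytes().

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical sketch of a MAFF text extractor; not DocFetcher's MaffParser. */
public final class MaffTextSketch {
    private MaffTextSketch() {}

    /** Naive HTML-to-text: drops script/style blocks and tags, collapses whitespace. Entities are left as-is. */
    static String stripTags(String html) {
        String noScripts = html.replaceAll("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>", " ");
        String noTags = noScripts.replaceAll("(?s)<[^>]*>", " ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    /** Concatenates the text of every top-level "<folder>/index.htm(l)" entry, one page per line. */
    static String extractText(InputStream maff) throws IOException {
        StringBuilder text = new StringBuilder();
        try (ZipInputStream zip = new ZipInputStream(maff)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().matches("(?i)[^/]+/index\\.html?")) {
                    continue;
                }
                // Assumption: pages are UTF-8 encoded.
                String html = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                if (text.length() > 0) {
                    text.append('\n');
                }
                text.append(stripTags(html));
            }
        }
        return text.toString();
    }
}
```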

There are some edits in these files:
src/net/sourceforge/docfetcher/enums/Msg.java

        filetype_html ("HTML (html, htm, ..)", Comments.filetype),
+       filetype_maf ("Mozilla Archive Format (maf, maff)", Comments.filetype),
        filetype_odt ("OpenOffice.org Writer (odt, ott)", Comments.filetype),

src/net/sourceforge/docfetcher/model/parse/ParseService.java

                new EpubParser(),
                new ChmParser(),
+               new MaffParser(),

                new OpenOfficeWriterParser(),
                new OpenOfficeCalcParser(),

src/net/sourceforge/docfetcher/model/parse/ParseServiceTest.java

public final class ParseServiceTest {
                .add(TestFiles.lorem_ipsum_abw_gz.get(), new AbiWordParser())
                .add(TestFiles.lorem_ipsum_docx.get(), new MSWord2007Parser())
                .add(TestFiles.lorem_ipsum_html.get(), new HtmlParser())
+               .add(TestFiles.lorem_ipsum_maf.get(), new MaffParser())
                .add(TestFiles.lorem_ipsum_odt.get(), new OpenOfficeWriterParser())
                .add(TestFiles.lorem_ipsum_pdf.get(), new PdfParser())
                .add(TestFiles.lorem_ipsum_rtf.get(), new RtfParser())


                        MSOffice2007Parser.MSExcel2007Parser.class,
                        MSOffice2007Parser.MSPowerPoint2007Parser.class,
                        MSOffice2007Parser.MSWord2007Parser.class,
+                       MaffParser.class,
                        OpenOfficeParser.OpenOfficeCalcParser.class,
                        OpenOfficeParser.OpenOfficeDrawParser.class,
                        OpenOfficeParser.OpenOfficeImpressParser.class

                        MSOffice2007Parser.MSExcel2007Parser.class,
                        MSOffice2007Parser.MSPowerPoint2007Parser.class,
+                       MaffParser.class,
                        OpenOfficeParser.OpenOfficeCalcParser.class,
                        OpenOfficeParser.OpenOfficeDrawParser.class,

^ I had to add MaffParser to the test cases for MIME type overlap, because a MAFF file is a ZIP archive (an application type) but its MIME type is detected as HTML. DocFetcher still seems to detect these files correctly based on the file name extension ".maf" or ".maff".
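The extension-based detection described above can be pictured with a small, hypothetical sketch: check the file extension first, and fall back to the sniffed MIME type only when the extension is not a MAFF one. The class, method and returned labels are made up for illustration and are not DocFetcher's API.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: prefer the file extension over the sniffed MIME type. Not DocFetcher code. */
public final class ExtensionFirstDetection {
    private static final Set<String> MAFF_EXTENSIONS = Set.of("maf", "maff");

    private ExtensionFirstDetection() {}

    /** Lower-cased text after the last '.', or "" if there is none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Picks a parser label; MAFF extensions win even if the MIME type says HTML or ZIP. */
    static String chooseParser(String fileName, String detectedMimeType) {
        if (MAFF_EXTENSIONS.contains(extensionOf(fileName))) {
            return "MaffParser";
        }
        if ("text/html".equals(detectedMimeType)) {
            return "HtmlParser";
        }
        if ("application/zip".equals(detectedMimeType)) {
            return "zip archive";
        }
        return "unsupported";
    }
}
```

Checking the extension first is what keeps the MIME overlap harmless: two different parsers can claim the same MIME type without ambiguity, as long as their extensions differ.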

An example test file, and the updated reference to it:
dev/test-files/lorem-ipsum/lorem-ipsum.maff
src/net/sourceforge/docfetcher/TestFiles.java
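If you want to build a similar test file yourself, a minimal sketch with java.util.zip could look like this. It writes a single page as folder/index.html and deliberately omits index.rdf, so it is only an approximation of a full MAFF file; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper for building a minimal MAFF-style test archive in memory. */
public final class MaffTestFileSketch {
    private MaffTestFileSketch() {}

    /** Returns the bytes of a ZIP holding one page as "<folder>/index.html" (no index.rdf). */
    static byte[] buildSinglePageMaff(String folder, String html) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(folder + "/index.html"));
            zip.write(html.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            // Cannot really happen with an in-memory stream; rethrow unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

To save it, something like `Files.write(Paths.get("lorem-ipsum.maff"), bytes)` would do. The actual dev/test-files/lorem-ipsum/lorem-ipsum.maff in the attachment may differ from this sketch.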

The source code is in the attached zip archive. This is the only post: I wasn't aware of the attachment feature earlier, so there are no separate follow-up threads 2/3 and 3/3.

Last edit: andrew goh 2020-02-27

maffparser.zip
