Try increasing the Java heap size, e.g. -Xmx2048m.

On 14/9/20 10:19 pm, Dennis Sizonenko wrote:
> Hi. I'm trying to index a folder of .rtf files, folder by folder. Once it reaches 700+ files, the software stops working. Sometimes I get a "not enough memory" message and have to force-close the software. I tried on another PC with the same result. What can be done here? Thanks.

Thread: can't index more than 740 files, https://sourceforge.net/p/docfetcher/discussion/702424/thread/57fcbe3723/?limit=25#1882
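To confirm that an -Xmx setting actually reached the JVM, a tiny stand-alone check like the one below may help. This is a generic JDK sketch, not part of DocFetcher: the class name HeapCheck is hypothetical, and how DocFetcher's own launcher passes JVM options is not covered here.

```java
public class HeapCheck {
    // Returns the maximum heap the JVM will try to use, in MiB.
    // With -Xmx2048m this is usually close to 2048, though some garbage
    // collectors report a somewhat lower figure.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running `java -Xmx2048m HeapCheck` and then `java -Xmx512m HeapCheck` should show two clearly different values if the option is being picked up.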
No worries, no hurry. Those who need it can use the code themselves; it solved a difficult problem for me. With it, each MAFF zip archive is treated as a single file rather than as a folder, instead of indexing its internal contents, which always have the file name index.html.
This is a code contribution to support MAFF (Mozilla Archive Format): https://en.wikipedia.org/wiki/Mozilla_Archive_Format

A MAFF file is a zip archive in which the saved web page contents are stored. Specification: http://maf.mozdev.org/maff-specification.html

The main parser code, src/net/sourceforge/docfetcher/model/parse/MaffParser.java:

    package net.sourceforge.docfetcher.model.parse;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;
    import java.util.Collection;...
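For readers who want the general idea without the full parser, here is a minimal sketch using only java.util.zip. It is a hypothetical helper, not the contributed MaffParser, and it assumes (as I read the spec linked above) that each saved page sits in its own top-level folder containing an index.* main file plus an index.rdf metadata file. The class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class MaffIndexReader {
    // Hypothetical helper: returns "folder/index.*" entries that sit directly
    // inside a top-level folder, e.g. "123_abc/index.html".
    // index.rdf (page metadata) and files in subfolders are skipped.
    public static List<String> findIndexEntries(ZipFile zip) {
        List<String> result = new ArrayList<>();
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            if (entry.isDirectory()) continue;
            String[] parts = entry.getName().split("/");
            if (parts.length == 2 && parts[1].startsWith("index.")
                    && !parts[1].equals("index.rdf")) {
                result.add(entry.getName());
            }
        }
        return result;
    }

    // Reads one entry fully as text. UTF-8 is an assumption; a real page
    // may declare a different charset.
    public static String readEntry(ZipFile zip, String name) throws IOException {
        ZipEntry entry = zip.getEntry(name);
        if (entry == null) throw new IOException("No such entry: " + name);
        try (InputStream in = zip.getInputStream(entry)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

The text of each index entry could then be handed to an HTML parser and indexed under the .maff file's own name, which is what keeps search results from all being called index.html.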
I've used it to index a mix of several GB of files, so I'd expect it to work rather well. MAFF is still supported by the WebScrapBook browser extension, https://github.com/danny0838/webscrapbook, and other extensions support it as well.
Support for MAFF format (Mozilla Archive Format)