From: Dominic W. <wi...@ma...> - 2005-11-09 16:53:35
Dear Montse,

I am so sorry to have been so long in replying; I have been travelling and have been very busy at work and at home recently.

I don't know whether there is a maximum workable number of files for multifile corpora on the various platforms. What I have done in the past to get similar behaviour is to use a single corpus file split into separate documents, i.e.:

<DOC>
<TEXT>
This is a sentence from the BNC.
</TEXT>
</DOC>

Pretty ugly, I'll grant you: a bad case of markup making the corpus unnecessarily bigger. There were also problems whenever we tried to put two tags on the same line, which I guess is just a C parsing issue that we could fix if we tracked it down. Sorry I can't answer your actual question at the moment, but if you want a hack to do much the same thing, this should work; there is a rough sketch of a conversion script below your quoted message.

Best wishes,
Dominic

On Oct 31, 2005, at 11:20 AM, Montse Cuadros wrote:

> Dear All,
>
> I'm trying to build a model for the BNC corpus, but instead of building
> it per document, I want to build it per sentence. I have the whole corpus
> split sentence-by-sentence into separate files, and when I try to
> construct the model, it fails with this error:
>
> Allocating filename memory: Cannot allocate memory
> Couldn't initialize tokenizer.
> make: *** [/corpus/models//BNC_SENTENCE/wordlist] Error 1
>
> The directory is huge: it now contains 5,009,088 files. I don't know
> whether the problem is the sheer number of files, that each file holds
> so little data that such a model makes no sense, or just the default
> parameters.
>
> Thanks in advance,
>
> Best,
>
> Montse
>
> _______________________________________________
> infomap-nlp-users mailing list
> inf...@li...
> https://lists.sourceforge.net/lists/listinfo/infomap-nlp-users
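
P.S. The "Allocating filename memory" failure is presumably the tokenizer trying to hold all 5,009,088 filenames in memory at once; even at a modest 100 bytes per path, that is roughly 500 MB, which a machine can easily fail to allocate. Here is a minimal, untested sketch (in Python) of the hack I described: it concatenates per-sentence files into one corpus file, writing each tag on its own line to avoid the two-tags-per-line parsing problem. The directory and output names are placeholders; adjust them for your setup.

#!/usr/bin/env python
# Concatenate per-sentence files into a single corpus file with
# <DOC>/<TEXT> markup, one tag per line.
#
# Usage: python build_corpus.py SENTENCE_DIR OUTPUT_FILE

import os
import sys

def build_corpus(sentence_dir, output_path):
    with open(output_path, "w") as out:
        # Sort the listing so document order is stable and reproducible.
        for name in sorted(os.listdir(sentence_dir)):
            path = os.path.join(sentence_dir, name)
            if not os.path.isfile(path):
                continue
            with open(path) as f:
                sentence = f.read().strip()
            if not sentence:
                continue  # skip empty sentence files
            # One tag per line; the parser had trouble with two on a line.
            out.write("<DOC>\n<TEXT>\n")
            out.write(sentence + "\n")
            out.write("</TEXT>\n</DOC>\n")

if __name__ == "__main__":
    build_corpus(sys.argv[1], sys.argv[2])

Only one sentence file is open at a time, so memory stays flat no matter how many files there are; listing five million directory entries will be slow, but it only has to be done once.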