Dear Montse,
I'm sorry to have taken so long to reply; I have been travelling and
have been very busy at work and at home recently.
I don't know whether there is a maximum appropriate number of files for
multifile corpora on different platforms. What I have done in the past
to get similar behaviour is to use a single corpus file split into
separate documents, i.e.
<DOC>
<TEXT>
This is a sentence from the BNC.
</TEXT>
</DOC>
Pretty ugly, I'll grant you: a bad case of markup making the corpus
unnecessarily bigger. Also, there were problems whenever we tried to put
two tags on the same line, which I guess is just a C parsing issue that
we could fix if we tracked it down.
Sorry I can't answer your actual question at the moment, but if you
want a hack that does much the same thing, this should work.
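For what it's worth, here is a rough sketch of that hack in Python. The
directory and file names are placeholders, not anything from your setup;
the only point is wrapping each sentence file in its own <DOC>/<TEXT>
block, with every tag on its own line to sidestep the two-tags-per-line
parsing problem.

```python
import os

def build_single_corpus(sentence_dir, out_path):
    """Concatenate per-sentence files into one corpus file,
    wrapping each sentence in <DOC><TEXT> ... </TEXT></DOC> markup.
    Each tag is written on its own line, since putting two tags on
    one line has caused parsing trouble in the past."""
    with open(out_path, "w") as out:
        for name in sorted(os.listdir(sentence_dir)):
            path = os.path.join(sentence_dir, name)
            with open(path) as f:
                sentence = f.read().strip()
            if not sentence:
                continue  # skip empty sentence files
            out.write("<DOC>\n<TEXT>\n")
            out.write(sentence + "\n")
            out.write("</TEXT>\n</DOC>\n")
```

You would then point the model-building step at the single output file
instead of the directory of five million sentence files.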
Best wishes,
Dominic
On Oct 31, 2005, at 11:20 AM, Montse Cuadros wrote:
> Dear All,
>
> I'm trying to build a model for the BNC corpus, but instead of
> building it per document, I want to build it per sentence. I have the
> whole corpus split sentence-by-sentence into separate files, and when
> I try to construct the model, it fails with this error:
>
>
> Allocating filename memory: Cannot allocate memory
> Couldn't initialize tokenizer.
> make: *** [/corpus/models//BNC_SENTENCE/wordlist] Error 1
>
> The file directory is huge: it now contains 5,009,088 files. I don't
> know whether the problem is the sheer number of files, or that each
> file contains so little data that it doesn't make sense to construct
> such a model, or just the default parameters.
>
> Thanks in advance,
>
> Bests,
>
> Montse
>
>
> _______________________________________________
> infomap-nlp-users mailing list
> inf...@li...
> https://lists.sourceforge.net/lists/listinfo/infomap-nlp-users
>