
Failed plotting with R: "Simple error: cannot allocate vector of size 4.0 Gb"

Anonymous
2016-12-26
2016-12-27
  • Anonymous

    Anonymous - 2016-12-26

    Hello. First, thank you for KH Coder.
    I get an error when running Cluster Analysis of Documents, and I cannot fix it by configuring the R environment variables (R_MAX_MEM_SIZE, R_VSIZE).
    I have 7 GB of RAM. How can I fix the problem?
    Thank you for your help.

     
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2016-12-26

    Hello,

    Well, it seems that I need some more info to answer your question.

    How many documents are you dealing with? And how many words do you use for that analysis?

    Also, open Task Manager and watch the memory usage while you run the analysis. Does R use 6 GB or 7 GB and then die? Or does it use only 2 GB or so?

     

    Last edit: HIGUCHI Koichi 2016-12-26
  • Anonymous

    Anonymous - 2016-12-26

    Thank you for your reply.
    I am using a single document (314,000 sentences; 6,400,000 tokens).
    I want to run Cluster Analysis of Documents with 1000 tags and 50 clusters (Ward method).
    In Task Manager, Rterm.exe stops when its memory consumption reaches between 6 and 7 GB.

     
  • HIGUCHI Koichi

    HIGUCHI Koichi - 2016-12-26

    Hello,

    It seems that R is actually running out of physical RAM. So, you have to (I) reduce the data size, (II) try a different clustering method, or (III) increase the physical RAM.

    (I) What matters here is the size of the data matrix sent to R.

    [a] Number of rows = number of documents
    [b] Number of columns = number of distinctive words that are used for the analysis

    About the value of [a]: if you select “Sentences” as the “Unit”, it will be 314,000. But if you select “Paragraphs” as the “Unit”, the value of [a] will decrease somewhat. Or you can perform random sampling and rebuild the data file to reduce this value.

    About the value of [b]: you can check it as “Number of selected words” on the cluster analysis option screen. You can reduce this value by increasing “Min. TF” or “Min. DF”.
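    As a back-of-the-envelope check, the figures in this thread can be plugged into R directly. This is only a sketch: it assumes 8-byte doubles and that Ward clustering goes through a full pairwise distance matrix (as R's standard `hclust` does); KH Coder's exact internal call is not shown here.

    ```r
    # Rough memory arithmetic using the numbers from this thread (8 bytes per double).
    rows <- 314000                               # documents (sentences) = value [a]
    cols <- 1000                                 # selected words / tags = value [b]
    dtm_gb  <- rows * cols * 8 / 2^30            # document-term matrix, in GB
    dist_gb <- rows * (rows - 1) / 2 * 8 / 2^30  # full distance matrix for Ward, in GB
    round(c(dtm = dtm_gb, dist = dist_gb), 1)
    ```

    The document-term matrix itself (about 2.3 GB) could fit in 7 GB of RAM, but the pairwise distance matrix needed for hierarchical clustering is hundreds of gigabytes, which is why reducing [a] matters far more than reducing [b] here.
    
    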

    (II) You can try CLARA as the clustering method. It stands for Clustering LARge Applications. You can choose it in the cluster analysis option screen.

    (III) R consumes a relatively large amount of RAM. You could add physical RAM to your system, but I don’t think it would help a lot here.
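    For reference, option (II) can also be tried directly in R: `clara()` from the recommended `cluster` package runs PAM on subsamples, so its memory use does not grow with the square of the number of documents. The matrix `m` and the parameters below are illustrative toy values, not KH Coder's internal call.

    ```r
    library(cluster)                               # clara() ships with the "cluster" package
    set.seed(1)
    m  <- matrix(rpois(2000 * 20, 2), nrow = 2000) # toy document-term matrix: 2000 docs, 20 terms
    cl <- clara(m, k = 5, samples = 10)            # CLARA: PAM on random subsamples
    table(cl$clustering)                           # cluster sizes
    ```
    
    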

     

    Last edit: HIGUCHI Koichi 2016-12-26
  • Anonymous

    Anonymous - 2016-12-27

    Hello. I am already working on a reduced random sample, so I cannot reduce it any further.
    Using CLARA has solved the memory problem :-).
    Now I have another problem, but that may deserve a new post.
    I use tags in several languages in order to build multi-language aggregates. The result drops Arabic, Japanese, Chinese, ... and I do not understand why.
    I will analyze the distribution of these words and check whether it is a frequency problem; otherwise I will open a separate post.
    Thanks a lot for your help.

     
