Hi Keita,
I agree. It's most likely running out of space in node-local storage.
In fact, I think the error is coming from the test_api code, which only
writes to node-local storage.
Is it failing on the first checkpoint?
Do you have SCR configured to write to /tmp, and if so, can you check
the size of /tmp on your compute nodes? For example, log in to a
compute node and run "df /tmp".
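
If it is easier to script the check than to run df by hand, here is a minimal Python sketch of the same idea. It is not part of SCR; the function names free_bytes and has_room are made up for illustration, and the system temp directory only stands in for wherever SCR is actually configured to cache files. The 13443078 figure is the size of the failed write in the error message quoted below.

```python
import shutil
import tempfile


def free_bytes(path):
    """Return the bytes available to a normal user on the file system holding `path`."""
    return shutil.disk_usage(path).free


def has_room(path, needed_bytes):
    """Return True if `path` has at least `needed_bytes` of free space."""
    return free_bytes(path) >= needed_bytes


if __name__ == "__main__":
    # Assumption: the temp directory stands in for the node-local cache directory.
    cache_dir = tempfile.gettempdir()
    # Size of the single failed write reported in the error message.
    needed = 13443078
    print(cache_dir, free_bytes(cache_dir), has_room(cache_dir, needed))
```

This has to run on a compute node, not the login node, to be meaningful. Also keep in mind that every process on a node writes its own file, so the space needed per node is at least the per-process file size times the number of processes on that node.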
-Adam
Kathryn Mohror wrote:
>Hi Keita,
>
>Sorry for the delay. I am not sure why that would be happening. Adam, do you have an idea?
>
>It might be that you are running short of node-local storage. SCR will first write to node-local storage and then to the PFS. However, 8 MB seems a bit small to be causing that problem. Hopefully Adam will have an idea.
>
>Kathryn
>
>On Apr 21, 2014, at 4:18 PM, Teranishi, Keita <kn...@sa...> wrote:
>
>
>
>>Hi,
>>
>>I am still playing with test_api in example directory and found the code
>>throws an error when I set file size bigger than 8 Mbytes. I'd like to
>>know (1) what is the root cause of this error and (2) any possible ways to
>>mitigate this problem.
>>
>>I set SCR_PREFIX to be a scratch space (1 Pbytes) in the lustre file
>>system connected to the PC cluster. The rest of the parameters should be
>>set to the default.
>>
>>1 on chama33: ERROR: Error writing: write(12, 0x2aaaba61000a, 13443078)
>>errno=28 No space left on device @ test_common.c:86
>>
>>Thanks,
>>-----------------------------------------------------------------------------
>>Keita Teranishi
>>Principal Member of Technical Staff
>>Scalable Modeling and Analysis Systems
>>Sandia National Laboratories
>>Livermore, CA 94551
>>+1 (925) 294-3738
>>
>>
>>------------------------------------------------------------------------------
>>Start Your Social Network Today - Download eXo Platform
>>Build your Enterprise Intranet with eXo Platform Software
>>Java Based Open Source Intranet - Social, Extensible, Cloud Ready
>>Get Started Now And Turn Your Intranet Into A Collaboration Platform
>>http://p.sf.net/sfu/ExoPlatform
>>_______________________________________________
>>Scalablecr-discuss mailing list
>>Sca...@li...
>>https://lists.sourceforge.net/lists/listinfo/scalablecr-discuss
>>
>>
>
>_________________________________________________________________
>Kathryn Mohror, ka...@ll..., http://scalability.llnl.gov/
>Scalability Team @ Lawrence Livermore National Laboratory, Livermore, CA, USA