
Occasional Computation Errors?

Message boards : Questions/Problems/Bugs : Occasional Computation Errors?


denim
Joined: 19 Sep 09
Posts: 17
Credit: 17,974,133
RAC: 0
Message 281 - Posted: 12 Dec 2009, 1:21:02 UTC

Is anyone else getting occasional computation errors, say a few a day?
Greg
Project administrator

Joined: 26 Jun 08
Posts: 640
Credit: 433,290,478
RAC: 331,269
Message 282 - Posted: 12 Dec 2009, 3:43:24 UTC - in response to Message 281.  

There are two primary sources of occasional errors:

xmalloc: Cannot allocate memory
This indicates that when the workunit started or restarted, there wasn't quite enough contiguous memory available to run it. This happens most often on Windows computers running the lasievef application while you are actively using other programs. If it happens often, you may want to disable the lasievef application in your NFS@Home preferences.
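For readers curious where the message comes from: the "xmalloc: Cannot allocate memory" text matches the common xmalloc convention of a wrapper around malloc that reports the failure and exits instead of returning NULL. The sketch below is a hypothetical illustration of that convention, not the lasievef source. It assumes malloc sets errno to ENOMEM on failure (as POSIX requires); the exact wording printed by perror depends on the C library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical xmalloc-style wrapper: return the block or terminate. */
static void *xmalloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL && size != 0) {
        /* perror prints "xmalloc: " followed by strerror(errno);
           on glibc, ENOMEM reads "Cannot allocate memory". */
        perror("xmalloc");
        exit(EXIT_FAILURE);
    }
    return p;
}
```

Because malloc must find a single contiguous block of address space, a large request can fail even when the total free memory looks sufficient, particularly in a 32-bit process whose address space has become fragmented.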

Special q # does not divide
This happens on restart of a workunit if the checkpoint was not properly written when the calculation was paused. This can happen if some other file is being written at the same time, or if an antivirus or defragmenting program intervenes. There isn't really anything you can do to prevent it, but fortunately it is rare.

Greg
[AF>WildWildWest] lamoule

Joined: 14 Dec 09
Posts: 2
Credit: 7,385,736
RAC: 0
Message 285 - Posted: 14 Dec 2009, 20:52:54 UTC

I have found a bug: every time I suspend the project to let other projects' WUs finish, the lasievef WU errors out when I resume it. I should point out that my computer is not doing anything else when resuming.

Computers: ID 4492 and ID 4491 (dedicated to BOINC)
Greg
Project administrator

Joined: 26 Jun 08
Posts: 640
Credit: 433,290,478
RAC: 331,269
Message 286 - Posted: 15 Dec 2009, 4:18:50 UTC - in response to Message 285.  

They appear to be xmalloc errors, which are really errors from the operating system indicating that the requested memory could not be allocated. Are all of the computers on which this is happening Windows computers? Windows seems to have more trouble allocating large blocks of memory.

For example, this is a workunit that restarted successfully twice, then failed on the third restart, whereas this workunit succeeded on every restart.
[AF>WildWildWest] lamoule

Joined: 14 Dec 09
Posts: 2
Credit: 7,385,736
RAC: 0
Message 288 - Posted: 15 Dec 2009, 15:58:52 UTC - in response to Message 286.  

Understood.

When I first attached to the project, I didn't pay attention to the three kinds of WUs.

The xmalloc problem was on a Windows Server 2003 (W2K3) machine: GenuineIntel Intel(R) Xeon(R) CPU E5504 @ 2.00GHz [x86 Family 6 Model 26 Stepping 5] (8 processors), with only 4 GB of RAM.

Everything crashes when I have 8 lasievef units running at the same time: Windows starts swapping and the units crash.

Maybe unticking the lasievef checkbox could prevent this kind of trouble for those who, like me, join the project through an account manager (BAM).

Very good job anyway, and thanks for your quick answer.
