Unusual Compute Errors
Author | Message |
---|---|
Send message Joined: 26 Jun 08 Posts: 645 Credit: 472,400,038 RAC: 255,740 |
The skewness error was due to bad parameters on some of the workunits. Those have been corrected. The asm_modinv32 error is simply a sign that we are pushing the limits of the Windows sieve binary. I will investigate further in the coming weeks. |
Send message Joined: 16 Sep 09 Posts: 6 Credit: 618,986 RAC: 0 |
|
Send message Joined: 11 Dec 12 Posts: 17 Credit: 23,645,935 RAC: 8,479 |
Client state: Compute error. Exit status: 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED |
Send message Joined: 26 Jun 08 Posts: 645 Credit: 472,400,038 RAC: 255,740 |
Client state Compute error That's one I see only rarely. It indicates that BOINC estimated the completion time of the workunit to be much later than the deadline. This perhaps could happen if BOINC is running in a VM that's getting very little CPU time at the beginning of the workunit. If that's not the case and the error persists, try resetting the project. |
Send message Joined: 5 Jun 10 Posts: 1 Credit: 1,051,446 RAC: 0 |
I'll try the reset on the project. However, I have just completed 8 "15e Lattice Sieve v1.08 (notphenomiix6)" tasks: 4 completed and validated, 4 exited with error 197. Debugging info adds that an "unhandled exception" occurred as well. These all finished on Jan 23 2014 and all had the same deadline of Jan 27 2014.

| Task | Work unit | Computer | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Credit | Application |
|---|---|---|---|---|---|---|---|---|---|
| 31785400 | 28310074 | 267598 | 23 Jan 2014, 22:28:08 UTC | 24 Jan 2014, 7:27:03 UTC | Error while computing | 15,195.74 | 1,528.27 | --- | 15e Lattice Sieve v1.08 (notphenomiix6) |
| 31785397 | 28310071 | 267598 | 23 Jan 2014, 22:28:08 UTC | 24 Jan 2014, 7:27:03 UTC | Completed and validated | 11,040.94 | 1,854.83 | 44.00 | 15e Lattice Sieve v1.08 (notphenomiix6) |
| 31785396 | 28310070 | 267598 | 23 Jan 2014, 22:28:08 UTC | 24 Jan 2014, 7:27:03 UTC | Error while computing | 15,195.53 | 1,545.02 | --- | 15e Lattice Sieve v1.08 (notphenomiix6) |
| 31785394 | 28310068 | 267598 | 23 Jan 2014, 22:28:08 UTC | 24 Jan 2014, 7:27:03 UTC | Completed and validated | 11,122.22 | 1,868.94 | 44.00 | 15e Lattice Sieve v1.08 (notphenomiix6) |
| 31785391 | 28310065 | 267598 | 23 Jan 2014, 22:28:08 UTC | 24 Jan 2014, 7:27:03 UTC | Error while computing | 15,195.29 | 1,512.05 | --- | 15e Lattice Sieve v1.08 (notphenomiix6) |
| 31785389 | 28310063 | 267598 | 23 Jan 2014, 22:28:08 UTC | 24 Jan 2014, 7:27:03 UTC | Completed and validated | 11,436.74 | 1,930.84 | 44.00 | 15e Lattice Sieve v1.08 (notphenomiix6) |
| 31785385 | 28310059 | 267598 | 23 Jan 2014, 22:28:08 UTC | 24 Jan 2014, 7:27:03 UTC | Completed and validated | 11,315.50 | 1,909.84 | 44.00 | 15e Lattice Sieve v1.08 (notphenomiix6) |
| 31785379 | 28310053 | 267598 | 23 Jan 2014, 22:28:08 UTC | 24 Jan 2014, 7:27:03 UTC | Error while computing | 15,195.96 | 1,556.77 | --- | 15e Lattice Sieve v1.08 (notphenomiix6) |
Send message Joined: 13 Aug 13 Posts: 9 Credit: 2,500,034 RAC: 0 |
Client state Compute error No, this is not the reason for this error; it is not deadline related. There is a value RSC_FPOPS_BOUND configured for the work units. If the FPOps used exceed this limit, the core client aborts the workunit. AFAIK the values are converted into CPU time, as the core client does not actually know how many FPOps have been used. I.e., when the result starts, the RSC_FPOPS_BOUND value together with the benchmark results leads to a maximum CPU time that the core client will allow for that workunit. So, if the computer runs the BOINC benchmark on a turbo core but the calculation does not run in turbo mode, OR if part of the calculation runs in a clocked-down (power-saving) mode, there is a high risk of violating the FPOps limit. Increase RSC_FPOPS_BOUND if you want to eliminate those errors. It isn't actually meant to limit the execution time of a "healthy" result; it is only meant to catch applications stuck in endless loops. Set it much higher than RSC_FPOPS_EST, at least 5 times as high or even 10 times. |
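To illustrate the arithmetic described in the post above, here is a minimal sketch. It assumes the time limit is simply the FPOps bound divided by the benchmarked speed; the real client logic may differ in detail. The function names and all numbers are hypothetical and are not taken from any NFS@Home workunit.

```python
def max_allowed_seconds(rsc_fpops_bound: float, benchmark_flops: float) -> float:
    """Time limit implied by an FPOps bound at the benchmarked speed (assumed model)."""
    if benchmark_flops <= 0:
        raise ValueError("benchmark_flops must be positive")
    return rsc_fpops_bound / benchmark_flops


def would_be_aborted(runtime_seconds: float, limit_seconds: float) -> bool:
    """True if the task runs longer than the implied limit."""
    return runtime_seconds > limit_seconds


# Hypothetical values for illustration only.
rsc_fpops_est = 3.0e13     # estimated work per task
benchmark_flops = 3.0e9    # benchmark measured while the core was in turbo mode
actual_flops = 1.5e9       # sustained speed while clocked down

# Real run time if the task needs exactly the estimated work at the slower speed.
runtime = rsc_fpops_est / actual_flops

tight_limit = max_allowed_seconds(1.5 * rsc_fpops_est, benchmark_flops)
loose_limit = max_allowed_seconds(10 * rsc_fpops_est, benchmark_flops)

print(f"run time    : {runtime:,.0f} s")
print(f"tight limit : {tight_limit:,.0f} s, aborted: {would_be_aborted(runtime, tight_limit)}")
print(f"loose limit : {loose_limit:,.0f} s, aborted: {would_be_aborted(runtime, loose_limit)}")
```

With a bound only 1.5 times the estimate, a host running at half its benchmarked speed goes over the limit; with a bound 10 times the estimate, the same task finishes with plenty of headroom.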
Send message Joined: 13 Nov 09 Posts: 1 Credit: 7,073,614 RAC: 3 |
I've had an error "TD sieve on side 0: 1626047 does not divide" on this task. Looks like it's happening on Windows only, as it takes a Linux host to complete it, as here. Seen quite a few of these around. I got this error too: http://escatter11.fullerton.edu/nfs/result.php?resultid=32583006 http://escatter11.fullerton.edu/nfs/result.php?resultid=32582970 http://escatter11.fullerton.edu/nfs/result.php?resultid=32582964 http://escatter11.fullerton.edu/nfs/result.php?resultid=32582963 Another error on another machine: http://escatter11.fullerton.edu/nfs/result.php?resultid=32583060 |
Send message Joined: 13 Aug 13 Posts: 9 Credit: 2,500,034 RAC: 0 |
This one (your 32582964) has been completed by a Win x64 machine, so it is not necessarily a Linux vs. Windows thing. |
Send message Joined: 26 Sep 09 Posts: 218 Credit: 22,841,893 RAC: 3 |
I suppose you don't have enough memory available to run the 16e WUs. You need at least 1.5 GB per thread. Although you are running a mix of 14e, 15e and 16e, imagine if, for example, your 12-thread i7-4930K tried to run only 16e WUs: 12 × 1.5 GB = 18 GB, so 16 GB is not enough. On this particular machine you should set it to run only the 14e and 15e WUs. Carlos |
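The memory estimate in the post above can be checked with a short calculation. This is a rough sketch that assumes every running task needs the full 1.5 GB and ignores memory used by the operating system and other programs; the function names are hypothetical.

```python
def required_gb(threads: int, gb_per_task: float) -> float:
    """Memory needed if every thread runs one task of the given size."""
    return threads * gb_per_task


def max_concurrent_tasks(total_gb: float, gb_per_task: float) -> int:
    """Largest number of such tasks that fit in total_gb (OS overhead ignored)."""
    return int(total_gb // gb_per_task)


threads = 12          # hardware threads on the host discussed above
gb_per_16e_task = 1.5 # minimum per-thread figure quoted in the post
installed_gb = 16.0

print(f"needed for {threads} 16e tasks: {required_gb(threads, gb_per_16e_task)} GB")
print(f"16e tasks that fit in {installed_gb} GB: "
      f"{max_concurrent_tasks(installed_gb, gb_per_16e_task)}")
```

Under these assumptions the host could hold at most 10 16e tasks at once, which is fewer than its 12 threads.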
Send message Joined: 11 Dec 12 Posts: 17 Credit: 23,645,935 RAC: 8,479 |
Client state Compute error Could it have to do with my GPUs running OpenCL applications in the meantime? I see the computation times of other CPU projects increase dramatically when running e.g. Collatz on my GPUs. |