
16e tasks frequently fail for me, with Computation Error - across multiple PCs and OSes

Message boards : Questions/Problems/Bugs : 16e tasks frequently fail for me, with Computation Error - across multiple PCs and OSes

infinitejones

Joined: 10 May 21
Posts: 2
Credit: 0
RAC: 0
Message 2156 - Posted: 10 May 2021, 23:35:29 UTC
Last modified: 10 May 2021, 23:36:21 UTC

I run NFS on a number of different machines, primarily MacOS and Linux. I've been seeing a high number of failing tasks on all of them, particularly 16e tasks, which end with "Computation Error" within the first couple of minutes.

I've seen a couple of other posts about this in this forum, but no real indication of what's going on or how to prevent it. Given that I'm seeing it across multiple machines and OSes, I'm fairly confident it's not environment-specific on my side.

Here's an example from today - Boinc 7.16.14 on MacOS 10.15.7, on a Mac mini. Updates to Boinc and MacOS don't seem to have any impact on the volume or frequency of the failing tasks.



In the time it's taken me to write this post, the single non-failed task has progressed to 12.5% quite happily. I do run other projects on all machines where I'm seeing this, and I don't see frequently failing tasks for any of those.

For what it's worth, the other machines where I see frequent failures have been running either Ubuntu or Debian (latest versions of either, plus latest Boinc for those OSes). I don't have NFS running on any of them currently, though, because of the high volume of failed tasks.

What can I do to investigate and resolve these problems?

Edit: the img tag of my screenshot seems to be failing. Here's the URL:

https://imgur.com/a/43Pi8su
Greg
Project administrator

Joined: 26 Jun 08
Posts: 645
Credit: 473,008,988
RAC: 273,781
Message 2157 - Posted: 11 May 2021, 4:50:18 UTC - in response to Message 2156.  

Those were run on Mac and ended with a segmentation fault. I typically see a bit higher error rates on the Mac app, especially now that we are running a more difficult quartic, but the project-wide Mac error rates are ok. I'm not sure why that host is unhappy. Perhaps you could try one of the other apps with that host to see if this particular number is causing the issue?
infinitejones

Joined: 10 May 21
Posts: 2
Credit: 0
RAC: 0
Message 2158 - Posted: 11 May 2021, 5:08:27 UTC - in response to Message 2157.  

OK, thanks very much.

I'm running under an Account Manager so I can't directly choose which NFS apps I run, but I'll check whether the person who runs the pool can adjust it for us.
gemini8

Joined: 6 May 16
Posts: 5
Credit: 9,116,040
RAC: 4,411
Message 2185 - Posted: 17 May 2021, 6:48:04 UTC

Hi Greg.
I think I'm encountering something similar.
This is on my newest Mac with the latest OS (a 2017 MacBook Air running Catalina), which hasn't done NFS work before.
Another Mac, running Macintosh OS 10.14.6 build 18G87, might have the same issue, judging from the one task it has sent back so far.
I don't think my Macs had such problems with NFS in the past, since they have earned NFS credit.

Two workunits might make it through, as they have progressed further, while another one reports that it is waiting for memory.
I suppose RAM limitation might be the key to this problem.
Please have a look at the stderr:
<core_client_version>7.14.4</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
boinc initialized
work files resolved, now working
-> lasieve5f_1.11_x86_64-apple-darwin
-> -r
-> -f
-> 2102270000
-> -c
-> 2000
-> -R
-> -o
-> ../../projects/escatter11.fullerton.edu_nfs/S2L2162_2102270_0_r21334165_0
-> ../../projects/escatter11.fullerton.edu_nfs/S2L2162.poly
SIGSEGV: segmentation violation

Crashed executable name: lasieve5f_1.11_x86_64-apple-darwin
built using BOINC library version 7.5.0
Machine type Intel x86-64h Haswell (64-bit executable)
System version: Macintosh OS 10.15.7 build 19H1030
Mon May 17 07:11:05 2021

atos cannot load symbols for the file lasieve5f_1.11_x86_64-apple-darwin for architecture x86_64.
0   lasieve5f_1.11_x86_64-apple-darwin  0x000000010007d21c  
SIGPIPE: write on a pipe with no reader
1   lasieve5f_1.11_x86_64-apple-darwin  0x0000000100071ad7  
SIGPIPE: write on a pipe with no reader
2   libsystem_platform.dylib            0x00007fff6954a5fd  

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0100001f  rbx: 0x00000003  rcx: 0x7ffeefbfc228  rdx: 0x00000028
  rdi: 0x7ffeefbfc298  rsi: 0x00000003  rbp: 0x7ffeefbfc280  rsp: 0x7ffeefbfc228
   r8: 0x00000607   r9: 0x00000000  r10: 0x000009c8  r11: 0x00000206
  r12: 0x00000003  r13: 0x000009c8  r14: 0x7ffeefbfc298  r15: 0x00000028
  rip: 0x7fff69492dfa  rfl: 0x00000206

Binary Images Description:
       0x100000000 -        0x10009bfff /Library/Application Support/BOINC Data/slots/5/../../projects/escatter11.fullerton.edu_nfs/lasieve5f_1.11_x86_64-apple-darwin
    0x7fff66336000 -     0x7fff66337fff /usr/lib/libSystem.B.dylib
    0x7fff6661c000 -     0x7fff6666efff /usr/lib/libc++.1.dylib
    0x7fff6666f000 -     0x7fff66684fff /usr/lib/libc++abi.dylib
    0x7fff68196000 -     0x7fff681c9fff /usr/lib/libobjc.A.dylib
    0x7fff6861f000 -     0x7fff68669fff /usr/lib/libstdc++.6.dylib
    0x7fff69133000 -     0x7fff69138fff /usr/lib/system/libcache.dylib
    0x7fff69139000 -     0x7fff69144fff /usr/lib/system/libcommonCrypto.dylib
    0x7fff69145000 -     0x7fff6914cfff /usr/lib/system/libcompiler_rt.dylib
    0x7fff6914d000 -     0x7fff69156fff /usr/lib/system/libcopyfile.dylib
    0x7fff69157000 -     0x7fff691e9fff /usr/lib/system/libcorecrypto.dylib
    0x7fff692f6000 -     0x7fff69336fff /usr/lib/system/libdispatch.dylib
    0x7fff69337000 -     0x7fff6936dfff /usr/lib/system/libdyld.dylib
    0x7fff6936e000 -     0x7fff6936efff /usr/lib/system/libkeymgr.dylib
    0x7fff6937c000 -     0x7fff6937cfff /usr/lib/system/liblaunch.dylib
    0x7fff6937d000 -     0x7fff69382fff /usr/lib/system/libmacho.dylib
    0x7fff69383000 -     0x7fff69385fff /usr/lib/system/libquarantine.dylib
    0x7fff69386000 -     0x7fff69387fff /usr/lib/system/libremovefile.dylib
    0x7fff69388000 -     0x7fff6939ffff /usr/lib/system/libsystem_asl.dylib
    0x7fff693a0000 -     0x7fff693a0fff /usr/lib/system/libsystem_blocks.dylib
    0x7fff693a1000 -     0x7fff69428fff /usr/lib/system/libsystem_c.dylib
    0x7fff69429000 -     0x7fff6942cfff /usr/lib/system/libsystem_configuration.dylib
    0x7fff6942d000 -     0x7fff69430fff /usr/lib/system/libsystem_coreservices.dylib
    0x7fff69431000 -     0x7fff69439fff /usr/lib/system/libsystem_darwin.dylib
    0x7fff6943a000 -     0x7fff69441fff /usr/lib/system/libsystem_dnssd.dylib
    0x7fff69442000 -     0x7fff69443fff /usr/lib/system/libsystem_featureflags.dylib
    0x7fff69444000 -     0x7fff69491fff /usr/lib/system/libsystem_info.dylib
    0x7fff69492000 -     0x7fff694befff /usr/lib/system/libsystem_kernel.dylib
    0x7fff694bf000 -     0x7fff69506fff /usr/lib/system/libsystem_m.dylib
    0x7fff69507000 -     0x7fff6952efff /usr/lib/system/libsystem_malloc.dylib
    0x7fff6952f000 -     0x7fff6953cfff /usr/lib/system/libsystem_networkextension.dylib
    0x7fff6953d000 -     0x7fff69546fff /usr/lib/system/libsystem_notify.dylib
    0x7fff69547000 -     0x7fff6954ffff /usr/lib/system/libsystem_platform.dylib
    0x7fff69550000 -     0x7fff6955afff /usr/lib/system/libsystem_pthread.dylib
    0x7fff6955b000 -     0x7fff6955ffff /usr/lib/system/libsystem_sandbox.dylib
    0x7fff69560000 -     0x7fff69562fff /usr/lib/system/libsystem_secinit.dylib
    0x7fff69563000 -     0x7fff6956afff /usr/lib/system/libsystem_symptoms.dylib
    0x7fff6956b000 -     0x7fff69581fff /usr/lib/system/libsystem_trace.dylib
    0x7fff69583000 -     0x7fff69588fff /usr/lib/system/libunwind.dylib
    0x7fff69589000 -     0x7fff695befff /usr/lib/system/libxpc.dylib


Exiting...

</stderr_txt>
]]>
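A quick way to triage a stderr report like this one is to decode the exit status and check which loaded binary image each crash address falls into. The sketch below is only an illustration: it assumes the "<start> - <end> <path>" layout printed in this particular report (other BOINC or macOS versions may format it differently), the helper names `decode_exit_code`, `parse_images` and `locate` are hypothetical, and only three of the image lines are copied into the sample.

```python
from __future__ import annotations

import re
from typing import Optional

# Three lines copied from the "Binary Images Description" section above.
SAMPLE_IMAGES = """\
       0x100000000 -        0x10009bfff /Library/Application Support/BOINC Data/slots/5/../../projects/escatter11.fullerton.edu_nfs/lasieve5f_1.11_x86_64-apple-darwin
    0x7fff69492000 -     0x7fff694befff /usr/lib/system/libsystem_kernel.dylib
    0x7fff69547000 -     0x7fff6954ffff /usr/lib/system/libsystem_platform.dylib
"""

# Assumed layout: "<start> - <end> <path>", hex addresses, path may contain spaces.
IMAGE_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s+-\s+(0x[0-9a-fA-F]+)\s+(.+)$")


def decode_exit_code(code: int) -> tuple[str, int]:
    """Return the low byte of an exit status in hex and read as a signed 8-bit value."""
    low = code & 0xFF
    signed = low - 256 if low >= 128 else low
    return hex(low), signed


def parse_images(text: str) -> list[tuple[int, int, str]]:
    """Collect (start, end, path) for every line that matches IMAGE_RE."""
    images = []
    for line in text.splitlines():
        m = IMAGE_RE.match(line)
        if m:
            images.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3).strip()))
    return images


def locate(addr: int, images: list[tuple[int, int, str]]) -> Optional[tuple[str, int]]:
    """Return (file name, offset from image start) for the image containing addr, or None."""
    for start, end, path in images:
        if start <= addr <= end:
            return path.rsplit("/", 1)[-1], addr - start
    return None


if __name__ == "__main__":
    print(decode_exit_code(193))
    images = parse_images(SAMPLE_IMAGES)
    # Frames 0-2 of the backtrace, then the crashing thread's rip register.
    for addr in (0x10007d21c, 0x100071ad7, 0x7fff6954a5fd, 0x7fff69492dfa):
        hit = locate(addr, images)
        print(f"{addr:#x} -> {hit[0]} + {hit[1]:#x}" if hit else f"{addr:#x} -> unknown")
```

Exit code 193 is 0xc1, or -63 when read as a signed byte, which matches the "(0xc1, -63)" in the message. The first two backtrace frames land inside lasieve5f itself (offsets 0x7d21c and 0x71ad7) and rip points into libsystem_kernel.dylib; since atos could not load symbols, those offsets are mainly useful to someone who can symbolize that exact build of the siever.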

For now I only have hands-on access to the MacBook as I'm not at home, but I can have a look later on.
Thanks for your time!
- - - - - - - - - -
Greetings, Jens
gemini8

Joined: 6 May 16
Posts: 5
Credit: 9,116,040
RAC: 4,411
Message 2190 - Posted: 18 May 2021, 4:23:55 UTC - in response to Message 2185.  

Short addition:
This doesn't look right to me:
Valid (17) · Invalid (0) · Errors (85)
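To put those tallies in perspective, 85 errors out of 102 reported tasks is an error rate of roughly 83%. A minimal sketch of the calculation, assuming the valid, invalid and error counts together cover every task the host has finished (tasks still in progress are ignored); `error_rate` is a hypothetical helper name, not part of BOINC.

```python
def error_rate(valid: int, invalid: int, errors: int) -> float:
    """Share of reported tasks that ended in an error (in-progress tasks are not counted)."""
    total = valid + invalid + errors
    if total == 0:
        raise ValueError("no completed tasks to compare")
    return errors / total


if __name__ == "__main__":
    rate = error_rate(valid=17, invalid=0, errors=85)
    print(f"{rate:.1%}")  # prints 83.3%
```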
- - - - - - - - - -
Greetings, Jens
