16e tasks frequently fail for me, with Computation Error - across multiple PCs and OSes
Author | Message |
---|---|
Joined: 10 May 21 Posts: 2 Credit: 0 RAC: 0 |
I run NFS on a number of different machines, primarily macOS and Linux. I've been seeing a high number of failing tasks on all of them, particularly 16e tasks: "Computation Error" within the first couple of minutes. I've seen a couple of other posts about this in this forum, but no real indication of what's going on or how to prevent it. Given that I'm seeing it across multiple machines and OSes, I'm fairly confident it's not environment-specific on my side. Here's an example from today: BOINC 7.16.14 on macOS 10.15.7, on a Mac mini. Updates to BOINC and macOS don't seem to have any impact on the volume or frequency of the failing tasks. In the time it's taken me to write this post, the single non-failed task has progressed to 12.5% quite happily. I do run other projects on all machines where I'm seeing this, and I don't see frequently failing tasks for any of those. For what it's worth, the other machines where I see frequent failures have been running either Ubuntu or Debian (latest versions of either, plus the latest BOINC for those OSes). I don't have NFS running on any of them currently, though, because of the high volume of failed tasks. What can I do to investigate and resolve these problems? Edit: the img tag of my screenshot seems to be failing. Here's the URL: https://imgur.com/a/43Pi8su |
Joined: 26 Jun 08 Posts: 645 Credit: 473,009,118 RAC: 261,026 |
Those were run on Mac and ended with a segmentation fault. I typically see slightly higher error rates on the Mac app, especially now that we are running a more difficult quartic, but the project-wide Mac error rates are OK. I'm not sure why that host is unhappy. Perhaps you could try one of the other apps with that host to see if this particular number is causing the issue? |
Joined: 10 May 21 Posts: 2 Credit: 0 RAC: 0 |
OK, thanks very much. I'm running under an Account Manager so I can't directly choose which NFS apps I run, but I'll check whether the person who runs the pool can adjust it for us. |
Joined: 6 May 16 Posts: 5 Credit: 9,117,188 RAC: 4,476 |
Hi Greg. I think I'm encountering something similar. This is on my latest Mac and OS (MacBook Air 2017 running Catalina), which hasn't done NFS work before. But another Mac running Macintosh OS 10.14.6 build 18G87 might have the same issue, if I can judge from just the one task that it has sent back so far. I think in the past my Macs had no such problems with NFS, as they have credit. Two workunits might make it through, as they are running further, while another one tells me it's waiting for memory. I suppose RAM limitation might be the key to this problem. Please have a look at the stderr: <core_client_version>7.14.4</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1, -63)</message> <stderr_txt> boinc initialized work files resolved, now working -> lasieve5f_1.11_x86_64-apple-darwin -> -r -> -f -> 2102270000 -> -c -> 2000 -> -R -> -o -> ../../projects/escatter11.fullerton.edu_nfs/S2L2162_2102270_0_r21334165_0 -> ../../projects/escatter11.fullerton.edu_nfs/S2L2162.poly SIGSEGV: segmentation violation Crashed executable name: lasieve5f_1.11_x86_64-apple-darwin built using BOINC library version 7.5.0 Machine type Intel x86-64h Haswell (64-bit executable) System version: Macintosh OS 10.15.7 build 19H1030 Mon May 17 07:11:05 2021 atos cannot load symbols for the file lasieve5f_1.11_x86_64-apple-darwin for architecture x86_64.
0 lasieve5f_1.11_x86_64-apple-darwin 0x000000010007d21c SIGPIPE: write on a pipe with no reader 1 lasieve5f_1.11_x86_64-apple-darwin 0x0000000100071ad7 SIGPIPE: write on a pipe with no reader 2 libsystem_platform.dylib 0x00007fff6954a5fd Thread 0 crashed with X86 Thread State (64-bit): rax: 0x0100001f rbx: 0x00000003 rcx: 0x7ffeefbfc228 rdx: 0x00000028 rdi: 0x7ffeefbfc298 rsi: 0x00000003 rbp: 0x7ffeefbfc280 rsp: 0x7ffeefbfc228 r8: 0x00000607 r9: 0x00000000 r10: 0x000009c8 r11: 0x00000206 r12: 0x00000003 r13: 0x000009c8 r14: 0x7ffeefbfc298 r15: 0x00000028 rip: 0x7fff69492dfa rfl: 0x00000206 Binary Images Description: 0x100000000 - 0x10009bfff /Library/Application Support/BOINC Data/slots/5/../../projects/escatter11.fullerton.edu_nfs/lasieve5f_1.11_x86_64-apple-darwin 0x7fff66336000 - 0x7fff66337fff /usr/lib/libSystem.B.dylib 0x7fff6661c000 - 0x7fff6666efff /usr/lib/libc++.1.dylib 0x7fff6666f000 - 0x7fff66684fff /usr/lib/libc++abi.dylib 0x7fff68196000 - 0x7fff681c9fff /usr/lib/libobjc.A.dylib 0x7fff6861f000 - 0x7fff68669fff /usr/lib/libstdc++.6.dylib 0x7fff69133000 - 0x7fff69138fff /usr/lib/system/libcache.dylib 0x7fff69139000 - 0x7fff69144fff /usr/lib/system/libcommonCrypto.dylib 0x7fff69145000 - 0x7fff6914cfff /usr/lib/system/libcompiler_rt.dylib 0x7fff6914d000 - 0x7fff69156fff /usr/lib/system/libcopyfile.dylib 0x7fff69157000 - 0x7fff691e9fff /usr/lib/system/libcorecrypto.dylib 0x7fff692f6000 - 0x7fff69336fff /usr/lib/system/libdispatch.dylib 0x7fff69337000 - 0x7fff6936dfff /usr/lib/system/libdyld.dylib 0x7fff6936e000 - 0x7fff6936efff /usr/lib/system/libkeymgr.dylib 0x7fff6937c000 - 0x7fff6937cfff /usr/lib/system/liblaunch.dylib 0x7fff6937d000 - 0x7fff69382fff /usr/lib/system/libmacho.dylib 0x7fff69383000 - 0x7fff69385fff /usr/lib/system/libquarantine.dylib 0x7fff69386000 - 0x7fff69387fff /usr/lib/system/libremovefile.dylib 0x7fff69388000 - 0x7fff6939ffff /usr/lib/system/libsystem_asl.dylib 0x7fff693a0000 - 0x7fff693a0fff /usr/lib/system/libsystem_blocks.dylib 
0x7fff693a1000 - 0x7fff69428fff /usr/lib/system/libsystem_c.dylib 0x7fff69429000 - 0x7fff6942cfff /usr/lib/system/libsystem_configuration.dylib 0x7fff6942d000 - 0x7fff69430fff /usr/lib/system/libsystem_coreservices.dylib 0x7fff69431000 - 0x7fff69439fff /usr/lib/system/libsystem_darwin.dylib 0x7fff6943a000 - 0x7fff69441fff /usr/lib/system/libsystem_dnssd.dylib 0x7fff69442000 - 0x7fff69443fff /usr/lib/system/libsystem_featureflags.dylib 0x7fff69444000 - 0x7fff69491fff /usr/lib/system/libsystem_info.dylib 0x7fff69492000 - 0x7fff694befff /usr/lib/system/libsystem_kernel.dylib 0x7fff694bf000 - 0x7fff69506fff /usr/lib/system/libsystem_m.dylib 0x7fff69507000 - 0x7fff6952efff /usr/lib/system/libsystem_malloc.dylib 0x7fff6952f000 - 0x7fff6953cfff /usr/lib/system/libsystem_networkextension.dylib 0x7fff6953d000 - 0x7fff69546fff /usr/lib/system/libsystem_notify.dylib 0x7fff69547000 - 0x7fff6954ffff /usr/lib/system/libsystem_platform.dylib 0x7fff69550000 - 0x7fff6955afff /usr/lib/system/libsystem_pthread.dylib 0x7fff6955b000 - 0x7fff6955ffff /usr/lib/system/libsystem_sandbox.dylib 0x7fff69560000 - 0x7fff69562fff /usr/lib/system/libsystem_secinit.dylib 0x7fff69563000 - 0x7fff6956afff /usr/lib/system/libsystem_symptoms.dylib 0x7fff6956b000 - 0x7fff69581fff /usr/lib/system/libsystem_trace.dylib 0x7fff69583000 - 0x7fff69588fff /usr/lib/system/libunwind.dylib 0x7fff69589000 - 0x7fff695befff /usr/lib/system/libxpc.dylib Exiting... </stderr_txt> ]]> For now I only have hands-on access to the MacBook as I'm not at home, but I can have a look later-on. Thanks for your time! - - - - - - - - - - Greetings, Jens |
Joined: 6 May 16 Posts: 5 Credit: 9,117,188 RAC: 4,476 |
Short addition: This doesn't look right to me: Valid (17) · Invalid (0) · Errors (85). - - - - - - - - - - Greetings, Jens |
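
For anyone comparing hosts, the figures quoted in this thread can be checked with a few lines of Python. This is only a sketch: the function names are invented for illustration, and the regular expression assumes the `process exited with code N` wording seen in the stderr posted above. It shows that 193, 0xc1 and -63 are the same exit status (the last read as a signed 8-bit value), and it works out the share of errored tasks from the valid/invalid/error counts.

```python
import re


def parse_exit_code(stderr_text):
    """Return the integer exit code from a 'process exited with code N' line, or None."""
    match = re.search(r"process exited with code (\d+)", stderr_text)
    return int(match.group(1)) if match else None


def as_signed_byte(code):
    """Interpret an exit status in 0..255 as a signed 8-bit value."""
    return code - 256 if code > 127 else code


def error_rate(valid, invalid, errors):
    """Fraction of returned tasks that ended in an error (0.0 if none were returned)."""
    total = valid + invalid + errors
    return errors / total if total else 0.0


# Message line copied from the stderr in this thread.
sample = "<message> process exited with code 193 (0xc1, -63)</message>"
code = parse_exit_code(sample)
print(code, hex(code), as_signed_byte(code))   # 193 0xc1 -63

# Task counts from the host summary quoted above.
print(f"{error_rate(17, 0, 85):.1%}")          # 83.3%
```

With 17 valid and 85 errored tasks, roughly five out of six results from that host are failing, which is well beyond an occasional bad workunit.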