
All 16e tasks named "G6p*" error out on max elapsed?


DigiK-oz

Joined: 29 Nov 09
Posts: 1
Credit: 2,786,853
RAC: 0
Message 1598 - Posted: 9 Oct 2015, 6:46:59 UTC

Since today, all 16e tasks whose names start with "G6p" get a computation error with the message "maximum elapsed time exceeded" after about 2000 seconds. Anybody else seeing this? Any solutions, other than disallowing any new work or only accepting 14e/15e tasks?
jondi_hanluc

Joined: 16 May 12
Posts: 1
Credit: 36,651,725
RAC: 0
Message 1599 - Posted: 9 Oct 2015, 7:46:02 UTC - in response to Message 1598.  

I'm getting this too, though it doesn't appear to affect every Gp49b** work unit; some are completing.
Dj Ninja

Joined: 3 Apr 15
Posts: 2
Credit: 6,237,746
RAC: 0
Message 1607 - Posted: 8 Nov 2015, 2:18:34 UTC

The same happens on every G2m1285b task when running on very fast (i7) machines.
Gigacruncher [TSBTs Pirate]
Volunteer moderator

Joined: 26 Sep 09
Posts: 212
Credit: 22,049,483
RAC: 13,070
Message 1608 - Posted: 8 Nov 2015, 9:44:27 UTC

Greg has been warned about these errors.
Dj Ninja

Joined: 3 Apr 15
Posts: 2
Credit: 6,237,746
RAC: 0
Message 1610 - Posted: 9 Nov 2015, 7:54:19 UTC

Tried again; this happens to all newly generated WUs.
I have to quit NFS until this issue has been fixed.
timbit

Joined: 19 Mar 10
Posts: 3
Credit: 1,828,730
RAC: 61
Message 1615 - Posted: 1 Dec 2015, 7:22:31 UTC

I'm having this same issue:

Stderr output

<core_client_version>7.6.9</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 1969.46 (86400.00G/43.87G)
</message>
<stderr_txt>

197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED

In my BOINC client, the estimated time to completion for these workunits is 05:19. That seems rather small, as the estimated times for 14e and 15e are in the 40-50 minute range.

Perhaps increasing the estimated time to completion (is that done via FLOPS count estimate?) will fix things?

Would that be rsc_fpops_est? But it's the same between 15e and 16e, so I don't know.
Is someone looking into this? I can help debug, if that's possible.
timbit

Joined: 19 Mar 10
Posts: 3
Credit: 1,828,730
RAC: 61
Message 1616 - Posted: 1 Dec 2015, 7:36:07 UTC - in response to Message 1615.  

Just doing some more digging (I have an Intel Core i5-2400).

(86400.00G/43.87G)

86400 refers to <rsc_fpops_bound> in init_data.xml (or in client_state.xml). The 43.87 matches the <flops> value for the app version in client_state.xml:

<app_version>
<app_name>lasieve5f</app_name>
<version_num>111</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>43869921209.976120</flops>
<api_version>7.1.0</api_version>
<file_ref>
<file_name>lasieve5f_1.11_windows_x86_64.exe</file_name>
<main_program/>
</file_ref>
</app_version>

Maybe flops needs to be smaller?

Currently
86400 / 43.87 = 1969 seconds.
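
For anyone who wants to check their own numbers, here is a minimal Python sketch of that arithmetic. It assumes the client derives the limit as rsc_fpops_bound divided by the app version's flops, which is what the "86400.00G/43.87G" in the error message suggests. The function name is made up for illustration, and applying the same bound to the other two apps is an assumption made only for comparison.

```python
def elapsed_limit_seconds(rsc_fpops_bound, flops):
    """Run time in seconds allowed before the task is aborted (assumed formula)."""
    if flops <= 0:
        raise ValueError("flops must be positive")
    return rsc_fpops_bound / flops


BOUND = 86400e9  # 86400.00G, taken from the error message

# <flops> values quoted in this thread; using the same bound for
# every app is an assumption, not something the thread confirms.
FLOPS = {
    "lasieved": 3877241497.655855,
    "lasievee": 6054631322.103277,
    "lasieve5f": 43869921209.976120,
}

for app, flops in FLOPS.items():
    print(f"{app}: {elapsed_limit_seconds(BOUND, flops):.2f} s")
```

The lasieve5f line comes out at about 1969 s, matching the limit in the error message. With the much smaller flops values of the other two apps, the same bound would allow several hours.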

For 14e and 15e: (snipped for clarity) client_state.xml

<app_version>
<app_name>lasieved</app_name>
<flops>3877241497.655855</flops>
</app_version>
<app_version>
<app_name>lasievee</app_name>
<flops>6054631322.103277</flops>
</app_version>
<app_version>
<app_name>lasieve5f</app_name>
<flops>43869921209.976120</flops>
</app_version>

I don't know why the flops value is so large for lasieve5f. Can this be easily modified?
Greg
Project administrator

Joined: 26 Jun 08
Posts: 640
Credit: 433,775,088
RAC: 344,736
Message 1617 - Posted: 3 Dec 2015, 20:21:30 UTC - in response to Message 1616.  

You can close BOINC and manually edit client_state.xml to reduce the flops value, perhaps to match that of the other apps. It is likely a side effect of the short WUs. In any case, to try to mitigate this, I reset the server statistics for that app and regenerated all workunits for 1290000 and higher using a larger bound. Hopefully that'll help in the meantime.
timbit

Joined: 19 Mar 10
Posts: 3
Credit: 1,828,730
RAC: 61
Message 1618 - Posted: 4 Dec 2015, 17:56:47 UTC - in response to Message 1617.  

Thanks Greg, this seems to have worked.

I shut down the BOINC client. Then I modified the <flops> value for 16e to be the same as 15e:

client_state.xml

<app_version>
<app_name>lasieve5f</app_name>
<version_num>111</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>3395101616.607331</flops> <==== changed to same as lasievee
<api_version>7.1.0</api_version>
<file_ref>
<file_name>lasieve5f_1.11_windows_x86_64.exe</file_name>
<main_program/>
</file_ref>
</app_version>

Then I restarted my client. This appears to have worked. The BOINC client seems to have since changed the value to another, smaller value, but the flops value is nowhere near as high as it originally was. Things are working, and my workunits seem to take around 45 minutes on an Intel Core i5-2400.
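
For those who would rather script the edit than do it by hand, the following is a rough Python sketch of the same change, not an official tool. It works on the XML text only; reading and writing the real client_state.xml is left out because the file's location varies by installation. The helper name `set_app_flops` and the `<client_state>` root element in the sample are assumptions for illustration. As above, BOINC should be shut down first, and keeping a backup copy of the file is sensible.

```python
import xml.etree.ElementTree as ET


def set_app_flops(xml_text, app_name, new_flops):
    """Return xml_text with <flops> replaced for every matching <app_version>."""
    root = ET.fromstring(xml_text)
    changed = 0
    for app_version in root.iter("app_version"):
        name = (app_version.findtext("app_name") or "").strip()
        flops = app_version.find("flops")
        if name == app_name and flops is not None:
            flops.text = f"{new_flops:.6f}"
            changed += 1
    if changed == 0:
        raise LookupError(f"no <app_version> with <flops> found for {app_name}")
    return ET.tostring(root, encoding="unicode")


# Trimmed-down sample built from the snippets in this thread;
# the <client_state> wrapper is assumed.
sample = """<client_state>
<app_version>
<app_name>lasievee</app_name>
<flops>6054631322.103277</flops>
</app_version>
<app_version>
<app_name>lasieve5f</app_name>
<flops>43869921209.976120</flops>
</app_version>
</client_state>"""

# Set lasieve5f to the same value as lasievee.
updated = set_app_flops(sample, "lasieve5f", 6054631322.103277)
print(updated)
```

Since the client appears to adjust the value again after it restarts, this is best treated as a temporary workaround rather than a permanent setting.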

Thanks.
