All 16e tasks named "G6p*" error out on max elapsed?

Message boards : Questions/Problems/Bugs : All 16e tasks named "G6p*" error out on max elapsed?

Author Message
DigiK-oz
Joined: 29 Nov 09
Posts: 1
Credit: 2,786,853
RAC: 0
Message 1598 - Posted: 9 Oct 2015, 6:46:59 UTC

Since today, all 16e tasks whose names start with "G6p" get a computation error with the message "maximum elapsed time exceeded" after about 2000 seconds. Anybody else seeing this? Any solutions, other than disallowing any new work or only accepting 14e/15e tasks?

jondi_hanluc
Joined: 16 May 12
Posts: 1
Credit: 26,532,135
RAC: 4
Message 1599 - Posted: 9 Oct 2015, 7:46:02 UTC - in response to Message 1598.

I'm getting this too, though it doesn't appear to be every Gp49b** work unit; some are completing.

Dj Ninja
Joined: 3 Apr 15
Posts: 2
Credit: 4,147,476
RAC: 0
Message 1607 - Posted: 8 Nov 2015, 2:18:34 UTC

The same happens on every G2m1285b task when running on very fast (i7) machines.

Profile Carlos Pinho [TSBTs Pirate]
Volunteer moderator
Joined: 26 Sep 09
Posts: 162
Credit: 7,723,521
RAC: 0
Message 1608 - Posted: 8 Nov 2015, 9:44:27 UTC

Greg has been warned about these errors.

Dj Ninja
Joined: 3 Apr 15
Posts: 2
Credit: 4,147,476
RAC: 0
Message 1610 - Posted: 9 Nov 2015, 7:54:19 UTC

Tried again; it happens to all newly generated WUs.
I have to quit NFS until this issue has been fixed.

timbit
Joined: 19 Mar 10
Posts: 3
Credit: 1,742,166
RAC: 0
Message 1615 - Posted: 1 Dec 2015, 7:22:31 UTC

I'm having this same issue:

Stderr output

<core_client_version>7.6.9</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 1969.46 (86400.00G/43.87G)
</message>
<stderr_txt>

197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED

In my BOINC client, the estimated time to completion for these workunits is 05:19. That seems rather small, as the 14e and 15e estimates are in the 40-50 minute range.

Perhaps increasing the estimated time to completion (is that done via FLOPS count estimate?) will fix things?

Would that be rsc_fpops_est? But it's the same between 15e and 16e, so I don't know.
Is someone looking into this? I can help debug, if that's possible.

timbit
Joined: 19 Mar 10
Posts: 3
Credit: 1,742,166
RAC: 0
Message 1616 - Posted: 1 Dec 2015, 7:36:07 UTC - in response to Message 1615.

Just doing some more digging (I have an Intel Core i5-2400).

(86400.00G/43.87G)

86400.00G refers to <rsc_fpops_bound> in init_data.xml, and 43.87G matches the <flops> value for this app version in client_state.xml:

<app_version>
<app_name>lasieve5f</app_name>
<version_num>111</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>43869921209.976120</flops>
<api_version>7.1.0</api_version>
<file_ref>
<file_name>lasieve5f_1.11_windows_x86_64.exe</file_name>
<main_program/>
</file_ref>
</app_version>

Maybe flops needs to be smaller?

Currently
86400 / 43.87 = 1969 seconds.

For comparison, here are the 14e and 15e entries alongside 16e in client_state.xml (snipped for clarity):

<app_version>
<app_name>lasieved</app_name>
<flops>3877241497.655855</flops>
</app_version>
<app_version>
<app_name>lasievee</app_name>
<flops>6054631322.103277</flops>
</app_version>
<app_version>
<app_name>lasieve5f</app_name>
<flops>43869921209.976120</flops>
</app_version>

I don't know why the flops value is so big for lasieve5f. Can this be easily modified?
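The arithmetic above can be repeated for all three apps as a rough sketch. It assumes the limit is simply the bound divided by the flops value, as the 86400 / 43.87 calculation suggests, and it assumes the same 86400G bound applies to every app, which the thread does not confirm. The names RSC_FPOPS_BOUND, APP_FLOPS and elapsed_time_limit are made up for illustration.

```python
# Bound taken from the "86400.00G" in the error message (G = 1e9).
# Assumption: the same bound is used here for all three apps, for comparison only.
RSC_FPOPS_BOUND = 86400e9

# <flops> values copied from the client_state.xml snippets above.
APP_FLOPS = {
    "lasieved": 3877241497.655855,
    "lasievee": 6054631322.103277,
    "lasieve5f": 43869921209.976120,
}


def elapsed_time_limit(fpops_bound, flops):
    """Seconds a task may run before the bound is reached at the given flops rate."""
    return fpops_bound / flops


limits = {app: elapsed_time_limit(RSC_FPOPS_BOUND, flops)
          for app, flops in APP_FLOPS.items()}

for app, seconds in limits.items():
    print(f"{app}: {seconds:.0f} s ({seconds / 3600:.2f} h)")
```

With these inputs the lasieve5f limit comes out at about 1969 seconds, the figure in the error message, while the other two apps get a limit several times longer because their flops values are much smaller.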

Greg
Project administrator
Joined: 26 Jun 08
Posts: 582
Credit: 223,912,432
RAC: 17,546
Message 1617 - Posted: 3 Dec 2015, 20:21:30 UTC - in response to Message 1616.

You can close BOINC and manually edit client_state.xml to reduce the flops value, perhaps to match that of the other apps. It is likely a side effect of the short WUs. In any case, to try to mitigate this, I reset the server statistics for that app and regenerated all workunits for 1290000 and higher using a larger bound. Hopefully that'll help in the meantime.

timbit
Joined: 19 Mar 10
Posts: 3
Credit: 1,742,166
RAC: 0
Message 1618 - Posted: 4 Dec 2015, 17:56:47 UTC - in response to Message 1617.

Thanks Greg, this seems to have worked.

I shut down the BOINC client. Then I modified the <flops> value for 16e to be the same as 15e:

client_state.xml

<app_version>
<app_name>lasieve5f</app_name>
<version_num>111</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>3395101616.607331</flops> <==== changed to same as lasievee
<api_version>7.1.0</api_version>
<file_ref>
<file_name>lasieve5f_1.11_windows_x86_64.exe</file_name>
<main_program/>
</file_ref>
</app_version>

Then I restarted my client. This appears to have worked, although it seems BOINC has since changed the value to another, smaller one. In any case, the flops value is nowhere near as high as it originally was. Things are working, and my workunits seem to take around 45 minutes on an Intel Core i5-2400.
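For anyone who would rather script the manual edit described above, here is a minimal sketch. It is not a BOINC tool: set_app_flops is a hypothetical helper that works on the file's text and touches only the <flops> line inside <app_version> blocks for the named app. It assumes the blocks look like the snippet above. As with the manual edit, BOINC should be shut down first, and keeping a backup copy of client_state.xml is a sensible precaution.

```python
import re


def set_app_flops(state_text, app_name, new_flops):
    """Return state_text with <flops> replaced in each <app_version> block for app_name.

    new_flops is passed as a string so the value is written exactly as given.
    """
    def patch_block(match):
        block = match.group(0)
        if f"<app_name>{app_name}</app_name>" not in block:
            return block  # a different app: leave it untouched
        return re.sub(r"<flops>[^<]*</flops>",
                      f"<flops>{new_flops}</flops>", block)

    return re.sub(r"<app_version>.*?</app_version>", patch_block,
                  state_text, flags=re.DOTALL)


# Small self-contained demonstration on a trimmed-down sample.
SAMPLE = """<app_version>
<app_name>lasievee</app_name>
<flops>6054631322.103277</flops>
</app_version>
<app_version>
<app_name>lasieve5f</app_name>
<flops>43869921209.976120</flops>
</app_version>
"""

patched = set_app_flops(SAMPLE, "lasieve5f", "6054631322.103277")
print(patched)
```

To apply it to the real file, read client_state.xml into a string, pass it through set_app_flops, and write the result back. All other lines, including their formatting, are left as they were.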

Thanks.

ethan mike
Joined: 4 Dec 15
Posts: 1
Credit: 0
RAC: 0
Message 1619 - Posted: 5 Dec 2015, 4:32:34 UTC - in response to Message 1599.

I'm getting this too, though it doesn't appear to be every Gp49b** work unit; some are completing.
