lasievef 1.08 still errors w/ windows

zombie67 [MM]
Joined: 7 Sep 09
Posts: 24
Credit: 54,003,504
RAC: 0
Message 535 - Posted: 30 Aug 2010, 20:21:10 UTC

FYI, even with the new 1.08, I am still generating nothing but errors with lasievef tasks. This is across my 5 XP64 machines.

Here is a sample:

http://escatter11.fullerton.edu/nfs/result.php?resultid=8102432
Reno, NV
Team: SETI.USA

ID: 535 · Rating: 0
Greg
Project administrator
Joined: 26 Jun 08
Posts: 640
Credit: 436,153,738
RAC: 161,489
Message 536 - Posted: 30 Aug 2010, 21:21:49 UTC - in response to Message 535.  

xmalloc errors indicate that there was insufficient contiguous free memory. The current lasievef workunits still require about 1GB of free memory to run, and something else is using that memory on your computer. Now that the bug has been squashed, I will re-release "b" workunits. They should reduce memory use from 1GB to about 700 MB per workunit, which is a bit of an improvement. However, they won't actually go out for 2-3 days. Perhaps you can try again then.
ID: 536 · Rating: 0
zombie67 [MM]
Joined: 7 Sep 09
Posts: 24
Credit: 54,003,504
RAC: 0
Message 537 - Posted: 30 Aug 2010, 21:34:28 UTC - in response to Message 536.  
Last modified: 30 Aug 2010, 22:16:19 UTC

xmalloc errors indicate that there was insufficient contiguous free memory. The current lasievef workunits still require about 1GB of free memory to run, and something else is using that memory on your computer. Now that the bug has been squashed, I will re-release "b" workunits. They should reduce memory use from 1GB to about 700 MB per workunit, which is a bit of an improvement. However, they won't actually go out for 2-3 days. Perhaps you can try again then.


Are you saying that the problem is with my machines? They have way more than 1 GB free per core. Some normally have 7-9 GB free at any given time. These fail even when running only a single task.

FYI, this problem started around 08/25.

Also FYI, this is 100% of the tasks failing 100% of the time.

So this is not being caused by running short of memory. Or if it is, then the tasks are requiring >6gb per task.

Edit: For example, here is a quad core machine with 8gb of RAM. That is enough for 2gb per task. So if it is failing because there is not enough RAM, then the task is trying to use more than 2gb per task. And that assumes that 4 of these are running at the same time, which rarely happens. This project has a very small resource percentage, specifically to avoid that scenario.

http://escatter11.fullerton.edu/nfs/show_host_detail.php?hostid=10470

Edit #2:

I ran this task all by itself. Task manager showed memory usage climbed steadily for the first 100 seconds, to about 100k. Then for the last few seconds, it shot up to ~230k, and then errored out. It never consumed more than that, and the machine had >7gb sitting free the whole time.

http://escatter11.fullerton.edu/nfs/result.php?resultid=8103106

Another observation: my wingmen on Windows are not able to compute these tasks without errors either. So it's not just me or my machines.
Reno, NV
Team: SETI.USA

ID: 537 · Rating: 0
Greg
Project administrator
Joined: 26 Jun 08
Posts: 640
Credit: 436,153,738
RAC: 161,489
Message 538 - Posted: 31 Aug 2010, 0:42:36 UTC - in response to Message 537.  
Last modified: 31 Aug 2010, 0:47:58 UTC

I do find this very strange. xmalloc errors come only from the operating system telling the software that insufficient contiguous memory is available. It is possible that the computer has been on a long time and the memory has become very fragmented. There was also a Microsoft bug that can cause this, but the fix has been out for a while. Can you try rebooting one of the larger-memory computers to see if that helps?

As a comparison, here's one of my computers with similar specs:
http://escatter11.fullerton.edu/nfs/show_host_detail.php?hostid=10590
Note that all the "b" tasks failed with version 1.07, but otherwise nearly all tasks succeed.

Edit: If rebooting doesn't help, I can produce a Windows 64-bit binary for manual installation. I haven't done so since it will be slower than the 32-bit binary without the assembly optimizations.
ID: 538 · Rating: 0
zombie67 [MM]
Joined: 7 Sep 09
Posts: 24
Credit: 54,003,504
RAC: 0
Message 539 - Posted: 31 Aug 2010, 2:47:44 UTC
Last modified: 31 Aug 2010, 2:58:34 UTC

Will do. And I'll let you know how it goes after reboot. But how does this address all my wingmen that also failed in the same way?

Edit: Fresh reboot. Same thing with two more tasks.

These machines were doing just fine a month ago with these lasievef tasks. They have been running them exclusively since the announcement about more credits. Did something change recently (I mean, before the 1.08 change), say about a week ago?

Edit #2: Just to remove doubt, here is the history of this machine. It's fairly lumpy because of the low resource share. But still, you can see it was working fine until recently. All my machines' graphs look like this too.


Reno, NV
Team: SETI.USA

ID: 539 · Rating: 0
Greg
Project administrator
Joined: 26 Jun 08
Posts: 640
Credit: 436,153,738
RAC: 161,489
Message 540 - Posted: 31 Aug 2010, 3:21:56 UTC - in response to Message 539.  
Last modified: 31 Aug 2010, 4:01:30 UTC

Yes, we are now doing the largest factorization we have ever tried, and as such it requests more memory than before. Let me see if I can rush in the lower memory parameters so you won't have to wait as long to try them. They will be "S5m409b2" workunits, so they will be recognizable.

Edit: They're going out now. Give one of these a try.
ID: 540 · Rating: 0
zombie67 [MM]
Joined: 7 Sep 09
Posts: 24
Credit: 54,003,504
RAC: 0
Message 541 - Posted: 31 Aug 2010, 5:00:14 UTC - in response to Message 540.  

Yes, we are now doing the largest factorization we have ever tried, and as such it requests more memory than before. Let me see if I can rush in the lower memory parameters so you won't have to wait as long to try them. They will be "S5m409b2" workunits, so they will be recognizable.

Edit: They're going out now. Give one of these a try.



Sorry. Same thing:

http://escatter11.fullerton.edu/nfs/result.php?resultid=8115730

FWIW, memory consumption was the same. That task got up to about 230k, and then errored out. I don't think memory usage is the problem, at least not on my side or my wingman's side. Maybe a setting with the app? What changed a week ago?
Reno, NV
Team: SETI.USA

ID: 541 · Rating: 0
barth
Joined: 17 Mar 10
Posts: 1
Credit: 63,675
RAC: 0
Message 542 - Posted: 31 Aug 2010, 12:29:31 UTC - in response to Message 541.  

Hi,

I have also had problems with the S5m409c tasks lately, see:
http://escatter11.fullerton.edu/nfs/result.php?resultid=8111997
for instance. I'm also on a Windows system, and have seen that I ran out of memory, but should that not simply suspend the task until the computer is idle? I got some valid results from the S5m409c series, too.

I hope this can be resolved, this is really one of the very few useful math projects.

Peter
ID: 542 · Rating: 0
ai5000
Joined: 15 Sep 09
Posts: 1
Credit: 1,802,222
RAC: 0
Message 543 - Posted: 31 Aug 2010, 19:26:59 UTC
Last modified: 31 Aug 2010, 19:41:10 UTC

I just had one of the S5m409b2 tasks fail as well.

http://escatter11.fullerton.edu/nfs/result.php?resultid=8132325

Edit: Are the S5m409b2 tasks out now supposed to be using less memory? The S5m409b2 and S5m409c tasks I'm currently running are using about 1.1 GB each.
ID: 543 · Rating: 0
Greg
Project administrator
Joined: 26 Jun 08
Posts: 640
Credit: 436,153,738
RAC: 161,489
Message 544 - Posted: 1 Sep 2010, 3:47:07 UTC

I've been able to reproduce this problem locally using a Windows XP computer. It is a memory allocation issue. In my test case, the program asks for 1.1 GB of memory and Windows XP consistently says no (specifically, malloc returns NULL) even though plenty of memory is free, while both 32-bit Linux and Windows 7 say sure. I'll use this computer to test parameters for future factorizations to make sure this doesn't happen again.
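
To illustrate how a single large allocation can fail while plenty of memory is free, here is a minimal Python sketch. It is not the project's code and does not claim to reproduce the exact Windows XP behaviour. It assumes a 32-bit process with a 2 GiB user address space, and the positions and sizes of the occupied regions are invented purely for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def free_gaps(space_size, used_regions):
    """Return the sizes of the free gaps between used (start, size) regions."""
    gaps = []
    cursor = 0
    for start, size in sorted(used_regions):
        if start > cursor:
            gaps.append(start - cursor)
        cursor = max(cursor, start + size)
    if cursor < space_size:
        gaps.append(space_size - cursor)
    return gaps


def can_allocate(space_size, used_regions, request):
    """One allocation needs a single gap at least as large as the request."""
    return any(gap >= request for gap in free_gaps(space_size, used_regions))


# Hypothetical layout: a 2 GiB address space with a few small mappings
# (for example, libraries) scattered through it. All numbers are made up.
space = 2 * GIB
used = [
    (0, 64 * MIB),
    (700 * MIB, 16 * MIB),
    (1400 * MIB, 16 * MIB),
]
request = int(1.1 * GIB)

total_free = sum(free_gaps(space, used))
largest_gap = max(free_gaps(space, used))
print(f"total free:  {total_free / MIB:.0f} MiB")
print(f"largest gap: {largest_gap / MIB:.0f} MiB")
print(f"1.1 GiB request fits: {can_allocate(space, used, request)}")
```

With this made-up layout about 1.9 GiB is free in total, yet the largest single gap is 684 MiB, so a 1.1 GiB request cannot be placed in one piece.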
ID: 544 · Rating: 0
Greg
Project administrator
Joined: 26 Jun 08
Posts: 640
Credit: 436,153,738
RAC: 161,489
Message 545 - Posted: 1 Sep 2010, 6:13:35 UTC

Alright, try a few of the S5m409d tasks going out now. If you get a "q does not divide" error, ignore it and try another. If you keep getting "xmalloc" errors, then I'll try tweaking the parameters again.
ID: 545 · Rating: 0
[AF>Dell>LesDelliens]La ...
Joined: 7 Sep 09
Posts: 5
Credit: 37,812
RAC: 0
Message 546 - Posted: 1 Sep 2010, 8:25:43 UTC

Hi Greg! I've just come across some S5m409d tasks right now and I'm wondering if they are supposed to be much longer than the previous b2 and c ones? Because on my linux 64 the b2 and c ones finished in an hour or so whereas for the d ones I'm only around 10% after more than an hour.

Thanks
ID: 546 · Rating: 0
silent Float
Joined: 4 Nov 09
Posts: 3
Credit: 250,044
RAC: 0
Message 547 - Posted: 1 Sep 2010, 10:55:35 UTC - in response to Message 546.  

Hi Greg! I've just come across some S5m409d tasks right now and I'm wondering if they are supposed to be much longer than the previous b2 and c ones? Because on my linux 64 the b2 and c ones finished in an hour or so whereas for the d ones I'm only around 10% after more than an hour.

Thanks


Same problem with my system (WinXP32).

Thanks
ID: 547 · Rating: 0
bdodson*
Joined: 2 Oct 09
Posts: 50
Credit: 111,128,218
RAC: 0
Message 548 - Posted: 1 Sep 2010, 14:35:05 UTC - in response to Message 547.  



Same problem with my system (WinXP32).

Thanks


Me too. Up from 41 min to what looks like 5 hrs on our core i7 server. -bd

(Not that this is necessarily a problem; so long as the longer
tasks are producing more relations in proportion to the time.)
ID: 548 · Rating: 0
Greg
Project administrator
Joined: 26 Jun 08
Posts: 640
Credit: 436,153,738
RAC: 161,489
Message 549 - Posted: 1 Sep 2010, 16:05:11 UTC - in response to Message 548.  

Nope, not the intention. I need to stop doing things at midnight... On the next connection of your client to the server, these will be aborted by the server. You can do a project update manually if you like to speed up the process. Sorry for the inconvenience! Additional testing is underway now...
ID: 549 · Rating: 0
Jeff Blank
Joined: 9 Mar 10
Posts: 3
Credit: 2,529,966
RAC: 0
Message 550 - Posted: 1 Sep 2010, 16:06:36 UTC
Last modified: 1 Sep 2010, 16:09:34 UTC

In addition to the somewhat increased compute-time estimates that others are seeing with lasievef v1.08, I'm also seeing very infrequent checkpointing on Linux-32, only every 7-8 minutes.

lasievee v1.08 behaviour is the same as with previous versions, frequent checkpoints and more normal compute times.

EDIT: never mind, Greg's post came in while I was typing this up.
ID: 550 · Rating: 0
silent Float
Joined: 4 Nov 09
Posts: 3
Credit: 250,044
RAC: 0
Message 551 - Posted: 1 Sep 2010, 16:54:46 UTC - in response to Message 548.  



Same problem with my system (WinXP32).

Thanks


Me too. Up from 41 min to what looks like 5 hrs on our core i7 server. -bd

(Not that this is necessarily a problem; so long as the longer
tasks are producing more relations in proportion to the time.)


WU S5m409D_201007_0 "Computation error" after 4H 26min !! (WinXP32)

Longer tasks are welcome. 32-bit systems have limited memory, so one can manually "Resume" 1 or 2 WUs and run these with other projects.

Thanks
ID: 551 · Rating: 0
Greg
Project administrator
Joined: 26 Jun 08
Posts: 640
Credit: 436,153,738
RAC: 161,489
Message 552 - Posted: 1 Sep 2010, 19:53:05 UTC

I just updated the application for 32-bit Windows. I changed the memory allocation routine to use the Windows function VirtualAlloc() rather than malloc(), which seems to help a little on 32-bit Windows XP. Booting Windows XP with the /3GB switch, if you can, also seems to help a little. I'm interested to know if this new version helps with Windows XP 64. This version also prints out the size of the large allocation.

We do seem to be running into the limits of the Windows XP memory manager. Windows 7 doesn't seem to have these problems with 1GB allocations.
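
For anyone who wants to probe how large a single block their own system will hand out, here is a rough Python sketch. It is an illustration only and not the lasievef code: the function name `try_large_alloc` is hypothetical, and the non-Windows branch using an anonymous `mmap` is a stand-in so the script runs anywhere. On Windows it calls `VirtualAlloc()` directly through `ctypes`, and it prints the requested size the way the updated application is described as doing.

```python
import ctypes
import mmap
import sys

# Win32 constants for VirtualAlloc / VirtualFree.
MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
MEM_RELEASE = 0x8000
PAGE_READWRITE = 0x04


def try_large_alloc(size):
    """Try to get one contiguous block of `size` bytes, then release it.

    Returns True on success and False on failure. The requested size is
    printed first so a failed request is easy to identify in the output.
    """
    print(f"requesting {size} bytes ({size / 2**20:.1f} MiB)")
    if sys.platform == "win32":
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.VirtualAlloc.restype = ctypes.c_void_p
        kernel32.VirtualAlloc.argtypes = [
            ctypes.c_void_p, ctypes.c_size_t, ctypes.c_ulong, ctypes.c_ulong,
        ]
        kernel32.VirtualFree.restype = ctypes.c_int
        kernel32.VirtualFree.argtypes = [
            ctypes.c_void_p, ctypes.c_size_t, ctypes.c_ulong,
        ]
        addr = kernel32.VirtualAlloc(
            None, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE
        )
        if not addr:
            print("allocation failed")
            return False
        kernel32.VirtualFree(addr, 0, MEM_RELEASE)
        return True
    # Stand-in for other platforms: an anonymous memory mapping.
    try:
        block = mmap.mmap(-1, size)
    except (OSError, ValueError, OverflowError, MemoryError):
        print("allocation failed")
        return False
    block.close()
    return True


if __name__ == "__main__":
    for mib in (256, 700, 1126):
        print("ok" if try_large_alloc(mib * 2**20) else "failed")
```

The sizes in the loop are arbitrary examples loosely matching the figures mentioned in this thread. A meaningful comparison with the 32-bit application would need a 32-bit Python build, since a 64-bit process has a far larger address space.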
ID: 552 · Rating: 0
zpm
Joined: 17 Oct 09
Posts: 5
Credit: 3,495
RAC: 0
Message 553 - Posted: 1 Sep 2010, 21:41:24 UTC - in response to Message 552.  
Last modified: 1 Sep 2010, 21:43:52 UTC

Hey, Greg. I haven't run this project in a while, but I'm running 1.08 on Win 7 right now with no problems, so your theory seems to hold true.

Edit:

Any chance of seeing a 64-bit app in the future?
ID: 553 · Rating: 0
Greg
Project administrator
Joined: 26 Jun 08
Posts: 640
Credit: 436,153,738
RAC: 161,489
Message 554 - Posted: 1 Sep 2010, 21:49:09 UTC - in response to Message 553.  

Unfortunately not in the short term. In principle I could build a Win64 app from the C code, but it would be slower than the current 32-bit app with assembly optimizations. We do have 64-bit assembly used for the Linux version, but I'm not an assembly guru, and I'm told by an assembly expert that the code would be a huge pain to convert thanks to differing Windows calling conventions. There are a few wrappers out to mitigate this issue, but I'm told that, thanks to the large number of parameters passed, these won't work in this case.
ID: 554 · Rating: 0