lasievef 1.08 still errors w/ windows
Author | Message |
---|---|
Send message Joined: 7 Sep 09 Posts: 24 Credit: 55,483,750 RAC: 18 |
FYI, even with the new 1.08, I am still generating nothing but errors with lasievef tasks. This is across my 5 XP64 machines. Here is a sample: http://escatter11.fullerton.edu/nfs/result.php?resultid=8102432 Reno, NV Team: SETI.USA |
Send message Joined: 26 Jun 08 Posts: 644 Credit: 460,473,368 RAC: 60,339 |
xmalloc errors indicate that there was insufficient contiguous free memory. The current lasievef workunits still require about 1GB of free memory to run, and something else is using that memory on your computer. Now that the bug has been squashed, I will re-release "b" workunits. They should reduce memory use from 1GB to about 700 MB per workunit, which is a bit of an improvement. However, they won't actually go out for 2-3 days. Perhaps you can try again then. |
Send message Joined: 7 Sep 09 Posts: 24 Credit: 55,483,750 RAC: 18 |
"xmalloc errors indicate that there was insufficient contiguous free memory. The current lasievef workunits still require about 1GB of free memory to run, and something else is using that memory on your computer. Now that the bug has been squashed, I will re-release 'b' workunits. They should reduce memory use from 1GB to about 700 MB per workunit, which is a bit of an improvement. However, they won't actually go out for 2-3 days. Perhaps you can try again then." Are you saying that the problem is with my machines? They have way more than 1 GB free per core. Some normally have 7-9 GB free at any given time. These fail even when running only a single task. FYI, this problem started around 08/25. Also FYI, this is 100% of the tasks failing 100% of the time. So this is not being caused by running short of memory. Or if it is, then the tasks are requiring >6 GB per task. Edit: For example, here is a quad-core machine with 8 GB of RAM. That is enough for 2 GB per task. So if it is failing because there is not enough RAM, then the task is trying to use more than 2 GB per task. And that assumes that 4 of these are running at the same time, which rarely happens. This project has a very small resource percentage, specifically to avoid that scenario. http://escatter11.fullerton.edu/nfs/show_host_detail.php?hostid=10470 Edit #2: I ran this task all by itself. Task Manager showed memory usage climbing steadily for the first 100 seconds, to about 100k. Then for the last few seconds, it shot up to ~230k, and then errored out. It never consumed more than that, and the machine had >7 GB sitting free the whole time. http://escatter11.fullerton.edu/nfs/result.php?resultid=8103106 Another observation: My wingmen with Windows are not able to compute these tasks without errors either. So it's not just me or my machines. Reno, NV Team: SETI.USA |
Send message Joined: 26 Jun 08 Posts: 644 Credit: 460,473,368 RAC: 60,339 |
I do find this very strange. xmalloc errors come only from the operating system telling the software that insufficient contiguous memory is available. It is possible that the computer has been on a long time and the memory has become very fragmented. There was also a Microsoft bug that can cause this, but the fix has been out for a while. Can you try rebooting one of the larger-memory computers to see if that helps? As a comparison, here's one of my computers with similar specs: http://escatter11.fullerton.edu/nfs/show_host_detail.php?hostid=10590 Note that all the "b" tasks failed with version 1.07, but otherwise nearly all tasks succeed. Edit: If rebooting doesn't help, I can produce a Windows 64-bit binary for manual installation. I haven't done so since it will be slower than the 32-bit binary without the assembly optimizations. |
Send message Joined: 7 Sep 09 Posts: 24 Credit: 55,483,750 RAC: 18 |
Will do. And I'll let you know how it goes after reboot. But how does this address all my wingmen that also failed in the same way? Edit: Fresh reboot. Same thing with two more tasks. These machines were doing just fine a month ago with these lasievef tasks. They have been running them exclusively since the announcement about more credits. Did something change recently (I mean, before the 1.08 change), say about a week ago? Edit #2: Just to remove doubt, here is the history of this machine. It's fairly lumpy because of the low resource share. But still, you can see it was working fine until recently. All my machines' graphs look like this too. Reno, NV Team: SETI.USA |
Send message Joined: 26 Jun 08 Posts: 644 Credit: 460,473,368 RAC: 60,339 |
Yes, we are now doing the largest factorization we have ever tried, and as such it requests more memory than before. Let me see if I can rush in the lower memory parameters so you won't have to wait as long to try them. They will be "S5m409b2" workunits, so they will be recognizable. Edit: They're going out now. Give one of these a try. |
Send message Joined: 7 Sep 09 Posts: 24 Credit: 55,483,750 RAC: 18 |
"Yes, we are now doing the largest factorization we have ever tried, and as such it requests more memory than before. Let me see if I can rush in the lower memory parameters so you won't have to wait as long to try them. They will be 'S5m409b2' workunits, so they will be recognizable." Sorry. Same thing: http://escatter11.fullerton.edu/nfs/result.php?resultid=8115730 FWIW, memory consumption was the same. That task got up to about 230k, and then errored out. I don't think memory usage is the problem. At least not on my side or my wingman's side. Maybe a setting with the app? What changed a week ago? Reno, NV Team: SETI.USA |
Send message Joined: 17 Mar 10 Posts: 1 Credit: 63,675 RAC: 0 |
Hi, I have also had problems with the S5m409c tasks lately; see, for instance, http://escatter11.fullerton.edu/nfs/result.php?resultid=8111997 I'm also on a Windows system and have seen that I ran out of memory, but shouldn't that simply suspend the task until the computer is idle? I got some valid results from the S5m409c series, too. I hope this can be resolved; this is really one of the very few useful math projects. Peter |
Send message Joined: 15 Sep 09 Posts: 1 Credit: 1,802,222 RAC: 0 |
I just had one of the S5m409b2 tasks fail as well. http://escatter11.fullerton.edu/nfs/result.php?resultid=8132325 Edit: Are the S5m409b2 tasks out now supposed to be using less memory? The S5m409b2 and S5m409c tasks I'm currently running are using about 1.1 GB each. |
Send message Joined: 26 Jun 08 Posts: 644 Credit: 460,473,368 RAC: 60,339 |
I've been able to reproduce this problem locally using a Windows XP computer. It is a memory allocation issue. In my test case, the program asks for 1.1 GB of memory and Windows XP consistently says no (specifically, malloc returns NULL) even though plenty of memory is free, while both 32-bit Linux and Windows 7 say sure. I'll use this computer to test parameters for future factorizations to make sure this doesn't happen again. |
Send message Joined: 26 Jun 08 Posts: 644 Credit: 460,473,368 RAC: 60,339 |
Alright, try a few of the S5m409d tasks going out now. If you get a "q does not divide" error, ignore it and try another. If you keep getting "xmalloc" errors, then I'll try tweaking the parameters again. |
Send message Joined: 7 Sep 09 Posts: 5 Credit: 37,812 RAC: 0 |
Hi Greg! I've just come across some S5m409d tasks right now and I'm wondering if they are supposed to be much longer than the previous b2 and c ones? Because on my linux 64 the b2 and c ones finished in an hour or so whereas for the d ones I'm only around 10% after more than an hour. Thanks |
Send message Joined: 4 Nov 09 Posts: 3 Credit: 250,044 RAC: 0 |
"Hi Greg! I've just come across some S5m409d tasks right now and I'm wondering if they are supposed to be much longer than the previous b2 and c ones? Because on my linux 64 the b2 and c ones finished in an hour or so whereas for the d ones I'm only around 10% after more than an hour." Same problem with my system (WinXP32). Thanks |
Send message Joined: 2 Oct 09 Posts: 50 Credit: 111,128,218 RAC: 0 |
Me too. Up from 41 min to what looks like 5 hrs on our core i7 server. -bd (Not that this is necessarily a problem; so long as the longer tasks are producing more relations in proportion to the time.) |
Send message Joined: 26 Jun 08 Posts: 644 Credit: 460,473,368 RAC: 60,339 |
Nope, not the intention. I need to stop doing things at midnight... On the next connection of your client to the server, these will be aborted by the server. You can do a project update manually if you like to speed up the process. Sorry for the inconvenience! Additional testing is underway now... |
Send message Joined: 9 Mar 10 Posts: 3 Credit: 2,529,966 RAC: 0 |
In addition to the somewhat increased compute-time estimates that others are seeing with lasievef v1.08, I'm also seeing very infrequent checkpointing on Linux-32, only every 7-8 minutes. lasievee v1.08 behaviour is the same as with previous versions, frequent checkpoints and more normal compute times. EDIT: never mind, Greg's post came in while I was typing this up. |
Send message Joined: 4 Nov 09 Posts: 3 Credit: 250,044 RAC: 0 |
WU S5m409D_201007_0 "Computation error" after 4H 26min !! (WinXP32) Longer tasks are welcome, 32 bit systems have limited memory size so one can manually "Resume" 1 or 2 WU and run these with other projects. Thanks |
Send message Joined: 26 Jun 08 Posts: 644 Credit: 460,473,368 RAC: 60,339 |
I just updated the application for 32-bit Windows. I changed the memory allocation routine to use the Windows function VirtualAlloc() rather than malloc(), which seems to help a little on 32-bit Windows XP. Also, if you can, booting Windows XP with the /3GB switch seems to help a little. I'm interested to know if this new version helps with Windows XP 64. This version also prints out the size of the large allocation. We do seem to be running into the limits of the Windows XP memory manager. Windows 7 doesn't seem to have these problems with 1GB allocations. |
Send message Joined: 17 Oct 09 Posts: 5 Credit: 3,495 RAC: 0 |
Hey, Greg. I haven't run this project in a while, but I'm running 1.08 on Win 7 right now with no problems, so your theory seems to hold true. Edit: Any chance of seeing a 64-bit app in the future? |
Send message Joined: 26 Jun 08 Posts: 644 Credit: 460,473,368 RAC: 60,339 |
Unfortunately not in the short term. In principle I could build a Win64 app from the C code, but it would be slower than the current 32-bit app with assembly optimizations. We do have 64-bit assembly used for the Linux version, but I'm not an assembly guru and I'm told by an assembly expert that the code would be a huge pain to convert thanks to differing Windows calling conventions. There are a few wrappers out to mitigate this issue, but I'm told that, thanks to the large number of parameters passed, these won't work in this case. |
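
A footnote to the thread: the recurring puzzle above is that a 1.1 GB request fails although several GB of RAM are free. One way to see how that can happen is that a 32-bit Windows process normally has only 2 GB of user-mode virtual address space (3 GB when the system is booted with /3GB and the executable is large-address-aware), and a single allocation needs one unbroken run of addresses inside it. The Python sketch below is a toy model of that idea, not the lasievef allocator. The function names and the reserved-region layout are made up for illustration, and the real layout on any given machine will differ.

```python
MB = 1024 * 1024
GB = 1024 * MB


def free_space_summary(space_size, reserved):
    """Return (total_free, largest_gap) for an address space of space_size bytes.

    reserved is a list of (start, length) regions that are already in use.
    """
    cursor = 0        # first address not yet accounted for
    total_free = 0
    largest_gap = 0
    for start, length in sorted(reserved):
        gap = start - cursor
        if gap > 0:
            total_free += gap
            largest_gap = max(largest_gap, gap)
        cursor = max(cursor, start + length)
    tail = space_size - cursor
    if tail > 0:
        total_free += tail
        largest_gap = max(largest_gap, tail)
    return total_free, largest_gap


def can_allocate(request, space_size, reserved):
    """A single allocation succeeds only if one gap is at least as large as the request."""
    _, largest_gap = free_space_summary(space_size, reserved)
    return largest_gap >= request


# Hypothetical layout (an assumption, not measured): a few modestly sized
# regions, such as program code, libraries and heaps, scattered through the space.
RESERVED = [(0, 64 * MB), (900 * MB, 16 * MB), (1800 * MB, 32 * MB)]

if __name__ == "__main__":
    for space in (2 * GB, 3 * GB):
        total, largest = free_space_summary(space, RESERVED)
        print(space // GB, "GB space:",
              total // MB, "MB free in total,",
              largest // MB, "MB largest gap,",
              "1100 MB request fits:", can_allocate(1100 * MB, space, RESERVED))
```

With this hypothetical layout, about 1.9 GB of address space is free in total, yet no single gap reaches 1100 MB, so the request fails in a 2 GB space and fits once the space is extended to 3 GB. The amount of physical RAM never enters the calculation, which is consistent with machines that have 7 GB free still failing.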