Errors while computing?
Author | Message |
---|---|
Joined: 9 Oct 09 Posts: 30 Credit: 26,809,482 RAC: 0 |
I hadn't noticed before today, but I'm amazed at the number of Error While Computing 16e WUs I have. I was checking someone else's boxes and saw he had quite a few, so I decided to check mine and saw I had a ton of them myself on every box. Even my stock-clocked boxes have them, so it's not due to OC'ing. It seems like an awful waste of resource time to get that many, and it's probably one reason why there's so little participation here other than the admins themselves. It could be just the Windows machines that are affected, as I don't notice it so much on the Linux boxes... I'm running at least 4GB of memory on every box I have and am only running 3 cores on the quads and 6 on the 6-core boxes, which have 12GB of memory, so lack of memory shouldn't be a problem. |
Joined: 26 Jun 08 Posts: 645 Credit: 472,378,768 RAC: 255,171 |
You're right. This number does have an unusually high failure rate, about 10%, on Windows but not on Linux, and it's not the usual memory error. These larger numbers are pushing the boundaries of the siever. The good news is that the failures mostly happen after a few minutes of runtime, so not much time is lost, and that I have an explicit list of test cases to use in debugging. The bad news is that my plate is full right now, so it will be a few weeks before I can get to it. Thanks for letting me know! |
Joined: 2 Oct 09 Posts: 50 Credit: 111,128,218 RAC: 0 |
Quote: "I hadn't noticed before today, but I'm amazed at the number of Error While Computing 16e WUs I have. ..." Thanks for the promotion, friend (and what have you done to PoorBoy?). For the record, the admin here at NFS@Home is a singleton: Greg, and Greg only. I can verify my divergence from the admin view by reminding everyone that the 15e WUs have at least as much math interest as the 16e's, with none of the error issues. I was, of course, as always, happy to see you back in the top 10. I find it hard to believe that the difference in the credit adjustment between 16e's and 15e's is enough to make a visible ripple in your daily credit totals. I can further support my non-admin status with a first report that the 16e number listed on the "Status" page as in "postprocessing" finished last night: a very remarkable three-prime factorization, with all three primes far beyond ECM range (that is, I and the other non-BOINC contributors didn't "miss" a small factor in that part of the computation). This was a Champion 290-digit number, with prime factors of 85, 96 and 110 digits; a real achievement for BOINC computation. -Bruce (... let's wait for the news for further info ...) |
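A quick consistency check on the digit counts above (an added worked step, not part of the original report): a d-digit integer N satisfies 10^(d-1) <= N < 10^d, so multiplying the bounds for primes of 85, 96 and 110 digits shows the product must have 289, 290 or 291 digits, and 290 lies in that range.

```latex
10^{84}\cdot 10^{95}\cdot 10^{109} \;\le\; p_{85}\,p_{96}\,p_{110} \;<\; 10^{85}\cdot 10^{96}\cdot 10^{110}
\quad\Longrightarrow\quad
10^{288} \;\le\; N \;<\; 10^{291}
```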
Joined: 9 Oct 09 Posts: 30 Credit: 26,809,482 RAC: 0 |
I was just pointing out the abnormal number of errors, Bruce, or is that prohibited at this project??? The credit doesn't mean squat to me, but if there are too many errors then it's just a waste of time to run the WUs. I switched over to the 15e WUs to see how they run on my boxes... For the record, the 16e WUs don't like a reboot or an exit from BOINC very much: on a 6-core box I may get 2 or 3 WUs with computation errors if I reboot or even exit BOINC. I have my preferences set to keep applications in memory... |
Joined: 2 Oct 09 Posts: 50 Credit: 111,128,218 RAC: 0 |
Quote: "I was just pointing out the abnormal number of errors, Bruce, or is that prohibited at this project??? The credit doesn't mean squat to me, but if there are too many errors then it's just a waste of time to run the WUs. I switched over to the 15e WUs to see how they run on my boxes ..." Thanks.
And thanks for the report; I'm replying as a friend of the admin. Our two NVIDIA Tesla C2050s (under Linux, on a box with six cores) finished the precomputation for a 16e project (non-BOINC) they were working on. Any suggestion of a BOINC project with an application they'd be able to run? The usual GPU suspects I tried either didn't have what they advertised, or had an app whose WUs just gave errors shortly after starting. -Bruce |
Joined: 9 Oct 09 Posts: 30 Credit: 26,809,482 RAC: 0 |
So far I've only had 1 WU error out on the 15e's, so that's a plus; I'll keep monitoring though... The Tesla's an oddball card. I don't really know what you could run with it; I haven't seen many, if any, around really... |
Joined: 5 Sep 09 Posts: 6 Credit: 79,480 RAC: 0 |
Quote: "...Our two NVIDIA Tesla C2050s (under Linux, on a box with six cores) finished the precomputation for a 16e project (non-BOINC) they were working on. Any suggestion of a BOINC project with an application they'd be able to run? The usual GPU suspects I tried either didn't have what they advertised, or had an app whose WUs just gave errors shortly after starting." The next unreserved gnfs (after the c187 2,956+, which looks good now, thanks to you and JRK) would be the staggering c194 12,283+ and c194 2,2186M. I am not sure there are even any reasonable internal parameters for that size, but I'll have a look in the source. These could be fun: independently with degree 5 and degree 6 (who knows). -Serge |
Joined: 9 Oct 09 Posts: 30 Credit: 26,809,482 RAC: 0 |
I just found a 15e that had been running for close to 7 hours and was only at 11% completion. I've suspended it for now in case you want any file info from it; after that I can delete/abort it, or restart it by exiting BOINC when I'm going to be around for an hour to monitor it. If after another hour or so it shows the same tendencies, I will just abort it??? |
Joined: 16 Oct 09 Posts: 46 Credit: 833,166 RAC: 1 |
Quote: "I just found a 15e that had been running for close to 7 hours and was only at 11% completion. I've suspended it for now in case you want any file info from it; after that I can delete/abort it, or restart it by exiting BOINC when I'm going to be around for an hour to monitor it. If after another hour or so it shows the same tendencies, I will just abort it???" If it hasn't progressed past 11% in an hour, try a system restart. If it's still the same after a restart, I would abort it. |
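For a rough sense of the numbers in this exchange: if progress were linear, 11% after about 7 hours projects to roughly 64 hours in total. The minimal sketch below does that extrapolation and encodes the "no progress in an hour" rule of thumb. It assumes linear progress (BOINC's fraction-done for a sieving task is only an estimate), and the function names and the `min_gain` threshold are hypothetical.

```python
def projected_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation of total runtime from elapsed time and fraction done."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


def looks_stalled(samples, window_hours=1.0, min_gain=0.001):
    """samples: list of (elapsed_hours, fraction_done), oldest first.

    Returns True if progress over the last window_hours is below min_gain.
    Returns False when there is not yet a window's worth of history.
    """
    if not samples:
        return False
    t_last, f_last = samples[-1]
    # Latest sample that is at least window_hours older than the newest one
    earlier = [(t, f) for t, f in samples if t_last - t >= window_hours]
    if not earlier:
        return False
    _, f_start = earlier[-1]
    return f_last - f_start < min_gain


print(round(projected_total_hours(7.0, 0.11), 1))  # 63.6
print(looks_stalled([(7.0, 0.11), (8.0, 0.11)]))   # True
```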
Joined: 9 Oct 09 Posts: 30 Credit: 26,809,482 RAC: 0 |
Quote: "Our two NVIDIA Tesla C2050s (under Linux, on a box with six cores) finished the computation for a 16e project (non-BOINC) they were working on. Any..." I see some "[2] NVIDIA Tesla C1060 (4095MB) driver: 19562" running the PrimeGrid "Proth Prime Search (Sieve) v1.30 (cuda23)" WUs, if you're interested or think you can get your Tesla C2050s to run them...??? You need the following app_info.xml, plus the executable and the cudart.dll from HERE...

```xml
<app_info>
  <app>
    <name>pps_sr2sieve</name>
    <user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
  </app>
  <file_info>
    <name>primegrid_ppsieve_1.30_windows_intelx86__cuda23.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>pps_sr2sieve</app_name>
    <version_num>130</version_num>
    <plan_class>cuda23</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <flops>1.0e11</flops>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <cmdline></cmdline>
    <file_ref>
      <file_name>primegrid_ppsieve_1.30_windows_intelx86__cuda23.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```

You need to put all three files into the PrimeGrid project directory. In my case it is C:\Program Files\BOINC\projects\www.primegrid.com. You need to completely shut down BOINC (Manager, client and all the currently running science apps) before you copy the files into the directory. After that you can restart BOINC, and you should see something like this in the messages:

```
13                      28.10.2010 21:47:02  NVIDIA GPU GeForce GTX 460 (driver version 26063, CUDA version 3020, compute capability 2.1, 993MB, 363 GFLOPS peak)
14  Collatz Conjecture  28.10.2010 21:47:02  Found app_info.xml; using anonymous platform
15  PrimeGrid           28.10.2010 21:47:02  Found app_info.xml; using anonymous platform
```
|
Joined: 2 Oct 09 Posts: 50 Credit: 111,128,218 RAC: 0 |
OK, I have

```
285696 Nov 8 15:38 cudart.dll
400930 Nov 8 15:48 primegrid_ppsieve_1.30_i686-pc-linux-gnu__cuda23
   624 Nov 8 15:52 app_info.xml
```

and get messages like

```
Tesla C2050 (driver version unknown, CUDA version 3010)
Found app_info.xml; using anon platform
File projects... ppsieve_1.30... ....
[error] no url for file transfer primegrid_ppsieve_1.30_windows_intelx86
```

It's happily downloading tasks, but they just say "in progress"... ah, the task reports its "Status" as downloading... no compute time. I did wonder about the cuda23, as these Fermis are CUDA 3.x cards, and about why a Linux app would come with a .dll (a Windows dynamic-link library) file. At least I managed to get connected, at last... -Bruce
|
Joined: 9 Oct 09 Posts: 30 Credit: 26,809,482 RAC: 0 |
Are you running Linux? If not, you need the Windows .exe. Best you bring it up in the PG forum, in the Sieving forum. There are guys there who know a whole lot more than me when it comes to setting up GPUs for running the sieve WUs. I had to be walked through it myself to get a few of my GTX 4xx cards running... PS: I just started today trying to get my ATIs to run the WUs. Supposedly it can be done, as a few have already done it, but I'm just getting started on it now, so we'll see how it goes... |
Joined: 2 Oct 09 Posts: 50 Credit: 111,128,218 RAC: 0 |
Quote: "Are you running Linux?" Yes. I did notice, though, that the Berkeley site lists instructions ("yum ...") for Fedora 7 that I don't understand, while we use a ("close") variant, CentOS (as do many university sites with heavy computational interests). The Linux versions 56 and 58 don't seem to run nearly as well as 6.10 used to, and I was hoping to get a recent version more likely to be GPU-aware.
Now that would catch some eyeballs. The people looking at this are computing types, while this BOINC NFS project adapted an initial version set up by hackers for breaking RSA keys on calculators (TI...). I'd probably do best to get the Fermi cards back to gnfs polynomials. The current 15e project, G2p1195, uses a polynomial Greg found using their Tesla cards. We've been working with the people who wrote the GPU code, and we appear to have gotten a marginal improvement on the polynomial for the next project, 5M895. -Bruce |
Joined: 9 Oct 09 Posts: 30 Credit: 26,809,482 RAC: 0 |
If you're running Linux, the 2 lines in the app_info file that read >>> primegrid_ppsieve_1.30_windows_intelx86__cuda23.exe <<< must instead name the Linux executable that you're using... |
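Since the only change needed is the executable name in those two places, a small generator can avoid hand-editing typos. This is a minimal sketch that reproduces the app_info.xml posted above with a chosen binary name. It assumes the rest of that file (one executable, plan class cuda23) is otherwise right for your setup. The helper name `build_app_info` is hypothetical. On Linux the Windows cudart.dll will not be used; a dynamically linked Linux CUDA build would normally look for the CUDA runtime shared library instead.

```python
from string import Template

# Mirrors the app_info.xml posted above; only the executable name varies.
APP_INFO = Template("""<app_info>
  <app>
    <name>pps_sr2sieve</name>
    <user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
  </app>
  <file_info>
    <name>$exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>pps_sr2sieve</app_name>
    <version_num>130</version_num>
    <plan_class>cuda23</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <flops>1.0e11</flops>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <cmdline></cmdline>
    <file_ref>
      <file_name>$exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
""")

# Linux binary name as listed in the thread.
LINUX_EXE = "primegrid_ppsieve_1.30_i686-pc-linux-gnu__cuda23"


def build_app_info(exe_name: str) -> str:
    """Return app_info.xml text with exe_name in both <name> and <file_name>."""
    return APP_INFO.substitute(exe=exe_name)


if __name__ == "__main__":
    # Hypothetical usage, with BOINC fully shut down:
    #   python make_app_info.py > app_info.xml   (inside the PrimeGrid project dir)
    print(build_app_info(LINUX_EXE), end="")
```

The output keeps the `<executable/>` and `<main_program/>` spellings exactly as in the posted file, rather than relying on an XML serializer's formatting.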
Joined: 2 Oct 09 Posts: 50 Credit: 111,128,218 RAC: 0 |
Quote: "If you're running Linux, the 2 lines in the app_info file that read >>> primegrid_ppsieve_1.30_windows_intelx86__cuda23.exe <<< must instead name the Linux executable that you're using..." That was exciting. I switched these two lines to match the Linux binary I downloaded, primegrid_ppsieve_1.30_i686-pc-linux-gnu__cuda23, and that ran through several tens of "computation errors". My PrimeGrid account gives the error message

```
<message>
process exited with code 22 (0x16, -234)
</message>
<stderr_txt>
execv: No such file or directory
</stderr_txt>
```

Likewise, the BOINC Manager message says (repeatedly)

```
Starting pps_sr2sieve_...
Starting task pps_... using pps_sr2sieve version 130
Computation for task ... finished
Output file pps...0_0 for task pps...0 absent
```

I suppose, sooner or later, someone should object to this off-topic exchange, even between 2 of the present top 3 on NFS@Home, but I wonder whether we're still making progress? It looks like the error message just reports that the expected output file is missing? -Bruce |
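One common cause of "execv: No such file or directory" for a file that plainly exists is that the kernel cannot find the binary's ELF program interpreter (its dynamic loader). A typical case is a 32-bit i686 build on a 64-bit install that lacks the 32-bit compatibility libraries. Whether that is what happened here is only a guess. A missing execute bit normally gives "Permission denied" instead, but it is cheap to check as well.

The minimal sketch below uses only the Python standard library. It reads the ELF header and reports the binary's word size and the loader it requests. The function names are hypothetical.

```python
import os
import struct

PT_INTERP = 3  # program-header type naming the requested dynamic loader


def elf_interpreter(data: bytes):
    """Return (word_size, loader_path_or_None) for an ELF image, or None if not ELF."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        return None
    bits = 32 if data[4] == 1 else 64           # EI_CLASS
    end = "<" if data[5] == 1 else ">"           # EI_DATA (byte order)
    if len(data) < (52 if bits == 32 else 64):  # truncated ELF header
        return None
    if bits == 32:
        (phoff,) = struct.unpack_from(end + "I", data, 28)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 42)
    else:
        (phoff,) = struct.unpack_from(end + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 54)
    for i in range(phnum):
        base = phoff + i * phentsize
        (p_type,) = struct.unpack_from(end + "I", data, base)
        if p_type != PT_INTERP:
            continue
        if bits == 32:
            (off,) = struct.unpack_from(end + "I", data, base + 4)    # p_offset
            (size,) = struct.unpack_from(end + "I", data, base + 16)  # p_filesz
        else:
            (off,) = struct.unpack_from(end + "Q", data, base + 8)
            (size,) = struct.unpack_from(end + "Q", data, base + 32)
        return bits, data[off:off + size].rstrip(b"\x00").decode()
    return bits, None  # no PT_INTERP: statically linked


def diagnose(path: str) -> list:
    """List likely reasons execv() might fail for path (empty list = none found)."""
    if not os.path.exists(path):
        return [f"{path}: file not found"]
    problems = []
    if not os.access(path, os.X_OK):
        problems.append(f"{path}: execute bit not set (chmod +x)")
    with open(path, "rb") as fh:
        info = elf_interpreter(fh.read())
    if info is None:
        problems.append(f"{path}: not an ELF binary")
    else:
        bits, loader = info
        if loader and not os.path.exists(loader):
            problems.append(f"{path}: {bits}-bit ELF; loader {loader} is not installed")
    return problems


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(p, diagnose(p) or "no obvious problem")
```

Run it against the downloaded binary in the project directory. If it reports a missing 32-bit loader, installing the distribution's 32-bit glibc package is the usual remedy. Treat that as a suggestion to verify, not a confirmed fix for this app.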
Joined: 9 Oct 09 Posts: 30 Credit: 26,809,482 RAC: 0 |
Quote: "and that ran through several tens of 'computation errors'." lol... I get that a lot too, trying to get them to run on some boxes. It's hard to say what it is, for me anyway, since I don't know much about Linux beyond the basics. Something is probably missing, for sure, but I can't say what. Posting at the project is your best bet... |
Joined: 5 Sep 09 Posts: 6 Credit: 1,889,316 RAC: 0 |
Quote: "You're right. This number does have an unusually high failure rate, about 10%, on Windows but not on Linux, and it's not the usual memory error. These larger numbers are pushing the boundaries of the siever. The good news is that the failures mostly happen after a few minutes of runtime, so not much time is lost, and that I have an explicit list of test cases to use in debugging. The bad news is that my plate is full right now, so it will be a few weeks before I can get to it. Thanks for letting me know!" Hi Greg, and hello everyone. I thought I would also report in, in case my computer (a Dell Precision Workstation) has information you need in your investigation of these computation errors. I'm also getting a lot of computation errors. I'm not an educated person, but I love mathematics and numbers, and so I want to contribute some spare CPU cycles from my computer to your project, NFS@Home. I wish you much success with NFS@Home. Best wishes, Byron

```
Operating System      Microsoft Windows XP Professional x86 Edition, Service Pack 3, (05.01.2600.00)
BOINC client version  6.10.58

Name                  S2p997_423152_0
Workunit              8929616
Created               12 Dec 2010 1:50:53 UTC
Sent                  13 Dec 2010 19:01:43 UTC
Received              14 Dec 2010 16:19:44 UTC
Server state          Over
Outcome               Client error
Client state          Compute error
Exit status           1 (0x1)
Computer ID           13771
Report deadline       17 Dec 2010 7:01:43 UTC
Run time (s)          361.046875
CPU time (s)          345.4531
stderr out
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
boinc initialized
work files resolved, now working
 -> projects/escatter11.fullerton.edu_nfs/lasievef_1.09_windows_intelx86.exe
 -> -r
 -> -f
 -> 423152000
 -> -c
 -> 1000
 -> -R
 -> ../../projects/escatter11.fullerton.edu_nfs/S2p997.poly
 -> -o
 -> ../../projects/escatter11.fullerton.edu_nfs/S2p997_423152_0_0
No heartbeat from core client for 30 sec - exiting
boinc initialized
work files resolved, now working
 -> projects/escatter11.fullerton.edu_nfs/lasievef_1.09_windows_intelx86.exe
 -> -r
 -> -f
 -> 423152000
 -> -c
 -> 1000
 -> -R
 -> ../../projects/escatter11.fullerton.edu_nfs/S2p997.poly
 -> -o
 -> ../../projects/escatter11.fullerton.edu_nfs/S2p997_423152_0_0
Allocating 1197 MB.
Can't allocate 1197 MB.
xmalloc: m
called boinc_finish
</stderr_txt>
]]>
Validate state        Invalid
Claimed credit        0.73592563277863
Granted credit        0
application version   16e Lattice Sieve v1.09
```
|
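The stderr above shows two separate events. The first run exited after "No heartbeat from core client for 30 sec". The restarted run then failed with "Can't allocate 1197 MB." On 32-bit Windows XP a process normally gets only 2 GB of user address space, so a single contiguous 1197 MB block can fail even on a machine with plenty of physical RAM. That may be what happened here, but it is an assumption, not a diagnosis.

The minimal sketch below scans a task's stderr text for those two patterns, for example to tally failure types across many tasks. The patterns cover only the messages quoted in this post, and the function name is hypothetical.

```python
import re

# Patterns taken from the lasievef stderr quoted above; extend as needed.
ALLOC_RE = re.compile(r"Can't allocate (\d+) MB")
HEARTBEAT = "No heartbeat from core client"


def classify_stderr(stderr: str) -> list:
    """Return human-readable findings for one task's stderr text, in log order."""
    findings = []
    for line in stderr.splitlines():
        m = ALLOC_RE.search(line)
        if m:
            findings.append(f"allocation failure: {m.group(1)} MB")
        elif HEARTBEAT in line:
            findings.append("lost heartbeat from BOINC client; app exited")
    return findings


sample = """boinc initialized
No heartbeat from core client for 30 sec - exiting
boinc initialized
Allocating 1197 MB.
Can't allocate 1197 MB.
called boinc_finish"""
print(classify_stderr(sample))
# ['lost heartbeat from BOINC client; app exited', 'allocation failure: 1197 MB']
```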
Joined: 2 Oct 09 Posts: 50 Credit: 111,128,218 RAC: 0 |
Quote: "You're right. This number does have an unusually high failure rate, about 10%, on Windows but not on Linux, and it's not the usual memory error. ...it will be a few weeks before I can get to it. Thanks for letting me know!" Thanks for the report (if Greg will pardon my presumption) and for the wishes. I wasn't able to tell whether the computation errors were bothering you (unlike Paladin/.Steve, with his #1 _world_ BOINC ranking, who likely doesn't need any extra distractions). I do run lasievef/16e tasks on our large Linux workstations, but I have the five with lower memory (1 GB/core) working on the smaller lasievee/15e tasks (a total of 20 Xeon cores). As I sometimes teach an intro course on (mathematical) cryptography, and have been contributing to applied cryptographic computing since 1995 (starting with the first RSA120), I have more than enough education, much of it since my pure math PhD (1976). I'm not entirely sure that I can make a case for snfs projects under 15e, as compared to snfs projects under 16e. But we have two serious gnfs projects next in the 15e queue (listed under the "Status of Numbers" link on the main NFS@Home page). First, the math part of a gnfs project is substantially more elaborate than what is required for snfs ("512 bits in 1993" for snfs vs. "512 bits in 1999" for gnfs; and then "768 bits in 2000" for snfs vs. "768 bits in 2010" for gnfs!). Next, the current computing issues are very interesting. In particular, the project definitions (polynomials) for these two 180-digit gnfs numbers were found (by Greg) using GPU computing on NVIDIA Tesla graphics cards. You won't see a difference in the WUs (unless you locate the "gnfs polynomial" used in the project definition), but these results will be at least as interesting as the current 16e snfs. Regards, bdodson (No promises about the _next_ 16e snfs, or the one after that, which are the reason for Greg's push to larger/harder projects; those may be even worse for low-memory computing.) |
Joined: 9 Oct 09 Posts: 30 Credit: 26,809,482 RAC: 0 |
Quote: "unlike Paladin/.Steve with his #1 _world_" What is it, Bruce, the #1 ranking or just me that you don't like? I've never done anything to this project but run some WUs, but I get the feeling you think I've harmed the project in some way. Just because I'm not attached to the project 100% like you doesn't mean I don't like the project; it's just that there are 100 other projects out there to run besides this one. I have to buy all my own equipment and pay for all my own electricity on a retiree's pension, unlike you, who can use "our large Linux workstations" from the university free of charge, I take it. So if I choose to run other projects, that should be my choice, without fear of animosity from a project that feels I should devote all my time and resources to it instead of spreading the Pharm out some to benefit more than one project... Still your friend, .Steve/Paladin/PoorBoy |
Joined: 2 Oct 09 Posts: 50 Credit: 111,128,218 RAC: 0 |
Quote: "unlike Paladin/.Steve with his #1 _world_" Whoa!! The number 1 ranking is a fantastic accomplishment; I'm a huge fan. Of the BOINC contributors with what appear to be nearly unlimited resources (the top 100, say), you're one of our most interesting. You were one of the first BOINC contributors to contact me, and I spend quite a bit of time trying to learn from what you do. You can be somewhat abrasive, but that's not necessarily a bad thing among friends. I'm 110% certain that I don't dislike you. Puzzled sometimes, perhaps, but in a good way.
One of the things I've learned is the extent to which projects with apps that make good use of GPUs have an advantage in attracting contributors. I'm guessing, but I wonder whether 95% (or more) of your credit is from GPU contributions? I was just reading that you favor 580s. I'm stunned to hear that you purchase all of the Pharm with your own funds. That's some retiree pension; we all should have been so lucky. I'm just about to turn 61, but it looks like I'll need an extra 5-7 years past age 66 to make up for our 401(k)s having become 201(k)s. And besides, I'm fairly sure that you've been very well informed in deciding how to spend the funds you have available. Uhm, you couldn't choose to direct your GPU resources here, since we don't know how to do number field sieving (nfs as in snfs/gnfs, used in the lasieves) with GPUs. I've been trying to convince anyone I can that finding polynomials is a really good thing, but I'm not sure that I've succeeded in convincing myself: it doesn't appear to be a couple of orders of magnitude of a bump, the way a really killer GPU app should be. So we're discussing the remaining 5% of your resources, and they're for sure yours to allocate as you choose. And, uhmmm, "fear of animosity"? Hmm; I hope not. So, anyway, after a year or so of watching your GPU accomplishments, I did in fact convince our HPC committee to purchase two NVIDIA cards (actually, that's not my fault; the engineering faculty wanted double-precision floating point, so what we got was the Tesla 20s). They seemed fairly happy about the $9K purchase, with a large blurb on the main web page. I'm happy to be able to report that I've just today managed to get PrimeGrid's Proth Prime sieve to run, and it's already running credit circles around my NFS@Home credit: it looks like 100K credit from a rather large pool of x86_64s by CPU, vs. 500K credit from just the two cards. They look to be just a tad slower than the 580s.
So even I won't be 100% NFS, with just a week or two of Proth sieving. You carry a lot of weight, by right of having earned it. I'll only be one of hundreds if I manage to break into the top 1000. So what can I do to make amends for having left the wrong impression in my posts? Would switching teams help? I'm not getting much, if anything, from BOINCstats. With warmest regards, your friend, Bruce/bdodson. |