Very long post processing

[AF>Dell>LesDelliens]La ... (Joined: 7 Sep 09, Posts: 5, Credit: 37,812, RAC: 0)
Message 74 - Posted: 17 Sep 2009, 11:09:20 UTC
Hi all! I noticed on the front page that post-processing might take nearly 20 days for each number! That is more than the time needed to process it (the sieving we do with BOINC). If we continue like this, won't we end up with too many numbers that we have processed but that still need to be post-processed by your team? Or will processing time rise to "balance" this? Thanks, and sorry if the question is not quite clear (I'm French :D)
La frite

Greg, Project administrator (Joined: 26 Jun 08, Posts: 645, Credit: 473,545,378, RAC: 249,192)
Message 75 - Posted: 17 Sep 2009, 17:14:04 UTC - in response to Message 74.
The post-processing time will always be much greater than the time required for the community to sieve the number. This will get worse, not better, as we move to larger numbers. However, I have access to, or can readily recruit, the resources to post-process 20-25 numbers at once. So as long as I can keep the ratio (post-processing time)/(sieving time) below 20, there will be no problem. With the current targets and parameters, we are at about 19/3.5 = 5.4. As the project grows and that ratio approaches 20, I can tweak the parameters to make sieving a bit harder but post-processing easier.
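
Greg's capacity argument can be checked with a little arithmetic. The sketch below is a hypothetical helper (the names are ours, not the project's) that assumes numbers finish sieving one after another, one every `sieving_days` days, and that each then occupies a post-processing slot for `postprocessing_days` days. By Little's law, the average number of jobs in post-processing at once is then the ratio of the two times, so the backlog stays bounded as long as that ratio is below the number of available slots (about 20 here).

```python
"""Back-of-the-envelope check of the post-processing backlog (hypothetical helper)."""

import math


def concurrent_postprocessing(postprocessing_days: float, sieving_days: float) -> float:
    """Average number of numbers in post-processing at the same time.

    Assumes one number finishes sieving every `sieving_days` days and each
    spends `postprocessing_days` in post-processing
    (Little's law: average in system = arrival rate * time in system).
    """
    return postprocessing_days / sieving_days


def keeps_up(postprocessing_days: float, sieving_days: float, slots: int = 20) -> bool:
    """True if the available post-processing slots can absorb the sieving rate."""
    return concurrent_postprocessing(postprocessing_days, sieving_days) < slots


# The figures quoted in the post: about 19 days of post-processing vs 3.5 days of sieving.
ratio = concurrent_postprocessing(19, 3.5)
print(f"ratio = {ratio:.1f}, slots needed = {math.ceil(ratio)}")
print("keeps up:", keeps_up(19, 3.5))
```

Tweaking the parameters, as Greg describes, amounts to raising `sieving_days` and lowering `postprocessing_days` so the ratio stays well under the slot count.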

verstapp (Joined: 23 Sep 09, Posts: 3, Credit: 1,734,906, RAC: 0)
Message 84 - Posted: 24 Sep 2009, 8:42:03 UTC - in response to Message 75.
Or you could try subcontracting the post-processing to BOINC too.

Greg, Project administrator (Joined: 26 Jun 08, Posts: 645, Credit: 473,545,378, RAC: 249,192)
Message 87 - Posted: 24 Sep 2009, 18:19:02 UTC - in response to Message 84.
I wish I could. Unfortunately, post-processing involves solving a large sparse matrix, which requires very high-bandwidth, low-latency communication between nodes. In fact, on a single computer this part of the computation scales more with memory speed than with CPU speed. (Core i7s with DDR3 are great at it!) This is exactly the type of problem that BOINC can't handle.
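
To make the memory-bandwidth point concrete: in the number field sieve, the matrix step looks for linear dependencies among the relations modulo 2, and iterative solvers such as block Lanczos (the method msieve uses) spend most of their time multiplying the sparse matrix by a block of vectors. The sketch below is a toy Python illustration, not msieve's code. Each row stores only the column indices of its nonzero entries, and the vector block packs 64 GF(2) vectors into one 64-bit integer per column, so a product is a chain of XORs at scattered addresses. On a real matrix with millions of columns those scattered reads miss the CPU cache, which is why memory speed matters more than arithmetic speed. The matrix size and row weight below are arbitrary toy values.

```python
"""Toy sparse matrix times vector-block product over GF(2) (illustration only)."""

import random

WORD_MASK = (1 << 64) - 1  # 64 independent GF(2) vectors packed per word


def sparse_mul_gf2(rows: list[list[int]], block: list[int]) -> list[int]:
    """Compute A * V over GF(2).

    rows[i] lists the column indices j where A[i][j] == 1.
    block[j] holds, in bit k, entry j of the k-th vector (a 64-wide block).
    Addition over GF(2) is XOR, so each output word is the XOR of the
    block words selected by that row's column indices.
    """
    out = []
    for cols in rows:
        acc = 0
        for j in cols:  # scattered reads: block[j] for arbitrary j
            acc ^= block[j]
        out.append(acc)
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    ncols = 10_000
    # Toy stand-in for an NFS matrix: 60 nonzeros per row at random columns.
    rows = [rng.sample(range(ncols), 60) for _ in range(ncols)]
    block = [rng.getrandbits(64) for _ in range(ncols)]
    result = sparse_mul_gf2(rows, block)
    print(len(result), all(0 <= w <= WORD_MASK for w in result))
```

Packing 64 vectors per word is what lets one pass over the matrix do 64 matrix-vector products at once, which is the idea behind the "block" in block Lanczos.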

Bigred (Joined: 14 Sep 09, Posts: 1, Credit: 1,001,793, RAC: 0)
Message 88 - Posted: 25 Sep 2009, 10:55:17 UTC
What about doing the post-processing with a GPU application? It could use either CUDA (Nvidia) or CAL (ATI). Even better would be an application for both.

Greg, Project administrator (Joined: 26 Jun 08, Posts: 645, Credit: 473,545,378, RAC: 249,192)
Message 90 - Posted: 25 Sep 2009, 19:53:56 UTC - in response to Message 88.
Memory requirements are too high. It requires at least 5 gigabytes of memory. No GPUs have that much memory at the moment, and transfers between the host and the GPU kill the speed of the application. Nor is post-processing feasible for BOINC, because the calculation requires the complete matrix, which is a few hundred megabytes in size.
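
A rough way to see why the matrix step needs gigabytes: store the matrix in a compressed sparse row layout, with one 32-bit column index per nonzero and one offset per row, plus a few dense blocks of 64-bit words for the iterative solver's work vectors. The helper below is a hypothetical estimator; the layout and the number of vector blocks are our assumptions, not msieve's actual data structures, and the dimensions in the usage line are illustrative rather than taken from any NFS@Home job.

```python
"""Rough memory estimate for a sparse GF(2) matrix job (hypothetical layout)."""


def matrix_bytes(ncols: int, avg_weight: float, vector_blocks: int = 6) -> int:
    """Estimate bytes for a square sparse matrix plus solver work vectors.

    Assumed layout (illustrative only):
      - 4 bytes (uint32) per nonzero column index,
      - 8 bytes per row-start offset (ncols + 1 offsets for a square matrix),
      - `vector_blocks` dense blocks of 64-bit words, one word per column.
    `avg_weight` is the average number of nonzeros per row.
    """
    nonzeros = int(ncols * avg_weight)
    index_bytes = 4 * nonzeros
    offset_bytes = 8 * (ncols + 1)
    vector_bytes = 8 * ncols * vector_blocks
    return index_bytes + offset_bytes + vector_bytes


if __name__ == "__main__":
    # Illustrative dimensions only: 10 million columns, about 65 nonzeros per row.
    est = matrix_bytes(10_000_000, 65)
    print(f"{est / 2**30:.1f} GiB")
```

Even under these generous assumptions the column indices dominate, so the estimate grows roughly linearly with the number of nonzeros.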

Incognito II (Joined: 19 Nov 09, Posts: 4, Credit: 4,687,684, RAC: 0)
Message 248 - Posted: 26 Nov 2009, 16:55:19 UTC
Greg, who is doing most of the post-processing? What types of machines or clusters are being used? Just curious.

bdodson* (Joined: 2 Oct 09, Posts: 50, Credit: 111,128,218, RAC: 0)
Message 249 - Posted: 26 Nov 2009, 17:21:32 UTC - in response to Message 248.
> Greg, who is doing most of the post processing? What type of machines/cluster are being used? Just curious.
Greg has always done most of the post-processing. Only the intensive sparse matrix calculation has been farmed out on recent numbers. Another friend of mine reports having done one of the recent November matrices, and is waiting for R269 (a number of larger "difficulty", whose matrix will take longer). I'd be interested to hear more as well, but I'm not sure how soon Greg will be back online with the local holiday(s).
-Bruce

Incognito II (Joined: 19 Nov 09, Posts: 4, Credit: 4,687,684, RAC: 0)
Message 250 - Posted: 26 Nov 2009, 17:50:23 UTC
Thanks for the reply, Bruce. I was just curious; I ran the old NFSNet project on a number of machines years ago. IIRC, the project leaders at the time were a bit more informative about what was happening in the background/post-processing. Actually, given how long you have been working on the Cunningham tables and the hardware you have available, I'm surprised you are not helping with the post-processing; then again, maybe you are doing some of your own?

bdodson* (Joined: 2 Oct 09, Posts: 50, Credit: 111,128,218, RAC: 0)
Message 251 - Posted: 26 Nov 2009, 18:32:29 UTC - in response to Message 250.
> ..., I ran the old NFSNet project on a number of machines way back years ago. IIRC, the project leaders at the time were a bit more informative with the details of what was happening in the background/post processing.
Good to hear; was that back when they had stats and automated task distribution? Once the stats went down, most of the sieving was done either by me here or by Greg. In either case, it was a much smaller group, with a different interest in (and tolerance for) hearing the details. Also, despite the huge progress on Wanted numbers, NFS@Home is still quite new. I'm not sure that Greg has set a firm protocol for who's available for that one intensive step, the matrix computation. It's still a work in progress.
> Actually, as long as you have been working on the Cunningham Tables and with the hardware you have available, I'm surprised you are not helping in the post processing, then again maybe you are doing some of your own?
I'm usually only able to run matrices on our newest clusters, often with the best results before they're fully open to our users. I ran a bunch on our old compute server with Greg (the one still listed with 32 cores). I'm not sure how long the new Xeons will stay useful for matrix work; I've been running smaller projects with Batalov. Almost all of our hardware runs exclusively under a UWisc scheduler called Condor, with no user logins or job submission. That's something in the range of 200+ Linux x86-64 machines, which I use for NFS sieving projects (most recently M941, about half of that computation). Then there is a PC grid of mostly Windows machines, plus some 32-bit Linux, on which I run ECM. The volume and quality of the NFS@Home factorizations seem to me to represent a new era for Cunningham numbers, for all but the most exclusive projects using .com or .gov (or both) resources. Those would include the two record computations, M1039 for SNFS and RSA200 for GNFS, which are still somewhat past our present range.
-Bruce

Incognito II (Joined: 19 Nov 09, Posts: 4, Credit: 4,687,684, RAC: 0)
Message 252 - Posted: 26 Nov 2009, 19:34:07 UTC
Yes, it was back when NFSNet had the automated task system set up, back when Intel P3s and Athlon T-Birds were top of the line, with maybe a few P4s in the mix. Honestly, I'm surprised at the number of people running NFS now, but I guess that is BOINC's appeal: you set it up and they will come! :) Anyway, I'm glad to see it doing so well; I just had to give it a shot, at least for a while.

Greg, Project administrator (Joined: 26 Jun 08, Posts: 645, Credit: 473,545,378, RAC: 249,192)
Message 253 - Posted: 26 Nov 2009, 20:47:42 UTC
Sure, it's not secret. :-)
Once most of the workunit results have come in (I typically don't wait for the last 0.2% of relations in the "long tail"), I transfer them from the BOINC server to our large-memory (32-core, 64 GB) computer for filtering. I then use msieve to do an in-memory filtering run. This usually produces a somewhat better matrix than using disk-based passes on a computer with less memory. The matrix is then ready for linear algebra (LA).
LA requires at least 8 GB of memory. The speed of msieve's linear algebra is bound by main memory bandwidth, so Intel Core 2 and especially Core i7 processors are perfect for it. I currently have access to five Core 2 Quads with sufficient memory (this should be 11, but I'm still waiting on a memory upgrade). I run as many locally as I can, just to avoid the off-campus transfers, but I also keep a list of kind people who have volunteered to run a 3-5 week LA calculation for no BOINC credit. If I don't have room for a number locally, I contact someone from the list a few days in advance to see if their computer is free. Depending on their wishes and transfer bandwidth, I then transfer either the entire data set (typically 10-25 GB) or just the matrix (typically 3-4 GB) to them. If they have the entire data set, they can perform both the LA and the square roots, and report the factors back to me. If they have only the matrix, they send the solutions (typically 100-200 MB) back to me, and I run the square roots locally.
Finally, I report the factorization both here and at MersenneForum (I'm frmky there), and send an email to Sam Wagstaff. Not too complicated, really; it just involves moving a lot of data around. I'm planning to get a student involved this spring, but for now I'm doing it all myself, and I've found that it doesn't take much time. And I'm enjoying it, which helps! :-)
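
The two hand-off options trade transfer volume against who runs the square roots. The sketch below plugs in the upper ends of the typical sizes Greg quotes (25 GB for the full data set, 4 GB for the matrix alone, 200 MB of solutions coming back); the `HandOff` class, its field names, and the 10 Mbit/s link speed in the usage lines are hypothetical, chosen only for illustration.

```python
"""Compare the two LA hand-off options by data moved (hypothetical helper)."""

from dataclasses import dataclass

GB = 1000**3
MB = 1000**2


@dataclass
class HandOff:
    name: str
    to_volunteer: int    # bytes sent to the volunteer
    from_volunteer: int  # bytes sent back
    volunteer_runs_sqrt: bool

    def hours(self, mbit_per_s: float) -> float:
        """Total transfer time in hours at a given link speed (megabits per second)."""
        total_bits = 8 * (self.to_volunteer + self.from_volunteer)
        return total_bits / (mbit_per_s * 1e6) / 3600


# Upper ends of the typical ranges quoted in the post.
full = HandOff("entire data set", 25 * GB, 0, True)   # only the factors come back: negligible
matrix = HandOff("matrix only", 4 * GB, 200 * MB, False)

if __name__ == "__main__":
    for option in (full, matrix):
        print(f"{option.name}: {option.hours(10):.1f} h at 10 Mbit/s, "
              f"volunteer runs square roots: {option.volunteer_runs_sqrt}")
```

Under these assumptions, sending only the matrix moves roughly a sixth of the data, at the cost of a return transfer and a local square-root step.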

Gigacruncher [TSBTs Pirate], Volunteer moderator (Joined: 26 Sep 09, Posts: 218, Credit: 22,841,893, RAC: 2)
Message 254 - Posted: 26 Nov 2009, 23:53:37 UTC (last modified: 26 Nov 2009, 23:55:50 UTC)
Incognito, I ran the post-processing of 12,233- for Greg on a quad-core with 6 GB. It took many iterations before we managed to get a decent matrix to run. I downloaded something like 40 GB of data before getting to the final files! As Greg said, you must have at least 8 GB of free memory, or even more if your machine isn't dedicated to crunching. I had that problem: you can feel the machine slow down when a client is using 5 GB on a 6 GB machine, even with 20 GB of virtual memory. Until I get a new machine or add more memory, I can't help Greg at the current size of the numbers, so if you have a fast machine with lots of free memory, please consider helping Greg; he deserves it.
Carlos

Incognito II (Joined: 19 Nov 09, Posts: 4, Credit: 4,687,684, RAC: 0)
Message 255 - Posted: 27 Nov 2009, 1:13:56 UTC
Greg, thanks for the reply; very informative and interesting! Carlos, I do have a couple of i7s with 6 GB RAM and a couple of Phenom IIs with 8 GB RAM. The problem for me would be the huge file transfers: 40 GB at one time and my ISP would shut me off. :(

bdodson* (Joined: 2 Oct 09, Posts: 50, Credit: 111,128,218, RAC: 0)
Message 257 - Posted: 3 Dec 2009, 16:48:21 UTC - in response to Message 251.
>> Actually, as long as you have been working on the Cunningham Tables and with the hardware you have available, I'm surprised you are not helping in the post processing, then again maybe you are doing some of your own?
> ...Almost all of our hardware is exclusively run under a UWisc scheduler called condor; no user logins or job submission. Something in the range of 200+ linux x86-64s, which I use for nfs sieving projects (most recently M941, about half of that computation). ... -Bruce
The large number sieved here (entirely) before M941 has just been completed by Greg as c274 = p62 · p100 · p113. This is a new "Champion" Cunningham factorization, in second place:
Special number field sieve, by SNFS difficulty:
5501  c307  2,1039-   K.Aoki+J.Franke+T.Kleinjung+A.K.Lenstra+D.A.Osvik
5787  c274  5,398+    G.Childers+B.Dodson
5739  c228  12,256+   T.Womack+B.Dodson
At 280 digits, M941 will take over second place when the matrix step finishes, about six weeks from now.
-Bruce
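
The notation c274 = p62 · p100 · p113 carries a quick sanity check: a d-digit number n satisfies 10^(d-1) <= n < 10^d, so a product of k factors with D digits in total has between D-k+1 and D digits. Three primes of 62, 100 and 113 digits (275 in total) can therefore produce a 274-digit composite. The helper below is a hypothetical illustration of that bound and does not use the actual factors.

```python
"""Digit-count bounds for a product of factors (illustrates the cNNN = pA * pB * pC notation)."""


def product_digit_bounds(*digits: int) -> tuple[int, int]:
    """Possible decimal lengths of a product of factors with the given lengths.

    A d-digit number n satisfies 10**(d-1) <= n < 10**d, so for k factors
    with total length D the product lies in [10**(D-k), 10**D) and therefore
    has between D-k+1 and D digits.
    """
    total = sum(digits)
    return total - len(digits) + 1, total


def num_digits(n: int) -> int:
    """Number of decimal digits of a positive integer."""
    return len(str(n))


if __name__ == "__main__":
    low, high = product_digit_bounds(62, 100, 113)
    print(low, high)  # the 274-digit composite lies inside this range
    # Both extremes with small factors of 2, 3 and 4 digits:
    print(num_digits(99 * 999 * 9999), num_digits(10 * 100 * 1000))
```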

Chris S (Joined: 1 Dec 09, Posts: 2, Credit: 4,064, RAC: 0)
Message 258 - Posted: 3 Dec 2009, 17:46:09 UTC
Hi! Any possibility of optimised apps here in the future?

Greg, Project administrator (Joined: 26 Jun 08, Posts: 645, Credit: 473,545,378, RAC: 249,192)
Message 259 - Posted: 3 Dec 2009, 20:00:04 UTC - in response to Message 258. (last modified: 3 Dec 2009, 20:03:36 UTC)
The current apps are optimized apps. They are based on the mature gnfs-lasieve code (32-bit and 64-bit) written by Jens Franke and Thorsten Kleinjung, and all include assembly optimizations.

Chris S (Joined: 1 Dec 09, Posts: 2, Credit: 4,064, RAC: 0)
Message 260 - Posted: 3 Dec 2009, 20:14:17 UTC
Thanks for the reply, Greg; I hadn't realised that. :-)