Using IBM i? Need to create Excel, CSV, HTML, JSON, PDF, SPOOL reports? Learn more about the fastest and least expensive tool for the job: SQL iQuery.
Good morning. I have a situation where we need to process a large number of rcds in a short period of time and I'm wondering if anyone might have been in this same situation and came up with a good solution. The program is simple, but it is currently processing approx 16M rcds. Here's a brief look at what the program needs to do.
READ FILEA
DOW NOT %EOF
SETGT <On key fields> FILEB
READPE FILEB <-- The reason this is needed is because the key includes year and week and all may not exist, so we need to grab the latest available
IF NOT %EOF
CHAIN <key> FILEC
IF %FOUND
ADD FIELDS TO TOTALS
UPDATE FILEC
ELSE
MOVE FIELDS AND TOTALS TO RCD
WRITE FILEC
ENDIF
ENDIF
READ FILEA
ENDDO
Any suggestions would be greatly appreciated. I was actually trying to think of ways to possibly break it down into multiple jobs being submitted, or something like that. I'd be interested to hear any other ideas. Thanks, all!!
Two generic concepts apply to all I/O optimisation:
1) Reduce the number of rows
2) Reduce the number of columns
One can reduce the number of rows in several ways:
1) Break the batch into pieces and run them in parallel
2) Process the rows at a different time. Instead of waiting to process 16M rows, process them one at a time, when they are written into that 16M row table, for instance. Triggers might help here.
3) Use a logical file that omits unnecessary (deleted / inactive / last year) rows.
Reducing the number of columns can be pretty easy. Imagine this 16M row file is a sales history file, and that it has customer, item, de!--script--ion, weight, serial number, quantity, price, extension, date sold. Now imagine that all you care about is totals by customer/item number. Every CHAIN brings in all the columns; only 3 are needed. Say the record length is 100 bytes. For 16M rows, you are moving 1.6B bytes of data from disk to your program. If you had a logical file which only contained cust, item, qty then a CHAIN to that file would need to move way fewer bytes - say 15 - per record. Now, moving through 16M rows would only require the transfer of 240M bytes, which is quite a reduction, and without a single change to the logic in your program.
Figure out the SQL statement that will do what you want and use it. It'll finish faster than your RPG program.
Also if SQL is not an option, then look at the OVRDBF command and SEQONLY parameter. Make that as large as possible SEQONLY(*YES 50000) There is also another NBRRCDS parameter somewhere in there, that can be useful.
When you use READE/READPE, the compiler disallows record blocking on that table. If you use SETLL/SETGT/CHAIN/READ/READP and then check for key breaks yourself (IF stmts), it will run tons faster. And you need to add BLOCK(*YES) to those F-Specs. Run the job, dispay the job and you will see the I/O count go down dramatically if you did it right.
Chris Ringer