Using IBM i? Need to create Excel, CSV, HTML, JSON, PDF, SPOOL reports? Learn more about the fastest and least expensive tool for the job: SQL iQuery.
This is the season of giving, so I'm giving away another great tool for RPG IV programmers; CSV Parser for RPG. The design for this CSV Parser is that the RPG program reads a CSV file from the IFS, one "record" at a time, and calls the routine to split apart the comma-separated values.
CSV Parser doesn't try to get fancy and map the resulting data into packed, zoned, date, time, or character fields, but rather simply returns the individual fields are array elements. It is then up to the caller of the CSV Parser to copy that data into the database record's fields.
The IBM-supplied CPYFRMIMPF does a pretty good job of converting and copying a CSV file to a DB2 database file; I use it all the time. But when I have an interactive program, whether it is a legacy green-screen or a cool browser-based interface, I sometimes need to ability to allow the end-user to:
Obviously loading data "into a program" is a PC design element that we, IBM i developers do not employ. But we can give our end-users the illusion of this capability by automating the import of data from the IFS as best we can. To that end, the CSV Parser can be used to extract data from a CSV file and store it into a database file, a subfile or a web page--whatever you prefer.
There are three steps in using CSV Parser:
The two things the RPG developer has to write theirself that CSV Parser will perform (read the CSV file and convert the parsed data) are relatively simple things to accomplish.
I don't want to get into a whole IFS terminology/technology/API topic here. There are plenty of articles on this topic already available. But I will cover the basic interfaces that are helpful when using CSV Parser.
There are 2 types of file open routines for IFS files, open() and fopen(). The open() or open64() APIs are really intended for byte-style I/O, where you're reading a single or multiple bytes at a time and not up to a linefeed. The fopen reads up to a linefeed, making it perfect for our application. Here are the IFS APIs we will be using in the example application:
.....D ifsOpen PR * extProc('_C_IFS_fopen') D szFileName * Value Options(*STRING:*TRIM) D ifsMode * Value Options(*STRING) D ifsClose PR 10I 0 extProc('_C_IFS_fclose') D pIfsFile * Value D ifsEof PR 10I 0 extProc('_C_IFS_feof') D pIfsFile * Value D ifsGets PR * extProc('_C_IFS_fgets') D inBuffer * Value Options(*STRING) D inBufLen 10I 0 Value D pIfsFile * Value
These are available in several of my libraries, including RPGOPEN and RPG xTools. NOTE: If you are NOT yet on at least v5r3, then the *TRIM option on the IFSOPEN prototype will not compile. Instead, use %TRIMR around the IFS file name when you call the subprocedure.
To open a CSV file named SALES.CSV stored on the IFS in my home directory, the following example code could be used:
pFile = ifsOpen('/home/cozzi/sales.csv' : 'r'); if (pFile = *NULL); // Open failed? :( return; endif;
The second parameter on the fopen is the type of open request. This is a bit easier to understand than those goofy bit-flags on the open64() API. A value of 'r' means "open for read-only".
The next thing we do is enter a DOW loop, reading and then parsing the CSV "records", one at a time. For example:
dow (ifsReadLine( pLine : %len(szLine) : pFile) <> *NULL); nLen = %len(%trimR(%str(pLine))); fieldCnt = parseCSV( fields : %len(fields) : szLine : ','); pItems=%addr(fields.buffer); // The example continues...
On line 1 the record is read using fgets() an IFS API that reads a string of bytes from the file up to and including the CR/LF symbols, or until the maximum number of bytes were read (parameter 2). I've mapped this API to the IFSREADLINE prototype and am using the "inline" C APIs named _C_IFS_fgets and its prefix _C_IFS_ is added to the beginning of the fgets API name.
The record returned from the fgets API is stored in a variable named szLine. This variable is 4096 bytes in length so if the CSV "record" exceed 4k in length, you'll have issues--issues you can easily correct by fixing the data or increasing the length of the szLine field.
Before calling tthe ParseCSV API, we may calculate the length of the data returned in the szLine field. The %TRIMR and %LEN built-in functions help with that. ParseCSV doesn't care about the length as it uses the CR/LF sequence to terminate its search. Since fgets returns the CR/LF with the data, everything is wonderful.
ParseCSV is called with the szLine data as its 3rd parameter. The first two parameters are the return structure and length parameters. I also pass in a separator symbol (the comma) as the 4th parameter, although generally speaking, this is not necessary. Here's the call to ParseCSV from the previous example:
fieldCnt = parseCSV( fields : %len(fields) : szLine : ',');
The CSV parser does all the work, breaking up the comma separated values into individual text strings. Each text string is stored in an array element in the return structure. It is the caller's responsibility to convert the parsed data from a character string to the target format. The other parameter, named FIELDS is defined as a data structure somewhere else in the code. Here's how it and a few other variables would be declared:
.....D fields DS LikeDS(PARSE_CSV_T) Inz D szLine S 4096A D item S 256A Dim(64) Based(pItems) Varying
The data structure named FIELDS is based on the PARSE_CSV_T data structure template. That structure is pretty simple, it looks like this:
.....D parse_CSV_T DS Qualified /IF DEFINED(*V7R1M0) D TEMPLATE /ENDIF D bytesReturned 10I 0 inz D fieldCount 10I 0 inz D Data 32640A D buffer 32640A overlay(data:1) D rtnBuffer 32640A overlay(data:1)
To extract the data from the returned buffer of parsed values, I want to map the data portion of the structure to an array. Rather than do overlays and whatnot, I use a simple single-line pointer assignment, as follows:
..... pItems=%addr(fields.rtnBuffer);
This allows access the parsed values which are essentially stored in an array of 128 elements, each element is 255 characters in length. This means up to 128 fields per record can be parsed, and each field may be up to 255 characters in length.
Once I assign the address-of the return buffer to the pItems pointer variable, I can use the ITEM array to get to each field's value, and then convert it, as follows.
..... for i = 1 to fieldCnt; rcdCount += 1; MONITOR; if (i = 1); // Whole number if (item(i)=''); // %INT can't convert blanks custNo = 0; else; custNo = %int(item(1)); endif; if (i = 2); // Customer Name cstName = item(i); endif; if (i = 3); // Sales Figures monitor; item(i) = %scanRPL(',':'':item(i)); if (item(i)=''); // %INT can't convert blanks sales = 0; else; sales = %dec(item(i):15:2); endif; on-error; joblog('Warning! Conversion error for Sales in record %s': %char(rcdCount)); sales = 0; endMon; write CUSTSALES; // Write out database record or subfile or whatever endfor;
Using %INT to convert whole numbers to numeric is a nice feature, likewise the %DEC built-in function is also pretty convenient. A couple of caveats about these two built-in functions:
Here is the full source code for ParseCSV. It can also be found in the latest version of RPGOPEN and in January's release of RPG xTools Version 7.
.....H NOMAIN EXTBININT(*YES) H OPTION(*SRCSTMT) BNDDIR('QC2LE') H COPYRIGHT('(c) 2011 - Bob Cozzi - All rights reserved.') ****************************************************************** ** Parse CSV records into individual fields ** Written by Bob Cozzi, Fall 2011 ** For documentation and more information, see the article on ** on MidrangeNews.com (search CSV Parser) **---------------------------------------------------------------- ** (c) Copyright 2011 by R. Cozzi, Jr. - All rights reserved. ** Permission to use granted provided this copyright notice ** is included in all copies or derivations of this work. **---------------------------------------------------------------- ** field-count = parseCSV( struct : length : csv-data ** [ : separator ** [ : text-quote-symbol ** [ : escape-symbol ] ] ] ); ** ** returns number of fields detected/parsed ** PARAMETERS: ** struct A PARSE_CSV_T data structure (initialized) ** length Length of the data structure on parameter 1 ** inValue raw CSV data stream e.g., 123,"Hello World",iSeries ** separator OPTIONAL field separator override DFT(,) ** quote OPTIONAL text quote symbol DFT(") ** escape OPTIONAL escape symbol DFT(\) ** ** STRUCT - The parsed data is returned to the DATA subfield as ** an array of 255-character elements. Up to 128 fields ** per record are supported. ** The bytesReturned subfield contains the amount of ** data copied into the DATA subfield. ** The FIELDCOUNT receives count of the number of ** fields detected and parased from the CSV value. ** LENGTH Length of the structure passed to the parser on ** parameter 1. Use %SIZE( PARSE_CSV_T ) if passing a ** structure defined LIKEDS(PARSE_CSV_T). ** INVALUE The raw CSV data is specified on this parameter. ** One CSV "record" should be passed at a time. ** Up to 4096-bytes of CSV data may be passed. ** The CSV values are parsed and stored in an array ** that is copied to the DATA subfield of parameter 1. ** SEPCHAR The symbol used as the field separator. The default ** separator value is the comma. ** QUOTE The symbol used as the text field quotation symbol. ** The default is the double-quote symbol. ** NOTES: Any field (numeric or character) may be quoted. ** Quotes are only necessary if the field ** contains a separator or quotation symbol. ** ESCCHAR The symbol used as the escape character. When this ** symbol is detected, the next character in the CSV ** "record" is ignored. The default is the backslash(\) ****************************************************************** D parse_CSV_T DS Qualified Inz D bytesReturned 10I 0 D fieldCount 10I 0 D data 32640A D parseCSV PR 10I 0 extProc('RPGLIB_parseCSV') D rtnBuffer LIKEDS(PARSE_CSV_T) OPTIONS(*VARSIZE) D rtnLen 10I 0 Const D inValue 4096A Const Varying D sepChar 3A Const Varying OPTIONS(*NOPASS) D quoteChar 3A Const Varying OPTIONS(*NOPASS) D escChar 3A Const Varying OPTIONS(*NOPASS) D userBuffer DS LikeDS(PARSE_CSV_T) Inz D item S 255A Varying Dim(128) D consts DS Qualified D CR 1A Inz(X'0D') D LF 1A Inz(X'25') D ASCIILF 1A Inz(X'0A') D sepChar 3A Inz(',') Varying D quotes 3A Inz('"') Varying D apos 3A Inz('''') Varying D esc 3A Inz('\') Varying /INCLUDE rpgopen/qcpysrc,cprotos P parseCSV B EXPORT D parseCSV PI 10I 0 D rtnBuffer LIKEDS(PARSE_CSV_T) OPTIONS(*VARSIZE) D rtnLen 10I 0 Const D inValue 4096A Const Varying D sepChar 3A Const Varying OPTIONS(*NOPASS) D quoteChar 3A Const Varying OPTIONS(*NOPASS) D escChar 3A Const Varying OPTIONS(*NOPASS) D data S 4096A Varying D i S 10I 0 D byteCount S 10I 0 D itemCount S 10I 0 D itemLen S 10I 0 D len S 10I 0 D pos S 10I 0 D start S 10I 0 D shift S 10I 0 D searchChar S 3A Varying Inz(',') /free clear Item; clear userBuffer; if (rtnLen < %size(userBuffer.bytesReturned) + %size(userBuffer.fieldCount)); return 0; endif; if (%Parms() < 3); // Nothing in? Then nothing out. return 0; endif; if (%Parms() >= 4); // Separator specified? consts.sepChar = sepChar; endif; if (%Parms() >= 5); // Text Quote symbol specified? consts.quotes = quoteChar; endif; if (%Parms() >= 6); // Escape symbol specified? consts.esc = escChar; endif; data = %Trim( inValue ); len = %len( data ); pos = %scan(consts.CR : data); if (pos <= 0); // No CR found? pos = %scan(consts.LF : data); if (pos <= 0); // No LF found? pos = %scan(consts.ASCIILF: data); endif; endif; if (pos > 0); %len( data ) = pos-1; // Reduce "this" record's length to linefeed position endif; start = 1; shift = %check(' ': data:start); if (shift > 0); start = shift; endif; if (%subst(data:start:1) = consts.Quotes); // Enclosed in quotes? searchChar = consts.Quotes; // Then look for "next" quote start += 1; else; searchChar = consts.sepChar; // Otherwise search for comma/sep endif; pos = %scan( searchChar : data : start ); shift = 0; dow (pos >= 0); i += 1; if ((pos-start) > 0); item(i) = %Trim(%subst(data:start: pos - start)); elseif (pos = 0); item(i) = %Trim(%subst(data:start)); else; item(i) = ''; endif; // Exit loop when we've finished the entire line if (pos=0 or pos >= %len( data ) ); LEAVE; endif; if (shift > 0); start = shift + 1; else; start = pos + 1; // Start the scan at next character in string endif; pos = 0; shift = 0; if (start <= %len(data)); shift = %check(' ': data:start); // Find the non-blank start position if (shift > start); start = shift; endif; shift = 0; if (start <= %len(data)); if (%subst(data:start:1) = consts.Quotes); // Enclosed in quotes? searchChar = consts.Quotes; // Then look for "next" quote start += 1; else; searchChar = consts.sepChar; // Otherwise search for comma/sep endif; if (start <= %len(data)); // Scan for the next sep or quote pos = %scan( searchChar : data : start ); //////////////// if (pos > 0); // Found it? check if its a quote in the middle of a string if (searchChar = consts.Quotes); shift = %check(' ' : data : pos + 1); dow (shift > 0 and shift <= %len(data)); // Next char = ??? if (%subst(data:shift:%len(consts.sepChar)) = consts.sepChar); // If separator is next, we're good! leave; else; // If the next character is NOT a separator, then pos = %scan( searchChar : data : shift); shift = %check(' ' : data : pos +1); endif; enddo; endif; endif; //////////////// endif; endif; endif; enddo; itemCount = i; if ((%size(item) * itemCount) <= (rtnLen-8)); byteCount = %size(item) * itemCount; // Enough for everyone? else; byteCount = rtnLen-8; endif; if (rtnLen > %size(userBuffer.bytesReturned)); userBuffer.bytesReturned = byteCount; if (rtnLen > %size(userBuffer.bytesReturned) + %size(userBuffer.fieldCount)); userBuffer.fieldCount = itemCount; endif; endif; if (byteCount > 0); memcpy(%addr(userBuffer.data) : %addr(item) : byteCount); endif; return itemCount; /end-free P parseCSV E
If you find a bug or a better method for parsing CSV, leave a comment below. Thanks!
Bob Cozzi is the webmaster for MidrangeNews.com and has been providing the solutions to midrange problems, in the form or articles and books since 1983. He is available for consulting/contract development or on-site RPG IV, SQL, and CGI/Web training. To contact Cozzi, send an email to: bob at rpgworld.com
You can subscribe to RPG Report (we call it "follow") by visiting the RPG Report page on midrangeNews.com and then click the FOLLOW link in the table of contents for that page. To unsubscribe, simply click that same link. You must be signed up and signed in to midrangeNews.com to start following RPG Report.