Midrange News for the IBM i Community


How to Read CSV Data into RPG IV using the CSV Parser Published by: Bob Cozzi on 06 Dec 2011 view comments
© 2011 Robert Cozzi, Jr. All rights reserved.

© Robert Cozzi, Jr. All rights reserved. Reproduction/Redistribution prohibited.
A midrangeNews.com Publication

Parsing CSV Files in RPG IV

Using RPG Open to Parse Comma Separated Values in RPG

This is the season of giving, so I'm giving away another great tool for RPG IV programmers; CSV Parser for RPG. The design for this CSV Parser is that the RPG program reads a CSV file from the IFS, one "record" at a time, and calls the routine to split apart the comma-separated values.

CSV Parser doesn't try to get fancy and map the resulting data into packed, zoned, date, time, or character fields, but rather simply returns the individual fields are array elements. It is then up to the caller of the CSV Parser to copy that data into the database record's fields.

Command Line Alternatives

 The IBM-supplied CPYFRMIMPF does a pretty good job of converting and copying a CSV file to a DB2 database file; I use it all the time. But when I have an interactive program, whether it is a legacy green-screen or a cool browser-based interface, I sometimes need to ability to allow the end-user to:

  • Save an Excel File as CSV
  • Store the CSV on the IFS
  • Load the CSV data "into a program"
Sponsored by: BCD Software
Ad

Obviously loading data "into a program" is a PC design element that we, IBM i developers do not employ. But we can give our end-users the illusion of this capability by automating the import of data from the IFS as best we can. To that end, the CSV Parser can be used to extract data from a CSV file and store it into a database file, a subfile or a web page--whatever you prefer.

Using CSV Parser

There are three steps in using CSV Parser:

  1. Read the CSV data from the IFS
  2. Call the CSV Parser's API
  3. Convert the results from CSV Parser to the target format

The two things the RPG developer has to write theirself that CSV Parser will perform (read the CSV file and convert the parsed data) are relatively simple things to accomplish.

Reading "Records" From the IFS

I don't want to get into a whole IFS terminology/technology/API topic here. There are plenty of articles on this topic already available. But I will cover the basic interfaces that are helpful when using CSV Parser.

Open Stream Files in RPG IV using fopen

There are 2 types of file open routines for IFS files, open() and fopen(). The open() or open64() APIs are really intended for byte-style I/O, where you're reading a single or multiple bytes at a time and not up to a linefeed. The fopen reads up to a linefeed, making it perfect for our application. Here are the IFS APIs we will be using in the example application:

.....D ifsOpen         PR              *   extProc('_C_IFS_fopen')
     D  szFileName                     *   Value Options(*STRING:*TRIM)
     D  ifsMode                        *   Value Options(*STRING)

     D ifsClose        PR            10I 0 extProc('_C_IFS_fclose')
     D  pIfsFile                       *   Value

     D ifsEof          PR            10I 0 extProc('_C_IFS_feof')
     D  pIfsFile                       *   Value

     D ifsGets         PR              *   extProc('_C_IFS_fgets')
     D  inBuffer                       *   Value Options(*STRING)
     D  inBufLen                     10I 0 Value
     D  pIfsFile                       *   Value
   

These are available in several of my libraries, including RPGOPEN and RPG xTools. NOTE: If you are NOT yet on at least v5r3, then the *TRIM option on the IFSOPEN prototype will not compile. Instead, use %TRIMR around the IFS file name when you call the subprocedure.

To open a CSV file named SALES.CSV stored on the IFS in my home directory, the following example code could be used:

           pFile = ifsOpen('/home/cozzi/sales.csv' : 'r'); 
           if (pFile = *NULL);  // Open failed? :(
              return;
           endif;

The second parameter on the fopen is the type of open request. This is a bit easier to understand than those goofy bit-flags on the open64() API. A value of 'r' means "open for read-only".

The next thing we do is enter a DOW loop, reading and then parsing the CSV "records", one at a time. For example:

           dow (ifsReadLine( pLine : %len(szLine) : pFile) <> *NULL);
                nLen = %len(%trimR(%str(pLine)));
                fieldCnt = parseCSV( fields : %len(fields) : szLine : ',');

                pItems=%addr(fields.buffer);
                // The example continues... 

On line 1 the record is read using fgets() an IFS API that reads a string of bytes from the file up to and including the CR/LF symbols, or until the maximum number of bytes were read (parameter 2). I've mapped this API to the IFSREADLINE prototype and am using the "inline" C APIs named _C_IFS_fgets and its prefix _C_IFS_ is added to the beginning of the fgets API name.

Calling the PARSECSV API

The record returned from the fgets API is stored in a variable named szLine. This variable is 4096 bytes in length so if the CSV "record" exceed 4k in length, you'll have issues--issues you can easily correct by fixing the data or increasing the length of the szLine field.

Before calling tthe ParseCSV API, we may calculate the length of the data returned in the szLine field. The %TRIMR and %LEN built-in functions help with that. ParseCSV doesn't care about the length as it uses the CR/LF sequence to terminate its search. Since fgets returns the CR/LF with the data, everything is wonderful.

ParseCSV is called with the szLine data as its 3rd parameter. The first two parameters are the return structure and length parameters. I also pass in a separator symbol (the comma) as the 4th parameter, although generally speaking, this is not necessary. Here's the call to ParseCSV from the previous example:

           fieldCnt = parseCSV( fields : %len(fields) : szLine : ','); 

The CSV parser does all the work, breaking up the comma separated values into individual text strings. Each text string is stored in an array element in the return structure. It is the caller's responsibility to convert the parsed data from a character string to the target format. The other parameter, named FIELDS is defined as a data structure somewhere else in the code. Here's how it and a few other variables would be declared:

.....D fields          DS                  LikeDS(PARSE_CSV_T) Inz
     D szLine          S           4096A
     D item            S            256A   Dim(64) Based(pItems) Varying 

The data structure named FIELDS is based on the PARSE_CSV_T data structure template. That structure is pretty simple, it looks like this:

.....D parse_CSV_T     DS                  Qualified
      /IF DEFINED(*V7R1M0)
     D                                     TEMPLATE
      /ENDIF
     D  bytesReturned                10I 0 inz
     D  fieldCount                   10I 0 inz
     D  Data                      32640A
     D  buffer                    32640A   overlay(data:1)
     D  rtnBuffer                 32640A   overlay(data:1)

To extract the data from the returned buffer of parsed values, I want to map the data portion of the structure to an array. Rather than do overlays and whatnot, I use a simple single-line pointer assignment, as follows:

.....      pItems=%addr(fields.rtnBuffer); 

This allows access the parsed values which are essentially stored in an array of 128 elements, each element is 255 characters in length. This means up to 128 fields per record can be parsed, and each field may be up to 255 characters in length.

Once I assign the address-of the return buffer to the pItems pointer variable, I can use the ITEM array to get to each field's value, and then convert it, as follows.

.....      for i = 1 to fieldCnt;

                rcdCount += 1;

                MONITOR;
                if (i = 1); // Whole number
                   if (item(i)=''); // %INT can't convert blanks
                    custNo = 0;
                else;
                    custNo = %int(item(1));
                endif;
                if (i = 2); // Customer Name
                    cstName = item(i);
                endif;
                if (i = 3); // Sales Figures
                   monitor;
                     item(i) = %scanRPL(',':'':item(i));
                     if (item(i)=''); // %INT can't convert blanks
                        sales = 0;
                     else;
                        sales = %dec(item(i):15:2);
                     endif;
                   on-error;
                     joblog('Warning! Conversion error for Sales in record %s':
                              %char(rcdCount));
                     sales = 0;
                   endMon;
              write CUSTSALES;  // Write out database record or subfile or whatever
         endfor;

Using %INT to convert whole numbers to numeric is a nice feature, likewise the %DEC built-in function is also pretty convenient. A couple of caveats about these two built-in functions:

  • The character data being convert to numeric cannot be all blanks. If it is, the built-in functions blow up.
  • Spaces are ignored in the character data when converting.
  • If the target field of a %DEC is not the same as the built-in's 3rd and 4th parameters, it will be treated normally.

Here is the full source code for ParseCSV. It can also be found in the latest version of RPGOPEN and in January's release of RPG xTools Version 7.

.....H NOMAIN EXTBININT(*YES)
     H OPTION(*SRCSTMT) BNDDIR('QC2LE')
     H COPYRIGHT('(c) 2011 - Bob Cozzi - All rights reserved.')
      ******************************************************************
      ** Parse CSV records into individual fields
      **  Written by Bob Cozzi, Fall 2011
      **  For documentation and more information, see the article on
      **  on MidrangeNews.com (search CSV Parser)
      **----------------------------------------------------------------
      **  (c) Copyright 2011 by R. Cozzi, Jr. - All rights reserved.
      **      Permission to use granted provided this copyright notice
      **      is included in all copies or derivations of this work.
      **----------------------------------------------------------------
      **    field-count = parseCSV( struct : length : csv-data
      **                             [ : separator
      **                               [ :  text-quote-symbol
      **                                 [ : escape-symbol ] ] ] );
      **
      **    returns number of fields detected/parsed
      **    PARAMETERS:
      **       struct      A PARSE_CSV_T data structure (initialized)
      **       length      Length of the data structure on parameter 1
      **       inValue     raw CSV data stream e.g., 123,"Hello World",iSeries
      **       separator   OPTIONAL field separator override DFT(,)
      **       quote       OPTIONAL text quote symbol DFT(")
      **       escape      OPTIONAL escape symbol DFT(\)
      **
      **    STRUCT - The parsed data is returned to the DATA subfield as
      **             an array of 255-character elements. Up to 128 fields
      **             per record are supported.
      **             The bytesReturned subfield contains the amount of
      **             data copied into the DATA subfield.
      **             The FIELDCOUNT receives count of the number of
      **             fields detected and parased from the CSV value.
      **    LENGTH   Length of the structure passed to the parser on
      **             parameter 1. Use %SIZE( PARSE_CSV_T ) if passing a
      **             structure defined LIKEDS(PARSE_CSV_T).
      **    INVALUE  The raw CSV data is specified on this parameter.
      **             One CSV "record" should be passed at a time.
      **             Up to 4096-bytes of CSV data may be passed.
      **             The CSV values are parsed and stored in an array
      **             that is copied to the DATA subfield of parameter 1.
      **    SEPCHAR  The symbol used as the field separator. The default
      **             separator value is the comma.
      **    QUOTE    The symbol used as the text field quotation symbol.
      **             The default is the double-quote symbol.
      **             NOTES: Any field (numeric or character) may be quoted.
      **                    Quotes are only necessary if the field
      **                    contains a separator or quotation symbol.
      **    ESCCHAR  The symbol used as the escape character. When this
      **             symbol is detected, the next character in the CSV
      **             "record" is ignored. The default is the backslash(\)
      ******************************************************************
     D parse_CSV_T     DS                  Qualified Inz
     D  bytesReturned                10I 0
     D  fieldCount                   10I 0
     D  data                      32640A

     D parseCSV        PR            10I 0 extProc('RPGLIB_parseCSV')
     D  rtnBuffer                          LIKEDS(PARSE_CSV_T) OPTIONS(*VARSIZE)
     D  rtnLen                       10I 0 Const
     D  inValue                    4096A   Const Varying
     D  sepChar                       3A   Const Varying OPTIONS(*NOPASS)
     D  quoteChar                     3A   Const Varying OPTIONS(*NOPASS)
     D  escChar                       3A   Const Varying OPTIONS(*NOPASS)

     D userBuffer      DS                  LikeDS(PARSE_CSV_T) Inz
     D item            S            255A   Varying Dim(128)

     D consts          DS                  Qualified
     D  CR                            1A   Inz(X'0D')
     D  LF                            1A   Inz(X'25')
     D  ASCIILF                       1A   Inz(X'0A')
     D  sepChar                       3A   Inz(',') Varying
     D  quotes                        3A   Inz('"') Varying
     D  apos                          3A   Inz('''') Varying
     D  esc                           3A   Inz('\') Varying

      /INCLUDE rpgopen/qcpysrc,cprotos

     P parseCSV        B                   EXPORT
     D parseCSV        PI            10I 0
     D  rtnBuffer                          LIKEDS(PARSE_CSV_T) OPTIONS(*VARSIZE)
     D  rtnLen                       10I 0 Const
     D  inValue                    4096A   Const Varying
     D  sepChar                       3A   Const Varying OPTIONS(*NOPASS)
     D  quoteChar                     3A   Const Varying OPTIONS(*NOPASS)
     D  escChar                       3A   Const Varying OPTIONS(*NOPASS)


     D data            S           4096A   Varying
     D i               S             10I 0
     D byteCount       S             10I 0
     D itemCount       S             10I 0
     D itemLen         S             10I 0
     D len             S             10I 0
     D pos             S             10I 0
     D start           S             10I 0
     D shift           S             10I 0
     D searchChar      S              3A   Varying Inz(',')

      /free
           clear Item;
           clear userBuffer;
           if (rtnLen < %size(userBuffer.bytesReturned) +
                        %size(userBuffer.fieldCount));
              return 0;
           endif;

           if (%Parms() < 3);  // Nothing in? Then nothing out.
              return 0;
           endif;
           if (%Parms() >= 4);  // Separator specified?
              consts.sepChar = sepChar;
           endif;
           if (%Parms() >= 5);  // Text Quote symbol specified?
              consts.quotes = quoteChar;
           endif;
           if (%Parms() >= 6);  // Escape symbol specified?
              consts.esc = escChar;
           endif;

           data = %Trim( inValue );
           len  = %len( data );
           pos = %scan(consts.CR : data);
           if (pos <= 0);  // No CR found?
               pos = %scan(consts.LF : data);
              if (pos <= 0);  // No LF found?
                  pos = %scan(consts.ASCIILF: data);
              endif;
           endif;
           if (pos > 0);
              %len( data ) = pos-1;  // Reduce "this" record's length to linefeed position
           endif;

           start = 1;
           shift = %check(' ': data:start);
           if (shift > 0);
              start = shift;
           endif;
           if (%subst(data:start:1) = consts.Quotes);  // Enclosed in quotes?
               searchChar = consts.Quotes;         // Then look for "next" quote
               start += 1;
           else;
               searchChar = consts.sepChar;        // Otherwise search for comma/sep
           endif;
           pos   = %scan( searchChar : data : start );
           shift = 0;

           dow  (pos >= 0);

              i += 1;
              if ((pos-start) > 0);
                 item(i) = %Trim(%subst(data:start: pos - start));
              elseif (pos = 0);
                 item(i) = %Trim(%subst(data:start));
              else;
                 item(i) = '';
              endif;
                   // Exit loop when we've finished the entire line
              if (pos=0 or pos >= %len( data ) );
                  LEAVE;
              endif;
              if (shift > 0);
                  start = shift + 1;
              else;
                 start = pos + 1;  // Start the scan at next character in string
              endif;

              pos   = 0;
              shift = 0;
              if (start <= %len(data));
                 shift = %check(' ': data:start); // Find the non-blank start position
                 if (shift > start);
                    start = shift;
                 endif;
                 shift = 0;
                 if (start <= %len(data));
                    if (%subst(data:start:1) = consts.Quotes);  // Enclosed in quotes?
                        searchChar = consts.Quotes;         // Then look for "next" quote
                        start += 1;
                    else;
                        searchChar = consts.sepChar;        // Otherwise search for comma/sep
                    endif;
                    if (start <= %len(data));  // Scan for the next sep or quote
                       pos = %scan( searchChar : data : start );
              ////////////////
                       if (pos > 0);  // Found it? check if its a quote in the middle of a string
                          if (searchChar = consts.Quotes);
                             shift = %check(' ' : data : pos + 1);
                             dow (shift > 0 and shift <= %len(data)); // Next char = ???
                                if (%subst(data:shift:%len(consts.sepChar)) =
                                      consts.sepChar); // If separator is next, we're good!
                                   leave;
                                else;  // If the next character is NOT a separator, then
                                   pos = %scan( searchChar : data : shift);
                                   shift = %check(' ' : data : pos +1);
                                endif;
                             enddo;
                          endif;
                       endif;
              ////////////////
                    endif;
                 endif;
              endif;
           enddo;

           itemCount = i;

           if ((%size(item) * itemCount) <= (rtnLen-8));
               byteCount = %size(item) * itemCount; // Enough for everyone?
           else;
               byteCount = rtnLen-8;
           endif;

           if (rtnLen > %size(userBuffer.bytesReturned));
              userBuffer.bytesReturned = byteCount;
              if (rtnLen > %size(userBuffer.bytesReturned) +
                           %size(userBuffer.fieldCount));
              userBuffer.fieldCount = itemCount;
              endif;
           endif;

           if (byteCount > 0);
              memcpy(%addr(userBuffer.data) : %addr(item) : byteCount);
           endif;

           return itemCount;
      /end-free
     P parseCSV        E 

If you find a bug or a better method for parsing CSV, leave a comment below. Thanks!

Call Me

Bob Cozzi is the webmaster for MidrangeNews.com and has been providing the solutions to midrange problems, in the form or articles and books since 1983. He is available for consulting/contract development or on-site RPG IV, SQL, and CGI/Web training. To contact Cozzi, send an email to: bob at rpgworld.com

You can subscribe to RPG Report (we call it "follow") by visiting the RPG Report page on midrangeNews.com and then click the FOLLOW link in the table of contents for that page. To unsubscribe, simply click that same link. You must be signed up and signed in to midrangeNews.com to start following RPG Report.

Follow Bob Cozzi on Twitter

Return to midrangenews.com home page.
Sort Ascend | Descend

COMMENTS