Midrange News for the IBM i Community


ASCII to EBCDIC Conversion in RPG IV Published by: Bob Cozzi on 07 Jun 2011
© 2011 Robert Cozzi, Jr. All rights reserved. Reproduction/Redistribution prohibited.
A midrangeNews.com Publication


There are so many wrong ways to convert data on System i that I wouldn't know where to begin. The primary flaw is, as usual, legacy habits that rely on interfaces built for the System/38. These interfaces, such as QDCXLATE or the alternative QTBXLATE, were built to read ASCII data from a tape or an 8-inch diskette. They were never intended for today's pervasive ASCII-to-EBCDIC-to-ASCII processing.

At a basic level, RPG already handles some of this through the %UCS2 built-in function, which converts character data to UCS-2 (Unicode). In addition, if you're on the latest releases of the IBM i operating system, RPG IV converts from UTF-16 to your job's CCSID (probably EBCDIC) automatically: just move one field to the other and it works. But I still can't comprehend why we don't have %TOASCII or %TOEBCDIC built-in functions in RPG today.


You Convert, I Convert

There is a native API included with IBM i and most other operating systems that converts between CCSIDs. That API is iconv: it dynamically builds a conversion table if one does not exist, and it converts from any CCSID to any other supported CCSID. Unlike the passé QDCXLATE, iconv is dynamic and its conversion tables are up to date. For example, if you receive an XML document encoded in UTF-8, QDCXLATE will often convert it correctly. Note that I said "often," not "always." The iconv API, by contrast, always converts UTF-8 correctly.
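Why does a translation table only "often" work? A QDCXLATE-style table maps each input byte to exactly one output byte, while UTF-8 encodes characters such as é in two bytes. The minimal C sketch below makes the difference visible on a platform with POSIX iconv (glibc on Linux, for example). It assumes iconv accepts the names "UTF-8" and "ISO-8859-1" (the Latin-1 code page that CCSID 819 corresponds to); the identity table and both helper names are hypothetical stand-ins, not IBM APIs.

```c
#include <iconv.h>
#include <stddef.h>

/* Byte-for-byte translation, the way a 256-entry table works. Each input
   byte produces exactly one output byte, so the output length always equals
   the input length. The identity table is a hypothetical stand-in. */
static size_t table_translate(const unsigned char *in, size_t len,
                              unsigned char *out)
{
    unsigned char table[256];
    for (int i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
    for (size_t i = 0; i < len; i++)
        out[i] = table[in[i]];
    return len;
}

/* Character-aware conversion with iconv: decodes whole UTF-8 characters.
   Returns the number of output bytes, or (size_t)-1 on failure. */
static size_t iconv_translate(char *in, size_t len, char *out, size_t outSize)
{
    iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char  *inPtr = in, *outPtr = out;
    size_t inLeft = len, outLeft = outSize;
    size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    return rc == (size_t)-1 ? (size_t)-1 : outSize - outLeft;
}
```

Fed the five UTF-8 bytes of "café", the table approach returns five bytes (the two bytes of é become two separate characters), while iconv returns four Latin-1 bytes with é as the single byte X'E9'.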

iconv API Family

The iconv API converts data from one CCSID to another. The data is not converted in place; the converted data is written to a separate output buffer. The iconv API is complex and works differently from what RPG programmers are used to. For example, the input parameter is a pointer that is incremented as each character is converted, moving it to the next character to be converted. The process repeats until the entire string is converted. The output parameter is also a pointer, to the buffer where the converted data is stored, and it too is incremented each time converted data is stored in the output buffer. The two length parameters are decremented in step. So when the conversion is complete, the output pointer points just past the converted data, the input length holds the number of bytes not converted, and the output length holds the number of unused bytes in the output buffer. Odd, right? Here's the prototype in RPG IV for iconv:

     D iconv           PR            10U 0 extProc('iconv')
     D  hConv                              LikeDS(iconv_t) Value                                    
     D  pInput                         *                                                            
     D  nInLen                       10U 0                                                          
     D  pOutput                        *                                                            
     D  nOutLen                      10U 0                             
 

It is very important to note that the hConv parameter (parameter 1) is passed by value. The VALUE keyword must be included on that parameter or you'll have a learning experience that could last for hours or in some cases, days.
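To see the pointer behavior of iconv outside of RPG, here is a minimal C sketch using the POSIX iconv() interface that the RPG prototype wraps. iconv advances both buffer pointers and decrements both length arguments. The sketch assumes a platform whose iconv accepts the names "UTF-8" and "ISO-8859-1" (glibc and most Unix systems do); on IBM i, iconv_open expects IBM-specific code strings instead. The helper name show_pointer_movement is hypothetical.

```c
#include <iconv.h>
#include <stddef.h>

/* Converts inLen bytes and reports how far iconv() moved each pointer.
   Returns 0 on success, -1 on failure. */
static int show_pointer_movement(const char *toCode, const char *fromCode,
                                 char *in, size_t inLen,
                                 char *out, size_t outLen,
                                 size_t *consumed, size_t *produced)
{
    iconv_t cd = iconv_open(toCode, fromCode);
    if (cd == (iconv_t)-1)
        return -1;

    char  *inPtr   = in;      /* iconv() advances this past the input it consumes */
    char  *outPtr  = out;     /* ...and this past the bytes it writes             */
    size_t inLeft  = inLen;   /* decremented to the input bytes NOT converted     */
    size_t outLeft = outLen;  /* decremented to the output bytes still unused     */

    size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *consumed = (size_t)(inPtr - in);    /* same as inLen - inLeft   */
    *produced = (size_t)(outPtr - out);  /* same as outLen - outLeft */
    return 0;
}
```

Converting the four Latin-1 bytes of "café" to UTF-8 reports 4 bytes consumed and 5 produced. The caller's own `in` and `out` pointers never move; only the working copies do. That is why the RPG prototype passes pInput and pOutput by reference: iconv needs the address of each pointer so it can update it.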

The API needed to open an iconv "session" is either iconv_open or the IBM i native QtqIconvOpen API. iconv_open seems easier to use on the surface, and it is cross-platform, but its parameters are "implementation defined." While the interface itself is portable, the actual parameter values vary from OS to OS. I prefer the QtqIconvOpen API over iconv_open, but either one will work.

The prototype for iconv_open is as follows:

     D iconv_open      PR                  extProc('iconv_open') LikeDS(iconv_t)
     D  toCode                             LikeDS(toCVTCCSID_T)                                     
     D  fromCode                           LikeDS(fromCVTCCSID_T)

The data structures TOCVTCCSID_T and FROMCVTCCSID_T are included in the downloadable code, but are not necessary for this week's example or the CVTCHAR function to work, so they are omitted from this article.

The prototype for QtqIconvOpen is as follows:

     D QtqIconvOpen    PR                  extProc('QtqIconvOpen') LikeDS(iconv_t)
     D  toCCSID                            LikeDS(QtqCode_T)
     D  fromCCSID                          LikeDS(QtqCode_T)

Both support the iconv_t structure as a return value. It is returned to the caller if the conversion table is successfully opened. The programmer never changes anything in this structure; it acts as a handle that is passed from the open, to the conversion, and then to the close routines. The reason we have separate open, convert and close steps is so that we can open a conversion table once, use it to convert lots of data, and then close the handle when we're finished. This applies if you want to convert a large number of transactions, or perhaps a large number of database records for direct storage on the IFS. When repeated conversions are needed, calling iconv directly rather than through the cvtChar subprocedure (described below) is faster, because the table is opened only once. Here is the iconv_t structure:

     D iconv_t         DS                  Qualified Inz
     D  rtn_value                    10I 0                                                          
     D  cd                           10I 0 Dim(12)

You declare a data structure based on this template (using LIKEDS), use it as the target of the iconv_open or QtqIconvOpen API, then pass it to iconv itself and eventually to iconv_close.
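The open once, convert many, close once pattern looks like the following minimal C sketch. One handle serves every record, and the POSIX call iconv(cd, NULL, NULL, NULL, NULL) resets any shift state between records. It assumes POSIX iconv with the "UTF-8" and "ISO-8859-1" names. On IBM i the handle is the iconv_t structure shown above, and success is tested through its return-value field instead of comparing with (iconv_t)-1. The function name convert_records is hypothetical.

```c
#include <iconv.h>
#include <stddef.h>

/* Converts every record with one shared handle. The converted length of
   record i is stored in outLens[i]. Returns 0 on success, -1 on the first
   failure. */
static int convert_records(const char *toCode, const char *fromCode,
                           char *records[], const size_t inLens[], size_t count,
                           char *outputs[], size_t outSize, size_t outLens[])
{
    iconv_t cd = iconv_open(toCode, fromCode);      /* open once ...          */
    if (cd == (iconv_t)-1)
        return -1;

    int status = 0;
    for (size_t i = 0; i < count; i++) {
        char  *inPtr   = records[i];
        size_t inLeft  = inLens[i];
        char  *outPtr  = outputs[i];
        size_t outLeft = outSize;

        iconv(cd, NULL, NULL, NULL, NULL);          /* reset state per record */
        if (iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft) == (size_t)-1) {
            status = -1;
            break;
        }
        outLens[i] = outSize - outLeft;
    }

    iconv_close(cd);                                /* ... close once         */
    return status;
}
```

For thousands of records this avoids rebuilding the conversion table on every call, which is the cost a one-shot wrapper pays each time.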

The QtqCode_T data structure contains several fields that, among other things, identify the conversion CCSIDs. That data structure looks like this:

     D QtqCode_T       DS                  Qualified Inz
     D  QTQCCSID                     10I 0                                                       
     D  QTQCA                        10I 0                                                       
     D  QTQSA                        10I 0                                                    
     D  QTQSA00                      10I 0                                                      
     D  QTQLO                        10I 0                                                       
     D  QTQMEO                       10I 0                                                       
     D  QTQERVED02                    8A   Inz(*ALLX'00')                                           
     D  ccsid                        10I 0 Overlay(qtqccsid)                                        
     D  cnv_alternative...                                                                          
     D                               10i 0 Overlay(QTQCA)                                           
     D  subs_alternative...                                                                         
     D                               10i 0 Overlay(QTQSA)                                           
     D  shift_alternative...                                                                        
     D                               10i 0 Overlay(QTQSA00)                                         
     D  length_option...                                                                            
     D                               10i 0 Overlay(QTQLO)                                           
     D  mx_error_option...                                                                          
     D                               10i 0 Overlay(QTQMEO)                                          
     D  reserved                      8A   Overlay(QTQERVED02)

The first group of subfields uses the IBM-provided names. The second group uses names I've added that match those of the structure in the C language. Because the second group overlays the first and the data structure is qualified, either set of names works.

The only subfield we really care about on a regular basis is CCSID. This is the subfield where, obviously, the from or to CCSID values are inserted.

When you are finished converting data from one CCSID to another (for example, converting a large EDI or XML file from ASCII to EBCDIC) the iconv session needs to be closed. Pass the hConv handle returned from the QtqIconvOpen API to the iconv_close API. Here's the prototype for iconv_close:

     D iconv_close     PR            10I 0 extProc('iconv_close')
     D  hConv                              LikeDS(iconv_t) VALUE

Example Preparing for ASCII to EBCDIC Conversion

Here is an example of how I use these data structures when calling QtqIconvOpen (in preparation for calling iconv):

(1)  D from_CCSID      DS                  LikeDS(QtqCode_T) Inz(*LIKEDS)                           
(2)  D to_CCSID        DS                  LikeDS(QtqCode_T) Inz(*LIKEDS)                           
(3)  D hConv           DS                  LikeDS(iconv_T) Inz(*LIKEDS) 
      /free  
(4)        to_CCSID.ccsid = 0;      // Convert to Job CCSID (typically 37 in the USA)
(5)        from_CCSID.ccsid = 819;  // Convert from PC ASCII
(6)        hConv = QtqIconvOpen(to_CCSID:from_CCSID);                                               
(7)        if ( hConv.rtn_value < 0);  // -1 means it failed.                                                             
             pErrNo = errno();                                                                      
             joblog('QtqiConvOpen() returned %s %s':%char(nErrNo) :                                 
                                            %str(strerror(nErrNo)));                                
             return -1;                                                                             
           endif; 
           // Continue by accessing the data to be converted and then call iconv().                                                                                  
       /end-free 

The first two data structures (lines 1 and 2) contain the from and to CCSIDs respectively. The third data structure on line 3 is hConv and acts as the handle to the conversion table. This data structure is populated by the call to the QtqIconvOpen (or iconv_open) API and is subsequently passed by value to iconv to perform translation.

If the open is successful, the rtn_value subfield of the iconv_t structure (tested on line 7) is zero; if it is -1, an error occurred and the open failed.
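The equivalent success test in portable C looks slightly different. POSIX iconv_open returns (iconv_t)-1 and sets errno, with EINVAL meaning the pair of encodings is not supported. The minimal sketch below reports the failure much like the joblog call in the RPG code. On IBM i, where iconv_t is the structure shown earlier, you test its return-value field as the RPG code does instead. The helper name open_conversion is hypothetical.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Opens a conversion handle. On failure, reports errno (as the RPG code does
   through joblog) and returns (iconv_t)-1 with errno preserved. */
static iconv_t open_conversion(const char *toCode, const char *fromCode)
{
    iconv_t cd = iconv_open(toCode, fromCode);
    if (cd == (iconv_t)-1) {
        int err = errno;              /* save errno before calling anything else */
        fprintf(stderr, "iconv_open(%s, %s) returned %d %s\n",
                toCode, fromCode, err, strerror(err));
        errno = err;                  /* fprintf may have changed it */
    }
    return cd;
}
```

A caller checks the result exactly once, right after the open, and skips both the conversion and the close when the open failed.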

Now let's look at a more complete example.

Example EBCDIC 37 to ASCII 819

We often need to convert from EBCDIC to ASCII. If we're reading text data from the IFS, we usually don't have to worry about conversion, because the IFS open() API has options that make the system convert the data automatically. But there are times, such as during web development, when converting to and from ASCII ourselves is necessary.

The following illustrates how to set up iconv to open an EBCDIC to ASCII conversion table and issue a conversion. This is not a complete example, but all the necessary pieces are here.

     D from_CCSID      DS                  LikeDS(QtqCode_T) Inz(*LIKEDS)
     D to_CCSID        DS                  LikeDS(QtqCode_T) Inz(*LIKEDS)
     D hConv           DS                  LikeDS(iconv_T) Inz(*LIKEDS)
     D inputLen        S             10U 0
     D outputLen       S             10U 0
     D pInput          S               *
     D pOutput         S               *
      * Work fields (the sizes are illustrative; adjust them to your data)
     D input           S           1024A
     D output          S           1024A
     D nErrNo          S             10I 0 Based(pErrNo)
     D bytes           DS                  Qualified
     D  uBytes                       10U 0 Inz(0)
     D  nBytes                       10I 0 Overlay(uBytes)
      /free
           to_CCSID = *ALLX'00';
           from_CCSID = *ALLX'00';

           to_CCSID.ccsid = 819;    // Convert to PC ASCII (ISO 8859-1)
           from_CCSID.ccsid = 37;   // Convert from US EBCDIC

           hConv = QtqIconvOpen(to_CCSID : from_CCSID);
           if ( hConv.rtn_value < 0);  // -1 means it failed.
             pErrNo = errno();
             joblog('QtqIconvOpen() returned %s %s':%char(nErrNo) :
                                            %str(strerror(nErrNo)));
             return -1;
           endif;

             // CONVERT data from EBCDIC to ASCII
             pInput = %addr(input);
             pOutput = %addr(output);
             inputLen = %len(%trimR(input));
             outputLen = %size(output);
             bytes.uBytes = iconv(hConv : pInput  : inputLen :
                                          pOutput : outputLen);

             if (bytes.nBytes < 0);  // Was there an error?
                pErrNo = errno();
                joblog('iconv() returned %s %s':%char(nErrNo) :
                                        %str(strerror(nErrNo)));
             endif;

             iconv_close(hConv);  // Always close the iconv session handle when finished.

      /end-free                                                                                                    
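When iconv itself fails, it returns (size_t)-1, which is the case the RPG code detects with bytes.nBytes < 0, and errno says why. Below is a minimal C sketch, assuming POSIX iconv, that captures errno and the count of unconverted bytes. It uses UTF-8 to ISO-8859-1 so that it runs on most systems. glibc, for instance, also accepts "IBM037" for CCSID 37, but EBCDIC support varies by iconv implementation. The helper names are hypothetical.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Maps the POSIX errno values that iconv() can set to a short reason. */
static const char *iconv_failure_reason(int err)
{
    switch (err) {
    case E2BIG:  return "output buffer too small";
    case EILSEQ: return "invalid byte sequence in the input";
    case EINVAL: return "input ends in the middle of a character";
    default:     return "unexpected error";
    }
}

/* Opens, converts once, and closes. Returns 0 on success or the errno value
   describing the failure; *notConverted receives the input bytes that
   iconv() did not consume. */
static int convert_once(const char *toCode, const char *fromCode,
                        char *in, size_t inLen, char *out, size_t outLen,
                        size_t *notConverted)
{
    *notConverted = inLen;
    iconv_t cd = iconv_open(toCode, fromCode);
    if (cd == (iconv_t)-1)
        return errno;

    char  *inPtr = in, *outPtr = out;
    size_t inLeft = inLen, outLeft = outLen;
    int    result = 0;
    if (iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft) == (size_t)-1)
        result = errno;            /* save it before iconv_close() runs */
    iconv_close(cd);

    *notConverted = inLeft;
    return result;
}
```

Converting eight bytes into a four-byte buffer, for example, fails with E2BIG and leaves four bytes unconverted; the four that fit are already in the output buffer.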

This is a lot of code to have to insert just to do something that should be implemented by IBM as a built-in function. So I've wrapped up this code into a nice little subprocedure that reduces the amount of coding required.

cvtChar Subprocedure Description

The iconv wrapper subprocedure can be easily incorporated into your own code. Just include the /COPY (or /INCLUDE) statement along with any necessary binding directory keyword, and you're good to go!

The subprocedure is named cvtChar (convert character) and it calls iconv to perform the conversion. You can use cvtChar to convert from ASCII to EBCDIC, from EBCDIC to ASCII, or between any two compatible CCSIDs, such as EBCDIC 37 and UTF-8, UTF-16 or any other CCSID. Remember, this is character conversion, not language conversion, so converting from the U.S. EBCDIC CCSID 37 to the Italian EBCDIC CCSID (280) does not translate the English words into Italian. (Yes, I have actually had this question during training classes, so I thought I'd make it clear before I continue.)

The cvtChar subprocedure accepts 4 to 6 parameters and converts data between CCSIDs. By default it converts ASCII to EBCDIC; specifically, its default behavior is to convert from UTF-8 (the web standard) to the CCSID of the job. So whether you're in the U.S., Canada, Italy or anywhere else, it should work just fine.

If you don't like ASCII to EBCDIC as the default, since you have the source code, you can certainly override that behavior. But explicitly specifying the FROM/TO CCSIDs also works if you don't want to change the subprocedure's default implementation.

cvtChar Parameters

input - A character string of data that will be processed by iconv and its converted data placed in the output parameter. Up to 16 megabytes may be converted at a time for this parameter (the iconv limit).

inputLen - The length of the data specified on the input parameter (parameter 1) that is to be converted. If you do not want trailing blanks in the data to be converted, use %TRIMR to calculate the length of the data itself rather than the declared length of the variable in which it is stored.

output - A character variable that receives the converted data. For certain CCSIDs, this variable may need to be larger than the input data. For example, if you convert a 32-byte character field containing EBCDIC 37 data to UTF-16, the output variable must be able to receive 64 bytes of data. UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character. Every character in EBCDIC 37 needs at most 2 bytes in UTF-8, so an output variable twice the length of the input is enough there, but the converted data may be exactly the same length as the input. It all depends on the characters being converted.

outputLen - The length of the variable specified to receive the converted data (parameter 3).
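One way to size the output variable is to measure. The minimal C sketch below, assuming POSIX iconv, converts into a small scratch buffer repeatedly, treats E2BIG as "keep going," and totals the bytes produced. It uses "UTF-16BE" rather than "UTF-16" because some iconv implementations (glibc among them) prepend a byte-order mark for plain "UTF-16". The function name converted_size is hypothetical.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Returns how many bytes converting inLen bytes of fromCode data to toCode
   produces, or (size_t)-1 if the conversion is not possible. */
static size_t converted_size(const char *toCode, const char *fromCode,
                             char *in, size_t inLen)
{
    iconv_t cd = iconv_open(toCode, fromCode);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    size_t total = 0;
    char   scratch[64];
    while (inLen > 0) {
        char  *outPtr  = scratch;
        size_t outLeft = sizeof scratch;
        size_t rc = iconv(cd, &in, &inLen, &outPtr, &outLeft);
        total += sizeof scratch - outLeft;      /* count what was written */
        if (rc == (size_t)-1 && errno != E2BIG) {
            iconv_close(cd);                    /* bad or truncated input */
            return (size_t)-1;
        }
    }

    /* Flush any shift sequence a stateful target encoding still needs. */
    char  *outPtr  = scratch;
    size_t outLeft = sizeof scratch;
    if (iconv(cd, NULL, NULL, &outPtr, &outLeft) != (size_t)-1)
        total += sizeof scratch - outLeft;

    iconv_close(cd);
    return total;
}
```

For single-byte Latin-1 input converted to UTF-16BE it reports exactly twice the input length, the two-to-one ratio described for UTF-16 above.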

Optional Parameters

fromCCSID - The CCSID for the data contained in the input parameter (parameter 1). If this parameter is not specified, CCSID 1208 is used indicating that the data is converted from UTF-8.

toCCSID - The CCSID that the data specified on the input parameter (parameter 1) is converted to and is stored in on the output parameter (parameter 3).  If this parameter is not specified, CCSID 0 is used indicating that the data is converted to the Job's CCSID.
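For readers who also work in C, here is a minimal sketch of the same calling convention: required input and output buffers with their lengths, plus optional from/to encodings that fall back to defaults. C has no *NOPASS, so NULL stands in for an omitted parameter. The defaults are assumptions: "UTF-8" mirrors CCSID 1208, and the locale codeset from nl_langinfo(CODESET) is the closest portable analogue to the job CCSID. The name cvt_char and its return convention (bytes not converted, or -1) are hypothetical, not the RPG Open API.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stddef.h>

/* Hypothetical C counterpart of cvtChar. fromCode and toCode may be NULL,
   playing the role of omitted *NOPASS parameters. Returns the number of
   input bytes not converted (0 when everything fit), or -1 if the
   conversion could not be opened. */
static long cvt_char(const char *in, size_t inLen, char *out, size_t outLen,
                     const char *fromCode, const char *toCode)
{
    if (in == NULL || out == NULL || inLen == 0 || outLen == 0)
        return 0;                                /* nothing to do */

    if (fromCode == NULL)
        fromCode = "UTF-8";                      /* default: UTF-8 (CCSID 1208) */
    if (toCode == NULL)
        toCode = nl_langinfo(CODESET);           /* default: process codeset   */

    iconv_t cd = iconv_open(toCode, fromCode);   /* open ...                   */
    if (cd == (iconv_t)-1)
        return -1;

    char  *inPtr  = (char *)in;                  /* iconv() never writes input */
    char  *outPtr = out;
    size_t inLeft = inLen, outLeft = outLen;
    /* On failure iconv() returns (size_t)-1, but inLeft still tells the
       caller how many input bytes were not converted, so it is returned
       either way. */
    iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);  /* ... convert ...       */
    iconv_close(cd);                                /* ... and always close  */

    return (long)inLeft;
}
```

A return of 0 means everything was converted; a positive value means the output buffer filled up or the input contained bytes that could not be converted.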

cvtChar Source code

The source code for the cvtChar subprocedure uses techniques that I pioneered in RPG IV to get around what seemed like shortcomings in the language. I use 1-byte character definitions for both the input and output parameters. But since the address of the parameter is retrieved, the original data passed to the subprocedure is accessed in its entirety. Hence, even though the parameter is defined as 1A, I am able to access up to 16MB through it, which also means you can pass a value of up to 16MB for the parameter.

To use iconv, a conversion handle must be opened. This is similar to opening a file on the IFS. And as with an IFS file, when you are finished with iconv, you must close the conversion handle. Some people ask me, "What happens if I don't close the handle?" To them, I respond, "What happens when you leave your front door open and go to work?" Most of the time nothing happens, but eventually you will regret having left it open.

To avoid this issue, I wrap the open, convert and close functions of iconv in the cvtChar subprocedure. Here's the prototype for the cvtChar subprocedure:

     D cvtChar         PR            10I 0 extProc('RPGOPEN_convertCharacter')
     D  input                         1A   OPTIONS(*VARSIZE)                                        
     D  inLen                        10I 0 Const                                                    
     D  output                        1A   OPTIONS(*VARSIZE)                                        
     D  outLen                       10I 0 Const                                                    
     D  fromCCSID                    10I 0 Const OPTIONS(*NOPASS)                                   
     D  toCCSID                      10I 0 Const OPTIONS(*NOPASS)                                   

As you can probably see from the EXTPROC keyword, I have incorporated this subprocedure into the midrangeNews.com RPG Open service program. This is a free service program that contains several RPG IV functions not currently supported as native RPG interfaces. It is free to midrangeNews.com subscribers and can be downloaded at www.RPGOpen.com. You do not need to install RPGOPEN to use cvtChar (each RPG Open function is stand-alone), but installing RPGOPEN's *SRVPGM does make it simple.

To avoid name collisions, my naming scheme is to prefix the exported function name with the name of the service program. So this particular function is named CVTCHAR, but it is exported as RPGOPEN_convertCharacter (case sensitive). The EXTPROC keyword allows us to reference the exported name while using a shorter, more convenient name, such as CVTCHAR. This technique also helps you avoid name collisions with your own in-house subprocedures. If you already have a "CVTCHAR" subprocedure, you can simply change the prototype name from "CVTCHAR" to something else, and it just works.

cvtChar Source Member

In most cases I need to convert a single piece of data: a field, a string of XML or EDI data, or something retrieved from the IFS. There are, however, situations where the iconv cycle of open, read data, convert, read data, convert... close is necessary. You can use the CVTCHAR subprocedure for either purpose, but it may be more efficient to call iconv directly when the read/convert/read/convert... cycle is required. For everything else, I use cvtChar.
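When you run the read/convert cycle yourself, one detail matters: a read can end in the middle of a multi-byte character. iconv then stops with errno EINVAL and leaves the partial character unconsumed, so the caller must carry those bytes into the next round. Below is a minimal C sketch assuming POSIX iconv, with a memory buffer standing in for the file reads. The function name convert_in_chunks and the 64-byte carry buffer are hypothetical choices.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Converts data that arrives in chunks (as if read from a file), carrying a
   character split across two chunks over to the next round. Returns the
   number of bytes written to out, or (size_t)-1 on error. */
static size_t convert_in_chunks(iconv_t cd, const char *data, size_t dataLen,
                                size_t chunkSize, char *out, size_t outSize)
{
    char   pending[64];              /* carried-over bytes plus the new chunk */
    size_t pendingLen = 0;
    char  *outPtr  = out;
    size_t outLeft = outSize;
    size_t pos = 0;

    if (chunkSize == 0)
        return (size_t)-1;

    while (pos < dataLen) {
        size_t take = dataLen - pos < chunkSize ? dataLen - pos : chunkSize;
        if (pendingLen + take > sizeof pending)
            return (size_t)-1;                      /* too big for this sketch */
        memcpy(pending + pendingLen, data + pos, take);  /* "read" a chunk */
        pendingLen += take;
        pos += take;

        char  *inPtr  = pending;
        size_t inLeft = pendingLen;
        if (iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft) == (size_t)-1
            && errno != EINVAL)      /* EINVAL: chunk ended mid-character */
            return (size_t)-1;
        memmove(pending, inPtr, inLeft);            /* keep unconverted tail */
        pendingLen = inLeft;
    }

    if (pendingLen != 0)
        return (size_t)-1;           /* the data itself ended mid-character */
    return outSize - outLeft;
}
```

The handle is opened once by the caller and passed in, so the same table serves every chunk; the caller closes it when the last read has been converted.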

Here is the full source code for cvtChar. You can download it from this link. It is free if you are a MidrangeNews.com Premium Member. This link is the full RPG Open service program in savefile format. We are now running IBM i v7.1, so the earliest release we can compile back to is v5r4, and RPG Open's savefile is v5r4m0 compatible. If you need help uploading a savefile to your System i, read this article first. If you want to learn more about subprocedures and how to code them, I suggest ordering my "Subprocedures and Service Programs" training seminar 3-disc DVD set. It is on sale for 50% off to MidrangeNews.com members only, until the 4th of July holiday weekend.

     H NOMAIN OPTION(*NODEBUGIO:*SRCSTMT)
     H Copyright('RPG OPEN - (c) 2009 Robert Cozzi, Jr. All rights reserved.')                      
      /IF NOT DEFINED(*V6R1M0)                                                                      
     H BNDDIR('QC2LE')                                                                              
      /ENDIF                                                                                        
                                                                                                    
      /include RPGOPEN/QCPYSRC,cprotos                                                              
      /include RPGOPEN/QCPYSRC,iconv                                                                
      /include RPGOPEN/qcpysrc,joblog                                                               
                                                                                                    
     D true            C                   Const(*ON)                                               
     D false           C                   Const(*OFF)                                              
                                                                                                    
     P cvtChar         B                   EXPORT                                                   
     D cvtChar         PI            10I 0                                                          
     D  input                         1A   OPTIONS(*VARSIZE)                                        
     D  inLen                        10I 0 Const                                                    
     D  output                        1A   OPTIONS(*VARSIZE)                                        
     D  outLen                       10I 0 Const                                                    
     D  fromCCSID                    10I 0 Const OPTIONS(*NOPASS)                                   
     D  toCCSID                      10I 0 Const OPTIONS(*NOPASS)                                   
                                                                                                    
     D inputLen        S             10U 0                                                          
     D outputLen       S             10U 0                                                          
     D pInput          S               *                                                            
     D pOutput         S               *                                                            
     D from_CCSID      DS                  LikeDS(QtqCode_T) Inz(*LIKEDS)                           
     D to_CCSID        DS                  LikeDS(QtqCode_T) Inz(*LIKEDS)                           
     D hConv           DS                  LikeDS(iconv_T) Inz(*LIKEDS)                             
     D CCSID           DS                  Qualified                                                
     D  UTF8                         10I 0 Inz(1208)                                                
     D  ASCII                        10I 0 Inz(819)                                                 
     D  JOB                          10I 0 Inz(0)                                                   
                                                                                                    
     D nErrNo          S             10I 0 Based(pErrNo)                                            
     D bytes           DS                  Qualified                                                
     D uBytes                        10U 0 Inz(0)                                                   
     D nBytes                        10I 0 Overlay(uBytes)                                          
                                                                                                    
      /free                                                                                         
           if (inLen <= 0 or outLen <= 0);                                                          
              return 0;                                                                             
           endif;                                                                                   

           from_CCSID = *ALLX'00';
           if (%parms() >= 5);
              from_CCSID.ccsid = fromCCSID;
           else;
              from_CCSID.ccsid = ccsid.UTF8; // Default to UTF-8 ASCII
           endif;

           to_CCSID = *ALLX'00';
           if (%parms() >= 6);
              to_CCSID.ccsid = toCCSID;
           else;
              to_CCSID.ccsid = ccsid.JOB;  // Default to JOB CCSID EBCDIC
           endif;
                                                                                                              
           hConv = QtqIconvOpen(to_CCSID:from_CCSID);                                               
           if ( hConv.rtn_value < 0);                                                               
             pErrNo = errno();                                                                      
             joblog('QtqiConvOpen() returned %s %s':%char(nErrNo) :                                 
                                            %str(strerror(nErrNo)));                                
             return -1;                                                                             
           endif;                                                                                   
                                                                                                    
             // CONVERT the data between the requested CCSIDs
          if (inLen > 0);                                                                           
             inputLen = inLen;                                                                      
             outputLen= outLen;                                                                     
             pInput = %addr(input);                                                                 
             pOutput = %addr(output);                                                               
             bytes.uBytes = iconv(hConv : pInput  : inputLen :                                      
                                          pOutput : outputLen);                                     
                                                                                                    
             if (bytes.uBytes = ICONV_ERROR);  // Was there an error?                               
                pErrNo = errno();                                                                   
                joblog('iconv() returned %s %s':%char(nErrNo) :                                     
                                        %str(strerror(nErrNo)));                                    
             endif;                                                                                 
          endif;                                                                                    
             // CLOSE iconv() handle                                                                
          iconv_close(hConv);                                                                       
                                                                                                    
          joblog('cvtChar converted %s of %s bytes':
                                      %char(inLen - inputLen) :
                                      %char(inLen));
          joblog('cvtChar output buffer contains %s bytes of CCSID(%s) data; +
                                   %s extra return buffer bytes unaltered.':
                                      %char(outLen - outputLen) :
                                      %char(to_CCSID.ccsid) :
                                      %char(outputLen));
            // Return bytes remaining (bytes not converted); 0 means all converted
          return inputLen;
                                                                                                    
      /end-free                                                                                     
     P cvtChar         E                                                                             

Below is an example implementation calling the CVTCHAR subprocedure. This example converts from EBCDIC to UTF-8 (the reverse of the default direction). It declares the binding directory for the RPGOPEN *SRVPGM and uses /INCLUDE to import the prototypes for iconv and the CVTCHAR subprocedure.

     H OPTION(*SRCSTMT) DFTACTGRP(*NO)
     H BNDDIR('RPGOPEN/RPGOPEN')                                               
      /include rpgopen/qcpysrc,iconv                                           
                                                                               
     D szData          S            100A                                       
     D szUTF8          S            200A                                       
     C                   eval      *INLR = *ON                                 
      /free                                                                    
           szData = 'abcdefgABCDEFG0123456789' +                               
                    '!@#$%&*()<>,.:~;`/\|''"{}';                              
           cvtChar(szData : %len(%trimR(szData))+2 : szUTF8 : %size(szUTF8) : 0 : 1208);
          return;                                                              
      /end-free                                                                

You're welcome!

Check out my latest Blog entry "Things I would Do to Make an 'RPG5' Language Better than RPG IV"

Call Me

Bob Cozzi is available to train your staff on-site in RPG IV or SQL, as well as to perform consulting or contract programming. Currently many shops are asking Cozzi to join them for 1 to 3 days of Q&A and consulting with their RPG staff. The staff gets to ask real-world questions that apply to their unique situations. Contact Cozzi by sending him an email at: bob at rpgworld.com

Bob also accepts your questions for use in future RPG Report articles or content for midrangeNews.com. Topics of interest include: RPG IV, Web development with RPG IV, APIs, C/C++ or anything else IBM i development related (except subfiles, data areas and RPGII/III, because Bob doesn't care about that stuff). Write your questions using the Feedback link on the midrangeNews.com website.

You can subscribe to RPG Report (we call it "follow") by visiting the RPG Report page on midrangeNews.com and then clicking the FOLLOW link in the table of contents for that page. To unsubscribe, simply click that same link. You must be signed up and signed in to midrangeNews.com to start following RPG Report.
