There are so many wrong ways to convert data on System i that I wouldn't know where to begin. The primary flaw is, as usual, in legacy habits that rely on interfaces built for System/38. These interfaces, such as QDCXLATE or the alternative QTBXLATE, were built to provide a way to read ASCII data from a tape or 8-inch diskette. They were never intended to be used for today's pervasive ASCII->EBCDIC->ASCII processes.
At a basic level, RPG already handles ASCII to EBCDIC via the %UCS2 built-in function. In addition, if you're on the latest releases of the IBM i operating system, RPG IV supports conversion from UTF-16 to your job's CCSID (probably EBCDIC) automatically. Just move one field to the other and it works. But I still can't comprehend why we don't have %TOASCII or %TOEBCDIC built-in functions in RPG today.
There is a native API included with IBM i and most other operating systems that provides conversion between CCSIDs. That API is iconv. It dynamically builds a conversion table if one does not exist, and converts from any CCSID to any CCSID. Unlike the passé QDCXLATE, iconv is more dynamic and its conversion is up to date. For example, if you receive an XML document coded in UTF-8, QDCXLATE will often convert it correctly. Note that I said "often" and did not say "always": QDCXLATE translates one byte at a time through a fixed table, so any UTF-8 character that occupies more than one byte comes out wrong. The iconv API, by contrast, always converts UTF-8 correctly.
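To see why "often" is not "always", here is a minimal sketch in Python rather than RPG. It does not call QDCXLATE or iconv. It assumes Python's built-in cp037 codec is a fair stand-in for CCSID 37, and the helper names table_translate and ccsid_convert are made up for this illustration. The first helper pushes every byte through a single 256-entry table; the second decodes the data as UTF-8 before encoding it as EBCDIC.

```python
# Python's "cp037" codec stands in for EBCDIC CCSID 37 (an assumption of this sketch).

# A 256-entry table mapping each ISO-8859-1 byte value to its CCSID 37 byte:
# the kind of one-byte-in, one-byte-out lookup a translation table performs.
TABLE = bytes(range(256)).decode("latin-1").encode("cp037")


def table_translate(data: bytes) -> bytes:
    """Translate byte-for-byte through the fixed table (hypothetical helper)."""
    return data.translate(TABLE)


def ccsid_convert(data: bytes) -> bytes:
    """Decode the input as UTF-8, then encode it as CCSID 37 (hypothetical helper)."""
    return data.decode("utf-8").encode("cp037")


plain = "<name>Bob</name>".encode("utf-8")      # ASCII only: 1 byte per character
accented = "<name>José</name>".encode("utf-8")  # "é" occupies 2 bytes in UTF-8

# The two approaches agree on plain ASCII data and disagree once a
# multi-byte character appears.
same_for_plain = table_translate(plain) == ccsid_convert(plain)
same_for_accented = table_translate(accented) == ccsid_convert(accented)
print(same_for_plain, same_for_accented)
```

With the accented name, the table approach turns the two bytes of "é" into two unrelated EBCDIC characters, which is exactly the kind of damage that goes unnoticed until a customer named José shows up in the data.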
The iconv API converts data from one CCSID to another CCSID. The converted data is stored in a separate output buffer; that is, the data is not converted in place. The iconv API is complex and works differently from the way RPG programmers are used to. For example, the input data is addressed by a pointer that is incremented as each character is converted, moving the pointer to the next character to be converted. The process is repeated until the entire string is converted. The output parameter is also a pointer to where the converted data is stored. It too is incremented each time converted data is stored in the output buffer. So when the conversion is complete, the output pointer points just past the output data. The two length parameters work the same way in reverse: they are decremented, so after the call they hold the number of input bytes not yet converted and the number of output bytes still unused. Odd, right? Here's the prototype in RPG IV for iconv:
     D iconv           PR            10U 0 extProc('iconv')
     D  hConv                              LikeDS(iconv_t) Value
     D  pInput                         *
     D  nInLen                       10U 0
     D  pOutput                        *
     D  nOutLen                      10U 0
It is very important to note that the hConv parameter (parameter 1) is passed by value. The VALUE keyword must be included on that parameter or you'll have a learning experience that could last for hours or in some cases, days.
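The pointer and length bookkeeping is easier to see in a small model. The following Python sketch does not call iconv; it only imitates the accounting, assuming (as in the standard iconv interface) that the length parameters count down as input is consumed and output is produced. The function name iconv_like is hypothetical, and cp037 again stands in for CCSID 37.

```python
def iconv_like(src: bytes, out_capacity: int):
    """Model of one conversion call from UTF-8 to CCSID 37 (hypothetical helper).

    Returns (output_bytes, in_left, out_left), mirroring what the caller of
    iconv sees in the output buffer and the two length parameters afterwards.
    """
    out = bytearray()
    consumed = 0
    for ch in src.decode("utf-8"):
        encoded = ch.encode("cp037")
        if len(out) + len(encoded) > out_capacity:
            break                          # output buffer is full: stop early
        out += encoded                     # the "output pointer" moves forward
        consumed += len(ch.encode("utf-8"))  # the "input pointer" moves forward
    return bytes(out), len(src) - consumed, out_capacity - len(out)


data = "José".encode("utf-8")   # 5 input bytes, because "é" takes two

# Plenty of room: everything is converted and nothing is left over.
converted, in_left, out_left = iconv_like(data, 10)

# Only 3 bytes of room: conversion stops after "Jos".
partial, partial_in_left, partial_out_left = iconv_like(data, 3)
```

In the first call the remaining input length ends at 0 and 6 of the 10 output bytes are unused. In the second, 2 input bytes remain unconverted, which is how the caller knows the output buffer was too small.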
The API or function needed to open an iconv "session" is either iconv_open or the IBM i native QtqIconvOpen API. The iconv_open API seems easier to use on the surface, and it is cross-platform, but its parameters are "implementation defined", meaning that while the interface itself is portable, the actual parameter values vary from OS to OS. I prefer to use the QtqIconvOpen API instead of iconv_open, but either one will work.
The prototype for iconv_open is as follows:
     D iconv_open      PR                  extProc('iconv_open')
     D                                     LikeDS(iconv_t)
     D  toCode                             LikeDS(toCVTCCSID_T)
     D  fromCode                           LikeDS(fromCVTCCSID_T)
The data structures TOCVTCCSID_T and FROMCVTCCSID_T are included in the downloadable code, but are not necessary for this week's example or the CVTCHAR function to work, so they are omitted from this article.
The prototype for QtqIconvOpen is as follows:
     D QtqIconvOpen    PR                  extProc('QtqIconvOpen')
     D                                     LikeDS(iconv_t)
     D  toCCSID                            LikeDS(QtqCode_T)
     D  fromCCSID                          LikeDS(QtqCode_T)
They both return the iconv_t structure. It is returned to the caller when the conversion table is successfully opened. The programmer never changes anything in this structure; it acts as a handle that is passed from the open, to the conversion, and then to the close routines. The reason we have separate open, convert and close steps is so that we can open a conversion table once, use it to convert lots of data, and then close the handle when we're finished. This would apply if you wanted to convert a large number of transactions or perhaps a large number of database records for direct storage on the IFS. When repeated conversions are needed, calling iconv directly rather than through the cvtChar subprocedure would be faster. Here is the iconv_t structure:
     D iconv_t         DS                  Qualified Inz
     D  rtn_value                    10I 0
     D  cd                           10I 0 Dim(12)
Again, you declare a data structure based on this template, use it as the target of the iconv_open or QtqIconvOpen API, then pass it to iconv itself and eventually to iconv_close.
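The open once, convert many times, then close pattern has a close cousin in Python's incremental codecs, which makes for a compact illustration. This is an analogy only, not the iconv API: the function convert_stream is hypothetical, and cp037 is assumed to stand in for CCSID 37. One converter object is created, fed a series of chunks, and flushed at the end. Because it carries state between calls, a UTF-8 character that is split across two chunks still converts correctly.

```python
import codecs


def convert_stream(chunks, from_codec="utf-8", to_codec="cp037"):
    """Convert a sequence of byte chunks with one long-lived converter pair."""
    decoder = codecs.getincrementaldecoder(from_codec)()   # "open" once
    encoder = codecs.getincrementalencoder(to_codec)()
    out = bytearray()
    for chunk in chunks:                                   # "convert" many times
        out += encoder.encode(decoder.decode(chunk))
    # "close": flush anything still buffered; raises if a character is incomplete
    out += encoder.encode(decoder.decode(b"", final=True), final=True)
    return bytes(out)


data = "Zoë, José, François".encode("utf-8")

# Three-byte chunks, so the first boundary falls in the middle of "ë".
chunks = [data[i:i + 3] for i in range(0, len(data), 3)]

streamed = convert_stream(chunks)
whole = data.decode("utf-8").encode("cp037")
print(streamed == whole)
```

Converting each chunk with a brand-new converter would fail on the split character, which is one more reason to keep a single session open across a run of related data.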
The QtqCode_T data structure contains several fields that, among other things, identify the conversion CCSIDs. That data structure looks like this:
     D QtqCode_T       DS                  Qualified Inz
     D  QTQCCSID                     10I 0
     D  QTQCA                        10I 0
     D  QTQSA                        10I 0
     D  QTQSA00                      10I 0
     D  QTQLO                        10I 0
     D  QTQMEO                       10I 0
     D  QTQERVED02                    8A   Inz(*ALLX'00')
     D  ccsid                        10I 0 Overlay(qtqccsid)
     D  cnv_alternative...
     D                               10i 0 Overlay(QTQCA)
     D  subs_alternative...
     D                               10i 0 Overlay(QTQSA)
     D  shift_alternative...
     D                               10i 0 Overlay(QTQSA00)
     D  length_option...
     D                               10i 0 Overlay(QTQLO)
     D  mx_error_option...
     D                               10i 0 Overlay(QTQMEO)
     D  reserved                      8A   Overlay(QTQERVED02)
The first group of subfields uses the IBM-provided names. The second group uses names I've added to match those of the structures in the C language. Since I used the OVERLAY keyword and a Qualified data structure, either set of names will work.
The only subfield we really care about on a regular basis is CCSID. This is the subfield where, obviously, the from or to CCSID values are inserted.
When you are finished converting data from one CCSID to another (for example, converting a large EDI or XML file from ASCII to EBCDIC) the iconv session needs to be closed. Pass the hConv handle returned from the QtqIconvOpen API to the iconv_close API. Here's the prototype for iconv_close:
     D iconv_close     PR            10I 0 extProc('iconv_close')
     D  hConv                              LikeDS(iconv_t) VALUE
Here is an example of how I use these data structures when calling QtqIconvOpen (in preparation for calling iconv):
 (1) D from_CCSID      DS                  LikeDS(QtqCode_T) Inz(*LIKEDS)
 (2) D to_CCSID        DS                  LikeDS(QtqCode_T) Inz(*LIKEDS)
 (3) D hConv           DS                  LikeDS(iconv_T) Inz(*LIKEDS)
      /free
 (4)   to_CCSID.ccsid = 0; // Convert to Job CCSID (typically 37 in the USA)
 (5)   from_CCSID.ccsid = 819; // Convert from PC ASCII
 (6)   hConv = QtqIconvOpen(to_CCSID:from_CCSID);
 (7)   if ( hConv.rtn_value < 0); // -1 means it failed.
         pErrNo = errno();
         joblog('QtqIconvOpen() returned %s %s':%char(nErrNo) :
                 %str(strerror(nErrNo)));
         return -1;
       endif;
       // Continue by accessing the data to be converted and then call iconv().
      /end-free
The first two data structures (lines 1 and 2) contain the from and to CCSIDs respectively. The third data structure on line 3 is hConv and acts as the handle to the conversion table. This data structure is populated by the call to the QtqIconvOpen (or iconv_open) API and is subsequently passed by value to iconv to perform translation.
If the open is successful, the rtn_value subfield of the iconv_t structure (line 7) will be zero. If it is -1, an error occurred and the open failed.
Now let's look at a more complete example.
We often need to convert from EBCDIC to ASCII. If we're reading text data from the IFS, we don't have to worry about conversion as the IFS open API has controls on it to cause automatic conversion to be performed. But there are times when converting from and to ASCII (such as doing web development) is necessary.
The following illustrates how to set up iconv to open an ASCII (CCSID 819) to EBCDIC (job CCSID) conversion table and issue a conversion; swap the two CCSID values to go the other way. This is not a complete example (the input and output fields are assumed to be declared elsewhere), but all the necessary pieces are here.
     D from_CCSID      DS                  LikeDS(QtqCode_T) Inz(*LIKEDS)
     D to_CCSID        DS                  LikeDS(QtqCode_T) Inz(*LIKEDS)
     D hConv           DS                  LikeDS(iconv_T) Inz(*LIKEDS)
     D inputLen        S             10U 0
     D outputLen       S             10U 0
     D pInput          S               *
     D pOutput         S               *
     D nErrNo          S             10I 0 Based(pErrNo)
     D bytes           DS                  Qualified
     D  uBytes                       10U 0 Inz(0)
     D  nBytes                       10I 0 Overlay(uBytes)
      /free
       to_CCSID = *ALLX'00';
       from_CCSID = *ALLX'00';
       to_CCSID.ccsid = 0; // Convert to Job CCSID (typically 37 in the USA)
       from_CCSID.ccsid = 819; // Convert from PC ASCII
       hConv = QtqIconvOpen(to_CCSID : from_CCSID);
       if ( hConv.rtn_value < 0); // -1 means it failed.
         pErrNo = errno();
         joblog('QtqIconvOpen() returned %s %s':%char(nErrNo) :
                 %str(strerror(nErrNo)));
         return -1;
       endif;

       // CONVERT data from ASCII to EBCDIC
       pInput = %addr(input);
       pOutput = %addr(output);
       inputLen = %len(%trimR(input));
       outputLen = %size(output);
       bytes.uBytes = iconv(hConv : pInput : inputLen : pOutput : outputLen);
       if (bytes.nBytes < 0); // Was there an error?
         pErrNo = errno();
         joblog('iconv() returned %s %s':%char(nErrNo) :
                 %str(strerror(nErrNo)));
       endif;
       iconv_close(hConv); // Always close the iconv session handle when finished.
      /end-free
This is a lot of code to have to insert just to do something that should be implemented by IBM as a built-in function. So I've wrapped up this code into a nice little subprocedure that reduces the amount of coding required.
The iconv wrapper subprocedure can be easily incorporated into your own code. Just include the /COPY (or /INCLUDE) statement along with any necessary binding directory keyword, and you're good to go!
The subprocedure is named cvtChar (convert character) and it calls iconv to perform the conversion. You can use cvtChar to convert from ASCII to EBCDIC, EBCDIC to ASCII or convert between any two compatible CCSIDs such as EBCDIC 37 and UTF-8, UTF-16 or any other CCSID. Remember, this is character-conversion not language-conversion, so converting from the U.S. EBCDIC CCSID 37 to the Italian EBCDIC CCSID does not translate the English words into Italian. (Yes, I have actually had this question during training classes, so I thought I'd make it clear before I continue.)
The cvtChar subprocedure accepts 4 to 6 parameters and converts data between CCSIDs. By default it converts ASCII to EBCDIC. Specifically its default behavior is to convert from UTF-8 (the web standard) to the CCSID of the job. So if you're in the U.S., Canada, Italy or anywhere else, it should work just fine.
If you don't like ASCII to EBCDIC as the default, since you have the source code, you can certainly override that behavior. But explicitly specifying the FROM/TO CCSIDs also works if you don't want to change the subprocedure's default implementation.
input - A character string of data that will be processed by iconv and its converted data placed in the output parameter. Up to 16 megabytes may be converted at a time for this parameter (the iconv limit).
inputLen - The length of the data specified on the input parameter (parameter 1) that is to be converted. Remember, if you do not want trailing blanks in the data to be converted, use %TRIMR to calculate the length of the data itself versus the declared length of the variable in which it is stored.
output - A character variable that receives the converted data. For certain CCSIDs, this variable may need to be twice the length of the input data. For example, if converting a 32-byte character field that contains data in EBCDIC 37 to UTF-16, the output variable must be able to receive up to 64 bytes of data. Some CCSIDs, such as UTF-8, are variable-width: a character from CCSID 37 takes either one or two bytes in UTF-8. Conversion to UTF-8 would therefore also require an output variable that is two times the length of the input data, but it may use only the same number of bytes as the input parameter. It all depends on the characters being converted.
outputLen - The length of the variable specified to receive the converted data (parameter 3).
fromCCSID - The CCSID for the data contained in the input parameter (parameter 1). If this parameter is not specified, CCSID 1208 is used indicating that the data is converted from UTF-8.
toCCSID - The CCSID that the data specified on the input parameter (parameter 1) is converted to before it is stored in the output parameter (parameter 3). If this parameter is not specified, CCSID 0 is used, indicating that the data is converted to the job's CCSID.
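The sizing advice for the output parameter can be checked with a few lines of Python. This is a rough sketch that assumes Python's cp037 codec matches CCSID 37 and uses utf-16-be so that no byte-order mark is added; converted_size is a made-up helper name. It measures how many bytes a 32-byte EBCDIC field needs after conversion.

```python
def converted_size(ebcdic_field: bytes, target_codec: str) -> int:
    """Bytes needed to hold a CCSID 37 field after conversion (hypothetical helper)."""
    return len(ebcdic_field.decode("cp037").encode(target_codec))


# Two 32-byte EBCDIC fields: one plain text padded with blanks, one all accents.
plain_field = "ABCDEFGH".ljust(32).encode("cp037")
accented_field = ("é" * 32).encode("cp037")

utf16_size = converted_size(plain_field, "utf-16-be")        # 2 bytes per character
utf8_plain_size = converted_size(plain_field, "utf-8")       # 1 byte per character
utf8_accented_size = converted_size(accented_field, "utf-8")  # 2 bytes per character

print(utf16_size, utf8_plain_size, utf8_accented_size)   # prints "64 32 64"
```

So a 200-byte output variable for a 100-byte input field, as used later in this article, covers the worst case for both targets.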
The source code for the cvtChar subprocedure uses techniques that I pioneered in RPG IV to get around what seemed like shortcomings in the language. I use 1-byte character definitions for both the input and output parameters. But since the address of the parameter is retrieved, the original data passed to the subprocedure is accessed in its entirety. Hence, even though the parameter is defined as 1A, I am able to access all the way up to 16MB for the parameter, which also means you can pass up to a 16MB value for the parameter.
To use iconv, a conversion handle must be opened. This is similar to opening a file on the IFS. And as with an IFS file, when you are finished with iconv, you must close the conversion handle. Some people ask me "What happens if I don't close the handle?" To them, I respond "What happens when you leave your front door open and go to work?" Most of the time, nothing happens, but eventually you will regret having left it open.
To avoid this issue, I wrap the open, convert and close functions of iconv in the cvtChar subprocedure. Here's the Prototype for the cvtChar subprocedure:
     D cvtChar         PR            10I 0 extProc('RPGOPEN_convertCharacter')
     D  input                         1A   OPTIONS(*VARSIZE)
     D  inLen                        10I 0 Const
     D  output                        1A   OPTIONS(*VARSIZE)
     D  outLen                       10I 0 Const
     D  fromCCSID                    10I 0 Const OPTIONS(*NOPASS)
     D  toCCSID                      10I 0 Const OPTIONS(*NOPASS)
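The two optional parameters and their defaults can be mimicked in a few lines of Python, which is a handy way to check expected results on a PC before testing on the IBM i. This is a sketch under stated assumptions, not a port of cvtChar: the name cvt_char, the CODECS table and the JOB_CCSID value of 37 are all invented for the illustration, and only the handful of CCSIDs listed are mapped to Python codec names.

```python
# Hypothetical mapping from a few CCSIDs to Python codec names.
CODECS = {
    1208: "utf-8",      # UTF-8
    819: "latin-1",     # ISO 8859-1 ("PC ASCII" in this article)
    37: "cp037",        # U.S. EBCDIC
    1200: "utf-16-be",  # UTF-16
}

JOB_CCSID = 37  # assumption: CCSID 0 ("job CCSID") resolves to 37 here


def cvt_char(data: bytes, from_ccsid: int = 1208, to_ccsid: int = 0) -> bytes:
    """Convert data between CCSIDs, defaulting to UTF-8 -> job CCSID."""
    src = JOB_CCSID if from_ccsid == 0 else from_ccsid
    dst = JOB_CCSID if to_ccsid == 0 else to_ccsid
    return data.decode(CODECS[src]).encode(CODECS[dst])


# Default direction: UTF-8 in, EBCDIC out.
ebcdic = cvt_char("Hello".encode("utf-8"))

# Reverse direction: both optional parameters supplied.
utf8 = cvt_char(ebcdic, 0, 1208)

print(ebcdic.hex(), utf8)   # prints "c885939396 b'Hello'"
```

Passing both CCSIDs explicitly, as in the second call, is the same choice you have with the real subprocedure when the default direction is not the one you want.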
As you can probably see from the EXTPROC keyword, I have incorporated this subprocedure into the midrangeNews.com RPG Open service program. This is a free service program that contains several RPG IV functions not currently supported as native RPG interfaces. It is free to midrangeNews.com members and can be downloaded at www.RPGOpen.com. You do not need to install RPGOPEN to use cvtChar (each RPG Open function is stand-alone), but installing RPGOPEN's *SRVPGM does make it simple.
To avoid name collisions, my naming scheme is to prefix the exported function name with the name of the service program. So this particular function is named CVTCHAR; however, it is exported as RPGOPEN_convertCharacter (case sensitive). The EXTPROC keyword allows us to reference the exported name while using a shorter, more convenient name, such as CVTCHAR. This technique also allows you to avoid name collisions with your own in-house subprocedures. If you already have a "CVTCHAR" subprocedure, you can simply change the prototype name from "CVTCHAR" to something else, and it just works.
I find that we typically need to convert a single set of data: a field, a string of XML or EDI data, or something retrieved from the IFS. There are, however, situations where the iconv cycle of open, read data, convert, read data, convert... close is necessary. You can use this CVTCHAR subprocedure for either purpose, but it may be more efficient to call iconv directly when the read/convert/read/convert... cycle is required. For everything else, I use cvtChar.
Here is the full source code for cvtChar. You can download it from this link. It is free if you are a MidrangeNews.com Premium Member. This link is the full RPG Open service program in savefile format. We are now running IBM i v7.1, so the earliest release we can compile back to is v5r4; RPG Open's savefile is therefore v5r4m0 compatible. If you need help uploading a savefile to your System i, read this article first. If you want to learn more about subprocedures and how to code them, I suggest ordering my "Subprocedures and Service Programs" training seminar 3-DISC DVD set. It is on sale for 50% off to MidrangeNews.com members only, until the 4th of July holiday weekend.
     H NOMAIN OPTION(*NODEBUGIO:*SRCSTMT)
     H Copyright('RPG OPEN - (c) 2009 Robert Cozzi, Jr. All rights reserved.')
      /IF NOT DEFINED(*V6R1M0)
     H BNDDIR('QC2LE')
      /ENDIF

      /include RPGOPEN/QCPYSRC,cprotos
      /include RPGOPEN/QCPYSRC,iconv
      /include RPGOPEN/qcpysrc,joblog

     D true            C                   Const(*ON)
     D false           C                   Const(*OFF)

     P cvtChar         B                   EXPORT
     D cvtChar         PI            10I 0
     D  input                         1A   OPTIONS(*VARSIZE)
     D  inLen                        10I 0 Const
     D  output                        1A   OPTIONS(*VARSIZE)
     D  outLen                       10I 0 Const
     D  fromCCSID                    10I 0 Const OPTIONS(*NOPASS)
     D  toCCSID                      10I 0 Const OPTIONS(*NOPASS)

     D inputLen        S             10U 0
     D outputLen       S             10U 0
     D pInput          S               *
     D pOutput         S               *
     D from_CCSID      DS                  LikeDS(QtqCode_T) Inz(*LIKEDS)
     D to_CCSID        DS                  LikeDS(QtqCode_T) Inz(*LIKEDS)
     D hConv           DS                  LikeDS(iconv_T) Inz(*LIKEDS)
     D CCSID           DS                  Qualified
     D  UTF8                         10I 0 Inz(1208)
     D  ASCII                        10I 0 Inz(819)
     D  JOB                          10I 0 Inz(0)
     D nErrNo          S             10I 0 Based(pErrNo)
     D bytes           DS                  Qualified
     D  uBytes                       10U 0 Inz(0)
     D  nBytes                       10I 0 Overlay(uBytes)
      /free
       if (inLen <= 0 or outLen <= 0);
         return 0;
       endif;

       from_CCSID = *ALLX'00';
       if (%parms() >= 5);
         from_CCSID.ccsid = fromCCSID;
       else;
         from_CCSID.ccsid = ccsid.UTF8; // Default to UTF-8 ASCII
       endif;

       to_CCSID = *ALLX'00';
       if (%parms() >= 6);
         to_CCSID.ccsid = toCCSID;
       else;
         to_CCSID.ccsid = ccsid.JOB; // Default to JOB CCSID EBCDIC
       endif;

       hConv = QtqIconvOpen(to_CCSID:from_CCSID);
       if ( hConv.rtn_value < 0);
         pErrNo = errno();
         joblog('QtqIconvOpen() returned %s %s':%char(nErrNo) :
                 %str(strerror(nErrNo)));
         return -1;
       endif;

       // CONVERT data from the "from" CCSID to the "to" CCSID
       if (inLen > 0);
         inputLen = inLen;
         outputLen= outLen;
         pInput = %addr(input);
         pOutput = %addr(output);
         bytes.uBytes = iconv(hConv : pInput : inputLen : pOutput : outputLen);
         if (bytes.uBytes = ICONV_ERROR); // Was there an error?
           pErrNo = errno();
           joblog('iconv() returned %s %s':%char(nErrNo) :
                   %str(strerror(nErrNo)));
         endif;
       endif;

       // CLOSE iconv() handle
       iconv_close(hConv);

       // Log what happened. After iconv(), inputLen holds the input bytes
       // not converted and outputLen holds the unused output bytes.
       joblog('cvtChar converted %s of %s characters':
               %char(inLen - inputLen) : %char(inLen));
       joblog('cvtChar output buffer contains %s CCSID(%s) characters; +
               %s extra return buffer bytes unaltered.':
               %char(outLen - outputLen) : %char(to_CCSID.ccsid) :
               %char(outputLen));
       // Return bytes remaining minus the input length
       // (that is, the negated count of input bytes converted)
       return (inputLen - inLen);
      /end-free
     P cvtChar         E
Below is an example implementation calling the CVTCHAR subprocedure. This example converts from EBCDIC to UTF-8 (the reverse of the default direction). It declares the binding directory for the RPGOPEN *SRVPGM and uses /INCLUDE to import the prototypes for the iconv and CVTCHAR subprocedures.
     H OPTION(*SRCSTMT) DFTACTGRP(*NO)
     H BNDDIR('RPGOPEN/RPGOPEN')
      /include rpgopen/qcpysrc,iconv
     D szData          S            100A
     D szUTF8          S            200A
     C                   eval      *INLR = *ON
      /free
       szData = 'abcdefgABCDEFG0123456789' + '!@#$%&*()<>,.:~;`/\|''"{}';
       cvtChar(szData : %len(%trimR(szData))+2 : szUTF8 : %size(szUTF8)
               : 0 : 1208);
       return;
      /end-free
You're welcome!
Bob Cozzi is available to train your staff, on-site in RPG IV or SQL as well as perform consulting or contract programming. Currently many shops are asking Cozzi to join them for 1 to 3 days of Q&A and consulting with their RPG staff. The Staff gets to ask real-world questions that apply to their unique situations. Contact Cozzi by sending him an email at: bob at rpgworld.com
Bob also accepts your questions for use in future RPG Report articles or content for midrangeNews.com. Topics of interest include: RPG IV, Web development with RPG IV, APIs, C/C++ or anything else IBM i development related (except subfiles, data areas and RPGII/III because Bob doesn't care about that stuff) write your questions using the Feedback link on the midrangeNews.com website.