Midrange News for the IBM i Community


Posted by: TFisher
How to know if data in an IFS file is ASCII or not
Published: 23 Jan 2015
Revised: 26 Jan 2015



Does anyone know of a way to tell whether the content of a file on the IFS is EBCDIC or ASCII?  I know that I can assume that if the CCSID isn't 819, it probably isn't EBCDIC.  However, if the CCSID is 819, the file can contain either.  I cannot find an easy way to know if the content is EBCDIC or ASCII.

 

Here is what I am trying to do.  I have to change our email program to always encode attachments using base64.  If I encode a file that contains EBCDIC data, it is garbage when the email client decodes it, so I have to convert the data to ASCII before I encode it.  If the file that I am attaching already contains ASCII data, it works great as long as I do not convert it.  I need to know when to convert EBCDIC to ASCII, and there really isn't anything that tells me whether file 'XYZ' contains ASCII data or EBCDIC data.

 

Does anyone have any ideas?  I am thinking I could write a program that decides if a file is ASCII using discrimination. That is, read some number of characters from the file and if some percentage of those characters are not the typical characters used then make an assumption.  
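
As a rough illustration of the percentage idea, here is a minimal sketch in Python rather than RPG.  The function name, the sample size, and the 95% threshold are all assumptions made for the example, not anything from the email program.  It treats bytes x'20' through x'7E' plus tab, CR and LF as the "typical" ASCII characters and only makes a guess; binary attachments will fool it.

```python
def looks_like_ascii_text(sample: bytes, threshold: float = 0.95) -> bool:
    """Guess whether a block of bytes is plain ASCII text.

    Counts bytes that are printable ASCII (x'20'..x'7E') or tab/LF/CR and
    compares their share of the sample against an arbitrary threshold.
    """
    if not sample:
        return False  # nothing to judge
    typical = sum(
        1 for b in sample
        if 0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D)
    )
    return typical / len(sample) >= threshold


# The same CSV text, once as Latin-1/ASCII bytes and once as EBCDIC (code page 37)
line = "B0077,Y,123,25%,$122.00"
print(looks_like_ascii_text(line.encode("latin-1")))
print(looks_like_ascii_text(line.encode("cp037")))
```

In practice the sample would be the first block read from the stream file in binary mode (without text conversion), so the bytes are judged exactly as they are stored.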


COMMENTS

Posted by: starbuck5250
Posted: 9 years 3 months 2 days 21 hours 36 minutes ago

IBM doesn't list 810 on the CCSID list.  When I do a CHGJOB, I don't see 810 as a possible CCSID (v7.2)  What process is putting a CCSID(810) file into the IFS?  And what process is sometimes putting 'EBCDIC' data into an 810 file and sometimes 'ASCII'?

It is possible to declare a file as one CCSID and deliberately store data in a different CCSID, but that is most often a programming error.  The whole point of the CCSID is that by knowing that, one knows the encoding of the characters stored within.  There is no foolproof way to be able to read 8 bits of data and tell what character set it is encoded in.  You can make some guesses by looking for x'20' vs x'40' (spaces) but that doesn't help when looking at multi-byte character encodings like UTF-16.

The stream file support is really good at automatically doing the text conversions if the file is tagged with the correct CCSID.  I don't have any translation routine for my IFS-reading RPG programs.  All I do is open with O_TEXTDATA set and the system does the translation from the stream file CCSID to my job's CCSID.

Posted by: TFisher
Posted: 9 years 3 months 2 days 21 hours 31 minutes ago
Edited: Fri, 23 Jan, 2015 at 09:29:57 (3380 days ago)

Sorry, my mistake.  I meant 819.  I changed this in my original post.

Posted by: TFisher
Posted: 9 years 3 months 2 days 21 hours 26 minutes ago

I am thinking that I will have to read a block of data and make an assumption based on which characters I find the most of.  In other words, if I read a block that has "B0077,Y,123,25%,$122.00" then I can assume it's EBCDIC.  If I read a block that has "K-*ÎÌϳR0Ô3àåâå  PK  ²" then I can assume it's ASCII data.  That is, if both files have a CCSID of 819.  I am still thinking that if the CCSID is something other than 819 then it's always ASCII.

Posted by: Ringer
Posted: 9 years 3 months 2 days 17 hours 19 minutes ago

Call the stat() API. 

http://www.scottklement.com/rpg/ifs_ebook/stat.html

The CCSID is a 2-byte unsigned integer starting at byte 59 in the stat data structure. 

hex 0333 = dec 819 Latin-1
hex 0025 = dec 37 EBCDIC 
hex 04E4 = dec 1252 Windows

etc 
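
To double-check values like these, here is a small Python sketch (the helper name is made up for the example) that reads a 2-byte field as a big-endian unsigned integer, which is how the numbers above pair up.  It only does the arithmetic; it does not call stat().

```python
def ccsid_from_bytes(field: bytes) -> int:
    """Interpret a 2-byte big-endian unsigned field as a CCSID number."""
    if len(field) != 2:
        raise ValueError("expected exactly 2 bytes")
    return int.from_bytes(field, byteorder="big", signed=False)


# The three hex values listed above
for hex_text in ("0333", "0025", "04E4"):
    print(hex_text, "=", ccsid_from_bytes(bytes.fromhex(hex_text)))
```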

Chris Ringer

Posted by: TFisher
Posted: 9 years 3 months 2 days 16 hours 12 minutes ago

The problem is 819 can contain EBCDIC or ASCII data.  There are several programs creating stream files with 819 specified on the code page parameter of the Open() function.  Some are using the Convert text by code-page option which means the data is in ASCII and some don't specify that option meaning the data is EBCDIC.  Either way, the CCSID shows 819.

Now I am faced with the task of knowing whether the data is ASCII or EBCDIC in our email program so I know whether I need to convert the data to ASCII prior to base64 encoding.   I am thinking that I am going to have to change every program on the system that creates files on the IFS to ensure that the code page is being set correctly.  I was hoping there would be an easier way.

Posted by: starbuck5250
Posted: 9 years 3 months 2 days 15 hours 52 minutes ago

If the CCSID is 819 then the contents are encoded as single byte ASCII unless someone malicious has been at the data.  How are you 'reading' your stream file?  Are you using the Unix APIs [open(), read() etc]?  Chris posted a very useful link to Scott Klement's IFS API wrappers.  Highly recommended.

Using the Unix open() API, simply using the flag called O_TEXTDATA will signal the operating system to convert the stream file CCSID(819) characters to your job's CCSID(37) characters.  You won't need iconv() or anything else to translate for you - the operating system will do it automatically.

So here are two questions:

1) What code are you using to read the stream file?  Paste it here.

2) Do a DSPF of a CCSID(819) stream file.  Press F10 to see it in hex.  Paste a dozen or so characters here.

Here's an example from my system:

dspf .sh_history

00736574 204C4942 50415448 202F7573 722F6C69    set LIBPATH /usr/li

// Translate unprintable characters in files to an eyecatcher
nonprint = x'0d' + x'25';                                     
for i = 1 to %size(nonprint);                                 
  eyecatcher += '^';                                          
endfor;                                                      

//   Open text                                                 
oflag=o_rdonly + o_textdata;                                   
                                                               
// Open the file; fp is the open handle number                 
fp=open(%addr(dir_full): oflag);                               
                                                               
// Read the file's stats; rc is the return code (success/fail)
// rc=fstat(fp:%addr(info));                                   
                                                               
// Read the file one buffer at a time                          
SizeSoFar=0;                                                   
rcvvarsiz = %size(rcvvar);                                     
                                                               
DoW SizeSoFar < h_size;                                        
  BytesReturned=read(fp: %addr(rcvvar):                        
      rcvvarsiz);                                              
                                                               
  // Stop on end of file (0) or a read error (-1)              
  if BytesReturned <= 0;                                       
    leave;                                                     
  endif;                                                       
                                                               
  // If a partial buffer, trim the buffer to                   
  // match the received size                                   
  rcvvar=%subst(rcvvar:1:BytesReturned);                       
  SizeSoFar = SizeSoFar + BytesReturned;    

  // write the subfile record                     
  dsptext = %xlate(nonprint: eyecatcher: rcvvar);
  stmrrn += 1;                                    
  stmrrnMax += 1;                                 
  *in35 = *on;  // sfldsp                         
  write stms;                                     
                                                  
EndDO;                                                              

This code writes the raw stream file data to a DDS subfile.  The only translation this code does is to show the CR/LF with an eyecatcher so I can see where they fall in the data.  That's it.  And you can see that the hex value of the line feed is not the ASCII x'0a'; it is the EBCDIC x'25'.  This demonstrates the point that by the time the RPG program sees the stream file data [after the read()] it is encoded in the CCSID of the RPG job, not the stream file.

  --buck

Posted by: TFisher
Posted: 9 years 3 months 2 days 15 hours 27 minutes ago

Yes, my email program opens the files using Open() and the O_RDONLY + O_TEXTDATA options.  What I am saying is that we have a few programs that create files using code page 819, and some files are created with the "Convert text by code-page" option (O_CodePage) and some are not.  

This option is what tells the system to convert the data being written to the file to whatever the code page is being set to.  So when O_CodePage isn't specified when the file is opened, it appears the data is EBCDIC.  When I encode these files to base64 in my email program they contain garbage once they are received and opened by the email client.  Furthermore, if I convert the data to ASCII prior to encoding it works great.

When O_CodePage is specified the data appears to be converted to ASCII and when I encode these files to base64 in my email program they look perfect when they are received and opened by the email client.  In addition, if I try to convert this data to ASCII prior to encoding I get garbage.
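
To make the two cases concrete, here is a minimal sketch in Python (not the RPG in my email program).  It assumes Python's cp037 codec is a fair stand-in for EBCDIC CCSID 37 and latin-1 for CCSID 819; the function name and the is_ebcdic flag are made up for the example.  The point is only that base64 carries the bytes through unchanged, so whatever encoding goes in is what the email client gets back.

```python
import base64


def attachment_base64(raw: bytes, is_ebcdic: bool) -> str:
    """Return base64 text for a MIME part.

    If the caller says the bytes are EBCDIC (code page 37 assumed), convert
    them to Latin-1 first; otherwise encode the bytes exactly as they are.
    """
    if is_ebcdic:
        raw = raw.decode("cp037").encode("latin-1")
    return base64.b64encode(raw).decode("ascii")


line = "B0077,Y,123,25%,$122.00"
ebcdic_bytes = line.encode("cp037")    # what a program writing without conversion leaves behind
ascii_bytes = line.encode("latin-1")   # what a program writing with conversion leaves behind

# EBCDIC converted first: the client decodes readable text
print(base64.b64decode(attachment_base64(ebcdic_bytes, True)))

# EBCDIC encoded as-is: the client decodes the EBCDIC bytes, which display as garbage
print(base64.b64decode(attachment_base64(ebcdic_bytes, False)))
```

The hard part remains deciding what to pass for is_ebcdic, since the CCSID tag is 819 in both cases.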

So I am looking for something other than CCSID 819 to know if I need to convert EBCDIC to ASCII prior to encoding the data being placed in my MIME file to be emailed.

I am thinking that the solution is going to be to find all programs that are creating files on the IFS using Open() and Write() and always specify O_CodePage.  I just don't know why this option was used for some types of files and not used for others.  

Posted by: starbuck5250
Posted: 9 years 3 months 2 days 15 hours ago
Edited: Fri, 23 Jan, 2015 at 15:55:23 (3380 days ago)

> So I am looking for something other than CCSID 819 to know if I need to convert EBCDIC to ASCII prior to encoding the data being placed in my MIME file to be emailed.

It's guesswork. 

> I am thinking that the solution is going to be to find all programs that are creating files on the IFS using Open() and Write() and always specify O_CodePage.  I just don't know why this option was used for some types of files and not used for others.  

Yes, this is the right answer.  The more interesting question is this: If you have programs dumping EBCDIC characters into a file tagged with CCSID(819), what PC application is being used to read these files?

Oh, and don't use code page.  Use O_CCSID.  Here is some code that I use to make UTF16-encoded stream files:

D*                                            Reading Only             
D O_RDONLY        C                   1                                
D*                                            Writing Only             
D O_WRONLY        C                   2                                
D*                                            Reading & Writing        
D O_RDWR          C                   4                                
D*                                            Create File if not exist
D O_CREAT         C                   8                                
D*                                            Exclusively create       
D O_EXCL          C                   16                               
D*                                            Assign a CCSID           
D O_CCSID         C                   32                               
D*                                            Truncate File to 0 bytes
D O_TRUNC         C                   64                               
D*                                            Append to File           
D O_APPEND        C                   256                              
D*                                            Synchronous write        
D O_SYNC          C                   1024                             
D*                                            Sync write, data only    
D O_DSYNC         C                   2048                            
D*                                            Sync read               
D O_RSYNC         C                   4096                            
D*                                            No controlling terminal
D O_NOCTTY        C                   32768                           
D*                                            Share with readers only
D O_SHARE_RDONLY  C                   65536                           
D*                                            Share with writers only
D O_SHARE_WRONLY  C                   131072                          
D*                                            Share with read & write
D O_SHARE_RDWR    C                   262144                          
D*                                            Share with nobody.      
D O_SHARE_NONE    C                   524288                          
D*                                            Assign a code page      
D O_CODEPAGE      C                   8388608                         
D*                                            Open in text-mode       
D O_TEXTDATA      C                   16777216                        
D*                                            Allow text translation  
D*                                            on newly created file.  
D* Note: O_TEXT_CREAT requires all of the following flags to work:    
D*           O_CREAT+O_TEXTDATA+(O_CODEPAGE or O_CCSID)               
D O_TEXT_CREAT    C                   33554432                       
D*                                            Inherit mode from dir
D O_INHERITMODE   C                   134217728                    
D*                                            Large file access    
D*                                            (for >2GB files)     
D O_LARGEFILE     C                   536870912      

dcl-s utfline varucs2(16383) ccsid(1200);
dcl-s CCSID_MS_ASCII uns(10) inz(1252);   
dcl-s CCSID_UTF16BE  uns(10) inz(1200);   
dcl-s CCSID_EBCDIC   uns(10) inz(37);                  

bufOutfd = open( ifsOutFile : O_WRONLY + O_CREAT + O_INHERITMODE +
                              O_TRUNC + O_CCSID                   
                            : S_IRWXU                             
                            : CCSID_UTF16BE);                    


In older code, sometimes we used two open()s to set the code page.  One with O_CREAT and close() then another open() with O_WRONLY.  With newer IBM i releases this is not necessary - the code posted above creates a stream file with the proper CCSID using just the one open().

  --buck

Posted by: TFisher
Posted: 9 years 3 months 2 days 13 hours 36 minutes ago
Edited: Fri, 23 Jan, 2015 at 19:41:59 (3380 days ago)

These files are strictly for emailing.  So the applications convert the data or spool file to a file on the IFS and then they call some old email program that will read these files and build a MIME file.  So no PC program ever sees these files in /TMP.

I have till the end of the month to get ALL email attachments being sent from our AS/400 encoded.  We have a problem where attachments that are not encoded are being altered and will not display right if it's HTML and will not open correctly in Excel if it's a CSV file.  Adding another 50+ programs to my list of changes is going to be a challenge....even then I may not find them all.  This is why I was hoping to be able to find another way to know if the content is truly ASCII or if it's EBCDIC.  

Thanks!

 

I am still going to consider reading a block of the data and looking at the characters I find to make a good assumption. 

Posted by: starbuck5250
Posted: 9 years 3 months 1 days 23 hours 12 minutes ago

I'm not trying to be argumentative.  I'm trying to tease out the assumptions.  The new program is having issues with character encoding.  You found that out because the end user (the PC email client) could not read some of the emails.  The best we can figure is that's because sometimes the source data is in EBCDIC and some is in ASCII - the CCSID tag is incorrect.  Why does the current application chain not have the same problem?  If we can figure out what the difference is, we might be able to apply that to the new process.  In fact, this difference may be the root problem that caused you to embark on switching from MIME to BASE64.  Because frankly, if some downstream process is altering MIME data, it will do the same thing to BASE64.

With respect to a guessing algorithm, I myself assess the content with numbers and spaces.  ASCII numbers are x'30', 31, 32.  EBCDIC are x'f0', f1, f2, etc.  You are very unlikely to see x'F0' in an ASCII text file because ASCII nominally ends at x'7f'.  ASCII space is x'20'; EBCDIC x'40'.  If I see a lot of x'40' (which show up in Notepad as @) then I'm pretty sure I have an EBCDIC encoding.  Likewise, you aren't likely to see a lot of x'20' in an EBCDIC file because that's below the range we can enter on our keyboard.

1) Sort characters into bins (an array of 256 characters)

2) Count the characters between F0 and F9

3) Count the characters between 30 and 39

4) Count the number of 20

5) Count the number of 40

If you have a lot of 2 and 5, it's probably EBCDIC.  A lot of 3 and 4, probably ASCII. 
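
The five steps above can be sketched like this.  It is written in Python rather than RPG just to keep it short; the function name, the "unknown" result and the min_votes cutoff are assumptions added for the example.  It is still only a guess, and binary content (a zipped spreadsheet, say) can land in any of these bins.

```python
from collections import Counter


def guess_encoding(block: bytes, min_votes: int = 4) -> str:
    """Guess 'EBCDIC', 'ASCII' or 'unknown' from digit and space counts."""
    bins = Counter(block)                                     # 1) one bin per byte value
    ebcdic_digits = sum(bins[b] for b in range(0xF0, 0xFA))   # 2) F0..F9
    ascii_digits = sum(bins[b] for b in range(0x30, 0x3A))    # 3) 30..39
    ascii_spaces = bins[0x20]                                 # 4) x'20'
    ebcdic_spaces = bins[0x40]                                # 5) x'40'

    ebcdic_votes = ebcdic_digits + ebcdic_spaces              # "a lot of 2 and 5"
    ascii_votes = ascii_digits + ascii_spaces                 # "a lot of 3 and 4"

    # Too little evidence, or a tie: refuse to guess
    if max(ebcdic_votes, ascii_votes) < min_votes or ebcdic_votes == ascii_votes:
        return "unknown"
    return "EBCDIC" if ebcdic_votes > ascii_votes else "ASCII"


line = "B0077,Y,123,25%,$122.00"
print(guess_encoding(line.encode("cp037")))    # bytes as stored by an EBCDIC writer
print(guess_encoding(line.encode("latin-1")))  # bytes as stored by an ASCII writer
```

The block passed in should come from a binary read of the stream file, so that no text conversion has already changed the bytes being counted.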

Posted by: TFisher
Posted: 9 years 3 months 23 hours 55 minutes ago

Thanks, I got this working Friday night.  Now I can let the server guys make the change from Domino to Exchange.

Posted by: Ringer
Posted: 9 years 2 months 30 days 16 hours 16 minutes ago
Edited: Mon, 26 Jan, 2015 at 14:39:19 (3377 days ago)

Ah so the CCSID is wrong sometimes. That's a good sniff Buck. 

Posted by: TFisher
Posted: 9 years 2 months 30 days 14 hours 13 minutes ago

Yes, I thought I made that clear originally...sorry if I didn't.