I get an archive that contains, for example, &amp; and &#E8; – etc., and I need them converted to the appropriate EBCDIC values. Is there a function that unencodes the values in a string that is passed to it? Currently I have written logic to handle the most common ones I get, but...
Example:
Jack &amp; Jill would become Jack & Jill
You can use the unescape() function in COZTOOLS.
cleanData = unEscape( 'Jack &amp; Jill');
Try the RPG XML-SAX opcode. It unencodes all the entities, including &lt; &gt; &#x2B; (hex), &#39; (decimal) and Unicode too (2-byte chars).
Chris Ringer
I wouldn't bother with XML-SAX, I'd use XML-INTO. If you are on v6.1 or later there is almost never a reason to bother with the complexity of XML-SAX. It is there (mostly) for folks who had previous experience with XML parsers in other languages and want to continue to use it.
Don't get me wrong, if you know it, use it. But if you're an RPG programmer in the 99.9999% bracket (meaning you probably are or wouldn't be asking about escape characters) then I'd go with XML-INTO. Here's an example that does what you want it to do (note you did have some errors in your escape characters; for example, &#E8; is wrong, it should be &#xE8; in hex form). But anyway, here's an RPG solution not using any 3rd-party software at all:
     D szXML           S            512A   Inz('<myxml> +
     D                                     <encoded> Bob Cozzi +
     D                                     likes – pickles +
     D                                     è –+
     D                                     – pickles+
     D                                     </encoded> +
     D                                     </myxml>')
     D encoded         S            512A
     C                   MOVE      *ON           *INLR
      /free
        xml-into encoded %xml( szXML : 'path=myxml/encoded');
      /end-free
Maybe we need clarification from clbirk, but his topic was that this is html, not xml.
As far as I know, HTML and XML share the same entities.
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
Chris R
Yes, but using RPG XML-xxx opcodes probably won't work on HTML (though I've never tried it myself, so can't say for sure).
Well-formed HTML can be parsed as XML (which was the point of XHTML a few years ago): every <p> has a </p>, <br> is written as <br />, etc. If clbirk already has the extracted string, then wrap any <tag> around it and parse it as XML.
The XHTML !DOCTYPE spec died because most HTML web pages are not well formed, so they would break under it (not parse correctly).
http://en.wikipedia.org/wiki/XHTML
Chris R
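To illustrate the "wrap a tag around it and parse it as XML" idea outside of RPG, here is a minimal Python sketch. The function name decode_via_xml and the <wrapper> tag are made up for this example, and it assumes the fragment is plain text with no child tags. It also shows one caveat: a plain XML parser only knows the five predefined entities plus numeric references, so an HTML-only name such as &nbsp; is rejected unless a DTD defines it.

```python
import xml.etree.ElementTree as ET


def decode_via_xml(fragment):
    """Wrap a text fragment in a throwaway tag and let the XML parser decode it.

    Hypothetical helper for illustration; assumes the fragment has no child tags.
    """
    return ET.fromstring("<wrapper>" + fragment + "</wrapper>").text


def is_parseable(fragment):
    """Return True if the wrapped fragment parses as XML, False otherwise."""
    try:
        decode_via_xml(fragment)
        return True
    except ET.ParseError:
        return False


# Predefined and numeric entities are decoded by the parser.
decoded = decode_via_xml("Jack &amp; Jill &#xE8;")

# An HTML-only named entity is undefined in plain XML and fails to parse.
nbsp_ok = is_parseable("Jack&nbsp;Jill")
```

Whether XML-INTO treats HTML-only entity names the same way is not something this sketch can confirm; it only shows the general XML rule.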
I actually tested my example (above) and it works. It doesn't care what the content is; if it is HTML, it just has to be well-formed, XML-like HTML. But it sounded like he just wanted to un-escape the URL-encoded data stream. To do that, he can smash it between an XML tag and let XML-INTO have at it.
Further testing shows that XML-INTO (and I assume XML-SAX) will not un-escape (decode) a URL-encoded string. Therefore, URL encoding such as %23, the plus sign, and so on is not decoded by XML-INTO. Only stuff embedded in XML using full escape codes (not URL-encoded shortcuts) will be decoded.
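The difference between the two encodings can be shown with a short Python sketch (Python is used here only as a neutral illustration; the sample strings are made up). Entity decoding and URL decoding are separate steps handled by separate functions, and an entity decoder leaves %xx sequences and plus signs alone.

```python
from html import unescape
from urllib.parse import unquote_plus

# Sample inputs invented for this illustration.
entity_text = "Jack &amp; Jill &#xE8; &#8211; pickles"
url_text = "Jack+%26+Jill+%23"

# Entity decoding handles &name; and &#...; references.
decoded_entities = unescape(entity_text)

# URL decoding handles %xx sequences and '+' (as a space).
decoded_url = unquote_plus(url_text)

# An entity decoder does nothing to URL-encoded text.
untouched = unescape(url_text)
```

So if a feed mixes both styles, each layer needs its own decoding pass.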
It is xml (if you can call it that). It is an archive that I get with orders and the person who "designed" the "xml", well I shall byte my tongue.
These are typically the & ; sort of things, and of course many like spaces are not encoded.
<info1>Jack &amp; Jill</info1>
Sometimes they double encode it, like:
<info1>Jack &amp;amp; Jill</info1>
They are not smart enough to realize they are calling the encoding function twice, and I haven't been able to get that fixed by the outside developer.
Thanks for the information.
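For the double-encoded case, one possible workaround is to decode repeatedly until the text stops changing. A minimal Python sketch of that idea follows; the name unescape_fully and the max_passes limit are assumptions made for this example, not part of any tool mentioned in this thread. The same loop could be built around whatever decoder is used in RPG.

```python
from html import unescape


def unescape_fully(text, max_passes=5):
    """Decode entities repeatedly until a pass changes nothing.

    Hypothetical helper; max_passes is an arbitrary safety limit.
    """
    for _ in range(max_passes):
        decoded = unescape(text)
        if decoded == text:
            break  # nothing left to decode
        text = decoded
    return text


single = unescape_fully("Jack &amp; Jill")
double = unescape_fully("Jack &amp;amp; Jill")
```

The trade-off is that data which legitimately contains an escaped entity (for instance, text that is meant to display the characters "&lt;") will be decoded one level too far, so this only suits feeds where double encoding is known to be a mistake.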