Midrange News for the IBM i Community


Posted by: aahhdd
API to extract email address from a given string
Published: 15 Jun 2012
Revised: 23 Jan 2013 - 4108 days ago


COMMENTS

Posted by: bobcozzi
Site Admin ****
Chicagoland
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 4 days 17 hours 21 minutes ago

I'm not clear what you're looking for... is the string of data an XML string or is it simply an email address within a long string that may also contain XML?

If it is within an XML string, such as <email>bob@ibm.com</email>, then the native RPG IV XML-INTO opcode will parse the string and extract the data you need into a field.

Posted by: aahhdd
Premium member *
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 4 days 16 hours 22 minutes ago

Thanks bob,

 

Let me shed some more light on the question:

 

It is an XML string that will be longer than 1024 characters. Let us assume a length of 2048 for now. So in the PF it will be in FLD1 on RRN 1 and RRN 2 (as FLD1 has a length of 1024).

So the challenge here is:

(a) It is possible that the XML contains an email address in some other tag as well (i.e. other than the <Email> tag, which is the known place for an email address in the XML). So we need to search for email addresses everywhere. One approach is to identify the '@' symbol and pick up the information from there, but that may be a challenge because we cannot be sure of the number of characters in the email address before and after the '@'.

 

(b) It is also possible that an email address spreads from FLD1 at RRN 1 to FLD1 at RRN 2.

Example: the email address ABC@IBM.COM can be present as:
FLD1 at RRN 1 = '.......ABC@'

FLD1 at RRN 2 = 'IBM.com....'

 

We need to handle such cases as well. Further, we need to take user input and replace these addresses with the new address at the same location, so that the other fields in the XML are not disturbed.

 

 

 

Posted by: bobcozzi
Site Admin ****
Chicagoland
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 4 days 15 hours 33 minutes ago

Your choice is to use a defined XML string, and use XML-INTO to extract the email addresses from "known" XML tokens/nodes, or use %SCAN('@') and do exactly what you suggested, look for the start and end of the email address(es) you detect.

Posted by: neilrh
Premium member *
Jackson, MI
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 4 days 15 hours 26 minutes ago

I suppose one option is to scan for "@", and then scan for next end tag "</", extract the end tag name </["tagname"]>, then XML-INTO for that identified tag name.

As to folding data from 2 records, read RRN1, read RRN2 - concat the RRN1.FLD1 + RRN2.FLD1, scan that.  When you move to RRN3 change your scan field to concat RRN2.FLD1 + RRN3.FLD1, and just keep going.  Make sure to compare what you extract to stuff you already extracted.
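The record-pair idea above can be sketched as follows. This is only an illustration in Python rather than RPG, and it makes several assumptions that are not in the thread: a "word" is any run of characters between `<`, `>` or whitespace, no word is as long as a whole record, and the names `tokens_with_pos` and `scan_record_pairs` are made up for the example. Duplicates from overlapping windows are avoided by remembering the absolute start position of each hit.

```python
DELIMS = "<> \t\r\n"  # assumed word delimiters

def tokens_with_pos(text):
    """Yield (start_index, token) for each run of non-delimiter characters."""
    start = None
    for i, ch in enumerate(text):
        if ch in DELIMS:
            if start is not None:
                yield start, text[start:i]
                start = None
        elif start is None:
            start = i
    if start is not None:
        yield start, text[start:]

def scan_record_pairs(records):
    """Scan previous+current for every record so split values are seen whole."""
    found = []
    seen_starts = set()   # absolute start offsets already reported
    base = 0              # absolute offset of the window's first character
    previous = ""
    for n, current in enumerate(records):
        window = previous + current
        is_first = n == 0
        is_last = n == len(records) - 1
        for start, token in tokens_with_pos(window):
            if start == 0 and not is_first:
                continue  # may be cut off at the front; an earlier window saw it
            if start + len(token) == len(window) and not is_last:
                continue  # may continue into the next record; wait for it
            if "@" in token and base + start not in seen_starts:
                seen_starts.add(base + start)
                found.append(token)
        base += len(previous)
        previous = current
    return found

records = ["<a>x</a><Email>ABC@", "IBM.com</Email><b>jo", "e@x.org</b>"]
print(scan_record_pairs(records))
```

Tracking positions instead of comparing extracted values means the same address appearing twice in the data is still reported twice, while the overlap between windows is not.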

Posted by: clbirk
Premium member *
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 4 days 7 hours 44 minutes ago

There is another approach, one that I learned back in the 1970's when writing a text editor (or word processing) program.

 

Doesn't matter if the info is in 1 record, 2 records, if the information is in 1024 byte long records, or 128 byte long records, etc.

 

What you do is something of this nature:

Get a word

examine the word to see if it is an email address, if so write it out.

 

so what is a word?

When you call "get a word", it repeatedly calls "get a character", which takes one character at a time out of the input "array" and, when necessary, reads the next record.

 

You can define what the delimitation of a word is. And you can pass back control of such so that you know you are done getting a word.

 

For example, say you have this string
<name>John</name><email>john@doe.com</email><address>100 main street</address>....

 

The first word you get could be <name>

second word you get could be John

third word </name>

fourth word <email>

fifth word john@doe.com

sixth word </email>

etc.

 

When you examine each word, looking for an @ sign as one criterion, all fail but the fifth word. You might check some further rules, like a period after the @ sign, etc.
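A minimal sketch of the get-a-character / get-a-word idea, written in Python for brevity rather than in RPG or Ratfor. The function names and the delimiter rule (a tag is one word, the text between tags is another, whitespace does not split words) are assumptions chosen to match the example above, not anything taken from the book.

```python
def get_chars(records):
    """getchar: hand out one character at a time, moving on to the
    next record when the current one is used up."""
    for record in records:
        for ch in record:
            yield ch

def get_words(chars):
    """getword: build words from characters. A tag such as <name> is
    one word and the text between two tags is another."""
    word = ""
    for ch in chars:
        if ch == "<":
            if word:
                yield word
            word = "<"
        elif ch == ">":
            yield word + ">"
            word = ""
        else:
            word += ch
    if word:
        yield word

def looks_like_email(word):
    """Very rough test: something before the @ and a period after it."""
    at = word.find("@")
    return at > 0 and "." in word[at + 1:]

# The record boundary falls in the middle of the address on purpose.
records = ["<name>John</name><email>john@",
           "doe.com</email><address>100 main street</address>"]
words = list(get_words(get_chars(records)))
print(words)
print([w for w in words if looks_like_email(w)])
```

Because the word builder only ever sees a stream of characters, the record length never matters: 1024-byte records, 128-byte records or one long string all produce the same words.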

 

It is a structured approach in which you call subroutines (for example).

 

The book is called Software Tools, written by Brian Kernighan and P. J. Plauger (1976) as I recall. They were all connected with Bell Labs and Unix and even Ratfor. It is about structured programming, and getword, getchar, etc. were part of the concepts.

 

I had it as part of a course in computer science at Purdue in 1977. I can tell you that I have used this concept many, many times over the years. I get some poor excuse for XML, and so I take it and break it into fields like:

<name>             John Smith

<address>          100 Main St

<phone>           111-111-1111

 

etc. so that I can utilize it in a program (because there is no rhyme or reason to how some of it is laid out). I actually download it from the website through a web service, write it into a file with 1-byte records and, yep, do a get character and get word sort of thing.

Yes, the book was originally written with code in Fortran or Ratfor (rational Fortran), and in the process they really go and build a "precompiler" to add structured programming to Fortran. But the concepts are great. I haven't read the book since 1977, but like I said, I have used the concepts many times over the years to take some less than desirable input and turn it into what is needed.

 

So back to what I was saying: while one could take the 1024-byte records and write them out to a file with 1-byte records (which would be "easiest"), you can control all of that in the getchar routine, which is called by getword. Once you get a word, you examine it to see if it is an email address.

 

You could take it a bit further if you wanted: once you think you have an email address, call a web service (maybe host a PHP site on your i or on a $4.95/month site) whose PHP page simply calls the checkdnsrr function to verify that it is a valid domain name (that it has an MX record or an A record).

 

chris

 

 

     

Posted by: TFisher
Premium member *
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 3 days 20 hours 30 minutes ago

For the most part I agree with clbirk.  However, you do not need to examine each and every "word" to find email addresses.  Scan for '@' to find possible email addresses, then extract the word and see if it's an email address.  Then continue scanning from the end of the previous "word".
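A rough Python sketch of this scan-first variant: find each '@', widen outwards to the nearest delimiter on both sides, test the resulting word, then resume scanning from the end of that word. The delimiter set and the function name `emails_by_at_scan` are assumptions for the illustration, and the validity test is deliberately simple.

```python
DELIMITERS = set("<> \t\r\n")  # assumed word boundaries

def emails_by_at_scan(text):
    """Scan for '@', extract the surrounding word, keep it if it looks
    like an email address, and continue after the end of that word."""
    found = []
    pos = text.find("@")
    while pos != -1:
        start = pos
        while start > 0 and text[start - 1] not in DELIMITERS:
            start -= 1
        end = pos + 1
        while end < len(text) and text[end] not in DELIMITERS:
            end += 1
        word = text[start:end]
        local, _, domain = word.partition("@")
        # Require a local part, a period in the domain, and a single '@'.
        if local and "." in domain and "@" not in domain:
            found.append(word)
        pos = text.find("@", end)
    return found

sample = ("<name>John</name><email>john@doe.com</email>"
          "<note>meet @ noon</note>")
print(emails_by_at_scan(sample))
```

Only the text around each '@' is examined, so long stretches of data without an '@' cost a single scan.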

Posted by: DaleB
Premium member *
Reading, PA
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 3 days 18 hours 8 minutes ago

You can find e-mail embedded in an arbitrary string using a regular expression. One way to use regular expressions is with the C/C++ Run Time Library Functions (SC41-5607). Look at <regex.h>, which references four functions (compile, execute, return error message, and free memory). For a discussion of what makes a good regex for validating e-mail addresses, see http://www.regular-expressions.info/email.html.
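As an illustration of the regular-expression approach, here is a short sketch using Python's `re` module as a stand-in for the C `<regex.h>` functions (regcomp, regexec, regerror, regfree). The pattern is the commonly quoted general-purpose one and is an assumption for this example; it accepts most everyday addresses but is not a full validator.

```python
import re

# Local part, '@', domain labels, then a top-level domain of 2+ letters.
EMAIL_PATTERN = re.compile(
    r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b",
    re.IGNORECASE,
)

def find_emails(text):
    """Return every substring of text that matches EMAIL_PATTERN."""
    return EMAIL_PATTERN.findall(text)

sample = ("<email>john@doe.com</email><cc>ABC@IBM.COM</cc>"
          "<x>not an address @ all</x>")
print(find_emails(sample))
```

Note that a regular expression still relies on some character outside the address to mark where it starts and ends; it cannot split an address from letters that run straight into it.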

Posted by: Ringer
Premium member *
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 3 days 16 hours 24 minutes ago
Edited: Tue, 19 Jun, 2012 at 09:09:20 (4326 days ago)

You really need to use an XML parser. Why? Because the email address could have CR/LFs in it, and the '@' and other characters might be encoded as entities, like this. An XML parser already handles and decodes these.

Bob.O&apos;Reilly&#64;somewhere.com

Chris Ringer
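To illustrate the point about entities, here is a small Python sketch using the standard `xml.etree.ElementTree` parser in place of an RPG one. The element names in the sample and the function name `emails_in_xml` are made up for the example; it assumes the records have already been concatenated into one well-formed XML string. The raw text contains no literal '@' at all, yet the parsed text does.

```python
import xml.etree.ElementTree as ET

def emails_in_xml(xml_text):
    """Parse the XML and return (tag, text) for every element whose
    decoded text contains an '@', whatever the tag is called."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        text = (element.text or "").strip()  # drop surrounding CR/LF and blanks
        if "@" in text:
            found.append((element.tag, text))
    return found

raw = ("<employee><name>Bob</name><contact>\n"
       "  Bob.O&apos;Reilly&#64;somewhere.com\n"
       "</contact></employee>")
print("@" in raw)           # a plain scan of the raw string finds nothing
print(emails_in_xml(raw))
```

Walking every element rather than one named tag also covers the case where an address turns up somewhere other than the expected email tag.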

Posted by: aahhdd
Premium member *
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 2 days 59 minutes ago
Thank you all for the valuable suggestions. When I started this task and was preparing the blueprint for coding, I thought to search for '@' on the concatenated string (FLD1 at RRN 1 + FLD1 at RRN 2). Once we find '@', we read the characters before and after it to gather the email id.

But the challenge is how to limit the reading of characters before and after the '@'. The base string on which we are running the search is an employee data string where all the information is rammed together. There is no blank space after each piece of information: the last name starts right after the first name ends, the email id starts right after the last name ends, and so on.

Example: BOBcozziBOB@ABC.COMrpgexpert

Here first name, last name, email id, designation, etc. are all adjacent to each other with no spaces in between. So once I find '@', how do I decide, reading backwards, that I have to stop before the 'i' of cozzi?

Is there any way to solve this? Otherwise I think I should recommend changing the way this string is sent. It should have blanks after each field.
Posted by: DaleB
Premium member *
Reading, PA
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 1 days 18 hours 6 minutes ago

If there's no separator whatsoever, this problem is unsolvable. For that matter, it's not just the e-mail; there's no way to separate first name from last name from anything else. In your original post you said it was XML. Where are the tags?

Posted by: clbirk
Premium member *
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 1 days 14 hours 12 minutes ago

I too thought it was XML you were talking about, not a long string of gobbledygook. As you said, you might want to get the input changed. I would not just go for a space; I would go for something like a tab, which has a particular hexadecimal value that you can search for. If you didn't want a tab, then how about a CR or even a vertical bar | etc.

 

 

Posted by: aahhdd
Premium member *
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 22 hours 53 minutes ago
It is XML, where the 'employeedetails' tag is one part of the complete XML and stores the employee data. As DaleB also pointed out, there is no way to resolve this issue as things stand. So there has to be a separator in the 'employeedetails' tag to distinguish the pieces of information. As suggested by clbirk, using a '|' is also a good option. Thank you for the comments. I will discuss changing the way the information is sent with the party concerned.
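With a '|' separator in place, both finding and replacing the address become straightforward. The sketch below is in Python and assumes a field order of first name, last name, email, designation, taken from the earlier example; the function names are hypothetical.

```python
def split_employee_details(details, separator="|"):
    """Split the employeedetails text into fields and pick out the
    fields that contain an '@'."""
    fields = details.split(separator)
    emails = [field for field in fields if "@" in field]
    return fields, emails

def replace_email(details, new_address, separator="|"):
    """Swap every field containing '@' for new_address, leaving the
    other fields and their order untouched."""
    fields = details.split(separator)
    return separator.join(
        new_address if "@" in field else field for field in fields
    )

details = "BOB|cozzi|BOB@ABC.COM|rpgexpert"
print(split_employee_details(details))
print(replace_email(details, "bob@example.com"))
```

If the new address has a different length than the old one, the rebuilt string has to be re-split into 1024-byte records before it is written back.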
Posted by: neilrh
Premium member *
Jackson, MI
Comment on: API to extract email address from a given string
Posted: 11 years 10 months 18 hours 24 minutes ago

If it were good xml then the data should be <employeedetails><firstname>Bob</firstname><lastname>Cozzi</lastname>.  The whole point of xml is to provide a data map.