There is another approach, one that I learned back in the 1970s when writing a text editor (or word processing) program.
It doesn't matter whether the information is in one record or two, or whether it comes in 1024-byte records or 128-byte records.
What you do is something of this nature:
Get a word.
Examine the word to see if it is an email address; if so, write it out.
So what is a word?
When you call get a word (getword), it repeatedly calls get a character (getchar), which takes one character at a time out of the input "array" and reads the next record when necessary.
You define what delimits a word, and getword can pass back a status so that you know when it is done getting a word (or has reached the end of the input).
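Here is a minimal Python sketch of a getchar/getword pair like the one just described (the book's own code is in Ratfor; this is not it). The class name WordReader and the delimiter rules are assumptions for illustration: each tag is one word, and whitespace separates words. The input is a list of record strings of any length, which is the point: getword never sees where one record ends and the next begins.

```python
from typing import Iterable, Optional


class WordReader:
    """getchar/getword over input that arrives in records of any length.

    Assumed word rules (hypothetical; define your own delimiters):
      * '<' starts a new word and '>' ends it, so each tag is one word;
      * whitespace ends a word and is discarded.
    """

    def __init__(self, records: Iterable[str]) -> None:
        self._records = iter(records)
        self._buffer = ""                 # the current record
        self._pos = 0                     # next character to hand out
        self._pushback: list[str] = []    # characters put back for the next read

    def getchar(self) -> Optional[str]:
        """Return the next character, or None at end of input."""
        if self._pushback:
            return self._pushback.pop()
        # Read the next record only when the current one is used up
        # (the loop also skips over empty records).
        while self._pos >= len(self._buffer):
            try:
                self._buffer = next(self._records)
            except StopIteration:
                return None
            self._pos = 0
        ch = self._buffer[self._pos]
        self._pos += 1
        return ch

    def getword(self) -> Optional[str]:
        """Return the next word, or None when there are no more words."""
        word: list[str] = []
        while (ch := self.getchar()) is not None:
            if ch.isspace():
                if word:
                    break                 # whitespace ends a word
                continue                  # skip whitespace between words
            if ch == "<" and word:
                self._pushback.append(ch) # '<' starts the next word; put it back
                break
            word.append(ch)
            if ch == ">":
                break                     # '>' closes a tag word
        return "".join(word) if word else None
```

Fed the string `<name>John</name><email>email@example.com</email>` split into records of any size (1, 5, 128, or 1024 characters), `list(iter(reader.getword, None))` returns `<name>`, `John`, `</name>`, `<email>`, `email@example.com`, `</email>`. The pushback list is how getword "puts back" a `<` it read too early.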
For example, say you have this string:
<name>John</name><email>email@example.com</email><address>100 main street</address>....
First word you get could be <name>
Second word: John
Third word: </name>
Fourth word: <email>
Fifth word: email@example.com
Sixth word: </email>
When you examine each word for an @ sign as one criterion, all fail but the fifth word. You might then apply further rules, such as requiring a period after the @ sign.
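The @-sign test plus a follow-up rule fit in a small predicate. This Python sketch uses assumed rules (exactly one @, something before it, and a period after it that is neither the first nor the last character of the domain); the name looks_like_email is hypothetical, and it is a cheap screen, not full RFC 5322 validation.

```python
def looks_like_email(word: str) -> bool:
    """Cheap screening test for a candidate email address (assumed rules)."""
    if word.count("@") != 1:
        return False                      # no @, or more than one
    local, domain = word.split("@")
    if not local or "." not in domain:
        return False                      # nothing before @, or no period after it
    # Reject a domain that starts or ends with '.', e.g. "a@.com" or "a@b."
    return not (domain.startswith(".") or domain.endswith("."))
```

Run over the six words of the example, only `email@example.com` passes.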
It is a structured approach in which each step is a routine you call (a subroutine, for example).
The book, as I recall, is Software Tools by Brian Kernighan and P. J. Plauger (1976); they were connected with Bell Labs, Unix, and even Ratfor. It is about structured programming, and getword, getchar, etc. were part of its concepts.
I had it in a computer science course at Purdue in 1977, and I can tell you that I have used this concept many, many times over the years. When I get some poor excuse for XML, I take it and break it into fields like:
<name> John Smith
<address> 100 Main St
and so on, so that I can use it in a program (because there is no rhyme or reason to how some of it is laid out). I actually download it from the website through a web service, write it into a file with a 1-byte record length, and, yep, do a get-character and get-word sort of thing.
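Breaking input into `<tag> value` lines is the same idea with a different word definition: here a word is either a tag or the whole run of text between tags, spaces included, so "John Smith" stays together. The Python function name xml_ish_to_fields is hypothetical, and the sketch assumes tags are not nested and carry no attributes; for brevity it folds the character and word loops into one.

```python
def xml_ish_to_fields(text: str) -> list[str]:
    """Turn '<tag>value</tag>...' into lines like '<name> John Smith'."""
    fields: list[str] = []
    current_tag = None        # most recent opening tag, e.g. "<name>"
    word: list[str] = []      # characters of the word being collected
    in_tag = False
    for ch in text + "<":     # a sentinel '<' flushes any trailing text
        if ch == "<":
            # End of a text word: pair it with the open tag, if any.
            value = "".join(word).strip()
            if current_tag and value:
                fields.append(f"{current_tag} {value}")
            word, in_tag = ["<"], True
        elif ch == ">" and in_tag:
            tag = "".join(word) + ">"
            # An opening tag names the next field; a closing tag ends it.
            current_tag = None if tag.startswith("</") else tag
            word, in_tag = [], False
        else:
            word.append(ch)
    return fields
```

For `<name>John Smith</name><address>100 Main St</address>` this returns `["<name> John Smith", "<address> 100 Main St"]`.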
Yes, the book's code was originally written in Fortran, or rather Ratfor (Rational Fortran), and in the book they actually build a "precompiler" that adds structured programming to Fortran. But the concepts are great. I haven't read the book since 1977, but as I said, I have used the concepts many times over the years to take less-than-desirable input and turn it into what is needed.
So, back to what I was saying: while you could take the 1024-byte records and write them out to a file with a 1-byte record length (which would be "easiest"), you can control all of that in the getchar routine, which is called by getword. Once you get a word, you examine it to see if it is an email address.
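Putting the pieces together, here is a compact Python sketch using generators: getchar hands out characters and only pulls the next record when the current one is exhausted, getword groups them into words, and the driver keeps the words that pass a simple email screen. The function names and the word and email rules are assumptions for illustration. Because getchar hides the record boundaries, the same code works whether the input arrives as 1024-byte records, 128-byte records, or 1-byte records.

```python
from typing import Iterable, Iterator


def getchar(records: Iterable[str]) -> Iterator[str]:
    """Yield one character at a time, reading the next record only when needed."""
    for record in records:
        yield from record


def getword(chars: Iterator[str]) -> Iterator[str]:
    """Yield words: each <tag> is a word, and whitespace separates words."""
    word = ""
    for ch in chars:
        if ch == "<" and word:       # a tag starts: finish the pending word
            yield word
            word = ""
        if ch.isspace():
            if word:
                yield word
            word = ""
        else:
            word += ch
            if ch == ">":            # a tag ends: it is a complete word
                yield word
                word = ""
    if word:
        yield word


def looks_like_email(word: str) -> bool:
    """Screening test only: one '@', text before it, and a period after it."""
    local, sep, domain = word.partition("@")
    return bool(sep and local and "@" not in domain and "." in domain.strip("."))


def extract_emails(records: Iterable[str]) -> list[str]:
    """Get a word; if it looks like an email address, write it out."""
    return [w for w in getword(getchar(records)) if looks_like_email(w)]
```

Passing a file object or any other lazy iterator of records keeps only one record in memory at a time, which is the same effect as doing the buffering inside getchar.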
You could take it a bit further if you wanted: once you think you have an email address, call a web service (perhaps a PHP page hosted on your IBM i, or on a $4.95/month hosting site) where the PHP page simply calls checkdnsrr() to verify that the domain is valid (that it has an MX record or an A record).
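If you would rather not stand up a PHP page just for checkdnsrr(), here is a Python sketch of a similar local check. Python's standard library has no MX lookup, so this only checks whether the domain resolves to an IPv4 address via socket.gethostbyname (typically an A record); a domain with only an MX record would be rejected. The function name and the resolve parameter (there so a stub can replace real DNS in testing) are assumptions; real MX queries would need a third-party DNS package such as dnspython.

```python
import socket
from typing import Callable


def domain_resolves(address: str,
                    resolve: Callable[[str], object] = socket.gethostbyname) -> bool:
    """Return True if the part after the last '@' resolves to an address."""
    _, sep, domain = address.rpartition("@")
    if not sep or not domain:
        return False                  # no '@', or nothing after it
    try:
        resolve(domain)               # raises socket.gaierror if lookup fails
    except (OSError, UnicodeError):   # gaierror is an OSError; bad labels raise UnicodeError
        return False
    return True
```

Called with the default resolver this does a live lookup, so in a batch job you would probably run it only on words that already passed the @-sign screen.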