Reading EDI Formatted Files

I'm new to EDI, and I have a question.

I have read that you can get most of what you need about an EDI format by looking at the last 3 characters of the ISA line. This is fine if every EDI file used line breaks to separate entities, but I have found that many are single-line files that use any of a number of characters as breaks. I have noticed that the VERY last character in every EDI file I've parsed is the break character. I've looked at a few hundred and have found no exceptions to this. If I first grab that character and use it to locate the last 3 characters of the ISA line, can I reasonably expect to be able to parse the data from an EDI file?

I don't know if this helps, but the EDI 'types' in question tend to be 850 and 875. I'm not sure whether that is a standard, but it may be worth mentioning.

Hove asked 22/1, 2010 at 14:33 Comment(5)
EDI in 2010? I thought XML was a little easier to work with. - Lallage
90% of revenue is coming from EDI. Walmart, Target, Toys R Us, and other big retailers make up 50% of that. We don't use EDI because we like it; we use it because our customers do. It's not worth the time/money for any of these big retailers to change to another format, because it works. - Hove
When I say 90% of revenue, I hope it was understood that I meant 90% of my company's revenue. - Hove
edi.stedi.com/inspector - Panhandle
It might be a surprise to you, but EDI is just EDI: generally speaking, it covers all of your data integration. EDIFACT and X12 are still alive and kicking, and both are quite easy to interpret with a little guidance, as both formats are well documented. - Questionless

The transaction type of EDI doesn't really matter (850 = purchase order, 875 = grocery products purchase order). Having written a few EDI parsers, here are a few things I've found:

You should be able to count on the ISA (and the ISA only) being fixed width (105 characters if memory serves, not counting its terminator). Strip off the first 105 characters. Everything after that and before the first occurrence of "GS" is your line (segment) terminator. This can be anything, including 0x07 (the bell character), so watch out if you're writing to stdout for debugging or you may get a bunch of beeps coming out of the speaker. Normally this is 1 or 2 characters; sometimes it can be more (if the person sending you the data adds an extra terminator for some reason). Once you have the line terminator, you can get the element (field) delimiter. I normally pull the 3rd character of the GS line and use that, though the 4th character of the ISA line should work as well.
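The fixed-width ISA rule above can be sketched in Java roughly like this. It assumes a standard, padded 105-character ISA; the class name IsaDelimiters and its fields are made up for illustration, and the sketch also reads ISA16 (the component separator) from the last ISA position:

```java
/**
 * Delimiters of one X12 interchange, read from a fixed-width (105-character) ISA.
 * Hypothetical helper for illustration, not a library class.
 */
final class IsaDelimiters {
    final char elementSeparator;    // 4th character of the ISA (index 3)
    final char componentSeparator;  // ISA16, the last character of the 105-character ISA
    final String segmentTerminator; // everything between the end of the ISA and "GS"

    private IsaDelimiters(char elementSeparator, char componentSeparator, String segmentTerminator) {
        this.elementSeparator = elementSeparator;
        this.componentSeparator = componentSeparator;
        this.segmentTerminator = segmentTerminator;
    }

    static IsaDelimiters detect(String interchange) {
        final int isaLength = 105; // assumed fixed width, not counting the terminator
        if (!interchange.startsWith("ISA") || interchange.length() <= isaLength) {
            throw new IllegalArgumentException("data does not start with a complete ISA segment");
        }
        // Search only past the ISA, so nothing inside the header is mistaken for "GS".
        int gs = interchange.indexOf("GS", isaLength);
        if (gs <= isaLength) {
            throw new IllegalArgumentException("no segment terminator and GS segment after the ISA");
        }
        return new IsaDelimiters(
                interchange.charAt(3),
                interchange.charAt(isaLength - 1),
                interchange.substring(isaLength, gs)); // may be 1, 2 or more characters, e.g. "~\r\n"
    }
}
```

If the terminator comes back as more than one character (for example "~" followed by CR/LF), split the rest of the interchange on the whole string rather than on its first character.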

Also be aware that you can get a file with multiple ISAs in it. In that case you cannot count on the line or field separators being the same for each ISA.
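One way to cope with several ISAs in one file is to detect the delimiters separately for each interchange and cut the file at each IEA. A rough sketch, assuming fixed-width ISAs and treating a missing IEA as a partial transmission; InterchangeSplitter is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a file into ISA...IEA interchanges, each with its own delimiters. */
final class InterchangeSplitter {
    static List<String> split(String file) {
        List<String> interchanges = new ArrayList<>();
        int start = file.indexOf("ISA");
        while (start >= 0) {
            // Assumes a fixed-width ISA: element separator at offset 3, terminator from offset 105 to "GS".
            if (file.length() - start <= 105) {
                throw new IllegalArgumentException("truncated ISA at offset " + start);
            }
            char elementSeparator = file.charAt(start + 3);
            int gs = file.indexOf("GS", start + 105);
            if (gs <= start + 105) {
                throw new IllegalArgumentException("no terminator and GS after ISA at offset " + start);
            }
            String terminator = file.substring(start + 105, gs);

            // Find the IEA trailer using this interchange's own delimiters.
            int iea = file.indexOf(terminator + "IEA" + elementSeparator, gs);
            if (iea < 0) {
                throw new IllegalArgumentException("partial transmission: no IEA for ISA at offset " + start);
            }
            // Keep the IEA and its terminator; if the terminator is missing, take the rest of the file.
            int end = file.indexOf(terminator, iea + terminator.length());
            end = (end < 0) ? file.length() : end + terminator.length();
            interchanges.add(file.substring(start, end));

            // Anything between this IEA and the next ISA (e.g. trailing nulls) is discarded.
            start = file.indexOf("ISA", end);
        }
        return interchanges;
    }
}
```

Searching for the terminator followed by "IEA" and the element separator, rather than for "IEA" alone, makes it less likely that data inside a segment is mistaken for the trailer.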

Another thing: it is also possible (again, not sure if it's in the spec) for an EDI file to have a variable-length ISA. This is very rare, but I had to accommodate it. If that happens, you have to parse the line into its fields. The last field in the ISA is only one character long, so you can determine the real length of the ISA from it. If it were me, I wouldn't worry about this unless you see a file like it. It is a rare occurrence.
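For the rare variable-length ISA, one approach is to count element separators instead of trusting the 105-character width: the separator in front of ISA16 is the 16th one, and since ISA16 is a single character, the terminator begins right after it. A sketch under those assumptions (VariableIsa is a hypothetical name):

```java
/** Hypothetical helper: finds the real ISA length when the fields are not padded to fixed width. */
final class VariableIsa {
    /** Returns the length of the ISA segment, not counting its terminator. */
    static int isaLength(String data) {
        if (data.length() < 4 || !data.startsWith("ISA")) {
            throw new IllegalArgumentException("data does not start with an ISA segment");
        }
        char elementSeparator = data.charAt(3); // the separator in front of ISA01
        int pos = 3;
        // Walk forward 15 more separators to the one in front of ISA16.
        for (int field = 2; field <= 16; field++) {
            pos = data.indexOf(elementSeparator, pos + 1);
            if (pos < 0) {
                throw new IllegalArgumentException("ISA has fewer than 16 elements");
            }
        }
        if (pos + 1 >= data.length()) {
            throw new IllegalArgumentException("ISA16 is missing");
        }
        // ISA16 is exactly one character, so the segment terminator starts right after it.
        return pos + 2;
    }
}
```

For a normal padded ISA this returns 105, so the same code can serve as a sanity check on files that are supposed to be fixed width.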

What I've said above may not be to the letter of the "spec"; that is, I'm not sure it's legal to have different line separators in the same file (in different ISAs), but it is technically possible, and I accommodate it because I have to process files that come through in that manner. The EDI processor I use processes upwards of 5000 files a day from over 3000 possible sources of data (so I see a lot of weird stuff).

best regards, don

Label answered 22/1, 2010 at 14:51 Comment(12)
Don, that was a great response. I figured I could count on the last char of the file to be my line terminator, but that would only be true if a single ISA is in use, and even then, it doesn't accommodate situations where more than 1 char is used as a line terminator. I haven't seen more than one ISA per EDI file where I work, nor anything over a single char as a line terminator, but I might as well be prepared for it. - Hove
Ya, be careful with that. I see a lot of files where people put an extra character or two after the line terminator ... usually a null or two (0x00). What I do is first normalize the line terminators in the file, that is, rewrite the file with 0x0D/0x0A as the line terminator. I do that because it makes the file easy to read in a text editor. Then I go through the file and make sure that for every ISA there is a matching IEA. If there's extra data after the IEA, I usually discard it. If the data after the IEA starts with ISA, that means it's a partial transmission (error condition). - Label
Oops, I meant to say "an extra character or two after the LAST line terminator" ... at the end of the file. - Label
Don, I'm curious as to whether or not you have come across a SEGMENT terminator that is more than 1 char. I know this can be true of line terminators (although I have not seen this yet), as you alerted me to it. - Hove
And yet another question. You mentioned that a file may have multiple ISAs, which means field terminators may change, but have you ever come across a file with multiple line terminators? Seems to me that a situation like that would be a pain to parse. - Hove
I've seen it, but it is awfully rare. I deal with a lot of involved parties, and sometimes one of them will put previously separate data into one file and cause this. My software handles it (I think). I wouldn't worry about it; it is probably easier to get the sender to fix things than it is to accommodate that. - Label
Don, thank you for all of the answers. I now have a fairly functional EDI parser that performs look-ups on our client database. Our CFO will thank you when I'm done with this project. - Hove
Glad I could help. I've been out of town a couple of days and I'm not sure if I properly responded to your question about a segment terminator being more than one character ... I've never seen that. - Label
5000 files is a lot. Do you mind sharing what library you're using to parse the files, or do you write your own? - Bathtub
@bakalolo sorry for the very late response, I haven't been on much lately. I wrote it myself and don't really have access to the code anymore, as I left that company about 8 years ago. The only way we could get the performance needed was a custom parser. - Label
@DonDickinson Can you do EDI file validation or ack generation with a custom parser? - Bathtub
As I recall, the parser validated that the ISA/GS/ST segments were properly nested. I ignored invalid lines of data and didn't check that the correct number of fields were sent on a line ... as long as the ones I cared about were present. I did generate 997s for all docs received. - Label
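Don's comments mention making sure every ISA has a matching IEA and that ISA/GS/ST are properly nested. A minimal sketch of that nesting check, assuming the segments have already been split and only their IDs are passed in; like the parser described above, it ignores data segments and does not count fields. EnvelopeChecker is a hypothetical name:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper: checks ISA/GS/ST envelope nesting over a list of segment IDs. */
final class EnvelopeChecker {
    // Each envelope header and the trailer that must close it.
    private static final Map<String, String> TRAILER = Map.of("ISA", "IEA", "GS", "GE", "ST", "SE");
    // The header each envelope must sit directly inside; ISA has no parent.
    private static final Map<String, String> PARENT = Map.of("GS", "ISA", "ST", "GS");

    static boolean wellNested(List<String> segmentIds) {
        Deque<String> open = new ArrayDeque<>();
        for (String id : segmentIds) {
            if (TRAILER.containsKey(id)) {
                // PARENT.get("ISA") and peek() on an empty stack are both null, so a top-level ISA passes.
                if (!Objects.equals(PARENT.get(id), open.peek())) {
                    return false;
                }
                open.push(id);
            } else if (TRAILER.containsValue(id)) {
                if (open.isEmpty() || !TRAILER.get(open.pop()).equals(id)) {
                    return false;
                }
            }
            // Any other segment (BEG, PO1, ...) is data and is ignored here.
        }
        // Anything still open means a header without its trailer, e.g. a partial transmission.
        return open.isEmpty();
    }
}
```

A stack fits here because each trailer must close the most recently opened envelope; extending it to compare control numbers (for example ISA13 against IEA02) would need the full segments rather than just their IDs.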

EDI content is composed of segments and elements.

To parse it, you will need to break it up into segments first and then into elements, like so (in PHP):

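<?php

$edi = "YOUR EDI STRING!";
$segment_delimiter = "~";
$element_delimiter = "*";

// First break it into segments
$segments = explode($segment_delimiter, $edi);

// Now break each segment into elements
$segs_and_elems = array();
foreach ($segments as $segment) {
    $segment = trim($segment, "\r\n"); // drop any line breaks left between segments
    if ($segment === "") { continue; }  // skip the empty piece after the final "~"
    $segs_and_elems[] = explode($element_delimiter, $segment);
}

// To echo out what type of EDI this is, for example
// (GS01 is the functional group code, e.g. "PO" for an 850):
foreach ($segs_and_elems as $seg) {
    if ($seg[0] == "GS") { echo($seg[1]); }
}

?>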

Hope this helps get you started.
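The GS check above prints the functional group code (for example PO); the transaction set number from the question (850, 875) is in ST01. A rough Java equivalent that collects the ST01 values, assuming the segment terminator and element separator are already known; TransactionTypes is an illustrative name, not a library class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: lists the ST01 transaction set IDs (e.g. 850, 875) in an interchange. */
final class TransactionTypes {
    static List<String> of(String edi, String segmentTerminator, char elementSeparator) {
        List<String> types = new ArrayList<>();
        // Pattern.quote because String.split takes a regex and separators such as "*" or "|" are metacharacters.
        String elementRegex = Pattern.quote(String.valueOf(elementSeparator));
        for (String segment : edi.split(Pattern.quote(segmentTerminator))) {
            // strip() removes stray CR/LF between segments before splitting into elements.
            String[] elements = segment.strip().split(elementRegex);
            if (elements.length > 1 && elements[0].equals("ST")) {
                types.add(elements[1]); // ST01
            }
        }
        return types;
    }
}
```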

Astrograph answered 1/12, 2014 at 21:17 Comment(0)

For header information, the following Java will let you get the basic info pretty easily. C# has Split as well, and the code looks very similar.

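import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Pattern;

public class IsaHeaderReader {
    public static void main(String[] args) throws IOException {
        String filePathName = args[0];
        try (BufferedReader fileContent = new BufferedReader(new FileReader(filePathName))) {
            // The ISA comes first; if everything is on one line, this reads the whole file.
            String sCurrentLine = fileContent.readLine();
            if (sCurrentLine == null || !sCurrentLine.startsWith("ISA")) {
                return; // empty file or not an X12 interchange
            }

            // Get the delimiter after ISA; if you know your field delimiter, just force it.
            // We look at lots of different senders' messages, so we're never sure what it will be.
            String delimiterElement = sCurrentLine.substring(3, 4); // the 4th character

            // split() takes a regex, so quote the delimiter ("*" and "|" are regex metacharacters).
            // Limit 17: "ISA" plus ISA01..ISA15 as separate pieces, ISA16 and the rest of the line last.
            String[] splitMessage = sCurrentLine.split(Pattern.quote(delimiterElement), 17);
            String senderQualifier = splitMessage[5]; // ISA05: who sent it (qualifier)
            String senderID = splitMessage[6];        // ISA06: who sent it (ID, space-padded)
            String dateStamp = splitMessage[9];       // ISA09: date, YYMMDD
            String timeStamp = splitMessage[10];      // ISA10: time, HHMM
            String controlNumber = splitMessage[13];  // ISA13: interchange control number
            String testIndicator = splitMessage[15];  // ISA15: T = test, P = production

            // ... do stuff with the pieces of info ...
        }
    }
}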
Acanthocephalan answered 23/9, 2016 at 12:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.