For those that are experts in lexing and parsing... I am attempting to write a series of programs in perl that would parse out IBM mainframe z/OS JCL for a variety of purposes, but am hitting a roadblock in methodology. I am mostly following the lexing/parsing ideology put forth in "Higher Order Perl" by Mark Jason Dominus, but there are some things that I can't quite figure out how to do.
JCL has what's called inline data, which is very similar to "here" documents. I am not quite sure how to lex these into tokens.
The layout for inline data is as follows:
//DDNAME DD *
this is the inline data
this is some more inline data
/*
...
Conventionally, the "*" after the "DD" signifies that following lines are the inline data itself, terminated by either "/*" or the next valid JCL record (starting with "//" in the first 2 columns).
More advanced, the inline data could appear as such:
//DDNAME DD *,DLM=ZZ
//THIS LOOKS LIKE JCL BUT IT'S ACTUALLY DATA
//MORE DATA MASQUERADING AS JCL
ZZ
...
Sometimes the inline data is itself JCL (perhaps to be pumped to a program or the internal reader, whatever).
But here's the rub. In JCL, the records are 80 bytes, fixed in length. Everything past column 72 (cols 73-80) is a "comment". As well, everything following a blank that follows valid JCL is likewise a comment. Since I am looking to manipulate JCL in my programs and spit it back out, I'd like to capture comments so that I can preserve them.
So, here's an example of inline comments in the case of inline data:
//DDNAME DD *,DLM=ZZ THIS IS A COMMENT COL73DAT
data
...
ZZ
...more JCL
I originally thought that I could have my top-most lexer pull in a line of JCL and immediately create a non-token for cols 1-72 and then a token (['COL73COMMENT',$1]) for the column 73 comment, if any. This would then pass downstream to the next iterator/tokenizer a string of the cols 1-72 text followed by the col73 token.
But how would I, downstream from there, grab the inline data? I'd originally figured that the top-most tokenizer could look for a "DD \*(,DLM=(\S*))" (or the like) and then just keep pulling records from the feeding iterator until it hit the delimiter or a valid JCL starter ("//").
But you may see the issue here... I can't have 2 topmost tokenizers... either the tokenizer that looks for COL73 comments must be the top or the tokenizer that gets inline data must be at the top.
I imagine that perl parsers have the same challenge, since seeing
<<DELIM
isn't necessarily the end of the line, followed by the here document data. After all, you could see perl like:
my $this=$obj->ingest(<<DELIM)->reformat();
inline here document data
more data
DELIM
How would the tokenizer/parser know to tokenize the ")->reformat();" and then still grab the following records as-is? In the case of the inline JCL data, those lines are passed as-is, cols 73-80 are NOT comments in that case...
So, any takers on this? I know there will be tons of questions clarifying my needs and I'm happy to clarify as much as is needed.
Thanks in advance for any help...
/*
is not necessary to delimit the end of instream data forDD *
, but it is required forDD DATA
without theDLM
keyword. Also, JES2 JECL uses/*
for its commands and JES3 uses//*
(yes, the same as a comment). – Verisimilar