Read COBOL COMP data in C#
Asked Answered
P

1

7

I'm new to COBOL and I have been trying to read record information from a text file that is an output from the table.

Most non-comp data-types i'm okay with, it's the 'COMP' ones i'm getting stuck on.

I've been trying to figure this out all day today reading as much as I can on this.

The date fields below are the ones I can't convert to a date-string:

05 VALDATE          PIC 9(6) COMP
05 PAYDATE          PIC 9(6) COMP
05 SYSDATE          PIC 9(6) COMP

From my understanding all those types above are going to be 4 bytes each in the file.

They are supposed to be dates that should represent YYMMDD, but the data just doesn't seem to be as small as that. I've looked into EBCDIC and reversing the byte[] data and using BitConverter.ToUNIT32() and changing the Encoding used to read the file with no luck.

I read that dates that are computed into an integer are stored as the number of days from Jan 1st 1601, hence why code below is trying to add the value to 1601. (http://www.techtricky.com/cobol-date-functions-list-add-find-duration/)

My issue is that the either the data from the text file just isn't right or i'm missing a step to get what should be a date similar to YYMMDD.

The data for the 3 above are as follows:

[ 32] [237] [ 44] [  4] | 00100000 11101101 00101100 00000100
[ 33] [ 14] [ 32] [237] | 00100001 00001110 00100000 11101101
[131] [ 48] [ 48] [ 48] | 10000011 00110000 00110000 00110000

And how i'm opening the file, I've changed the encoding to ascii with no luck:

 using (BinaryReader reader = new BinaryReader(File.Open(nFilePath, FileMode.Open), Encoding.Default))

Code used to try and read the COMP fields:

  public class DateFromUIntExtractor : LineExtractor
  {
    public DateFromUIntExtractor() : base(4)
    {
    }

    public override string ExtractText(BinaryReader nReader)
    {
      // e.g 32,237,44,44, included but commented out things i've tried
      byte[] data = nReader.ReadBytes(Length); // Length = 4

      //Array.Reverse(data); - Makes num = 552414212
      //data = ConvertAsciiToEbcdic(data);

      int num = BitConverter.ToUInt32(data, 0);
      // in this example num = 70053152

      DateTime date = new DateTime(1601,1,1);
      date = date.AddDays(num); // Error : num is too big

      Extract = date.ToString("yyyyMMdd");
      return Extract;
    }
  }

Is the data malformed? Or am i missing something?

UPDATE

The task i'm trying to accomplish is to replicate a COBOL program that transforms the data from one definition into another but in CSV format, as the program outputs a .dat file.

Source

My inexperienced interpretation of the source definition is that the data in the text file is either a PUA-ICGROUP or PUA-PUGROUP. Looking at the COBOL program it chooses PUA-ICGROUP when PUA-HEADER>PUA-KEY>PUA-RTYPE = "03", everything else is PUA-PUGROUP.

C-WRITE-START.
    IF  PUA-RTYPE = 3 THEN
        PERFORM C-WRITE-A
    ELSE
        PERFORM C-WRITE-B
    END-IF.

C-WRITE-EXIT.
    EXIT.

Definition

01  DLRPUARC.
    03  PUA-HEADER.
        05  PUA-KEY.
            07  PUA-CDELIM         PIC 99.
            07  PUA-SUPNO          PIC 9(7).
            07  PUA-RTYPE          PIC 99.
            07  PUA-REF            PIC 9(9).
            07  PUA-SEQ            PIC 999.
        05  PUA-ALTKEY.
            07  PUA-ACDELIM        PIC 99.
            07  PUA-ASUPNO         PIC 9(7).
            07  PUA-ATRNDATE       PIC 9(6).
            07  PUA-ARTYPE         PIC 99.
            07  PUA-AREF           PIC 9(9).
            07  PUA-ASEQ           PIC 999.
        05  FILLER                 PIC X(82).

    03  PUA-ICGROUP REDEFINES PUA-HEADER.
        05  FILLER                 PIC X(52).
        05  PUA-ICEXTREF           PIC X(10).
        05  PUA-ICORDNO            PIC 9(11).
        05  PUA-ICVALDATE          PIC 9(6) COMP.
        05  PUA-ICPAYDATE          PIC 9(6) COMP.
        05  PUA-ICSYSDATE          PIC 9(6) COMP.
        05  PUA-ICTRNVAL           PIC S9(9).
        05  PUA-ICCLRREF           PIC 9(6).
        05  PUA-ICDELDATE          PIC 9(6) COMP.
        05  PUA-ICOTHQRY           PIC X.
        05  PUA-ICPRCQRY           PIC X.
        05  PUA-ICMRSQRY           PIC X.
        05  PUA-ICDSCTYPE          PIC 9.
        05  PUA-ICDSCVAL           PIC S9(9) COMP.
        05  PUA-ICVATCODE          PIC 9.
        05  PUA-ICVATAMT           PIC S9(8) COMP.
        05  PUA-ICTAXAMT           PIC S9(8) COMP.
        05  PUA-ICMRSREF           PIC 9(6).
        05  PUA-ICSUBDIV           PIC 9.
        05  PUA-ICCOSTCTR          PIC X(5).
        05  PUA-ICSEQIND           PIC X.
        05  FILLER                 PIC X(4).

    03  PUA-PUGROUP REDEFINES PUA-HEADER.
        05  FILLER                 PIC X(52).
        05  PUA-PUEXTREF           PIC X(10).
        05  PUA-PUORDNO            PIC 9(11).
        05  PUA-PUVALDATE          PIC 9(6) COMP.
        05  FILLER                 PIC XXX.
        05  PUA-PUSYSDATE          PIC 9(6) COMP.
        05  PUA-PUTRNVAL           PIC S9(9).
        05  PUA-PUCLRREF           PIC 9(6).
        05  PUA-PUDELDATE          PIC 9(6) COMP.
        05  PUA-PUOTHQRY           PIC X.
        05  PUA-PUSUBDIV           PIC 9.
        05  FILLER                 PIC X(32).

Output Definition

01  OUT-A-REC.
    03  OUT-A-PUA-CDELIM             PIC 99.
    03  OUT-A-PUA-SUPNO              PIC 9(7).
    03  OUT-A-PUA-RTYPE              PIC 99.
    03  OUT-A-PUA-REF                PIC 9(9).
    03  OUT-A-PUA-SEQ                PIC 999.
    03  OUT-A-PUA-ATRNDATE           PIC 9(8).
    03  OUT-A-PUA-ICEXTREF           PIC X(10).
    03  OUT-A-PUA-ICORDNO            PIC 9(11).
    03  OUT-A-PUA-ICVALDATE          PIC 9(8).
    03  OUT-A-PUA-ICPAYDATE          PIC 9(8).
    03  OUT-A-PUA-ICSYSDATE          PIC 9(8).
    03  OUT-A-PUA-ICTRNVAL           PIC S9(9) SIGN LEADING SEPARATE.
    03  OUT-A-PUA-ICCLRREF           PIC 9(6).
    03  OUT-A-PUA-ICDELDATE          PIC 9(8).
    03  OUT-A-PUA-ICOTHQRY           PIC X.
    03  OUT-A-PUA-ICPRCQRY           PIC X.
    03  OUT-A-PUA-ICMRSQRY           PIC X.
    03  OUT-A-PUA-ICDSCTYPE          PIC 9.
    03  OUT-A-PUA-ICDSCVAL           PIC S9(9) SIGN LEADING SEPARATE.
    03  OUT-A-PUA-ICVATCODE          PIC 9.
    03  OUT-A-PUA-ICVATAMT           PIC S9(8) SIGN LEADING SEPARATE.
    03  OUT-A-PUA-ICTAXAMT           PIC S9(8) SIGN LEADING SEPARATE.
    03  OUT-A-PUA-ICMRSREF           PIC 9(6).
    03  OUT-A-PUA-ICSUBDIV           PIC 9.
    03  OUT-A-PUA-ICCOSTCTR          PIC X(5).
    03  OUT-A-PUA-ICSEQIND           PIC X.
    03  OUT-A-CTRL-M                 PIC X.
    03  OUT-A-NL                     PIC X.

FD  F-OUTPUTB
    LABEL RECORDS OMITTED.

01  OUT-B-REC.
    03  OUT-B-PUA-CDELIM             PIC 99.
    03  OUT-B-PUA-SUPNO              PIC 9(7).
    03  OUT-B-PUA-RTYPE              PIC 99.
    03  OUT-B-PUA-REF                PIC 9(9).
    03  OUT-B-PUA-SEQ                PIC 999.
    03  OUT-B-PUA-ATRNDATE           PIC 9(8).
    03  OUT-B-PUA-PUEXTREF           PIC X(10).
    03  OUT-B-PUA-PUORDNO            PIC 9(11).
    03  OUT-B-PUA-PUVALDATE          PIC 9(8).
    03  OUT-B-PUA-PUSYSDATE          PIC 9(8).
    03  OUT-B-PUA-PUTRNVAL           PIC S9(9) SIGN LEADING SEPARATE.
    03  OUT-B-PUA-PUCLRREF           PIC 9(6).
    03  OUT-B-PUA-PUDELDATE          PIC 9(8).
    03  OUT-B-PUA-PUOTHQRY           PIC X.
    03  OUT-B-PUA-PUSUBDIV           PIC 9.
    03  OUT-B-CTRL-M                 PIC X.
    03  OUT-B-NL                     PIC X.

PROGRAM

Below is a small extract of what the cobol program is doing to the dates regardless of weather their source is COMP or not. (i did not write this code). it seems to be trying to fix the 2kY issue.

IF  PUA-ATRNDATE IS ZERO THEN
    MOVE ZERO                TO OUT-A-PUA-ATRNDATE
ELSE
    MOVE PUA-ATRNDATE        TO W-DATE-6DIGIT
    MOVE W-DATE-SEG1         TO W-DATE-YY
    MOVE W-DATE-SEG2         TO W-DATE-MM
    MOVE W-DATE-SEG3         TO W-DATE-DD
    IF W-DATE-YY > 50 THEN
        MOVE "19"            TO W-DATE-CC
    ELSE
        MOVE "20"            TO W-DATE-CC
    END-IF
    MOVE W-DATE-CCYYMMDD     TO OUT-A-PUA-ATRNDATE
END-IF.

MOVE PUA-ICEXTREF            TO OUT-A-PUA-ICEXTREF.
MOVE PUA-ICORDNO             TO OUT-A-PUA-ICORDNO.

IF  PUA-ICVALDATE IS ZERO THEN
    MOVE ZERO                TO OUT-A-PUA-ICVALDATE
ELSE
    MOVE PUA-ICVALDATE       TO W-DATE-6DIGIT
    MOVE W-DATE-SEG1         TO W-DATE-YY
    MOVE W-DATE-SEG2         TO W-DATE-MM
    MOVE W-DATE-SEG3         TO W-DATE-DD
    IF W-DATE-YY > 50 THEN
        MOVE "19"            TO W-DATE-CC
    ELSE
        MOVE "20"            TO W-DATE-CC
    END-IF
    MOVE W-DATE-CCYYMMDD     TO OUT-A-PUA-ICVALDATE
END-IF.

Program Working-Storage Section

01  W-DATE-6DIGIT               PIC 9(6).
01  W-DATE-6DIGIT-REDEF REDEFINES W-DATE-6DIGIT.
    03  W-DATE-SEG1             PIC 99.
    03  W-DATE-SEG2             PIC 99.
    03  W-DATE-SEG3             PIC 99.

01  W-DATE-CCYYMMDD             PIC 9(8).
01  W-DATE-CCYYMMDD-REDEF REDEFINES W-DATE-CCYYMMDD.
    03  W-DATE-CC               PIC 99.
    03  W-DATE-YY               PIC 99.
    03  W-DATE-MM               PIC 99.
    03  W-DATE-DD               PIC 99.

The DATA

Copied from Notepad++, each line starts at '220...' and end column is 135 before going onto next line, meaning length is 134(?)

    2200010010300005463400022000100106062003000054634000062703    09720200000 í,! íƒ00056319D001144ÕšNNN0    1 G¨    000000197202G    
    2200010010300005463500022000100106062903000054635000062858    09720200000 í, í" íƒ00082838{050906±RNNN0    1 áð    000000197202G    
    2200010010300005465500022000100106073003000054655000063378    09720200000 í í† í00179637A050906±RNNN0    1     000000197202G    

Noticed that above is missing some symbols:

    2200010010300005463400022000100106062003000054634000062703    09720200000 í,[EOT]![SO] íƒ00056319D001144[SOH]ÕšNNN0    1 [SOH]G¨    000000197202G    
    2200010010300005463500022000100106062903000054635000062858    09720200000 í, í" íƒ00082838{050906[SOH]±RNNN0    1 [SOH]áð    000000197202G    
    2200010010300005465500022000100106073003000054655000063378    09720200000 í í† í00179637A050906[SOH]±RNNN0    1 [EOT][NAK][EM]    000000197202G    

Update 2

I've acepted Rick Smith's answer below as when i put his data in i get the correct date-time values. So either my data is fudged or its somthing else as my data throws errors or date-time values 1000s of years in the future.

I've been able to get the ouput CSV of what these date time should actually be which are:

[using : int n = ((b[0] << 16) + (b[1] << 8) + b[2]);]

HEX: 0x20 0xED 0x2C
BIN: 32   237  44
INT: 2157868     (longer than 6 digit)
Actual DATE: 2006-07-16

HEX: 0x04 0x21 0x0e
BIN: 4    33   14
INT: 270606      (correct but segments are in reverse)
Actual DATE: 2006-06-27

HEX: 0x20 0xED 0x83
BIN: 32   237  131
INT: 2157955     (longer than 6 digits)
Actual DATE: 2006-08-03

Update 3

turns out it was bad data...

Paule answered 3/5, 2019 at 21:1 Comment(26)
COMP in COBOL is a binary datatype. The corresponding data type in c# is either FLOAT or DOUBLE. See: #42412171 - First get to read the data correctly in C# then worry about the date calculations later. If you can't read the data, use a COBOL compiler (some are free) to read the current file and produce a file with PIC 9(8) datatype so that you can use it directly. Note that Comp-1 Comp-2 and Comp-3 are different. I assume Comp refers to Comp-2.Gunilla
thank you, not sure how i'm reading it incorrectly. i read that COMP on its own was shorthand for COMP-4Paule
COMP in cobol is normally a big endian integer.Jasik
The numbers make no sense. It appears the fields should be 3 bytes each, big-endian binary. But the encoding may be changing the values. Just read the data into a byte array and provide those values. The rest we can deal with later.Pair
OK, Although I don't see Comp-4 in this list, this may help: microfocus.com/documentation/visual-cobol/VC40/EclWin/…Gunilla
@RickSmith is right. If the file is EBCDIC, then "EBCDIC to ASCII conversions must be done on a field-by-field basis if the record contains binary, BCD, floating point or non-separate-signed display numeric data" - Source: talsystems.com/tsihome_html/downloads/C2IEEE.htm - I suggest you do what I said in first comment or use a conversion tool.Gunilla
@Gunilla the file is not EBCDIC, the three occurrences of 48 are ASCII 000. This means no conversion.Pair
@RickSmith, Good point!Gunilla
This statement DateTime date = new DateTime(1601,1,1); should probably be DateTime date = new DateTime(1600,12,31); since 1600-12-31 is day zero..Pair
Forget about 1601 if the data internally should have YYDDMM Just read in as ASCII or EBCDIC [= one byte characters] (you should see quite quick on the unpacked data fields which of both are correct), then take the bytes at this point [= split fields by position], unpack, then create a new data with the parts. To ease the conversion ensure the data in the file gets in unpacked (as USAGE DISPLAY).Herminiahermione
What cobol compiler was used ???; can you confirm the the fields are 4 bytes long ???Jasik
Do you know what dates they represent ???.Jasik
Do you know the brand of COBOL compiler that produced the program that created the file?Shoplifter
@SimonSobisch That doesn't match the data at all - the sample bytes are not EBCDIC or ASCII and couldn't hold PIC 9(6) as either.Shoplifter
Please note the warning here. If you use the Default encoding to encode and decode data streamed between computers or retrieved at different times on the same computer, it may translate that data incorrectly.Pair
Thank you for all your comments, the data i'm reading from (the text file) is an extract, ill post the source definition and output definition,and first line of data should that help, i don't know the compiler used, I was given this task of replicating a cobol program that extracts the data to a csv (to be done in less than a day lol)Paule
COBOL Computational Fields Comp (Computational) Comp (with no suffix) leaves the choice of the data type to the compiler writer. The intent of this data type is to make it the most efficient format on any given machine, which is usually some binary format. Because of this, comp varies greatly between platforms, more than most other types.Pomona
Found out that the brand of the compiler might be ICOBOLPaule
The data does not match the record descriptions for either the A or B records. Each of those record descriptions identify a field with SIGN LEADING SEPARATE, this means there must be a + or - sign present in the record. There is not.Pair
The data source (text file) )is PUA-HEADER, which can take the form of PUA-ICGROUP or PUA-PUGROUP. The output definitions included are to show what this cobol program is doing with those values.Paule
Data was extracted using an 'icreorg command'Paule
Apparently, Notepad++ does not preserve [NUL] characters, so record alignment was off. The only valid dates I found were in PUA-ICDELDATE, which was 120218 for the first record and 110930 for the second and third records. At this point, I see the other dates as bad data.Pair
The documentation for ICOBOL may be of interest. In particular "The USAGE IS BINARY or COMPUTATIONAL clause specifies a twos-complement big-endian binary representation of the numeric item in the storage of the computer. The table below lists the bytes required to store BINARY and COMPUTATIONAL items." on page 196Idaline
yeah, found out the source data had null characters deliberately replaced with spaces.... now to recoup past couple days.Paule
Then you need to be aware that the fields: PUA-ICDSCVAL PIC S9(9) COMP, PUA-ICVATAMT PIC S9(8) COMP, and PUA-ICTAXAMT PIC 9(8) COMP; also contain spaces and it is likely those values will not translate correctly.Pair
Yep, it was every null char in the whole data file that i was given that was altered, person altering it thought Null char was bad. Got original file and it worked.Paule
P
2

I created a COBOL file with three date fields using the values contained in the first record of the sample data. The first and third dates are YYMMDD, the second is DDMMYY.

The dates are given in the code and have the same format as the dates you are trying to read, 3-bytes, big-endian binary.

   environment division.
   input-output section.
   file-control.
       select out-file assign "dates.dat"
           organization sequential
       .
   data division.
   file section.
   fd out-file.
   01 date-rec.
     02 date-1 comp pic 9(6).
     02 date-2 comp pic 9(6).
     02 date-3 comp pic 9(6).
   procedure division.
   begin.
       open output out-file
       move 060716 to date-1
       move 270606 to date-2
       move 060803 to date-3
       write date-rec
       close out-file
       stop run
       .

This C# program then reads those dates with a BinaryReader and displays the bytes, the decimal value, and the converted date. Notice that I commented /*, Encoding.Default*/, since it is not needed.

using System;
using System.Globalization;
using System.IO;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            byte[] b = { 0, 0, 0 };
            string s;
            DateTime d = new DateTime();
            using (BinaryReader reader = new BinaryReader(File.Open(@"y:\dates.dat", FileMode.Open)/*, Encoding.Default*/))
            {
                for (int i = 0; i < 3; i++)  // Three dates in file
                {
                    b = reader.ReadBytes(b.Length);
                    Console.WriteLine("Bytes: {0}, {1}, {2}", b[0].ToString("X2"), b[1].ToString("X2"), b[2].ToString("X2"));
                    int n = ((b[0] << 16) + (b[1] << 8) + b[2]);
                    s = n.ToString("D6");
                    switch (i)
                    {
                        case 0:
                        case 2:
                            Console.WriteLine("Date(YYMMDD): {0}", s);
                            d = DateTime.ParseExact(s, "yyMMdd", CultureInfo.InvariantCulture);
                            Console.WriteLine("Date(yyyyMMdd): {0}", d.ToString("yyyyMMdd"));
                            break;
                        case 1:
                            Console.WriteLine("Date(DDMMYY): {0}", s);
                            d = DateTime.ParseExact(s, "ddMMyy", CultureInfo.InvariantCulture);
                            Console.WriteLine("Date(yyyyMMdd): {0}", d.ToString("yyyyMMdd"));
                            break;
                        default:
                            break;
                    }
                    Console.WriteLine("");
                }
            }
        }
    }
}

This is the output:

Bytes: 00, ED, 2C
Date(YYMMDD): 060716
Date(yyyyMMdd): 20060716

Bytes: 04, 21, 0E
Date(DDMMYY): 270606
Date(yyyyMMdd): 20060627

Bytes: 00, ED, 83
Date(YYMMDD): 060803
Date(yyyyMMdd): 20060803

The DateTime.ParseExact(s, "yyMMdd", CultureInfo.InvariantCulture); was from an answer to this question, String to DateTime conversion as per specified format.

Pair answered 7/5, 2019 at 1:7 Comment(3)
Thank you for your answer, using your values such as ( 0x0e, 0xf6, 0x87 ) it works so i'm marking it as correct answer, but on my end but my data is coming in as ( 0x20 0xED 0x2C ) and throws error when parsing to date time as the string s becomes '2157868' using n.ToString("D6"), "D8". I've been able to get confirmation that the three dates should be 20060716 20060627 20060803Paule
YYMMDD 060716 is 00 ED 2C, 060637 is 00 EC D3, and 060803 is 00 ED 83. It's a hack, but before converting the bytes to int, you could do if (b[0] == 32) { b[0] = 0; }. However, some of the other values, the [ 4] [ 33] [ 14], are not correct for the second date. I did notice that, when I loaded my file (dates.dat) into Notepad++, the 0x00 was changed to 0x20. Perhaps something similar is happening to the data you are processing.Pair
Thank you Rick, you've helped me save my sanity. Yeah Notepad++ does do that, Data i was given was bad and the NULL char was deliberately changed to space char (32).Paule

© 2022 - 2024 — McMap. All rights reserved.