How to pull data from Mainframe to Hadoop

I have files on the mainframe that I want pushed to Hadoop (HDFS)/Hive.

I can use Sqoop to import the mainframe DB2 database into Hive, but what about files (e.g., VSAM files, or flat files produced by COBOL programs)?

Is there any custom Flume source that I can write, or some alternative tool to use here?

Scenography answered 28/2, 2013 at 9:37 Comment(0)

COBOL is a programming language, not a file format. If what you need is to export files produced by COBOL programs, you can use the same technique as if those files were produced by C, C++, Java, Perl, PL/I, Rexx, etc.

In general, you will have three different data sources: flat files, VSAM files, and a DBMS such as DB2 or IMS.

DBMSs have export utilities to copy the data into flat files. Keep in mind that data in DB2 will likely be normalized, so you will also need the contents of the related tables in order to make sense of the data.
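
As a minimal sketch of a DB2 export, the DSNTIAUL sample program is one common way to unload a table to a flat file. The subsystem (DSN1), table, and dataset names below are hypothetical, and your site's plan and library names may differ:

//UNLOAD   EXEC PGM=IKJEFT01
//* your site will also need a STEPLIB (or LIB parameter on the
//* RUN command) pointing at the library containing DSNTIAUL
//SYSTSPRT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSPUNCH DD DUMMY
//* the unloaded rows land in this flat file
//SYSREC00 DD DSN=ME.CUSTOMER.UNLOAD,DISP=(NEW,CATLG),
//            UNIT=SYSDA,SPACE=(CYL,(10,10))
//SYSTSIN  DD *
  DSN SYSTEM(DSN1)
  RUN PROGRAM(DSNTIAUL) PLAN(DSNTIAUL) PARMS('SQL')
//SYSIN    DD *
  SELECT * FROM MYSCHEMA.CUSTOMER;
/*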

VSAM files can be exported to flat files via the IDCAMS utility.
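
A minimal IDCAMS REPRO sketch (the dataset names and DCB attributes are hypothetical; take the LRECL from your file's actual layout):

//EXPORT   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//INDD     DD DSN=ME.CUSTOMER.VSAM,DISP=SHR
//OUTDD    DD DSN=ME.CUSTOMER.FLAT,DISP=(NEW,CATLG),
//            UNIT=SYSDA,SPACE=(CYL,(10,10)),
//            DCB=(RECFM=FB,LRECL=200,BLKSIZE=0)
//SYSIN    DD *
  REPRO INFILE(INDD) OUTFILE(OUTDD)
/*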

I would strongly suggest you get the files into a text format before transferring them to another box with a different code page. Trying to deal with mixed text (which must have its code page translated) and binary (which must not have its code page translated but which likely must be converted from big endian to little endian) is harder than doing the conversion up front.

The conversion can likely be done via the SORT utility on the mainframe. Mainframe SORT utilities tend to have extensive data manipulation functions. There are other mechanisms you could use (other utilities, custom code written in the language of your choice, purchased packages) but this is what we tend to do in these circumstances.
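
A DFSORT sketch of such a conversion, assuming a hypothetical record layout of a 10-byte character key followed by a 5-byte packed-decimal (COMP-3) amount:

//CONVERT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=ME.CUSTOMER.FLAT,DISP=SHR
//SORTOUT  DD DSN=ME.CUSTOMER.TEXT,DISP=(NEW,CATLG),
//            UNIT=SYSDA,SPACE=(CYL,(10,10))
//SYSIN    DD *
* COPY THE FILE, KEEPING BYTES 1-10 AS-IS AND CONVERTING THE
* 5-BYTE PACKED FIELD AT POSITION 11 TO 9 ZONED-DECIMAL DIGITS
  OPTION COPY
  OUTREC BUILD=(1,10,11,5,PD,TO=ZD,LENGTH=9)
/*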

Once you have your flat files converted such that all data is text, you can transfer them to your Hadoop boxes via FTP or SFTP or FTPS.
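
A shell sketch of the receiving side (the host and paths are hypothetical). Note that FTP in text mode translates EBCDIC to ASCII during the transfer, whereas SFTP moves the bytes untranslated, so with SFTP you must convert the code page yourself, e.g. with iconv:

# fetch the converted text file from the mainframe
sftp me@mainframe.example.com:/u/me/customer.txt /tmp/

# SFTP did not translate the code page, so convert EBCDIC here
iconv -f IBM-1047 -t UTF-8 /tmp/customer.txt > /tmp/customer.utf8.txt

# load the result into HDFS
hadoop fs -mkdir -p /data/customer
hadoop fs -put /tmp/customer.utf8.txt /data/customer/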

This isn't an exhaustive coverage of the topic, but it will get you started.

Greave answered 28/2, 2013 at 12:55 Comment(2)
Does IDCAMS work fine if my VSAM file has COMP-3 content in it? Also, is this tool capable of real-time or near-real-time replication?Simpkins
@Simpkins you would want to use your SORT utility to convert the packed decimal (COMP-3) data to a readable format. Real time replication could be accomplished via scheduling a job to run at appropriate intervals, but you might want to look to other solutions if the interval is very short and/or the VSAM file is large. Talk to your technical staff as Bill Woodger indicated in your question.Greave

Syncsort has been processing mainframe data for 40 years (approximately 50% of mainframes already run the software). They have a specific product called DMX-H which can source mainframe data, handle the data-type conversions, import the COBOL copybooks, and load the data directly into HDFS. Syncsort also recently contributed a new feature enhancement to the Apache Hadoop core. I suggest you contact them at www.syncsort.com; they were showing this in a demo at a recent Cloudera roadshow.

Larondalarosa answered 2/5, 2013 at 11:0 Comment(1)
You got any figures for "50% of mainframes already run [SyncSort]"?Erudite

Update for 2018:

There are a number of commercial products that help move data from the mainframe to distributed platforms. Here is a list of the ones I have run into, for those who are interested. All of them take data on Z, as described in the question, do some transformation, and enable movement of the data to other platforms. That is not an exact match for the question, but the industry has changed, and the goal of moving data to other platforms for analysis is growing. Data Virtualization Manager provides the most robust tooling for transforming the data, from what I've seen.

SyncSort IronStream

IBM Common Data Provider

Correlog

IBM Data Virtualization Manager

Circumstantial answered 29/8, 2018 at 12:58 Comment(0)

Why not: hadoop fs -put <what> <where>?

Assamese answered 28/2, 2013 at 10:43 Comment(1)
Not really; I am trying to automate things. The source is VSAM files on the mainframe. How do I export them directly into Hadoop?Scenography

Transferring COBOL-layout files can be done through the options discussed above. However, actually mapping them to a Hive table is a complex task, because COBOL layouts have complex features such as OCCURS DEPENDING ON clauses, variable-length records, etc.

I have tried to create a custom SerDe to achieve this, although it is still in the initial stages. But here is the link, which might give you some idea of how to deserialize according to your requirements.

https://github.com/rbheemana/Cobol-to-Hive
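
To show the shape of what such a SerDe enables, here is a Hive DDL sketch. The SerDe class name, property name, and paths below are placeholders; take the real ones from the repo's README:

ADD JAR /tmp/CobolSerde.jar;

CREATE EXTERNAL TABLE customer
ROW FORMAT SERDE 'com.example.cobol.CobolSerDe'  -- placeholder class name
WITH SERDEPROPERTIES ('cobol.layout.url' = '/user/me/customer.copybook')
LOCATION '/data/customer';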

Malay answered 29/5, 2015 at 16:58 Comment(1)
This jar is amazingUsance

Not pull, but push: use the Co:Z Launcher from Dovetailed Technologies.

For example (JCL excerpt):

//FORWARD  EXEC PGM=COZLNCH
//STDIN    DD *
hadoop fs -put <(fromfile /u/me/data.csv) /data/data.csv
# Create a catalog table
hive -f <(fromfile /u/me/data.hcatalog)
/*

where /u/me/data.csv (the mainframe-based data that you want in Hadoop) and /u/me/data.hcatalog (corresponding HCatalog file) are z/OS UNIX file paths.

For a more detailed example, where the data happens to be log records, see Extracting logs to Hadoop.

Hartsock answered 26/6, 2015 at 4:59 Comment(0)

Cobrix might be able to solve it for you. It is an open-source COBOL data source for Spark and can parse the files you mentioned.
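
A minimal Spark sketch, assuming the spark-cobol (Cobrix, za.co.absa.cobrix) artifact is on the classpath; the copybook and data paths are hypothetical:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("vsam-to-hive").getOrCreate()

// read an EBCDIC mainframe file, described by its COBOL copybook
val df = spark.read
  .format("cobol")
  .option("copybook", "/copybooks/customer.cpy")
  .load("/data/customer.dat")

// expose the parsed records to Hive
df.write.mode("overwrite").saveAsTable("customer")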

Arvillaarvin answered 22/8, 2018 at 19:17 Comment(0)
