Downloading text files with Python and ftplib.FTP from z/OS

I'm trying to automate downloading of some text files from a z/OS PDS, using Python and ftplib.

Since the host files are EBCDIC, I can't simply use FTP.retrbinary().

FTP.retrlines(), when used with open(file, 'w').writelines as its callback, doesn't, of course, provide EOLs.

So, for starters, I've come up with this piece of code which "looks OK to me", but as I'm a relative Python noob, can anyone suggest a better approach? Obviously, to keep this question simple, this isn't the final, bells-and-whistles thing.

Many thanks.

#!python.exe
from ftplib import FTP

class xfile (file):
    def writelineswitheol(self, sequence):
        for s in sequence:
            self.write(s+"\r\n")

sess = FTP("zos.server.to.be", "myid", "mypassword")
sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
sess.cwd("'FOO.BAR.PDS'")
a = sess.nlst("RTB*")
for i in a:
    sess.retrlines("RETR "+i, xfile(i, 'w').writelineswitheol)
sess.quit()

Update: Python 3.0; the platform is MinGW under Windows XP.

z/OS PDSs have a fixed record structure, rather than relying on line endings as record separators. However, the z/OS FTP server, when transmitting in text mode, provides the record endings, which retrlines() strips off.

Closing update:

Here's my revised solution, which will be the basis for ongoing development (removing built-in passwords, for example):

import ftplib
import os

sess = ftplib.FTP("undisclosed.server.com", "userid", "password")
# Have the z/OS server translate IBM-1047 (EBCDIC) to ISO8859-1 in text mode
sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
for pds in ["ASM", "ASML", "ASMM", "C", "CPP", "DLLA", "DLLC", "DLMC", "GEN", "HDR", "MAC"]:
    sess.cwd("'ZLTALM.PREP.%s'" % pds)
    try:
        filelist = sess.nlst()
    except ftplib.error_perm as x:
        # A 550 reply here typically means the PDS has no members; re-raise anything else
        if x.args[0][:3] != '550':
            raise
    else:
        try:
            os.mkdir(pds)
        except OSError:
            continue  # local directory already exists (or can't be created): skip this PDS
        for hostfile in filelist:
            lines = []
            # retrlines() strips the record terminators, so "\n" is added back below
            sess.retrlines("RETR " + hostfile, lines.append)
            with open(os.path.join(pds, hostfile), 'w') as pcfile:
                for line in lines:
                    pcfile.write(line + "\n")
        print("Done: " + pds)
sess.quit()

My thanks to both John and Vinay.

Asked by Inductive, 26/7/2009 at 15:31. Comments (3):
Please edit your question to mention and describe PDS files. "some text files" is rather inadequate. - Caeoma
Also please state what platform, what version of Python, and why your writelineswitheol method appends '\r\n' instead of '\n'. AND please state whether you have actually run this and examined the output to ensure it has the correct line termination for your platform. - Caeoma
Done. I'm doing some weekend coding at home outside the corporate firewall, so I'll only be testing the idea later this week. - Inductive

Just came across this question as I was trying to figure out how to recursively download datasets from z/OS. I've been using a simple Python script for years now to download EBCDIC files from the mainframe. It effectively just does this:

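# Assumes ftp is a logged-in ftplib.FTP and filename names the member to fetch
def writeline(line):
    # retrlines() strips each line ending, so put one back
    file.write(line + "\n")

with open(filename, "w") as file:
    ftp.retrlines("RETR " + filename, writeline)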
Answered by Willywillynilly, 1/11/2011 at 10:13. Comments (1):
file is not defined - Tryma

You should be able to download the file as a binary (using retrbinary) and use the codecs module to convert from EBCDIC to whatever output encoding you want. You should know the specific EBCDIC code page being used on the z/OS system (e.g. cp500). If the files are small, you could even do something like (for a conversion to UTF-8):

# Convert an EBCDIC file (downloaded in binary) to UTF-8; cp500 is an example code page
with open(ebcdic_filename, "rb") as infile:
    data = infile.read()
converted = data.decode("cp500").encode("utf8")
with open(utf8_filename, "wb") as outfile:
    outfile.write(converted)
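A sketch of the same idea applied directly to the transfer, assuming ftp is an already logged-in ftplib.FTP object: retrbinary() collects the raw EBCDIC bytes in memory and they are decoded locally. The helper name download_ebcdic is hypothetical. Python's standard codecs include EBCDIC code pages such as cp037, cp500 and cp1140 (but, as far as I know, not IBM-1047), so pick the closest one and check it against your data; they really do differ, e.g. byte 0x4A is '[' in cp500 but '¢' in cp037.

```python
import io


def download_ebcdic(ftp, member, codepage="cp500"):
    """Fetch a member in binary (TYPE I) mode and decode it locally.

    ftp is assumed to be a logged-in ftplib.FTP instance; codepage must be
    an EBCDIC codec that Python ships (e.g. cp037, cp500, cp1140).
    """
    buf = io.BytesIO()
    # retrbinary switches to TYPE I and calls buf.write for each data block
    ftp.retrbinary("RETR " + member, buf.write)
    # Binary mode carries no record separators, so this is one continuous string
    return buf.getvalue().decode(codepage)
```

Because binary mode sends no line separators, a record-based dataset comes back as one long string; splitting it into records has to be done separately.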

Update: If you need to use retrlines to get the lines, and your lines are coming back in the correct encoding, your approach will not work as written, because the callback is called once for each line. So in the callback, sequence will be a single line, and your for loop will write each individual character of that line to the output, each on its own line. So you probably want to do self.write(sequence + "\r\n") rather than the for loop. It still doesn't feel especially right to subclass file just to add this utility method, though; it probably belongs in a different class in your bells-and-whistles version.
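To make the point concrete: retrlines() hands the callback one str per line, and iterating over a str yields its characters. This sketch uses io.StringIO as a stand-in for the output file; the two functions are free-standing illustrations (write_line is a hypothetical name), not the question's class.

```python
import io


def writelineswitheol(out, sequence):
    # The question's logic: treats its argument as a list of lines...
    for s in sequence:
        out.write(s + "\r\n")


def write_line(out, line):
    # ...but retrlines passes one line per call, so append the terminator once
    out.write(line + "\n")


buggy = io.StringIO()
writelineswitheol(buggy, "ABC")  # one call, as retrlines would make for the line "ABC"
print(repr(buggy.getvalue()))    # 'A\r\nB\r\nC\r\n'

fixed = io.StringIO()
write_line(fixed, "ABC")
print(repr(fixed.getvalue()))    # 'ABC\n'
```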

Answered by Mote, 26/7/2009 at 15:39. Comments (4):
Thanks, Vinay, that's an interesting idea, but how do I insert the newlines? (These are conventional z/OS PDSs, not OpenEdition files.) - Inductive
How are the lines terminated on the host system, then, if not with EBCDIC line feeds? - Mote
The host file system is record-based. It's either fixed-length, in which case all the records have the same length, or variable-length, where the length is stored in a descriptor field at the start of each record. FTP.retrlines() extracts the records correctly, but (correctly, I think) doesn't provide the newlines. - Inductive
@Vinay: Regarding your update, oops, yes, I understand. When I get back to the mainframe later this week, I'll give some ideas a try and post back. - Inductive
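Following up on the record-format discussion in these comments: when a member is fetched with retrbinary(), the records arrive back to back with no separators, so they have to be split locally. A minimal sketch, assuming the logical record length (LRECL) is known for fixed-length data, and, for variable-length data, that the server was asked to keep each record's 4-byte record descriptor word (the z/OS FTP server's SITE RDW option; check your server's documentation). split_fixed, split_variable and to_lines are hypothetical helpers.

```python
import struct


def split_fixed(data, lrecl):
    """Split fixed-length records (RECFM=F/FB) of lrecl bytes each."""
    return [data[i:i + lrecl] for i in range(0, len(data), lrecl)]


def split_variable(data):
    """Split variable-length records that still carry their 4-byte RDW.

    The first two RDW bytes are the big-endian record length, including
    the RDW itself; the record data follows the RDW.
    """
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        if length < 4 or pos + length > len(data):
            raise ValueError(f"bad RDW at offset {pos}")
        records.append(data[pos + 4:pos + length])
        pos += length
    if pos != len(data):
        raise ValueError("trailing bytes after last record")
    return records


def to_lines(records, codepage="cp500"):
    """Decode EBCDIC records and drop the blank padding of fixed-length records."""
    return [rec.decode(codepage).rstrip(" ") for rec in records]
```

Joining to_lines(...) with "\n" then gives an ordinary text file.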

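Your writelineswitheol method appends '\r\n' instead of '\n' and then writes the result to a file opened in text mode. The effect, no matter what platform you are running on, will be an unwanted '\r': on Windows, text mode translates the '\n' into '\r\n', so each line ends in '\r\r\n'; on Unix-like systems there is no translation, so the '\r' is written as-is. Just append '\n' and you will get the appropriate line ending.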
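The Windows case can be reproduced on any platform: io.TextIOWrapper with newline="\r\n" applies the same '\n' to '\r\n' translation that a text-mode file gets on Windows. A small sketch, writing to an in-memory buffer instead of a real file:

```python
import io

raw = io.BytesIO()
# newline="\r\n" makes the wrapper translate every written "\n" into "\r\n",
# which is what a text-mode file does on Windows
out = io.TextIOWrapper(raw, encoding="latin-1", newline="\r\n")
out.write("RECORD1" + "\r\n")  # what writelineswitheol writes
out.write("RECORD2" + "\n")    # what you should write instead
out.flush()
print(raw.getvalue())          # b'RECORD1\r\r\nRECORD2\r\n'
```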

Proper error handling should not be relegated to a "bells and whistles" version. You should set up your callback so that your file open() is in a try/except and the callback object retains a reference to the output file handle, your write call is in a try/except, and you have a callback_obj.close() method which you call when retrlines() returns to explicitly close the file handle (in a try/except). That way you get explicit error handling, e.g. messages like "can't (open|write to|close) file X because Y", AND you save having to think about when your files are going to be implicitly closed and whether you risk running out of file handles.
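A minimal sketch of that design, assuming ISO8859-1 output to match the SITE SBD setting in the question. LineWriter and fetch_member are hypothetical names, not part of ftplib; ftp is assumed to be a logged-in ftplib.FTP.

```python
class LineWriter:
    """Callable for FTP.retrlines(): writes each line plus a newline to a local file."""

    def __init__(self, path, encoding="latin-1"):
        self.path = path
        try:
            self.handle = open(path, "w", encoding=encoding)
        except OSError as e:
            raise RuntimeError(f"can't open file {path} because {e}") from e

    def __call__(self, line):
        try:
            self.handle.write(line + "\n")
        except (OSError, UnicodeEncodeError) as e:
            raise RuntimeError(f"can't write to file {self.path} because {e}") from e

    def close(self):
        try:
            self.handle.close()
        except OSError as e:
            raise RuntimeError(f"can't close file {self.path} because {e}") from e


def fetch_member(ftp, member, path):
    """Download one member with retrlines(), always closing the local file."""
    writer = LineWriter(path)
    try:
        ftp.retrlines("RETR " + member, writer)
    finally:
        writer.close()
```

Making the object callable means it can be passed straight to retrlines(), while close() stays available for the explicit cleanup described above.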

Python 3.x ftplib.FTP.retrlines() should give you str objects, which are in effect Unicode strings, and you will need to encode them before you write them (or open the output file with an explicit encoding), unless the default encoding is latin1, which would be rather unusual for a Windows box. You should test with files containing (1) all 256 possible byte values and (2) all bytes that are valid in the expected EBCDIC code page.
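One quick check along those lines, assuming the server-side translation to ISO8859-1 from the question (so each received byte becomes one latin-1 character): list which of the 256 possible characters a given output encoding cannot represent. The helper name unencodable_bytes is hypothetical.

```python
def unencodable_bytes(target_encoding):
    """Byte values whose ISO8859-1 (latin-1) character can't be written in target_encoding."""
    bad = []
    for b in range(256):
        ch = bytes([b]).decode("latin-1")  # latin-1 maps every byte to exactly one character
        try:
            ch.encode(target_encoding)
        except UnicodeEncodeError:
            bad.append(b)
    return bad


print(unencodable_bytes("latin-1"))  # []
print(unencodable_bytes("utf-8"))    # []
print(unencodable_bytes("cp1252"))   # includes 0x81 (129), which cp1252 leaves undefined
```

An empty list means every byte survives the round trip to that file encoding; anything else will raise UnicodeEncodeError when written.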

[a few "sanitation" remarks]

  1. You should consider upgrading your Python from 3.0 (a "proof of concept" release) to 3.1.

  2. To facilitate better understanding of your code, use "i" as an identifier only as a sequence index and only if you irredeemably acquired the habit from FORTRAN 3 or more decades ago :-)

  3. Two of the problems discovered so far (appending line terminator to each character, wrong line terminator) would have shown up the first time you tested it.

Answered by Caeoma, 27/7/2009 at 0:54. Comments (1):
John, thank you. Please be assured that I have taken your just criticisms on board. - Inductive

When you use ftplib's retrlines to download a file from z/OS, each line comes back without its '\n'.

That differs from the Windows ftp command 'get xxx'.

We can add a variant of the function 'retrlines', called 'retrlines_zos', to ftplib.py.

Just copy the whole code of retrlines and change the 'callback' line to:

...

callback(line + "\n")

...

I tested it and it worked.
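Here is that idea as code, with one swap: rather than editing ftplib.py itself (which changes the standard library for every program on the machine), this sketch puts a retrlines_zos method on a subclass and wraps the caller's callback, which has the same effect as changing the callback line inside a copied retrlines. ZosFTP and retrlines_zos are hypothetical names.

```python
import ftplib


class ZosFTP(ftplib.FTP):
    """FTP client whose retrlines_zos() re-appends the newline that retrlines() strips."""

    def retrlines_zos(self, cmd, callback):
        # retrlines() removes each line's CRLF; hand the callback the line plus "\n"
        return self.retrlines(cmd, lambda line: callback(line + "\n"))


# Usage, assuming a reachable server and an existing member name:
# with ZosFTP("zos.server.to.be", "myid", "mypassword") as ftp:
#     with open("RTB1.txt", "w") as out:
#         ftp.retrlines_zos("RETR RTB1", out.write)
```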

Answered by Electrolytic, 22/9/2020 at 7:08. Comments (1):
Callback? What callback? Code, please. - Tryma

You want a lambda function and a callback, like so:

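def writeLineCallback(line, file):
    # retrlines() strips each record's line ending, so add one back
    file.write(line + "\n")

# Assumes ftp is a logged-in ftplib.FTP and zOsFile holds the dataset name;
# the single quotes make z/OS treat the name as fully qualified
ftpcommand = "RETR '{}'".format(zOsFile)
filename = "newfilename"
with open(filename, 'w') as file:
    callback_lambda = lambda x: writeLineCallback(x, file)
    ftp.retrlines(ftpcommand, callback_lambda)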

This will download the dataset named by zOsFile and write it to 'newfilename'.

Answered by Tryma, 21/12/2021 at 22:27. Comments (0).
