Writing/parsing text file with fixed width lines
P

8

14

I'm a newbie to Python and I'm looking at using it to write some hairy EDI stuff that our supplier requires.

Basically they need an 80-character fixed width text file, with certain "chunks" of the field with data and others left blank. I have the documentation so I know what the length of each "chunk" is. The response that I get back is easier to parse since it will already have data and I can use Python's "slices" to extract what I need, but I can't assign to a slice - I tried that already because it sounded like a good solution, and it didn't work since Python strings are immutable :)

Like I said I'm really a newbie to Python but I'm excited about learning it :) How would I go about doing this? Ideally I'd want to be able to say that range 10-20 is equal to "Foo" and have it be the string "Foo" with 7 additional whitespace characters (assuming said field has a length of 10) and have that be a part of the larger 80-character field, but I'm not sure how to do what I'm thinking.

Present answered 11/5, 2009 at 15:1 Comment(3)
Are you processing X12 EDI messages? The layout is not really fixed. Are you processing some other format? If so, it isn't really [EDI] is it? It's just fixed file layout.Incorruptible
I have no idea, really. They refer to it as "EDI" in all of their documentation. All I know is I have to send them a record (they call it an "H0" record) and they'll send me back a file to parse.Present
The ISA header of X12 is fixed width (the very first line) as the delimiters aren't declared until the end of the line.Cornish
C
22

You don't need to assign to slices, just build the string using % formatting.

An example with a fixed format for 3 data items:

>>> fmt="%4s%10s%10s"
>>> fmt % (1,"ONE",2)
'   1       ONE         2'
>>> 

Same thing, field width supplied with the data:

>>> fmt2 = "%*s%*s%*s"
>>> fmt2 % (4,1, 10,"ONE", 10,2)
'   1       ONE         2'
>>> 

Separating data and field widths, and using zip() and str.join() tricks:

>>> widths=(4,10,10)
>>> items=(1,"ONE",2)
>>> "".join("%*s" % i for i in zip(widths, items))
'   1       ONE         2'
>>> 
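The examples above right-justify each value. For the case in the question, where "Foo" should be followed by padding, a `-` flag in the format left-justifies instead. A minimal sketch, assuming a hypothetical three-field layout whose widths add up to 80 (the widths and values are made up for illustration):

```python
# Hypothetical layout: (width, value) pairs that add up to 80 columns.
layout = [(10, "H0"), (10, "Foo"), (60, "")]

# "%-*s" takes the width from the tuple; the "-" flag pads on the right.
record = "".join("%-*s" % (width, value) for width, value in layout)

print(repr(record[10:20]))  # 'Foo       '
print(len(record))          # 80
```

Note that `%s` with a width only pads; a value longer than its field is not truncated, so the record would grow past 80 columns.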
Culbert answered 11/5, 2009 at 15:32 Comment(0)
R
10

Hopefully I understand what you're looking for: some way to conveniently identify each part of the line by a simple variable, but output it padded to the correct width?

The snippet below may give you what you want

class FixWidthFieldLine(object):

    fields = (('foo', 10),
              ('bar', 30),
              ('ooga', 30),
              ('booga', 10))

    def __init__(self):
        self.foo = ''
        self.bar = ''
        self.ooga = ''
        self.booga = ''

    def __str__(self):
        return ''.join([getattr(self, field_name).ljust(width) 
                        for field_name, width in self.fields])

f = FixWidthFieldLine()
f.foo = 'hi'
f.bar = 'joe'
f.ooga = 'howya'
f.booga = 'doin?'

print(f)

This yields:

hi        joe                           howya                         doin?     

It works by storing a class-level variable, fields, which records the order in which each field should appear in the output, together with the number of columns that field should have. There are correspondingly named instance variables in __init__ that are initially set to an empty string.

The __str__ method outputs these values as a string. It uses a list comprehension over the class-level fields attribute, looking up the instance value for each field by name, and then left-justifying its output according to the column width. The resulting list of fields is then joined together with an empty string.

Note this doesn't parse input, though you could easily override the constructor to take a string and parse the columns according to the field and field widths in fields. It also doesn't check for instance values that are longer than their allotted width.
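One way the two gaps mentioned above might be filled is sketched here. This is only an illustration, not part of the original snippet: the class name `ParsedFixedWidthLine` is hypothetical, the layout is copied from the example, and raising `ValueError` on overflow is an assumption (truncating would be another option).

```python
class ParsedFixedWidthLine(object):
    # Same shape as the fields tuple in the example above.
    fields = (('foo', 10), ('bar', 30), ('ooga', 30), ('booga', 10))

    def __init__(self, line=''):
        # Slice the input column by column; missing columns become ''.
        pos = 0
        for name, width in self.fields:
            setattr(self, name, line[pos:pos + width].rstrip())
            pos += width

    def __str__(self):
        parts = []
        for name, width in self.fields:
            value = getattr(self, name)
            # Assumption: an overlong value is an error rather than truncated.
            if len(value) > width:
                raise ValueError('%s is longer than %d columns' % (name, width))
            parts.append(value.ljust(width))
        return ''.join(parts)


line = 'hi'.ljust(10) + 'joe'.ljust(30) + 'howya'.ljust(30) + 'doin?'.ljust(10)
rec = ParsedFixedWidthLine(line)
print(rec.bar)           # joe
print(str(rec) == line)  # True
```

Stripping trailing spaces on input and re-padding on output means a parsed line round-trips unchanged, as long as no field relies on significant trailing whitespace.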

Rugging answered 11/5, 2009 at 15:32 Comment(0)
C
7

You can use justify functions to left-justify, right-justify and center a string in a field of given width.

'hi'.ljust(10) -> 'hi        '
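A short sketch of all three methods side by side; the values are arbitrary. The optional second argument is the fill character, and none of the methods truncate a string that is already wider than the field:

```python
# All three methods pad with spaces by default and never truncate.
print(repr('hi'.ljust(10)))      # 'hi        '
print(repr('hi'.rjust(10)))      # '        hi'
print(repr('hi'.center(10)))     # '    hi    '
print(repr('42'.rjust(6, '0')))  # '000042'
print(repr('toolong'.ljust(3)))  # 'toolong'
```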
Ceiba answered 11/5, 2009 at 15:37 Comment(0)
M
2

I know this thread is quite old, but we use a library called django-copybook. It has nothing to do with Django (anymore). We use it to go between fixed-width COBOL files and Python. You create a class to define your fixed-width record layout and can easily move between typed Python objects and fixed-width files:

USAGE:
class Person(Record):
    first_name = fields.StringField(length=20)
    last_name = fields.StringField(length=30)
    siblings = fields.IntegerField(length=2)
    birth_date = fields.DateField(length=10, format="%Y-%m-%d")

>>> fixedwidth_record = 'Joe                 Smith                         031982-09-11'
>>> person = Person.from_record(fixedwidth_record)
>>> person.first_name
'Joe'
>>> person.last_name
'Smith'
>>> person.siblings
3
>>> person.birth_date
datetime.date(1982, 9, 11)

It can also handle situations similar to COBOL's OCCURS functionality, such as when a particular section is repeated X times.

Meliorism answered 13/2, 2017 at 21:32 Comment(0)
G
1

I used Jarret Hardie's example and modified it slightly. This version lets you choose the text alignment (left, right, or centered).

class FixedWidthFieldLine(object):
    def __init__(self, fields, justify = 'L'):
        """ Returns line from list containing tuples of field values and lengths. Accepts
            justification parameter.
            FixedWidthFieldLine(fields[, justify])

            fields = [(value, fieldLength)[, ...]]
        """
        self.fields = fields

        if (justify in ('L','C','R')):
            self.justify = justify
        else:
            self.justify = 'L'

    def __str__(self):
        if(self.justify == 'L'):
            return ''.join([field[0].ljust(field[1]) for field in self.fields])
        elif(self.justify == 'R'):
            return ''.join([field[0].rjust(field[1]) for field in self.fields])
        elif(self.justify == 'C'):
            return ''.join([field[0].center(field[1]) for field in self.fields])

fieldTest = [('Alex', 10),
         ('Programmer', 20),
         ('Salem, OR', 15)]

f = FixedWidthFieldLine(fieldTest)
print(f)
f = FixedWidthFieldLine(fieldTest, 'R')
print(f)

Returns:

Alex      Programmer          Salem, OR      
      Alex          Programmer      Salem, OR
Granlund answered 28/5, 2014 at 16:47 Comment(0)
P
0

It's a little difficult to parse your question, but I gather that you are receiving a file or file-like object, reading it, and replacing some of the values with the results of some business logic. Is this correct?

The simplest way to overcome string immutability is to write a new string:

# Won't work:
test_string[3:6] = "foo"

# Will work:
test_string = test_string[:3] + "foo" + test_string[6:]

Having said that, it sounds like it's important to you that you do something with this string, but I'm not sure exactly what that is. Are you writing it back to an output file, trying to edit a file in place, or something else? I bring this up because the act of creating a new string (which happens to have the same variable name as the old string) should emphasize the necessity of performing an explicit write operation after the transformation.
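As a rough sketch of that explicit write step: the loop below reads each line, builds a new string with columns 10-20 replaced, and writes it to a separate output file. The helper `replace_columns`, the file names, and the column range are all hypothetical and chosen only for the demo.

```python
import os
import tempfile

def replace_columns(line, start, end, value):
    # Build a new string; the original is left untouched.
    width = end - start
    return line[:start] + value.ljust(width)[:width] + line[end:]

# Hypothetical file names, created in a temporary directory for the demo.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'in.txt')
dst = os.path.join(workdir, 'out.txt')

with open(src, 'w') as f:
    f.write(' ' * 80 + '\n')

# The new strings only reach disk because they are written explicitly.
with open(src) as fin, open(dst, 'w') as fout:
    for line in fin:
        body = line.rstrip('\n')
        fout.write(replace_columns(body, 10, 20, 'Foo') + '\n')

with open(dst) as f:
    result = f.read()
```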

Phyllous answered 11/5, 2009 at 15:21 Comment(0)
P
0

You can convert the string to a list and do the slice manipulation.

>>> text = list("some text")
>>> text[0:4] = list("fine")
>>> text
['f', 'i', 'n', 'e', ' ', 't', 'e', 'x', 't']
>>> text[0:4] = list("all")
>>> text
['a', 'l', 'l', ' ', 't', 'e', 'x', 't']
>>> "".join(text)
'all text'
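For a fixed-width record, the same idea can start from a list that is already 80 columns wide, so each assignment overwrites columns without changing the length. A minimal sketch; the helper name `put` and the column positions are assumptions for illustration:

```python
# Start from a list of 80 spaces and overwrite columns in place.
record = [' '] * 80

def put(buf, start, width, value):
    # Pad or cut the value to exactly `width` so the list never changes length.
    buf[start:start + width] = value.ljust(width)[:width]

put(record, 10, 10, 'Foo')
line = ''.join(record)
print(repr(line[10:20]))  # 'Foo       '
print(len(line))          # 80
```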
Platt answered 11/5, 2009 at 15:33 Comment(3)
Interesting. You don't need to convert to a list to extract. That's silly. But building a list and then collapsing to a string... it gives you what looks a little bit like a "mutable string" -- only if you pre-allocate enough space.Incorruptible
Actually you don't need to preallocate anything if you don't care all too much about performance. The list type will automatically allocate more space if the range sliced is assigned a bigger range.Platt
Also the list-conversion is there for clarity. Of course it might be better if he read data straight into a list from the beginning, but that's not what I wanted to show.Platt
A
0

It is easy to write a function to "modify" a string.

def change(string, start, end, what):
    # Pad or truncate `what` so it exactly fills string[start:end].
    length = end - start
    what = what.ljust(length)[:length]
    return string[:start] + what + string[end:]

Usage:

test_string = 'This is test string'

print(test_string[5:7])
# is
test_string = change(test_string, 5, 7, 'IS')
# This IS test string
test_string = change(test_string, 8, 12, 'X')
# This IS X    string
test_string = change(test_string, 8, 12, 'XXXXXXXXXXXX')
# This IS XXXX string
Aesthetic answered 11/5, 2009 at 15:34 Comment(0)
