Unexpected "padding" in a Fortran unformatted file

4

21

I don't understand the format of unformatted files in Fortran.

For example:

open (3,file=filename,form="unformatted",access="sequential")
write(3) matrix(i,:)

writes one row of a matrix to a file. I've discovered that each write is padded with 4 bytes on either end, but I don't understand why, or how to control this behavior. Is there a way to remove the padding?

Florrieflorry answered 5/1, 2012 at 23:3 Comment(3)
Changed the title because I really dislike the misleading usage of the term "binary". Binary means base 2, which is not directly connected to your problem. Pretty much everything on your computer is binary at some level. It is a common term nowadays, but Fortran's "unformatted" is a lot closer. Basipetal
Some useful info here regarding stream ... star.le.ac.uk/~cgp/streamIO.html Basipetal
This seems to be a duplicate of #8751654. Could you please not post your question multiple times? Kizzie
25

For unformatted IO, Fortran compilers typically write the length of the record at the beginning and end of each record. Most, but not all, compilers use four bytes for each marker. The markers aid in reading records; for example, the length at the end assists with a backspace operation. You can suppress them with the stream IO mode introduced in Fortran 2003, which was added for compatibility with other languages: use access='stream' in your open statement.
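As a minimal sketch of the stream approach, assuming a compiler with Fortran 2003 stream IO support: the file name row_stream.bin, the 3x4 matrix shape and the row index are made up for illustration, and unit 3 is taken from the question.

```fortran
program stream_demo
  implicit none
  real    :: matrix(3,4)   ! illustrative shape, not from the question
  integer :: i, nbytes

  matrix = 1.0
  i = 2

  ! access='stream' writes only the data, with no record markers
  open (unit=3, file='row_stream.bin', form='unformatted', &
        access='stream', status='replace')
  write (3) matrix(i,:)
  close (3)

  ! SIZE= reports the file size in file storage units (usually bytes)
  inquire (file='row_stream.bin', size=nbytes)
  print *, 'file size:', nbytes
end program stream_demo
```

If the default real occupies 4 bytes and the file storage unit is one byte, the reported size should equal 4 times the number of elements in the row, with nothing added before or after the data.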

Keeleykeelhaul answered 5/1, 2012 at 23:54 Comment(1)
As a minor note, some compilers such as gfortran and Intel Fortran support records larger than 2 GB despite having 4-byte record markers, by using subrecords. Tarry
7

I never used sequential access with unformatted output, for this exact reason. However, it depends on the application, and sometimes a record length indicator is convenient (especially for unstructured data). As suggested by steabert in "Looking at binary output from fortran on gnuplot", you can avoid the markers by opening the file with ACCESS='DIRECT', in which case you need to specify the record length. This method is convenient for efficient storage of large multi-dimensional structured data (constant record length). The following example writes an unformatted file whose size equals the size of the array:

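PROGRAM direct_write
IMPLICIT NONE
REAL(KIND=4),DIMENSION(10) :: a = 3.141
INTEGER                    :: reclen

! Ask the compiler for the record length needed to hold a
INQUIRE(IOLENGTH=reclen) a
OPEN(UNIT=10,FILE='direct.out',FORM='UNFORMATTED',&
     ACCESS='DIRECT',RECL=reclen)
! Direct access writes record 1 without length markers
WRITE(UNIT=10,REC=1) a
CLOSE(UNIT=10)

END PROGRAM direct_write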

Note that this is not the ideal approach in terms of portability. An unformatted file written with direct access contains no information about the size of each element. A readme text file that describes the data size does the job fine for me, and I prefer this method to the padding of sequential mode.
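To illustrate why the reader has to know the layout in advance, here is a rough sketch that writes the file and reads it back. The program name, the second unit number and the parameter n are assumptions for the example; the type, kind and element count are hard-coded on the reading side because the file itself records none of them.

```fortran
program direct_roundtrip
  implicit none
  integer, parameter :: n = 10   ! element count: must be known to the reader
  real(kind=4) :: a(n), b(n)
  integer      :: reclen

  a = 3.141

  ! Record length for n reals of this kind, in the compiler's RECL units
  inquire (iolength=reclen) a

  open (unit=10, file='direct.out', form='unformatted', &
        access='direct', recl=reclen, status='replace')
  write (unit=10, rec=1) a
  close (unit=10)

  ! Reading needs the same type, kind and count as the write;
  ! nothing in the file describes them
  open (unit=11, file='direct.out', form='unformatted', &
        access='direct', recl=reclen, status='old')
  read (unit=11, rec=1) b
  close (unit=11)

  print *, 'values match after round trip:', all(a == b)
end program direct_roundtrip
```

Using INQUIRE(IOLENGTH=...) on both sides, rather than a hand-computed byte count, sidesteps the fact that compilers differ in the unit used for RECL.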

Jaggery answered 11/1, 2012 at 20:3 Comment(0)
5

Fortran IO is record based, not stream based. Every time you write something through write() you are writing not only the data, but also a begin marker and an end marker for that record. Both markers hold the size of that record. This is why writing a bunch of reals in a single write (one record: one begin marker, the bunch of reals, one end marker) produces a different file size from writing each real in a separate write (multiple records, each consisting of one begin marker, one real, and one end marker). This is extremely important if you are writing large matrices, as the file size can balloon if they are written improperly.
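The difference is easy to measure. The following is only a sketch: the file names and the value n = 100 are invented for the example, and the sizes it prints depend on the compiler's marker width.

```fortran
program record_overhead
  implicit none
  integer, parameter :: n = 100   ! illustrative element count
  real    :: x(n)
  integer :: k, size_one, size_many

  x = 1.0

  ! One record holding all n values: one begin marker, one end marker
  open (unit=10, file='one_record.bin', form='unformatted', &
        access='sequential', status='replace')
  write (10) x
  close (10)

  ! n records holding one value each: n begin markers, n end markers
  open (unit=10, file='many_records.bin', form='unformatted', &
        access='sequential', status='replace')
  do k = 1, n
     write (10) x(k)
  end do
  close (10)

  ! SIZE= reports the file size in file storage units (usually bytes)
  inquire (file='one_record.bin', size=size_one)
  inquire (file='many_records.bin', size=size_many)
  print *, 'one write of n values:', size_one
  print *, 'n writes of one value:', size_many
end program record_overhead
```

Assuming 4-byte reals and 4-byte markers, the arithmetic gives 4 + 400 + 4 = 408 bytes for the single record and 100 x (4 + 4 + 4) = 1200 bytes for the separate records, so the per-value writes roughly triple the file size.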

Blockade answered 7/1, 2012 at 2:5 Comment(3)
What you're saying is only true for 'sequential' access. Kizzie
@steabert: which is the most commonly (99.999 %) used. Blockade
Approximately half my code uses 'direct' access, so that would make it only 50% for me :P Dextrorse
1

Regarding Fortran unformatted IO, I am quite familiar with the differing outputs of the Intel and GNU compilers. Fortunately, my experience dating back to 1970s IBM machines allowed me to decode things. GNU pads records with 4-byte integer counters giving the record length. Intel uses a 1-byte counter and a number of embedded coding values to signify a continuation record or the end of a count, so one can still have very long records even though only 1 byte is used. I have software compiled with the GNU compiler that I had to modify so it could read an unformatted file generated by either compiler, which means it has to detect which format it finds. Reading an unformatted file generated by the Intel compiler (which follows the "old" IBM days) takes "forever" using GNU's fgetc or opening the file in stream mode. Converting the file to what GNU expects makes reading up to 100 times faster. Whether detection and conversion are worth the bother depends on your file size. I reduced my program's startup time (it opens a large unformatted file) from 5 minutes to 10 seconds. I also had to add options to convert the file back again if the user wants to return it to an Intel-compiled program. It's all a pain, but there you go.

Dejected answered 16/3, 2016 at 2:58 Comment(0)
