Transfer binary data (image etc) in a CSV file
Asked Answered
U

2

8

I have some binary data to transfer in addition to some other metadata information stored as string, number etc. The binary data I have is image files stored as a blob column in the database, I want to include the blob column in the csv file and store the csv file in filesystem or sftp server, I guess it does not matter where it is stored.

How can the binary data be stored as another column in the csv file? Is this a good practice to transfer the binary data this way?

Urinate answered 8/8, 2018 at 16:11 Comment(6)
What are you transferring from/to?Firm
As for my understanding, it is not good practice to transfer binary in CSV. CSV is basically text file (e.g. ASCII or UTF8). However, if you must do it, then you can convert the binary data to base64. That way your data is converted to ASCII and you can put as one of the column. Cons: 1. File size will be lot bigger. 2) Required to make sure there are not any breaking chracters like , or tab.Blur
@Blur Agreed, but base64 converts whitespace characters, right? Or did you mean the OP should confirm the base64 encoder they use doesn't break lines at a certain length?Schick
@barycarter yes base64 converts all bytes to ascii (basically whitespaces are not included). I found out that normally when converting to base64 it does not break line so can be included between quotes to tackle potential error.Blur
@Blur want to say -whitespaces are not excluded-? Also when we use double quotes, should not it solve the problem with characters like , or tab?Urinate
@Blur The standard seems to disagree with you about binary data being excluded from the CSV format. The only mention of binary data is a simple caution that any included binary data could possibly be malicious. Therefore, the authors of the RFC consider binary data to be possibly included in any CSV file.Blah
E
5

Base64

The usual (and proper) way is to encode the binary data in Base64. This will make the data larger by a factor 4:3.

While CSV files are generally considered to be text files, you could write raw binary data to the file.

That data should then be enclosed in duoble quotes, and all existing double quotes inside the data have to be escaped with another double quote. Quoting the fields will also take care of any newlines in the binary data, but the reader has to support that. There could also be null bytes inside which should not be a problem if the reader knows it's reading binary data (i.e. if you provide the reader yourself).

Problems could arise however if your data has to be in some unicode form..

So, in general it is not good practice to write raw binary data to a csv file, it's better to use base64 encoding.

Eritrea answered 8/8, 2018 at 16:50 Comment(5)
base64 encoders tend to insert either line feeds or carriage returns and line feeds into the output data, which will break a line-based CSV file.Mediative
@AndrewHenle In that case the data should be enclosed in quotes. Any decent CSV reader should be able to handle those multiline fields (especially if that reader has to handle the base64 data -- some CSV files contain whole html pages as a single multiline field). Otoh there is no real reason to split that data (as opposed to base64 in email for example) and it could be just one long string of data (as in html embedded images).Eritrea
Another option would be to just store the name of the image in the CSV and send the CSV file together with the images in a single ZIP file. But that was not really the question here.Eritrea
Definitely, it is a very good option @Eritrea it is in my list, but I want to understand if it is feasible to move all my data at once. Thanks a lot for the insights on doing it in a csv file.Urinate
@mCeviker Glad to help - don't forget to accept the answer if it answers your question :)Eritrea
B
3

As the correct Answer by Danny_ds discusses, yes indeed you can include binary data in a CSV but you must encode that data in a text-and-CSV-friendly manner. The obvious choice for such text-and-CSV-friendly is Base64, as mentioned.

Example app

Here is the full source code for an example app.

This app downloads a couple images from over the Internet. Namely, the logos from the sister sites StackOverflow.com and StackExchange.com. After downloading, these images are encoded as Base64 strings.

Then a 2-column CSV file is written. The columns are a name and the image data. For this example, we have 2 rows, one for each site listed above. Notice that Base64 encoding does not involve commas or quotes, so there is no need to enclose a Base64 with quotes in standard CSV format.

To prove this worked, the CSV file is then read. The image data is decoded from Base64 back into binary data. The binary data is written to storage. You can open the PNG image files yourself. They should look like this & this.

The writing and reading of the CSV is accomplished using the Apache Commons CSV library.

package work.basil.example;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CsvImager
{
    // Write a CSV file, of two columns: name of the image, image in Base64 data.
    private void writeCsv ( final Map < String, BufferedImage > map , final Path path )
    {
        Objects.requireNonNull( map );

        CSVFormat format = CSVFormat.RFC4180.withHeader( "Name" , "Image" );
        try (
                BufferedWriter writer = Files.newBufferedWriter( path , StandardCharsets.UTF_8 ) ;
                CSVPrinter printer = new CSVPrinter( writer , format ) ;
        )
        {
            // Print rows.
            for ( String key : map.keySet() )
            {
                printer.print( key );
                BufferedImage image = Objects.requireNonNull( map.get( key ) );
                ByteArrayOutputStream stream = new ByteArrayOutputStream();
                ImageIO.write( image , "PNG" , stream );
                String imageData = Base64.getEncoder().encodeToString( stream.toByteArray() );
                printer.print( imageData );
                printer.println();
            }
        } catch ( IOException e )
        {
            e.printStackTrace();
        }
    }

    // Read images from a CSV file in storage.
    public Map < String, BufferedImage > readCsv ( final Path path )
    {
        Objects.requireNonNull( path );
        Map < String, BufferedImage > map = Map.of();
        try ( BufferedReader reader = Files.newBufferedReader( path ) )
        {
            map = new HashMap <>();
            CSVFormat format = CSVFormat.RFC4180.withHeader( "Name" , "Image" ).withFirstRecordAsHeader();
            Iterable < CSVRecord > records = format.parse( reader );
            for ( CSVRecord record : records )
            {
                String name = record.get( "Name" );
                String imageBase64 = record.get( "Image" );
//                System.out.println("imageBase64:\n" + imageBase64 + "\n");
                byte[] bytes = Base64.getDecoder().decode( imageBase64 );
                ByteArrayInputStream stream = new ByteArrayInputStream( bytes );
                BufferedImage image = ImageIO.read( stream );
                map.put( name , image );
            }

        } catch ( IOException e )
        {
            e.printStackTrace();
        }
        return map;
    }

    // Download images from the Internet.
    private Map < String, BufferedImage > fetchImages ()
    {
        Map < String, BufferedImage > map = Map.of(); // Initialize to empty map.
        try
        {
            URL urlStackOverflow = null, urlStackExchange = null;
            urlStackOverflow = new URL( "https://cdn.sstatic.net/Sites/stackoverflow/company/img/logos/so/so-logo.png" );
            urlStackExchange = new URL( "https://cdn.sstatic.net/Sites/stackoverflow/company/img/logos/se/se-logo.png" );

            BufferedImage imageStackOverflow = ImageIO.read( urlStackOverflow );
            BufferedImage imageStackExchange = ImageIO.read( urlStackExchange );

            System.out.println( "imageStackOverflow: " + imageStackOverflow );
            System.out.println( "imageStackExchange: " + imageStackExchange );

            map = Map.of( "logoStackOverflow.png" , imageStackOverflow , "logoStackExchange.png" , imageStackExchange );
        } catch ( MalformedURLException e ) // `new URL` fail.
        {
            e.printStackTrace();
        } catch ( IOException e ) // `ImageIO.read` fail.
        {
            e.printStackTrace();
        }
        ;
        return map;
    }

    // Produce individual image files on disk, to manually verify that the downloaded images were successfully Base64 endcoded, written to CSV, read from CSV, and decoded back to images.
    public void writeImages ( final Map < String, BufferedImage > map , final Path pathToFolder )
    {
        Objects.requireNonNull( map );
        Objects.requireNonNull( pathToFolder );
        if ( map.isEmpty() )
        {
            throw new IllegalArgumentException( "The Map should have elements but is empty. Message # 77063b5a-4398-49f0-b1a4-442255a13b77." );
        }
        if ( ! Files.isDirectory( pathToFolder ) )
        {
            throw new IllegalArgumentException( "The specified path must lead to an existing folder. Message # 6a19313d-b8a9-4a53-9b82-7672172923f9." );
        }
        for ( String key : map.keySet() )
        {
            Path pathToFile = pathToFolder.resolve( key );
            try (
                    OutputStream stream = Files.newOutputStream( pathToFile ) ;
            )
            {
                BufferedImage image = Objects.requireNonNull( map.get( key ) );
                ImageIO.write( image , "PNG" , stream );
            } catch ( IOException e )
            {
                e.printStackTrace();
            }
        }
    }

    // --------| Demo  |-----------------------

    public void demo ()
    {
        Map < String, BufferedImage > map = this.fetchImages(); // Pairs of name & image.
        Path path = Paths.get( "/Users/basilbourque/images.csv" );
        this.writeCsv( map , path );
        Map < String, BufferedImage > mapOut = this.readCsv( path );
        Path pathOut = path.getParent();
        this.writeImages( mapOut , pathOut );
    }

    public static void main ( String[] args )
    {
        CsvImager app = new CsvImager();
        app.demo();

        System.out.println( "Done." );
    }
}

Tip: The scattered use of magic strings Name & Image for the column names in the CSV should be replaced with the use of an enum as supported by Apache Commons CSV. I leave that as an exercise for the reader.

Blah answered 27/6, 2019 at 22:1 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.