golang unzip Response.Body

I wrote a little web crawler and know that the response is a zip file.
With my limited experience in Go programming, I only know how to unzip an existing file.
Can I unzip the Response.Body in memory without saving it to the hard disk first?

Quality asked 26/5, 2018 at 3:38

Updated answer: handling a zip file response body in memory.

Note: make sure you have enough memory to hold the whole zip file, since this approach reads the entire response body into memory.

package main

import (
    "archive/zip"
    "bytes"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)

func main() {
    resp, err := http.Get("zip file url")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    zipReader, err := zip.NewReader(bytes.NewReader(body), int64(len(body)))
    if err != nil {
        log.Fatal(err)
    }

    // Read all the files from zip archive
    for _, zipFile := range zipReader.File {
        fmt.Println("Reading file:", zipFile.Name)
        unzippedFileBytes, err := readZipFile(zipFile)
        if err != nil {
            log.Println(err)
            continue
        }

        _ = unzippedFileBytes // this is unzipped file bytes
    }
}

func readZipFile(zf *zip.File) ([]byte, error) {
    f, err := zf.Open()
    if err != nil {
        return nil, err
    }
    defer f.Close()
    return ioutil.ReadAll(f)
}
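To try the in-memory approach without a real URL, one can build a small archive with archive/zip's `Writer` and read it back through the same `bytes.NewReader` path the answer uses. This is a sketch: `buildZip` and `unzipInMemory` are hypothetical helpers, and it uses `io.ReadAll` (Go 1.16+) in place of `ioutil.ReadAll`.

```go
package main

import (
    "archive/zip"
    "bytes"
    "fmt"
    "io"
    "log"
)

// buildZip creates a small archive in memory. It stands in for the bytes
// you would get from reading resp.Body.
func buildZip(files map[string]string) ([]byte, error) {
    var buf bytes.Buffer
    zw := zip.NewWriter(&buf)
    for name, content := range files {
        w, err := zw.Create(name)
        if err != nil {
            return nil, err
        }
        if _, err := io.WriteString(w, content); err != nil {
            return nil, err
        }
    }
    // Close writes the central directory; the archive is invalid without it.
    if err := zw.Close(); err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}

// unzipInMemory mirrors the answer: wrap the bytes in a bytes.Reader
// (an io.ReaderAt) and read every entry into memory.
func unzipInMemory(body []byte) (map[string][]byte, error) {
    zr, err := zip.NewReader(bytes.NewReader(body), int64(len(body)))
    if err != nil {
        return nil, err
    }
    out := make(map[string][]byte)
    for _, f := range zr.File {
        rc, err := f.Open()
        if err != nil {
            return nil, err
        }
        data, err := io.ReadAll(rc)
        rc.Close()
        if err != nil {
            return nil, err
        }
        out[f.Name] = data
    }
    return out, nil
}

func main() {
    body, err := buildZip(map[string]string{"a.txt": "alpha", "dir/b.txt": "beta"})
    if err != nil {
        log.Fatal(err)
    }
    files, err := unzipInMemory(body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(files["a.txt"]), string(files["dir/b.txt"])) // alpha beta
}
```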

By default, the Go HTTP client handles gzip responses automatically, so you can simply read and close the response body as usual.
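A small sketch of that transparent decoding, assuming a local `httptest` server that always gzips its reply (`newGzipServer` and `fetch` are hypothetical names). The default Transport adds `Accept-Encoding: gzip` on its own, decodes the body, and reports this via `resp.Uncompressed`.

```go
package main

import (
    "compress/gzip"
    "fmt"
    "io"
    "log"
    "net/http"
    "net/http/httptest"
)

// newGzipServer starts a local test server that always gzip-compresses its reply.
func newGzipServer(payload string) *httptest.Server {
    return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Encoding", "gzip")
        gz := gzip.NewWriter(w)
        defer gz.Close()
        io.WriteString(gz, payload)
    }))
}

// fetch performs a plain GET with the default client and returns the body plus
// resp.Uncompressed, which reports whether the Transport decompressed it for us.
func fetch(url string) (string, bool, error) {
    resp, err := http.Get(url)
    if err != nil {
        return "", false, err
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return "", false, err
    }
    return string(body), resp.Uncompressed, nil
}

func main() {
    srv := newGzipServer("hello, gzip")
    defer srv.Close()

    body, uncompressed, err := fetch(srv.URL)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(body, uncompressed) // hello, gzip true
}
```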

However, there is a catch:

// Reference https://github.com/golang/go/blob/master/src/net/http/transport.go
//
// DisableCompression, if true, prevents the Transport from
// requesting compression with an "Accept-Encoding: gzip"
// request header when the Request contains no existing
// Accept-Encoding value. If the Transport requests gzip on
// its own and gets a gzipped response, it's transparently
// decoded in the Response.Body. However, if the user
// explicitly requested gzip it is not automatically
// uncompressed.
DisableCompression bool

This means that if you manually add an Accept-Encoding: gzip header to the request, you have to decompress the gzip response body yourself.

For example:

reader, err := gzip.NewReader(resp.Body)
if err != nil {
    log.Fatal(err)
}
defer reader.Close()

body, err := ioutil.ReadAll(reader)
if err != nil {
    log.Fatal(err)
}

fmt.Println(string(body))
Fogged answered 26/5, 2018 at 4:30
Thanks for your answer. I know how to handle gzip, but I don't know how to handle a .zip file. It may need archive/zip, but I don't know how to use archive/zip to unzip a .zip Response.Body. Is gzip the same as zip? - Quality
No, gzip != zip. It seems you asked about a .zip file, my bad. I will update the code snippet in a while. - Fogged
Dear reader: ioutil.ReadAll(f) is deprecated in favor of io.ReadAll(f). - Chambliss
Do not use io.ReadAll in the first place! It loads the whole body into memory, which can cause out-of-memory issues and most likely also memory leaks. Try io.Copy instead. - Theona
I added another answer with a full solution that uses io.Copy. - Theona

I believe the other proposed solutions are not great, as they don't give you the full picture of how to unzip all the content of the zip file.

Furthermore, the example above uses ReadAll, which should be avoided (since it reads the whole content into memory!).

Instead of io.ReadAll, this example uses io.Copy to avoid out-of-memory issues as well as memory leaks.

Instead of reading everything into memory, I first write the response body to a temporary file via io.Copy. After that, I use zip.OpenReader to read the file. As far as I can tell, my example does not cause any (major) memory leaks, but correct me if I'm wrong.

See the example below. It should work with large zip files (e.g. 100 MB+), and running this function within a goroutine shouldn't be a problem either.

package main
import (
    "archive/zip"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
    "path/filepath"
    "time"
)

// Global reusable HTTP client with time-out set to 30s.
// Note: the timeout also covers reading the body, so very large downloads
// may need a longer value.
var httpClient = &http.Client{
    Timeout: 30 * time.Second,
}

func main() {
    download("https://somedomain.com/yourfile.zip", "output")
}

// Example download function
func download(url, destinationPath string) {
    res, err := httpClient.Get(url)
    if err != nil {
        log.Printf("Error making http request: %v\n", err)
        return
    }
    defer res.Body.Close() // Close the body resource always

    // Don't try to unzip an error page
    if res.StatusCode != http.StatusOK {
        log.Printf("Unexpected HTTP status: %s\n", res.Status)
        return
    }

    // Create a temporary file to store the response body
    // ("*" is replaced by a random string, so the name keeps the .zip suffix)
    tmpFile, err := os.CreateTemp("", "temp-*.zip")
    if err != nil {
        log.Printf("Error creating temporary file: %v\n", err)
        return
    }
    defer os.Remove(tmpFile.Name()) // Clean up the temporary file afterwards
    defer tmpFile.Close()           // Runs before os.Remove (defers are LIFO); covers early returns

    // Copy the response body to the temporary file
    _, err = io.Copy(tmpFile, res.Body)
    if err != nil {
        log.Printf("Error copying response body to temporary file: %v\n", err)
        return
    }
    // Close explicitly and check the error before zip.OpenReader reopens the file
    if err := tmpFile.Close(); err != nil {
        log.Printf("Error closing temporary file: %v\n", err)
        return
    }

    // Unzip the downloaded archive from the temporary file
    err = unzip(tmpFile.Name(), destinationPath)
    if err != nil {
        log.Printf("Failed to unzip file: %v", err)
        return
    }
}

func unzip(filename string, dest string) error {
    reader, err := zip.OpenReader(filename)
    if err != nil {
        return fmt.Errorf("failed to create zip reader: %w", err)
    }
    defer reader.Close()

    for _, file := range reader.File {
        // Reject names like "../evil" or absolute paths that would escape
        // dest ("Zip Slip"); filepath.IsLocal requires Go 1.20+
        if !filepath.IsLocal(file.Name) {
            return fmt.Errorf("illegal file path in archive: %s", file.Name)
        }
        filePath := filepath.Join(dest, file.Name)

        // Create directories as needed
        if file.FileInfo().IsDir() {
            if err := os.MkdirAll(filePath, os.ModePerm); err != nil {
                return fmt.Errorf("failed to create directory: %w", err)
            }
            continue
        }

        // Create a file
        if err := os.MkdirAll(filepath.Dir(filePath), os.ModePerm); err != nil {
            return fmt.Errorf("failed to create directory for file: %w", err)
        }
        dstFile, err := os.OpenFile(filePath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, file.Mode())
        if err != nil {
            return fmt.Errorf("failed to open file: %w", err)
        }

        // Extract the file
        srcFile, err := file.Open()
        if err != nil {
            dstFile.Close()
            return fmt.Errorf("failed to open zip file: %w", err)
        }
        _, err = io.Copy(dstFile, srcFile)

        // Close the open files
        dstFile.Close()
        srcFile.Close()
        if err != nil {
            return fmt.Errorf("failed to copy file contents: %w", err)
        }
    }

    return nil
}

Feel free to extend my answer with another example that does everything in memory (without a temp file, but of course also without memory leaks).

I hope this helps somebody!

Theona answered 22/7, 2024 at 21:59
