SageMaker failed to extract model data archive tar.gz for container when deploying
Asked Answered
I am trying to deploy an existing Scikit-Learn model in Amazon SageMaker, i.e. a model that was trained locally on my machine rather than on SageMaker.

On my local (Windows) machine I've saved my model as model.joblib and tarred it into model.tar.gz.
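
For completeness, the save step is roughly this (a minimal sketch; 'model' stands for the fitted scikit-learn estimator I trained locally, which isn't shown here):

import joblib

# 'model' is the fitted scikit-learn estimator (not shown in the question)
joblib.dump(model, 'model.joblib')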

Next, I uploaded this model to my S3 bucket ('my_bucket') at the path s3://my_bucket/models/model.tar.gz. I can see the tar file in S3.

But when I try to deploy the model, it keeps failing with the error message "Failed to extract model data archive".

The .tar.gz is generated on my local machine by running 'tar -czf model.tar.gz model.joblib' in a PowerShell window.
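
One way to sanity-check the archive before uploading is to open it with Python's standard tarfile module; if this raises an error, the archive itself is the problem (a minimal sketch, run next to model.tar.gz):

import tarfile

# If the file is a valid gzipped tar, this lists its members
with tarfile.open('model.tar.gz', 'r:gz') as tar:
    print(tar.getnames())  # expect ['model.joblib']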

The code for uploading to S3:

import boto3

s3 = boto3.client("s3",
                  region_name='eu-central-1',
                  aws_access_key_id=AWS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET)
s3.upload_file(Filename='model.tar.gz', Bucket=my_bucket, Key='models/model.tar.gz')
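
To confirm the right bytes actually landed in S3, the local file size can be compared with the uploaded object's ContentLength (a minimal sketch reusing the s3 client and my_bucket variable from above; a mismatch means the wrong file was uploaded):

import os

local_size = os.path.getsize('model.tar.gz')
remote_size = s3.head_object(Bucket=my_bucket, Key='models/model.tar.gz')['ContentLength']
print(local_size, remote_size, local_size == remote_size)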

The code for creating the model object and deploying:

import boto3
from sagemaker.sklearn.model import SKLearnModel

...

model_data = 's3://my_bucket/models/model.tar.gz'
sklearn_model = SKLearnModel(model_data=model_data,
                             role=role,
                             entry_point="my-script.py",
                             framework_version="0.23-1")
predictor = sklearn_model.deploy(instance_type="ml.t2.medium", initial_instance_count=1)                             
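
The entry point my-script.py is not shown in the question; for context, the scikit-learn serving container loads the model by calling a model_fn defined there, roughly like this (a sketch assuming model.joblib sits at the root of the archive):

import os
import joblib

def model_fn(model_dir):
    # model_dir is the directory where SageMaker extracted model.tar.gz
    return joblib.load(os.path.join(model_dir, 'model.joblib'))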

The error message:

UnexpectedStatusException: Error hosting endpoint sagemaker-scikit-learn-2021-01-24-17-24-42-204: Failed. Reason: Failed to extract model data archive for container "container_1" from URL "s3://my_bucket/models/model.tar.gz". Please ensure that the object located at the URL is a valid tar.gz archive

Is there a way to see why the archive is invalid?

Tamelatameless answered 25/1, 2021 at 9:3 Comment(5)
How are you generating your .tar.gz? I had a similar issue (and might have a solution), but want to make sure to give you good info.Rutharuthann
First I saved my model with joblib.dump, which generates model.joblib. Next, using 7-Zip in two steps, I added it to a tar archive and then to a gzip archive, resulting in model.tar.gz. I thought I also tried using tar -czf from a Windows PowerShell window, but I'm not sure. I'll try that again.Tamelatameless
I also ran 'tar -czf model.tar.gz model.joblib' (from a jupyter notebook, on my windows machine), but I got the same error message.Tamelatameless
How did you upload the .tar.gz to S3?Rutharuthann
Hi Joe, thanks to your question I discovered an error in the uploading! I just updated the question with how I generate the .tar.gz and with the code for uploading. I found that I provided the wrong filename in the upload_file method: I put a variable there which had the wrong value. So instead of Filename='model.tar.gz', it uploaded 'model.joblib' (in my code I used variables, not string literals). I have changed it and now it works! A kind of stupid error. If you provide an answer saying that I uploaded the wrong file, I can mark this question as answered.Tamelatameless

I had a similar issue, with a similar fix to Bas's (per the comments above).

I found I wasn't necessarily having issues with the .tar.gz step; this command works fine:

tar -czf <filename> ./<directory-with-files>

but rather with the uploading step.

Manually uploading to S3 should take care of this. However, if you're doing this step programmatically, you might need to double-check it. Bas appears to have had filename issues; mine were around using boto properly. Here's some code that works (Python only here, but watch for similar issues with other libraries):

import boto3

bucket = 'bucket-name'
# The key must be the full object path inside the bucket,
# including the file name, not just the directory
key = 'directory-inside-bucket/model.tar.gz'
# Local path of the .tar.gz to upload
file = 'model.tar.gz'

s3_client = boto3.client('s3')
s3_client.upload_file(file, bucket, key)

Docs: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.upload_file

Rutharuthann answered 28/1, 2021 at 16:41 Comment(4)
See repost.aws/knowledge-center/sagemaker-endpoint-creation-fail: note that symlinks are not allowed in the tar.gz file.Zeralda
Having said that, I ran into "Failed to extract model data archive for container" several times and none of the solutions above helped. I changed my setup a number of times, and eventually the error message (from the endpoint status) switched to "tar.gz too big, not enough memory".Zeralda
Unfortunately, I'm not surprised. I've been exploring using Fargate instead of SageMaker. I'm not sure their endpoint inference offers much, and you have more flexibility with something like Fargate.Rutharuthann
Yeah, I agree; if anything it gets in the way. I got NVIDIA Triton working locally in a handful of hours, but have spent days deploying the exact same models to SageMaker's version. The latest issue was an S3 upload that truncated the file, and endpoint deployment doesn't show any logs until the container has been created, so I had to download the file and uncompress it in order to work out what error SageMaker was producing.Lillith

This "Failed to extract model data archive from URL" error is very misleading, any permission issue to access the s3 object will also cause this error, in my case, it turns out to be the sagemaker role does not have decrypt permission of s3 bucket.

So if you see this error message when deploying a SageMaker job, make sure the SageMaker role has the proper permissions (a way to test them is sketched after the list):

  1. s3:ListBucket permission
  2. s3:GetObject permission
  3. Access to the KMS key and kms:Decrypt permission, if your bucket is encrypted with a KMS key
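
A quick way to test these from your own machine is to assume the SageMaker execution role and attempt a GetObject; a minimal sketch, assuming the role's trust policy lets you assume it (the role ARN below is a placeholder; bucket and key are from the question):

import boto3

# Assume the SageMaker execution role so we test ITS permissions,
# not our own credentials
creds = boto3.client('sts').assume_role(
    RoleArn='arn:aws:iam::123456789012:role/MySageMakerRole',
    RoleSessionName='check-model-access')['Credentials']

s3 = boto3.client('s3',
                  aws_access_key_id=creds['AccessKeyId'],
                  aws_secret_access_key=creds['SecretAccessKey'],
                  aws_session_token=creds['SessionToken'])

# GetObject also exercises kms:Decrypt when the bucket is KMS-encrypted
s3.get_object(Bucket='my_bucket', Key='models/model.tar.gz')
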
Fessler answered 30/1 at 20:59 Comment(1)
Also, there won't be any logs until the container is created, but you can reach out to AWS Support and they can get the detailed error logs on their end.Fessler

SageMaker does not accept tar archives that contain symbolic links. That was my case, and I solved it by replacing the symlinks with the actual files, as also mentioned here: https://repost.aws/knowledge-center/sagemaker-endpoint-creation-fail

You can replace the symlinks in a folder with the actual files using this bash script (Unix):

#!/bin/bash

# Check if a directory is provided as an argument
if [ $# -eq 0 ]; then
    echo "Usage: $0 <directory>"
    exit 1
fi

# The directory to process
dir="$1"

# Find all symbolic links in the directory and its subdirectories
find "$dir" -type l | while read -r link; do
    # Get the target of the symbolic link
    target=$(readlink -f "$link")
    
    # Check if the target file exists
    if [ -e "$target" ]; then
        # Remove the symbolic link
        rm "$link"
        
        # Copy the actual file to the location of the former symbolic link
        cp -a "$target" "$link"
        
        echo "Replaced symlink: $link"
    else
        echo "Warning: Target does not exist for $link"
    fi
done
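
To confirm the re-created archive no longer contains links before uploading, a quick check with Python's tarfile module (the archive name model.tar.gz is assumed):

import tarfile

# SageMaker rejects archives containing symlink or hardlink members
with tarfile.open('model.tar.gz', 'r:gz') as tar:
    links = [m.name for m in tar.getmembers() if m.issym() or m.islnk()]
print('link members found:', links)
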
Acre answered 19/8 at 22:35 Comment(0)
