Remove artifacts from CI manually
P

11

39

I have a private repository at gitlab.com that uses the CI feature. Some of the CI jobs create artifact files that are stored. I just configured the artifacts to be deleted automatically after one day by adding this to the CI configuration:

expire_in: 1 day

That works great - however, old artifacts won't be deleted (as expected). So my question is:

How can I delete old artifacts or artifacts that do not expire? (on gitlab.com, no direct access to the server)

Pleiades answered 28/2, 2017 at 15:28 Comment(0)
T
23

You can use the GitLab REST API to delete the artifacts from the jobs if you don't have direct access to the server. Here's a sample curl script that uses the API:

#!/bin/bash
    
# project_id, find it here: https://gitlab.com/[organization name]/[repository name]/edit inside the "General project settings" tab
project_id="3034900"
    
# token, find it here: https://gitlab.com/profile/personal_access_tokens
token="Lifg_azxDyRp8eyNFRfg"
server="gitlab.com"
    
# go to https://gitlab.com/[organization name]/[repository name]/-/jobs
# then open JavaScript console
# copy/paste => copy(_.uniq($('.ci-status').map((x, e) => /([0-9]+)/.exec(e.href)).toArray()).join(' '))
# press enter, and then copy the result here :
# repeat for every page you want
job_ids=(48875658 48874137 48873496 48872419)
    
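# quote the array expansion so each job ID is passed as a single word
for job_id in "${job_ids[@]}"
do
     URL="https://$server/api/v4/projects/$project_id/jobs/$job_id/erase"
     echo "$URL"
     curl --request POST --header "PRIVATE-TOKEN:${token}" "$URL"
     # plain echo prints a newline; echo "\n" would print a literal \n in bash
     echo
done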
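
The same loop can be sketched in Python. This is only a minimal sketch under stated assumptions: the helper names build_request and delete_artifacts and the keep_log switch are made up for illustration, and the two endpoints are the ones mentioned on this page, POST .../jobs/:job_id/erase (artifacts and job log) and DELETE .../jobs/:job_id/artifacts (artifacts only, available since GitLab 11.9 according to the comments). It uses only the standard library.

```python
import urllib.request


def build_request(server, project_id, job_id, token, keep_log=True):
    """Return an unsent request that removes one job's artifacts.

    keep_log=True  -> DELETE .../jobs/<id>/artifacts (artifacts only)
    keep_log=False -> POST   .../jobs/<id>/erase     (artifacts and job log)
    """
    base = f"https://{server}/api/v4/projects/{project_id}/jobs/{job_id}"
    if keep_log:
        url, method = f"{base}/artifacts", "DELETE"
    else:
        url, method = f"{base}/erase", "POST"
    return urllib.request.Request(
        url, method=method, headers={"PRIVATE-TOKEN": token}
    )


def delete_artifacts(server, project_id, job_ids, token, keep_log=True):
    """Send one request per job ID and print the HTTP status of each.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    failing job ID stops the loop unless the caller catches the error.
    """
    for job_id in job_ids:
        req = build_request(server, project_id, job_id, token, keep_log)
        with urllib.request.urlopen(req) as resp:
            print(job_id, resp.status)


# Example call (placeholder values, not executed here):
# delete_artifacts("gitlab.com", "3034900", [48875658, 48874137], "<your_access_token>")
```

Keeping the request construction separate from the sending makes it possible to inspect the method and URL before anything is deleted.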
Tactless answered 24/2, 2018 at 11:11 Comment(4)
This console command worked for me: copy([...$('a[title="Download artifacts"],a[download]').map((x, e) => /\/([0-9]+)\//.exec(e.href)[1])].join(' ')) – Woothen
This also deletes the job log according to the documentation. Since GitLab 11.9 there is a separate API endpoint for deleting only the artifacts, maybe you would like to update your answer? – Canna
Note that this will delete the jobs along with the artifacts. See my answer for a way to keep the jobs and delete only the artifacts. – Graft
The provided js commandline script does not work. – Isabel
H
29

An API call should be easier to script, with GitLab 14.7 (January 2022), which now offers:

Bulk delete artifacts with the API

While a good strategy for managing storage consumption is to set regular expiration policies for artifacts, sometimes you need to reduce items in storage right away.

Previously, you might have used a script to automate the tedious task of deleting artifacts one by one with API calls, but now you can use a new API endpoint to bulk delete job artifacts quickly and easily.

See Documentation, Issue 223793 and Merge Request 75488.

 curl --request DELETE --header "PRIVATE-TOKEN: <your_access_token>" \
      "https://gitlab.example.com/api/v4/projects/1/artifacts"

As noted by Lubo in the comments:

Response of given API is 202 Accepted. It means for me, that deletion will happen on background.

Also admin area is updated a bit later than deletion happens
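
The bulk call and the meaning of the 202 response can be sketched in Python as follows. This is a hedged sketch, not an official client: the function names bulk_delete_request, describe_status and bulk_delete are hypothetical, and the only endpoint assumed is the DELETE /projects/:id/artifacts call from the curl example.

```python
import urllib.error
import urllib.request


def bulk_delete_request(base_url, project_id, token):
    """Build (but do not send) DELETE /projects/:id/artifacts."""
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/artifacts"
    return urllib.request.Request(
        url, method="DELETE", headers={"PRIVATE-TOKEN": token}
    )


def describe_status(status):
    """Translate the HTTP status code into a short hint."""
    if status == 202:
        # Accepted: the server has only queued the work at this point.
        return "accepted: deletion is queued and runs in the background"
    if 200 <= status < 300:
        return "success"
    return f"request failed with HTTP {status}"


def bulk_delete(base_url, project_id, token):
    """Send the request and report what the status code implies."""
    req = bulk_delete_request(base_url, project_id, token)
    try:
        with urllib.request.urlopen(req) as resp:
            return describe_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx instead of returning a response
        return describe_status(err.code)


# Example call (placeholder values, not executed here):
# print(bulk_delete("https://gitlab.example.com", 1, "<your_access_token>"))
```

Because a 202 only confirms that the job was queued, the storage figures may still show the old size for a while after the call returns.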

Hylo answered 22/1, 2022 at 21:38 Comment(9)
Out of curiosity will this delete the raw job logs associated with each job as well? Do those count towards the artifact storage limit? – Silvie
@BrianJordan Apparently, it does not delete logs: gitlab.com/gitlab-org/gitlab/-/issues/223793#note_443460706. gitlab.com/gitlab-org/gitlab/-/merge_requests/75488 mentions "erasable job artifacts - all job artifacts except trace". – Hylo
I get '202: Accepted' back but the Artifacts size of my repository remains the same. – Isabel
@GrogPirate With what version of GitLab? Or is it gitlab.com? In any case, it is best to ask a separate question. – Hylo
@GrogPirate We have the same problem on gitlab.com: this bulk delete API does not seem to work, or does not do what is expected. – Lipman
@Lipman I unfortunately ended up moving the repository since I could not get rid of the remaining artifacts. – Isabel
Response of given API is 202 Accepted. It means for me, that deletion will happen on background. Also admin area is updated a bit later than deletion happens. – Krystenkrystin
@Krystenkrystin Good point. I have included your comment in the answer for more visibility. – Hylo
The same request with the gitlab library in Python:
    import gitlab
    gl = gitlab.Gitlab('YOUR URL', private_token='YOUR TOKEN')
    project = gl.projects.get('YOUR PROJECT ID')
    project.artifacts.delete()
– Countershading
G
17

Building on @David's answer: as @Philipp pointed out, there is now an API endpoint that deletes only the job artifacts instead of the entire job.

You can run this script directly in the browser's Dev Tools console, or use node-fetch to run it in Node.js.

//Go to: https://gitlab.com/profile/personal_access_tokens
const API_KEY = "API_KEY";

//You can find project id inside the "General project settings" tab
const PROJECT_ID = 12345678;
const PROJECT_URL = "https://gitlab.com/api/v4/projects/" + PROJECT_ID + "/"

let jobs = [];
for(let i = 0, currentJobs = []; i == 0 || currentJobs.length > 0; i++){
    currentJobs = await sendApiRequest(
        PROJECT_URL + "jobs/?per_page=100&page=" + (i + 1)
    ).then(e => e.json());
    jobs = jobs.concat(currentJobs);
}

//skip jobs without artifacts (an empty array is truthy, so check its length)
jobs = jobs.filter(e => e.artifacts && e.artifacts.length > 0);

//keep the latest build.
jobs.shift();

for(let job of jobs)
    await sendApiRequest(
        PROJECT_URL + "jobs/" + job.id + "/artifacts",
        {method: "DELETE"}
    );

async function sendApiRequest(url, options = {}){
    if(!options.headers)
        options.headers = {};
    options.headers["PRIVATE-TOKEN"] = API_KEY;

    return fetch(url, options);
}
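
The paging and filtering logic of the script above can also be sketched in Python. This is a sketch under assumptions: the names collect_all_pages, jobs_to_clean and make_fetcher are hypothetical, the page fetcher is passed in as a function so the loop can be tried without network access, and, like the script above, it assumes the jobs list is returned newest first.

```python
import json
import urllib.request


def collect_all_pages(fetch_page, per_page=100):
    """Call fetch_page(page, per_page) for page 1, 2, ... until a page is empty."""
    items = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        if not batch:
            break
        items.extend(batch)
        page += 1
    return items


def jobs_to_clean(jobs, keep_latest=1):
    """Drop jobs without artifacts, then spare the first keep_latest entries."""
    with_artifacts = [job for job in jobs if job.get("artifacts")]
    return with_artifacts[keep_latest:]


def make_fetcher(project_url, token):
    """Return a fetch_page function that reads one page of the jobs list."""
    def fetch_page(page, per_page):
        url = f"{project_url}jobs?per_page={per_page}&page={page}"
        req = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch_page


# Example call (placeholder values, not executed here):
# fetch = make_fetcher("https://gitlab.com/api/v4/projects/12345678/", "API_KEY")
# for job in jobs_to_clean(collect_all_pages(fetch)):
#     print("would delete artifacts of job", job["id"])
```

Printing the candidate job IDs first, as in the commented example, gives a dry run before any DELETE request is sent.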
Graft answered 1/5, 2020 at 22:11 Comment(2)
I get the following error when trying to use yours above in Chrome console: Uncaught SyntaxError: Unexpected end of JSON input at <anonymous>:28:44 at async <anonymous>:21 (anonymous) @ VM62:28 – Essayist
@Essayist I fixed the script, can you try again? – Graft
L
11

According to the documentation, deleting the entire job log (click on the trash can) will also delete the artifacts.

Literate answered 23/1, 2019 at 9:38 Comment(0)
W
4

I am on GitLab 8.17 and am able to remove the artifacts for a particular job by navigating to the storage directory on the server itself. The default path is:

/var/opt/gitlab/gitlab-rails/shared/artifacts/<year_month>/<project_id?>/<jobid>

Removing either the whole folder for the job or just its contents makes the artifact view disappear from the GitLab pipeline page.

The storage path can be changed as described in docs:
https://gitlab.com/gitlab-org/gitlab-ce/blob/master/doc/administration/job_artifacts.md#storing-job-artifacts

Whipstall answered 1/3, 2017 at 12:37 Comment(2)
Unfortunately, I have no direct access to the server as the repository is hosted at gitlab.com. I updated my question to point that out in more detail. – Pleiades
According to GitLab, you'll want to delete through the gitlab-rails console if you have access to the GitLab server itself. Otherwise, you may see discrepancies when looking at project size in the admin UI because the underlying database isn't updated. Ref: gitlab.com/gitlab-org/gitlab-ce/issues/5572#note_3359570 – Tactless
G
4

If you don't mind removing entire jobs along with their artifacts in bulk, you can use the glab CLI like this:

glab ci delete --dry-run --older-than 8760h --paginate

This removes all jobs older than 1 year. Just remove --dry-run to make it happen.

The artifacts seem to be deleted asynchronously, so it may take some time for your repository's storage usage to be updated.
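
To make the 8760h value concrete (8760 hours is 365 days), here is a small Python sketch of the same age filter applied to a list of job records. It is an illustration only: the function names parse_gitlab_time and older_than are hypothetical, and comparing against the created_at field is an assumption, since glab may use a different timestamp internally.

```python
from datetime import datetime, timedelta, timezone


def parse_gitlab_time(value):
    """Parse an ISO 8601 timestamp such as '2015-12-24T15:51:21.802Z'."""
    # Older Python versions reject a trailing 'Z', so spell out UTC instead.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def older_than(jobs, hours, now=None):
    """Return the jobs whose created_at lies more than `hours` before `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    return [job for job in jobs if parse_gitlab_time(job["created_at"]) < cutoff]


# Example with made-up records:
# jobs = [{"id": 1, "created_at": "2022-12-01T10:00:00.000Z"}]
# print(older_than(jobs, 8760))
```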

Geneticist answered 16/1 at 8:16 Comment(0)
A
2

If you have deleted all the jobs by accident (thinking the artifacts would be gone, but they weren't), what is the alternative to brute-forcing a range of job IDs in a loop?

I have this code, which brute-forces a range of job IDs. But since I use the gitlab.com public runners, it's a long range.

#!/bin/bash

# project_id, find it here: https://gitlab.com/[organization name]/[repository name]/edit inside the "General project settings" tab
project_id="xxxxxx"

# token, find it here: https://gitlab.com/profile/personal_access_tokens
token="yyyyy"
server="gitlab.com"

# Take the range from the oldest known job ID to the latest known one, then brute-force it.
# Useful when you have deleted the pipelines and can no longer retrieve the job IDs.

# https://mcmap.net/q/409272/-for-loop-over-sequence-of-large-numbers-in-bash-duplicate
for (( job_id = 59216999; job_id <= 190239535; job_id++ )); do
    echo "$job_id"

    echo "Job ID being deleted is $job_id"

    curl --request POST --header "PRIVATE-TOKEN:${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/erase"
    echo -en '\n'
    echo -en '\n'
done
Austriahungary answered 17/9, 2020 at 18:16 Comment(0)
K
2

This Python solution worked for me with GitLab 13.11.3.

#!/bin/python3
# delete_artifacts.py  

import json
import requests

# adapt accordingly
base_url='https://gitlab.example.com'
project_id='1234'
access_token='123412341234'

#
# Get Version Tested with Version 13.11.3
# cf. https://docs.gitlab.com/ee/api/version.html#version-api
#
print(f'GET /version')
x= (requests.get(f"{base_url}/api/v4/version", headers = {"PRIVATE-TOKEN": access_token }))
print(x)
data=json.loads(x.text)
print(f'Using GitLab version {data["version"]}. Tested with 13.11.3')

#
# List project jobs
# cf. https://docs.gitlab.com/ee/api/jobs.html#list-project-jobs
#
request_str=f'projects/{project_id}/jobs'
url=f'{base_url}/api/v4/{request_str}'
print(f'GET /{request_str}')
x= (requests.get(url, headers = {"PRIVATE-TOKEN": access_token }))
print(x)
data=json.loads(x.text)

input('WARNING: This will delete all artifacts. Job logs will remain available. Press Enter to continue...' )

#
# Delete job artifacts
# cf. https://docs.gitlab.com/ee/api/job_artifacts.html#delete-artifacts
#
for entry in data:
    request_str=f'projects/{project_id}/jobs/{entry["id"]}/artifacts'
    url=f'{base_url}/api/v4/{request_str}'
    print(f'DELETE /{request_str}')
    x = requests.delete(url, headers = {"PRIVATE-TOKEN": access_token })
    print(x)

I'll keep an updated version here. Feel free to reach out and improve the code.

Kristeenkristel answered 16/5, 2021 at 14:32 Comment(0)
C
0

As you said, it's possible to change the retention time by adding an artifacts:expire_in field in the job settings.

In my testing in GitLab 15.10.2-ee, the setting is applied retroactively to all matching jobs in the history. The deletion is not instant: it presumably happens once some scheduled job runs, probably once a day.

You could also change the instance-wide setting but that doesn't apply to past artifacts.

Countershading answered 22/12, 2023 at 8:50 Comment(1)
There is the parameter expire_build_artifacts_worker_cron in gitlab.rb. – Cresa
L
0

Although the answers here summarise things nicely, I am adding the Python script I used to manually clean up only the artefacts on the latest version of a GitLab Linux installation (v16.10.3-ee).

I first retrieved the number of pages and projects in our GitLab environment from the "x-total-pages" and "x-total" values returned by the command curl https://gitlab.company/api/v4/projects?private_token=<token> --head. Then I iterated through the GitLab projects page by page and collected the project IDs into a list. Lastly, I iterated through this list to perform the artefact cleanup.

# This is a sample Python script referenced on the idea from https://mcmap.net/q/400016/-remove-artifacts-from-ci-manually

import json
import requests


def clean_gitlab_artefact():
    base_url = "https://gitlab.company"
    access_token = "access-token"  # check with your GitLab project owner
    print(f'GET /version')
    x = (requests.get(f"{base_url}/api/v4/version", headers={"PRIVATE-TOKEN": access_token}))
    print(x)
    data = json.loads(x.text)
    print(f'Using GitLab version {data["version"]}. Implemented on 16.10.3-ee!')

    # # there were 173 projects at the time of running this script, which can be checked by exploring the
    # # value "x-total" of the command `curl https://gitlab.company/api/v4/projects?private_token=<token> --head`
    page = 1
    total_project_ids = []
    while page != 3:
        print(f'GET /project-IDs')
        projects = (requests.get(f"{base_url}/api/v4/projects?per_page=100&page={page}",
                                 headers={"PRIVATE-TOKEN": access_token}))
        page += 1
        # print(projects)
        data = json.loads(projects.text)
        project_ids = [o["id"] for o in data]
        total_project_ids += project_ids
    print(total_project_ids)

    for project_id in total_project_ids:
        request_str = f'projects/{project_id}/artifacts'
        url = f'{base_url}/api/v4/{request_str}'
        print(f'DELETE /{request_str}')
        x = (requests.delete(url, headers={"PRIVATE-TOKEN": access_token}))
        print(x)

if __name__ == '__main__':
    clean_gitlab_artefact()

The script returns quickly with a 202 Accepted response for each project, because the cleanup happens asynchronously in the background, as other answers here also explain.
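
Rather than hard-coding the page count (the while page != 3 loop), the count could be read from the "x-total-pages" header mentioned above. The following is a minimal sketch with hypothetical helper names total_pages and page_range; it assumes the header is present and falls back to a single page when it is missing or not a number, which may happen on very large result sets.

```python
def total_pages(headers, default=1):
    """Read the page count from the 'x-total-pages' response header."""
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        return int(lowered["x-total-pages"])
    except (KeyError, ValueError):
        return default


def page_range(headers):
    """Return the page numbers 1..N to request, based on the headers."""
    return range(1, total_pages(headers) + 1)


# Example with the figures from this answer (173 projects, 100 per page):
# list(page_range({"X-Total": "173", "X-Total-Pages": "2"}))
```

With the requests library used in the script, the headers of a first response (for example projects.headers) could be passed to page_range to drive the loop.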

Lassalle answered 17/4 at 22:53 Comment(1)
H
0

None of the API-based solutions worked for me because the DELETE API only sets the expiry date on the build. It is then up to Sidekiq to perform the deletion. If a bug in GitLab causes the expiry date on artifacts to be ignored even when it is set, nothing will happen.

Builds can also be removed manually with rails runner. The script below cleans up both artifacts and job logs that are older than 1 month. Note that it only does so for the 20 projects that consume the most disk space.

#!/usr/bin/env ruby

# This is a ruby script to delete build artifacts from gitlab that are older than 1 month
# Copy this file to /tmp/ folder on gitlab server then execute rails runner as below
# gitlab-rails runner /tmp/cleanupArtifacts.rb

include ActionView::Helpers::NumberHelper


ProjectStatistics.order(build_artifacts_size: :desc).limit(20).each do |s|
  builds_artifacts = s.project.builds.with_downloadable_artifacts
  # start at 0 so the printed total matches the number of builds visited
  counter = 0
  builds_artifacts.find_each do |build|
    counter += 1
    puts "Build #{build.id} \t created at #{build.created_at}"

    if build.created_at < 1.month.ago
      puts "Build #{build.id} marked for deletion"
      build.destroy!
    end
  end
  puts "#{number_to_human_size(s.build_artifacts_size)} \t #{s.project.full_path} \t Builds: #{counter}"
end
Hatch answered 4/5 at 9:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.