How Do You "Permanently" Delete An Experiment In Mlflow?
Asked Answered
R

9

27

Permanent deletion of an experiment isn't documented anywhere. I'm using Mlflow w/ backend postgres db

Here's what I've run:

client = MlflowClient(tracking_uri=server)
client.delete_experiment(1)

This deletes the the experiment, but when I run a new experiment with the same name as the one I just deleted, it will return this error:

mlflow.exceptions.MlflowException: Cannot set a deleted experiment 'cross-sell' as the active experiment. You can restore the experiment, or permanently delete the  experiment to create a new one.

I cannot find anywhere in the documentation that shows how to permanently delete everything.

Remunerate answered 6/2, 2020 at 6:26 Comment(0)
R
30

Unfortunately it seems there is no way to do this via the UI or CLI at the moment :-/

The way to do it depends on the type of backend file store that you are using.

Filestore:

If you are using the filesystem as a storage mechanism (the default) then it is easy. The 'deleted' experiments are moved to a .trash folder. You just need to clear that out:

rm -rf mlruns/.trash/*

As of the current version of the documentation (1.7.2), they remark:

It is recommended to use a cron job or an alternate workflow mechanism to clear .trash folder.

SQL Database:

This is more tricky, as there are dependencies that need to be deleted. I am using MySQL, and these commands work for me:

USE mlflow_db;  # the name of your database
DELETE FROM experiment_tags WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM tags WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM runs WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM experiments where lifecycle_stage="deleted";
Rafaelof answered 26/3, 2020 at 14:5 Comment(1)
In case anyone is looking for this folder on EC2, it will be in cd ~/./mlruns/ from the root directoryLyssa
D
22

As of mlflow 1.11.0, the recommended way to permanently delete runs within an experiment is: mlflow gc [OPTIONS].

From the documentation, mlflow gc will

Permanently delete runs in the deleted lifecycle stage from the specified backend store. This command deletes all artifacts and metadata associated with the specified runs.

Dachau answered 4/8, 2020 at 12:51 Comment(3)
It does not work with SQL backend: mlflow.exceptions.MlflowException: This cli can only be used with a backend that allows hard-deleting runsDicks
Seems they are working on it by adding parameter --backend-store-uri <PATH>: mlflow.org/docs/latest/cli.html#mlflow-gc But when I run it on our sqlite backend I still get mlflow.exceptions.MlflowException: Not implemented yetAnselme
Working with postgres hosted in AWS RDS as of MLflow 1.18.0Van
D
14

I am adding SQL commands if you want to delete permanently Trash of MLFlow if you are using PostgreSQL as backend storage.

Change to your MLFlow Database, e.g. by using: \c mlflow and then:

DELETE FROM experiment_tags WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM tags WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM params WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs where experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';

The difference is, that I added the 'params' Table SQL Delete command there.

Dannydannye answered 24/11, 2020 at 7:57 Comment(0)
I
4

Extending @Lee Netherton's answer, you can use PyMySQL to execute those queries and remove all metadata from MLFlow tracking server after deleting the experiment from the MLFlow tracking client.

import pymysql

def perm_delete_exp():
    connection = pymysql.connect(
        host='localhost',
        user='user',
        password='password',
        db='mlflow',
        cursorclass=pymysql.cursors.DictCursor)
    with connection.cursor() as cursor:
        queries = """
            USE mlflow;
            DELETE FROM experiment_tags WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
            DELETE FROM latest_metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
            DELETE FROM metrics WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
            DELETE FROM tags WHERE run_uuid=ANY(SELECT run_uuid FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted"));
            DELETE FROM runs WHERE experiment_id=ANY(SELECT experiment_id FROM experiments where lifecycle_stage="deleted");
            DELETE FROM experiments where lifecycle_stage="deleted";
        """
        for query in queries.splitlines()[1:-1]:
            cursor.execute(query.strip())
    connection.commit()
    connection.close()

You can (perhaps should) execute the entire query at once, but I found debugging it easier this way.

Implosion answered 15/11, 2020 at 14:0 Comment(0)
Z
1

Unfortunately, the SQL commands above did not work with SQLITE in my case. Here is the SQL version working with sqlite in a database IDE by replacing the "any" commands with "in":

DELETE FROM experiment_tags WHERE experiment_id in (
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    );
DELETE FROM latest_metrics WHERE run_uuid in (
    SELECT run_uuid FROM runs WHERE experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM metrics WHERE run_uuid in (
    SELECT run_uuid FROM runs WHERE experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM tags WHERE run_uuid in (
    SELECT run_uuid FROM runs WHERE experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM params WHERE run_uuid in (
    SELECT run_uuid FROM runs where experiment_id in (
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
));
DELETE FROM runs WHERE experiment_id in (
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';
Zoon answered 18/7, 2021 at 18:26 Comment(0)
G
1

If you are using S3 as backend store for artifacts and have a EC2 server for tracking, this is my workaround to delete full experiments 'folders'. You have to delete the experiment both on the artifact store (S3) and on the backend store (database hosted on EC2)

Permanently delete experiments via list of experiment ids:

def permanently_delete_experiments_on_mlflow(list_of_experiments_id: list):
    mlflow_client = MlflowClient(tracking_uri=YOUR_EC2_TRACKING_URI)
    commands = []
    for experiment_id in list_of_experiments_id:
        print(f'deleting experiment {experiment_id}')
        os.system(f"aws s3 rm {YOUR_S3_ARTIFACTS_STORE} "
                  f"--recursive --exclude '*' --include '{experiment_id}/*'")
        try:
            mlflow_client.delete_experiment(experiment_id)
        except Exception as e:
            print_red(f'failed to execute mlflow_client.delete_experiment({experiment_id}) \n {str(e)}')
        commands.append(f"YOUR_PATH_TO_DATABASE_ON_EC2{os.sep}database.db{os.sep}{experiment_id} ")
        commands.append(f"YOUR_PATH_TO_DATABASE_ON_EC2{os.sep}database.db{os.sep}.trash{os.sep}{experiment_id} ")
    # format commands to send via ssh to EC2
    commands = f"ssh -i {YOUR_EC2_SSH_KEY_PATH} ubuntu@{YOUR_EC2_IP} rm -r " \
               + ' '.join(commands)
    print('executing on EC2 the following command: \n   ', commands)
    result = subprocess.Popen(commands, shell=True, stdout=subprocess.PIPE, stdin=subprocess.PIPE)
    response, err = result.communicate()
    print('response:', response)

Note that for this to work you need to have AWS CLI installed.

It basically runs a shell command from Python that does the trick. As a side note, the mlflow tracking with EC2 creates 'folders' both on the EC2 database and on S3 named according to the experiment id which contains a 'subfolder' for each run id corresponding to that experiment. The code above relies on this structure.

Gouache answered 5/1, 2023 at 15:58 Comment(0)
L
1

If you use SQlite, you can permanently delete experiment 42 as follows:

mlflow gc --backend-store-uri sqlite:////path/to/mlflow.db --experiment-ids 42

Also see mlflow gc documentation.

Lapides answered 9/2, 2023 at 11:24 Comment(0)
Y
0

Just leaving this here for the mssql folks who might need this:

DELETE FROM mlflow.dbo.experiment_tags WHERE experiment_id in (
    SELECT experiment_id FROM mlflow.dbo.experiments where lifecycle_stage='deleted'
    );
DELETE FROM mlflow.dbo.latest_metrics WHERE run_uuid in (
    SELECT run_uuid FROM mlflow.dbo.runs WHERE experiment_id in (
        SELECT experiment_id FROM mlflow.dbo.experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM mlflow.dbo.metrics WHERE run_uuid in (
    SELECT run_uuid FROM mlflow.dbo.runs WHERE experiment_id in (
        SELECT experiment_id FROM mlflow.dbo.experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM mlflow.dbo.tags WHERE run_uuid in (
    SELECT run_uuid FROM mlflow.dbo.runs WHERE experiment_id in (
        SELECT experiment_id FROM mlflow.dbo.experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM mlflow.dbo.params WHERE run_uuid in (
    SELECT run_uuid FROM mlflow.dbo.runs where experiment_id in (
        SELECT experiment_id FROM mlflow.dbo.experiments where lifecycle_stage='deleted'
));
DELETE FROM mlflow.dbo.runs WHERE experiment_id in (
    SELECT experiment_id FROM mlflow.dbo.experiments where lifecycle_stage='deleted'
);
DELETE FROM mlflow.dbo.experiments where lifecycle_stage='deleted';
Yuk answered 3/12, 2023 at 22:4 Comment(0)
D
0

Adding to @dominik-franek,

DELETE FROM datasets WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted');

This is also required. Im using MLFlow 2.14.x

Deadfall answered 18/7 at 14:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.