How do you edit an existing Tensorboard Training Loss summary?

I've trained my network and generated some training/validation losses, which I saved via the following code (only the training loss is shown; validation is handled equivalently):

train_summary_writer = tf.summary.create_file_writer("/path/to/logs/")
with train_summary_writer.as_default():
    tf.summary.scalar('Training Loss', data=epoch_loss, step=current_step)

After training I would then like to view the loss curves using TensorBoard. However, because I saved the loss curves under the names 'Training Loss' and 'Validation Loss', these curves are plotted on separate graphs. I know that I should change the name to simply 'loss' to solve this problem for future writes to the log directory. But how do I edit my existing log files for the training/validation losses to account for this?
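
For future runs, a minimal sketch of that setup, assuming separate 'train' and 'valid' subdirectories under the log root (paths and loop values here are placeholders): log both curves under the same tag 'loss', each through its own writer, so TensorBoard overlays them on one graph.

import tensorflow as tf

# Placeholder paths: one writer per subdirectory of the log root
train_summary_writer = tf.summary.create_file_writer("/path/to/logs/train")
valid_summary_writer = tf.summary.create_file_writer("/path/to/logs/valid")

# Stand-ins for the values produced by the training loop
epoch_loss, valid_epoch_loss, current_step = 0.25, 0.30, 0

with train_summary_writer.as_default():
    tf.summary.scalar('loss', data=epoch_loss, step=current_step)
with valid_summary_writer.as_default():
    tf.summary.scalar('loss', data=valid_epoch_loss, step=current_step)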

I attempted to adapt the solution from this post: https://stackoverflow.com/a/55061404, which edits the steps of a log file and rewrites the file; my version would instead change the tags in the file. But I had no success with this approach, and it also requires importing legacy TensorFlow code through 'tf.compat.v1'. Is there a way to achieve this (ideally in TF 2.x)?

I had thought to simply acquire the loss and step values from each log directory containing the losses and write them to new log files via my previous working method, but I only managed to obtain the step, and not the loss value itself. Has anyone had any success here?
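
For reference, the loss values themselves can also be read back: in TF 2.x, tf.summary.scalar stores the value as a tensor in value.tensor rather than in value.simple_value, which would explain getting only the step. A rough sketch, with placeholder path and tag:

import tensorflow as tf
from tensorflow.core.util.event_pb2 import Event

def read_scalars(event_file, tag):
    # Yield (step, value) pairs for the given tag from a TF 2.x event file
    for rec in tf.data.TFRecordDataset([str(event_file)]):
        ev = Event()
        ev.MergeFromString(rec.numpy())
        for v in ev.summary.value:
            if v.tag == tag:
                # TF 2.x scalars live in value.tensor, not value.simple_value
                yield ev.step, float(tf.make_ndarray(v.tensor))

# Placeholder usage:
# for step, loss in read_scalars('/path/to/logs/<events.out.tfevents...>', 'Training Loss'):
#     print(step, loss)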

---=== EDIT ===---

I managed to fix the problem using the code from @jdehesa.

I had to slightly alter the way that the function "rename_events_dir" was called, as I am running TensorFlow inside a Google Colab notebook. To do this I changed the final part of the code, which read:

if __name__ == '__main__':
    if len(sys.argv) != 5:
        print(f'{sys.argv[0]} <input dir> <output dir> <old tags> <new tag>',
              file=sys.stderr)
        sys.exit(1)
    input_dir, output_dir, old_tags, new_tag = sys.argv[1:]
    old_tags = old_tags.split(';')
    rename_events_dir(input_dir, output_dir, old_tags, new_tag)
    print('Done')

To read this:

import os

rootpath = '/path/to/model/'
# Skip the output dirs themselves; every other subdirectory is a dated run
dirlist = [dirname for dirname in os.listdir(rootpath) if dirname not in ['train', 'valid']]
for dirname in dirlist:
    rename_events_dir(os.path.join(rootpath, dirname, 'train'), os.path.join(rootpath, 'train'), ['Training Loss'], 'loss')
    rename_events_dir(os.path.join(rootpath, dirname, 'valid'), os.path.join(rootpath, 'valid'), ['Validation Loss'], 'loss')

Notice that I called "rename_events_dir" twice, once for editing the tags for the training loss, and once for the validation loss tags. I could have used the previous method of calling the code by setting "old_tags = 'Training Loss;Validation Loss'" and using "old_tags = old_tags.split(';')" to split the tags. I used my method simply to understand the code and how it processed the data.

Grape answered 5/2, 2020 at 16:01

As mentioned in How to load selected range of samples in Tensorboard, TensorBoard events are actually stored as record files, so you can read and process them as such. Here is a script similar to the one posted there, but for the purpose of renaming events, and updated to work in TF 2.x.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# rename_events.py

import sys
from pathlib import Path
import os
# Use this if you want to avoid using the GPU
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import tensorflow as tf
from tensorflow.core.util.event_pb2 import Event

def rename_events(input_path, output_path, old_tags, new_tag):
    # Make a record writer
    with tf.io.TFRecordWriter(str(output_path)) as writer:
        # Iterate event records
        for rec in tf.data.TFRecordDataset([str(input_path)]):
            # Read event
            ev = Event()
            ev.MergeFromString(rec.numpy())
            # Check if it is a summary
            if ev.summary:
                # Iterate summary values
                for v in ev.summary.value:
                    # Check if the tag should be renamed
                    if v.tag in old_tags:
                        # Rename with new tag name
                        v.tag = new_tag
            writer.write(ev.SerializeToString())

def rename_events_dir(input_dir, output_dir, old_tags, new_tag):
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    # Make output directory
    output_dir.mkdir(parents=True, exist_ok=True)
    # Iterate event files
    for ev_file in input_dir.glob('**/*.tfevents*'):
        # Make directory for output event file
        out_file = Path(output_dir, ev_file.relative_to(input_dir))
        out_file.parent.mkdir(parents=True, exist_ok=True)
        # Write renamed events
        rename_events(ev_file, out_file, old_tags, new_tag)

if __name__ == '__main__':
    if len(sys.argv) != 5:
        print(f'{sys.argv[0]} <input dir> <output dir> <old tags> <new tag>',
              file=sys.stderr)
        sys.exit(1)
    input_dir, output_dir, old_tags, new_tag = sys.argv[1:]
    old_tags = old_tags.split(';')
    rename_events_dir(input_dir, output_dir, old_tags, new_tag)
    print('Done')

You would use it like this:

> python rename_events.py my_log_dir renamed_log_dir "Training Loss;Validation Loss" loss
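
To sanity-check the result, a small sketch that lists the tags present in each rewritten event file (it assumes the renamed_log_dir from the example above):

from pathlib import Path
import tensorflow as tf
from tensorflow.core.util.event_pb2 import Event

for ev_file in Path('renamed_log_dir').glob('**/*.tfevents*'):
    tags = set()
    for rec in tf.data.TFRecordDataset([str(ev_file)]):
        ev = Event()
        ev.MergeFromString(rec.numpy())
        tags.update(v.tag for v in ev.summary.value)
    print(ev_file, sorted(tags))
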
Delicacy answered 5/2, 2020 at 16:51
Thanks for the suggestion. It mostly solved my problem, though I had to indent the line "writer.write(ev.SerializeToString())" to the same level as the first if statement, as it was only writing the last 'ev' entry to file for most of my logs. – Grape
Additionally, it gave me some problems when displaying the training/validation loss curves in TensorBoard: it connects the curves together at the start/end epochs from separate dates, so I think it also lays out the x-axis based on the time and epoch at which I trained the network. I'll post some imgur links for reference: imgur.com/s1Dn97z imgur.com/RcfM79M. Do you have any suggestions? – Grape
@Grape You're right, the indenting was wrong in that line, thanks for pointing that out. About the curves, I have seen that in the past... did you write both "Training Loss" and "Validation Loss" summaries within both the train and validation dirs (as opposed to "Training Loss" only in train and "Validation Loss" only in validation)? I guess not, but that would explain TensorBoard being confused like that after the renaming... – Delicacy
I previously saved each log file in the following way: "main_logdir/date/train", so there would be, say, 5 'date' folders if I ran the training session 5 times (I have since changed it, as it isn't a great storage structure if I only ever want to display all curves as one). I've also managed to fix the additional curve/strange lines problem; before I posted here I was testing storing newly created logs in my current file structure, and I had forgotten to clear one of the old tfevents files, so removing that fixed the issue. Thank you again for the help, you've saved me a lot of time! – Grape
Thank you for the script, @jdehesa. I tweaked it to modify files directly and work only on a single input file, so it can now be fed to find, e.g.: find . -name "*.tfevents*" -exec tb-rename-events.py {} "iteration-time" "iteration-time/iteration-time" \; (the script is here). – Smattering
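
For reference, a minimal sketch of such an in-place, single-file variant; the name tb-rename-events.py and the argument order follow the comment above, and it assumes the rename_events helper from the answer's script is importable alongside it:

#!/usr/bin/env python3

# tb-rename-events.py: rewrite a single event file in place (sketch)

import sys
import tempfile
import shutil

from rename_events import rename_events  # helper from the answer's script above

if __name__ == '__main__':
    if len(sys.argv) != 4:
        print(f'{sys.argv[0]} <event file> <old tags> <new tag>', file=sys.stderr)
        sys.exit(1)
    in_file, old_tags, new_tag = sys.argv[1:]
    old_tags = old_tags.split(';')
    # Write the renamed events to a temporary file, then replace the original
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp_path = tmp.name
    rename_events(in_file, tmp_path, old_tags, new_tag)
    shutil.move(tmp_path, in_file)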
