c#: how to read parts of a file? (DICOM)

Z

5

13

I would like to read a DICOM file in C#. I don't want to do anything fancy, I just for now would like to know how to read in the elements, but first I would actually like to know how to read the header to see if is a valid DICOM file.

It consists of Binary Data Elements. The first 128 bytes are unused (set to zero), followed by the string 'DICM'. This is followed by header information, which is organized into groups.

A sample DICOM header

First 128 bytes: unused DICOM format.
Followed by the characters 'D','I','C','M'
Followed by extra header information such as:

0002,0000, File Meta Elements Groups Len: 132
0002,0001, File Meta Info Version: 256
0002,0010, Transfer Syntax UID: 1.2.840.10008.1.2.1.
0008,0000, Identifying Group Length: 152
0008,0060, Modality: MR
0008,0070, Manufacturer: MRIcro

In the above example, the header is organized into groups. The group 0002 hex is the file meta information group which contains 3 elements: one defines the group length, one stores the file version and the their stores the transfer syntax.

Questions

How to I read the header file and verify if it is a DICOM file by checking for the 'D','I','C','M' characters after the 128 byte preamble?
How do I continue to parse the file reading the other parts of the data?

Zloty answered 4/3, 2010 at 19:7 Comment(1)

I think it will be better to use ReadLine() instead of reading the file byte by byte. Each record seems to be on a different line – Unmoving 5/3, 2010 at 15:44

B

12

Something like this should read the file, its basic and doesn't handle all cases, but it would be a starting point:


public void ReadFile(string filename)
{
    using (FileStream fs = File.OpenRead(filename))
    {
        fs.Seek(128, SeekOrigin.Begin);
        if ((fs.ReadByte() != (byte)'D' ||
             fs.ReadByte() != (byte)'I' ||
             fs.ReadByte() != (byte)'C' ||
             fs.ReadByte() != (byte)'M'))
        {
            Console.WriteLine("Not a DCM");
            return;
        }
        BinaryReader reader = new BinaryReader(fs);

        ushort g;
        ushort e;
        do
        {
            g = reader.ReadUInt16();
            e = reader.ReadUInt16();

            string vr = new string(reader.ReadChars(2));
            long length;
            if (vr.Equals("AE") || vr.Equals("AS") || vr.Equals("AT")
                || vr.Equals("CS") || vr.Equals("DA") || vr.Equals("DS")
                || vr.Equals("DT") || vr.Equals("FL") || vr.Equals("FD")
                || vr.Equals("IS") || vr.Equals("LO") || vr.Equals("PN")
                || vr.Equals("SH") || vr.Equals("SL") || vr.Equals("SS")
                || vr.Equals("ST") || vr.Equals("TM") || vr.Equals("UI")
                || vr.Equals("UL") || vr.Equals("US"))
               length = reader.ReadUInt16();
            else
            {
                // Read the reserved byte
                reader.ReadUInt16();
                length = reader.ReadUInt32();
            }

            byte[] val = reader.ReadBytes((int) length);

        } while (g == 2);

        fs.Close();
    }

    return ;
}

The code does not actually try and take into account that the transfer syntax of the encoded data can change after the group 2 elements, it also doesn't try and do anything with the actual values read in.

Briefs answered 5/3, 2010 at 4:19 Comment(1)

Thanks, Xaisoft. If you really want to look, I copied some of this code from the ClearCanvas libraries, which are based on an early version of mDCM. You can look at the ClearCanvas.Dicom.IO.DicomStreamReader class to see our parser, which is obviously a lot more complex than just this snippet. – Briefs 5/3, 2010 at 15:37

S

1

Just some pseudologic

How to I read the header file and verify if it is a DICOM file by checking for the 'D','I','C','M' characters after the 128 byte preamble?

Open as binary file, using File.OpenRead
Seek to position 128 and read 4 bytes into the array and compare it againts byte[] value for DICM. You can use ASCIIEncoding.GetBytes() for that

How do I continue to parse the file reading the other parts of the data?

Continue reading the file using Read or ReadByte using the FileStream object handle that you have earlier
Use the same method like above to do your comparison.

Dont forget to close and dispose the file.

Sticktight answered 5/3, 2010 at 4:28 Comment(0)

L

1

Taken from EvilDicom.Helper.DicomReader from the Evil Dicom library:

 public static bool IsValidDicom(BinaryReader r)
    {
        try
        {
            //128 null bytes
            byte[] nullBytes = new byte[128];
            r.Read(nullBytes, 0, 128);
            foreach (byte b in nullBytes)
            {
                if (b != 0x00)
                {
                    //Not valid
                    Console.WriteLine("Missing 128 null bit preamble. Not a valid DICOM file!");
                    return false;
                }
            }
        }
        catch (Exception)
        {

            Console.WriteLine("Could not read 128 null bit preamble. Perhaps file is too short");
            return false;
        }

        try
        {
            //4 DICM characters
            char[] dicm = new char[4];
            r.Read(dicm, 0, 4);
            if (dicm[0] != 'D' || dicm[1] != 'I' || dicm[2] != 'C' || dicm[3] != 'M')
            {
                //Not valid
                Console.WriteLine("Missing characters D I C M in bits 128-131. Not a valid DICOM file!");
                return false;
            }
            return true;

        }
        catch (Exception)
        {

            Console.WriteLine("Could not read DICM letters in bits 128-131.");
            return false;
        }

    }

Labdanum answered 6/9, 2011 at 15:5 Comment(0)

A

0

you can also use like this.

FileStream fs = File.OpenRead(path);

byte[] data = new byte[132];
fs.Read(data, 0, data.Length);

int b0 = data[0] & 255, b1 = data[1] & 255, b2 = data[2] & 255, b3 = data[3] & 255;

if (data[128] == 68 && data[129] == 73 && data[130] == 67 && data[131] == 77)
        {
           //dicom file
        }
        else if ((b0 == 8 || b0 == 2) && b1 == 0 && b3 == 0)
        {
            //dicom file
        }

Adynamia answered 21/4, 2010 at 7:11 Comment(0)

P

0

This code will read most important dicom tags and store it in a json and excel. It will identify studies within given directory using folder name, dicom study id, and date. It will create a dictionary for each study, containing most useful information, along with a dictionary for dicom series (each dicom study hast so many dicom images, each belong to series). My code is here And here is the full code with descriptions. Oh, you can add or remove your desired information.

#FINAL 20231216
#My context: I coded this on my windows11 with RTC3080Ti and Corei9-12gen and 32G Ram. I am coding on VS code and using jupyter notebook.
#Your requirment: It doesn't need any exceptional hardward you can run it on an average pc/labtob

import pydicom as pm #for reading dicoms
import os #for looping through system direcotries
from pydicom.multival import MultiValue #for reading dicom metadata
from pydicom.valuerep import PersonName #since tunring dictionary to json raised an error you should use this
from tqdm.notebook import tqdm #for that fancy loop progress, I like it though
import pandas as pd #for tunring dic to excel, first we trasnform it to pandas dataframe
import json #for storing as json

from IPython.display import HTML #so you can click on the sotred excel and json and open it from jupyter notebook

def get_dicom_tag_value(dicom_file, tag, default=None):
    '''this function will get the dicom tag from the dicom filde for the given tag/code'''
    tag_value = dicom_file.get(tag, None)
    if tag_value is None:
        return default
    if isinstance(tag_value, MultiValue):
        return list(tag_value)  # Convert MultiValue to list
    return tag_value.value

def get_path_to_first_subfolder(full_path, first_subfolder):
    """this will get the path to the first folder of root, which is the subfolder that contains all dicom filed of one dicom study """
    path_parts = full_path.split(os.sep)
    if first_subfolder in path_parts:
        subfolder_index = path_parts.index(first_subfolder)
        return os.sep.join(path_parts[:subfolder_index + 1])
    else:
        return full_path

def count_subfolders(directory):
    '''this will cont the number of files and folders within a direcotyr'''
    total_subfolders = 0
    total_files=0
    for root, dirs, files in os.walk(directory):
        total_subfolders += len(dirs)
        total_files += len(files)
    return total_subfolders,total_files 


class CustomJSONEncoder(json.JSONEncoder): #this class will turn our multilevel dictionary into a json file
    def default(self, obj):
        if isinstance(obj, MultiValue):
            return list(obj)  # Convert MultiValue to list
        elif isinstance(obj, PersonName):
            return str(obj)   # Convert PersonName to string
        return json.JSONEncoder.default(self, obj)

def ensure_json_extension(directory): 
    '''this function will ensure that definied json direcotry contains the required extension, otherwise, it will add this to the end of definied dir'''
    if not directory.endswith(".json"):
        return directory + "\\JSON.json"
    return directory

def ensure_excel_extension(directory):
    '''this function will ensure that definied excel direcotry contains the required extension, otherwise, it will add this to the end of definied dir'''
    if not directory.endswith(".xlsx"):
        return directory + "\\excel.xlsx"
    return directory

def create_clickable_dir_path(dir_path):
    # Convert the directory path to a file URL
    file_url = f"{dir_path}"
    return HTML(f'<a href="{file_url}" target="_blank">{dir_path}</a>')



def get_dicomdir_give_dicomdicom_datadic(dicom_dir, #direcotry that you want to read, usually dicom studies should be in one folder, preferably with patient unique id/name
                                     dicom_validation=True, #this will check wether the file in the loop is dicom or not. Although make it slower, I recommend using it to ensure only dicom files go through loop 
                                     folder_list_name_indicomdir=None, #In your dicom_dir you can include list of folders name that you want to read. It will not read other folders. Kepp in mind that this will look into subfolders in the main folder, and not the subfolders of subfolders :)
                                     store_as_json_dir=None, #if you want to store your ditionary as json, give your desired json direcotry
                                     store_as_excel_dir=None #if you want to store your ditionary as excel, give your desired excel direcotry
                                     ):
    """
    This function creates a multi-level dictionary for DICOM meta data (named dicom_data) in a directory (named dicom_dir).
    The top level has the last component of dicom_dir, which is the first level subfolder, as a key.
    For each subforled it will store study data within this dic, along with another dicitonary for series data, within this study dictionary.
    For series dictionary the data corresponding for series number will be stored.
    We also have another private_info dictionary within subfodler dictionary.
    
    - dicom_validation: If you set dicom_validation=True, it will validate the file in the loop for being an dicom file. This is super important although it makes code slower.
    Becaouse, sometimes some dicom files have no extension, and also reading other files may cause error in the loop.
    
    - folder_list_name_indicomdir: #In your dicom_dir you can include list of folders name that you want to read. It will not read other folders. Kepp in mind that this will look into subfolders in the main folder, and not the subfolders of subfolders :)
    
    - store_as_json_dir: if you want to store your ditionary as json, give your desired json direcotry
    
    - store_as_excel_dir: if you want to store your ditionary as excel, give your desired excel direcotry
    
    For using this function, the best practice is to place each folder containing one dicom study in subfolder, under the dicom_dir. 
    However, you can change finding unique dicom studies, even placed next to each other beacouse I definied the study_unique=f'{first_subfolder}_{study_id}_{study_date}'.
    If you want your code to be faster you can chane the study_unique to study_unique=first_subfolder. It makes your code 15% faster, sometimes at the cost of incurrect retrival.
    
    """

    total_subfolder,total_files=count_subfolders(dicom_dir)
    print(f'your direcotry contains {total_subfolder} folders and {total_files} files')
    
    last_dir_name = os.path.basename(os.path.normpath(dicom_dir))
    dicom_data = {last_dir_name: {}}

    for root, dirs, files in tqdm(os.walk(dicom_dir), desc="Processing directories", total=total_subfolder,unit='folder'):
        if folder_list_name_indicomdir:
            split_path = root.replace(dicom_dir, '').split(os.sep)
            first_subfolder = split_path[1] if len(split_path) > 1 else ""
            if first_subfolder not in folder_list_name_indicomdir:
                print(f"""The folder {first_subfolder} was not in your definied list.""")
                continue  # Skip if the first subfolder is not in the user-defined list
            
        for file in files:
            if dicom_validation and not pm.misc.is_dicom(os.path.join(root, file)):
                continue # Skip if the it is not dicom file
                   

            try:
                dicom_file = pm.dcmread(os.path.join(root, file))
                study_id = get_dicom_tag_value(dicom_file, (0x0020, 0x0010))
                dicom_data_number = get_dicom_tag_value(dicom_file, (0x0020, 0x0011))
                study_date = get_dicom_tag_value(dicom_file, (0x0008, 0x0020))
                split_path = root.replace(dicom_dir, '').split(os.sep)
                first_subfolder = split_path[1] if len(split_path) > 1 else ""
                if study_id and dicom_data_number and study_date:
                    study_unique = f'{first_subfolder}_{study_id}_{study_date}' #you can change it for increasing the speed > study_unique=first_subfolder
                    if study_unique not in dicom_data[last_dir_name]:
                        private_info={'name': get_dicom_tag_value(dicom_file, (0x0010, 0x0010)),
                                      'institute': get_dicom_tag_value(dicom_file, (0x0008, 0x0080)),
                                      'patient_id': get_dicom_tag_value(dicom_file, (0x0010, 0x0020)),
                                      'accession_number':get_dicom_tag_value(dicom_file, (0x0008, 0x0050))
                                      }
                        
                        dicom_data[last_dir_name][study_unique] = {
                            'dir_to_root': get_path_to_first_subfolder(root, first_subfolder),
                            'study_description': get_dicom_tag_value(dicom_file, (0x0008, 0x1030)),
                            'date': study_date,
                            'age': get_dicom_tag_value(dicom_file, (0x0010, 0x1010)),
                            'sex': get_dicom_tag_value(dicom_file, (0x0010, 0x0040)),
                            'manufacture_model': get_dicom_tag_value(dicom_file, (0x0008, 0x1090)),
                            'manufacture_brand': get_dicom_tag_value(dicom_file, (0x0008, 0x0070)),
                            'manufacture_brand': get_dicom_tag_value(dicom_file, (0x0008, 0x0070)),
                            'protocol': get_dicom_tag_value(dicom_file, (0x0018, 0x1030)),
                            'study_id': study_id,
                            'patient_weight': get_dicom_tag_value(dicom_file, (0x0010, 0x1030)),
                            'Image_type': get_dicom_tag_value(dicom_file, (0x0008, 0x0008)),
                            'body_part': get_dicom_tag_value(dicom_file, (0x0018, 0x0015)),
                            'modalitty':get_dicom_tag_value(dicom_file, (0x0008, 0x0050)),
                            'private_info':private_info,
                            'image_dicom_data_list': {}
                        }

                    

                    dicom_data_info = {
                        'dicom_data_description': get_dicom_tag_value(dicom_file, (0x0008, 0x103E)),
                        'body_part': get_dicom_tag_value(dicom_file, (0x0018, 0x0015)),
                        'slice_thickness': get_dicom_tag_value(dicom_file, (0x0018, 0x0050)),
                        'Image_comment': get_dicom_tag_value(dicom_file, (0x0020, 0x4000)),
                        'kvp': get_dicom_tag_value(dicom_file, (0x0018, 0x0060)),
                        'exposure': get_dicom_tag_value(dicom_file, (0x0018, 0x1152)),
                        'exposure_time': get_dicom_tag_value(dicom_file, (0x0018, 0x1150)),
                    }
                    dicom_data[last_dir_name][study_unique]['image_dicom_data_list'][dicom_data_number] = dicom_data_info

            except Exception as e:
                print(f"""Error reading for {file}::: {e} \n """)
                continue
            
    if store_as_json_dir is not None:
        try:
            json_read = json.dumps(dicom_data, indent=4, cls=CustomJSONEncoder)
            store_as_json_dir=str(store_as_json_dir)
            store_as_json_dir=ensure_json_extension(store_as_json_dir)
            with open(store_as_json_dir, 'w') as json_file:
                json_file.write(json_read)
            print(f"""Json stored at :::""")
            display(create_clickable_dir_path(store_as_json_dir))         
        except:
            print(f"""Error storing the json ::: {e} \n """)
            
    if store_as_excel_dir is not None:
        try:
            dataframes = []
            for key, value in dicom_data.items():
                # Convert value to DataFrame if necessary
                df = pd.DataFrame(value)
                # Add the key as a new column or as part of the index
                df['Key'] = key  # Add key as a column
                # df = df.set_index(['Key'], append=True)  # Add key as part of a MultiIndex
                dataframes.append(df)

            # Concatenate all dataframes
            df2 = pd.concat(dataframes).T
            store_as_excel_dir=str(store_as_excel_dir)
            store_as_excel_dir=ensure_excel_extension(store_as_excel_dir)
            df2.to_excel(store_as_excel_dir)
            print(f"""Excel stored at :::""")
            display(create_clickable_dir_path(store_as_excel_dir))          
        except:
            print(f"""Error storing the excel ::: {e} \n """)
            
                                 
    return dicom_data


#example of running code
dicom_dir=r"F:\Data\Big Pancreas (CT, EUS)\Raw Data Hospital\Dr Radmard\Valid Case" 
save_dir_json=r'F:\Data\Big Pancreas (CT, EUS)\Raw Data Hospital\Radmard_all_dcm.json'
save_dir_xlsx=r'F:\Data\Big Pancreas (CT, EUS)\Raw Data Hospital\Radmard_all_dcm.xlsx'


dicom_dic=get_dicomdir_give_dicomdicom_datadic(
    dicom_dir, #direcotry that you want to read, usually dicom studies should be in one folder, preferably with patient unique id/name
                                     dicom_validation=True, #this will check wether the file in the loop is dicom or not. Although make it slower, I recommend using it to ensure only dicom files go through loop 
                                     folder_list_name_indicomdir=None, #In your dicom_dir you can include list of folders name that you want to read. It will not read other folders. Kepp in mind that this will look into subfolders in the main folder, and not the subfolders of subfolders :)
                                     store_as_json_dir=save_dir_json, #if you want to store your ditionary as json, give your desired json direcotry
                                     store_as_excel_dir=save_dir_xlsx #if you want to store your ditionary as excel, give your desired excel direcotry
                                     )

Palmary answered 17/12, 2023 at 19:42 Comment(2)

The question was for C#, and I don't think this post adds anything the other answers haven't already added. This is an awful lot of code, too! I'm sure everyone at SO would welcome your efforts, but this time I'm afraid I don't think you quite hit the nail on the head so to speak :) – Achaea 23/12, 2023 at 11:1

@DanRayson Sorry for my mistake. I didn't know the C# at the beginning. Anyway, I thought it could help someone who wanted to extract Dicom metadata. This long code has multiple components (extraction Dicom meta, pseudonymization, and updating Dicom meta). I just wanted to share it with someone who comes to this page from a google search (like me :). Anyway, I will remove it in a day or two; thanks for your notice. – Palmary 23/12, 2023 at 15:33

Recommended topics

Hot tags