Is there a way to prevent repetition of similar blocks of code in these hashing functions?
I created this program to calculate the sha256 or sha512 hash of a given file and convert the digest to hex.

It consists of 5 files, 4 are custom modules and 1 is the main.

I have two functions in different modules but the only difference in these functions is one variable. See below:

From sha256.py

def get_hash_sha256():
    global sha256_hash
    filename = input("Enter the file name: ")
    sha256_hash = hashlib.sha256()
    with open(filename, "rb") as f:
        for byte_block in iter(lambda: f.read(4096),b""):
            sha256_hash.update(byte_block)
#       print("sha256 value: \n" + Color.GREEN + sha256_hash.hexdigest())
        print(Color.DARKCYAN + "sha256 value has been calculated")
        color_reset()

From sha512.py

def get_hash_sha512():
    global sha512_hash
    filename = input("Enter the file name: ")
    sha512_hash = hashlib.sha512()
    with open(filename, "rb") as f:
        for byte_block in iter(lambda: f.read(4096),b""):
            sha512_hash.update(byte_block)
#       print("sha512 value: \n" + Color.GREEN + sha512_hash.hexdigest())
        print(Color.DARKCYAN + "sha512 value has been calculated")
        color_reset()

These functions are called in my simple_sha_find.py file:

def which_hash():
    sha256_or_sha512 = input("Which hash do you want to calculate: sha256 or sha512? \n")
    if sha256_or_sha512 == "sha256":
        get_hash_sha256()
        verify_checksum_sha256()
    elif sha256_or_sha512 == "sha512":
        get_hash_sha512()
        verify_checksum_sha512()
    else:
        print("Type either sha256 or sha512. If you type anything else the program will close...like this.")
        sys.exit()

if __name__ == "__main__":
    which_hash()

As you can see, the functions that are called depend on the user's input. If the user types sha256, the functions from sha256.py are called; if they type sha512, the functions from sha512.py are called.

The application works, and I know I can make it less redundant, but I do not know how.

How can I define the get_hash_sha---() and verify_checksum_sha---() functions once so that they perform the appropriate calculations based on whether the user chooses sha256 or sha512?

I have tried a few variations of this program: I have written it as one single file, and I have also split it into different modules and called functions from those modules.

In either case I ended up with the repetition, and I know that tends to defeat the purpose of automation.

Occasionalism answered 6/10, 2024 at 6:27 Comment(0)
A
5

You could merge these two functions into a single one:

import hashlib
import sys

# Color, color_reset and verify_checksum come from the asker's own modules

def get_hash(hash_type):
    if hash_type == 'sha256':
        hash_obj = hashlib.sha256()
    elif hash_type == 'sha512':
        hash_obj = hashlib.sha512()
    else:
        print("Invalid hash type. Please choose 'sha256' or 'sha512'.")
        return

    filename = input("Enter the filename: ")
    try:
        with open(filename, "rb") as f:
            for byte_block in iter(lambda: f.read(4096), b""):
                hash_obj.update(byte_block)
        print(Color.DARKCYAN + f"{hash_type} value has been calculated")
        color_reset()
    except FileNotFoundError:
        print(f"File '{filename}' not found.")

def which_hash():
    sha_type = input("Which hash do you want to calculate: sha256 or sha512? \n").lower()
    if sha_type in ['sha256', 'sha512']:
        get_hash(sha_type)
        verify_checksum(sha_type)
    else:
        print("Type sha256 or sha512. If you type anything else, the program will close.")
        sys.exit()

if __name__ == "__main__":
    which_hash()

Also, it's good practice to use an Enum instead of plain strings:

from enum import Enum

class HashType(Enum):
    SHA256 = 'sha256'
    SHA512 = 'sha512'

So you could change the checks in get_hash to:

if hash_type == HashType.SHA256:
    hash_obj = hashlib.sha256()
elif hash_type == HashType.SHA512:
    hash_obj = hashlib.sha512()

and which_hash to:

def which_hash():
    sha_type_input = input("Which hash do you want to calculate: sha256 or sha512? \n").lower()
    
    try:
        sha_type = HashType(sha_type_input)
        get_hash(sha_type)
        verify_checksum(sha_type)
    except ValueError:
        print("Type either sha256 or sha512. If you type anything else the program will close.")
        sys.exit()
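
As a possible further simplification of the if/elif chain: each enum value is already the algorithm name that hashlib uses, so hashlib.new() can build the hash object directly from it. This is only a minimal sketch; new_hash_object is a hypothetical helper name, not part of the code above.

```python
import hashlib
from enum import Enum


class HashType(Enum):
    SHA256 = 'sha256'
    SHA512 = 'sha512'


def new_hash_object(hash_type: HashType):
    """Return a fresh hashlib object for the given HashType member."""
    # hashlib.new() accepts the algorithm name as a string, so the enum
    # value doubles as the lookup key and no if/elif chain is needed.
    return hashlib.new(hash_type.value)


# HashType("sha512") raises ValueError for any unsupported name,
# which is what which_hash() above relies on.
hash_obj = new_hash_object(HashType("sha512"))
hash_obj.update(b"example data")
print(hash_obj.hexdigest())
```

Adding another algorithm then only means adding one member to HashType, provided its value is a name hashlib recognises.
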
Abate answered 6/10, 2024 at 6:41 Comment(0)
R
2

You can refactor the functions to make the type of hash a parameter. Probably also avoid the use of global variables, and leave any interactive I/O to the calling code.

I have also changed the code to raise an error when there is a problem. Merely printing an error message is fine for very simple programs, but reusable code needs to properly distinguish between success and failure.

import hashlib

def get_hash(hash_type, filename):
    if hash_type == 'sha256':
        hash_obj = hashlib.sha256()
    elif hash_type == 'sha512':
        hash_obj = hashlib.sha512()
    else:
        raise ValueError("Invalid hash type. Please choose 'sha256' or 'sha512'")

    # Don't trap the error; let the caller deal with a missing file
    with open(filename, "rb") as f:
        for byte_block in iter(lambda: f.read(4096), b""):
            hash_obj.update(byte_block)

    # Return the result
    return hash_obj.hexdigest()

def which_hash():
    sha_type = input("Which hash do you want to calculate: sha256 or sha512? \n").lower()
    if sha_type in ['sha256', 'sha512']:
        filename = input("File name: ")
        digest = get_hash(sha_type, filename)

        print(f"{Color.DARKCYAN}{sha_type} value has been calculated")
        color_reset()

        verify_checksum(digest, sha_type, filename)
    else:
        raise ValueError("Type sha256 or sha512")
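
Because get_hash raises instead of printing, the calling code decides how failures are reported. The sketch below shows one way a caller might do that. It re-declares a get_hash with the same signature (using hashlib.new so the block runs on its own), and report_hash is a hypothetical name.

```python
import hashlib


def get_hash(hash_type, filename):
    """Return the hex digest of a file; raise on bad input."""
    if hash_type not in ('sha256', 'sha512'):
        raise ValueError("Invalid hash type. Please choose 'sha256' or 'sha512'")
    hash_obj = hashlib.new(hash_type)
    with open(filename, "rb") as f:
        for byte_block in iter(lambda: f.read(4096), b""):
            hash_obj.update(byte_block)
    return hash_obj.hexdigest()


def report_hash(hash_type, filename):
    """Print the digest, or a readable message when hashing fails."""
    try:
        digest = get_hash(hash_type, filename)
    except ValueError as err:
        # Raised by get_hash for an unsupported algorithm name.
        print(f"Error: {err}")
        return None
    except OSError as err:
        # Covers FileNotFoundError, PermissionError and similar.
        print(f"Could not read '{filename}': {err}")
        return None
    print(f"{hash_type}: {digest}")
    return digest
```

A command-line tool can print and exit as report_hash does, while other code that imports get_hash is free to handle the same exceptions differently.
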

This sort of "case and call" is precisely what object-oriented programming was designed to avoid, but perhaps it's too early in your programming journey to tackle that topic.
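
For the curious, here is a rough sketch of what that object-oriented version could look like. Each algorithm becomes a small class with the same interface, so the calling code never has to ask which one it is holding. All class names here (FileHasher, Sha256Hasher, Sha512Hasher) and the HASHERS table are hypothetical.

```python
import hashlib


class FileHasher:
    """Common interface; subclasses only supply the hashlib object."""
    name = "unknown"

    def new_hash(self):
        raise NotImplementedError

    def hexdigest(self, filename, chunk_size=4096):
        # The shared file-reading loop lives here exactly once.
        hash_obj = self.new_hash()
        with open(filename, "rb") as f:
            for byte_block in iter(lambda: f.read(chunk_size), b""):
                hash_obj.update(byte_block)
        return hash_obj.hexdigest()


class Sha256Hasher(FileHasher):
    name = "sha256"

    def new_hash(self):
        return hashlib.sha256()


class Sha512Hasher(FileHasher):
    name = "sha512"

    def new_hash(self):
        return hashlib.sha512()


# Look up a hasher by the name the user typed; no if/elif needed.
HASHERS = {hasher.name: hasher for hasher in (Sha256Hasher(), Sha512Hasher())}
```

With this in place, HASHERS[sha_type].hexdigest(filename) works the same way for every algorithm, and a KeyError signals an unsupported name.
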

Reproof answered 6/10, 2024 at 7:32 Comment(3)
@s3dev oops, thanks! I still kept the raise because the function could be called by other code. - Reproof
I actually did read about not using global variables in my research, but every time I tried not to use the global variable, I would receive an error. I'll post another question regarding the global variable issue. - Occasionalism
And thanks for bringing raise to my attention. - Occasionalism
E
2

You can generalise the function that generates the hash by passing the relevant hashing function as an argument.

Something like this:

from hashlib import sha256, sha512
from typing import Callable

HASH_MAP: dict[str, Callable] = {"sha256": sha256, "sha512": sha512}
CHUNK = 4096

def make_hash(filename: str, hash_function: Callable) -> str:
    hf = hash_function()
    with open(filename, "rb") as data:
        while buffer := data.read(CHUNK):
            hf.update(buffer)
    return hf.hexdigest()

def main():
    filename = input("Enter filename: ")
    func = input(f"Enter hash type {tuple(HASH_MAP)}: ")
    if hfunc := HASH_MAP.get(func):
        print(make_hash(filename, hfunc))
    else:
        print("Invalid hash type selection")

if __name__ == "__main__":
    main()

If you subsequently want to add more hashing algorithms, you just need to edit the HASH_MAP dictionary appropriately. No other code would need to change.
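
As an illustration of that point, the sketch below adds two more algorithms. The choice of sha384 and sha3_256 is arbitrary (both are constructors that hashlib provides); make_hash is repeated unchanged so the block runs on its own.

```python
from hashlib import sha256, sha384, sha3_256, sha512
from typing import Callable

HASH_MAP: dict[str, Callable] = {
    "sha256": sha256,
    "sha512": sha512,
    # Illustrative additions; these two entries are the only new code.
    "sha384": sha384,
    "sha3_256": sha3_256,
}
CHUNK = 4096


def make_hash(filename: str, hash_function: Callable) -> str:
    hf = hash_function()
    with open(filename, "rb") as data:
        # Read the file in CHUNK-sized pieces until read() returns b"".
        while buffer := data.read(CHUNK):
            hf.update(buffer)
    return hf.hexdigest()
```

The prompt in main() picks up the new names automatically, since it is built from tuple(HASH_MAP).
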

Evoy answered 6/10, 2024 at 7:43 Comment(1)
@s3dev I should probably wait until I'm fully awake before coding. Thank you for pointing out my error. Code edited. - Evoy
D
1

You can give hashlib.file_digest (available since Python 3.11) the algorithm name as a string.

import hashlib
import sys

options = 'sha256', 'sha512'

# Choose algorithm
opts = ' or '.join(options)
alg = input(f"Which hash do you want to calculate: {opts}? \n")
if alg not in options:
    print(f"Type either {opts}. If you type anything else the program will close...like this.")
    sys.exit()

# Choose file and hash it
filename = input("Enter the file name: ")
with open(filename, "rb") as f:
    digest = hashlib.file_digest(f, alg)

print(f"{alg} value has been calculated")
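
Note that hashlib.file_digest returns a hash object, so the hex string still comes from calling .hexdigest() on it. If the script also has to run on Python versions older than 3.11, one option is a small wrapper that falls back to a manual loop. This is a sketch only, and hash_file is a hypothetical name.

```python
import hashlib


def hash_file(filename, alg, chunk_size=4096):
    """Return the hex digest of a file for the algorithm named by alg."""
    with open(filename, "rb") as f:
        if hasattr(hashlib, "file_digest"):
            # Python 3.11+: hashlib reads the file object itself.
            return hashlib.file_digest(f, alg).hexdigest()
        # Older Pythons: feed the file to the hash object in chunks.
        hash_obj = hashlib.new(alg)
        while chunk := f.read(chunk_size):
            hash_obj.update(chunk)
        return hash_obj.hexdigest()
```

Both branches raise ValueError for an algorithm name that hashlib does not know, so the caller can treat the two paths the same way.
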


Dancer answered 6/10, 2024 at 21:8 Comment(1)
May I please know the downvote reason so I can address the issue? - Dancer
O
0

This is what I came up with after studying your responses.

Because I am learning, I wanted to integrate the aspects of each answer that were new to me.

As you will see, I condensed the files from 5 to 3, removed the global variables, and used the Enum module. Most importantly, I removed the repetition of similar blocks of code.

Here is the final product (I just found out how to post the whole block of code). Let me know what you think and/or where I can improve.

colors.py

class Color:
    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

def color_reset():
    print(Color.END)

simple_sha_find.py

"""Module providing definition for calculating, digesting, and verifying hash with checksum"""

from slim_sha import which_hash

if __name__ == "__main__":
    which_hash()

slim_sha.py

import sys

import hashlib

from enum import Enum

from colors import Color, color_reset

class HashType(Enum):
    SHA256 = 'sha256'
    SHA512 = 'sha512'

def get_hash(hash_type):
    if hash_type == HashType.SHA256:
        hash_obj = hashlib.sha256()
    elif hash_type == HashType.SHA512:
        hash_obj = hashlib.sha512()
    else:
        raise ValueError("Invalid hash type. Please choose 'sha256' or 'sha512'")

    file_name = input("Enter the filename: ")
    try:
        with open(file_name,"rb") as f:
            for byte_block in iter(lambda: f.read(4096), b""):
                hash_obj.update(byte_block)
        print(Color.DARKCYAN + f"{hash_type.value} value has been calculated")
        color_reset()
        get_hash.hash_digested = hash_obj.hexdigest()
        return get_hash.hash_digested
    except FileNotFoundError:
        print(f"File '{file_name}' not found.")

def which_hash():
    sha_type_input = input("Which hash do you want to calculate? sha256 OR sha512?  \n")

    try:
        sha_type = HashType(sha_type_input)
        get_hash(sha_type)
        verify_checksum()
    except ValueError:
        print("Type " + Color.UNDERLINE + "sha256" + Color.END + " or " + Color.UNDERLINE + "sha512" + Color.END)

def verify_checksum():
    """Function for comparing calculated hash with hash provided by developer"""
    given_checksum = input("Enter Checksum Provided by Authorized Distributor or Developer... \n")
    print(Color.PURPLE + "You entered: " + given_checksum + Color.END)
    print("Calculated : " + Color.GREEN + get_hash.hash_digested)
    if given_checksum == get_hash.hash_digested:
        safe_results()
    else:
        bad_results()

def safe_results():
    safe_result = (Color.BOLD + Color.GREEN + "Checksum Verified! File is OK.")
    print(safe_result)
    color_reset()
    sys.exit()

def bad_results():
    bad_result = (Color.BOLD + Color.RED + "WARNING!!! Checksum is NOT verified. Verify checksum entry with the checksum source. Verify correct file or package. This is a potentially harmful file or package! Do not proceed! Notify developer or distributor if correct software is being checked and the calculated checksum continues to not match checksum from developer or distributor.")
    print(bad_result)
    color_reset()
    sys.exit()
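
One possible next step: get_hash.hash_digested still behaves like a global, only stored on the function object. Since get_hash already returns the digest, the value can be passed along as an argument instead. Below is a sketch with the input() and colour handling left out. file_digest_hex, checksums_match and verify_file are hypothetical names, and ignoring case and surrounding whitespace in the comparison is an assumption about what the user is likely to paste.

```python
import hashlib


def file_digest_hex(hash_name, file_name):
    """Return the hex digest of file_name for 'sha256' or 'sha512'."""
    hash_obj = hashlib.new(hash_name)
    with open(file_name, "rb") as f:
        for byte_block in iter(lambda: f.read(4096), b""):
            hash_obj.update(byte_block)
    return hash_obj.hexdigest()


def checksums_match(calculated, given):
    """Compare two hex checksums, ignoring case and surrounding whitespace."""
    return calculated.strip().lower() == given.strip().lower()


def verify_file(hash_name, file_name, given_checksum):
    """Hash the file and report whether it matches the given checksum."""
    # The digest travels as a return value and an argument,
    # so nothing has to be stored between the two steps.
    calculated = file_digest_hex(hash_name, file_name)
    return checksums_match(calculated, given_checksum)
```

which_hash() would then collect the three values with input(), call verify_file(), and pick safe_results() or bad_results() from the boolean it returns.
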
Occasionalism answered 8/10, 2024 at 20:3 Comment(0)
