huffman Algorithm

The Huffman Algorithm is a widely-used, lossless data compression technique that was developed by David A. Huffman in 1952. It is a variable-length, prefix coding algorithm that works by constructing an optimal binary tree using the frequency of each character in the input data. The basic idea behind Huffman coding is to assign shorter codes to more frequently occurring characters and longer codes to less frequent characters, thereby reducing the overall size of the data being encoded. This algorithm is particularly effective when applied to text, image, and audio files, where certain patterns or symbols occur more frequently than others. To create the optimal binary tree, the Huffman Algorithm begins by assigning each character or symbol in the input a leaf node with a weight equal to its frequency. It then constructs the tree from the bottom up by repeatedly selecting the two nodes with the lowest weights, combining them into a new node, and assigning the combined weight to this new node. This process continues until all nodes are part of a single tree. Once the tree is constructed, the unique binary code for each character can be determined by traversing the tree from the root to the leaf node, assigning a "0" for each left branch and a "1" for each right branch along the way. The resulting set of binary codes can be used to efficiently compress the original data without any loss of information, allowing for effective storage and transmission.
import sys


class Letter:
    def __init__(self, letter, freq):
        self.letter = letter
        self.freq = freq
        self.bitstring = ""

    def __repr__(self):
        return f"{self.letter}:{self.freq}"


class TreeNode:
    def __init__(self, freq, left, right):
        self.freq = freq
        self.left = left
        self.right = right


def parse_file(file_path):
    """
    Read the file and build a dict of all letters and their
    frequencies, then convert the dict into a list of Letters.
    """
    chars = {}
    with open(file_path) as f:
        while True:
            c = f.read(1)
            if not c:
                break
            chars[c] = chars[c] + 1 if c in chars.keys() else 1
    return sorted([Letter(c, f) for c, f in chars.items()], key=lambda l: l.freq)


def build_tree(letters):
    """
    Run through the list of Letters and build the min heap
    for the Huffman Tree.
    """
    while len(letters) > 1:
        left = letters.pop(0)
        right = letters.pop(0)
        total_freq = left.freq + right.freq
        node = TreeNode(total_freq, left, right)
        letters.append(node)
        letters.sort(key=lambda l: l.freq)
    return letters[0]


def traverse_tree(root, bitstring):
    """
    Recursively traverse the Huffman Tree to set each
    Letter's bitstring, and return the list of Letters
    """
    if type(root) is Letter:
        root.bitstring = bitstring
        return [root]
    letters = []
    letters += traverse_tree(root.left, bitstring + "0")
    letters += traverse_tree(root.right, bitstring + "1")
    return letters


def huffman(file_path):
    """
    Parse the file, build the tree, then run through the file
    again, using the list of Letters to find and print out the
    bitstring for each letter.
    """
    letters_list = parse_file(file_path)
    root = build_tree(letters_list)
    letters = traverse_tree(root, "")
    print(f"Huffman Coding of {file_path}: ")
    with open(file_path) as f:
        while True:
            c = f.read(1)
            if not c:
                break
            le = list(filter(lambda l: l.letter == c, letters))[0]
            print(le.bitstring, end=" ")
    print()


if __name__ == "__main__":
    # pass the file path to the huffman function
    huffman(sys.argv[1])

LANGUAGE:

DARK MODE: