Huffman suffix-code
Asked Answered
W

2

9

I'm trying to efficiently construct a binary suffix code for a given set of characters with their probabilities (i.e. a set of words none of which is a suffix of any other).

My basic idea is to construct a prefix-code using an implementation of the Huffman algorithm. By reversing the code words I get a suffix-free code. While this solution is working, it might not seem optimal, because I have to reverse variable-length code words (thus I need a lookup table combined with bit-shifts).

Is there any way to modify the Huffman algorithm in order to create a suffix-code more efficiently?

Waxwork answered 7/2, 2017 at 9:35 Comment(3)
Why is that a problem? The reversing only happens once anyway, doesn't it? In fact it doesn't need to be explicit, just build the codes in reverse when you convert the tree to a lookup table.Bagby
@harold You are right, the reversing happens only once. And I could of course reverse the codes when building the lookup table. I was just curious if there is any way to do the reversing while constructing the tree. Simply for optimization.Waxwork
It's the same tree. Only the interpretation is differentBagby
F
1

I would implement the HuffmanNode as

class HuffmanNode implements Comparable<HuffmanNode>
{
    // data
    private String text;
    private double frequency;

    // linkage
    private HuffmanNode left;
    private HuffmanNode right;
    private HuffmanNode parent;

    public HuffmanNode(String text, double frequency)
    {
        this.text = text;
        this.frequency = frequency;
    }
    public HuffmanNode(HuffmanNode n0, HuffmanNode n1)
    {
        if(n0.frequency < n1.frequency)
        {
            left = n0;
            right = n1;
        }else if(n0.frequency > n1.frequency)
        {
            left = n1;
            right = n0;
        }else
        {
            if(n0.text.compareTo(n1.text) < 0)
            {
                left = n0;
               right = n1;
            }else
            {
                left = n1;
                right = n0;
            }
        }
        left.parent = this;
        right.parent = this;
        text = left.text + right.text;
        frequency = left.frequency + right.frequency;
    }

    public HuffmanNode getParent() {
        return parent;
    }

    public HuffmanNode getLeft() {
       return left;
    }

    public HuffmanNode getRight() {
        return right;
    }

    public String getText()
    {
        return text;
    }

    @Override
    public int compareTo(HuffmanNode o) {
        if(frequency < o.frequency)
            return -1;
        else if(frequency > o.frequency)
            return 1;
        else
            return text.compareTo(o.text);
    }

    public Collection<HuffmanNode> leaves()
    {
        if(left == null && right == null)
        {
            Set<HuffmanNode> retval = new HashSet<>();
            retval.add(this);
            return retval;
        }
        else if(left == null || right == null)
        {
            Set<HuffmanNode> retval = new HashSet<>();
            if(left != null)
                retval.addAll(left.leaves());
            if(right != null)
                retval.addAll(right.leaves());
            retval.add(this);
            return retval;
        }
        else
        {
            Set<HuffmanNode> retval = new HashSet<>();
            retval.addAll(left.leaves());
            retval.addAll(right.leaves());
            return retval;
        }
    }

    public String toString()
    {
         return "{" + text + " -> " + frequency + "}";
    }
}

This class represents a single node in a Huffman tree.
It has convenience methods for getting all the leaves from a (sub)tree.

You can then easily build the tree:

private Map<String,String> buildTree(String text)
{
    List<HuffmanNode> nodes = new ArrayList<>();
    for(Map.Entry<String,Double> en : frequency(text).entrySet())
    {
        nodes.add(new HuffmanNode(en.getKey(), en.getValue()));
    }
    java.util.Collections.sort(nodes);
    while(nodes.size() != 1)
    {
        HuffmanNode n0 = nodes.get(0);
        HuffmanNode n1 = nodes.get(1);

        // build merged node
        HuffmanNode newNode = new HuffmanNode(nodes.get(0), nodes.get(1));
        nodes.remove(n0);
        nodes.remove(n1);

        // calculate insertion point
        int insertionPoint = - java.util.Collections.binarySearch(nodes, newNode) - 1;

        // insert
        nodes.add(insertionPoint, newNode);
    }

    // build lookup table
    Map<String, String> lookupTable = new HashMap<>();
    for(HuffmanNode leaf : nodes.iterator().next().leaves())
    {
        String code = "";
        HuffmanNode tmp = leaf;
        while(tmp.getParent() != null)
        {
            if(tmp.getParent().getLeft() == tmp)
                code = "0" + code;
            else
                code = "1" + code;
            tmp = tmp.getParent();
        }
        lookupTable.put(leaf.getText(), code);
    }
    return lookupTable;
}

By changing the method that builds the code (for instance pre-pending the next digit rather than appending it) you can change the codes being produced.

Fortress answered 29/8, 2017 at 11:54 Comment(0)
T
0

I made my Huffman coding tree deploying C++ shown as below:

enter image description here

I created three class for this purpose - HuffmanTree, BinTree and BinNode.

You can see more details on my GitHub:https://github.com/MouChiaHung/DataStructures

Check these three files: bin_node.h, bin_tree.h and huffman_tree.h. They read source file "source", encode to file "encode" in the Huffman way, then decode the file "encode" and store results to output file "decode". Besides, Huffman table is recorded in file "table".

One of core functions is HuffmanTree::encode() which read characters from source file.

template<typename T> void amo::HuffmanTree<T>::grow(std::list<Model*>& list) { //ascendantly sorted list
Model* l;
Model* r;
Model* m;
BinNode<T>* lchild;
BinNode<T>* rchild;
BinNode<T>* vertex;
std::list<Model*>::iterator it = list.begin();
std::vector<BinNode<T>*> subs; //roots of sub-trees
typename std::vector<BinNode<T>*>::iterator it_subs = subs.begin();
int i = 0;
while (it!=list.end()) {
    lchild = NULL;
    rchild = NULL;
    vertex = NULL;
    cout << YELLOW << "while-loop:" << ++i << WHITE << endl;
    if (std::next(it,1) == list.end()) { //met the last and single leaf or sub-tree 
        if (subs.size() > 1) {
            cout << RED << "size of sub-tree is more than 1:" << subs.size() << WHITE << endl;
            this->_root = subs.back();
            subs.pop_back();
            break;
        }
        else if (subs.size() == 1){ 
            if (**it == subs.back()->data) { //met the last sub-tree 
                cout << GREEN << "going to attach the last sub-tree" << WHITE << endl;
                vertex = subs.back();
                subs.pop_back();
            } 
            else { //met the last leaf 
                cout << GREEN << "going to attach the last leaf" << WHITE << endl;
                r = *it;
                lchild = subs.back();
                subs.pop_back();
                cout << CYAN << "lchild points to the root of the last sub-tree:" << *lchild;
                rchild = new BinNode<T>(*r);
                cout << CYAN << "rchild points to a new node:" << *rchild;
                m = new Model(CHAR_VERTEX, (lchild->data.prob)+(r->prob));
                vertex = new BinNode<T>(*m);
                lchild->parent = vertex;
                rchild->parent = vertex;
                vertex->lchild = lchild;
                vertex->rchild = rchild;
            }   
            this->_root = vertex;
            cout << CYAN << "root:" << *this->_root <<  WHITE << endl;
            break;
        }
        else {
            cout << RED << "size of sub-tree is less than 1:" << subs.size() << WHITE << endl;
            this->_root = subs.back();
            subs.pop_back();
            break;
        }
    }
    else {
        l = *it;
        it++;
        r = *it;
        m = new Model(CHAR_VERTEX, l->prob+r->prob);        

        for (it_subs=subs.begin(); it_subs!=subs.end(); it_subs++) { //set lchild if any sub-tree corresponds with this l model iterated currently 
            if (*l == (*it_subs)->data) {
                cout << CYAN << "lchild points to the root of sub-tree:" << **it_subs;
                lchild = *it_subs;
                --(it_subs = subs.erase(it_subs));
            }
            if (lchild != NULL) break; //tricky but important
        }
        for (it_subs=subs.begin(); it_subs!=subs.end(); it_subs++) { //set rchild if any sub-tree corresponds with this r model iterated currently 
            if (*r == (*it_subs)->data) {
                cout << CYAN << "rchild points to the root of sub-tree:" << **it_subs;
                rchild = *it_subs;
                --(it_subs = subs.erase(it_subs));
            }
            if (rchild != NULL) break; //tricky but important
        }
        if (lchild == NULL) { //set lchild with a new node if no any sub-tree corresponds with this l model iterated currently, which means meeting a row leaf 
            lchild = new BinNode<T>(*l);
            cout << CYAN << "lchild points to a new node:" << *lchild;
        }
        if (rchild == NULL) { //set rchild with a new node if no any sub-tree corresponds with this r model iterated currently, which means meeting a row leaf
            rchild = new BinNode<T>(*r);
            cout << CYAN << "rchild points to a new node:" << *rchild;
        }

        vertex = new BinNode<T>(*m);
        std::cout << GREEN << "growing..." << WHITE << endl;
        std::cout << CYAN << "lchild" << *lchild << WHITE;
        std::cout << CYAN << "rchild" << *rchild << WHITE;
        std::cout << CYAN << "vertex" << *vertex << WHITE;
        lchild->parent = vertex;
        rchild->parent = vertex;
        vertex->lchild = lchild;
        vertex->rchild = rchild;
        subs.push_back(vertex);
        for (std::list<Model*>::iterator itt=it;itt!=list.end();itt++) {
            if ((*m < **itt) || (*m == **itt)) {
                list.insert(itt, m);
                break;
            }
            else if (std::next(itt,1) == list.end()) {
                list.push_back(m);
                break;
            }
        }
        it++;
    }
}

this->updateHeightAll();
cout << GREEN << "-*-*-*-*-*-*-*-* Huffman tree top -*-*-*-*-*-*-*-*" << WHITE << endl;
this->traverseLevel();
cout << GREEN << "-*-*-*-*-*-*-*-* Huffman tree bottom -*-*-*-*-*-*-*-*" << WHITE << endl;

subs.clear();}

Another core function is Huffman::grow() which makes a binary tree for PFC coding.

template<typename T> void amo::HuffmanTree<T>::grow(std::list<Model*>& list) { //ascendantly sorted list
Model* l;
Model* r;
Model* m;
BinNode<T>* lchild;
BinNode<T>* rchild;
BinNode<T>* vertex;
std::list<Model*>::iterator it = list.begin();
std::vector<BinNode<T>*> subs; //roots of sub-trees
typename std::vector<BinNode<T>*>::iterator it_subs = subs.begin();
int i = 0;
while (it!=list.end()) {
    lchild = NULL;
    rchild = NULL;
    vertex = NULL;
    cout << YELLOW << "while-loop:" << ++i << WHITE << endl;
    if (std::next(it,1) == list.end()) { //met the last and single leaf or sub-tree 
        if (subs.size() > 1) {
            cout << RED << "size of sub-tree is more than 1:" << subs.size() << WHITE << endl;
            this->_root = subs.back();
            subs.pop_back();
            break;
        }
        else if (subs.size() == 1){ 
            if (**it == subs.back()->data) { //met the last sub-tree 
                cout << GREEN << "going to attach the last sub-tree" << WHITE << endl;
                vertex = subs.back();
                subs.pop_back();
            } 
            else { //met the last leaf 
                cout << GREEN << "going to attach the last leaf" << WHITE << endl;
                r = *it;
                lchild = subs.back();
                subs.pop_back();
                cout << CYAN << "lchild points to the root of the last sub-tree:" << *lchild;
                rchild = new BinNode<T>(*r);
                cout << CYAN << "rchild points to a new node:" << *rchild;
                m = new Model(CHAR_VERTEX, (lchild->data.prob)+(r->prob));
                vertex = new BinNode<T>(*m);
                lchild->parent = vertex;
                rchild->parent = vertex;
                vertex->lchild = lchild;
                vertex->rchild = rchild;
            }   
            this->_root = vertex;
            cout << CYAN << "root:" << *this->_root <<  WHITE << endl;
            break;
        }
        else {
            cout << RED << "size of sub-tree is less than 1:" << subs.size() << WHITE << endl;
            this->_root = subs.back();
            subs.pop_back();
            break;
        }
    }
    else {
        l = *it;
        it++;
        r = *it;
        m = new Model(CHAR_VERTEX, l->prob+r->prob);        

        for (it_subs=subs.begin(); it_subs!=subs.end(); it_subs++) { //set lchild if any sub-tree corresponds with this l model iterated currently 
            if (*l == (*it_subs)->data) {
                cout << CYAN << "lchild points to the root of sub-tree:" << **it_subs;
                lchild = *it_subs;
                --(it_subs = subs.erase(it_subs));
            }
            if (lchild != NULL) break; //tricky but important
        }
        for (it_subs=subs.begin(); it_subs!=subs.end(); it_subs++) { //set rchild if any sub-tree corresponds with this r model iterated currently 
            if (*r == (*it_subs)->data) {
                cout << CYAN << "rchild points to the root of sub-tree:" << **it_subs;
                rchild = *it_subs;
                --(it_subs = subs.erase(it_subs));
            }
            if (rchild != NULL) break; //tricky but important
        }
        if (lchild == NULL) { //set lchild with a new node if no any sub-tree corresponds with this l model iterated currently, which means meeting a row leaf 
            lchild = new BinNode<T>(*l);
            cout << CYAN << "lchild points to a new node:" << *lchild;
        }
        if (rchild == NULL) { //set rchild with a new node if no any sub-tree corresponds with this r model iterated currently, which means meeting a row leaf
            rchild = new BinNode<T>(*r);
            cout << CYAN << "rchild points to a new node:" << *rchild;
        }

        vertex = new BinNode<T>(*m);
        std::cout << GREEN << "growing..." << WHITE << endl;
        std::cout << CYAN << "lchild" << *lchild << WHITE;
        std::cout << CYAN << "rchild" << *rchild << WHITE;
        std::cout << CYAN << "vertex" << *vertex << WHITE;
        lchild->parent = vertex;
        rchild->parent = vertex;
        vertex->lchild = lchild;
        vertex->rchild = rchild;
        subs.push_back(vertex);
        for (std::list<Model*>::iterator itt=it;itt!=list.end();itt++) {
            if ((*m < **itt) || (*m == **itt)) {
                list.insert(itt, m);
                break;
            }
            else if (std::next(itt,1) == list.end()) {
                list.push_back(m);
                break;
            }
        }
        it++;
    }
}

this->updateHeightAll();
cout << GREEN << "-*-*-*-*-*-*-*-* Huffman tree top -*-*-*-*-*-*-*-*" << WHITE << endl;
this->traverseLevel();
cout << GREEN << "-*-*-*-*-*-*-*-* Huffman tree bottom -*-*-*-*-*-*-*-*" << WHITE << endl;

subs.clear();}

And Huffman::generate() creates table for coding content.

template<typename T> void amo::HuffmanTree<T>::generate() {
std::string code = "";
std::queue<BinNode<T>*> queue;
BinNode<T>* node = this->_root;
BinNode<T>* tmp;
queue.push(node);
int i = 0;
while (true) {
    if (queue.empty()) break;
    node = queue.front();
    queue.pop();
    cout << YELLOW << "while-loop:" << ++i << ", node:" << *node << WHITE << endl;

    if (node->data.c == CHAR_VERTEX) {
        //do nothing
    } 
    else {
        if (node->isLeaf()) code = "";
        tmp = node;
        while (tmp!=NULL) {
            if (tmp->isLeftChild()) code.insert(0, "0");
            else if (tmp->isRightChild()) code.insert(0, "1");
            tmp = tmp->parent;
        }
        if (node->data.c != CHAR_VERTEX) codes[node->data.c] = code;
    }

    if (node->hasLeftChild()) queue.push(node->lchild);
    if (node->hasRightChild()) queue.push(node->rchild);
}

for (std::map<char,string>::iterator it=codes.begin();it!=codes.end();it++) {
    cout << YELLOW << "codes[" << distance(codes.begin(),it) << "]:" << " key:" << it->first << " => value:" << it->second << WHITE << endl; 
}}

Thanks, any suggestion is welcome.

Tacklind answered 21/5, 2019 at 10:26 Comment(1)
Sure, my point of view focused on efficiently creating codes based on Huffman algorithm. Excuse me if my code was't constructive for you.Tacklind

© 2022 - 2024 — McMap. All rights reserved.