I've been doing some research on compression-based text classification and I'm trying to figure out a way of storing a dictionary built by the encoder (on a training file) for use to run 'statically' on a test file? Is this at all possible using UNIX's gzip utility?
For example I have been using 2 'class' files of sport.txt and atheism.txt, hence I want to run compression on both of these files and store their dictionaries used. Next I want to take a test file (which is unlabelled, could be either atheism or sport) and by using the prebuilt dictionaries on this test.txt I can analyse how well it compresses under that dictionary/model.
Thanks