The Llama 2 7B model on Hugging Face (meta-llama/Llama-2-7b) has a PyTorch .pth file, consolidated.00.pth, that is ~13.5 GB in size. The Hugging Face Transformers-compatible model meta-llama/Llama-2-7b-hf has three PyTorch model files that together are ~27 GB and two safetensors files that together are ~13.5 GB.
Could someone please explain the reason for the big difference in file sizes?
I could not find an explanation in the Hugging Face model cards or in their blog post "Llama 2 is here - get it on Hugging Face".
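As a rough sanity check (my own guess, not something I found documented), the sizes would line up if the ~13.5 GB files store each weight as a 16-bit float and the ~27 GB files store each weight as a 32-bit float. The sketch below assumes about 6.74 billion parameters for the 7B model; that count and the helper name `checkpoint_size_gb` are my assumptions, not taken from the model card.

```python
# Back-of-the-envelope checkpoint size: parameters x bytes per parameter.
# N_PARAMS is an assumed, approximate parameter count for the 7B model.
N_PARAMS = 6.74e9


def checkpoint_size_gb(n_params: float, bytes_per_param: int) -> float:
    """Return the approximate size in GB (10^9 bytes), ignoring file metadata."""
    return n_params * bytes_per_param / 1e9


fp16_gb = checkpoint_size_gb(N_PARAMS, 2)  # 16-bit floats: 2 bytes per weight
fp32_gb = checkpoint_size_gb(N_PARAMS, 4)  # 32-bit floats: 4 bytes per weight

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"32-bit weights: ~{fp32_gb:.1f} GB")
```

If that guess is right, the factor of two would simply be the storage precision, not extra weights.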
Update: When the model is downloaded to the Hugging Face cache, I noticed that only the safetensors files are downloaded, not the PyTorch binary model files, so the download does not pull both the safetensors and the PyTorch copies of the weights.
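For anyone who wants to check this on their own machine, here is a minimal sketch that totals file sizes per extension under a directory. The helper name `size_by_extension` is hypothetical, and the directory to pass in is an assumption on my part: the cache usually lives under `~/.cache/huggingface/hub`, but the location can differ per setup.

```python
from collections import defaultdict
from pathlib import Path


def size_by_extension(directory):
    """Sum file sizes in bytes per file extension, walking the directory recursively."""
    totals = defaultdict(int)
    for path in Path(directory).rglob("*"):
        # is_file() and stat() follow symlinks, so linked files count at full size.
        if path.is_file():
            totals[path.suffix or "(none)"] += path.stat().st_size
    return dict(totals)
```

Calling `size_by_extension(...)` on the downloaded model's snapshot folder should show whether any `.bin` files are present next to the `.safetensors` files.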