I know you mentioned Hugging Face is unnecessary in your case, but for downloading and using the model it's much easier to go through their transformers library.
After you download the weights, you need to restructure the folder as follows (notice I moved three of the files under 7B; a scripted version of this move is sketched right after the tree):
├── 7B
│ ├── checklist.chk
│ ├── consolidated.00.pth
│ └── params.json
├── config.json
├── generation_config.json
├── LICENSE
├── tokenizer_checklist.chk
├── tokenizer.model
└── USE_POLICY.md
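If you'd rather script the restructuring, here is a minimal sketch, assuming the download landed in ./llama-2-7b (the folder name is my assumption; adjust the paths to match your setup):

import shutil
from pathlib import Path

# Create the 7B subfolder and move the three weight files into it
root = Path("llama-2-7b")
(root / "7B").mkdir(exist_ok=True)
for name in ("checklist.chk", "consolidated.00.pth", "params.json"):
    shutil.move(str(root / name), str(root / "7B"))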
Next, download the conversion script from here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
Finally, run the script:
python convert_llama_weights_to_hf.py --input_dir llama-2-7b/ --model_size 7B --output_dir model
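Note that the conversion loads the full set of weights, so make sure you have enough free RAM and disk space for the 7B model; the script may also ask you to install extra packages such as sentencepiece (and possibly protobuf) for the tokenizer conversion.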
Once it's finished, you can load the model as follows:
from transformers import LlamaForCausalLM, LlamaTokenizer

# Point both at the converted output directory from the previous step
tokenizer = LlamaTokenizer.from_pretrained("./model")
model = LlamaForCausalLM.from_pretrained("./model")
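To sanity-check the load, here's a minimal generation sketch (the prompt and generation settings are illustrative examples, not fixed values):

# Tokenize a prompt, generate a continuation, and decode it back to text
inputs = tokenizer("What is a llama?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))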
You can then learn more about how to prompt the model here:
https://huggingface.co/docs/transformers/v4.31.0/en/model_doc/llama2#transformers.LlamaForCausalLM.forward.example