LLaMA 2: Theory and Hands-On

Vishal Anand
6 min read · Aug 17, 2023


LLaMA: Large Language Model Meta AI.

Surpassing benchmarks, partnerships with key players, C-suite endorsements: powerful, enterprise-class, open source…

Let me get straight to the point.

Section 1: Theory

Key announcements:

  • Q3 2023: the availability of Llama 2, the next generation of Meta's open-source large language model, was announced
  • Q3 2023: IBM announced plans to make Llama 2 available within its watsonx AI and data platform
  • Q3 2023: Microsoft expanded its AI partnership with Llama 2 on Azure and Windows

Key observations:

  • Architecture:

LLaMA uses the transformer architecture, the standard architecture for language modelling, with a few differences compared to other models: it uses the SwiGLU activation function instead of ReLU, rotary positional embeddings (RoPE) instead of absolute positional embeddings, and root-mean-square layer normalization (RMSNorm) instead of standard layer normalization, and it increases the context length to 4K tokens.
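
Two of these components are easy to sketch. Below is a minimal NumPy illustration of RMSNorm and the SwiGLU gate; the weight shapes are illustrative, not the model's actual dimensions:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """Root-mean-square layer normalization: no mean subtraction, no bias."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def swiglu(x, W, V):
    """SwiGLU feed-forward gate: Swish(xW) elementwise-times (xV)."""
    gate = x @ W
    swish = gate * (1.0 / (1.0 + np.exp(-gate)))  # Swish/SiLU activation
    return swish * (x @ V)
```

RMSNorm drops the mean-centering and bias of standard LayerNorm, which saves computation at scale with little quality loss.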

  • Llama 2 is open source, free for research and commercial use
  • With each model download you receive: model code, model weights, a README (user guide), the Responsible Use Guide, a license, an Acceptable Use Policy, and a model card (pretty cool)
  • Llama 2 models are trained on 2 trillion tokens and have double the context length of Llama 1. Llama-2-chat models have additionally been trained on over 1 million new human annotations
  • Model sizes: 7B, 13B, and 70B parameters
  • Pre-training tokens: 2 trillion
  • Context length: 4,096 tokens
  • Benchmarks:

Llama 2 outperforms other open-source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests.

  • Safety and helpfulness:

Llama-2-chat uses reinforcement learning from human feedback (RLHF) to ensure safety and helpfulness.

  • Training:

Llama 2 is pretrained on publicly available online data. An initial version of Llama-2-chat is then created through supervised fine-tuning. Next, Llama-2-chat is iteratively refined using reinforcement learning from human feedback (RLHF), which includes rejection sampling and proximal policy optimization (PPO).
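
The rejection-sampling step can be sketched simply: sample several candidate responses from the current policy, score each with a reward model, and keep the best one. Everything below the function is a toy stand-in for illustration; a real pipeline would sample from the Llama-2-chat policy and score with a learned reward model:

```python
from itertools import cycle

def rejection_sample(prompt, generate, reward_model, k=4):
    """Sample k candidate responses; keep the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(k)]
    return max(candidates, key=lambda resp: reward_model(prompt, resp))

# Toy stand-ins: canned responses and a reward that just prefers longer text.
canned = cycle(["short answer", "a longer, more helpful answer", "ok"])
best = rejection_sample(
    "How do I migrate to the cloud?",
    generate=lambda prompt: next(canned),
    reward_model=lambda prompt, resp: len(resp),
    k=3,
)
```

The selected responses then become training targets for further fine-tuning, alongside the PPO updates.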

  • Responsible use:

Like all LLMs, Llama 2 is a new technology that carries potential risks with use. Testing conducted to date has not, and could not, cover all scenarios. To help developers address these risks, Meta has created the Responsible Use Guide, with more details in the research paper and model card.

  • Sustainability:

Carbon Footprint: Pretraining utilized a cumulative 3.3M GPU hours of computation on hardware of type A100–80GB (TDP of 350–400W). Estimated total emissions were 539 tCO2eq, 100% of which were offset by Meta’s sustainability program.

In the model card's emissions table, "Time" is the total GPU time required to train each model, and "Power Consumption" is the peak power capacity per GPU device, adjusted for power usage efficiency. 100% of the emissions were directly offset by Meta's sustainability program, and because the models are released openly, the pretraining costs do not need to be incurred by others.

  • Data Freshness:

The pretraining data has a cut-off of September 2022, but some tuning data is more recent, up to July 2023.

  • Ethical Considerations and Limitations:

Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 2's potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts. Therefore, before deploying any application of Llama 2, developers should perform safety testing and tuning tailored to their specific use of the model.

  • Models available:

Model 7B: Llama-2, Llama-2-hf, Llama-2-chat, Llama-2-chat-hf

Model 13B: Llama-2, Llama-2-hf, Llama-2-chat, Llama-2-chat-hf

Model 70B: Llama-2, Llama-2-hf, Llama-2-chat, Llama-2-chat-hf

You can download these models from ai.meta.com and also from huggingface.co. Either way, you need to accept the terms and conditions via the Meta link before access is granted on Hugging Face.
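
Once access is granted, the Hugging Face checkpoints can be loaded with the standard `transformers` API. A minimal sketch follows; note that actually running it requires `pip install transformers`, an access token (e.g. via `huggingface-cli login`), and tens of GB of disk for the weights:

```python
def load_llama2(model_id="meta-llama/Llama-2-7b-chat-hf"):
    """Load a Llama 2 checkpoint and tokenizer from the Hugging Face Hub.

    Requires the transformers library and a Hub access token granted
    after accepting Meta's license terms.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

Swap the `model_id` for the 13B or 70B variants as needed; the loading code is identical.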

Section 2: Hands-on

As always, I believe in what I see, touch, feel, hear, and smell, down to getting my hands dirty, to gain real wisdom in the technology space and influence business outcomes.

So, I decided to deploy my own instances of two Llama 2 models (7B and 13B) on my own virtual machine. I deployed a general-purpose VM running Windows Server 2022 with 8 vCPUs and 32 GB RAM, with a primary OS disk and an additional 200 GB disk (the Llama 2 13B model required 109 GB of disk space).

On purpose, I selected two different chatbot front ends for the two deployments: a Gradio web UI for the Llama 2 7B model, and text-generation-webui for the Llama 2 13B model.

→ Llama 2 7B model with the Gradio chatbot.

Software installed: Git, Python, Visual Studio.

After executing various commands (and some troubleshooting), the final command ran the web UI with the model loaded.

The web UI came up with the model loaded.
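
For reference, a Gradio chat front end of this kind boils down to very little code. The sketch below wraps a text-generation callable in a chat interface; `generate_fn` is a hypothetical stand-in for whatever produces the model's reply:

```python
def build_chat_ui(generate_fn):
    """Wrap a text-generation callable in a minimal Gradio chat interface."""
    import gradio as gr  # lazy import; requires `pip install gradio`

    def respond(message, history):
        # history holds the prior (user, assistant) turns; this sketch ignores it
        return generate_fn(message)

    return gr.ChatInterface(fn=respond, title="Llama 2 7B (local)")

# build_chat_ui(my_generate).launch()  # serves the chat UI on localhost
```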

Here is my conversation with Llama 2 related to cloud migration:

→ Llama 2 13B model with the text-generation-webui chatbot.

Software installed: Git, Conda, PyTorch.

After executing various commands (and some troubleshooting), the final command launched the web UI.

At this point, the model is not yet loaded; only the web UI is up and running.

I then downloaded the model from Hugging Face and loaded it directly in the web UI.

I changed the parameters: maximum tokens to 4096 and temperature to 0.
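
Setting temperature to 0 effectively makes decoding greedy: lowering the temperature sharpens the softmax over the logits, and in the limit the highest-logit token is always picked. A NumPy sketch of temperature-scaled sampling (most UIs special-case T = 0 as argmax, as here):

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=None):
    """Sample a token id from a logit vector; temperature 0 means greedy argmax."""
    if temperature == 0.0:
        return int(np.argmax(logits))
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))
```

With temperature 0 the model's answers become deterministic, which is convenient for comparing runs; higher temperatures trade that repeatability for more varied output.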

Here is my conversation with Llama 2 related to cloud migration:

Note: for enterprise-class deployments, the model should be deployed on GPUs.

I hope this provides some insight into the subject. Please feel free to share your opinions and feedback.

Disclaimer: Views expressed are my very own and personal ones.

Written by Vishal Anand

Global Chief Technologist, Executive Architect, Master Inventor, Fellow of BCS, Certified Distinguished Architect.
