Llama 3.1: A New AI Model from Meta


On July 23, 2024, Meta released Llama 3.1, the largest and most capable version of its Llama large language model to date, free of charge. Meta has not disclosed the cost of developing Llama 3.1, but Mark Zuckerberg recently told investors that the company is investing billions of dollars in AI development.

The Meta Llama 3.1 collection consists of multilingual large language models (LLMs): pretrained and instruction-tuned generative models in 8B, 70B, and 405B parameter sizes (text in, text out). The instruction-tuned, text-only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

Mark Zuckerberg, founder and CEO of Meta, wrote the following regarding the Llama 3.1 release:

"Today, several tech companies are developing leading closed models. But open source is quickly closing the gap. Last year, Llama 2 was only comparable to an older generation of models behind the frontier. This year, Llama 3 is competitive with the most advanced models and leading in some areas. Starting next year, we expect future Llama models to become the most advanced in the industry. But even before that, Llama is already leading on openness, modifiability, and cost efficiency."

Model Architecture: Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture; that is, it generates text one token at a time, with each token conditioned on the tokens before it. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety.
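
To make "auto-regressive" concrete, here is a minimal sketch of greedy decoding. It is not Llama's actual implementation: the `greedy_generate` function, the `toy_model` lookup table, and the `<eos>` marker are all hypothetical stand-ins, and a real model would score an entire vocabulary with a transformer instead of a dictionary.

```python
from typing import Callable, Dict, List


def greedy_generate(
    next_token_probs: Callable[[List[str]], Dict[str, float]],
    prompt: List[str],
    max_new_tokens: int,
    eos: str = "<eos>",
) -> List[str]:
    """Extend `prompt` one token at a time, always taking the most likely next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # The model sees everything generated so far and scores candidate next tokens.
        probs = next_token_probs(tokens)
        best = max(probs, key=probs.get)
        if best == eos:
            break  # the model signalled the end of the sequence
        tokens.append(best)
    return tokens


# Hypothetical toy "model": a lookup table keyed on the last token only.
TOY_TABLE = {
    "the": {"llama": 0.7, "model": 0.3},
    "llama": {"model": 0.6, "<eos>": 0.4},
    "model": {"<eos>": 0.9, "the": 0.1},
}


def toy_model(tokens: List[str]) -> Dict[str, float]:
    return TOY_TABLE[tokens[-1]]


print(greedy_generate(toy_model, ["the"], max_new_tokens=5))
# prints ['the', 'llama', 'model']
```

Production systems usually sample from the predicted distribution instead of always taking the top token, but the loop structure is the same: predict, append, repeat.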

Benchmark comparison of the Llama 3.1 405B with other leading models:

[Figure: Meta Llama 3.1 benchmark chart]

Benchmark comparison of the Llama 3.1 8B and Llama 3.1 70B with other leading models:

[Figure: Meta Llama 3.1 benchmark chart]


Developers can now use the new Llama 3.1 family of models through AWS, NVIDIA, Databricks, Groq, Dell, Azure, and Google Cloud. Llama 3.1 405B is available through Azure AI's Models-as-a-Service as a serverless API endpoint, and the latest fine-tuned versions of Llama 3.1 8B and Llama 3.1 70B are available in the Azure AI Model Catalog.

Intended Use Cases

Llama 3.1 is intended for commercial and research use in multiple languages. Instruction-tuned, text-only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The outputs of Llama 3.1 models may also be used to improve other models, for example through synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases.

Use with transformers

Before running the example, update your transformers installation with pip install --upgrade transformers (Llama 3.1 needs transformers 4.43.0 or later).

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
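# For chat input, "generated_text" holds the whole conversation;
# the last entry is the assistant's newly generated message.
print(outputs[0]["generated_text"][-1])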
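
The final print call above shows the whole message dictionary (role plus content). If only the reply text is needed, a small helper along these lines could pull it out. This is a sketch under stated assumptions: `last_assistant_reply` is a hypothetical name, and `sample_outputs` is a hand-written stand-in for a pipeline result whose reply text is invented for illustration, not real model output.

```python
from typing import Dict, List


def last_assistant_reply(outputs: List[Dict]) -> str:
    """Return the text of the final assistant message in a chat-style pipeline result."""
    conversation = outputs[0]["generated_text"]
    last = conversation[-1]
    if last.get("role") != "assistant":
        raise ValueError("expected the final message to come from the assistant")
    return last["content"]


# Hand-written stand-in for a pipeline result; the assistant text is invented.
sample_outputs = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "Arr, I be a pirate chatbot!"},
        ]
    }
]

print(last_assistant_reply(sample_outputs))
```

With the real pipeline, the same call would be last_assistant_reply(outputs).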
