Parameter-Efficient Fine-Tuning (PEFT) is a game-changer for adapting large language models to your specific domain without the headache of full model fine-tuning. It’s especially popular for adapting compute-heavy models like LLaMA, Falcon, or Mistral to tasks like legal reasoning, healthcare Q&A, or document extraction.
In this blog, we’ll focus on:
- What exactly is PEFT?
- Why it works
- What are the most important hyperparameters
- What changing each hyperparameter actually does
- Code examples
What is PEFT?
PEFT methods train a small number of extra parameters added to a pretrained model while keeping the base weights frozen. You get:
- Minimal GPU memory usage
- Fast training
- High performance on domain-specific tasks
Popular PEFT methods:
- LoRA (Low-Rank Adaptation)
- Prefix Tuning
- Adapters
- IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations)
We’ll focus on LoRA, the most widely used.
Key Components of LoRA
LoRA works by injecting trainable low-rank matrices into the linear layers of a transformer. Instead of updating the full weight matrix W, it keeps W frozen and learns a low-rank update, so the effective weight becomes W + B @ A (scaled by lora_alpha / r), where A and B are small trainable matrices whose inner dimension is the rank r.
This saves memory and allows faster updates. You can fine-tune models like LLaMA-7B on a single 16GB GPU with LoRA.
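To make the idea concrete, here is a minimal, self-contained sketch of a LoRA-wrapped linear layer in PyTorch. It illustrates the math above; it is not how the peft library implements it:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA layer: output = base(x) + (alpha / r) * B(A(x)), with the base weight frozen."""
    def __init__(self, base_linear: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad_(False)              # freeze the pretrained weight W
        self.scaling = alpha / r
        self.A = nn.Linear(base_linear.in_features, r, bias=False)   # down-projection to rank r
        self.B = nn.Linear(r, base_linear.out_features, bias=False)  # up-projection back to the output size
        nn.init.zeros_(self.B.weight)            # B starts at zero, so training begins from the pretrained behavior

    def forward(self, x):
        return self.base(x) + self.scaling * self.B(self.A(x))

# A 4096x4096 projection (roughly the size of an attention projection in a 7B model)
layer = LoRALinear(nn.Linear(4096, 4096), r=8, alpha=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65,536 trainable parameters instead of ~16.8M in the full matrix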
Important LoRA Hyperparameters:
Hyperparameter | Description | Effect of Increasing |
---|---|---|
r | LoRA rank (dimensionality of low-rank adapters) | More capacity, more memory usage |
lora_alpha | Scaling factor for the LoRA update (applied as lora_alpha / r) | Higher = stronger adapter signal relative to the frozen weights |
target_modules | Specific layers to inject LoRA into (e.g., q_proj, v_proj) | More modules = more trainable capacity, at the cost of memory |
lora_dropout | Dropout applied to the LoRA path during training | Helps regularization, especially on small datasets |
bias | Whether to train bias weights (none, all, or lora_only) | none = fewer params, all = more flexibility |
task_type | CAUSAL_LM or SEQ_CLS, affects which head is used for adaptation | Set appropriately for generation/classification |
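A quick way to see what r buys you is to count trainable parameters at different ranks. The sketch below uses gpt2 only because it is small to download; for LLaMA-style models you would target q_proj and v_proj instead of c_attn:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

for r in (4, 16):
    base = AutoModelForCausalLM.from_pretrained("gpt2")
    cfg = LoraConfig(r=r, lora_alpha=2 * r, target_modules=["c_attn"], task_type="CAUSAL_LM")
    peft_model = get_peft_model(base, cfg)
    peft_model.print_trainable_parameters()  # trainable parameter count grows roughly linearly with r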
Key TrainingArguments Hyperparameters:
Parameter | Description | Tip |
---|---|---|
num_train_epochs | Total number of epochs | Use 3–5 for small to medium datasets |
per_device_train_batch_size | Batch size per GPU | Lower = less GPU memory needed; pair small values with gradient accumulation |
gradient_accumulation_steps | Number of steps to accumulate before optimizer step | Use higher if batch size is low |
fp16 | Enables 16-bit precision | Great for memory efficiency (when using supported hardware) |
learning_rate | Initial learning rate | Use 3e-5 to 2e-4 for LoRA; lower = more stable training |
logging_steps | Frequency of logging | Use 10–50 for closer tracking |
save_steps | Frequency of saving model checkpoints | Use 100+ to save periodically |
eval_strategy | When to run evaluation (steps or epoch); called evaluation_strategy in older transformers releases | Use steps for tight monitoring |
eval_steps | Evaluation interval in steps | Should match save_steps or be smaller for frequent validation |
report_to | Where to log metrics (wandb, tensorboard, etc.) | Use wandb for collaborative tracking |
run_name | Naming the experiment run | Useful for organizing runs in experiment tracking tools |
Code Example: TrainingArguments
from transformers import TrainingArguments
training_args = TrainingArguments(
output_dir="./deepseek_finetuned",
num_train_epochs=3,
per_device_train_batch_size=1,
gradient_accumulation_steps=16,
fp16=True,
logging_steps=10,
save_steps=100,
eval_strategy="steps",
eval_steps=100,
learning_rate=3e-5,
logging_dir="./logs",
report_to="wandb",
run_name="DeepSeek_FineTuning_Experiment",
)
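With per_device_train_batch_size=1 and gradient_accumulation_steps=16, the effective batch size per optimizer step is 1 × 16 = 16. If you change one of the two, adjust the other so the effective batch size stays roughly the same.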
Code Example: LoRA Fine-Tuning with Trainer
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer
from transformers import BitsAndBytesConfig, DataCollatorForLanguageModeling
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset

# Load the base model in 4-bit so it fits on a single 16GB GPU
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
model = prepare_model_for_kbit_training(model)  # needed when the base model is loaded in 4-/8-bit

config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, config)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token

dataset = load_dataset("text", data_files={"train": "law_qa.txt"})
tokenized = dataset.map(lambda x: tokenizer(x["text"], padding=True, truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),  # adds labels for the causal LM loss
)
trainer.train()
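After training, you typically save only the LoRA adapter weights, which are a few megabytes rather than the full multi-gigabyte base model. The output directory below is just an example path:

# Save the trained adapter (and tokenizer) so it can be re-attached to the base model later
model.save_pretrained("./law_lora_adapter")       # example path, choose your own
tokenizer.save_pretrained("./law_lora_adapter")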
Tips for Hyperparameter Tuning
- Start small with r=4 and increase only if performance plateaus.
- Use q_proj and v_proj as your initial target_modules. Add k_proj or o_proj if needed.
- If overfitting, increase lora_dropout to 0.2–0.3.
- Use lora_alpha values between 8–32 depending on the dataset size.
- If training fails to converge, lower the learning rate to 1e-4 or 5e-5.
- If using very small batches (e.g., batch_size=1), increase gradient_accumulation_steps to maintain effective batch size.
- Use fp16=True to reduce memory usage when supported by your hardware.
- Use logging_steps = 10–50 to monitor training progress without slowing it down.
- Set save_steps and eval_steps to 100–200 for regular checkpoints and validation.
- If training is unstable, try a lower learning_rate and reduce lora_alpha slightly.
Conclusion
PEFT, especially with LoRA, gives you production-grade model fine-tuning with minimal overhead. Instead of spending days on full model updates, you can run fine-tuning jobs in hours with excellent results.
And remember: fine-tuning is just the start. With PEFT adapters, you can swap tasks dynamically, experiment faster, and deploy smarter.
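For example, with the peft API you can attach several adapters to one base model and switch between them at runtime (the adapter paths below are hypothetical):

from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Hypothetical adapter directories for two separately fine-tuned tasks
model = PeftModel.from_pretrained(base_model, "./law_lora_adapter", adapter_name="law")
model.load_adapter("./medical_lora_adapter", adapter_name="medical")

model.set_adapter("law")      # route requests through the legal adapter
model.set_adapter("medical")  # ...or the medical one, without reloading the 7B base model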