A Complete Hands-On Guide to Fine-Tuning Large Models: From Beginner to Proficient

廖万里 · 3 months ago (03-20) · AI

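Fine-tuning is the core technique for giving a general-purpose large language model domain-specific skills. This article walks through the mainstream methods (full fine-tuning, LoRA, and QLoRA) and provides end-to-end code examples so that you can learn fine-tuning from scratch.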

1. What Is Large-Model Fine-Tuning

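Fine-tuning means taking a pretrained large language model and training it further on a dataset for a specific domain or task, so that the model adapts to the needs of that scenario. With fine-tuning, a general-purpose model can be turned into a specialized legal advisor, medical assistant, coding expert, or customer-service bot.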

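Compared with training from scratch, fine-tuning has the following advantages:

Low cost: no pretraining run is needed, which greatly reduces the compute required
Fast: training time drops from weeks to hours, or even minutes for small adapters and datasets
Effective: the model inherits its pretrained knowledge, so it can perform well even with a small dataset
Flexible: model behavior can be tailored to business requirements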

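2. Comparing the Main Fine-Tuning Methods

2.1 Full Fine-Tuning

Full fine-tuning updates every parameter of the model. It is the most direct approach, but it needs a large amount of GPU memory and compute.

Pros: typically the strongest results, since every weight can adapt to the task
Cons: heavy resource use, prone to overfitting on small datasets, high risk of catastrophic forgetting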
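
To make "a large amount of GPU memory" concrete, here is a rough back-of-the-envelope sketch. It assumes mixed-precision training with Adam (16-bit weights and gradients, plus 32-bit master weights and two 32-bit optimizer moments); that setup is an assumption for illustration, not something the article specifies, and activations are not counted.

```python
# Rough per-parameter memory for full fine-tuning with Adam in mixed precision.
# Assumption (not from the article): 16-bit weights and gradients plus
# 32-bit master weights and two 32-bit Adam moment buffers.
BYTES_PER_PARAM = {
    "fp16_weights": 2,
    "fp16_gradients": 2,
    "fp32_master_weights": 4,
    "fp32_adam_momentum": 4,
    "fp32_adam_variance": 4,
}


def full_finetune_memory_gb(num_params: float) -> float:
    """Estimate weight, gradient, and optimizer memory in GB (10**9 bytes)."""
    return num_params * sum(BYTES_PER_PARAM.values()) / 1e9


for size in (7e9, 13e9, 70e9):
    print(f"{size / 1e9:.0f}B params: ~{full_finetune_memory_gb(size):.0f} GB before activations")
```

Under these assumptions even a 7B model needs on the order of 100 GB, which is why the parameter-efficient methods in the following sections matter for single-GPU setups.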

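2.2 LoRA Fine-Tuning

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method. It adapts the model by adding low-rank matrices alongside existing weight matrices, typically the attention projections, and trains only those few added parameters.

Core idea: the weight update learned during fine-tuning is assumed to be low-rank, so it can be approximated by the product of two small matrices.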

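# Core LoRA formula
# W = W0 + (alpha / r) * B @ A, where W0 is the original weight (frozen) and
# B (out_features x r) and A (r x in_features) are the trainable low-rank matrices
import torch
import torch.nn as nn

class LoRALayer(nn.Module):
    """Computes only the low-rank update; add its output to the frozen layer's output."""

    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.lora_A = nn.Parameter(torch.zeros(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank
        # A is initialized randomly and B stays at zero, so B @ A is zero before training
        nn.init.kaiming_uniform_(self.lora_A, a=5**0.5)
        nn.init.zeros_(self.lora_B)

    def forward(self, x):
        # x: (..., in_features) -> (..., out_features)
        return (x @ self.lora_A.T @ self.lora_B.T) * self.scaling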

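Pros: far fewer trainable parameters and much less optimizer state (the LoRA paper reports roughly a 3x reduction in GPU memory for GPT-3 175B), fast training, and the base model's weights are left unchanged
Cons: can trail full fine-tuning slightly on some complex tasks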
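
As a minimal sketch of how the low-rank update combines with a frozen layer, the example below wraps an `nn.Linear` with a LoRA branch. The class name `LinearWithLoRA`, the `merged_weight` helper, and the layer sizes are hypothetical choices for illustration; this is not how the PEFT library implements it internally.

```python
import torch
import torch.nn as nn


class LinearWithLoRA(nn.Module):
    """Hypothetical wrapper: a frozen nn.Linear plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # W0 (and its bias) stay frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        delta = (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
        return self.base(x) + delta

    @torch.no_grad()
    def merged_weight(self):
        # W = W0 + scaling * B @ A: a single matrix that can replace W0 after training
        return self.base.weight + self.scaling * (self.lora_B @ self.lora_A)


torch.manual_seed(0)
base = nn.Linear(1024, 1024)
layer = LinearWithLoRA(base, rank=8, alpha=16)
x = torch.randn(2, 1024)

# B is zero at initialization, so the wrapped layer starts out identical to the base layer.
same_at_init = torch.allclose(layer(x), base(x))

trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(same_at_init, trainable, total)
```

For this 1024 x 1024 layer, the adapter trains 16,384 parameters out of roughly 1.07 million, and the merged weight shows why a trained adapter adds no extra inference cost once it is folded into the base matrix.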

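2.3 QLoRA Fine-Tuning

QLoRA (Quantized LoRA) builds on LoRA by storing the frozen base model in 4-bit precision, which lowers memory requirements further. The QLoRA paper reports fine-tuning a 65B-parameter model on a single 48GB GPU; on a 24GB card, models up to roughly 33B are the realistic range, since the 4-bit weights of a 70B model alone take about 35 GB.

Core innovations: 4-bit NormalFloat (NF4) quantization, double quantization, and paged optimizers.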
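
To illustrate the block-wise idea behind 4-bit quantization, here is a deliberately simplified sketch. It uses evenly spaced integer levels and a per-block absmax scale; it is not the NF4 codebook that QLoRA actually uses, and the function names and block size are assumptions for illustration.

```python
import random


def quantize_blockwise(weights, block_size=64, bits=4):
    """Simplified blockwise absmax quantization (uniform levels, not the NF4 codebook)."""
    levels = 2 ** (bits - 1) - 1  # 7 for signed 4-bit codes in [-7, 7]
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block) or 1.0  # avoid dividing by zero
        scales.append(absmax)  # one quantization constant per block
        codes.extend(round(w / absmax * levels) for w in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=64, bits=4):
    """Map the integer codes back to approximate float values."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * scales[i // block_size] for i, c in enumerate(codes)]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]
codes, scales = quantize_blockwise(weights)
restored = dequantize_blockwise(codes, scales)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"{len(scales)} block scales, max abs error {max_error:.5f}")
```

The per-block scales are the "quantization constants"; double quantization compresses those constants as well, which is where its extra memory savings come from.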

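2.4 Method Comparison

Full fine-tuning: very high memory use, slow, best quality; suited to settings with ample resources
LoRA: moderate memory use, fast, excellent quality; suited to most scenarios
QLoRA: low memory use, fairly fast, good quality; suited to consumer GPUs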

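3. Environment Setup

3.1 Hardware Requirements

Minimum: NVIDIA RTX 3060 12GB (enough for QLoRA fine-tuning of a 7B model)
Recommended: NVIDIA RTX 4090 24GB (LoRA fine-tuning of a 7B model in 16-bit; a 13B model needs QLoRA here, because its 16-bit weights alone take about 26 GB)
Professional: A100/H100 (larger models and full fine-tuning, usually across several GPUs)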

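3.2 Software Environment

# Create a Python virtual environment
conda create -n llm-finetune python=3.10 -y
conda activate llm-finetune

# Install dependencies (the cu118 index targets CUDA 11.8; pick the one that matches your driver)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Quote the version specifiers, otherwise the shell treats ">" as output redirection
pip install "transformers>=4.36.0" "peft>=0.7.0" "bitsandbytes>=0.41.0"
pip install "accelerate>=0.25.0" "datasets>=2.15.0" scipy einops evaluate trl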
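
After installation, a quick check of what actually got installed can save debugging time later. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical, and it reports versions without enforcing the minimums.

```python
from importlib import metadata

# Package names taken from the pip commands above.
PACKAGES = ["torch", "transformers", "peft", "bitsandbytes", "accelerate", "datasets", "trl"]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:<14}{version or 'NOT INSTALLED'}")
```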

4. LoRA Fine-Tuning in Practice

4.1 Loading the Pretrained Model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Count all parameters to confirm which model size was loaded
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params / 1e9:.2f}B")

4.2 Configuring LoRA

from peft import LoraConfig, get_peft_model, TaskType

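lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                # rank of the low-rank matrices
    lora_alpha=32,       # the update is scaled by lora_alpha / r
    lora_dropout=0.05,   # dropout applied on the LoRA branch during training
    # Adapt every attention and MLP projection in each decoder layer
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",         # leave bias terms frozen
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Prints trainable params, all params, and the trainable percentage
# (well under 1% of the model for this configuration)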
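
The trainable-parameter count can be estimated by hand, which is a useful sanity check on what `print_trainable_parameters()` reports. The sketch below assumes the layer dimensions of Qwen2.5-7B as I understand them from its published config (hidden size 3584, MLP width 18944, 4 key/value heads of size 128, 28 layers); treat these numbers as assumptions and verify them against `model.config`.

```python
# Assumed dimensions for Qwen2.5-7B (from its published config, not from this
# article); verify against model.config before relying on them.
HIDDEN = 3584          # hidden_size
INTERMEDIATE = 18944   # intermediate_size (MLP width)
KV_DIM = 4 * 128       # num_key_value_heads * head_dim (grouped-query attention)
NUM_LAYERS = 28

# (in_features, out_features) of each targeted linear layer in one decoder block
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank, shapes=TARGET_SHAPES, num_layers=NUM_LAYERS):
    """Each adapted layer adds A (rank x in_features) and B (out_features x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


for rank in (8, 16, 64):
    print(f"r={rank:<3} trainable LoRA params: {lora_trainable_params(rank):,}")
```

If the figure printed by PEFT differs from this estimate, the assumed dimensions or the list of target modules are the first things to check.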

4.3 Preparing the Training Data

from datasets import Dataset

train_data = [
    {"instruction": "什么是机器学习？", "input": "", "output": "机器学习是人工智能的一个分支，它使计算机系统能够从数据中学习并改进。"},
    {"instruction": "解释一下什么是过拟合？", "input": "", "output": "过拟合是指模型在训练数据上表现很好，但在新数据上表现差的现象。"},
]

dataset = Dataset.from_list(train_data)

def preprocess_function(examples):
    # Build one prompt-plus-answer string per example; the EOS token marks where the answer ends
    texts = []
    for i in range(len(examples["instruction"])):
        text = (
            f"### 指令:\n{examples['instruction'][i]}\n\n"
            f"### 回答:\n{examples['output'][i]}{tokenizer.eos_token}"
        )
        texts.append(text)

    model_inputs = tokenizer(texts, max_length=512, truncation=True, padding="max_length")
    # Labels mirror the inputs, except that padding positions are set to -100 so the loss ignores them
    model_inputs["labels"] = [
        [tok if mask == 1 else -100 for tok, mask in zip(ids, attn)]
        for ids, attn in zip(model_inputs["input_ids"], model_inputs["attention_mask"])
    ]
    return model_inputs

tokenized_dataset = dataset.map(preprocess_function, batched=True, remove_columns=dataset.column_names)
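
A common refinement of the labels built in this step is to compute the loss only on the answer tokens, so the model is not trained to reproduce the prompt. The sketch below shows the idea on toy token ids; the helper `build_example` and the ids are hypothetical, and a real pipeline would obtain them from the tokenizer.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def build_example(prompt_ids, answer_ids, max_length, pad_id):
    """Concatenate prompt and answer, pad to max_length, and mask non-answer labels."""
    input_ids = (prompt_ids + answer_ids)[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + answer_ids)[:max_length]
    attention_mask = [1] * len(input_ids)

    padding = max_length - len(input_ids)
    input_ids = input_ids + [pad_id] * padding
    attention_mask = attention_mask + [0] * padding
    labels = labels + [IGNORE_INDEX] * padding
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Toy token ids (hypothetical; a real tokenizer would produce these).
example = build_example(prompt_ids=[11, 12, 13], answer_ids=[21, 22, 2], max_length=8, pad_id=0)
print(example["labels"])  # [-100, -100, -100, 21, 22, 2, -100, -100]
```

Only the three answer positions carry real labels; the prompt and the padding are both ignored by the loss.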

4.4 Training

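from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./lora_output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size per device: 4 x 4 = 16
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_steps=100,
    gradient_checkpointing=True,     # trades extra compute for lower activation memory
    optim="adamw_torch",
)

# With gradient checkpointing and a frozen base model, the inputs must require
# gradients, otherwise the backward pass has nothing to propagate through
model.enable_input_require_grads()

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    tokenizer=tokenizer,  # newer transformers releases rename this argument to processing_class
)

trainer.train()
# Saves only the LoRA adapter weights, not the full base model
model.save_pretrained("./lora_output")
tokenizer.save_pretrained("./lora_output")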
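
The batch-size settings determine how many optimizer steps the run takes, which in turn decides whether `logging_steps` and `save_steps` ever trigger. The sketch below is an approximation (the Trainer's exact rounding can differ by a step), and the 10,000-example dataset is a hypothetical size for illustration.

```python
import math


def training_schedule(num_examples, per_device_batch_size, grad_accum_steps, num_epochs, num_devices=1):
    """Approximate effective batch size and optimizer-step counts for a training run."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = max(1, math.ceil(num_examples / effective_batch))
    return effective_batch, steps_per_epoch, steps_per_epoch * num_epochs


# Settings from the TrainingArguments above, on a hypothetical 10,000-example dataset
print(training_schedule(10_000, per_device_batch_size=4, grad_accum_steps=4, num_epochs=3))

# The same settings on the two-example demo dataset
print(training_schedule(2, per_device_batch_size=4, grad_accum_steps=4, num_epochs=3))
```

With only two demo examples the run has about three optimizer steps in total, so `logging_steps=10` and `save_steps=100` never fire; on a real dataset they behave as intended.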

4.5 Inference

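import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the frozen base model, then attach the trained LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "./lora_output")
model.eval()
tokenizer = AutoTokenizer.from_pretrained("./lora_output")

def generate_response(instruction):
    # The prompt must use the same template as the training data
    prompt = f"### 指令:\n{instruction}\n\n### 回答:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            do_sample=True,  # temperature and top_p only take effect when sampling
            temperature=0.7,
            top_p=0.9,
        )

    # The decoded text includes the prompt, so keep only what follows the answer marker
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### 回答:")[-1].strip()

result = generate_response("什么是机器学习？")
print(result)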
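
The `temperature` and `top_p` arguments passed to `generate` control how the next token is sampled. The following pure-Python sketch shows the general idea on a toy four-token vocabulary; the function `top_p_sample` and the logits are hypothetical, and library implementations differ in details such as tie handling.

```python
import math
import random


def top_p_sample(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample a token index with temperature scaling and nucleus (top-p) filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample among the kept tokens in proportion to their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


random.seed(0)
logits = [4.0, 3.0, 1.0, -2.0]  # hypothetical scores for a 4-token vocabulary
samples = [top_p_sample(logits) for _ in range(1000)]
print({i: samples.count(i) for i in sorted(set(samples))})
```

A lower temperature sharpens the distribution toward the top token, while a lower `top_p` cuts off more of the unlikely tail; here the two least likely tokens are never sampled.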

5. QLoRA Fine-Tuning in Practice

5.1 Loading the Model with 4-bit Quantization

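import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store the base weights in 4-bit
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat data type
    bnb_4bit_compute_dtype=torch.float16,  # weights are dequantized to this dtype for matmuls
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto"
)

print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")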
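
For comparison with the measured figure, the size of the quantized weights can be estimated by hand. The sketch below assumes the block sizes described in the QLoRA paper (64 weights per block, constants grouped in blocks of 256 for double quantization). It ignores layers that stay in 16-bit, such as the embeddings and output head, so the real reading will be higher; the 7.6B parameter count is an approximate assumption.

```python
def quantized_weight_gb(num_params, bits=4, block_size=64, double_quant=True):
    """Approximate size of block-quantized weights, including quantization constants."""
    if double_quant:
        # One 8-bit constant per block, plus one 32-bit constant per 256 blocks
        constant_bits = 8 / block_size + 32 / (block_size * 256)
    else:
        # One 32-bit constant per block
        constant_bits = 32 / block_size
    return num_params * (bits + constant_bits) / 8 / 1e9


params = 7.6e9  # approximate parameter count, assumed for illustration
print(f"16-bit weights:          {params * 2 / 1e9:.2f} GB")
print(f"4-bit, single constants: {quantized_weight_gb(params, double_quant=False):.2f} GB")
print(f"4-bit, double quantized: {quantized_weight_gb(params, double_quant=True):.2f} GB")
```

Double quantization lowers the overhead of the constants from 0.5 bits to about 0.127 bits per parameter, a small but free saving on top of the 4-bit storage.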

5.2 Configuring QLoRA and Training

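from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import TrainingArguments
from trl import SFTTrainer

# Freeze the quantized base weights and prepare the model for stable k-bit training
model = prepare_model_for_kbit_training(model)

qlora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)

model = get_peft_model(model, qlora_config)

# SFTTrainer expects one text column, so format each record with the template from section 4.3
def to_text(example):
    return {"text": f"### 指令:\n{example['instruction']}\n\n### 回答:\n{example['output']}{tokenizer.eos_token}"}

text_dataset = dataset.map(to_text, remove_columns=dataset.column_names)

training_args = TrainingArguments(
    output_dir="./qlora_output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",  # paged 8-bit optimizer from bitsandbytes
)

# These keyword arguments follow the older TRL API; newer TRL releases move
# dataset_text_field and max_seq_length into SFTConfig and rename tokenizer to processing_class
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=text_dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=512,
)

trainer.train()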
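
One detail worth noticing when comparing the two configurations in this article is the scaling factor `lora_alpha / r` applied to the low-rank update. The short sketch below just evaluates it for both settings; the helper name `lora_scaling` is hypothetical.

```python
def lora_scaling(rank, alpha):
    """Multiplier applied to the low-rank update B @ A (alpha / rank, as in standard LoRA)."""
    return alpha / rank


configs = {"LoRA (section 4.2)": (16, 32), "QLoRA (section 5.2)": (64, 16)}
for name, (r, alpha) in configs.items():
    print(f"{name}: r={r}, alpha={alpha}, scaling={lora_scaling(r, alpha)}")
```

The LoRA configuration scales its update by 2.0 and the QLoRA configuration by 0.25, so the same learning rate may behave differently across the two setups and is worth tuning separately.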