AutoGPTQ Quantization

2025-02-21 badb0y
Category: System Operations

Installation
git clone https://github.com/AutoGPTQ/AutoGPTQ
cd AutoGPTQ
pip install -vvv --no-build-isolation -e .
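
To verify the editable install, you can query the installed package version (a quick sanity check, not part of the original steps; assumes it runs in the same Python environment):

python -c "import importlib.metadata; print(importlib.metadata.version('auto-gptq'))"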

Code:


from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

# Specify paths and hyperparameters for quantization
model_path = "/data/qwen3b/Qwen/Qwen2___5-3B-Instruct/"
quant_path = "/data/qwen3b/Qwen/Qwen2___5-3B-Instruct-4bit-gptq/"
quantize_config = BaseQuantizeConfig(
    bits=4,  # 4 or 8
    group_size=128,
    damp_percent=0.01,
    desc_act=False,  # setting this to False can significantly speed up inference, but perplexity may be slightly worse
    static_groups=False,
    sym=True,
    true_sequential=True,
    model_name_or_path=None,
    model_file_base_name="model"
)
max_len = 8192  # maximum calibration sample length (unused in this minimal example)

# Load your tokenizer and model with AutoGPTQ
# To learn about loading model to multiple GPUs,
# visit https://github.com/AutoGPTQ/AutoGPTQ/blob/main/docs/tutorial/02-Advanced-Model-Loading-and-Best-Practice.md
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config)

# Calibration data; a real run should use more and longer samples
examples = [
    tokenizer(
        "Auto-GPTQ 是一个简单易用的模型量化库,基于 GPTQ 算法,具有用户友好的 API。"
    )
]
# cache_examples_on_gpu=False keeps calibration tensors on CPU, reducing GPU memory pressure
model.quantize(examples, cache_examples_on_gpu=False)
model.save_quantized(quant_path, use_safetensors=True)
tokenizer.save_pretrained(quant_path)
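
Once quantization completes, the saved checkpoint can be loaded back for a quick smoke test. A minimal sketch, assuming a single CUDA device; the prompt text is illustrative:

from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

quant_path = "/data/qwen3b/Qwen/Qwen2___5-3B-Instruct-4bit-gptq/"
tokenizer = AutoTokenizer.from_pretrained(quant_path)
# Load the GPTQ-quantized weights onto the GPU
model = AutoGPTQForCausalLM.from_quantized(quant_path, device="cuda:0")

inputs = tokenizer("Hello, ", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))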




