- MoE-Pruner: Pruning MoE LLMs
- AWQ: Activation-aware Weight Quantization (sketch below)
- Wanda: Pruning by Weights and Activations (sketch below)
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
- SparseGPT: Pruning LLMs in One Shot
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (sketch below)
- Attention Head Pruning: Layer-wise Pruning of Attention Heads to Make LLMs Smaller (sketch below)
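
AWQ's subtitle names its core idea: use activation statistics to decide which weight channels matter most before quantizing. Below is a rough NumPy sketch of that idea under simplifying assumptions: per-channel scales derived from mean activation magnitude with a fixed exponent, and plain round-to-nearest quantization. The function names, the fixed `alpha`, and the quantization grid are illustrative, not the paper's reference implementation, which searches the scaling exponent and folds the scales into the preceding layer so no extra multiply is needed at inference.

```python
import numpy as np

def quantize_rtn(W: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Per-output-row, asymmetric round-to-nearest fake quantization."""
    qmax = 2 ** n_bits - 1
    w_min = W.min(axis=1, keepdims=True)
    w_max = W.max(axis=1, keepdims=True)
    step = np.maximum(w_max - w_min, 1e-8) / qmax
    q = np.clip(np.round((W - w_min) / step), 0, qmax)
    return q * step + w_min                      # dequantized (fake-quant) weights

def awq_style_quantize(W: np.ndarray, X: np.ndarray,
                       alpha: float = 0.5, n_bits: int = 4):
    """Scale salient input channels up before quantization, then scale back.

    Channels that see large activations get larger scales, so they are
    represented with finer effective resolution after quantization.
    W: (out_features, in_features); X: (num_tokens, in_features) calibration data.
    """
    act_mag = np.maximum(np.abs(X).mean(axis=0), 1e-8)   # per-channel saliency
    s = act_mag ** alpha
    s /= np.sqrt(s.max() * s.min())                      # keep scales centered
    W_q = quantize_rtn(W * s[None, :], n_bits) / s[None, :]
    return W_q, s

# Toy check: channels with very uneven activation magnitudes.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(256, 16)) * rng.uniform(0.1, 5.0, size=16)
W_q, _ = awq_style_quantize(W, X)
print("mean |W - W_q|:", np.abs(W - W_q).mean())
```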
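
Wanda's subtitle is essentially the method: score each weight by its magnitude times the norm of the activation it multiplies, and drop the lowest-scoring weights. A minimal sketch for a single linear layer follows, assuming calibration activations have already been collected; the per-output-row comparison group and the 50% sparsity in the toy usage are choices made to keep the example small.

```python
import numpy as np

def wanda_prune(W: np.ndarray, X: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the lowest-scoring weights within each output row.

    Score of each weight = |weight| * L2 norm of its input feature over the
    calibration activations (the weights-times-activations metric).
    W: (out_features, in_features); X: (num_tokens, in_features).
    """
    feature_norms = np.linalg.norm(X, axis=0)          # (in_features,)
    scores = np.abs(W) * feature_norms[None, :]        # (out, in)
    k = int(W.shape[1] * sparsity)                     # weights to drop per row
    if k == 0:
        return W.copy()
    drop = np.argpartition(scores, k, axis=1)[:, :k]   # k smallest per row
    W_pruned = W.copy()
    np.put_along_axis(W_pruned, drop, 0.0, axis=1)
    return W_pruned

# Toy usage: prune 50% of a random layer and report the achieved sparsity.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(128, 16))
print("fraction zeroed:", (wanda_prune(W, X, 0.5) == 0).mean())
```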
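
GPTQ is post-training quantization that goes beyond naive rounding: it quantizes a layer's weight columns one at a time and uses second-order (Hessian) information from calibration activations to adjust the not-yet-quantized columns so the layer's output error stays small. The sketch below is a simplified OBQ-style loop in that spirit, with illustrative names and a fixed per-row grid; the actual algorithm adds Cholesky factorization, blocked and lazy updates, and grouped quantization grids for speed and numerical stability.

```python
import numpy as np

def gptq_style_quantize(W: np.ndarray, X: np.ndarray,
                        n_bits: int = 4, damp: float = 0.01) -> np.ndarray:
    """Column-by-column quantization with inverse-Hessian error compensation.

    Approximately minimizes ||(W - W_q) X^T||^2 one column at a time.
    W: (out_features, in_features); X: (num_tokens, in_features) calibration data.
    """
    out_f, in_f = W.shape
    H = X.T @ X / X.shape[0]                             # proxy layer Hessian
    H += damp * np.mean(np.diag(H)) * np.eye(in_f)       # damping for stability
    Hinv = np.linalg.inv(H)

    W = W.astype(np.float64).copy()
    qmax = 2 ** n_bits - 1
    w_min = W.min(axis=1)                                # fixed per-row grid
    step = np.maximum(W.max(axis=1) - w_min, 1e-8) / qmax

    def quantize(col):
        return np.clip(np.round((col - w_min) / step), 0, qmax) * step + w_min

    for i in range(in_f):
        q = quantize(W[:, i])
        err = (W[:, i] - q) / Hinv[i, i]                 # scaled quantization error
        W[:, i] = q
        if i + 1 < in_f:
            # Shift the error onto the columns that are still unquantized.
            W[:, i + 1:] -= np.outer(err, Hinv[i, i + 1:])
        # Remove column i from the inverse Hessian (Gaussian elimination step).
        Hinv = Hinv - np.outer(Hinv[:, i], Hinv[i, :]) / Hinv[i, i]
    return W

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(256, 16))
W_q = gptq_style_quantize(W, X)
print("layer output error:", np.linalg.norm((W - W_q) @ X.T))
```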
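
The last entry describes removing attention heads layer by layer. A minimal sketch of the mechanics follows, assuming a per-layer importance score per head has already been measured on calibration data (for example, the average norm of each head's output); the scoring rule, the `keep_ratio` parameter, and applying the mask to the output projection are illustrative choices, not necessarily the specific criterion used in the paper.

```python
import numpy as np

def prune_heads_per_layer(head_scores, keep_ratio):
    """Return one boolean keep-mask per layer, keeping the highest-scoring heads.

    head_scores: list of arrays, one per layer, each of shape (num_heads,).
    """
    masks = []
    for scores in head_scores:
        n_keep = max(1, int(round(len(scores) * keep_ratio)))
        keep = np.zeros(len(scores), dtype=bool)
        keep[np.argsort(scores)[-n_keep:]] = True   # keep the top-n_keep heads
        masks.append(keep)
    return masks

def apply_head_mask(W_o, mask, head_dim):
    """Zero the output-projection columns belonging to pruned heads.

    W_o: attention output projection of shape (hidden, num_heads * head_dim).
    """
    W = W_o.copy()
    for h, keep in enumerate(mask):
        if not keep:
            W[:, h * head_dim:(h + 1) * head_dim] = 0.0
    return W

# Toy usage: 2 layers, 8 heads each, keep half the heads in every layer.
rng = np.random.default_rng(0)
scores = [rng.uniform(size=8) for _ in range(2)]
masks = prune_heads_per_layer(scores, keep_ratio=0.5)
W_o = rng.normal(size=(64, 8 * 8))                  # hidden=64, head_dim=8
print(masks[0], (apply_head_mask(W_o, masks[0], head_dim=8) == 0).mean())
```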