🤖 Machine Learning / AI
Advanced
What is quantization in deep learning?
Answer
Quantization reduces the precision of model weights and/or activations from 32-bit floating point (FP32) to lower-precision formats such as FP16, INT8, or INT4. This shrinks the model (about 4× for INT8 vs FP32, since each value takes 8 bits instead of 32) and can speed up inference on hardware with native support for low-precision arithmetic. Post-Training Quantization (PTQ) quantizes an already trained model, using a small calibration dataset to choose the quantization ranges. Quantization-Aware Training (QAT) simulates quantization during training, producing models that are more robust to the precision reduction. The GPTQ method and the bitsandbytes library enable 4-bit quantization of LLMs for deployment on consumer GPUs.
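To make the size and precision trade-off concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in NumPy. It is an illustration only, not how any particular framework implements PTQ: the helper names `quantize_int8` and `dequantize` are hypothetical, the random matrix stands in for real trained weights, and a single scale for the whole tensor is an assumed simplification (real toolkits often use per-channel scales and calibrated ranges).

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization (hypothetical helper).

    Maps FP32 values to INT8 using a single scale so that the largest
    absolute value lands on 127.
    """
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an FP32 approximation of the original values."""
    return q.astype(np.float32) * scale

# Random stand-in for a trained weight matrix (assumption for illustration).
rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)

q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# INT8 stores 1 byte per value instead of 4, hence the 4x size reduction.
size_ratio = weights.nbytes / q_weights.nbytes
# Rounding to the nearest level bounds the error by about half a step (scale / 2).
max_error = float(np.max(np.abs(weights - restored)))

print(f"size ratio: {size_ratio}, scale: {scale:.5f}, max error: {max_error:.5f}")
```

The rounding error per value is bounded by roughly half the scale, which is why outliers hurt: one very large weight inflates the scale and coarsens the grid for every other value. Calibration in PTQ and simulated quantization in QAT are both ways of dealing with that effect.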