Quantization

Similarity Search, Part 2: Product Quantization
Learn a powerful technique to effectively compress large data
29776Murphy ≡ DeepGuide
Similarity Search, Part 3: Blending Inverted File Index and Product Quantization
In the first two parts of this series we have discussed two fundamental algorithms in information retrieval: inverted file index and...
Introduction to Weight Quantization
Reducing the size of Large Language Models with 8-bit quantization
Quantize Llama models with GGUF and llama.cpp
GGML vs. GPTQ vs. NF4
Tensor Quantization: The Untold Story
A close look at the implementation details of quantization in machine learning frameworks
ExLlamaV2: The Fastest Library to Run LLMs
Quantize and run EXL2 models
Reducing the Size of AI Models
This introductory article gives an overview of different approaches to reduce model size. It introduces quantization as the most...
Quantizing the Weights of AI Models
Reducing high-precision floating-point weights to low-precision integer weights
Quantizing Neural Network Models
Understanding post-training quantization, quantization-aware training, and the straight-through estimator
Exploring "Small" Vision-Language Models with TinyGPT-V
TinyGPT-V is a "small" vision-language model that can run on a single GPU.
Quantized Mistral 7B vs TinyLlama for Resource-Constrained Systems
Performance comparison between these models for accuracy and response time in a RAG question-answering setup.
Improving LLM Inference Latency on CPUs with Model Quantization
Discover how to significantly improve inference latency on CPUs using quantization techniques for mixed, int8, and int4 precisions.
Bit-LoRA as an application of BitNet and 1.58-bit neural network technologies
Abstract: applying ~1-bit transformer technology to LoRA adapters allows us to reach comparable performance with full-precision LoRA...
