Similarity Search, Part 2: Product Quantization
Learn a powerful technique to effectively compress large data
Similarity Search, Part 3: Blending Inverted File Index and Product Quantization
In the first two parts of this series we have discussed two fundamental algorithms in information retrieval: inverted file index and...
Introduction to Weight Quantization
Reducing the size of Large Language Models with 8-bit quantization
Quantize Llama models with GGUF and llama.cpp
GGML vs. GPTQ vs. NF4
Tensor Quantization: The Untold Story
A close look at the implementation details of quantization in machine learning frameworks
ExLlamaV2: The Fastest Library to Run LLMs
Quantize and run EXL2 models
Reducing the Size of AI Models
This introductory article gives an overview of different approaches to reduce model size. It introduces quantization as the most...
Quantizing the Weights of AI Models
Reducing high-precision floating-point weights to low-precision integer weights
Quantizing Neural Network Models
Understanding post-training quantization, quantization-aware training, and the straight-through estimator
Exploring "Small" Vision-Language Models with TinyGPT-V
TinyGPT-V is a "small" vision-language model that can run on a single GPU.
Quantized Mistral 7B vs TinyLlama for Resource-Constrained Systems
Performance comparison between these models for accuracy and response time in a RAG question-answering setup.
Improving LLM Inference Latency on CPUs with Model Quantization
Discover how to significantly improve inference latency on CPUs using quantization techniques for mixed, int8, and int4 precisions.
Bit-LoRA as an application of BitNet and 1.58 bit neural network technologies
Abstract: applying ~1bit transformer technology to LoRA adapters allows us to reach comparable performance with full-precision LoRA...
We look at an implementation of the HyperLogLog cardinality estimati
Using clustering algorithms such as K-means is one of the most popul
Level up Your Data Game by Mastering These 4 Skills
Learn how to create an object-oriented approach to compare and evalu
When I was a beginner using Kubernetes, my main concern was getting
Tutorial and theory on how to carry out forecasts with moving averag