- Train your ML models on GPU changing just one line of code. Utilize cuML and ATOM to make your machine learning pipelines blazingly fast.
- Pro GPU System vs Consumer GPU System for Deep Learning. Why you might consider going pro.
- Implement Multi-GPU Training on a Single GPU. An advanced guide for TensorFlow (a minimal sketch of one such setup follows this list).
- Deploying PyTorch Models with Nvidia Triton Inference Server. A flexible, high-performance model serving solution.
- Host Hundreds of NLP Models Utilizing SageMaker Multi-Model Endpoints Backed by GPU Instances. Integrate Triton Inference Server with Amazon SageMaker.
- Matrix Multiplication on GPU. How to achieve state-of-the-art matrix multiplication performance in CUDA.
- Apple M2 Max GPU vs Nvidia V100, P100 and T4. Comparing Apple Silicon M2 Max GPU performance to Nvidia V100, P100, and T4 for training MLP, CNN, and LSTM models with TensorFlow.
- Running a SOTA 7B Parameter Embedding Model on a Single GPU. In this post I explain how to run a state-of-the-art 7B-parameter LLM-based embedding model on just a single 24GB GPU, covering some theory before showing how to run it with the HuggingFace Transformers library in Python in just a few lines of code.
- Unleashing the Power of Triton: Mastering GPU Kernel Optimization in Python. Accelerating AI/ML Model Training with Custom Operators (Part 2); a minimal Triton kernel sketch follows this list.
- Massive Energy for Massive GPUs Empowering AI. GPUs for AI model training and deployment require significant energy; as AI scales, optimizing energy efficiency will be crucial.
- Programming Apple GPUs through Go and Metal Shading Language. Investigating Go, Cgo, Metal Shading Language, Metal Performance Shaders, and benchmarking different approaches to matrix multiplication.
- Metal Programming in Julia. Leveraging the power of macOS GPUs with the Metal.jl framework.
- Fine-Tuning LLMs on a Single Consumer Graphics Card. Lessons from fine-tuning a large language model on a single consumer GPU.
- Apple M2 Max GPU vs Nvidia V100 (Part 2): Big Models and Energy Efficiency. Comparing Apple Silicon M2 Max GPU performance and energy efficiency to Nvidia V100 for training big CNN models with TensorFlow.
- Maximizing the Utility of Scarce AI Resources: A Kubernetes Approach. Optimizing the use of limited AI training accelerators.
- Need for Speed: cuDF Pandas vs. Pandas. A comparative overview (a short cudf.pandas usage sketch follows this list).
- PyTorch Native FP8. Accelerating PyTorch Training Workloads with FP8 (Part 2).
- Profiling CUDA using Nsight Systems: A Numba Example. Learn about profiling by inspecting concurrent and parallel Numba CUDA code in Nsight Systems (a small Numba CUDA example to profile follows this list).
- How Bend Works: A Parallel Programming Language That "Feels Like Python but Scales Like CUDA". A brief introduction to Lambda Calculus, Interaction Combinators, and how they are used to parallelize operations on Bend/HVM.
- The Mystery Behind the PyTorch Automatic Mixed Precision Library. How to get a 2x training speedup using three lines of code (a minimal autocast/GradScaler sketch follows this list).
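For the multi-GPU-training-on-a-single-GPU entry above, one common way to exercise `tf.distribute` code paths without extra hardware is to split a single physical GPU into several logical GPUs. The snippet below is a minimal sketch of that idea using standard TensorFlow 2.x APIs; the 2 GB memory limits and the toy model are illustrative choices, and the article's own walkthrough may differ.

```python
import tensorflow as tf

# Split the single physical GPU into two logical GPUs (2 GB each).
# This must run before TensorFlow initializes the GPU.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=2048),
         tf.config.LogicalDeviceConfiguration(memory_limit=2048)],
    )

# MirroredStrategy now sees two replicas, so multi-GPU training code
# can be developed and debugged on a one-GPU machine.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```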
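The Triton kernel-optimization entry is about writing custom GPU kernels directly in Python. As a reference point, here is a vector-add kernel in the style of Triton's introductory tutorial; the `add_kernel` name and the `BLOCK_SIZE` of 1024 are illustrative choices, not the custom operators discussed in the article itself.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                      # which block this program instance handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                   # one program instance per 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```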
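For the cuDF Pandas comparison, the accelerator mode ships with RAPIDS cuDF (23.10+) and is enabled before pandas is imported; everything else stays ordinary pandas code. A minimal sketch, assuming a CUDA-capable GPU and the synthetic DataFrame below:

```python
# Enable the GPU accelerator before importing pandas; unsupported
# operations fall back to CPU pandas automatically.
import cudf.pandas
cudf.pandas.install()

import pandas as pd

df = pd.DataFrame({
    "key": ["a", "b", "c", "d"] * 250_000,
    "value": range(1_000_000),
})
print(df.groupby("key")["value"].mean())  # runs on the GPU via cuDF when possible
```

In Jupyter, `%load_ext cudf.pandas` achieves the same thing without modifying the script.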
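The Nsight Systems entry profiles Numba CUDA code. A tiny Numba kernel like the one below (the `scale` kernel is just a placeholder workload) is enough to generate a timeline; the article's own examples go further into concurrency and streams.

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale(a, factor):
    i = cuda.grid(1)                 # global thread index
    if i < a.shape[0]:
        a[i] *= factor

a = np.arange(1_000_000, dtype=np.float32)
d_a = cuda.to_device(a)              # host-to-device copy shows up in the nsys timeline

threads_per_block = 256
blocks = (a.shape[0] + threads_per_block - 1) // threads_per_block
scale[blocks, threads_per_block](d_a, 2.0)

result = d_a.copy_to_host()
print(result[:4])
```

Running the script under `nsys profile -o report python script.py` produces a report that can be opened in the Nsight Systems GUI.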
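Finally, the "three lines of code" in the automatic mixed precision entry are most likely the standard `torch.cuda.amp` pattern: wrap the forward pass in `autocast` and route the backward pass through a `GradScaler`. A minimal sketch with a placeholder model and random data:

```python
import torch
from torch import nn

model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()              # scales the loss to avoid FP16 underflow

for _ in range(10):
    x = torch.randn(64, 1024, device="cuda")
    target = torch.randn(64, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():               # forward pass in mixed precision
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()                 # backward on the scaled loss
    scaler.step(optimizer)                        # unscale gradients, then optimizer step
    scaler.update()
```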
