- Train your ML models on GPU changing just one line of code. Utilize cuML and ATOM to make your machine learning pipelines blazingly fast.
- Pro GPU System vs Consumer GPU System for Deep Learning. Why you might consider going pro.
- Implement Multi-GPU Training on a Single GPU. An advanced guide for TensorFlow (a minimal sketch of one such setup follows this list).
- Deploying PyTorch Models with Nvidia Triton Inference Server. A flexible, high-performance model serving solution.
- Host Hundreds of NLP Models Utilizing SageMaker Multi-Model Endpoints Backed by GPU Instances. Integrate Triton Inference Server with Amazon SageMaker.
- Matrix Multiplication on GPU. How to achieve state-of-the-art matrix multiplication performance in CUDA.
- Apple M2 Max GPU vs Nvidia V100, P100 and T4. Comparing Apple Silicon M2 Max GPU performance to Nvidia V100, P100, and T4 for training MLP, CNN, and LSTM models with TensorFlow.
- Running a SOTA 7B Parameter Embedding Model on a Single GPU. In this post I explain how to run a state-of-the-art 7B-parameter LLM-based embedding model on just a single 24GB GPU, covering some theory before showing how to run it with the HuggingFace Transformers library in Python in just a few lines of code.
- Unleashing the Power of Triton: Mastering GPU Kernel Optimization in Python. Accelerating AI/ML Model Training with Custom Operators (Part 2); a minimal Triton kernel sketch follows this list.
- Massive Energy for Massive GPUs Empowering AI. GPUs for AI model training and deployment require significant energy; as AI scales, optimizing energy efficiency will be crucial.
- Programming Apple GPUs through Go and Metal Shading Language. Investigating Go, Cgo, Metal Shading Language, Metal Performance Shaders, and benchmarking different approaches to matrix multiplication.
- Metal Programming in Julia. Leveraging the power of macOS GPUs with the Metal.jl framework.
- Fine-Tuning LLMs on a Single Consumer Graphics Card. Lessons from fine-tuning a large language model on a single consumer GPU.
- Apple M2 Max GPU vs Nvidia V100 (Part 2): Big Models and Energy Efficiency. Comparing Apple Silicon M2 Max GPU performance and energy efficiency to Nvidia V100 for training big CNN models with TensorFlow.
- Maximizing the Utility of Scarce AI Resources: A Kubernetes Approach. Optimizing the use of limited AI training accelerators.
- Need for Speed: cuDF Pandas vs. Pandas. A comparative overview (a short cudf.pandas usage sketch follows this list).
- PyTorch Native FP8. Accelerating PyTorch Training Workloads with FP8 (Part 2).
- Profiling CUDA using Nsight Systems: A Numba Example. Learn about profiling by inspecting concurrent and parallel Numba CUDA code in Nsight Systems (a small Numba CUDA example to profile follows this list).
- How Bend Works: A Parallel Programming Language That "Feels Like Python but Scales Like CUDA". A brief introduction to Lambda Calculus, Interaction Combinators, and how they are used to parallelize operations on Bend/HVM.
- The Mystery Behind the PyTorch Automatic Mixed Precision Library. How to get a 2x training speedup using three lines of code (a minimal autocast/GradScaler sketch follows this list).
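For the multi-GPU-training-on-a-single-GPU entry above, one common way to exercise `tf.distribute` code paths without extra hardware is to split a single physical GPU into several logical GPUs. The snippet below is a minimal sketch of that idea using standard TensorFlow 2.x APIs; the 2 GB memory limits and the toy model are illustrative choices, and the article's own walkthrough may differ.

```python
import tensorflow as tf

# Split the single physical GPU into two logical GPUs (2 GB each).
# This must run before TensorFlow initializes the GPU.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=2048),
         tf.config.LogicalDeviceConfiguration(memory_limit=2048)],
    )

# MirroredStrategy now sees two replicas, so multi-GPU training code
# can be developed and debugged on a one-GPU machine.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```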
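The Triton kernel-optimization entry is about writing custom GPU kernels directly in Python. As a reference point, here is a vector-add kernel in the style of Triton's introductory tutorial; the `add_kernel` name and the `BLOCK_SIZE` of 1024 are illustrative choices, not the custom operators discussed in the article itself.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                      # which block this program instance handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                   # one program instance per 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```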
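For the cuDF Pandas comparison, the accelerator mode ships with RAPIDS cuDF (23.10+) and is enabled before pandas is imported; everything else stays ordinary pandas code. A minimal sketch, assuming a CUDA-capable GPU and the synthetic DataFrame below:

```python
# Enable the GPU accelerator before importing pandas; unsupported
# operations fall back to CPU pandas automatically.
import cudf.pandas
cudf.pandas.install()

import pandas as pd

df = pd.DataFrame({
    "key": ["a", "b", "c", "d"] * 250_000,
    "value": range(1_000_000),
})
print(df.groupby("key")["value"].mean())  # runs on the GPU via cuDF when possible
```

In Jupyter, `%load_ext cudf.pandas` achieves the same thing without modifying the script.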
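The Nsight Systems entry profiles Numba CUDA code. A tiny Numba kernel like the one below (the `scale` kernel is just a placeholder workload) is enough to generate a timeline; the article's own examples go further into concurrency and streams.

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale(a, factor):
    i = cuda.grid(1)                 # global thread index
    if i < a.shape[0]:
        a[i] *= factor

a = np.arange(1_000_000, dtype=np.float32)
d_a = cuda.to_device(a)              # host-to-device copy shows up in the nsys timeline

threads_per_block = 256
blocks = (a.shape[0] + threads_per_block - 1) // threads_per_block
scale[blocks, threads_per_block](d_a, 2.0)

result = d_a.copy_to_host()
print(result[:4])
```

Running the script under `nsys profile -o report python script.py` produces a report that can be opened in the Nsight Systems GUI.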
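Finally, the "three lines of code" in the automatic mixed precision entry are most likely the standard `torch.cuda.amp` pattern: wrap the forward pass in `autocast` and route the backward pass through a `GradScaler`. A minimal sketch with a placeholder model and random data:

```python
import torch
from torch import nn

model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()              # scales the loss to avoid FP16 underflow

for _ in range(10):
    x = torch.randn(64, 1024, device="cuda")
    target = torch.randn(64, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():               # forward pass in mixed precision
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()                 # backward on the scaled loss
    scaler.step(optimizer)                        # unscale gradients, then optimizer step
    scaler.update()
```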
