Tag: GPU optimization

Hardware-Friendly LLM Compression: How to Optimize Large Models for GPUs and CPUs

Tamara Weed, Jan 17, 2026

Learn how LLM compression techniques like quantization and pruning let you run large models on consumer GPUs and CPUs without sacrificing performance. Covers real-world benchmarks, trade-offs, and what to use in 2026.

Multi-GPU Inference Strategies for Large Language Models: Tensor Parallelism 101

Tamara Weed, Dec 17, 2025

Tensor parallelism is the key technique for running large language models across multiple GPUs. Learn how it splits model layers to fit bigger models on smaller hardware, its real-world performance, and how to use it with modern frameworks.
