Tag: GPU optimization
Tamara Weed, Jan 17, 2026
Learn how LLM compression techniques like quantization and pruning let you run large models on consumer GPUs and CPUs without sacrificing performance. Real-world benchmarks, trade-offs, and what to use in 2026.
Tamara Weed, Dec 17, 2025
Tensor parallelism is a key technique for running large language models across multiple GPUs. Learn how it splits the weight matrices within individual layers across devices so bigger models fit on smaller hardware, its real-world performance, and how to use it with modern frameworks.