Tag: LLM optimization

Sparse Attention and Performer Variants: Efficient Transformer Ideas for LLMs

Tamara Weed, Mar 16, 2026

Sparse attention and Performer variants address the quadratic memory cost of standard self-attention, enabling LLMs to process sequences of 100,000 tokens or more. Learn how these efficient architectures work, where they outperform standard models, and how they're being used in healthcare, legal tech, and genomics.

How to Choose Batch Sizes to Minimize Cost per Token in LLM Serving

Tamara Weed, Nov 24, 2025

Learn how to choose batch sizes for LLM serving to cut cost per token by up to 87%. The post covers real-world examples, optimal batch sizes, GPU limits, and proven cost-saving techniques.
