Google just dropped Gemma 4, a new family of open-weight AI models that you can actually run on your own hardware. There are four sizes in the lineup: two beefier ones aimed at developers with serious GPU setups, and two lighter versions built specifically for mobile devices. The bigger variants, a 26B Mixture of Experts and a 31B Dense, are designed to run on an 80GB H100 at full precision, or on consumer GPUs once quantized down. The 2B and 4B models are going after on-device use cases on phones and tablets.
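Those hardware claims are easy to sanity-check with back-of-the-envelope math: weight memory is just parameter count times bytes per parameter. The sketch below ignores KV cache and activation overhead, which add real memory on top, so treat it as a floor rather than a full sizing guide.

```python
def model_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight-only memory in GB: params * bytes per parameter."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB, matching how GPU VRAM is marketed

# 31B dense with 16-bit (bf16) weights: ~62 GB, which fits an 80GB H100
print(model_memory_gb(31, 16))  # 62.0

# The same model quantized to 4-bit: ~15.5 GB, within reach of a 16GB consumer GPU
print(model_memory_gb(31, 4))   # 15.5
```

This is also why quantization is the lever that moves these models onto consumer cards: dropping from 16-bit to 4-bit weights cuts the footprint by 4x before any other tricks.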
The more interesting move here might actually be the licensing switch. Google has ditched its custom Gemma license and moved to Apache 2.0, which is about as developer-friendly as it gets in the open source world. This was a real complaint in the developer community, and Google listened. Apache 2.0 means fewer legal headaches for teams trying to build products around these models.
The technical detail worth paying attention to is how the 26B MoE model works. It activates only 3.8 billion parameters during inference, which means you get much faster response times without sacrificing the depth that a larger model brings. That is a genuinely useful design choice for anyone building latency-sensitive applications. The 31B Dense is slower but more capable, and Google is pushing it as a base for fine-tuning on specific tasks.
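The general mechanism behind that sparse activation can be sketched in a few lines. The details of Gemma's router are not public in this article, so everything below (expert count, top-k routing, the gating softmax) is an assumption illustrating standard Mixture-of-Experts design, not Google's actual implementation: a small router scores all experts per token, and only the top-k expert matrices ever touch the activations.

```python
import numpy as np

def moe_layer(x, expert_weights, router_weights, top_k=2):
    """Toy MoE layer: route one token to its top_k experts (illustrative only).

    x:              (hidden,) one token's activation
    expert_weights: (num_experts, hidden, hidden) one matrix per expert
    router_weights: (hidden, num_experts) gating projection
    """
    logits = x @ router_weights                  # score every expert for this token
    chosen = np.argsort(logits)[-top_k:]         # keep only the top_k experts
    scores = np.exp(logits[chosen])
    gates = scores / scores.sum()                # softmax over the selected experts
    # Only top_k expert matrices are multiplied: active params << total params
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, chosen))

rng = np.random.default_rng(0)
hidden, num_experts = 8, 16
x = rng.standard_normal(hidden)
experts = rng.standard_normal((num_experts, hidden, hidden))
router = rng.standard_normal((hidden, num_experts))
y = moe_layer(x, experts, router)
print(y.shape)  # (8,)
```

With top_k=2 out of 16 experts, only an eighth of the expert weights do work per token, which is the same trade the article describes at scale: 3.8B active parameters out of 26B total, buying lower latency without shrinking the model's overall capacity.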