Gemma 4 VLA Demo on Jetson Orin Nano Super
You speak → Parakeet STT → Gemma 4 → [Webcam if needed] → Kokoro TTS → Speaker

Press SPACE to record, then SPACE again to stop.

This is a simple VLA: the model decides on its own whether to act, based on the context of what you asked. There are no keyword triggers and no hardcoded logic. If your question needs Gemma to open her eyes, she'll decide to take a photo, interpret it, and answer you with that context in mind. She's not describing the picture; she's answering your actual question using what she saw.

And honestly? It's pretty impressive that this runs on a Jetson Orin Nano. :)

Get the code

The full script for this tutorial lives on GitHub, in my Google_Gemma repo next to the Gemma 2 demos:

👉 github.com/asierarranz/Google_Gemma

Grab…