NVIDIA

Nemotron 3 Ultra
Feature Showcase

A focused Gradio app exercising the model's agentic reasoning switches on the live endpoint.
NVIDIA Nemotron 3 Ultra header
REASONING MODES
ON / OFF / Low Effort + Budget

Toggle enable_thinking and low_effort live. Dial reasoning_budget to control thinking tokens.

STREAMING
Reasoning + Tool Calls

Watch green reasoning tokens stream, then orange incremental delta.tool_calls args build before the tool executes.

HARNESS FOCUS
Built for agents

The model is post-trained for long-horizon orchestration. This tiny harness makes the capabilities visible.

Screenshots from the app

Reasoning playground
Reasoning playground — modes, budget, streaming
Tool calling
Streaming tool calls + local execution
Model card
Model card + recommended settings
Intelligence vs speed
Intelligence vs output speed benchmark
Benchmark comparison
Benchmark comparison vs other models
Cost efficiency frontier
Cost efficiency frontier
Run locally

Exercise the live endpoint

git clone https://github.com/cobusgreyling/nemotron-3-ultra-showcase.git
cd nemotron-3-ultra-showcase
pip install -r requirements.txt
export NVIDIA_API_KEY="nvapi-..."
python app.py
# open http://localhost:7860

See the full analysis and build notes in BLOG.md. The same reasoning + tool patterns are productionized in nemotron-think.