
Why generative AI changes everything about product sound.

Everyone has heard what Suno and Udio can do — generate a complete song from a text prompt in seconds. Impressive. But if you're building a product that interacts with people, this is the wrong way to think about generative audio. The real shift isn't infinite content. It's that sound can finally react.

What generative audio is not

The current excitement around generative music is mostly about creation: type a prompt, get a track. This is powerful for content production — background music for videos, placeholder scores, mood boards. But it has almost nothing to do with product sound.

Product sound isn't content. It's behavior. A car's feedback tone isn't a piece of music — it's a real-time response to a system state. A robot's movement sound isn't a composition — it's a continuous expression of what the machine is doing right now. An AI assistant's thinking indicator isn't a jingle — it's a moment of communication that should feel different every time, because the context is different every time.

You can't solve this with a prompt. You can't pre-generate it. You need a system that generates sound in the moment, shaped by what's actually happening.

The scaling wall

Before generative AI, product sound had two approaches. Static files: a library of pre-produced sounds mapped to events. Or rule-based engines: a system that combines and modulates sound elements based on parameters.

Static files are simple but rigid. The notification tone is the same whether it's 3am or 3pm, whether the user is relaxed or stressed, whether it's the first alert today or the fiftieth. One sound for every situation.

Rule-based engines are more flexible — you can map parameters to audio behavior. Weather affects timbre. Time of day affects pitch. User state affects intensity. We've built systems like this, and they work. But they hit a wall: every additional parameter multiplies the number of combinations the rules have to cover. With three parameters, you need dozens of rules. With eight parameters, you need thousands. At some point, the system becomes impossible to author manually.
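
To make the wall concrete, here is a minimal sketch of the rule table a parameter-driven engine has to cover. The parameter names and state values are invented for illustration, not taken from any real system.

    from itertools import product

    # Hypothetical parameters a rule-based sound engine might react to.
    PARAMETERS = {
        "weather": ["clear", "rain", "snow"],
        "time_of_day": ["morning", "day", "night"],
        "user_state": ["relaxed", "neutral", "stressed"],
    }

    # Every combination needs an authored decision: a preset, a modulation curve, ...
    print(len(list(product(*PARAMETERS.values()))))  # 3 x 3 x 3 = 27 rules

    # Add five more parameters with three states each and the table explodes.
    for extra in ["speed_band", "road_surface", "cabin_occupancy", "battery", "traffic"]:
        PARAMETERS[extra] = ["low", "medium", "high"]
    print(len(list(product(*PARAMETERS.values()))))  # 3**8 = 6561 rules

Every one of those thousands of cells needs a deliberate sonic decision, which is exactly where manual authoring stops being feasible.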

See also: Sound carries more information than you think →

This is the scaling wall. And generative AI is the first technology that breaks through it.

What actually changes

A generative model doesn't need explicit rules for every parameter combination. It learns patterns from training data and produces output that is coherent without being predetermined. Feed it the current context — driving dynamics, time of day, weather, user behavior — and it generates a sonic response that makes musical sense, without anyone having authored that specific combination.
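
To illustrate the difference from the product side, here is a minimal sketch of what conditioned generation could look like. The interface is hypothetical; the point is only that the input is structured context, not a text prompt, and the output is audio produced for that specific moment.

    from dataclasses import dataclass, asdict

    @dataclass
    class Context:
        """Structured snapshot of the situation the sound should respond to."""
        speed_kmh: float
        time_of_day: str   # e.g. "early_morning"
        weather: str       # e.g. "rain"
        user_state: str    # e.g. "calm"

    def generate_response(model, context: Context, duration_s: float = 1.5):
        # Hypothetical call: the model conditions on the context and returns
        # freshly generated samples rather than a file picked from a library.
        return model.generate(condition=asdict(context), seconds=duration_s)

No rule anywhere states what a rainy early morning at low speed should sound like; the model has learned how context relates to musical expression and produces a response that fits.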

This is fundamentally different from both static files and rule-based systems. It's not playback. It's not parameter mapping. It's generation — sound that didn't exist before this moment, created for this specific situation.

For product sound, this means:

Every moment can be unique. Not randomly unique — meaningfully unique. The startup sound on a cold Monday morning in rain can feel genuinely different from a warm Friday evening, because the model understands the relationship between context and musical expression.

The system scales. Adding a new parameter — say, cabin occupancy or road surface — doesn't require reauthoring the entire sound library. The model integrates it into its generation process.

Brand identity becomes generative. Instead of a fixed set of brand sounds, the brand has a sonic character that expresses itself differently in every situation but remains recognizably itself. Like a person who always sounds like themselves, even when they're saying something they've never said before.

Why you can't just use Suno

If generative audio is the answer, why not use the tools that already exist? Three reasons.

Real-time. Suno generates a three-minute track in fifteen seconds. That's fast for music production. It's useless for a product that needs to respond in under two seconds. Product sound generation must run at the edge — on the device, with minimal latency, no cloud dependency.

Control. A text prompt gives you approximate control. "Generate something calm and warm" might work for a mood board. It doesn't work for a product where the sound must reflect specific system states, adapt to measurable parameters, and remain consistent across thousands of interactions. Product sound needs precise, parameterized control — not a slot machine. The sketch after these three reasons makes the contrast concrete.

Provenance. The training data behind most generative music models is legally uncertain. Scraped from the internet, with unclear rights, no artist consent, no audit trail. For a product shipping in millions of vehicles, this is an unacceptable risk. The training data must be rights-cleared, traceable, and defensible.
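
As promised above, a small sketch of the contrast between prompt-level and parameter-level control. The parameter names and ranges are invented for illustration.

    from dataclasses import dataclass

    # Prompt-level control: approximate, unrepeatable, hard to audit.
    prompt = "generate something calm and warm"

    # Parameter-level control: every dimension is explicit, bounded, testable.
    @dataclass
    class SoundParameters:
        brightness: float   # 0.0 (dark) .. 1.0 (bright)
        density: float      # 0.0 (sparse) .. 1.0 (busy)
        tempo_bpm: float    # derived from system state, not from taste
        intensity: float    # mapped from, e.g., the urgency of the event

        def validate(self) -> None:
            for name in ("brightness", "density", "intensity"):
                value = getattr(self, name)
                if not 0.0 <= value <= 1.0:
                    raise ValueError(f"{name} out of range: {value}")

    params = SoundParameters(brightness=0.3, density=0.2, tempo_bpm=72.0, intensity=0.4)
    params.validate()  # same parameters in, same sonic character out, every time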

What it actually takes

Building a generative sound system for products requires solving three problems at once:

A semantic layer that translates raw context data — sensor inputs, system states, user behavior — into meaningful sonic parameters. Not "the speed is 80 km/h" but "the driving feel is relaxed highway cruising." This is the intelligence layer, and it's where most of the design work happens. The sketch after this list shows the kind of translation it involves.

A generative model that produces high-quality audio in real time, on embedded hardware, with fine-grained control over the output. Not a cloud API that returns a file, but an engine that runs on the device and generates sound continuously.

A training corpus that is musically distinctive, legally compliant, and built for this purpose — not scraped from the internet but contributed by real musicians whose work is properly licensed and compensated.
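
Here is the promised sketch of the first problem, the semantic layer. The thresholds and labels are invented for illustration, but they show the kind of translation involved: from raw signals to a statement a generative model can act on.

    def describe_driving(speed_kmh: float, steering_variance: float, traffic: str) -> str:
        """Translate raw signals into a semantic label (illustrative thresholds)."""
        if speed_kmh > 70 and steering_variance < 0.1 and traffic == "light":
            return "relaxed_highway_cruising"
        if speed_kmh < 30 and traffic == "dense":
            return "stop_and_go_city"
        return "neutral_driving"

    # "The speed is 80 km/h" becomes "the driving feel is relaxed highway cruising",
    # the kind of statement the generative model turns into sound.
    print(describe_driving(speed_kmh=80, steering_variance=0.05, traffic="light"))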

Each of these is hard. Together, they represent a new kind of infrastructure — not a feature you add to a product, but a platform that changes what product sound can be.

The shift is from tuning sounds to designing meaning.

We're building this infrastructure. CORPUS Reef is a real-time generative sound model trained on rights-cleared data — edge-deployable, controllable, and designed for products that interact with people.

See our technology →    Get in touch →

Read next

Sound carries more information than you think →

Why the best product sound designers come from the wrong background →