Ch. 4 · Notes · § 01 · 2026·05·19

I Used Codex on My Phone to Make a Song on My Mac mini

An AI music experiment that started from a mobile Codex conversation: my Mac mini operated the browser, Suno generated candidates, and Codex downloaded and verified a 4-minute-08-second Chinese rap demo.

TL;DR: What I really built was not “an AI song.” It was a creative loop: mobile conversation, remote execution, Suno generation, file download, and result verification. The human value is not pressing generate. It is theme, judgment, and review.


Today I used Codex from my phone to remotely direct my Mac mini at home and finish a Chinese rap demo in Suno.

The experience felt counterintuitive.

When people talk about AI music, they usually imagine opening a website, typing a prompt, and waiting for a song to appear.

This time was different. I stayed on my phone. I gave the requirements, judged the direction, and pointed out what felt wrong. On the other side, Codex controlled Chrome on my Mac mini, logged into Suno, filled in the lyrics and style prompt, waited for generation, and downloaded the MP3 locally.

Codex is an AI assistant that can read files, edit files, run commands, and operate a browser. It does more than chat: it can finish a task step by step on my computer.

The phone was only the entry point. The Mac mini did the actual work.

In the past, turning an idea into a real song required lyrics, composition, arrangement, tools, downloads, and file management. This time, that chain became much shorter.

Whether an idea can land is not only about inspiration. It depends on whether the execution chain is connected.

§The Phone Gives Direction, the Computer Executes

I was not using remote desktop, and I was not mirroring my Mac to my phone.

More precisely, I continued the same Codex task from the mobile side, while the Codex desktop app on my Mac mini executed the work. The premise was simple: Codex was already open on the Mac, the workspace and permissions were ready, and Chrome was already logged into Suno.

So the phone did the directing: define the request, confirm the direction, and decide which version to download.

The Mac did the execution: read files, write lyrics, operate the browser, wait for generation, and verify the downloaded result.

The phone does not need to do the computer’s job. It only needs to send my judgment to the computer.

§The Hard Part Is Judgment

My first request was vague: I wanted a Chinese rap song with the night-time mood and narrative feeling of Jay Chou’s Nocturne.

The first generation felt stiff. The rhymes were too mechanical, and it did not sound like a real person speaking. I asked for more natural Chinese rap rhymes, but the result still was not good enough.

That step mattered.

The hardest part of AI music is not generation. It is judgment.

Many people run into the same problem with Suno: they can generate 20 songs, but they do not know which one is good, why it is good, or how to improve it.

I later broke the problem down again. A song needs an emotional premise, a hook, lyrical texture, singability, melodic motive, structure, vocal identity, and production taste.

In plain language: who is this song speaking for? Can the listener remember one line? Does the lyric flow when sung? Does the voice feel like a specific person?

So I asked Codex to update a Suno creator skill. A skill is a local instruction set that tells the Agent what standards to use, what to ask, how to write, how to revise, and how to operate the browser for this kind of task.

The point was not that AI wrote a song for me. The point was that I connected judgment, process, and execution into one loop.

§Define the Song Before Opening Suno

The final theme I chose was: everyone thinks I am fine, but only the night knows I am not completely over it.

I first asked the Agent to define the one-sentence theme, then the singer identity: a man between 25 and 35, restrained, stubborn, not self-pitying, like someone recording a late-night voice message that never gets sent.

Then came the hook.

The line I kept was:

Everyone thinks I am fine
Only the night knows

It works because it is simple, singable, and visual. It does not directly say “I am sad.” It separates day and night. Day handles composure. Night handles the truth.

Once the theme, identity, and hook were clear, Suno had a much steadier direction.

The style prompt was Chinese melodic rap, close male vocal, relaxed rap rhythm, piano loop, warm electric piano, restrained verse, slightly opened chorus, 88 BPM, modern clean mix.

I did not ask it to imitate any specific singer. The reference song was only used to extract neutral creative language: night mood, piano, low-end, narrative, melodic rap. No melody copying, no voice imitation.
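
One way to make "define the song first" concrete is to keep the brief as structured data and refuse to build a style prompt until the theme, singer identity, and hook are filled in. This is only a minimal Python sketch: the `SongBrief` class and its field names are hypothetical, not part of Suno or Codex, and the values are the ones from this post.

```python
from dataclasses import dataclass, field


@dataclass
class SongBrief:
    """Hypothetical container for the decisions made before opening Suno."""
    theme: str
    singer_identity: str
    hook: str
    style_tags: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are still empty."""
        required = {
            "theme": self.theme,
            "singer_identity": self.singer_identity,
            "hook": self.hook,
        }
        return [name for name, value in required.items() if not value.strip()]

    def style_prompt(self) -> str:
        """Join the style tags into one comma-separated prompt string."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"brief is incomplete: {', '.join(missing)}")
        return ", ".join(self.style_tags)


brief = SongBrief(
    theme="Everyone thinks I am fine, but only the night knows I am not completely over it.",
    singer_identity="Man between 25 and 35, restrained, stubborn, not self-pitying",
    hook="Everyone thinks I am fine / Only the night knows",
    style_tags=[
        "Chinese melodic rap", "close male vocal", "relaxed rap rhythm",
        "piano loop", "warm electric piano", "restrained verse",
        "slightly opened chorus", "88 BPM", "modern clean mix",
    ],
)
print(brief.style_prompt())
```

Keeping the tags as a list rather than one long string makes it easier to change a single element between rounds.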

§Codex Actually Operated the Browser

After the creative direction was ready, I asked Codex to operate the browser.

It opened the Suno creation page, confirmed the account was logged in, filled in the lyrics, title, and style prompt, selected the model, and clicked generate.

Suno generated four candidates: two full v4.5 versions, running 4 minutes 08 seconds and 4 minutes 17 seconds, and two v5.5 Preview versions, each a one-minute preview that required an upgrade for full generation.

I kept the 4-minute-08-second full version. The reason was simple: it was complete, shorter, and structurally tighter for a demo.
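
The mechanical half of that choice (drop the previews, prefer the shorter full version) is easy to write down. A small Python sketch, where the `Candidate` class and the A to D labels are hypothetical and the 60-second preview length is an approximation of "one-minute"; whether a version is structurally tighter still has to be judged by ear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str        # hypothetical label, not a Suno identifier
    model: str
    duration_s: int
    is_preview: bool


def pick_demo(candidates):
    """Prefer full-length versions, then take the shortest among them."""
    full = [c for c in candidates if not c.is_preview]
    if not full:
        raise ValueError("no full-length candidate to choose from")
    return min(full, key=lambda c: c.duration_s)


candidates = [
    Candidate("A", "v4.5", 4 * 60 + 8, False),
    Candidate("B", "v4.5", 4 * 60 + 17, False),
    Candidate("C", "v5.5 Preview", 60, True),  # roughly one minute
    Candidate("D", "v5.5 Preview", 60, True),  # roughly one minute
]
chosen = pick_demo(candidates)
print(chosen.label, chosen.duration_s)  # A 248
```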

During download, Suno showed a commercial rights reminder: only songs generated on a Pro or Premier plan are suitable for commercial use. I did not bypass that limitation. I only downloaded the personal listening version.

Then I asked Codex to inspect the file. The result showed that the MP3 was valid, about 5.9 MB, and about 248 seconds long, which is 4 minutes 08 seconds.

Only at this point did the loop feel complete. The generate button is not the finish line. The file needs to download, and the result needs to be verified.
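
I do not know exactly which checks Codex ran, so the following is a sketch of the kind of sanity checks the file inspection above implies, not a record of them. It uses only the Python standard library; the function names are hypothetical, and 1 MB is assumed to mean 10^6 bytes.

```python
def looks_like_mp3(header: bytes) -> bool:
    """Cheap sanity check on the first bytes of a file.

    Accepts an ID3v2 tag ("ID3") or an MPEG audio frame sync:
    0xFF followed by a byte whose top three bits are all set.
    """
    if header[:3] == b"ID3":
        return True
    return len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0


def format_duration(seconds: int) -> str:
    """Render a duration in whole seconds as m:ss."""
    minutes, rest = divmod(seconds, 60)
    return f"{minutes}:{rest:02d}"


def average_bitrate_kbps(size_bytes: float, seconds: float) -> float:
    """Average bitrate implied by file size and duration."""
    return size_bytes * 8 / seconds / 1000


print(format_duration(248))                               # 4:08
print(round(average_bitrate_kbps(5.9 * 1000**2, 248)))    # 190
```

A 5.9 MB file lasting 248 seconds works out to roughly 190 kbps, a plausible MP3 bitrate, so the reported size and duration are at least consistent with each other. A header check like `looks_like_mp3` only catches obvious failures, such as an HTML error page saved with an `.mp3` extension; it does not prove the audio decodes.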

§The Problems Were Obvious

The first problem: style alone is not enough.

If the prompt only says Chinese rap, night, and sadness, Suno will likely produce something that sounds like an AI template. The theme, character, hook, and emotional boundary need to be clear before generation.

The second problem: denser rhymes are not always better.

Chinese rap depends on spoken flow. If the line exists only to rhyme, it sounds mechanical. Good rhyme should feel like something a person would naturally say.

The third problem: browser automation cannot be rushed.

Suno generation takes time. The save flow can trigger popups. The page state can change because of account status, credits, or permissions. The Agent has to inspect the page while operating it. It cannot blindly follow a fixed script.

The fourth problem: copyright boundaries need to be explicit.

For personal experimentation, this workflow is fun. For commercial release, you need to check the Suno plan, lyric originality, reference sources, and whether voice imitation is involved. That part cannot be ignored.
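
Going back to the third problem: the usual alternative to a fixed script with hard-coded pauses is to poll the page state until a condition holds or a timeout passes. A minimal Python sketch of that pattern follows; `wait_until` and `page_is_ready` are hypothetical helpers, and the list of states stands in for whatever the Agent actually observes in the browser.

```python
import time


def wait_until(check, timeout_s=300.0, interval_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll `check()` until it returns a truthy value or the timeout passes.

    Returns the truthy value, or raises TimeoutError.
    """
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} s")
        sleep(interval_s)


# Stand-in for "inspect the page": a made-up sequence of observed states.
states = iter(["generating", "generating", "popup", "ready"])


def page_is_ready():
    state = next(states)
    return state if state == "ready" else None


print(wait_until(page_is_ready, timeout_s=10, interval_s=0))  # ready
```

In a real run, a state such as a popup would need its own handling rather than more waiting, which is why the Agent has to look at the page at each step instead of only counting seconds.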

§How I Would Reuse This Workflow

I now see this less as a one-off toy and more as a creative system.

First, write the one-sentence theme. Do not write “a breakup song.” Write a concrete situation, such as: everyone thinks I am fine, but I am only holding together during the day.

Second, define the singer identity. Age, personality, speaking style, and emotional boundary all matter.

Third, write the hook first. The song needs at least one line that can be remembered.

Fourth, write the lyrics and style prompt. Lyrics carry the human voice. The prompt carries the sonic direction.

Fifth, let Suno generate multiple versions, but only change one variable at a time. This round changes lyrics. The next round changes style. Do not reset everything each time.

Sixth, review after download. Keep useful hooks, images, and style prompts. Also write down why a version failed.
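
The fifth step, changing only one variable per round, is the easiest one to break by accident. A small Python sketch of a guard for it, assuming each round is recorded as a plain dictionary; the function names and the "draft 1" and "draft 2" placeholders are hypothetical.

```python
def changed_fields(previous: dict, current: dict) -> list[str]:
    """Names of the settings that differ between two generation rounds."""
    keys = sorted(set(previous) | set(current))
    return [k for k in keys if previous.get(k) != current.get(k)]


def check_single_change(previous: dict, current: dict) -> str:
    """Return the one changed setting, or raise if zero or several changed."""
    changed = changed_fields(previous, current)
    if len(changed) != 1:
        raise ValueError(f"expected exactly one change, got {changed}")
    return changed[0]


round_1 = {"lyrics": "draft 1", "style": "Chinese melodic rap, piano loop, 88 BPM"}
round_2 = {"lyrics": "draft 2", "style": "Chinese melodic rap, piano loop, 88 BPM"}
print(check_single_change(round_1, round_2))  # lyrics
```

Recording the changed field next to the review notes from the sixth step makes it possible to say later which change actually improved a version.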

AI can lower the cost of making things, but it cannot replace the judgment of which version deserves to stay.

That is the most useful part of this experiment. Remote-controlling a Mac from a phone is novel, but the deeper point is that an idea can now start as a phone conversation, pass through a local skill, browser operation, Suno generation, file download, and review, then become a real file.

Many ideas used to stay in my head. Now they can at least become a demo.

If you want to try something similar, do not start by chasing a hit song. Start with one piece that can be reviewed.

Make the theme precise, keep the hook short, define the character clearly, generate 2 to 4 versions, and then ask only one question:

Why would anyone listen to this song again?

Signed: Beijing · 2026·05·19 · git dev