ENGINEERING
Our intern built the first zero-person business
Listen's agent ran a loop: interview users, build, test with real people, fix issues, repeat. 2,000 interviews and 100 concepts later: an app with hundreds of paying customers.

When he kicked off the experiment, our intern didn’t give the agent a product idea. Instead, Veer told it: “You are an autonomous founder-agent and your goal is to create a viral app, using Listen Labs interviews.” We wanted the product to incorporate AI, so we gave it one constraint: build the app on the new image models.
After running 2,000 interviews over 2 weeks, it shipped StyleFits, an AI personal stylist.
The app is simple and a bit janky. But we were excited to see an agent build a product people actually paid for. It didn’t guess what people wanted. Everything was informed by talking to real users.
Full methodology below:
Discovery
The agent first ran a discovery study (n=200) through Listen on the everyday frustrations people would actually pay to fix. It surfaced ~10 recurring pain points, from chores and commuting to scheduling, money, and how people present themselves.
Ideation + Concept Testing
Then, it ran a process similar to breadth-first search. Using subagents, it generated and built a batch of 10 concepts for each pain point, and launched studies in parallel to test the severity of each problem and people’s willingness to pay for each of the 100 concepts.
The findings:
Most daily pain was fragmented across commute, chores, scheduling, and meal planning, and none of it was hair-on-fire friction (“important, but I manage”)
Debt and jobs were painful, but people felt an app couldn’t solve them
Self-presentation (how they look, what to wear) was a recurring pain for 18- to 25-year-olds

Out of 100 concepts, it identified the best fit: an app for self-presentation. This was the top pain it could solve under its constraint to build on image models.
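A rough sketch of that breadth-first fan-out, for readers who want to picture it. Everything here is an assumption for illustration: `generate_concepts` and `run_study` are hypothetical stand-ins (not Listen's API), the scores are deterministic placeholders rather than study data, and ranking by severity times willingness to pay is just one plausible scoring rule.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ConceptResult:
    pain_point: str
    concept: str
    severity: float            # 0-1, how painful respondents rate the problem
    willingness_to_pay: float  # 0-1, share of respondents who would pay


def generate_concepts(pain_point: str, n: int = 10) -> list[str]:
    # Hypothetical stand-in for a subagent that drafts n product concepts.
    return [f"{pain_point} / concept {i + 1}" for i in range(n)]


def run_study(pain_point: str, concept: str) -> ConceptResult:
    # Hypothetical stand-in for launching a study and reading its analysis.
    # The scores are deterministic placeholders, not real results.
    seed = sum(ord(ch) for ch in concept)
    return ConceptResult(pain_point, concept, (seed % 10) / 10, (seed % 7) / 7)


def breadth_first_round(pain_points: list[str]) -> list[ConceptResult]:
    # Expand every pain point one level before going deeper on any of them.
    frontier = [(p, c) for p in pain_points for c in generate_concepts(p)]
    # Run the studies in parallel, one task per concept.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda pc: run_study(*pc), frontier))
    # Assumed scoring rule: severity multiplied by willingness to pay.
    return sorted(
        results,
        key=lambda r: r.severity * r.willingness_to_pay,
        reverse=True,
    )


pain_points = [f"pain point {i + 1}" for i in range(10)]
ranked = breadth_first_round(pain_points)
print(len(ranked), "concepts tested; top candidate:", ranked[0].concept)
```

The point of going wide first is that no single pain point gets extra depth until all 100 concepts have a comparable score.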
Building + validation
It then acted on the insights, building its first live product: LooksMax, an app that scored users’ looks from a photo and suggested color swatches.

Across the first rounds of user testing, NPS was -38. 57% thought they were getting a virtual try-on of new outfits and styles rendered on their own photo. 75% hesitated or refused to pay the $1 fee. 96% saw no credible privacy signals.
And the framing was controversial: 72% found the name “LooksMax” off-putting, and 78% called a quantified attractiveness score actively harmful.
“It feels incredibly unappealing… actually a little bit offensive.”
The key insights: people wanted a real stylist, personalization, and a trust signal.
With this feedback, the agent pivoted the product from a looks score to an outfit, haircut, and color stylist: StyleFits.

The redesign, with updated product messaging and privacy verification, hit 100% comprehension.
“No face matching, no face database… it makes me feel more comfortable using this website.”
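For context on the NPS of -38 from the first rounds of testing: the sketch below assumes the standard Net Promoter Score definition (0-10 ratings, promoters score 9-10, detractors 0-6). The post doesn't say exactly how the agent computed it, and the ratings in the example are made up for illustration.

```python
def net_promoter_score(ratings: list[int]) -> float:
    """Standard NPS: percent promoters (9-10) minus percent detractors (0-6)."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Illustrative ratings only, not the study's data.
ratings = [10, 9, 8, 7, 6, 5, 4, 3]
print(net_promoter_score(ratings))  # 2 promoters, 4 detractors, 8 total -> -25.0
```

A negative score means detractors outnumber promoters, so -38 points to a product more people would warn friends about than recommend.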
Iteration
From here, the agent looped studies to optimize the live product. Across several rounds of user testing, it evaluated usability and willingness to pay, iterated, and retested.
The first version charged $5 for 8 reports, but people anchored their price expectations on free ChatGPT rather than on a human stylist. So the agent adjusted the pricing model: $3 for 8 reports, ~$0.38 each, first one free.
“I could ask ChatGPT to do it for free.”
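The per-report arithmetic behind the price change, as a quick check. This sketch assumes the free first report sits outside the paid bundle of 8; the post doesn't specify.

```python
def price_per_report(bundle_price: float, reports: int) -> float:
    # Price of one report when reports are sold as a fixed bundle.
    return bundle_price / reports


original = price_per_report(5, 8)  # $5 for 8 reports
revised = price_per_report(3, 8)   # $3 for 8 reports
cut = 1 - revised / original       # relative price reduction

print(f"original: ${original:.3f} per report")  # $0.625
print(f"revised:  ${revised:.3f} per report")   # $0.375, i.e. ~$0.38
print(f"price cut: {cut:.0%}")                  # 40%
```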

People wanted to see the output before paying, so it changed the user flow to include a free first report. Interest increased to 88%, and refusals to upload a photo fell by 67%.

Users liked the recommended looks but weren’t willing to pay for that alone. So the agent added the most-requested feature (by 54% of users): shopping links to buy the items directly.
Message testing
In the last week, the agent used Listen to test ad messaging concepts, generated ad creative with an image model, and launched Meta ads via MCP.

The result: 400+ real, paying users.
What’s next
The product is still really janky. The image model distorts faces a bit and the UI is clearly designed by AI. The app made $1,293 in revenue, but the agent spent $2,000 on Meta ads to acquire customers.
StyleFits isn’t a profitable business, but it shows an agent can build something people will pay for by continuously talking to real users.
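A back-of-the-envelope look at those numbers. The figures come from this post; treating "400+" as exactly 400 is an assumption, so the per-user values are upper bounds. Token and infrastructure costs are left out.

```python
revenue = 1293.0        # dollars earned by the app
ad_spend = 2000.0       # dollars spent on Meta ads
paying_users_min = 400  # "400+" users, so 400 is a lower bound (assumption)

net = revenue - ad_spend                          # loss on ads alone
roas = revenue / ad_spend                         # return on ad spend
max_cac = ad_spend / paying_users_min             # acquisition cost per user, at most
max_rev_per_user = revenue / paying_users_min     # revenue per user, at most

print(f"net after ads: ${net:,.0f}")
print(f"return on ad spend: {roas:.2f}")
print(f"acquisition cost per user: at most ${max_cac:.2f}")
print(f"revenue per user: at most ${max_rev_per_user:.2f}")
```

On these assumptions each dollar of ad spend brought back about 65 cents, which is the gap the next rounds of iteration would need to close.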
Agents can build anything, but without knowing what to build, they'll run in circles. Real customer feedback gives compute direction. We believe that is the true advantage.
While there’s still a lot to iterate on, we’re excited to see the start of what’s possible when you combine compute with a constant feedback loop with real users.
We’ll continue stress testing the boundaries of research and report back on what we learn.
Try out the app here: stylefits.ai
We’re building our research lab. If you’re interested in working on the future of AI research, we’d love to hear from you at hiring@listenlabs.ai
Note on the experiment
The agent ran the loop itself and used more than 163.2 million tokens. It launched the studies through Listen, read the analysis, wrote the code, and deployed. Two things it couldn’t do: pass the human identity checks (a selfie and ID) needed to open a Meta ads account, and paste in live Stripe and Supabase keys. When it needed a human, the agent generated tasks and someone on our team executed them.
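One way to picture that handoff is a simple task queue: the agent files a task, a person completes it, and the agent picks up the result. This is a minimal sketch under our own assumptions; `HumanTask` and `HandoffQueue` are hypothetical names, not the mechanism used in the experiment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HumanTask:
    description: str
    done: bool = False
    result: Optional[str] = None


@dataclass
class HandoffQueue:
    tasks: list[HumanTask] = field(default_factory=list)

    def request(self, description: str) -> HumanTask:
        # Called by the agent when it hits something only a person can do.
        task = HumanTask(description)
        self.tasks.append(task)
        return task

    def pending(self) -> list[HumanTask]:
        # Tasks still waiting on a person.
        return [t for t in self.tasks if not t.done]

    def complete(self, task: HumanTask, result: str) -> None:
        # Called by the human operator once the task is finished.
        task.done = True
        task.result = result


queue = HandoffQueue()
ads = queue.request("Pass the identity check to open the ads account")
keys = queue.request("Paste the live payment and database keys")
queue.complete(ads, "account opened")
print([t.description for t in queue.pending()])
```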
One limitation we noticed was that the agent didn’t compound its own research. An early concept study that tested willingness-to-pay hinted that people needed to trust the product before they’d pay. But when the agent built the first version (LooksMax) it didn’t act on that finding. It only added a trust signal after rediscovering that insight several studies (and a lot of tokens) later. A researcher would have carried that finding forward and made sure it was built into the product from the start. Next time, we’d add in more research guardrails, so the agent compounds insights across studies and prioritizes which insights matter most before building.
Written by Diana Lim