Seven Lessons Plantory Taught Us About AI in Production

Length: 7 min

Published: April 22, 2026

Most AI advice in 2026 is still written from the outside looking in: conference talks, Twitter threads, vendor blog posts. Useful, but not the same as operating an AI product with paying customers, a live Meta Ads account, and a cost dashboard you're afraid to open on Mondays.

We've run Plantory.ai, DX Heroes' in-house AI-native SaaS, long enough for demos to stop mattering. Production has taught us seven lessons.

Full context: the Plantory case study, the founder story, and the architectural playbook.

1. Context beats cleverness

Every time the model felt smart in Plantory, it traced back to better grounding, not better prompts.

The Gemini garden advisor doesn't feel useful because of prompt engineering. It feels useful because every call ships with the garden's canvas state, climate zone, soil type, sun exposure, and plant inventory. Strip that context away, and the same model gives generic forum-grade answers.

The practical rule: before you tune a prompt, audit what structured context the model receives. In our experience, that is usually where the useful improvement is.
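A minimal sketch of what that audit could look like, assuming the context travels as one structured object per call. The class, field names, and prompt wording are hypothetical illustrations, not Plantory's actual code; the point is only that empty context fields are easy to detect before anyone touches the prompt.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class GardenContext:
    """Structured grounding sent with every advisor call (hypothetical field names)."""
    climate_zone: str
    soil_type: str
    sun_exposure: str
    plant_inventory: list = field(default_factory=list)
    canvas_state: dict = field(default_factory=dict)


def missing_context(ctx: GardenContext) -> list:
    """Audit step: name every context field that is empty."""
    return [name for name, value in asdict(ctx).items() if not value]


def build_advisor_prompt(question: str, ctx: GardenContext) -> str:
    """Prepend the garden's structured context to the user's question."""
    context_json = json.dumps(asdict(ctx), indent=2, sort_keys=True)
    return (
        "You are a garden advisor. Answer for the garden described below.\n"
        f"GARDEN CONTEXT:\n{context_json}\n\n"
        f"QUESTION: {question}"
    )


# Usage: sun exposure and canvas state were never filled in for this garden.
ctx = GardenContext(climate_zone="7b", soil_type="clay", sun_exposure="",
                    plant_inventory=["tomato", "basil"])
print(missing_context(ctx))
print(build_advisor_prompt("When should I prune?", ctx))
```

Running `missing_context` over a sample of real calls is a cheap first check: if fields are routinely empty, the fix is upstream of the prompt.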

2. AI writes the ad; humans approve the flight

We deploy programmatically to Meta Ads and Google Ads. AI writes creative, generates variants, and sets targeting. That saves real production time.

What's not autonomous is the budget decision. Scaling a flight, reallocating spend across campaigns, and shutting down an underperforming campaign still get a human in the loop. Not because the AI can't technically do them, but because the cost of being wrong is high and the cost of human review is low.

Rule of thumb: automate the production; keep humans on the capital allocation.
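One way to encode that split is an explicit gate in front of the ads API. The sketch below is an assumption about how such a gate might look, not a description of our pipeline: the action names and the `approved_by` field are hypothetical, and no real Meta Ads or Google Ads call is shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionKind(Enum):
    WRITE_CREATIVE = "write_creative"
    GENERATE_VARIANTS = "generate_variants"
    SET_TARGETING = "set_targeting"
    SCALE_FLIGHT = "scale_flight"
    REALLOCATE_SPEND = "reallocate_spend"
    SHUT_DOWN_CAMPAIGN = "shut_down_campaign"


# Capital-allocation actions always wait for a named human approver.
NEEDS_HUMAN = {
    ActionKind.SCALE_FLIGHT,
    ActionKind.REALLOCATE_SPEND,
    ActionKind.SHUT_DOWN_CAMPAIGN,
}


@dataclass
class AdAction:
    kind: ActionKind
    campaign_id: str
    approved_by: Optional[str] = None  # set by a reviewer, never by the model


def can_execute(action: AdAction) -> bool:
    """Production actions run freely; budget actions need an approver on record."""
    if action.kind in NEEDS_HUMAN:
        return action.approved_by is not None
    return True


# Usage: the same budget action is blocked until someone signs off.
scale = AdAction(ActionKind.SCALE_FLIGHT, campaign_id="spring-launch")
print(can_execute(scale))   # blocked, no approver yet
scale.approved_by = "reviewer@example.com"
print(can_execute(scale))   # allowed once approved
```

Keeping the human-only set as data rather than scattered `if` statements makes it easy to review which actions are autonomous and to move one across the line deliberately.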

3. Cheap models first, smart models on demand

It's tempting to route everything through the biggest, smartest model. It's also the fastest way to blow up your unit economics.

In Plantory:

Plant identification from photos → mid-tier multimodal model
Task generation from canvas state → mid-tier with structured output
Freeform garden advice chat → flagship model, streaming
Generating social post copy → cheap fast model
Spec-driven development inside Claude Code → flagship, because that's where quality compounds

The pattern: start cheap, escalate only when output quality demands it, and review the routing quarterly.
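The routing above can be sketched as a small lookup table with a single escalation step. The task keys, the three-tier enum, and the `quality_ok` flag are hypothetical names chosen for illustration; how you decide that quality is insufficient (an eval failure, a user retry, a reviewer flag) is left open.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = 1
    MID = 2
    FLAGSHIP = 3


# Default tier per task, mirroring the list above (task keys are hypothetical).
ROUTES = {
    "plant_identification": Tier.MID,
    "task_generation": Tier.MID,
    "advice_chat": Tier.FLAGSHIP,
    "social_post_copy": Tier.CHEAP,
    "spec_driven_development": Tier.FLAGSHIP,
}


def pick_tier(task: str, quality_ok: bool = True) -> Tier:
    """Start at the task's default tier; step up one tier only when quality fell short."""
    base = ROUTES.get(task, Tier.CHEAP)  # unknown tasks start cheap
    if quality_ok or base is Tier.FLAGSHIP:
        return base
    return Tier(base.value + 1)


# Usage: social copy starts cheap and escalates one step after a failed quality check.
print(pick_tier("social_post_copy"))
print(pick_tier("social_post_copy", quality_ok=False))
```

Because the table is plain data, the quarterly routing review becomes a diff on one dictionary rather than a hunt through call sites.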

4. Evals > vibes

For the first months, we leaned on "does this feel right?" to evaluate changes. That worked until a model update silently shifted behavior, we didn't notice for a week, and users noticed first.

Now every AI endpoint has a tiny eval set: 20 to 50 inputs, each paired with the expected shape of the output. The sets run on every deploy. They're not fancy. They catch the dumb stuff fast.

Small evals on day one beat a perfect eval system that you build on day ninety.
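For a sense of how small "small" can be, here is a sketch of a shape-only eval runner. It assumes the endpoint returns a JSON object as a string and that each case lists the keys and types it expects; `EvalCase`, `check_shape`, `run_evals`, and the fake endpoint are all hypothetical names, not our actual harness.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    prompt: str
    required_keys: Dict[str, type]  # expected output shape: key -> type


def check_shape(raw_output: str, required_keys: Dict[str, type]) -> List[str]:
    """Return a list of problems; an empty list means the shape matches."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    problems = []
    for key, expected_type in required_keys.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"wrong type for {key}")
    return problems


def run_evals(endpoint: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, List[str]]:
    """Call the endpoint for every case and return only the failures, keyed by prompt."""
    failures = {}
    for case in cases:
        problems = check_shape(endpoint(case.prompt), case.required_keys)
        if problems:
            failures[case.prompt] = problems
    return failures


# Usage with a stand-in endpoint; a real run would call the deployed model.
def fake_plant_id(prompt: str) -> str:
    return json.dumps({"species": "Rosa canina", "confidence": 0.91})


cases = [EvalCase("photo-001", {"species": str, "confidence": float})]
failures = run_evals(fake_plant_id, cases)
print(failures or "all shapes OK")
```

A deploy script can fail the build whenever `run_evals` returns a non-empty dictionary. Shape checks will not catch every quality regression, but they flag broken or restructured outputs on the deploy that introduced them.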

5. Media automation is leverage, not decoration

Our Satori + Resvg + Gemini pipeline produces every social post image, every SEO hero, and every article cover across eight locales. At the start, it felt like a nice-to-have. Now it's the difference between shipping one marketing asset per locale per week and shipping dozens.

But here's the honest part: automation exposes workflow debt. Once you can generate assets cheaply, you immediately need a content calendar, a review gate, and a publishing pipeline. Otherwise, you produce volume with no coherence.

Build the human workflow before you ramp the generation. Not after.

6. A plugin beats a prompt

The biggest productivity gain on the build side wasn't a clever prompt. It was building our own Claude Code plugin marketplace.

The plantory plugin ships 20+ skills: /plantory:spec-plan, /plantory:board-work, /plantory:blog-article, /plantory:paid-performance-review, /plantory:social-media-posting, and more. Each one packages a workflow we used to do ad hoc.

Why does this work? Because prompts are volatile. Small wording changes produce different outputs, people forget the shape, and new team members can't reproduce what the old team did. Skills make the workflow reviewable, versioned, and shareable. It's the same reason we write functions instead of pasting the same code in five places.

If you're running AI coding at scale, stop polishing prompts. Start shipping skills.

7. Ship it to paying users or you're bluffing

The hardest and most boring lesson.

Internal demos, test accounts, and friendly beta users all let you tell yourself the product works. Real Stripe customers across eight countries, with different languages, expectations, and devices, do not let you do that.

Every hard truth on this list came from production contact with paying users. The ad pipeline worked beautifully until it didn't. The advisor was "great" until a German user asked about a plant we hadn't localized the recommendations for. The eval set looked good until a silent model drift started degrading plant ID.

The only AI system that's real is one with paying users on it. Everything else is a rehearsal.

Where this leaves us

Plantory.ai isn't a client project. It's our own AI testbed, live in production, taking the hits so we can hand clients a playbook we've tested in the real world.

If you want the story of why we built it: Why We Built Plantory. If you want the architecture: The AI-Native SaaS Playbook. If you want the polished case study: Plantory.ai: the case study.

If you want a team that's done this and wants to help you do it: talk to us.
