The Shopify automations no one builds

Tagging accounts for 14% of the Shopify workflows merchants ship. Returns and refunds, the work that actually scales with order volume, account for under 1%. A field study of 1,374 playbooks, and of the gap between what's easy to automate and what's worth automating.

There is a merchant we'll call Imani. She runs a forty-SKU candle brand out of a converted garage in Brooklyn. By her own count, her Shopify backend has nineteen tag rules: VIP, repeat, fragrance-preference-vanilla, fragrance-preference-citrus, and so on. She built every one of them inside Shopify Flow. They run silently in the background. She has not opened the Flow editor in four months.

What still sits open on her laptop, at 11pm on a Tuesday: an email from a customer named Marcus. The candle arrived broken. He's attached three photos. He'd like a refund, and he'd also like the same candle reshipped, just not packed the same way. Imani writes back. She refunds in Shopify. She creates a new order with a discount code that bypasses inventory deduction, because she's already short on that SKU. She emails her 3PL to ask for extra bubble wrap. She tags the customer broken-shipment-feb so she can find him later.

Forty-five minutes. One customer. Imani has nineteen tag rules and zero return-handling automation.

She is not unusual.

The 14% / 0.8% gap

We pulled telemetry from 1,374 Shopify automation workflows shipped on third-party tools in Q1 2026. Tagging, meaning adding a label to an order or a customer, accounted for 14% of all workflows built. Returns and refunds, combined, were 0.8%. Abandoned-checkout flows were 1.4%. Fraud screening was below 1%.

That asymmetry doesn't track with where the work is. Talk to any Shopify merchant doing more than fifty orders a day and they'll tell you the same thing: returns eat their afternoons. Fraud reviews eat their mornings. Tagging? Tagging happens once, in the background, forever.

So why does everyone build the tag and not the return?

The unbuilt automations aren't unbuilt because no one wants them. They're unbuilt because the tools can't compile them.

Easy doesn't mean valuable

Tagging is one line of prose. "When a customer's lifetime value crosses $500, tag them VIP." The automation is the rule. The work is done.

A return is a conversation. "When a customer requests a refund, ask for a photo, decide whether to require return shipping, issue the refund or a partial, restock if applicable, update the 3PL, decide whether to send a replacement, write a brand-appropriate response, tag the customer accordingly, and surface it in a weekly report if it's the third return from the same buyer." That is not one line. That is twelve lines, three judgement calls, and a tone-of-voice problem.

In a node-based builder, this is roughly impossible. You end up with a Flow chart that has nineteen branches and still hands off to a human four times. So you don't build it. You answer the email instead.

◆ DATA Across the 1,374 workflows we studied, 67% had four or fewer steps. The median workflow length on natural-language tools was eleven steps. The work merchants actually have lives in the gap between those numbers.

Four Shopify automations that pay back faster than tagging

We sat with twelve merchants over two weeks and asked the same question: of everything you do manually, what would you most like to hand off? Four answers came up again and again. None of them were tagging.

1. VIP detection that actually means something

The tag-by-LTV pattern is a blunt instrument. A customer who spent $600 once is not a VIP. A customer who's spent $400 across nine orders, refers two friends a quarter, and writes five-star reviews — that's a VIP. Detecting them takes three signals braided together, not one threshold.

We've watched merchants try to build this in Flow and give up after twelve nodes. The same logic in prose:

# trigger
When an order is paid

# condition
Tag as "VIP" if all are true:
  · customer's order count is > 4
  · their average review score is > 4.5
  · their lifetime value is in the top 20% of the cohort
    that joined the same month they did

# action
Send a Slack ping to the founders' channel
Email a hand-written note offer (template: brand-voice)

Six lines. Compiled. One fashion brand in our cohort saw a 34% lift in repeat purchase rate among customers who received VIP recognition within twenty-four hours of qualifying. The threshold isn't magic; the timeliness is. And timeliness is what manual review costs you.
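For readers who think in code, the three conditions in that playbook can be pictured as a single predicate. This is a rough sketch, not what any compiler actually emits: the `Customer` fields, the `is_vip` function, and the way the top-20% cutoff is computed are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Customer:
    # Hypothetical fields; real values would come from Shopify and a reviews app.
    order_count: int
    avg_review_score: float
    lifetime_value: float


def top_fraction_cutoff(values: Sequence[float], fraction: float) -> float:
    """Smallest value that still sits in the top `fraction` of `values`."""
    if not values:
        raise ValueError("cohort must not be empty")
    ranked = sorted(values, reverse=True)
    # Always keep at least one customer in the top slice.
    keep = max(1, int(len(ranked) * fraction))
    return ranked[keep - 1]


def is_vip(customer: Customer, cohort_ltvs: Sequence[float]) -> bool:
    """True only when all three playbook conditions hold."""
    return (
        customer.order_count > 4
        and customer.avg_review_score > 4.5
        and customer.lifetime_value >= top_fraction_cutoff(cohort_ltvs, 0.20)
    )
```

Here `cohort_ltvs` stands for the lifetime values of everyone who joined in the same month as the customer. Gathering that list is the part a node graph makes painful; the comparison itself is one line.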

2. Refund triage with evidence parsing

The merchant doesn't need the refund to issue itself. They need the email-plus-photos to arrive on a pre-decided path: low-value damage gets auto-approved with a restock skip; high-value damage routes to a human review with the photos already extracted into a shared sheet; a suspected serial returner gets flagged and routed to the founder.

This is one prose paragraph. It is also four years of opt-in scaffolding in any drag-and-drop tool — because the photos need vision, the LTV check needs a Shopify lookup, the tone of the reply needs a brand-voice file, and the routing needs a fork that no one ever wants to draw.

Written as prose, it looks like this:

# trigger
When a refund request email lands in support@

# steps
1. Read the message and attached photos
2. If this is the customer's 3rd return in 60 days →
   flag for review, do not auto-approve
3. Else if damage looks under $50 → approve refund,
   skip restock
4. Else → route to me with photos in Sheets
5. Reply with a brand-appropriate message (warm if
   first return, firm if pattern)

The compiler calls Shopify's refunds API, calls a vision model on the attached image, looks up the customer's return history, checks the brand-voice file, and writes the draft reply — all without the merchant naming any of those services. The merchant described what they wanted; the compiler picked the tools.
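One way to picture the routing is as a small decision function. The sketch below assumes a vision step has already turned the photos into a dollar estimate of the damage, and it checks the return-history rule first so that a pattern is never auto-approved. `RefundRequest`, `triage`, and the field names are hypothetical. The "warm unless flagged" tone rule is a simplification of the playbook's last step.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_APPROVE = "approve refund, skip restock"
    HUMAN_REVIEW = "route to merchant with photos in Sheets"
    FLAG_PATTERN = "flag for review, do not auto-approve"


@dataclass
class RefundRequest:
    estimated_damage_usd: float  # assumed output of a vision step
    returns_last_60_days: int    # prior returns, not counting this request


@dataclass
class Decision:
    route: Route
    tone: str  # "warm" or "firm"


def triage(req: RefundRequest, auto_limit_usd: float = 50.0) -> Decision:
    # Two prior returns means this request would be the third in 60 days.
    if req.returns_last_60_days >= 2:
        return Decision(Route.FLAG_PATTERN, "firm")
    # Low-value damage is approved without waiting for a human.
    if req.estimated_damage_usd < auto_limit_usd:
        return Decision(Route.AUTO_APPROVE, "warm")
    # Everything else goes to the merchant with the evidence attached.
    return Decision(Route.HUMAN_REVIEW, "warm")
```

The function only decides. Issuing the refund, extracting the photos, and drafting the reply would be separate steps that act on the returned `Decision`.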

3. Chargeback evidence assembly

When a dispute lands, the merchant has seventy-two hours to ship a defense packet: order details, IP logs, shipping confirmations, prior conversations, proof of delivery, AVS match, the customer's order history. Manually collected, that's two to three hours per case. ChargePay reports their customers save ten-plus hours a week on this exact task.

The work isn't analytical. It's clerical. But it lives in five systems. Shopify holds the order; Stripe holds the auth log; Gorgias holds the conversation; ShipStation holds the tracking; the 3PL holds the proof-of-delivery scan. A compiled playbook stitches the packet together while the merchant is asleep, drops the PDF into a folder, and posts a one-line summary to Slack with a "submit / hold" button.
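The stitching step can be sketched without touching any real API. In the example below each of the five systems is represented by a stand-in "fetcher" function that the caller supplies. The source names, `assemble_packet`, and `summary_line` are invented for illustration and do not correspond to any vendor's SDK.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A fetcher takes an order ID and returns a record, or None if nothing was found.
Fetcher = Callable[[str], Optional[dict]]

# The five pieces of evidence, keyed by hypothetical source names.
REQUIRED_SOURCES = ["order", "auth_log", "conversation", "tracking", "proof_of_delivery"]


def assemble_packet(order_id: str, fetchers: Dict[str, Fetcher]) -> Tuple[dict, List[str]]:
    """Collect every available record and list the sources that came back empty."""
    packet: dict = {"order_id": order_id}
    missing: List[str] = []
    for name in REQUIRED_SOURCES:
        fetch = fetchers.get(name)
        try:
            record = fetch(order_id) if fetch is not None else None
        except Exception:
            # One failing system should not sink the whole packet.
            record = None
        if record is None:
            missing.append(name)
        else:
            packet[name] = record
    return packet, missing


def summary_line(order_id: str, missing: List[str]) -> str:
    """One-line status a merchant could act on with a submit / hold choice."""
    if missing:
        return f"Dispute {order_id}: hold, missing {', '.join(missing)}"
    return f"Dispute {order_id}: packet complete, ready to submit"
```

Tracking what is missing matters as much as collecting what is present: a packet with no proof-of-delivery scan is one the merchant probably wants to hold rather than submit.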

◆ NOTE The economics here are unusually clean. The average disputed order is $80–$120. The merchant fee for a lost dispute is $15. A compiled response packet has a 65–70% win rate. At those rates, each packet returns roughly $50–$80 in expected recovered revenue (order value multiplied by win rate). Most merchants don't bother, not because the math is wrong, but because the assembly is awful.
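The arithmetic behind the note can be checked in a few lines. This sketch assumes expected recovery is simply order value multiplied by win rate and leaves the $15 fee out of the calculation. The function names are made up for the example.

```python
def expected_recovery(order_value_usd: float, win_rate: float) -> float:
    """Expected revenue recovered per submitted packet."""
    return order_value_usd * win_rate


def per_hour(recovery_usd: float, hours_per_packet: float) -> float:
    """Spread one packet's expected recovery over the time it took to build."""
    return recovery_usd / hours_per_packet


# Figures quoted in the article: $80-$120 orders, 65-70% win rate.
low = expected_recovery(80, 0.65)
high = expected_recovery(120, 0.70)
print(f"per packet: ${low:.0f} to ${high:.0f}")
```

Run as written, this prints `per packet: $52 to $84`. The `per_hour` helper converts that into an hourly figure for whatever assembly time applies to a given merchant.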

4. Abandoned-checkout intent classification

The native Shopify abandoned-cart email goes to everyone. It treats the customer who bounced because their card was declined the same as the customer who bounced because they're price-shopping — and the same as the customer who put a $400 item in the cart at 2am and woke up reconsidering.

An AI compiler can read the cart contents, the time of day, the customer's prior history, and the exit pattern, and send a different message — or no message — accordingly. A declined card gets a "your card didn't go through, here's a one-tap retry" email in six minutes. A price-shopper gets a different treatment at twenty-four hours, with a soft discount only if they've never bought before. A 2am cart gets nothing for ten hours, then a warm "still thinking about this?" — not a discount, because they almost certainly weren't price-sensitive.
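The three treatments described above can be sketched as a single planning function. The delays and the discount rule come from the paragraph. The signal names and the cutoffs used to spot a late-night impulse cart (before 5am, $300 or more) are assumptions chosen for illustration, not anything a real classifier is known to use.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class AbandonedCheckout:
    card_declined: bool
    local_hour: int        # 0-23, customer's local hour at abandonment
    cart_total_usd: float
    prior_orders: int


@dataclass
class FollowUp:
    template: str
    delay: timedelta
    include_discount: bool


def plan_follow_up(c: AbandonedCheckout) -> FollowUp:
    # A declined card is a payment problem, not a persuasion problem.
    if c.card_declined:
        return FollowUp("card_retry", timedelta(minutes=6), False)
    # Assumed heuristic for the late-night, high-value cart.
    if c.local_hour < 5 and c.cart_total_usd >= 300:
        return FollowUp("still_thinking", timedelta(hours=10), False)
    # Default: treat as a price-shopper; discount only for first-time buyers.
    return FollowUp("price_shopper", timedelta(hours=24), c.prior_orders == 0)
```

A fuller version would also allow a "send nothing" outcome, which the paragraph mentions but this sketch leaves out.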

Most merchants run the default flow not because they don't know about this — they read the same blog posts you do — but because the cost of building it exceeded the value. In a node graph, that's an eighteen-branch tree. In prose, it's eleven lines.

The compiler is the unlock

What ties these four workflows together is that they are all conversations, not rules. Each one requires the automation to know something the prose doesn't explicitly state — the appropriate tone for a refund denial, the heuristics for "high-value damage," the difference between a price-shopping bounce and an abandoned-by-distraction bounce.

This is exactly where a natural-language compiler earns its keep. The merchant describes what they want. The compiler works out how it should run: the retry logic, the timeout, the AVS lookup, the brand-voice pass on the reply. The merchant never writes a node, never picks an integration from a menu of four thousand, never draws a line between two rectangles.

The Shopify ecosystem has spent a decade optimizing the easy automations. There are entire SaaS companies whose product is "Shopify Flow but with one more trigger type." Tagging, tagging, tagging. The next decade is about the hard ones: the ones that require an automation to think about a refund instead of just routing it.

Stop building tags

Tagging isn't bad. It's just done. If you're a Shopify merchant looking at your automation backlog and the next item on it is "auto-tag customers who buy from the candle line" — consider skipping it.

The next workflow worth building is the one that's been sitting in your inbox for four months. The one where you wrote "ugh, I should automate this" in the margin of a Notion doc and never came back to it. The one you can describe in two paragraphs but couldn't draw in a node graph if your fundraise depended on it.

That's the one. Build that one.

◆ READING If this resonates, three pieces to follow it up with: our essay on the death of the drag-and-drop builder, the interview with Kira Hartman on consolidating 400 zaps, and the post-mortem on the night our retry logic ate itself.

If you're a Shopify merchant building one of the workflows above — or stuck on one — the inbox is open: field-notes@dugong.live. We're collecting case studies for issue 50.

