ConvoProbe

Your chatbot "works." But does it?

Playwright-style visual testing for multi-turn conversations. No code.

ConvoProbe dashboard showing conversation scenario test results with quality scores

You know what it should do. You do not know what it does.

The gap between the two is how bad answers reach customers.

"Should do" and "does" are two different things.

You know, roughly, what your chatbot is supposed to do. You do not know, precisely, what it actually does. Pretending they are the same is how teams ship chatbots that quietly disappoint users for weeks before anyone notices.

Most chatbot QA today is a vibe.

Someone opens the bot, asks three or four questions, reads the answers, and says "looks fine." That is not testing. That is hoping. And one person checking a handful of flows cannot cover the thousands of conversation paths real users will take.

"Does it work" is the wrong question.

The honest question is "how would I know if it stopped working." If the answer is "a customer would tell us," your chatbot is already worse than you think it is. You just have not measured it yet.

A Quality Gate for Every Release

You wouldn't ship a product without testing it. Why ship conversations without testing them?

Catch It Before Deploy

Define the conversations your chatbot must handle correctly. ConvoProbe runs them automatically and tells you exactly which ones fail — before a single user is affected.

Replace "Looks Good" with Data

Accuracy, consistency, goal achievement — scored and tracked. Stop relying on a PM clicking through 3 conversations and calling it QA.

Every Prompt Change, Tested

Changed a system prompt? Run your full scenario suite in one click. See a diff of what improved and what broke. No surprises in production.
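ConvoProbe itself needs no code, but the idea behind a scored diff is easy to sketch. The Python below is an illustration only, not ConvoProbe's implementation: the scenario names, the scores, the `score_diff` helper, and the 0.05 tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioDelta:
    name: str
    before: float
    after: float

    @property
    def change(self) -> float:
        # Rounded so float noise does not push a score over the tolerance.
        return round(self.after - self.before, 4)


def score_diff(before: dict[str, float], after: dict[str, float], tolerance: float = 0.05):
    """Split scenarios present in both runs into improved and regressed lists."""
    improved, regressed = [], []
    for name in sorted(before.keys() & after.keys()):
        delta = ScenarioDelta(name, before[name], after[name])
        if delta.change > tolerance:
            improved.append(delta)
        elif delta.change < -tolerance:
            regressed.append(delta)
    return improved, regressed


# Hypothetical scores (0.0 to 1.0) before and after a system prompt change.
old_run = {"refund_policy": 0.90, "order_status": 0.60, "store_hours": 0.80}
new_run = {"refund_policy": 0.55, "order_status": 0.85, "store_hours": 0.82}

improved, regressed = score_diff(old_run, new_run)
for d in improved:
    print(f"improved  {d.name}: {d.before:.2f} -> {d.after:.2f}")
for d in regressed:
    print(f"regressed {d.name}: {d.before:.2f} -> {d.after:.2f}")
```

The tolerance keeps small score wobble, such as `store_hours` here, out of both lists so only meaningful movement is flagged.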

Detailed evaluation result showing turn-by-turn conversation analysis

The Cost of Not Testing

1 in 10

Conversations contain errors

Most teams don't know their chatbot's actual failure rate until they measure it

Hours → Minutes

QA cycle per release

What took a team half a day now runs automatically

10 min

To your first test result

Connect your chatbot, generate scenarios, run them. No setup meetings required.

Three steps. Ten minutes.

No SDK. No code in your repo. No integration project.

1

Point it at your Dify app.

Paste the endpoint URL and API key. That is the entire integration.

2

Build scenarios visually.

Drag turns, branch on conditions, set what a good answer looks like. Or import your Dify DSL and let ConvoProbe draft the first pass for you.

3

Run them and read the report.

Every turn is scored. Every failure shows you what the bot said versus what it should have said.
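For the curious, here is roughly what a multi-turn run with per-turn results amounts to. This is a minimal Python sketch under stated assumptions, not how ConvoProbe works internally: `Turn`, `run_scenario`, and `stub_bot` are hypothetical names, the bot is a stand-in function rather than a real Dify app, and a simple keyword check stands in for LLM-as-Judge scoring.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    user_message: str
    must_mention: list[str]  # hypothetical pass criterion: phrases a good answer contains


@dataclass
class TurnResult:
    user_message: str
    bot_reply: str
    missing: list[str]

    @property
    def passed(self) -> bool:
        return not self.missing


def run_scenario(turns: list[Turn], bot: Callable[[list[str], str], str]) -> list[TurnResult]:
    """Play each turn in order, keeping history so later turns see earlier ones."""
    history: list[str] = []
    results = []
    for turn in turns:
        reply = bot(history, turn.user_message)
        history += [turn.user_message, reply]
        # Keyword check used here in place of an LLM judge.
        missing = [p for p in turn.must_mention if p.lower() not in reply.lower()]
        results.append(TurnResult(turn.user_message, reply, missing))
    return results


def stub_bot(history: list[str], message: str) -> str:
    # Stand-in for a real chatbot; it only knows about returns.
    if "return" in message.lower():
        return "You can return items within 30 days."
    return "Sorry, I am not sure about that."


scenario = [
    Turn("Can I return my shoes?", ["30 days"]),
    Turn("Do I need the receipt?", ["receipt"]),
]

results = run_scenario(scenario, stub_bot)
for i, r in enumerate(results, start=1):
    print(f"turn {i}: {'PASS' if r.passed else 'FAIL'}")
    if not r.passed:
        print(f"  bot said: {r.bot_reply}")
        print(f"  expected mention of: {', '.join(r.missing)}")
```

The second turn fails, and the report pairs what the bot said with what was expected, which is the same shape of feedback described in step 3.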

ConvoProbe scenario editor showing multi-turn conversation test design

Where One Wrong Answer Costs You

Customer Support

Your bot tells a customer the wrong return policy. They show up at the store. Now you have a complaint and a broken promise to fix.

E-commerce

Your bot recommends an out-of-stock product or hallucinates a discount code. The customer tries to check out. It doesn't work. They leave.

Internal Operations

Your HR bot gives wrong information about leave policy. An employee makes a decision based on it. Now legal is involved.

Be Honest — This Is Your Current QA

What You Do Today

Chat with your bot 3 times and say "looks fine"

No idea if last week's prompt change helped or hurt

Find out about failures from customer complaints

Spend half a day on QA before every release

With ConvoProbe

Run 100+ conversation scenarios automatically

See a scored diff after every prompt change

Find failures before they reach a single user

Ship the same day you make changes

Pricing

Start with the Free plan. Every feature is included.

Free

$0

Everyone starts here.

What's included

  • Visual multi-turn scenario editor
  • Conditional branching with LLM judgment
  • LLM-as-Judge evaluation with custom criteria
  • Dify-native integration
  • Auto-generate scenarios from Dify DSL
  • Batch evaluation for regression tests
  • Evaluation history and trends
  • Score reasoning for every turn
  • Multi-provider LLM evaluator (OpenAI, Claude, Gemini)

Coming Soon

Pro

Details coming later.

Stop guessing whether your chatbot is good.

Connect your Dify app, run your first scenario, and know the answer in ten minutes.
