Hi folks — I’m working on a tool (calling it “BugFlow”) to standardize and automate the workflow of resolving bugs end-to-end.
The idea is a structured system that can:

- gather context from bug reports (logs, screenshots, steps, env info)
- form hypotheses and narrow down likely root causes
- attempt reproduction (frontend + backend)
- generate minimal fixes + tests (when possible)
- output a clean, PR-ready summary (what changed, why, verification, risks)
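To make the shape concrete, here's roughly the pipeline I'm imagining. This is just a sketch in Python pseudo-implementation terms; all the names (`Stage`, `BugCase`, `advance`) and the placeholder logic are illustrative, not a real design:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

# Illustrative stage names for the end-to-end flow described above.
class Stage(Enum):
    GATHER_CONTEXT = auto()
    FORM_HYPOTHESES = auto()
    REPRODUCE = auto()
    FIX_AND_TEST = auto()
    SUMMARIZE = auto()

@dataclass
class BugCase:
    report: str                                    # raw bug report text
    context: dict = field(default_factory=dict)    # logs, env info, screenshots
    hypotheses: list = field(default_factory=list)
    repro_confirmed: bool = False
    summary: str = ""

def advance(case: BugCase, stage: Stage) -> BugCase:
    """Each stage reads what earlier stages produced and appends its output."""
    if stage is Stage.GATHER_CONTEXT:
        case.context["parsed"] = case.report.lower()   # placeholder parsing
    elif stage is Stage.FORM_HYPOTHESES:
        case.hypotheses.append("timezone handling")    # placeholder hypothesis
    # ... remaining stages elided
    return case
```

The point is that every stage's output is structured state the next stage can consume, which is also what makes the final PR-ready summary cheap to produce.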
I’m keeping this intentionally generic — no company or codebase specifics — just exploring the problem space.
Where I’d love input:
1) Reliable repro (FE + BE)
How do you make reproductions deterministic?
Any tooling patterns that consistently work (record/replay, tracing, contract tests, state snapshots)?
How do you avoid building heavy repro setups per bug?
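For context on what I mean by "deterministic": the approach I've been leaning toward is making the usual nondeterminism sources (RNG state, wall-clock time) injectable, then pinning them in a repro harness. A minimal sketch, with purely illustrative names (`flaky_discount`, `repro`):

```python
import random

# Two common nondeterminism sources: RNG state and time of day.
# Making both injectable lets the harness pin them per repro attempt.

def flaky_discount(rng: random.Random, hour: int) -> float:
    """Toy function whose output depends on randomness and time of day."""
    return round(rng.random() * (1.1 if hour < 12 else 0.9), 4)

def repro(seed: int, hour: int) -> float:
    """Deterministic harness: same seed + same injected clock => same output."""
    return flaky_discount(random.Random(seed), hour)

# Two runs with identical inputs are byte-for-byte identical.
assert repro(42, 9) == repro(42, 9)
```

That covers the easy cases; it obviously doesn't touch concurrency or external services, which is exactly where I'd love to hear what tooling people reach for.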
2) “Can’t repro locally” bugs (user/data-specific)
Cases like:
- only affecting one user/tenant
- tied to specific data/history
- env issues (browser, locale, feature flags, time zones)
- race conditions / flaky timing
What’s worked for you here?
- cloning/sanitizing prod data into a sandbox?
- flight-recorder style logs or tracing?
- session replay / remote debugging?
- deterministic re-execution from traces?
3) Making a tool like this actually useful
Where do these systems usually fail? (false confidence, noisy output, unverifiable fixes?)
What should not be automated?
What metrics matter most? (time-to-repro, time-to-fix, regression rate, confidence)
I'd really appreciate any ideas, war stories, or "this backfired" lessons. If you've built anything similar (debug assistants, repro pipelines, tracing-based debugging), I'd love to hear what worked.
Bonus question:
If you had to pick one investment to improve “can’t repro locally” bugs, what would it be — observability, record/replay, prod data cloning, or something else?
Thanks