Payments Platform Case Study
ANDRES GARCIA
SENIOR PRODUCT MANAGER
Deep-Dive Case Study — Payments Platform | $200B+ Annual Volume
Designing a
High-Scale Payments
System at $200B+.
+17%
Revenue Growth
Completion, not features
-12%
Fraud Reduction
Context-aware scoring
+4%
Approval Rate
Same direction as fraud
99.99%
Platform Uptime
Designed failure behavior
"Payments are trust systems. Clarity and reliability are the product — not the UI."
Payments | Fraud Prevention | Risk Systems | FinTech | 14 slides
Case Study — Executive Summary
Payments Platform: At a Glance
THE PROBLEM
Payment workflows were fragmented across ACH, Wire, Zelle, and mobile deposit. Transaction friction was high, fraud controls were rigid and undifferentiated, and the system had no feedback mechanism to improve over time. Every failure mode — from false positives to timeouts — was handled inconsistently.
THE COMPLEXITY
Three constraints made this non-trivial: every product decision had direct financial consequence, latency was a hard constraint (risk decisions must happen in milliseconds), and systems must behave correctly under partial failure — because in payments, partial failure means lost funds.
MY ROLE
Owned the full payment product lifecycle across ACH, Wire, Zelle, and mobile deposit. Defined completion rate, fraud rate, and approval rate as connected metrics. Partnered with engineering, risk, and operations to ship high-confidence changes. Made three critical product calls against significant pressure.
THE RESULT
+17% revenue, -12% fraud, +4% approval rate, 99.99% uptime across a $200B+ annual payments platform. Driven by improved decision accuracy at the risk layer, reduced friction in high-volume flows, and better handling of failure and retries — not by adding new features.
Outcome Scorecard
+17%
Revenue
-12%
Fraud
+4%
Approval
99.99%
Uptime
Payments are trust systems. Clarity and reliability are the product — not the UI.
What Makes Payments Non-Trivial
This was not a standard product optimization. Three compounding constraints.
1
Every Decision Has Direct Financial Consequence
A wrong decision is not a bug — it is lost money or fraud exposure
In payments, a bad decision has an immediate financial consequence — lost funds, fraud exposure, or customer trust damage. Every product requirement for the risk layer had to answer: "What is the financial consequence if this is wrong?"
2
Latency Is a Hard Product Constraint — Not an Engineering Detail
Better risk models are useless if they slow down transactions
A fraud model that is 5% more accurate but adds 300ms of latency is not a better system — it is a worse one. Model selection, feature engineering, and inference architecture are product decisions, not engineering ones. Latency budgets were defined as explicit product requirements.
3
Systems Must Be Correct Under Partial Failure
Payments do not fail gracefully — partial failure means financial exposure
If funds leave a source account and the destination is not credited, that is not a graceful failure — that is an operational crisis and potential regulatory exposure. Failure behavior was defined as a first-class product requirement, not an engineering edge case.
End-to-End Payment Lifecycle — What I Owned
The full system — not just initiation to execution.
Every stage required explicit product definition: success criteria, failure behavior, and system consistency across all payment types.
User Initiates
ACH/Wire/Zelle
ENTRY
->
Auth + Valid.
Identity+session
HIGH RISK
->
Amount Valid.
Limits+status
RISK
->
Risk Scoring
Real-time fraud
CRITICAL
->
Decision Engine
Approve/Decline
CRITICAL
->
Routing+Exec
ACH/Wire/Zelle
HIGH RISK
->
Settlement
Funds confirmed
OWNED
->
Exception Hdlg
Retry/recon
HIGH RISK
Exception and Post-Execution Layer — Reconciliation | Retry Logic | Idempotency | Operations Alerts | Dispute Handling
RISK SCORING
Context-aware: same transaction from trusted device vs. new location = different risk profile. Required product specs — not ML defaults.
DECISION ENGINE
Approve/decline/review thresholds by transaction type and amount band. Dynamic — not one-size-fits-all rules.
EXCEPTION HANDLING
Timeouts, retries, partial execution, idempotency keys, retry limits, reconciliation logic — written as product requirements.
COMPLETION METRICS
Completion rate = north star. An approved transaction that fails silently in execution is worse than one correctly declined.
Where Most PMs Stop Short
Most PMs write user stories for the happy path.
I defined: what the system does when funds leave the source account and the destination is not credited.
Most PMs set approval rate as the primary success metric.
I defined: Completion rate = initiated AND successfully executed AND settled. Silent failures became visible.
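The completion-rate definition above can be sketched in a few lines. This is an illustrative sketch — the `Txn` fields and sample data are hypothetical — showing how approval rate hides a silent execution failure that completion rate exposes.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    approved: bool  # decision engine said yes
    executed: bool  # funds actually moved
    settled: bool   # destination confirmed credit

def approval_rate(txns):
    return sum(t.approved for t in txns) / len(txns)

def completion_rate(txns):
    # Counts only transactions that were initiated AND executed AND settled.
    return sum(t.approved and t.executed and t.settled for t in txns) / len(txns)

txns = [
    Txn(True, True, True),     # healthy
    Txn(True, False, False),   # approved, then silently failed in execution
    Txn(False, False, False),  # correctly declined
    Txn(True, True, True),     # healthy
]
# approval_rate(txns) -> 0.75, completion_rate(txns) -> 0.5
# The gap between the two numbers is exactly the silent-failure population.
```

The gap metric is the point: a dashboard tracking only approval rate reports 75% "success" here while a quarter of the volume never settled.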
Core Reframe — The Signature Move
Not "is this transaction legitimate?" — What is the risk from this context, right now?
BEFORE — GLOBAL THRESHOLDS
Single fraud threshold applied uniformly. Same risk tolerance for a $500 Zelle to a new payee AND a $50K wire from a 10-year customer. Every threshold change was a blunt instrument with unpredictable cross-type effects.
AFTER — CONTEXT-AWARE RISK SCORING
Risk is not binary. Context drives the decision — device trust, amount tier, customer tenure, payment type, transaction history.
LOW — Previously trusted device + consistent session pattern
Standard approval. Trust reinforced. Zero added friction.
LOW-MED — Amount within established customer band
Expedited processing. No step-up. Transaction history confirms legitimacy.
MEDIUM — First payment to new recipient + high amount
Enhanced verification. Step-up triggered. Friction proportional to risk.
HIGH — Behavioral anomaly + new location + velocity spike
Block or manual review. Fraud team alerted. No money moves.
Result: -12% fraud AND +4% approval simultaneously. These moved in the same direction — not as a tradeoff.
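The four tiers above can be sketched as a scoring function. This is a hypothetical illustration — the signal weights, thresholds, and context fields are invented for the example, not the production model — but it shows the mechanism: the same amount routes to different decisions depending on context.

```python
def risk_score(ctx):
    # Additive signal weights (illustrative values only).
    score = 0.0
    if not ctx["trusted_device"]:
        score += 0.3
    if ctx["new_recipient"]:
        score += 0.2
    if ctx["amount"] > ctx["customer_amount_band"]:
        score += 0.25
    if ctx["velocity_spike"]:
        score += 0.35
    return score

def decide(ctx):
    s = risk_score(ctx)
    if s < 0.2:
        return "approve"            # trusted context: zero added friction
    if s < 0.45:
        return "approve-expedited"  # within established customer band
    if s < 0.7:
        return "step-up"            # friction proportional to risk
    return "review"                 # block or manual review; no money moves

# Same dollar amount, different contexts, different decisions:
trusted = {"trusted_device": True, "new_recipient": False,
           "amount": 500, "customer_amount_band": 1000, "velocity_spike": False}
risky = {"trusted_device": False, "new_recipient": True,
         "amount": 500, "customer_amount_band": 400, "velocity_spike": True}
# decide(trusted) -> "approve"; decide(risky) -> "review"
```

Because friction is applied only where signals stack, tightening the high tier does not touch the trusted tier — which is why fraud and approval can move in the same direction.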
The Impact — Fraud vs. Approval Rate
Context Dimensions in the Scoring Model
Device Trust | Customer Tenure | Amount Tier | Payment Type | Transaction History | Velocity Signals | Location Pattern
The Reframe
When you solve for context, fraud and approval rate are no longer competing objectives — they move in the same direction simultaneously.
Critical Product Decisions I Owned — Not Engineering Calls
Three decisions that defined how the system performed.
Each had a technically simpler alternative and required owning the financial consequence — not just the product outcome.
1
Reduce Fraud Without Increasing Friction
Context-aware over blunt thresholds
PRESSURE: Security team pushed to tighten fraud thresholds globally — uniform risk tolerance across all types and contexts.
Introduced context-aware risk scoring: same threshold logic applied differently based on device trust, transaction history, amount tier, and payment type. Differentiated high-risk from low-risk at signal level, not threshold level.
OUTCOME: -12% fraud rate, +4% approval rate — simultaneously. These moved in the same direction, not as a tradeoff.
2
Prioritize Transaction Completion Over Feature Expansion
Completion quality before surface expansion
PRESSURE: Roadmap pressure to add new payment types, new surfaces, and new features. Engineering capacity was split.
Focused all product investment on improving completion rate, error recovery, and retry logic before expanding the feature surface. No new payment types shipped until existing flows had measurable completion quality gates.
OUTCOME: +17% revenue — not from new features, from existing flows completing reliably at $200B+ volume.
3
Design for Failure Before Designing for Scale
Failure behavior is a product requirement — not an edge case
PRESSURE: Engineering priority was scaling infrastructure. Assumption: failure handling could come later.
Blocked scaling work until failure behavior was explicitly defined: timeout handling, retry idempotency, partial execution recovery, reconciliation logic. Wrote product requirements for how the system handles wrong states before defining how it handles right states.
OUTCOME: 99.99% uptime. Zero duplicate transaction incidents post-implementation. Reconciliation time cut significantly.
Every product decision had a technically simpler alternative. Owning the financial consequence — not just the feature — is what separated these calls.
Production Reality — Where Payments Systems Fail
What breaks — the exact financial impact — and how I addressed it.
CRITICAL
Duplicate Transaction
Uncontrolled retry logic allowed users to re-submit in-flight transactions. System processed both. Funds withdrawn twice with no visibility into the double debit.
Immediate financial loss + irreversible trust damage
Idempotency key requirements enforced. Zero incidents post-implementation.
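The idempotency-key requirement above can be sketched minimally. This is an assumption-laden illustration — an in-memory dict stands in for a durable store, and real systems must also record an in-flight state before executing — but it shows the core guarantee: a re-submitted transaction returns the first outcome instead of debiting twice.

```python
_results = {}  # idempotency key -> first recorded outcome (durable store in practice)

def submit_payment(idempotency_key, amount, execute):
    if idempotency_key in _results:
        # Duplicate submission: return the original result, never re-execute.
        return _results[idempotency_key]
    result = execute(amount)
    _results[idempotency_key] = result
    return result

debits = []
def debit(amount):
    debits.append(amount)  # side effect: funds actually move
    return {"status": "executed", "amount": amount}

r1 = submit_payment("txn-123", 50, debit)
r2 = submit_payment("txn-123", 50, debit)  # user retries the in-flight transaction
# debit ran exactly once; both submissions see the same result
```

The product requirement is the key's scope and lifetime (per logical transaction, surviving retries and restarts), not the lookup itself.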
CRITICAL
Partial Execution
Funds leave source. Timeout prevents destination credit. Transaction in indeterminate state — neither executed nor cleanly failed.
Operational crisis + regulatory exposure + funds in limbo
Explicit partial execution recovery defined as product requirement.
HIGH
False Positive Fraud Block
Rigid global thresholds blocked legitimate high-value transactions. A $50K wire from a 10-year customer treated identically to a new account with no history.
Revenue loss on blocked legitimate transactions + churn risk
-12% fraud AND +4% approval. Context-aware scoring fixed the classification problem.
HIGH
Settlement Timing Mismatch
ACH, Wire, Zelle all have different settlement timelines. System showed uniform processing status regardless of actual state. Users made financial decisions on pending funds.
Support volume spike + financial decisions on incorrect data
Payment-type-specific settlement status messaging. Support volume reduced.
HIGH
Rigid Threshold Cascade
An ACH threshold change inadvertently tightened Zelle and mobile deposit. Each payment type has different behavioral patterns — no one owned the cross-type analysis.
Unintended suppression across unrelated payment types
Payment-type impact analysis required before any risk change shipped.
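The release gate above can be sketched as a simulation step. Everything here is hypothetical — the `decide` rule, scores, and thresholds are invented for illustration — but the shape is the point: replay historical transactions for every payment type and report the per-type approval delta before a threshold change ships.

```python
def impact_analysis(change, history_by_type, decide):
    """Per-payment-type approval-rate delta for a proposed threshold change."""
    report = {}
    for ptype, txns in history_by_type.items():
        before = sum(decide(t, change=None) == "approve" for t in txns) / len(txns)
        after = sum(decide(t, change=change) == "approve" for t in txns) / len(txns)
        report[ptype] = round(after - before, 3)
    return report

def decide(txn, change):
    # Default threshold 0.6; a change overrides it for specific types only.
    threshold = 0.6 if change is None else change.get(txn["type"], 0.6)
    return "approve" if txn["score"] < threshold else "decline"

history = {
    "ACH":   [{"type": "ACH", "score": 0.5}, {"type": "ACH", "score": 0.7}],
    "Zelle": [{"type": "Zelle", "score": 0.55}, {"type": "Zelle", "score": 0.3}],
}
# Proposed change tightens ACH only; the gate confirms Zelle is untouched.
report = impact_analysis({"ACH": 0.4}, history, decide)
# report -> {"ACH": -0.5, "Zelle": 0.0}
```

If a change meant to touch one type shows a nonzero delta elsewhere, the gate catches the cascade before it ships instead of after support volume spikes.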
MEDIUM
Latency-Triggered Abandonment
Risk model inference added 3+ second latency. Users abandoned — not because declined, but because they gave up waiting. Masked as low intent in completion metrics.
Revenue loss from abandoned transactions that would have approved
Latency budgets defined as hard product requirements. Abandonment rate visibly reduced.
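A latency budget as a hard product requirement can be expressed as a model-acceptance check. This sketch is illustrative — the budget value, accuracy numbers, and function name are assumptions — but it encodes the rule from the slide: an over-budget model is rejected regardless of its accuracy gain.

```python
RISK_DECISION_BUDGET_MS = 200  # hypothetical end-to-end decision budget

def accept_model(candidate_accuracy, baseline_accuracy, p99_latency_ms):
    # Budget is a hard constraint: a more accurate model that blows the
    # budget is a worse product, so it fails review before accuracy is weighed.
    if p99_latency_ms > RISK_DECISION_BUDGET_MS:
        return False
    return candidate_accuracy >= baseline_accuracy

accept_model(0.97, 0.95, 180)  # accepted: better and within budget
accept_model(0.97, 0.95, 500)  # rejected: the accuracy gain cannot buy back 300ms
```

Gating on p99 rather than average latency matters: abandonment is driven by the slow tail, which averages hide.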
Tradeoffs I Navigated — Director-Level Signal
Balancing fraud, risk, and customer experience under constraint.
Fraud Reduction vs. Approval Rate
RISK ANSWER
Tighten fraud thresholds globally. Every point of fraud reduction looks like success on the dashboard — regardless of how many legitimate transactions are blocked.
PRODUCT ANSWER
Dynamic thresholds: context-aware risk scoring that reduces fraud on genuinely high-risk transactions while maintaining approval rates on established patterns. Measured both simultaneously.
Completion Rate vs. Approval Rate
RISK ANSWER
Maximize approval rate. Every approved transaction is a win. Optimize the decision engine to approve more.
PRODUCT ANSWER
Completion rate — not approval rate — as primary north star. An approved transaction that fails silently is worse than a correctly declined one. Completion = initiated AND executed AND settled.
Speed vs. Risk Accuracy
RISK ANSWER
More accurate risk models — more signals, more complexity. Every marginal improvement feels like a win on the model card.
PRODUCT ANSWER
Defined latency budgets as hard product constraints. A 2% more accurate model adding 400ms to every transaction is a net negative product outcome.
Global Consistency vs. Payment-Type Optimization
RISK ANSWER
Optimize each payment type independently. Maximum per-type performance with separate risk models.
PRODUCT ANSWER
Consistent system model across all payment types — same framework, same monitoring. Type-specific tuning within a shared framework, not independent systems.
The risk answer optimizes a metric. The product answer optimizes the system.
System Evolution — What Actually Changed
From fragmented to reliable — the specific transformation.
BEFORE — RISK MODEL
Rigid, undifferentiated thresholds. Same fraud rules applied to all transaction types, amounts, and customers. Context-blind.
AFTER — CONTEXT-AWARE SCORING
Device trust, customer tenure, amount tier, payment type — differentiated signals for differentiated decisions.
BEFORE — SUCCESS METRIC
Approval rate was the north star. Silent execution failures went undetected. Approved did not equal completed.
AFTER — COMPLETION RATE AS NORTH STAR
Transaction tracked through execution and settlement. Silent failures caught, measured, and remediated.
BEFORE — FAILURE DESIGN
Retry logic was inconsistent. Timeouts handled differently across payment types. Partial execution equaled crisis.
AFTER — EXPLICIT FAILURE BEHAVIOR
Idempotency keys, retry limits, partial execution recovery — written as product requirements before code was written.
BEFORE — LEARNING
Risk model never updated based on outcomes. Threshold changes were manual, infrequent, cross-type effects unknown.
AFTER — DATA-DRIVEN GOVERNANCE
Completion, fraud, and approval metrics drive roadmap priorities. Threshold changes require impact analysis across all payment types.
BEFORE — NORTH STAR
Success = shipping features. Roadmap measured by new capabilities — not completion quality of existing high-volume flows.
AFTER — RELIABLE COMPLETION
Every roadmap decision tied to completion rate, fraud rate, or approval rate improvement at $200B+ volume.
Measured Impact — What Changed Because of This Work
What the numbers mean — and what drove them.
+17%
Revenue Growth
Not features. Completion.
-12%
Fraud Rate
Context-aware scoring
+4%
Approval Rate
Same direction as fraud
99.99%
Uptime
Failure designed in
All Four Metrics — Pre vs. Post Timeline
The +17% revenue and -12% fraud moved in the same direction. That is only possible when the product decision is context-aware, not blunt.
What Each Number Means
+17% Revenue Growth
Not from new features — from existing flows completing reliably. At $200B+ volume, even fractional completion improvements translate to enormous revenue. This came from visibility into silent failures that approval rate had hidden for months.
-12% Fraud Rate
Context-aware risk scoring reduced fraud on genuinely high-risk transactions without tightening globally. The model learned to differentiate, not just block. Fraud reduction and approval improvement are not competing objectives when the scoring is context-aware.
+4% Approval Rate — Same Direction as Fraud
Dynamic thresholds stopped blocking legitimate transactions from established customers. This is only possible when product decisions are context-aware, not blunt.
Zero Duplicate Transactions — 99.99% Uptime
Idempotency key requirements eliminated duplicate transactions post-implementation. A failure mode accepted as occasional became non-existent. Completion rate tracking caught execution failures that approval rate metrics had hidden.
Where I Changed the Outcome
What would have been different without my specific involvement.
WITHOUT MY DECISION
I defined completion rate before approval rate as the north star
The system would have continued optimizing approval rate. Silent execution failures would have gone undetected. At $200B+ volume, even a small silent failure rate represents significant financial exposure — going unmeasured for months.
WITH MY DECISION
Completion rate tracking caught the full lifecycle. Operations gained visibility into execution failures for the first time. These failures became measurable and therefore fixable. The +17% revenue came from this visibility.
WITHOUT MY DECISION
I blocked scaling work until failure behavior was defined
Infrastructure scaling would have proceeded before idempotency keys, retry limits, and partial execution recovery were defined. Scaling a system with undefined failure behavior scales the failures too — at higher volume, proportionally more incidents.
WITH MY DECISION
Zero duplicate transaction incidents. 99.99% uptime. Failure handling was designed in — not bolted on after the first production incident.
WITHOUT MY DECISION
I held context-aware scoring against global threshold pressure
The security team's global threshold tightening would have been implemented. Fraud rate would have improved, but approval rate would have declined. Revenue impact from blocked legitimate transactions would have partially offset the fraud reduction benefit.
WITH MY DECISION
-12% fraud AND +4% approval simultaneously. Context-aware scoring made these non-competing objectives. The revenue math was dramatically better than the blunt alternative.
WITHOUT MY DECISION
I defined payment-type impact analysis as a release requirement
Threshold changes on one payment type would have continued causing unintended ripple effects. Each fix created new problems. The system was optimized locally without being reasoned about holistically.
WITH MY DECISION
Threshold changes became predictable. Full impact analysis across ACH, Wire, Zelle before any risk change shipped. +4% approval rate improvement across all payment types.
What This Case Demonstrates
Five capabilities — proven at $200B+ annual volume.
1
Operate Payments Systems at Massive Scale ($200B+)
Full lifecycle ownership: initiation to execution to settlement to exception handling. All four payment types. One coherent system model.
Proof: Full lifecycle ownership across ACH, Wire, Zelle, mobile deposit — success criteria, failure behavior, system consistency.
2
Balance Fraud, Risk, and Customer Experience
Made fraud and approval rate move in the same direction — not as a tradeoff. Context-aware risk scoring is a product decision, not a model parameter.
Proof: -12% fraud AND +4% approval simultaneously. The product answer, not the risk answer.
3
Design Systems That Behave Correctly Under Failure
Defined partial execution, timeout handling, idempotency, and retry logic as first-class product requirements. At $200B+ volume, the failure behavior is the product.
Proof: Zero duplicate transaction incidents. 99.99% uptime. Failure designed in — not bolted on.
4
Make High-Stakes Decisions with Direct Financial Impact
Every product call had an immediate financial consequence. Held completion over approval rate, context scoring over global thresholds, failure definition over scaling — against real pressure.
Proof: Three decisions held against significant organizational pressure. All three proved out in production.
5
Align Engineering, Risk, and Operations in Regulated Environments
Payments require cross-functional alignment where risk, compliance, engineering, and operations all have legitimate veto power. The PM role is to hold the system model across all of them.
Proof: One shared system model. Payment-type impact analysis as a mandatory release gate.
Program Impact — All Metrics Visualized
Context-aware scoring: both metrics moving in the same direction.
Fraud Rate vs. Approval Rate — Context-Aware vs. Blunt
Completion Rate Improvement — Quarterly
Risk Decision Distribution — After Context-Aware
Revenue Waterfall — Where the +17% Came From
Payments Case Study — $200B+ Annual Volume
I build systems where money moves reliably —
even when they fail.
+17%
Revenue Growth
From completion quality — not new features.
-12% / +4%
Fraud Down. Approval Up.
Same direction. Context-aware scoring.
99.99%
Uptime + Zero Duplicates
Failure behavior designed before scaling.
Payments product management is not about the flow. It is about designing systems that behave correctly when the flow breaks.
ANDRES GARCIA
SENIOR PRODUCT MANAGER
andres.garcia.product@gmail.com | linkedin.com/in/andygarcia23
Full Portfolio | Thinkorswim Deep-Dive | AI Deep-Dive | TDV Deep-Dive