Payments Platform Case Study
ANDRES GARCIA
SENIOR PRODUCT MANAGER
Deep-Dive Case Study — Payments Platform | $200B+ Annual Volume
Designing a
High-Scale Payments
System at $200B+.
+17%
Revenue Growth
Completion, not features
-12%
Fraud Reduction
Context-aware scoring
+4%
Approval Rate
Same direction as fraud
99.99%
Platform Uptime
Designed failure behavior
"Payments are trust systems. Clarity and reliability are the product — not the UI."
Payments | Fraud Prevention | Risk Systems | FinTech | 14 slides
Case Study — Executive Summary
Payments Platform: At a Glance
THE PROBLEM
Payment workflows were fragmented across ACH, Wire, Zelle, and mobile deposit. Transaction friction was high, fraud controls were rigid and undifferentiated, and the system had no feedback mechanism to improve over time. Every failure mode — from false positives to timeouts — was handled inconsistently.
THE COMPLEXITY
Three constraints made this non-trivial: every product decision had direct financial consequence, latency was a hard constraint (risk decisions must happen in milliseconds), and systems must behave correctly under partial failure — because in payments, partial failure means lost funds.
MY ROLE
Owned the full payment product lifecycle across ACH, Wire, Zelle, and mobile deposit. Defined completion rate, fraud rate, and approval rate as connected metrics. Partnered with engineering, risk, and operations to ship high-confidence changes. Made three critical product calls against significant pressure.
THE RESULT
+17% revenue, -12% fraud, +4% approval rate, 99.99% uptime across a $200B+ annual payments platform. Driven by improved decision accuracy at the risk layer, reduced friction in high-volume flows, and better handling of failure and retries — not by adding new features.
Outcome Scorecard
+17%
Revenue
-12%
Fraud
+4%
Approval
99.99%
Uptime
Payments are trust systems. Clarity and reliability are the product — not the UI.
What Makes Payments Non-Trivial
This was not a standard product optimization. Three compounding constraints.
1
Every Decision Has Direct Financial Consequence
A wrong decision is not a bug — it is lost money or fraud exposure
In payments, a bad decision has an immediate financial consequence — lost funds, fraud exposure, or customer trust damage. Every product requirement for the risk layer had to answer: "What is the financial consequence if this is wrong?"
2
Latency Is a Hard Product Constraint — Not an Engineering Detail
Better risk models are useless if they slow down transactions
A fraud model that is 5% more accurate but adds 300ms of latency is not a better system — it is a worse one. Model selection, feature engineering, and inference architecture are product decisions, not engineering ones. Latency budgets were defined as explicit product requirements.
3
Systems Must Be Correct Under Partial Failure
Payments do not fail gracefully — partial failure means financial exposure
If funds leave a source account and the destination is not credited, that is not a graceful failure — that is an operational crisis and potential regulatory exposure. Failure behavior was defined as a first-class product requirement, not an engineering edge case.
End-to-End Payment Lifecycle — What I Owned
The full system — not just initiation to execution.
Every stage required explicit product definition: success criteria, failure behavior, and system consistency across all payment types.
User Initiates
ACH/Wire/Zelle
ENTRY
->
Auth + Valid.
Identity+session
HIGH RISK
->
Amount Valid.
Limits+status
RISK
->
Risk Scoring
Real-time fraud
CRITICAL
->
Decision Engine
Approve/Decline
CRITICAL
->
Routing+Exec
ACH/Wire/Zelle
HIGH RISK
->
Settlement
Funds confirmed
OWNED
->
Exception Hdlg
Retry/recon
HIGH RISK
Exception and Post-Execution Layer — Reconciliation | Retry Logic | Idempotency | Operations Alerts | Dispute Handling
RISK SCORING
Context-aware: same transaction from trusted device vs. new location = different risk profile. Required product specs — not ML defaults.
DECISION ENGINE
Approve/decline/review thresholds by transaction type and amount band. Dynamic — not one-size-fits-all rules.
EXCEPTION HANDLING
Timeouts, retries, partial execution, idempotency keys, retry limits, reconciliation logic — written as product requirements.
COMPLETION METRICS
Completion rate = north star. An approved transaction that fails silently in execution is worse than one correctly declined.
Where Most PMs Stop Short
Most PMs write user stories for the happy path.
I defined: what the system does when funds leave the source account and the destination is not credited.
Most PMs set approval rate as the primary success metric.
I defined: Completion rate = initiated AND successfully executed AND settled. Silent failures became visible.
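The completion-rate definition above can be sketched in a few lines. This is an illustrative sketch — the `Txn` fields and sample data are hypothetical — showing how approval rate hides a silent execution failure that completion rate exposes.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    approved: bool  # decision engine said yes
    executed: bool  # funds actually moved
    settled: bool   # destination confirmed credit

def approval_rate(txns):
    return sum(t.approved for t in txns) / len(txns)

def completion_rate(txns):
    # Counts only transactions that were initiated AND executed AND settled.
    return sum(t.approved and t.executed and t.settled for t in txns) / len(txns)

txns = [
    Txn(True, True, True),     # healthy
    Txn(True, False, False),   # approved, then silently failed in execution
    Txn(False, False, False),  # correctly declined
    Txn(True, True, True),     # healthy
]
# approval_rate(txns) -> 0.75, completion_rate(txns) -> 0.5
# The gap between the two numbers is exactly the silent-failure population.
```

The gap metric is the point: a dashboard tracking only approval rate reports 75% "success" here while a quarter of the volume never settled.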
Core Reframe — The Signature Move
Not "is this transaction legitimate?" — What is the risk from this context, right now?
BEFORE — GLOBAL THRESHOLDS
Single fraud threshold applied uniformly. Same risk tolerance for a $500 Zelle to a new payee AND a $50K wire from a 10-year customer. Every threshold change was a blunt instrument with unpredictable cross-type effects.
AFTER — CONTEXT-AWARE RISK SCORING
Risk is not binary. Context drives the decision — device trust, amount tier, customer tenure, payment type, transaction history.
LOW — Previously trusted device + consistent session pattern
Standard approval. Trust reinforced. Zero added friction.
LOW-MED — Amount within established customer band
Expedited processing. No step-up. Transaction history confirms legitimacy.
MEDIUM — First payment to new recipient + high amount
Enhanced verification. Step-up triggered. Friction proportional to risk.
HIGH — Behavioral anomaly + new location + velocity spike
Block or manual review. Fraud team alerted. No money moves.
Result: -12% fraud AND +4% approval simultaneously. These moved in the same direction — not as a tradeoff.
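The four tiers above can be sketched as a scoring function. This is a hypothetical illustration — the signal weights, thresholds, and context fields are invented for the example, not the production model — but it shows the mechanism: the same amount routes to different decisions depending on context.

```python
def risk_score(ctx):
    # Additive signal weights (illustrative values only).
    score = 0.0
    if not ctx["trusted_device"]:
        score += 0.3
    if ctx["new_recipient"]:
        score += 0.2
    if ctx["amount"] > ctx["customer_amount_band"]:
        score += 0.25
    if ctx["velocity_spike"]:
        score += 0.35
    return score

def decide(ctx):
    s = risk_score(ctx)
    if s < 0.2:
        return "approve"            # trusted context: zero added friction
    if s < 0.45:
        return "approve-expedited"  # within established customer band
    if s < 0.7:
        return "step-up"            # friction proportional to risk
    return "review"                 # block or manual review; no money moves

# Same dollar amount, different contexts, different decisions:
trusted = {"trusted_device": True, "new_recipient": False,
           "amount": 500, "customer_amount_band": 1000, "velocity_spike": False}
risky = {"trusted_device": False, "new_recipient": True,
         "amount": 500, "customer_amount_band": 400, "velocity_spike": True}
# decide(trusted) -> "approve"; decide(risky) -> "review"
```

Because friction is applied only where signals stack, tightening the high tier does not touch the trusted tier — which is why fraud and approval can move in the same direction.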
The Impact — Fraud vs. Approval Rate
Context Dimensions in the Scoring Model
Device Trust | Customer Tenure | Amount Tier | Payment Type | Transaction History | Velocity Signals | Location Pattern
The Reframe
When you solve for context, fraud and approval rate are no longer competing objectives — they move in the same direction simultaneously.
Critical Product Decisions I Owned — Not Engineering Calls
Three decisions that defined how the system performed.
Each had a technically simpler alternative and required owning the financial consequence — not just the product outcome.
1
Reduce Fraud Without Increasing Friction
Context-aware over blunt thresholds
PRESSURE: Security team pushed to tighten fraud thresholds globally — uniform risk tolerance across all types and contexts.
Introduced context-aware risk scoring: same threshold logic applied differently based on device trust, transaction history, amount tier, and payment type. Differentiated high-risk from low-risk at signal level, not threshold level.
OUTCOME: -12% fraud rate, +4% approval rate — simultaneously. These moved in the same direction, not as a tradeoff.
2
Prioritize Transaction Completion Over Feature Expansion
Completion quality before surface expansion
PRESSURE: Roadmap pressure to add new payment types, new surfaces, and new features. Engineering capacity was split.
Focused all product investment on improving completion rate, error recovery, and retry logic before expanding the feature surface. No new payment types shipped until existing flows had measurable completion quality gates.
OUTCOME: +17% revenue — not from new features, from existing flows completing reliably at $200B+ volume.
3
Design for Failure Before Designing for Scale
Failure behavior is a product requirement — not an edge case
PRESSURE: Engineering priority was scaling infrastructure. Assumption: failure handling could come later.
Blocked scaling work until failure behavior was explicitly defined: timeout handling, retry idempotency, partial execution recovery, reconciliation logic. Wrote product requirements for how the system handles wrong states before defining how it handles right states.
OUTCOME: 99.99% uptime. Zero duplicate transaction incidents post-implementation. Reconciliation time cut significantly.
Every product decision had a technically simpler alternative. Owning the financial consequence — not just the feature — is what separated these calls.
Production Reality — Where Payments Systems Fail
What breaks — the exact financial impact — and how I addressed it.
CRITICAL
Duplicate Transaction
Uncontrolled retry logic allowed users to re-submit in-flight transactions. System processed both. Funds withdrawn twice with no visibility into the double debit.
Immediate financial loss + irreversible trust damage
Idempotency key requirements enforced. Zero incidents post-implementation.
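The idempotency-key requirement above can be sketched minimally. This is an assumption-laden illustration — an in-memory dict stands in for a durable store, and real systems must also record an in-flight state before executing — but it shows the core guarantee: a re-submitted transaction returns the first outcome instead of debiting twice.

```python
_results = {}  # idempotency key -> first recorded outcome (durable store in practice)

def submit_payment(idempotency_key, amount, execute):
    if idempotency_key in _results:
        # Duplicate submission: return the original result, never re-execute.
        return _results[idempotency_key]
    result = execute(amount)
    _results[idempotency_key] = result
    return result

debits = []
def debit(amount):
    debits.append(amount)  # side effect: funds actually move
    return {"status": "executed", "amount": amount}

r1 = submit_payment("txn-123", 50, debit)
r2 = submit_payment("txn-123", 50, debit)  # user retries the in-flight transaction
# debit ran exactly once; both submissions see the same result
```

The product requirement is the key's scope and lifetime (per logical transaction, surviving retries and restarts), not the lookup itself.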
CRITICAL
Partial Execution
Funds leave source. Timeout prevents destination credit. Transaction in indeterminate state — neither executed nor cleanly failed.
Operational crisis + regulatory exposure + funds in limbo
Explicit partial execution recovery defined as product requirement.
HIGH
False Positive Fraud Block
Rigid global thresholds blocked legitimate high-value transactions. A $50K wire from a 10-year customer treated identically to a new account with no history.
Revenue loss on blocked legitimate transactions + churn risk
-12% fraud AND +4% approval. Context-aware scoring fixed the classification problem.
HIGH
Settlement Timing Mismatch
ACH, Wire, Zelle all have different settlement timelines. System showed uniform processing status regardless of actual state. Users made financial decisions on pending funds.
Support volume spike + financial decisions on incorrect data
Payment-type-specific settlement status messaging. Support volume reduced.
HIGH
Rigid Threshold Cascade
An ACH threshold change inadvertently tightened Zelle and mobile deposit. Each payment type has different behavioral patterns — no one owned the cross-type analysis.
Unintended suppression across unrelated payment types
Payment-type impact analysis required before any risk change shipped.
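The release gate above can be sketched as a simulation step. Everything here is hypothetical — the `decide` rule, scores, and thresholds are invented for illustration — but the shape is the point: replay historical transactions for every payment type and report the per-type approval delta before a threshold change ships.

```python
def impact_analysis(change, history_by_type, decide):
    """Per-payment-type approval-rate delta for a proposed threshold change."""
    report = {}
    for ptype, txns in history_by_type.items():
        before = sum(decide(t, change=None) == "approve" for t in txns) / len(txns)
        after = sum(decide(t, change=change) == "approve" for t in txns) / len(txns)
        report[ptype] = round(after - before, 3)
    return report

def decide(txn, change):
    # Default threshold 0.6; a change overrides it for specific types only.
    threshold = 0.6 if change is None else change.get(txn["type"], 0.6)
    return "approve" if txn["score"] < threshold else "decline"

history = {
    "ACH":   [{"type": "ACH", "score": 0.5}, {"type": "ACH", "score": 0.7}],
    "Zelle": [{"type": "Zelle", "score": 0.55}, {"type": "Zelle", "score": 0.3}],
}
# Proposed change tightens ACH only; the gate confirms Zelle is untouched.
report = impact_analysis({"ACH": 0.4}, history, decide)
# report -> {"ACH": -0.5, "Zelle": 0.0}
```

If a change meant to touch one type shows a nonzero delta elsewhere, the gate catches the cascade before it ships instead of after support volume spikes.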
MEDIUM
Latency-Triggered Abandonment
Risk model inference added 3+ second latency. Users abandoned — not because declined, but because they gave up waiting. Masked as low intent in completion metrics.
Revenue loss from abandoned transactions that would have approved
Latency budgets defined as hard product requirements. Abandonment rate visibly reduced.
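A latency budget as a hard product requirement can be expressed as a model-acceptance check. This sketch is illustrative — the budget value, accuracy numbers, and function name are assumptions — but it encodes the rule from the slide: an over-budget model is rejected regardless of its accuracy gain.

```python
RISK_DECISION_BUDGET_MS = 200  # hypothetical end-to-end decision budget

def accept_model(candidate_accuracy, baseline_accuracy, p99_latency_ms):
    # Budget is a hard constraint: a more accurate model that blows the
    # budget is a worse product, so it fails review before accuracy is weighed.
    if p99_latency_ms > RISK_DECISION_BUDGET_MS:
        return False
    return candidate_accuracy >= baseline_accuracy

accept_model(0.97, 0.95, 180)  # accepted: better and within budget
accept_model(0.97, 0.95, 500)  # rejected: the accuracy gain cannot buy back 300ms
```

Gating on p99 rather than average latency matters: abandonment is driven by the slow tail, which averages hide.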
Tradeoffs I Navigated — Director-Level Signal
Balancing fraud, risk, and customer experience under constraint.
Fraud Reduction vs. Approval Rate
RISK ANSWER
Tighten fraud thresholds globally. Every point of fraud reduction looks like success on the dashboard — regardless of how many legitimate transactions are blocked.
PRODUCT ANSWER
Dynamic thresholds: context-aware risk scoring that reduces fraud on genuinely high-risk transactions while maintaining approval rates on established patterns. Measured both simultaneously.
Completion Rate vs. Approval Rate
RISK ANSWER
Maximize approval rate. Every approved transaction is a win. Optimize the decision engine to approve more.
PRODUCT ANSWER
Completion rate — not approval rate — as primary north star. An approved transaction that fails silently is worse than a correctly declined one. Completion = initiated AND executed AND settled.
Speed vs. Risk Accuracy
RISK ANSWER
More accurate risk models — more signals, more complexity. Every marginal improvement feels like a win on the model card.
PRODUCT ANSWER
Defined latency budgets as hard product constraints. A 2% more accurate model adding 400ms to every transaction is a net negative product outcome.
Global Consistency vs. Payment-Type Optimization
RISK ANSWER
Optimize each payment type independently. Maximum per-type performance with separate risk models.
PRODUCT ANSWER
Consistent system model across all payment types — same framework, same monitoring. Type-specific tuning within a shared framework, not independent systems.
The risk answer optimizes a metric. The product answer optimizes the system.
System Evolution — What Actually Changed
From fragmented to reliable — the specific transformation.
BEFORE — RISK MODEL
Rigid, undifferentiated thresholds. Same fraud rules applied to all transaction types, amounts, and customers. Context-blind.
AFTER — CONTEXT-AWARE SCORING
Device trust, customer tenure, amount tier, payment type — differentiated signals for differentiated decisions.
BEFORE — SUCCESS METRIC
Approval rate was the north star. Silent execution failures went undetected. Approved did not equal completed.
AFTER — COMPLETION RATE AS NORTH STAR
Transaction tracked through execution and settlement. Silent failures caught, measured, and remediated.
BEFORE — FAILURE DESIGN
Retry logic was inconsistent. Timeouts handled differently across payment types. Partial execution equaled crisis.
AFTER — EXPLICIT FAILURE BEHAVIOR
Idempotency keys, retry limits, partial execution recovery — written as product requirements before code was written.
BEFORE — LEARNING
Risk model never updated based on outcomes. Threshold changes were manual, infrequent, cross-type effects unknown.
AFTER — DATA-DRIVEN GOVERNANCE
Completion, fraud, and approval metrics drive roadmap priorities. Threshold changes require impact analysis across all payment types.
BEFORE — NORTH STAR
Success = shipping features. Roadmap measured by new capabilities — not completion quality of existing high-volume flows.
AFTER — RELIABLE COMPLETION
Every roadmap decision tied to completion rate, fraud rate, or approval rate improvement at $200B+ volume.
Measured Impact — What Changed Because of This Work
What the numbers mean — and what drove them.
+17%
Revenue Growth
Not features. Completion.
-12%
Fraud Rate
Context-aware scoring
+4%
Approval Rate
Same direction as fraud
99.99%
Uptime
Failure designed in
All Four Metrics — Pre vs. Post Timeline
The +17% revenue and -12% fraud moved in the same direction. That is only possible when the product decision is context-aware, not blunt.
What Each Number Means
+17% Revenue Growth
Not from new features — from existing flows completing reliably. At $200B+ volume, even fractional completion improvements translate to enormous revenue. This came from visibility into silent failures that approval rate had hidden for months.
-12% Fraud Rate
Context-aware risk scoring reduced fraud on genuinely high-risk transactions without tightening globally. The model learned to differentiate, not just block. Fraud reduction and approval improvement are not competing objectives when the scoring is context-aware.
+4% Approval Rate — Same Direction as Fraud
Dynamic thresholds stopped blocking legitimate transactions from established customers. This is only possible when product decisions are context-aware, not blunt.
Zero Duplicate Transactions — 99.99% Uptime
Idempotency key requirements eliminated duplicate transactions post-implementation. A failure mode accepted as occasional became non-existent. Completion rate tracking caught execution failures that approval rate metrics had hidden.
Where I Changed the Outcome
What would have been different without my specific involvement.
WITHOUT MY DECISION
I defined completion rate before approval rate as the north star
The system would have continued optimizing approval rate. Silent execution failures would have gone undetected. At $200B+ volume, even a small silent failure rate represents significant financial exposure — going unmeasured for months.
WITH MY DECISION
Completion rate tracking caught the full lifecycle. Operations gained visibility into execution failures for the first time. These failures became measurable and therefore fixable. The +17% revenue came from this visibility.
WITHOUT MY DECISION
I blocked scaling work until failure behavior was defined
Infrastructure scaling would have proceeded before idempotency keys, retry limits, and partial execution recovery were defined. Scaling a system with undefined failure behavior scales the failures too — at higher volume, proportionally more incidents.
WITH MY DECISION
Zero duplicate transaction incidents. 99.99% uptime. Failure handling was designed in — not bolted on after the first production incident.
WITHOUT MY DECISION
I held context-aware scoring against global threshold pressure
The security team's global threshold tightening would have been implemented. Fraud rate would have improved, but approval rate would have declined. Revenue impact from blocked legitimate transactions would have partially offset the fraud reduction benefit.
WITH MY DECISION
-12% fraud AND +4% approval simultaneously. Context-aware scoring made these non-competing objectives. The revenue math was dramatically better than the blunt alternative.
WITHOUT MY DECISION
I defined payment-type impact analysis as a release requirement
Threshold changes on one payment type would have continued causing unintended ripple effects. Each fix created new problems. The system was optimized locally without being reasoned about holistically.
WITH MY DECISION
Threshold changes became predictable. Full impact analysis across ACH, Wire, Zelle before any risk change shipped. +4% approval rate improvement across all payment types.
What This Case Demonstrates
Five capabilities — proven at $200B+ annual volume.
1
Operate Payments Systems at Massive Scale ($200B+)
Full lifecycle ownership: initiation to execution to settlement to exception handling. All four payment types. One coherent system model.
Proof: Full lifecycle ownership across ACH, Wire, Zelle, mobile deposit — success criteria, failure behavior, system consistency.
2
Balance Fraud, Risk, and Customer Experience
Made fraud and approval rate move in the same direction — not as a tradeoff. Context-aware risk scoring is a product decision, not a model parameter.
Proof: -12% fraud AND +4% approval simultaneously. The product answer, not the risk answer.
3
Design Systems That Behave Correctly Under Failure
Defined partial execution, timeout handling, idempotency, and retry logic as first-class product requirements. At $200B+ volume, the failure behavior is the product.
Proof: Zero duplicate transaction incidents. 99.99% uptime. Failure designed in — not bolted on.
4
Make High-Stakes Decisions with Direct Financial Impact
Every product call had an immediate financial consequence. Held completion over approval rate, context scoring over global thresholds, failure definition over scaling — against real pressure.
Proof: Three decisions held against significant organizational pressure. All three proved out in production.
5
Align Engineering, Risk, and Operations in Regulated Environments
Payments require cross-functional alignment where risk, compliance, engineering, and operations all have legitimate veto power. The PM role is to hold the system model across all of them.
Proof: One shared system model. Payment-type impact analysis as a mandatory release gate.
Program Impact — All Metrics Visualized
Context-aware scoring: both metrics moving in the same direction.
Fraud Rate vs. Approval Rate — Context-Aware vs. Blunt
Completion Rate Improvement — Quarterly
Risk Decision Distribution — After Context-Aware
Revenue Waterfall — Where the +17% Came From
Payments Case Study — $200B+ Annual Volume
I build systems where money moves reliably —
even when they fail.
+17%
Revenue Growth
From completion quality — not new features.
-12% / +4%
Fraud Down. Approval Up.
Same direction. Context-aware scoring.
99.99%
Uptime + Zero Duplicates
Failure behavior designed before scaling.
Payments product management is not about the flow. It is about designing systems that behave correctly when the flow breaks.
ANDRES GARCIA
SENIOR PRODUCT MANAGER
andres.garcia.product@gmail.com | linkedin.com/in/andygarcia23
Full Portfolio | Thinkorswim Deep-Dive | AI Deep-Dive | TDV Deep-Dive