API Latency: Why p95 Matters More Than Average

One-line definition

API latency is the time between sending a request to an API and receiving the response — it directly affects how fast your product feels to users.

Formula: API latency = time the response is received − time the request is sent, measured in milliseconds (ms). Track the p50 (median), p95, and p99 percentiles.

TL;DR

Users do not think about latency; they think about whether your app feels fast or slow. Every 100ms of API latency above 200ms erodes trust in your product. Measure p95 latency, not averages: your average can look great while the slowest 5% of requests give users a terrible experience.

Simple definition

API latency is the round-trip time for a request: your app asks the server for something, the server processes it, and sends a response. That total time, measured in milliseconds, is the latency. It includes network transit, server processing, database queries, and any external API calls your server makes.

For solo founders, latency matters because it directly shapes the user experience. A dashboard that loads in 150ms feels instant. The same dashboard at 2 seconds feels broken. Users will not tell you "your API is slow"; they will just stop using your product.

Why this matters

API latency is a critical metric for bootstrapped founders because it shows what your users actually experience, not what you see in testing. Before product-market fit, this metric may feel abstract. But once you have paying customers and recurring revenue, ignoring it becomes dangerous to your growth trajectory.

Most solo founders make the mistake of focusing on the wrong metric at the wrong time. Before $1k MRR, the best metrics are activation and product-market fit. Between $1k and $10k MRR, API latency becomes highly relevant. Beyond $10k MRR, it becomes one of your top three growth levers.

Solo founders rarely fail for lack of brilliant ideas. They fail because they don't systematically measure the metrics that matter and don't iterate on improvements.

Common mistakes

1. Calculating too early. If you have 5 customers, this metric is noise, not signal. Wait until you have at least 50 customers and 2-3 months of data before drawing conclusions. Too early and you'll see random variance, not real patterns.

2. Ignoring variations by segment. Customers acquired via your blog may behave differently from those acquired via paid ads. Enterprise customers may behave differently from small-business customers. Always segment your metrics to see the true signal.

3. Optimizing without context. Does improving this metric by 10% mean 10% more revenue? Not necessarily. Understand the upstream and downstream impact before optimizing, then focus on the change that will have the biggest impact on revenue.

4. Forgetting that causality flows in both directions. A poor reading may indicate a product issue, a positioning issue, or that you're attracting the wrong customers. Before optimizing, understand why it is poor.

How to act on this

Calculate this metric for your last 30 customers right now. Do you have the data? If yes, establish a baseline and write it down. That's your first step toward improvement.

Identify your highest-value customer segment. Is it a specific monthly cohort? An acquisition channel? A customer type? Focus on that segment and try to improve this metric for them.

Run one small experiment to improve this metric by 5-10%. Measure, learn, iterate. The compounding of these small improvements over 12 months creates a huge difference.

How to calculate it

API Latency = Response timestamp − Request timestamp
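As a minimal sketch of that subtraction, the snippet below times a single call with a monotonic clock. The `timed_call` helper and `fake_request` function are hypothetical names used only for illustration; in practice you would wrap your own HTTP client call.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed_call(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn and return (result, latency in milliseconds)."""
    start = time.perf_counter()  # request timestamp (monotonic clock)
    result = fn()
    # response timestamp minus request timestamp, converted to ms
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def fake_request() -> str:
    # Hypothetical stand-in for a real API call; sleeps about 20 ms
    time.sleep(0.02)
    return "ok"


result, latency_ms = timed_call(fake_request)
print(f"{result} in {latency_ms:.1f} ms")
```

A monotonic clock such as `time.perf_counter` is used here rather than wall-clock time, since wall-clock time can jump when the system clock is adjusted.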

Measure it at three levels:

p50 (median): Half of requests are faster than this. Your typical user experience.
p95: 95% of requests are faster. Your baseline for "fast enough."
p99: 99% are faster. Your worst-case regular experience.
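One way to compute these percentiles from raw samples is the nearest-rank method, sketched below. The `percentile` function and the sample latencies are made up for illustration; monitoring tools may use slightly different interpolation rules, so their numbers can differ a little.

```python
import math
from statistics import mean
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at
    least p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up data in ms: 94 fast requests and 6 slow ones
latencies = [100.0] * 94 + [1500.0] * 6

print(
    f"mean={mean(latencies):.0f} "
    f"p50={percentile(latencies, 50):.0f} "
    f"p95={percentile(latencies, 95):.0f}"
)
# prints: mean=184 p50=100 p95=1500
```

With this data the mean and median both look healthy, while the p95 exposes the slow tail.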

Use tools like Vercel Analytics, Datadog, or even a simple middleware that logs response times. Track latency per endpoint — your /api/users might be 50ms while /api/reports is 800ms. Fix the slow ones first.
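A "simple middleware that logs response times" could look roughly like the sketch below. It assumes a Python WSGI app, which the article does not specify; `LatencyMiddleware` and `demo_app` are hypothetical names, and the sketch buffers the whole response body, so it is not suitable for streaming endpoints.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List


class LatencyMiddleware:
    """WSGI middleware that records latency in ms per request path."""

    def __init__(self, app: Callable) -> None:
        self.app = app
        self.samples: Dict[str, List[float]] = defaultdict(list)

    def __call__(self, environ: dict, start_response: Callable) -> List[bytes]:
        start = time.perf_counter()
        body = None
        try:
            body = self.app(environ, start_response)
            # Buffer the body so time spent generating it is included
            return list(body)
        finally:
            if hasattr(body, "close"):
                body.close()
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples[environ.get("PATH_INFO", "")].append(elapsed_ms)


def demo_app(environ: dict, start_response: Callable) -> List[bytes]:
    # Hypothetical app: /api/reports is deliberately slower than /api/users
    if environ["PATH_INFO"] == "/api/reports":
        time.sleep(0.05)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


wrapped = LatencyMiddleware(demo_app)
for path in ["/api/users", "/api/reports", "/api/users"]:
    wrapped({"PATH_INFO": path}, lambda status, headers: None)

for path, values in wrapped.samples.items():
    print(f"{path}: n={len(values)} max={max(values):.1f} ms")
```

Keeping samples keyed by path makes it easy to compute percentiles per endpoint and spot the slow ones.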

Example

You build a project management tool. Your main dashboard API call returns in 180ms on average. Feels fine in testing. But your p95 is 1,400ms — meaning 1 in 20 page loads takes over a second. You dig in and find the slow requests happen when users have 200+ tasks. The database query scans the full table instead of using an index. You add a composite index on (user_id, status, created_at). The p95 drops from 1,400ms to 220ms. Average barely changes — it was already fine. But the users who were having the worst experience now have a dramatically better one, and your 7-day retention in that cohort improves by 12%.
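The index fix in this example can be sketched with SQLite's query planner. The article does not say which database is used, so SQLite, the `tasks` schema, the index name, and the generated rows are all assumptions for illustration; other databases expose query plans through their own `EXPLAIN` variants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    " id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT,"
    " created_at TEXT, title TEXT)"
)
# Made-up data: 20 users with 250 tasks each
rows = [
    (user, "open" if i % 2 else "done", f"2024-01-{i % 28 + 1:02d}", f"task {i}")
    for user in range(1, 21)
    for i in range(250)
]
conn.executemany(
    "INSERT INTO tasks (user_id, status, created_at, title) VALUES (?, ?, ?, ?)",
    rows,
)

QUERY = (
    "SELECT id, title FROM tasks "
    "WHERE user_id = ? AND status = ? ORDER BY created_at DESC"
)


def query_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's plan for QUERY as one line of text."""
    plan_rows = connection.execute(
        "EXPLAIN QUERY PLAN " + QUERY, (7, "open")
    ).fetchall()
    return " | ".join(row[-1] for row in plan_rows)  # last column is the detail


plan_before = query_plan(conn)  # expected to be a full table scan
conn.execute(
    "CREATE INDEX idx_tasks_user_status_created "
    "ON tasks (user_id, status, created_at)"
)
plan_after = query_plan(conn)  # expected to search via the new index

print("before:", plan_before)
print("after: ", plan_after)
```

Checking the plan before and after is a cheap way to confirm the index is actually used before measuring p95 again in production.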

Related terms

Time to Value
Session Duration
Activation Rate

FAQ

What is acceptable API latency for a SaaS product?

Under 200ms for user-facing actions (loading data, saving changes). Under 500ms for complex operations (search, report generation). Above 1 second and users start noticing. Above 3 seconds and they start leaving. Always measure p95, not just average — your slowest users have the worst experience.

How do I reduce API latency on a budget?

Start with the biggest wins: add database indexes for slow queries, implement response caching for data that doesn't change every request, use a CDN for static assets, and move your server closer to your users geographically. These four changes alone can cut latency by 50-80% for most bootstrapped apps.
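Response caching, one of the wins listed above, can be sketched as a small in-memory cache with a time-to-live. `TTLCache`, `load_report`, and the cache key are hypothetical names; a real deployment with several server processes would more likely use a shared cache, which this sketch does not cover.

```python
import time
from typing import Any, Callable, Dict, Hashable, List, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: skip the slow work
        value = compute()  # miss or expired: do the slow work once
        self._store[key] = (now + self.ttl, value)
        return value


calls: List[int] = []


def load_report() -> dict:
    # Hypothetical slow operation, e.g. a heavy database query
    calls.append(1)
    return {"open_tasks": 42}


cache = TTLCache(ttl_seconds=30)
first = cache.get_or_compute("report:7", load_report)
second = cache.get_or_compute("report:7", load_report)
print(f"slow loads: {len(calls)}")  # the second call is served from cache
```

The TTL is the trade-off to tune: a longer value saves more work but serves staler data.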
