Why We Replaced Star Ratings with Dimension-Specific Feedback

Marta Fonseca

For the first several months of OSCAR, we asked homeowners one question after each job: rate your experience from 1 to 5 stars. It was the obvious thing to do. Every marketplace does it. It is what people expect to see.

The problem is that the data was nearly useless. Our network-wide average sat at 4.6 stars and barely varied between professionals. We had a plumber who arrived 40 minutes late to every third job averaging 4.5. We had another who cleaned up meticulously and documented every step averaging 4.7. Those two numbers look almost identical. They should not. One professional has a punctuality problem. The other has a documentation practice worth highlighting to new professionals.

A single-score rating conflates everything the homeowner felt about the job into one number — and most people, when they have had a broadly okay experience, give a 4 or a 5. The number ends up reflecting general satisfaction rather than specific performance. It tells you almost nothing actionable.

What We Changed and Why

In Q3 2025, we replaced the single-score post-job review with three dimension-specific questions. Each is answered with a simple yes/no or a thumbs-up/thumbs-down, with an optional short text note:

  • Punctuality: Did the professional arrive within the booked arrival window? (We define the window as ±20 minutes from the confirmed start time.)
  • Price accuracy: Did the final price match the quote you accepted when booking?
  • Work quality: Is the problem solved? Would you book this professional again?

We also kept a single optional 1–5 overall rating for homeowners who want to give a summary impression, but it is not the primary signal we use for professional performance management. The three dimensions are.
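
As a rough illustration of the review shape described above, here is a minimal Python sketch. The class name, field names, and the `within_arrival_window` helper are hypothetical and not our actual schema. The punctuality answer itself comes from the homeowner; the helper only encodes the ±20 minute definition, and treating the boundary as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# The booked arrival window: +/- 20 minutes from the confirmed start time.
ARRIVAL_WINDOW = timedelta(minutes=20)


@dataclass
class JobReview:
    """One post-job review (illustrative field names only)."""
    on_time: bool                        # Punctuality
    price_matched_quote: bool            # Price accuracy
    work_quality_ok: bool                # Work quality
    overall_stars: Optional[int] = None  # optional 1-5 summary rating
    note: str = ""                       # optional short text note

    def __post_init__(self) -> None:
        # The overall rating may be omitted, but if present it must be 1-5.
        if self.overall_stars is not None and not 1 <= self.overall_stars <= 5:
            raise ValueError("overall_stars must be between 1 and 5")


def within_arrival_window(confirmed_start: datetime, actual_arrival: datetime) -> bool:
    """True if arrival is no more than 20 minutes early or late (inclusive, by assumption)."""
    return abs(actual_arrival - confirmed_start) <= ARRIVAL_WINDOW
```

Keeping the overall rating as an optional field, separate from the three booleans, mirrors the point that it is a summary impression rather than the primary signal.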

The reason for yes/no rather than a 1–5 scale on each dimension is cognitive friction. When you ask someone to rate punctuality on a five-point scale, they start wondering what a 3 means versus a 4. When you ask "did they arrive on time, yes or no," the answer is binary and honest. Response rates went up when we simplified the format: the share of completed jobs receiving a review rose from 58% to 74%.

What the Dimension Data Revealed

Once we had a few hundred jobs with dimension-specific data, patterns appeared that the single score had hidden.

Punctuality is our lowest-scoring dimension network-wide, with a 91% on-time rate. Price accuracy is our highest, at 97%. Work quality — the broadest and most subjective question — sits at 94%. Those numbers tell a story: our fixed-price model is working (homeowners are not surprised by the final price), our professionals do good work (problem-solved rate is high), but travel scheduling still produces lateness in about one in eleven jobs.
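
The arithmetic behind "one in eleven" can be checked with a short Python sketch. The function names `yes_rate` and `one_in_n` are hypothetical helpers for illustration, and the sample list below is made-up data shaped to match the 91% figure, not real review records.

```python
from typing import Sequence


def yes_rate(answers: Sequence[bool]) -> float:
    """Share of 'yes' answers for one dimension, as a fraction between 0 and 1."""
    if not answers:
        raise ValueError("no answers to aggregate")
    return sum(answers) / len(answers)


def one_in_n(rate: float) -> int:
    """Express the 'no' share as 'about one in N jobs'."""
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1) to have a non-zero 'no' share")
    return round(1 / (1 - rate))


# Illustrative data only: 91 on-time answers out of 100 reviewed jobs.
punctuality_answers = [True] * 91 + [False] * 9
on_time = yes_rate(punctuality_answers)  # 0.91
late_every = one_in_n(on_time)           # 11, i.e. about one in eleven jobs
```

The same helper gives roughly one in 33 for price accuracy (97%) and one in 17 for work quality (94%).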

At the individual professional level, dimension scoring separates strengths from weaknesses clearly. One of our most experienced plumbers has a 98% work quality score but an 81% punctuality score. He is technically excellent and chronically late. Before dimension scoring, his 4.7 overall average obscured that. Now we can have a direct conversation about the punctuality pattern. More importantly, our scheduling system can apply a conservative buffer to his arrival estimates so homeowners are given a more accurate expected window.

How Ratings Feed Back into the System

Dimension scores are used in three ways operationally.

First, they affect scheduling priority. Professionals with higher punctuality scores are preferentially matched to time-sensitive bookings — a homeowner who has specified a narrow arrival window, or a job that needs to be completed before another tradesperson arrives for a follow-on task.
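
One simple way to picture this kind of preference is sketched below in Python. This is not a description of our matching logic: the `Candidate` type, the `rank_candidates` function, and the rule of sorting purely by punctuality are all assumptions made for illustration. A real matcher would weigh availability, distance, and other factors as well.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    """A professional available for a booking (hypothetical type)."""
    name: str
    punctuality: float  # on-time rate as a fraction, 0.0 to 1.0


def rank_candidates(candidates: Sequence[Candidate], time_sensitive: bool) -> List[Candidate]:
    """Order candidates for a booking.

    For time-sensitive bookings, higher punctuality comes first. The sort is
    stable, so candidates with equal punctuality keep their incoming order.
    Other bookings keep the incoming order unchanged.
    """
    if not time_sensitive:
        return list(candidates)
    return sorted(candidates, key=lambda c: c.punctuality, reverse=True)
```

Leaving the order untouched for ordinary bookings keeps the punctuality preference from crowding out everything else when the arrival window is not tight.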

Second, they trigger review processes. A professional who drops below 85% on any single dimension over a rolling 20-job window gets flagged for a conversation with our operations team. We do not automatically suspend anyone — the data can reflect an unusual cluster of difficult circumstances, and we want to understand the pattern before acting on it. But we do not ignore persistent dips either.
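
The flagging rule can be sketched in a few lines of Python. The class and method names are hypothetical. Two details are assumptions rather than stated policy: the window counts only jobs that received a review, and nothing is flagged until a full 20 reviewed jobs are available. The result is a list of dimensions to discuss, not a suspension.

```python
from collections import deque
from typing import Deque, Dict, List

WINDOW_JOBS = 20        # rolling window size
THRESHOLD_PERCENT = 85  # flag when a dimension drops below this
DIMENSIONS = ("punctuality", "price_accuracy", "work_quality")


class RollingDimensionMonitor:
    """Tracks the last WINDOW_JOBS yes/no answers per dimension for one professional."""

    def __init__(self) -> None:
        # deque(maxlen=...) discards the oldest answer once the window is full.
        self._windows: Dict[str, Deque[bool]] = {
            d: deque(maxlen=WINDOW_JOBS) for d in DIMENSIONS
        }

    def record(self, punctuality: bool, price_accuracy: bool, work_quality: bool) -> None:
        """Add one reviewed job's answers to the rolling windows."""
        self._windows["punctuality"].append(punctuality)
        self._windows["price_accuracy"].append(price_accuracy)
        self._windows["work_quality"].append(work_quality)

    def flagged_dimensions(self) -> List[str]:
        """Dimensions currently below the threshold over a full window."""
        flagged = []
        for name, window in self._windows.items():
            if len(window) < WINDOW_JOBS:
                continue  # assumption: wait for a full window before flagging
            # Integer comparison avoids floating-point edge cases at exactly 85%.
            if sum(window) * 100 < THRESHOLD_PERCENT * len(window):
                flagged.append(name)
        return flagged
```

With a 20-job window, 17 "yes" answers is exactly 85% and does not trigger a flag, while 16 does.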

Third, they are surfaced to homeowners at the point of booking. When a homeowner sees a professional profile, they see the three dimension percentages alongside the optional overall rating. That means a homeowner who knows they have a strict arrival window can make an informed choice.

What Dimension Ratings Do Not Solve

We want to be clear about the limitations of this system. Dimension-specific ratings are better than a single score, but they are still self-reported homeowner perceptions, and those perceptions are subject to halo effects and recency bias like any subjective rating. A job that ends with a minor dispute — even if the underlying work was excellent — tends to depress all three dimension scores simultaneously. A job that ends with a homeowner feeling particularly grateful tends to inflate all three.

We do not claim our dimension scores are objective measurements of professional performance. They are structured homeowner feedback with better signal-to-noise than single-score ratings. The difference matters when making decisions about individual professionals, and we treat dimension data as one input among several — alongside job documentation quality, scope adherence records, and callback rates — rather than as the sole truth.

We are also aware that professional-side feedback matters. Professionals can currently flag homeowners who were absent at the confirmed start time, who described the job inaccurately, or who made the job materially more difficult than stated. We do not publish these flags publicly, but they inform how we handle repeat booking requests and dispute cases. A marketplace rating system that only flows one direction is incomplete.

What We Are Still Building

The dimension scoring system launched in its current form in August 2025. We have been running it for about six weeks at the time of writing, and the data set is still relatively small. Some things we want to add: a clearer display of dimension history on professional profiles, a mechanism for professionals to add a short response to dimension feedback they believe is inaccurate, and better tooling for our team to spot unusual rating clusters that suggest gaming or coordinated negative reviews.

The goal is not a perfect rating system — that does not exist. The goal is a rating system that generates data specific enough to be actionable, honest about what it measures, and fair to both homeowners and professionals. We think we are closer to that now than we were six months ago.