
Reimagining Cupping Competition Scoring, Part 1: The Problem

Photo by Nick Brown/Daily Coffee News

Since 1987, there has been a cupping competition at the annual Kona Coffee Cultural Festival. In this competition, 50 to 80 farms compete to be recognized as the best in Kona. Like other coffee competitions, a handful of judges cup through the coffees a number of times to determine the winner.

Over the years, the scoring system used for the competition has evolved and has recently settled on the fairly commonplace 100-point Specialty Coffee Association (SCA) scoring system. While the SCA scoring system is familiar and revered by many in the specialty coffee industry, it is imperfect.

I want to be clear: there is no perfect scoring system for a competition, and this holds true for any product, not just coffee. All of them share two major imperfections.

  1. No system defines what a winning item would taste like. This lack of definition prevents judges from scoring accurately as they have no standard of excellence to compare to. Moreover, competitors never know what to submit if they don’t know which taste is actually competitive. Without a definition of quality, every competition score is based somewhat on the whims of the judges.
  2. Scoring in general, including in competitions, is very subjective. When the preferences and expectations of individual judges influence the scoring, the composition of the judging panel can matter more than the items being judged.

The value of a 100-point scoring system for a competition is that it is a familiar concept to people and, more importantly, it makes it very easy to figure out who wins.

For the upcoming festival in November, Daylight Mind Coffee Company was asked to host and run the cupping competition. While this is exciting in and of itself, it affords us the opportunity to try to invent an improved competition scoring system. We hope to address the flaws inherent to 100-point systems and offer something that is objective and locally adaptable to every competition.

As we move along in this adventure, we’ll be sharing our methodology and findings through a series of posts. We hope that readers here will be eager and willing to offer us feedback and criticism to help us improve the design. We aspire to transparency every step of the way, both to our peers and to the competition participants. We know we won’t get it perfect the first time, but if we don’t start, we’ll never finish.

Feel free to comment with your thoughts below, and stay tuned for Part 2: “Defining the Winning Coffee.”


13 Comments

Dixon Ip

In my opinion, the scoring system is just a quantitative common language. I strongly agree that who’s on the judging panel matters a lot. With that said, I suppose all coffee competitions invite judges who are not only qualified but also well calibrated.

Kevin Knox

This is excellent Shawn – and timely.

Obviously as an old-timer in the trade I’ve used numerical cupping forms regularly but I’ve always found descriptive notes much more useful both for buying and as a source for consumer marketing.

As with so many other things I’ve also found it very useful to stay on top of the use and abuse of numerical scoring in the wine trade, which remains far ahead of coffee in describing its products to consumers (sadly there is no Robert Parker or Stephen Tanzer for coffee, though I’ve tried unsuccessfully to follow in their footsteps from time to time).

The famous importer (and wonderful writer) Kermit Lynch has been railing for years against both numerical ratings and the sheer idiocy of tasting wines solo rather than alongside the cuisines they are meant to complement – admittedly to no avail from a commercial point of view.

Consumers are pressed for time and addicted to sub-bumper sticker length forms of (dis)information. “90+ point coffees” is just as easy a sell as “I only buy organic/fair trade/direct trade.”

In terms of coffee competitions, one of the objections I have raised to Cup of Excellence in particular is that the jurors selected as a rule – not the exception! – have diametrically opposed criteria for excellence. I’ll never forget being on one memorable panel in Guatemala with top Japanese and Italian cuppers for whom high acidity was virtually a defect while Jim Reynolds (from Peet’s) and I were looking for just the opposite characteristics given our intended uses for the coffee. One comment Jim made at that time has stuck with me ever since: “there’s a big difference between being a coffee taster and a coffee BUYER.”

It seems to me that really the only even semi-viable model we have for tastings as a price and quality discovery mechanism is the Kenya auctions of old (back when essentially all of the country’s production was sold through them). Events like COE and the program in Hawaii you’re describing are marketing programs pure and simple, not price or discovery tools. With COE the high price per pound paid for what are at the end of the day commercially meaningless quantities of coffee (except perhaps to the tiniest of microroasters) creates media headlines but it also can seriously distort farmer expectations of what full containers of coffee ought to sell for, while also falsely conveying to consumers the notion that a $10-100 green coffee is invariably exponentially better than coffees from the same farm or region that sell in quantity at real-world prices that both support farms and allow consumers to enjoy the product every day.

It’ll be especially interesting to me to see what you have to say about Kona coffee, which necessarily sells at a high price and which to me (and many other professional cuppers) would even at its very best never score much higher than the mid-80’s if cupped alongside a decent selection of top coffees from other origins.

Rachel

I recall Mark Overly of Kaladi in Denver succinctly calling the 100-point system basically an “acidity index.” It’s useful information as part of the whole picture, but not on its own. Your point about cultural palate differences is well taken.

As a roaster, cupping scores are just so valuable for communicating quality. To have a standard, even perhaps an arbitrary one, is too useful to want to abandon. All my competitors say they go to origin, select only the highest-quality beans, roast them expertly, and sell them fresh. Well, yeah. High scores on top of those claims, though, might have an advantage.

Shawn Steiman

You hit upon the greatest asset of 100-point scoring systems: They convey a lot of information quickly. Unfortunately, you have to completely understand the system to make any sense of it. Moreover, if one isn’t aligned to the system’s narrow definition of quality, it is almost useless for them.

What we all want is not so much a standard, but an efficient and quick way of transferring quality information from one person to another. Alas, I don’t have a solution for this, yet.

Thanks for the thoughts!

K.C. O’Keefe

I agree that many competitions are only pushing the pinnacle of the price pyramid . . . but I’m not sure if they are meant to do much more than that.

To honor & promote the middle of the pyramid perhaps we need a distinct competition method/format?

Exporter with the highest average export price per pound by country, with export size categories . . . or something to this effect. This data exists in every country’s customs agency.

And why not have a metric and competition for the average farm gate price by farmer organization? Every Fair Trade certified sale tracks this information.

Shawn Steiman

As always, Kevin, it is a pleasure to read your thoughts!

I’m going to respond to part of your comment here and the other part under K.C.’s comment.

As you might imagine, I have a lot to say about the quality of the coffees, their prices, and the rationale for it all. I’ll just dabble, here.

The simple version is that Hawaii coffee, like any other coffee origin (or product, for that matter), can be mapped onto a bell curve. Everyone and everything has a tail of glory and a tail of sucky. Most fall in the middle. There are coffees in Hawaii that can compete, easily, on the world stage against anything. They are few and far between, for sure. But, in the statistical light of a bell curve, the number is perfectly logical. (The curious angle is that farmers in Hawaii have access to all the knowledge and resources they need to skew the curve. Alas, that doesn’t happen.)

I can’t honor enough your comment about price discovery vs quality discovery. I’ve never seen price discovery in Hawaii and I can imagine why. It would be great to have that in this unique industry. Some day, maybe… 🙂

Thanks for the comment!

1.5l50ozmax

Wow, this is a good post, dead on. It’s true that to many acidity is considered a defect, when in fact high acidity is a defining characteristic of quality coffee. International competitions and cupping event holders should consider what part of the world their judges come from and how that will influence scores. If the majority of judges for instance at brewers cup are Asian, they will have a preference for the coffees brewed by Asian competitors due to a similar taste in coffee (or should I say mouthfeel because that is what is most important to them even if they won’t admit it).

K.C. O’Keefe

What about just embracing the fact that each judge is an individual with personal preferences, which are expressed at a given time in a unique event? . . . throw out the entire scoring sheet and give each person a simple vote for the best to worst coffee on each table 1-6? No points at all, just 1st-30th ranking at this event.

Secondly, if we are truly concerned with fairness, what about testing each jury member against a consistency threshold prior to allowing their votes to count? In my experience most professional cuppers cannot repeat their scores within a 2 point range more than 80% of the time, and the higher the scores the more erratic we are. . . The most valid taster opinion starts with one that is consistent with itself.

Third, what about multiple “Best Categories” similar to what Best of Panama has done . . . Best Washed Arabica, Best washed Geisha, Best Natural . . . and we could even get more specific “Best Washed Intense Acid”, “Best Natural Fruity”, “Best full body”, “Best Earthy” and (forgive me) “Best 2nd crack”, “Best Espresso” or process specific “Best Anaerobic Macerated Natural” . . . in my experience, while points and preferences vary, it does appear that we as tasters commonly agree on taste descriptors. . . perhaps much more than on rated quality.

Shawn Steiman

If I had more time today, K.C., I’d love to write a spiel about sensory science and why humans are lousy machines. But, that will have to wait for another day. 🙂

I hope you’ll see that the system I’m working on addresses many of your ideas. On a grander scale, though, this competition system is nothing more than the application of sound sensory science principles to a competition. Generating valuable, objective (within reason), meaningful quality assessment data is doable and easy (once a person lets go of their ego). Sadly, aside from the recent WCR revelation that current cupping systems aren’t scientific, nobody has (as best I can tell) pursued a scoring system that minimizes the judges’ subjectivity and the industry’s quality blinders (e.g., acid vs. not, as Kevin demonstrated). The systems I use in my companies are the same scoring system this competition will be using. I’m just trying to apply them to a competition, where you need a winner.

I especially like your comment that no competition (or score sheet) fits all coffees. That’s one of my biggest frustrations with competitions. I have a solution for it (which I’ll write about in this series, of course). The solution is much as you describe, only more specific, and it is applicable to anything.

Thanks for your thoughts! It is a pleasure to hear from you again (I suspect you have no recollection that we corresponded about a decade ago when I was writing my first book). 🙂

1.5l50ozmax

You could only allow Q certified judges, then all coffees would score 84 :p

José Lassiter

Cupping competitions are advertising and marketing vehicles to peripheral industry interests. They attract an insecure customer base which is gone with the next winner.

A ‘win’ can cause frustration among a brand’s core customer base when it results in rising prices; a ‘win’ may further risk short supplies, followed by loss of shelf space or of customers who migrate elsewhere.

The inflation of ever more cupping competitions, plus their often ridiculous subcategories, leads to increased consumer confusion and a never-ending flood of stickers and ribbons on merchandise. Some coffee vendors therefore invent organizations and competitions, and call themselves the respective ‘winners’ on their own merchandise. Go figure!

Well researched purchase decisions for buying wine are overwhelmingly made by label, then by price, and to a very minute percentile by ratings. Why would it be different in coffee? Taste is subjective. A buyer knows that, a shop owner knows this. Tasting the marketability (!) of a coffee is a very different thing than detecting its top notes in a controlled setting to a trained professional.

Shawn Steiman

Well said, Jose! You’ll find no disagreement, here!

I want to distill what you’re saying to two very important ideas, just to say it out loud.

First, what a person likes is subjective and globally arbitrary. As an extension, everyone gets to like what they like and nobody else should be passing judgement on it.

Second, the purpose of numbering systems is ostensibly to define quality and, consequently, whether a person will like the product or not. Of course, the numbering systems fail at that miserably. What would be much more effective, as Kevin said, would be to ditch the notion that we need to rank anything and merely be descriptive. If I describe a coffee to you by saying it is sweet, roasty, and reminiscent of sweaty gym socks, you’ll know whether it will appeal to you or not. Customers are the arbiters of quality; it is our job to empower them with the information they need to make the best decisions for themselves.

Stephen James Davidson

If you are a hammer all you see are nails. In my experience doing QC for BB, the role a taster played day in, day out determined how they scored coffee. Our green buyer loved high acidity, green tasting coffee while our QC team was looking for our benchmark with flavors of dark chocolate, mellow acidity, etc. When sharing a cupping table, it was challenging to calibrate. I appreciate this call to action for conversing about what scoring looks like at a competition level cupping, especially when it has the potential to change the way we taste coffee in the specialty industry and how that very tasting can affect livelihoods and reputation.
