
Reimagining Cupping Competitions, Part 5: The Results Are In

Cupping at the Kona Coffee Cultural Festival. Photo by Laura Aquino.

As the title above suggests, there have been four other pieces in this series. They laid out some of the problems with current competition scoring systems, ways in which we might better define a winning coffee in a competition, and how we'd use that definition in practice. The fourth part illustrated the process by which the stakeholders settled on characteristic definitions and defined the profiles. This final post recaps the competition and how the system survived its first test.

Overall, in an actual competition, the system proved to be a success. That said, there are certainly tweaks and improvements to be made.

My concern with the judges (most of whom were Q and/or R certified, plus one or two who easily could be) was that they'd have too difficult a time letting go of their preferences and rating the characteristics on intensity alone. After all, we're never taught to just measure "what is here," something that is very hard to do without training. We spent some time training, but it was far too little.

Photo by Laura Aquino.

Nonetheless, the winning coffees were not the judges' favorite coffees; they were the coffees that most closely matched the defined profiles. This was exactly the desired result of the system! So, even with minimal training, the system was robust enough to withstand the judges' old habits. Thus, we can define specific profiles and measure coffees against them. This not only has ramifications for how competitions could operate in coffee and other industries, but it also has profound implications for any quality control department: you can define your target profile and assess coffees against that ideal objectively.
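To make that idea concrete, here is a minimal sketch of profile-based scoring in Python. It is not the competition's actual rubric: the characteristic names, the 0-10 intensity scale, and the absolute-deviation distance are all illustrative assumptions. The point is only that once a target profile is written down as numbers, "closest to the profile" becomes a computable question, independent of anyone's preferences.

```python
# A minimal sketch of profile-matching, not the competition's actual rubric:
# each characteristic gets a target intensity (hypothetical 0-10 scale), and
# a coffee's score is its total absolute deviation from those targets
# (lower = closer match). All names and values here are illustrative.

TARGET_MODERN = {"sweetness": 7, "acidity": 8, "body": 5, "bitterness": 2}

def deviation_from_profile(ratings: dict[str, float],
                           target: dict[str, float]) -> float:
    """Sum of absolute deviations between intensity ratings and the
    defined profile; 0.0 means a perfect match."""
    return sum(abs(ratings[c] - t) for c, t in target.items())

# Two hypothetical coffees: the flashier one a judge might prefer sits
# further from the target profile, so the closer match wins on this metric.
coffee_a = {"sweetness": 9, "acidity": 9, "body": 7, "bitterness": 1}
coffee_b = {"sweetness": 7, "acidity": 7, "body": 5, "bitterness": 2}

print(deviation_from_profile(coffee_a, TARGET_MODERN))  # 6.0
print(deviation_from_profile(coffee_b, TARGET_MODERN))  # 1.0
```

Whether you use absolute deviations, squared deviations, or weights per characteristic is a design choice; any of them preserves the key property that liking a coffee and matching the profile are separate questions.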

The fact that the favorite coffees (as discussed after the competition) didn't win exposes one of the biggest challenges of using this system. Everyone is used to the 100-point SCA/COE-type system, and those who use it understand it: basically, the higher the number, the more coffee geeks like that coffee in that moment. If that is the only system you know, it is difficult to explain 1) why the coffee the experts (i.e., the judges) liked most didn't win, and 2) why a coffee failing to fit the defined profile says nothing about whether someone would like it. There are many excellent coffee profiles, but in a competition, only one can win.

Side note: the coffees that made it to the finals in the Modern profile, a.k.a. the coffee geek profile, were all complex, acid-forward coffees that would have scored well using the SCA system. So, the system does function as a quality discovery tool, within some reasonable variation of a given profile definition.

Photo by Laura Aquino.

Of course, an expected challenge, and the most legitimate criticism, of the system is that nobody knows it. To understand the results, a person must take the time and energy to learn the system. Thus, its greatest downside is that the wider world has no idea what it means for the winning coffees to have won! Farmers can't easily use the competition to sell their coffee, and buyers can't easily understand what the scores mean. This is true of any system, but introducing a new one while a well-established system already reigns makes buy-in a challenge at every level.

The major tweaks that need to make it into the next iteration are:

  • Train the judges more thoroughly on the characteristics and the range of their intensities.
  • Redefine the Modern profile so that it has higher intensities of sweetness and acidity. Of course, the profile itself could be entirely redefined.
  • Change how "complexity" and "defects" are measured.
    • While the definitions we used were logical (a certain percentage of judges had to use equivalent descriptive terms), they didn't work in practice. This may have been because we had only six judges, leaving too much room for variation among them. Sometimes it was clear that a coffee should earn a point for one of these characteristics even though the rubric said otherwise.
    • Actually scoring these characteristics was also very time-consuming. Not only did all the words need to be transcribed into a computer, but I then had to look at each set of words for each coffee, do some math, make decisions, and enter the score. A more efficient way of capturing and scoring these ideas needs to be developed; one possible automated approach is sketched after this list.
  • Limit the number of coffees cupped each day to no more than 50; regrettably, the judges cupped 76 on the first day. Even 50 is probably more than humans should cup in a day, but some compromises must be made for the sake of time efficiency and resource limitations.
  • Refine the data-capturing system so that it outputs results in a form that is easy to pass on to entrants. All of the scores and descriptors (but not the raw scores) were shared with each entrant, and many hours of cutting and pasting were needed to ensure each entrant received the proper data. Giving the data to the entrants is very important, but it shouldn't be such a burden on the volunteers running a competition; a sketch of one way to automate the handoff follows below.
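On the complexity/defects rubric: the burden came from manually grouping judges' free-form words and checking whether enough judges agreed. A small script can do that tally. This is a hedged sketch, not the system we used: the synonym groups, the 50% agreement threshold, and the descriptors below are invented for illustration.

```python
from collections import Counter

# A sketch of automating an "equivalent descriptive terms" rubric for
# complexity/defects. The synonym groups and the 50% threshold are
# illustrative assumptions, not the competition's actual definitions.

SYNONYMS = {  # maps raw descriptors to a canonical term
    "fermented": "ferment", "fermenty": "ferment", "boozy": "ferment",
    "papery": "past-crop", "baggy": "past-crop", "woody": "past-crop",
}

def earns_point(judge_descriptors: list[list[str]],
                threshold: float = 0.5) -> bool:
    """True if any canonical descriptor was used by at least `threshold`
    of the judges for this coffee."""
    n_judges = len(judge_descriptors)
    counts = Counter()
    for words in judge_descriptors:
        # count each canonical term at most once per judge
        counts.update({SYNONYMS.get(w.lower(), w.lower()) for w in words})
    return any(c / n_judges >= threshold for c in counts.values())

# Six judges' defect notes for one coffee: 3 of 6 used a "ferment"-family
# term, so the coffee earns a defect point at the 50% threshold.
notes = [["boozy"], ["fermented", "papery"], ["fermenty"],
         ["clean"], ["woody"], ["bright"]]
print(earns_point(notes))  # True
```

With only six judges, a threshold like 50% is brittle (one judge is 17 points of swing), which is consistent with our experience that small panels produced too much variation.

Likewise, the cut-and-paste step of preparing entrant reports is mechanical enough to automate. Here is a minimal sketch that splits a master results file into one file per entrant; the file name competition_results.csv and its column layout are assumptions, not the actual format we used.

```python
import csv
from collections import defaultdict
from pathlib import Path

# A sketch of automating the per-entrant handoff: read one master results
# CSV (hypothetical columns: entrant, coffee_id, characteristic, score,
# descriptors) and write one file per entrant containing only their rows.

def export_per_entrant(master_csv: str,
                       out_dir: str = "entrant_reports") -> None:
    rows_by_entrant = defaultdict(list)
    with open(master_csv, newline="") as f:
        reader = csv.DictReader(f)
        fields = reader.fieldnames
        for row in reader:
            rows_by_entrant[row["entrant"]].append(row)

    Path(out_dir).mkdir(exist_ok=True)
    for entrant, rows in rows_by_entrant.items():
        with open(Path(out_dir) / f"{entrant}.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()
            writer.writerows(rows)

# Usage: export_per_entrant("competition_results.csv")
```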
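Dropping columns an entrant shouldn't see (for example, the raw scores we withheld) would be a one-line filter on each row before writing; the point is that none of this handoff needs to be done by hand.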

Though no system is perfect, this one has proved itself robust and usable. Most surprising to me was that several of the judges commented on the system's potential value for the industry, especially since most of them arrived skeptical of it. I was hoping the judges wouldn't strangle me at the end of the competition; the fact that they were complimenting it instead says a lot.

As always, I encourage you to comment below or write me directly if you have questions, comments, or death threats; I appreciate any criticism you have to offer. As an industry, we should be constantly examining our systems and beliefs so that we can refine them to be as efficient, useful, and meaningful as possible.
