The Psychometric Properties of Probability and Quantile Forecasts

In this study, we compared how different forecast elicitation methods affect assessments of forecaster accuracy.

Sophie Ma Zhu^*,1,2, David Budescu^3,2, Nikolay Petrov^4,2, Ezra Karger^5,2, Mark Himmelstein^6,2

1 University of British Columbia
2 Forecasting Research Institute
3 Fordham University
4 University of Cambridge
5 Federal Reserve Bank of Chicago
6 Georgia Institute of Technology
* Corresponding Author. Contact: zhuma@student.ubc.ca

Published: Nov 19, 2024
Revised: Aug 24, 2025

Abstract

What is the best way to elicit forecasts if the goal is to measure the skill of forecasters? For continuous outcomes, there are two common approaches: elicit probabilities at fixed quantiles of the outcome distribution (e.g., what is the probability gas will cost less than $4.00 per gallon in one month?), or elicit quantiles at fixed probabilities (e.g., what is the price such that there is a 25% chance a gallon of gas will cost less in one month?). We compared these methods using a simulation study and a longitudinal survey in which 1,194 participants were randomly assigned sets of real-world forecasting questions from several domains in each format. Although probability elicitation is more common in the literature, our simulation and survey data showed that estimates of forecasters’ mean accuracy had lower measurement error when using quantile elicitation. The simulation study indicated this was due to idiosyncrasies that emerged from the discretization of the outcome distribution into fixed bins when using probability elicitation. Ultimately, this means that each additional question provides more information about forecasters’ skill under quantile elicitation than under probability elicitation. Despite its psychometric efficiency, quantile elicitation was more time-consuming and associated with more comprehension difficulty, as indicated by violations of monotonicity. However, these violations clearly declined over time, suggesting comprehension improved with practice. Further research is needed to make quantile elicitation more accessible to forecasters while maintaining its psychometric advantages.
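The two elicitation formats and the monotonicity check mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the forecaster's belief is modeled as a normal distribution, the Brier score and the pinball (quantile) loss stand in as example scoring rules, and all function names are hypothetical.

```python
from statistics import NormalDist

def probability_forecast(belief, threshold):
    """Probability elicitation: P(outcome < threshold) under the belief."""
    return belief.cdf(threshold)

def quantile_forecast(belief, prob):
    """Quantile elicitation: the value with `prob` of the belief's mass below it."""
    return belief.inv_cdf(prob)

def brier_score(p, outcome, threshold):
    """Squared error of a probability forecast (assumed scoring rule)."""
    event = 1.0 if outcome < threshold else 0.0
    return (p - event) ** 2

def pinball_loss(q, outcome, prob):
    """Pinball loss of a quantile forecast (assumed scoring rule)."""
    diff = outcome - q
    return prob * diff if diff >= 0 else (prob - 1.0) * diff

def is_monotone(quantiles):
    """True if quantiles reported at increasing probabilities never decrease."""
    return all(a <= b for a, b in zip(quantiles, quantiles[1:]))

# Hypothetical forecaster: gas price belief centered at $4.00 with SD $0.25.
belief = NormalDist(mu=4.00, sigma=0.25)

p = probability_forecast(belief, threshold=4.00)  # chance gas costs < $4.00
q = quantile_forecast(belief, prob=0.25)          # price with a 25% chance below

# Quantiles elicited at 5%, 25%, 50%, 75%, 95%; a coherent forecaster's
# answers must be non-decreasing.
reported = [quantile_forecast(belief, pr) for pr in (0.05, 0.25, 0.50, 0.75, 0.95)]
coherent = is_monotone(reported)
```

In this sketch both forecasts come from the same underlying belief, so any difference in how well the resulting scores measure skill would stem from the elicitation format and scoring rule, not from the forecaster.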
