Good predictive models are an asset in medicine. Ultimately, a model is a “simplification or approximation of reality”,[1,2] but by distilling complicated data into the chance of a given outcome occurring, such models support clinical and shared decision-making. Reliable model development and validation are crucial in creating a good predictive model, and this requires, among other things, formal statistical incorporation of each prognostically important variable into the model.[2-4]
We consider recent discussions surrounding the use of race and/or ethnicity (R&E) in medicine to be of utmost importance.[5-7] We add to these discussions by noting that the Fracture Risk Assessment Tool (FRAX®) is seriously problematic in its handling of R&E.
FRAX® formally incorporates many variables into a model aiming to predict a person’s 10-year risk of hip or major osteoporotic fracture (MOF). The variables include age, sex, body mass index, history of osteoporotic fracture, parental history of hip fracture, current smoking, current or previous glucocorticoid use for >3 months at a prednisolone (or equivalent) dose of ≥5 mg/day, rheumatoid arthritis, secondary osteoporosis, ≥3 servings of alcohol per day, and bone mineral density (BMD). In contrast to what FRAX® does for most countries, the FRAX® model for the USA (FRAX®-USA) offers different risk estimates based on R&E. Unfortunately, R&E were never formally evaluated as part of the modeling effort or incorporated into the FRAX®-USA model; thus, FRAX® failed to determine whether R&E have any independent predictive value when considered concurrently with all the other variables in FRAX®-USA. Instead, the base model was built using data from people who were predominantly White. If the user indicates the person is White, the base model risk is returned as is. For the options of Asian, Black, or Hispanic, FRAX®-USA applies to the base model post hoc “correction factors” (ranging from 0.43 to 0.64) that are de facto offsets derived outside the multivariable predictive model.[8-11] Such an approach falls below established standards for modeling and results in R&E having extreme influence on predicted fracture risk.
For example, a female 65 years of age with a low-trauma wrist fracture who weighs 140 pounds, measures 65 inches tall, and has a femoral neck T-score of −2.5 is estimated to have a 21% risk of MOF in the next 10 years if she is White. Changing her race to Black reduces her risk by more than half (to 9.6%) due to the correction factor. An impact this profound for R&E warrants serious skepticism. The female who is Black would not reach the risk of the female who is White even if one added any other single dichotomous risk factor, increased her age to 85, or decreased her T-score to −3.5. Indeed, she would have to have a T-score of −4.0 instead of −2.5 for her MOF risk to be 21% (and recall this is someone who has already had a low-trauma fracture). Such an outsized impact compared with other variables seems implausible.
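The arithmetic behind this example can be checked with a short sketch. It uses only the figures quoted above (21%, 9.6%, and the reported factor range of 0.43 to 0.64). The function names are hypothetical, and treating a correction factor as a plain multiplier on the 10-year probability is an assumption made purely for illustration: the FRAX® model details are unpublished, so the scale on which the factors are actually applied cannot be confirmed here.

```python
# Figures quoted in the text: 10-year MOF risk (percent) for the example patient.
RISK_IF_WHITE = 21.0
RISK_IF_BLACK = 9.6
# Range of FRAX-USA correction factors reported in the text.
FACTOR_MIN, FACTOR_MAX = 0.43, 0.64


def implied_ratio(adjusted_risk: float, base_risk: float) -> float:
    """Ratio of the adjusted estimate to the base-model estimate."""
    return adjusted_risk / base_risk


def scale_risk(base_risk: float, factor: float) -> float:
    """ASSUMPTION (illustrative only): treat the factor as a plain
    multiplier on the 10-year probability. The real mechanism is unpublished."""
    return base_risk * factor


ratio = implied_ratio(RISK_IF_BLACK, RISK_IF_WHITE)
in_reported_range = FACTOR_MIN <= ratio <= FACTOR_MAX
scaled_low = scale_risk(RISK_IF_WHITE, FACTOR_MIN)
scaled_high = scale_risk(RISK_IF_WHITE, FACTOR_MAX)

print(f"Implied ratio (Black vs. White estimate): {ratio:.3f}")
print(f"Within reported factor range: {in_reported_range}")
print(f"21% scaled by the range endpoints: {scaled_low:.2f}% to {scaled_high:.2f}%")
```

The implied ratio of roughly 0.46 lies inside the reported range of factors, which is consistent with the reduction being driven by the correction factor alone. It does not establish how the factor enters the calculation.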
Although the authors of FRAX® have described much about their methods and results, they have never published or otherwise publicly shared precise details of the models themselves (e.g., coefficients). Such lack of transparency leaves users with an incomplete picture, meaning attempts to clarify the role of the correction factors (let alone the model as a whole) are limited to piecing together descriptions that appear in the literature and running test cases as just described. The opaque aspect of the FRAX® models stands in stark contrast both to expected practice in predictive modeling and to the tenet of transparency in science.[12]
Beyond use of post hoc correction factors, the studies used (Table 1) [8,13-20] to create the FRAX®-USA R&E correction factors heighten our concerns about use of R&E. Indeed, the studies do not even demonstrate there is a need for correction factors, as none demonstrate that R&E are independently associated with fracture after controlling for all the other variables FRAX® considers. For example, none of these studies had sufficient control for BMD. Instead, most studies only controlled for age and sex, or gave age- and/or sex-stratified estimates. In addition, some studies employed flawed methods of ascertaining R&E (e.g., by surname). The International Osteoporosis Foundation and International Society for Clinical Densitometry have also reviewed these correction factors.[10] However, their review (in large part, a comparison of the correction factors used by FRAX® with subsequent literature that examined ratios of fracture incidence in people who are Asian, Black, or Hispanic compared with people who are White) is insufficient and unconvincing as corroboration of FRAX®’s use of correction factors. Even if there were evidence that R&E retained some association or predictive capacity after sufficient control for all other potentially mediating variables, R&E would still need to be formally incorporated, using accepted methods, into a multivariable modeling effort. Finally, even if R&E were independently associated with fracture after controlling for all other variables in FRAX®, such a finding could reflect important health disparities; assuming that R&E modify fracture risk at the level of the individual risks committing the ecological fallacy and could perpetuate health disparities rather than reduce them.
We emphasize the increasing (and overdue) recognition that R&E are poor surrogates for individual health outcomes and that they have fundamentally different implications for population health analyses than they do for predictive models or clinical algorithms.[5-7] We have heard the R&E correction factors defended with various arguments, ranging from those that effectively depend on the existence of biologic differences underpinning fracture risk to suggestions that the risk of falling (and therefore, by extension, fracture) varies by R&E. However, we consider these arguments either fundamentally flawed or demonstrative of poor insight into clinical epidemiology and best modeling practices. For example, even if fall risk is associated with R&E, why would one want to model R&E instead of fall risk? Surely, there would also be challenges with including fall risk as a variable, but this seems far better than attempting to use R&E as a surrogate for fall risk.
We are not the first to question the FRAX®-USA model’s use of R&E,[5,6] and even while preparing this work, another group called into question the FRAX®-USA model.[21] We are also aware of the unconvincing response from the FRAX® authors [22] to some of these critiques [5] that resulted in no change to FRAX®-USA. We are encouraged by the creation of an American Society for Bone and Mineral Research task force to reassess the inclusion of R&E in FRAX®.[23] We hope the concepts added by our critique will support collegial steps toward productive remedy, akin to recent removal of R&E from the prediction of vaginal birth after cesarean delivery and estimation of glomerular filtration rate.[24,25] We also urge the authors of FRAX® to provide a full, transparent accounting for all critical components of each of its models.[12]
Until then, we call for clinicians to reject the FRAX®-USA model’s correction factors that lead to different fracture risk estimates based on R&E. The best interim solution, if using FRAX®-USA, may be to use the base model for all people, though we regard a dedicated effort that seriously weighs the concepts delineated herein as critical, including exploration of reparameterization in a more contemporary and diverse population. Likewise, although we focused on FRAX®-USA, we note that FRAX® provides different estimates for different groups in other geographic areas (e.g., South Africa and Singapore); thus, such models should receive similar scrutiny. In summary, we call for a moratorium on use of R&E within FRAX®-USA.