ABSTRACT
Background:
Pulmonary hypertension is a complex syndrome that encompasses a diverse group of pathophysiologies predisposed by different environmental and genetic factors. It is not clear to what extent the universal risk classification schemes can be applied to cohorts in individual pulmonary hypertension centers with differing environmental backgrounds, genetic pools, and referral networks.
Aims:
To explore whether the recommended risk classification schemes could reliably be used for mortality prediction in an unselected pulmonary hypertension population of a tertiary pulmonary hypertension center.
Study Design:
A retrospective cross-sectional study.
Methods:
We retrospectively screened our hospital database for the patients with pulmonary hypertension between 2015 and 2022. The grouping of pulmonary hypertension was made as follows in accordance with current guidelines: Group 1: patients with pulmonary arterial hypertension, Group 2: patients with pulmonary hypertension associated with left heart disease, Group 3: patients with pulmonary hypertension associated with lung disease and/or hypoxia, and Group 4: patients with pulmonary hypertension associated with pulmonary artery obstructions. Then, we compared the predicted and observed mortality rates of four different risk classification schemes (REVEAL, REVEAL-Lite, ESC/ERS and COMPERA).
Results:
We identified 723 cases in our pulmonary hypertension database; the final study population consisted of 549 patients. The REVEAL, REVEAL-Lite and European Society of Cardiology/European Respiratory Society risk scores significantly underestimated the mortality risk in the low-risk stratum (5.3% vs. 1.9%, P < 0.001; 5.3% vs. 2.9%, P = 0.015 and 6.3% vs. 1%, P < 0.001, respectively) and overestimated the mortality risk in the high-risk stratum (11.8% vs. 25.8%, P < 0.001; 10.4% vs. 25.1%, P < 0.001 and 13.2% vs. 30%, P < 0.001, respectively). Although the COMPERA 4-strata model significantly underestimated the risk in the low- and intermediate-low-risk strata (4.9% vs. 1.5%, P < 0.001 and 6.8% vs. 2.8%, P = 0.001, respectively), it was accurate in the intermediate-high- and high-risk groups (10.1% vs. 8.7%, P = 0.592 and 15.6% vs. 22%, P = 0.384, respectively). The analyses limited to group 1 pulmonary hypertension patients gave similar results.
Conclusion:
The established risk classification schemes may not perform as well as expected in unselected pulmonary hypertension populations, and this may have important implications for management decisions. Tertiary centers should not uncritically accept the published risk prediction models and should consider modifying current risk scores according to their own patient characteristics.
INTRODUCTION
Pulmonary hypertension (PH) is a major health problem that is manifested by debilitating symptoms and reduced life expectancy. Rather than being a single disease, it is a complex syndrome that has been assigned to 5 groups by the World Health Organization, as follows: (1) pulmonary arterial hypertension (PAH), (2) PH associated with left heart disease, (3) PH associated with lung disease and/or hypoxia, (4) PH associated with pulmonary artery obstructions, and (5) PH with unclear and/or multifactorial mechanisms.1 Although this classification facilitates specific therapeutic recommendations according to groups formed based on similarities in pathophysiology and clinical manifestations, these groups themselves encompass a diverse group of pathophysiologies, each of which is predisposed by different environmental and genetic factors. Therefore, a reliable prognostic classification is essential to guide clinicians in the use of the existing limited treatment options, especially considering their high cost and unpredictable success rates.
International societies have recommended several risk classification schemes for treatment selection and escalation.2,3,4,5,6,7 Since a wide variety of socioeconomic, epidemiologic, environmental, and genetic causes determine the frequency of PH groups and their subgroups, PH prognosis varies widely across geographic regions.8 Every PH center has a specific referral network that influences the distribution of these groups and subgroups, and as a corollary, the baseline risk profile in their patient cohorts. Therefore, the extent to which the recommended risk classification schemes derived from international registries can be applied to individual PH centers remains unclear. For example, the European Society of Cardiology/European Respiratory Society (ESC/ERS) guidelines for PH2 recommend the use of the ESC/ERS risk classification scheme or the Registry to Evaluate Early and Long-Term PAH Disease Management (REVEAL)3 risk scores for prognostication and treatment selection; however, these risk scores have never been validated in a Turkish population. Moreover, they are neither recommended nor validated for PH groups other than group 1,2 although several variables used in the risk classification models are expected to have the same prognostic implications for all groups. In addition, some preliminary studies demonstrated the utility of these risk classification models in other PH groups, especially in group 4.9,10,11 Nevertheless, the robustness of the recommended risk scores warrants further elucidation in unselected populations and real-world scenarios.
In this study, we explored whether the recommended risk classification schemes could reliably be applied for mortality prediction in an unselected PH population of a tertiary PH center. We also attempted to identify the potential causes, if any, that may limit the predictive power of the current risk scores in a real-life setting.
MATERIALS AND METHODS
Study Protocol
The study was undertaken at Marmara University, Pendik Training and Research Hospital, a tertiary center for PH. A local ethical committee approved the study, and the study was undertaken in accordance with the principles of the Declaration of Helsinki. We retrospectively screened our hospital database for patients with PH between 2015 and 2022.
Only patients who were referred to our clinic with a suspicion of PH were enrolled in the study, while those with an established diagnosis of PH were excluded. A multidisciplinary PH team including a cardiologist, a cardiovascular surgeon, a pulmonologist, a rheumatologist, and a radiologist evaluated all patients. All patients underwent a comprehensive examination, which included medical assessment, transthoracic echocardiography, multi-slice computed tomography, ventilation/perfusion scintigraphy, right heart catheterization (RHC), and selective pulmonary angiography, as required. PH grouping was performed through the multidisciplinary PH team consensus according to this comprehensive evaluation and in accordance with the current guidelines, as follows:1,2 group 1: patients with PAH, group 2: patients with PH associated with left heart disease, group 3: patients with PH associated with lung disease and/or hypoxia, group 4: patients with PH associated with pulmonary artery obstructions, group 5: patients with PH with unclear and/or multifactorial mechanisms. The demographics and laboratory results were obtained via chart review, and the values at the index assessment were used for risk classification. RHC was performed via the right jugular vein, femoral vein, or antecubital vein using a Swan-Ganz catheter (Edwards Lifesciences, Irvine, USA), and the cardiac output was measured by the indirect Fick method. Vital status within the 5-year period after the index visit was checked via the national healthcare database.
Risk Classification
Risk classifications were applied as described previously.2,3,4,5,6,7 The REVEAL risk score was calculated according to the previously published point score system, as follows:3 score ≤ 6 = low risk, score 7-8 = intermediate risk, and score ≥ 9 = high risk. For the REVEAL Lite, score ≤ 5 = low risk, score 6-7 = intermediate risk, and score ≥ 8 = high risk.5 The Comparative, Prospective Registry of Newly Initiated Therapies for PH (COMPERA) 4-strata point score was calculated as described elsewhere:7 score < 1.5 = low risk, score 1.5-2.5 = intermediate-low risk, score 2.5-3.5 = intermediate-high risk, score ≥ 3.5 = high risk.5 The ESC/ERS risk score was calculated according to the ESC/ERS guidelines.2,6 A total score of < 1.5 indicated low risk, 1.5-2.5 indicated intermediate risk, and ≥ 2.5 indicated high risk.
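The cut-offs listed above can be written as a small lookup, shown here as a minimal sketch rather than the scoring code used in the study. The function names are hypothetical, and the handling of the shared boundary values (a score of exactly 2.5 is assigned to the higher stratum) is an assumption, because the text lists 2.5 in two adjacent ranges.

```python
def reveal_stratum(score: int) -> str:
    """REVEAL: <= 6 low, 7-8 intermediate, >= 9 high."""
    if score <= 6:
        return "low"
    if score <= 8:
        return "intermediate"
    return "high"


def reveal_lite_stratum(score: int) -> str:
    """REVEAL Lite: <= 5 low, 6-7 intermediate, >= 8 high."""
    if score <= 5:
        return "low"
    if score <= 7:
        return "intermediate"
    return "high"


def esc_ers_stratum(score: float) -> str:
    """ESC/ERS: < 1.5 low, 1.5 to < 2.5 intermediate, >= 2.5 high."""
    if score < 1.5:
        return "low"
    if score < 2.5:
        return "intermediate"
    return "high"


def compera4_stratum(score: float) -> str:
    """COMPERA 4-strata cut-offs as quoted in the text.

    Assumption: a score of exactly 2.5 falls in the intermediate-high stratum.
    """
    if score < 1.5:
        return "low"
    if score < 2.5:
        return "intermediate-low"
    if score < 3.5:
        return "intermediate-high"
    return "high"


if __name__ == "__main__":
    print(reveal_stratum(7), reveal_lite_stratum(7), compera4_stratum(3.0))
```

Only the mapping from a finished score to a stratum is shown; computing the scores themselves requires the component tables of the original publications.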
Statistical Analysis
The SPSS (version 26.0; SPSS Inc., Chicago, IL, US) statistics software was used for statistical analysis. Continuous variables were expressed as the mean ± standard deviation or median (interquartile range, IQR), and the categorical variables were expressed as counts (percentages). The normality of continuous variables was assessed by using the Shapiro-Wilk test and through visual inspection of histograms. The mortality rates were compared between groups 1 and 4 by the chi-square test. The diagnostic power of risk classification schemes was calculated and compared through receiver operating characteristic (ROC) analysis. The observed and expected mortality rates were compared by using the chi-square goodness-of-fit test. The expected mortality rates were obtained from the original studies.4,5,6,12 Mortality curves were constructed via Kaplan-Meier analysis. For all statistical analyses, P < 0.05 was considered to indicate statistical significance.
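As an illustration of the goodness-of-fit comparison, the sketch below tests an observed death count in one risk stratum against the mortality rate expected from the original study. It is not the SPSS procedure used by the authors. It assumes an uncorrected Pearson chi-square statistic on the two categories (dead, alive) with one degree of freedom, and the function names are hypothetical. The example input is the COMPERA intermediate-high stratum reported in the Results (12 deaths among 119 patients, expected rate 8.7%).

```python
import math


def chi2_p_value_1df(chi2: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def goodness_of_fit(deaths: int, n: int, expected_rate: float) -> tuple[float, float]:
    """Compare observed deaths/survivors with counts implied by expected_rate."""
    expected_deaths = n * expected_rate
    expected_alive = n - expected_deaths
    observed_alive = n - deaths
    chi2 = (
        (deaths - expected_deaths) ** 2 / expected_deaths
        + (observed_alive - expected_alive) ** 2 / expected_alive
    )
    return chi2, chi2_p_value_1df(chi2)


if __name__ == "__main__":
    chi2, p = goodness_of_fit(deaths=12, n=119, expected_rate=0.087)
    print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```

Under these assumptions the P value for this stratum comes out close to the reported P = 0.592, which suggests, but does not prove, that the published comparisons were of this form.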
RESULTS
We identified 723 cases in our PH database, 174 of which were excluded because of incomplete clinical, echocardiographic, or RHC data. Therefore, the final study population consisted of 549 patients: 147 (26.8%) in group 1, 53 (9.7%) in group 2, 6 (1.1%) in group 3, and 343 (62.5%) in group 4. Among the group 1 patients, there were 47 (31.9%) patients with idiopathic PAH, 42 (28.5%) with connective tissue diseases, 49 (33.3%) with congenital heart diseases, 4 (2.7%) with portopulmonary PH, and 5 (3.4%) with veno-occlusive disease. Among the group 4 patients, 143 (44.5%) had been operated on for chronic thromboembolic pulmonary hypertension (CTEPH), 135 (42.2%) had been deemed inoperable, and 42 (13.1%) had undergone pulmonary balloon angioplasty. The baseline characteristics of the patients are summarized in Table 1. The baseline echocardiographic and invasive hemodynamic parameters are presented in Table 2.
The mortality rates at 1, 3, and 5 years were 7.3% (40/549), 12.5% (69/549), and 14% (77/549), respectively, in the whole cohort. The 5-year mortality rate was 12.9% (19/147) in group 1 PH, 3.8% (2/53) in group 2 PH, 0% (0/6) in group 3 PH, and 16.3% (56/343) in group 4 PH. As groups 2 and 3 had a limited number of patients, a statistical comparison for mortality was meaningful only between the mortality rates of groups 1 and 4, which revealed no significant difference (12.9% vs. 16.3%, respectively, P = 0.338).
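The comparison between groups 1 and 4 can be reproduced from the counts given above. The following is a minimal sketch of a Pearson chi-square test on a 2x2 table, assuming no continuity correction; the function name is hypothetical and this is not the authors' SPSS output.

```python
import math


def chi2_two_groups(deaths_a: int, n_a: int, deaths_b: int, n_b: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, uncorrected) for two mortality proportions."""
    observed = [
        [deaths_a, n_a - deaths_a],
        [deaths_b, n_b - deaths_b],
    ]
    row_totals = [n_a, n_b]
    total = n_a + n_b
    col_totals = [deaths_a + deaths_b, total - deaths_a - deaths_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # 5-year deaths: 19/147 in group 1 and 56/343 in group 4.
    chi2, p = chi2_two_groups(19, 147, 56, 343)
    print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```

With these inputs the uncorrected test gives a P value of about 0.338, in line with the figure reported in the text.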
The baseline risk estimates according to different risk calculation schemes are presented in Table 3. When the REVEAL, REVEAL Lite, ESC/ERS, and COMPERA risk classification scores were considered as continuous variables in the whole population, their predictive power for 1-year mortality was limited, as their area under the curve (AUC) values were 0.638 [95% confidence interval (CI), 0.541-0.735; P = 0.004], 0.619 (95% CI, 0.525-0.712; P = 0.012), 0.579 (95% CI, 0.511-0.698; P = 0.094), and 0.605 (95% CI, 0.511-0.698; P = 0.027), respectively. The predictive values for 5-year mortality were even worse (AUC, 0.480, 0.678, 0.528, and 0.539, respectively). These values were significantly lower than the predictive value of a single baseline N-terminal pro-B-type natriuretic peptide (NT-proBNP) measurement (AUC, 0.759; 95% CI, 0.705-0.820; P < 0.001) or 6-minute walk test (6MWT) (AUC, 0.782; 95% CI, 0.729-0.836; P < 0.001) for 5-year mortality (P < 0.001 for all comparisons), but these two parameters showed a low diagnostic accuracy for 1-year mortality (AUC, 0.540 and 0.518, respectively). When the analyses were repeated according to specific PH subgroups (groups 1 and 4), the prediction capabilities were found to be similar. Namely, the predictive power for 1-year mortality in group 1 patients was low for the REVEAL (AUC, 0.581; 95% CI, 0.368-0.795; P = 0.391), REVEAL Lite (AUC, 0.584; 95% CI, 0.395-0.773; P = 0.378), ESC/ERS (AUC, 0.497; 95% CI, 0.302-0.692; P = 0.975), and COMPERA risk classification scores (AUC, 0.570; 95% CI, 0.377-0.764; P = 0.458) when they were considered as continuous variables. The predictive values for 5-year mortality were similarly nonsignificant (AUC, 0.591, 0.615, 0.572, and 0.644, respectively).
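For readers less familiar with the AUC, the sketch below shows one common way to compute it when a risk score is treated as a continuous variable: the Mann-Whitney estimate, i.e., the probability that a randomly chosen patient who died has a higher score than a randomly chosen survivor, with ties counted as one half. The scores in the example are invented for illustration and are not study data; the function name is hypothetical.

```python
def auc_mann_whitney(scores_dead: list[float], scores_alive: list[float]) -> float:
    """AUC as the fraction of (dead, alive) pairs ranked correctly by the score."""
    if not scores_dead or not scores_alive:
        raise ValueError("both outcome groups must contain at least one score")
    concordant = 0.0
    for dead_score in scores_dead:
        for alive_score in scores_alive:
            if dead_score > alive_score:
                concordant += 1.0
            elif dead_score == alive_score:
                concordant += 0.5  # ties contribute half a pair
    return concordant / (len(scores_dead) * len(scores_alive))


if __name__ == "__main__":
    # Hypothetical risk scores, NOT taken from the study.
    dead = [9, 7, 6]
    alive = [5, 6, 8, 4]
    print(f"AUC = {auc_mann_whitney(dead, alive):.3f}")
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, which is the reference point for interpreting the values of roughly 0.5 to 0.65 reported above.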
One-year survival and mortality rates according to the REVEAL, REVEAL Lite, and ESC/ERS risk classifications and the COMPERA 4-strata model are presented in Figures 1 and 2. For the REVEAL risk score, the low-, intermediate-, and high-risk tiers showed mortality rates of 5.3% (15/282), 3.7% (3/81), and 11.8% (22/186), respectively, whereas these rates were 5.3% (15/281), 7.6% (8/105), and 10.4% (17/163) for the REVEAL Lite score. For the ESC/ERS risk classification, the low-, intermediate-, and high-risk classes displayed a similar pattern, with mortality rates of 6.3% (7/111), 5.7% (19/332), and 13.2% (14/106), respectively. According to the COMPERA 4-strata model, the mortality rates in the low-, intermediate-low-, intermediate-high-, and high-risk groups were 4.9% (10/206), 6.8% (13/192), 10.1% (12/119), and 15.6% (5/32), respectively. When analyses were limited to group 1 patients, the REVEAL low-, intermediate-, and high-risk tiers showed mortality rates of 5.1% (4/78), 5.0% (1/20), and 10.2% (5/49), respectively, whereas these rates were 3.9% (3/76), 14.3% (4/28), and 7.0% (3/43) for the REVEAL Lite score. For the ESC/ERS risk classification, the low-, intermediate-, and high-risk classes displayed a similar pattern, with mortality rates of 8.3% (2/24), 6.1% (6/98), and 8.0% (2/25), respectively. According to the COMPERA 4-strata model, the mortality rates in the low-, intermediate-low-, intermediate-high-, and high-risk groups were 5.7% (3/53), 5.6% (3/54), 9.4% (3/32), and 12.5% (1/8), respectively.
The REVEAL, REVEAL Lite, and ESC/ERS risk scores significantly underestimated the mortality risk in the low-risk stratum (5.3% vs. 1.9%, P < 0.001; 5.3% vs. 2.9%, P = 0.015 and 6.3% vs. 1%, P < 0.001, respectively) and overestimated the mortality risk in the high-risk stratum (11.8% vs. 25.8%, P < 0.001; 10.4% vs. 25.1%, P < 0.001 and 13.2% vs. 30%, P < 0.001, respectively). Although the COMPERA 4-strata model significantly underestimated the risk in the low- and intermediate-low-risk strata (4.9% vs. 1.5%, P < 0.001 and 6.8% vs. 2.8%, P = 0.001, respectively), it was accurate in the intermediate-high- and high-risk groups (10.1% vs. 8.7%, P = 0.592 and 15.6% vs. 22%, P = 0.384, respectively). When the analyses were limited to group 1 PH patients, the REVEAL and ESC/ERS risk scores significantly underestimated the mortality risk in the low-risk stratum (5.1% vs. 1.9%, P < 0.001 and 3.9% vs. 1%, P < 0.001, respectively). The REVEAL, REVEAL Lite, and ESC/ERS risk scores overestimated the mortality risk in the high-risk stratum (10.2% vs. 25.8%, P < 0.001; 7.0% vs. 25.1%, P < 0.001 and 8.0% vs. 30%, P < 0.001, respectively). Although the COMPERA 4-strata model significantly underestimated the risk in the low-risk stratum (5.7% vs. 1.5%, P = 0.013), it was accurate in the intermediate-low-, intermediate-high-, and high-risk groups (5.6% vs. 2.8%, P = 0.220; 9.4% vs. 8.7%, P = 0.892 and 12.5% vs. 22%, P = 0.517, respectively).
DISCUSSION
When prediction tools derived from randomized controlled trials are applied in a real-life setting, it should be assured that the target population is similar to the population from which the prediction tool was derived. If the frequency of a predicted outcome or the distribution of baseline characteristics is significantly different between these two population groups, then the prediction tool may not work as reliably. PH is a particularly difficult disorder for the widespread application of such tools because both the outcomes and demographics of PH patients may change across centers owing to the difference in environmental factors, genetic pool, and referral bias.13
First, according to the Bayesian principles, all diagnostic tools operate on pre-test probability. Therefore, the average mortality rate of a cohort strongly influences the discriminatory power of a risk classification tool. As multiple factors determine the frequency of PH groups and their subgroups, which, in turn, govern the overall mortality rate, it is not surprising that PH mortality indicated a significant variability according to the studied population. The respective mortality rates at 1 and 3 years were 10% and 25% in the REVEAL registry,14,15 8% and 21% in PH Association Registry (PHAR),16 10.6% and 31.7% in the COMPERA study,17 and 15% and 29% in the Swedish PAHR (SPAHR).18 The corresponding mortality rates were significantly lower in our cohort (7.3% and 12.5%, respectively), which is one of the most important factors to undermine the utility of the risk classification systems. The lower mortality rates may partly be explained by the improved PH care, as several of these abovementioned studies enrolled patients approximately a decade ago and showed decreasing mortality due to PH.19,20 Correspondingly, Kaymaz et al.21 recently reported a lower (19.4%) 3-year mortality rate when compared to the rates reported in the abovementioned studies from another tertiary center in Turkey, despite having a different PH group distribution than ours. An alternative explanation for this may be the different genetic or environmental factors peculiar to Turkish PH patients, which warrants elucidation in further studies. These results suggest that PH centers should start by comparing their average mortality rates with those observed in the risk stratification studies before using specific risk classification schemes.
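The dependence on pre-test probability can be made concrete with Bayes' rule. The sketch below computes the probability of death given a "high-risk" label for a hypothetical classifier at two baseline mortality rates: the 1-year mortality of this cohort (7.3%) and that of the REVEAL registry (10%). The sensitivity and specificity values are assumptions chosen purely for illustration; they are not estimates from this study or from any of the cited scores.

```python
def positive_predictive_value(sensitivity: float, specificity: float, pretest: float) -> float:
    """P(death | classified high-risk) from Bayes' rule."""
    true_positive = sensitivity * pretest
    false_positive = (1.0 - specificity) * (1.0 - pretest)
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    # Hypothetical test characteristics (assumed, not measured).
    sensitivity, specificity = 0.70, 0.70
    for label, pretest in [("this cohort", 0.073), ("REVEAL registry", 0.10)]:
        ppv = positive_predictive_value(sensitivity, specificity, pretest)
        print(f"{label}: pre-test {pretest:.1%} -> post-test {ppv:.1%}")
```

With identical test characteristics, the same "high-risk" label implies a lower absolute risk in the cohort with lower baseline mortality, which is one way to read the overestimation seen in the high-risk strata.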
Second, the distribution of risk grades may differ across cohorts. It has been observed that different age-associated PH phenotypes arise due to changes in the demographics and epidemiology over the past years, which may have resulted in a change in the etiology, pathophysiology, and prognosis in the lower-risk stratum when compared to that in the higher-risk stratum.22 As the distribution of these different risk strata in a cohort is strongly influenced by the referral network of that particular PH center, a particular risk stratum may show significant deviations from those reported previously. The studies reported from our country demonstrated important differences among different centers in the baseline characteristics, and therefore, in the distribution of different risk grades.21,23,24,25,26 Assessing these different subgroups with the same tool may result in an under- or overestimation of specific risk strata. Indeed, the REVEAL and ESC/ERS risk scores systematically under- and overestimated the risk of mortality in our low- and high-risk strata, respectively, which has important practical implications, particularly in the management of group 1 PH patients. Since the ESC guidelines recommend triple-combination therapy, including a parenteral prostacyclin analog for high-risk patients and oral double-combination therapy for low- and intermediate-risk patients, any miscalculation of the baseline risk may lead to inappropriate treatment decisions.2 Furthermore, it is unclear whether the continuation of treatment in patients achieving an apparently low-risk status with their initial PAH therapy is reassuring, as the mortality risk of low-risk strata was significantly underestimated in our cohort. Notably, the COMPERA 4-strata model predicted mortality reasonably well in the upper-intermediate and high-risk subgroups.
This 4-strata model was introduced to detect a higher risk subgroup in the intermediate risk class in 3-strata models and claimed to be more sensitive to changes in risk from the baseline until follow-up.2,7,27 Our data seem to support that this incremental stratification has real clinical relevance, especially for group 1 PH patients.
Third, the parameters employed in the risk classification schemes may not have the same meaning in a specific population when compared to the population they were derived from. For example, it is known that patients with PH due to congenital heart disease are relatively younger and have a higher exercise capacity, lower NT-proBNP levels, a better hemodynamic profile, and a longer stable clinical course when compared to those in the other PAH subgroups.28,29,30 A sub-study of the REVEAL registry suggested that several prognostic parameters used in the risk-scoring systems and their respective hazard ratios were significantly different in patients with PH due to the presence of congenital heart diseases when compared to that in patients with idiopathic PAH.30 This observation indicated that any deviation from the distribution of PH in the original studies may endanger the use of risk classification in another cohort. Finally, the discriminative power of prognostic parameters is time-sensitive. For instance, NT-proBNP and 6MWT demonstrated a better predictive value for 5-year mortality when compared to that for 1-year mortality in our study. This observation indicates that the cut-off values and time-sensitivity of the model parameters may need to be redefined for a target population.
Our study has several limitations. As our study was undertaken at a tertiary center exclusively specialized in pulmonary interventions for CTEPH, our cohort may not match the general PH population. The groups 2 and 3 PH patients were significantly under-represented, whereas group 4 PH patients were over-represented. The observed lower mortality rate may be associated with the inclusion of a higher percentage of patients with CTEPH, although it should be noted that (1) the mortality in group 1 PH patients was still lower than that in the previous registries; (2) the mortality in the CTEPH subgroup was not significantly lower than that in the PAH subgroup; rather, it was numerically higher. The majority of patients with group 4 PH in our study had residual CTEPH, and the corresponding PH-specific drug treatment was used in almost 80% of these patients. Therefore, the lower prediction capabilities of the risk-scoring systems cannot be explained solely by the CTEPH patients with presumably lower mortality risk. Nevertheless, the inclusion of operated CTEPH patients and their substantial proportion in a cohort may have affected the predictive power of the risk classification schemes, which are not specifically designed for use in these patients. The risk-scoring systems have been less studied in CTEPH patients, and the practical impact of risk stratification on treatment selection in CTEPH is unclear. However, although several of the aforementioned studies mainly focused on group 1 PH mortality, some of them included CTEPH patients.9,10,18,31 Benza et al.9 applied the REVEAL risk score in patients with group 4 PH who received riociguat treatment. The authors found that the REVEAL risk score at the baseline and week 16 and the change in the REVEAL risk score from that at the baseline predicted survival and clinical worsening-free survival.
They concluded that the REVEAL risk score in patients with inoperable or persistent/recurrent CTEPH had a utility in indications beyond group 1 PH. The same authors further evaluated REVEAL 2.0 in another study10 and found that REVEAL 2.0 may have utility in predicting outcomes and monitoring treatment response in patients with inoperable or persistent/recurrent CTEPH. In an analysis by Delcroix et al.,31 the ESC/ERS risk assessment seemed applicable to patients with CTEPH under medical therapy. Therefore, the use of the aforementioned risk scores in group 4 PH was at least partially externally validated. As the number of patients in group 1 PH was limited, we cannot exclude the possibility that the diagnostic accuracy of the risk-scoring models may have decreased due to the low statistical power of the study. Another limitation of this study is related to the disease time course and treatments. As the components of the risk score and treatments change over time, the mortality risk inevitably changes. We were unable to account for the changes in clinical variables and treatment that may have affected our risk calculations. However, it should be noted that this limitation is inherent to the original risk scores, which do not include treatment effects as input in their models.
In conclusion, the established risk classification schemes may not perform as well as expected in real-life scenarios due to multiple factors. Therefore, tertiary centers should not uncritically accept the published risk prediction models and should consider modifying the current risk scores according to individual patient characteristics.