NESPS - The Whitaker Classification of Craniosynostosis Outcomes: An Assessment of Inter-Rater Reliability for Frontal-Orbital Surgery

Back to 2016 Annual Meeting

The Whitaker Classification of Craniosynostosis Outcomes: An Assessment of Inter-Rater Reliability for Frontal-Orbital Surgery
Wen Xu, Ari Wes, Patrick Gerety, Jing Li, Phuong Nguyen, Scott Bartlett, Jesse Taylor.
Children's Hospital of Philadelphia, Philadelphia, PA, USA.

Background: The Whitaker classification is a simple and widely used system for describing aesthetic outcomes after craniosynostosis surgery. The purpose of this study is to evaluate its inter-rater reliability for patients who have undergone frontal-orbital surgery.
Methods: A retrospective review of craniosynostosis patients who underwent surgical intervention at a tertiary referral center was conducted. Inclusion criteria were: single-suture craniosynostosis, surgical intervention before age two years, and photographs taken between age 5 and 18 years and prior to revisions. Four craniofacial surgeons (PN, PG, JT, SB) reviewed the Whitaker classification for craniosynostosis revision and independently rated a set of 30 patient photographs. The Whitaker classification ranges from I (“excellent result, no revisions necessary”) to IV (“unacceptable result, repeat craniotomy and/or fronto-orbital reshaping necessary”). Inter-rater reliability was assessed with Cohen’s kappa. Additional statistical analysis included one-way ANOVA and Wilcoxon rank sum test.
Results: The study included 30 single-suture craniosynostosis patients; 13 (43%) were male and the average age was 13 years. The κ value for all four raters was 0.1353 (p=0.0033), indicating “slight agreement”. Pairwise κ values ranged from 0.1353 (PN vs. JT) to 0.4059 (PG vs. SB) (Table 1). The average rating for the set of 30 photos differed among the four raters (1.8±0.7 vs. 2.4±0.9 vs. 2.6±0.8 vs. 2.3±0.8, p=0.0007) (Figure 1). Because two of the raters were also the primary surgeons for a subset of the subjects, we investigated whether a surgeon scored his own patients differently. One rater gave his own patients an average of 2.9±1.0 vs. the other raters’ average of 2.1±0.6 (p=0.0939), while another rater gave his own patients an average of 2.8±0.8 vs. the other raters’ average of 2.1±0.6 (p=0.005). Finally, patients who underwent subsequent cranioplasty and/or fronto-orbital advancement tended to have higher Whitaker scores (subsequent procedures 2.3±0.6 vs. no subsequent procedures 1.7±0.6, p=0.072).
Conclusions: Although the Whitaker system is a powerful tool for communicating aesthetic outcomes, in terms of the perceived need for revision, in craniosynostosis patients at long-term follow-up, it suffers from poor inter-rater reliability. Craniofacial surgeons should therefore develop new evaluation tools that produce more consistent inter-rater agreement.
Table 1. Pairwise κ values between raters
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   1
Rater 2   0.1353    1
Rater 3   0.1752    0.2711    1
Rater 4   0.3735    0.2963    0.4059    1
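
For readers who want to see how a pairwise agreement table of this kind can be produced, the following is a minimal sketch of unweighted Cohen's kappa for two raters, with a helper that maps a value to the commonly cited Landis and Koch descriptive labels (the source of terms such as "slight agreement"). It is not the authors' analysis code: the function names, the rater labels, and the example ratings are all hypothetical, and the abstract does not state whether weighted or unweighted kappa was used or how the four-rater value was obtained.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must score the same non-empty set of items")
    n = len(a)
    # Observed agreement: fraction of items given the same grade by both raters.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over grades of the product of marginal frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical grade")
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Descriptive label using the Landis and Koch benchmarks.

    Assumption: each band's upper limit (0.20, 0.40, ...) is inclusive.
    """
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical Whitaker grades (1 to 4) for eight photographs; NOT study data.
ratings = {
    "rater_a": [1, 2, 2, 3, 1, 4, 2, 3],
    "rater_b": [1, 2, 3, 3, 2, 4, 2, 2],
    "rater_c": [2, 2, 3, 4, 1, 3, 1, 3],
}

# One kappa per pair of raters, i.e. the lower triangle of a table like Table 1.
for r1, r2 in combinations(ratings, 2):
    k = cohen_kappa(ratings[r1], ratings[r2])
    print(f"{r1} vs {r2}: kappa = {k:.4f} ({landis_koch(k)})")
```

Because the Whitaker grades are ordinal, a weighted kappa that penalizes a I-versus-IV disagreement more than a I-versus-II disagreement would be a reasonable alternative; the unweighted form is shown here only because it is the simplest to follow.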
