➢ Several different radiographic grading systems for lumbar disc degeneration have been developed; however, none of these systems have been established as a single standard by the spine community.
➢ There is a general consensus regarding the standard for evaluating disc degeneration with use of magnetic resonance imaging (MRI); however, many systems continue to be developed and used, complicating comparisons among studies.
➢ Although of limited clinical application, histological analysis is the most sensitive measure of disc degeneration, particularly for early-stage degenerative changes.
➢ Few studies have assessed radiographic or MRI-based lumbar disc degeneration grading systems by comparing the results against histological measurements.
➢ Studies focusing on the reliability and standardization of imaging techniques for the evaluation of disc degeneration will provide useful data for clinicians and researchers alike.
Low back pain affects >54 million patients annually in the United States alone1. Many patients initially consult general practitioners or orthopaedic surgeons who do not specialize in spine care1. Therefore, knowledge of diagnostic methods for low back pain, including their limitations, may serve as an invaluable tool for all physicians. Imaging is routinely used for the diagnosis of low back pain, with radiographs reportedly being used for 12% (n = 16,567)2 to 31% (n = 17,148)3 of patients and magnetic resonance imaging (MRI) being used for 16% (n = 17,148)3 to 21% (n = 13,760)4 of patients. For the diagnosis of disorders associated with disc degeneration, a variety of grading systems involving both radiographs and MRI have been developed. Different grading systems have been proposed for research and clinical purposes. However, rather than quantifying disc degeneration, these measurements are often combined with subjective clinical symptoms for the purpose of making treatment decisions.
The major limitation of spinal imaging is the poor correlation between imaging findings and clinical pain5,6. Instead of revealing a clear source of pain, such as an infection or a tumor, imaging results commonly reveal degenerative changes7,8. Furthermore, both false-positive and false-negative diagnoses are made with use of current imaging techniques and classification systems. That is, severe degeneration is routinely identified in otherwise asymptomatic individuals6,9-13 whereas severe pain can be reported by patients with normal imaging findings14. To establish the accuracy and sensitivity of radiographs and MRI, an objective and sensitive independent measurement, such as histological analysis, is needed as a baseline for comparison. While histological analysis is clearly limited by its invasiveness, it may provide the most sensitive measurement of degeneration as it provides a direct assessment at the cellular level15. As such, histological analysis may offer a unique role for research in quantifying and comparing the sensitivities of current imaging techniques as well as in serving as a baseline for newly developed technologies.
Despite the limitations of radiographic and MRI-based systems, they remain by and large the predominant tools for the diagnosis of disc degeneration and the planning for its treatment. Consequently, physicians at large, including general practitioners, stand to benefit from the knowledge base that has been developed to take advantage of imaging technologies. Moreover, in order to maximize the inherent potential of these systems, a qualitative and quantitative assessment of the knowledge base is needed. Therefore, the purposes of the present review are (1) to identify commonly used radiographic, MRI, and histological grading systems for lumbar degenerative disc disease; (2) to determine, for each of the three types of systems, whether one grading system has been established as the standard in the community; and (3) to determine if histological analysis has been used to quantify the sensitivity and accuracy of radiographic or MRI analysis.
Materials and Methods
Two systematic MEDLINE searches were performed to address the goals of the present study. First, a systematic MEDLINE search was performed with use of the search terms “lumbar degeneration” and “classification” to assess the available classification systems for the lumbar spine. Articles describing classification systems for the grading of intervertebral disc degeneration with use of radiographs, MRI scans, and histological studies were thus identified. Once an article was found, the “Related Citations” option of PubMed was used for each classification criterion to identify additional articles that had been missed with the initial search criteria. In addition, the bibliography of each identified article was scanned, and articles that were deemed relevant were then obtained. Only lumbar classification systems that evaluated the intervertebral discs were included. Classification systems that did not evaluate the discs, such as those used for the classification of facet joint and end plate degeneration, were excluded. Classification modalities that could only be performed in vitro, such as those involving macroscopic grading, were excluded. Computed tomography (CT)-based classification systems were excluded as CT has largely been replaced by MRI, which has superior resolution for visualizing discs16. During the investigation of these articles, any studies that compared imaging modalities were noted.
Once the articles had been identified, the popularity of each classification system was assessed by identifying the earliest dated article that established the specific classification system. Then, Google Scholar and Web of Science were used to determine the number of times the original article describing the classification system had been cited. In addition, among the research articles that cited the original classification systems, the number of National Institutes of Health (NIH)-funded studies that used each classification system since 2008 was determined with use of PubMed Central (PMC) to identify studies of higher quality.
In order to determine whether histological analysis had been used to quantify the sensitivity of either radiographs or MRI scans for detecting disc degeneration, two additional search criteria were used. First, a systematic MEDLINE search was performed with use of the search terms “X-ray,” “lumbar disc degeneration,” and “histology.” Second, a systematic MEDLINE search was performed with the search terms “MRI,” “lumbar disc degeneration,” and “histology.” In order to identify studies that were not found with use of the above search criteria, the references cited in each identified article were also reviewed. Only studies evaluating human lumbar intervertebral discs with use of either radiographs and histological measurements or MRI and histological measurements were included.
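The two histology-related searches described above can be written out as explicit query strings. The following is a minimal sketch only: the helper name build_query is hypothetical, and both the phrase quoting and the combination of terms with AND are assumptions, as the exact query syntax is not stated here.

```python
def build_query(terms):
    """Join search terms into one boolean query string.

    Assumption: each term is searched as a quoted phrase and all terms
    must match (AND). The exact syntax used for the searches is not
    given in the text, so this is illustrative only.
    """
    return " AND ".join(f'"{term}"' for term in terms)


# Search terms exactly as listed in the Materials and Methods.
RADIOGRAPH_QUERY = build_query(["X-ray", "lumbar disc degeneration", "histology"])
MRI_QUERY = build_query(["MRI", "lumbar disc degeneration", "histology"])

print(RADIOGRAPH_QUERY)
print(MRI_QUERY)
```

Recording the literal query strings in this way makes a search easier to repeat and to compare across reviews.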
Imaging-Based Grading Systems for Lumbar Degeneration
The initial search, that is, “lumbar degeneration” and “classification,” yielded 2196 articles involving radiographs and 3964 articles involving MRI scans. A total of twenty-seven grading systems were identified, including eleven systems that utilized radiographs17-27 and seventeen systems that utilized MRI scans6,17-19,25,28-39, with one of these systems utilizing both radiographs and MRI scans25.
The radiographic grading systems ranged from two-tier to six-tier systems and were mostly based on the evaluation of indirect disc features, including disc height, end plate sclerosis, and osteophytes (Table I). None of the eleven radiographic classification systems has been widely used as the so-called gold standard. Specifically, the classification systems developed by Mimura et al.24, Kellgren and Lawrence21, Gordon et al.20, and Lane et al.22 each have been cited >100 times in both Google Scholar and Web of Science (Table I). The two most recently developed systems, proposed by Wilke et al.27 and Madan et al.23, have been cited much less frequently, with fewer than sixty studies citing each of these systems, although one has to consider the shorter span of time for which they have been available. The only radiographic grading systems that were used in NIH-funded studies were those described by Lane et al.22 (six studies), Wilke et al.27 (two studies), Weiner et al.26 (one study), Benneker et al.17 (one study), and Frobin et al.19 (one study).
The MRI-based grading systems included ratings ranging from 0 to 2 and from 1 to 8 and were based on characteristics intrinsic to the disc, such as nuclear signal, disc height, and the distinction between the nucleus and the anulus (Table II). Among the seventeen MRI classification systems, the system described by Pfirrmann et al.32 (Fig. 1) was the dominant system cited for the classification of lumbar degeneration (Table II). Although the systems proposed by Butler et al.29, Schneiderman et al.34, and Tertti et al.38 have all been widely cited, the system proposed by Pfirrmann et al.32 has been cited more than twice as many times as any other system (933 and 440 times according to Google Scholar and Web of Science, respectively). In addition, the system described by Pfirrmann et al.32 was used in thirty-six NIH-funded studies, whereas that described by Raininko et al.33 (the only other MRI-based system cited in NIH-funded studies) was cited in only three NIH-funded studies.
Histological Grading Systems for Lumbar Degeneration
The search terms for histological grading systems yielded five unique histologically based classification criteria15,31,40-42 from 967 papers that were reviewed, including four in vitro systems (Boos et al.15, Gunzburg et al.31, Rutges et al.42, and Berlemann et al.40) and one in vivo system (Weiler et al.41). The in vitro system proposed by Boos et al.15 is the most widely used (Fig. 2), cited 542 times according to Google Scholar and 297 times according to Web of Science (Table III). The next most popular system, proposed by Gunzburg et al.31, was only cited 110 and fifty-nine times according to Google Scholar and Web of Science, respectively. Only the system described by Boos et al.15 was used in NIH-funded studies.
Histological Grading Systems Versus Radiographic or MRI-Based Systems
The systematic MEDLINE searches using the terms (1) “X-ray,” “lumbar disc degeneration,” and “histology,” and (2) “MRI,” “lumbar disc degeneration,” and “histology” resulted in a total of 921 articles. Of these, thirty-five articles14,15,31,38,40,43-72 were identified that utilized both radiographs and histological studies, MRI scans and histological studies, or all three techniques. Nineteen articles evaluated disc degeneration with use of imaging and histological studies, whereas the remaining sixteen articles analyzed histological variables unrelated to degeneration. Of the nineteen studies that analyzed degeneration, nine31,38,44,50,57,67,68,70,71 analyzed degeneration by comparing images (radiographs and/or MRI scans) with histological measurements made from tissues.
Three studies directly compared degeneration as seen on radiographs with that seen on histological samples; all three studies involved the evaluation of cadaveric discs44,57,70. In the study by Quint and Wilke44, histological measurements identified a larger number of discs as being degenerative as compared with radiographic measurements. That study involved the use of a modified histological scoring system based on the original microscopic classification system described by Vernon-Roberts73. Specifically, the modified histological scoring system included grades ranging from Grade A to Grade D and was based on the number of reactive chondrocytes; the number and type of fissures, clefts, and/or splints; the number and pattern of areas of necrosis; and the extent of damage to the anular layers (number of rings)44. On the basis of histological analysis, fourteen discs were identified as degenerative according to the modified Vernon-Roberts classifications (with twelve discs classified as Grade B and two classified as Grade C), whereas only four discs were identified as degenerative according to the radiographic criteria of Mimura et al.24 (with two discs classified as Grade B and two classified as Grade C). In addition, the authors concluded that the microscopic criteria were preferable to radiographic criteria for assessing early-stage degenerative disc disease. Schiebler et al.57 also found more abnormal findings, such as Schmorl nodes, on histological analysis than on radiographs or MRI scans. Finally, Siepe et al.70 found that a loss of disc space height on radiographs was not associated with the histological measures of degeneration as described by Boos et al.15 (r = 0.3) but was associated with the MRI degeneration grade as described by Pfirrmann et al.32 (r = 0.79). This finding suggests that histological changes may occur before radiographic loss of disc space height.
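As an illustration of what the counts from Quint and Wilke44 imply, the following worked bound treats histological grade as the reference standard and assumes that both methods were applied to the same set of discs. Under those assumptions, the fourteen histologically degenerative discs are the sum of true-positive (TP) and false-negative (FN) radiographic results, and no more than four of them can be true positives because only four discs were graded as degenerative on radiographs. The bound is our own derivation for illustration and is not a value reported in that study.

```latex
\text{sensitivity}_{\text{radiograph}}
  = \frac{TP}{TP + FN}
  \le \frac{4}{14}
  \approx 0.29
```

The actual sensitivity would be lower if any of the four radiographically degenerative discs were graded as normal on histological analysis.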
Seven studies directly compared MRI with histological analysis for the evaluation of degeneration31,38,50,57,67,68,71; all but one of these studies involved the use of cadaveric specimens. Yu et al. compared histological findings in discs with MRI findings and concluded that MRI can accurately reveal anular tears67,68,71. In addition, Tertti et al.38 concluded that no obvious differences in disc structure could be found between histological analysis and MRI scans. In contrast, several studies have shown that histological analysis can detect changes not found on MRI scans. Gunzburg et al.31 found that a normal MRI scan may occur in the presence of a considerable reduction in nuclear material and reported that 48% of discs with abnormal histological features had a normal MRI scan. Similarly, Schiebler et al.57 reported more abnormal findings on histological analysis than on four different MRI sequences, including more Schmorl nodes and tears. Furthermore, in the one study that involved the use of surgical samples instead of cadaveric samples, Cevei et al.50 found that histological analysis identified more cases of lower-grade degeneration than MRI did (Type 1, 10.63% compared with 8.5%; Type 2, 27.65% compared with 10.63%); however, MRI identified more cases of higher-grade degeneration (80.65% compared with 57.44%).
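The three radiograph-based comparisons and the seven MRI-based comparisons add up to nine unique studies rather than ten because one study appears in both groups. The short sketch below checks this with the reference numbers cited above; the variable names are ours.

```python
# Reference numbers of the studies cited above that compared imaging
# with histological findings.
radiograph_vs_histology = {44, 57, 70}
mri_vs_histology = {31, 38, 50, 57, 67, 68, 71}

# Studies counted in both groups, and the set of unique studies overall.
in_both_groups = radiograph_vs_histology & mri_vs_histology
unique_studies = radiograph_vs_histology | mri_vs_histology

print(sorted(in_both_groups))  # [57], the study by Schiebler et al.
print(len(unique_studies))     # 9
```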
Back pain is often frustrating and challenging to physicians because of the frequent disconnect between patient symptomatology and imaging findings. Unlike hip or knee pain, for which imaging findings frequently correlate with the severity of the etiology, low back pain may not demonstrate an etiology on imaging studies or may not parallel the severity of the findings observed on imaging studies. These discrepancies between imaging findings and the severity of low back pain may be in large part due to the complex three-dimensional geometry and multiple joints within each spinal unit that make the back a more complicated structure than either the hip or the knee. This complex relationship is further complicated by psychiatric factors and financial incentives associated with back pain (e.g., Workers’ Compensation claims). While the diagnosis of disc degeneration may remain subjective and complex, standardization of schemes that quantify disc degeneration may serve as a first step in both the clinical and research settings to provide objective measures. The standardization of objective grading criteria for disc degeneration may provide several advantages, including better comparisons across clinical and in vitro studies, more objective measurements, improved evaluations of cadaveric materials used for orthopaedic research, and decreased bias within research studies.
Despite the potential advantages of standardized grading systems, the results of the present review indicated a wide range of available techniques. Specifically, no clear grading system has been established for radiographic findings. Furthermore, the number of grading systems is unreasonably large, considering that nearly all of the systems use essentially the same parameters, that is, disc space narrowing, osteophytes, and sclerosis, while they differ only marginally in their numerical scales and subdivisions (Table I).
There is a consensus that, among MRI grading systems, the system described by Pfirrmann et al.32 constitutes the gold standard; however, several other grading systems continue to be developed and used. Moreover, similar to radiographic grading systems, the existing MRI grading systems often evaluate similar parameters, including, for example, signal intensity differences and distinction between the nucleus and the anulus (Table II). Furthermore, only a fraction of the systems have been validated on the basis of an analysis of intraobserver reliability74. Rather than continuing to develop additional classification systems, more widespread adoption of a single classification system would reduce confusion and would simplify comparison among different studies and imaging modalities.
Determining the sensitivity of a given imaging-based grading system is important in order to understand its strengths and limitations. To establish the accuracy and sensitivity of current diagnostic imaging, a reference is needed that can be considered to provide the true value, regardless of how inconvenient or invasive that reference test may be. The results of the present study suggest that histological analysis can detect degenerative changes, including early changes that cannot be detected on radiographs or MRI scans. Clearly, histological analysis has limited clinical usefulness because of its invasiveness and cost; however, it nevertheless provides a baseline with which to assess the accuracy and sensitivity of current imaging-based grading systems. Although histological analysis may be a valuable research tool because of its ability to detect degeneration at a cellular level, the results of the present study demonstrate that few studies have directly compared imaging-based grading systems against histological findings.
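To make the role of a reference standard concrete, the sketch below computes sensitivity, specificity, and accuracy for an imaging-based grade when each disc is also classified by histological analysis. The class name TwoByTwo is hypothetical, the grades are assumed to have been dichotomized into degenerative and nondegenerative, and the counts are invented for illustration; they are not data from any study cited in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Imaging result cross-tabulated against the histological reference."""
    tp: int  # imaging degenerative, histology degenerative
    fp: int  # imaging degenerative, histology normal
    fn: int  # imaging normal, histology degenerative
    tn: int  # imaging normal, histology normal

    def sensitivity(self) -> float:
        # Fraction of histologically degenerative discs detected on imaging.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of histologically normal discs graded normal on imaging.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Fraction of all discs on which imaging and histology agree.
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


# Hypothetical counts for 40 discs; illustration only.
table = TwoByTwo(tp=12, fp=2, fn=8, tn=18)

print(f"sensitivity = {table.sensitivity():.2f}")  # 12 / 20
print(f"specificity = {table.specificity():.2f}")  # 18 / 20
print(f"accuracy    = {table.accuracy():.2f}")     # 30 / 40
```

In this invented example, imaging would miss eight of twenty histologically degenerative discs, which is the kind of shortfall that can only be measured when a histological reference is available.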
Ideally, an optimal imaging-based classification system will be developed to provide a high correlation between disc degeneration and low back pain. In order to achieve this goal, the ability of current imaging technologies to quantify degenerative changes must first be established. Unfortunately, as indicated by the present review, this quantitative basis has not been well defined as the imaging systems remain largely qualitative. Therefore, the present review may provide a foundation for the development of an imaging-based classification system that ultimately could correlate with pain. Clinical and laboratory studies are likely necessary to further advance this process. Such studies then may provide the means to maximize the inherent potential of existing tools for the diagnosis of low back pain and the planning for its treatment. If, on completion of the studies, the potential is deemed inadequate, the studies will provide a basis of comparison with which new and improved diagnostic technologies can be assessed. Clearly, all physicians involved in various stages of diagnosing and treating low back pain stand to benefit from advancing the classification systems based on the current imaging technologies as well as from the development and evaluation of new technologies.
In summary, the present review identifies gaps and shortcomings in the current literature that can only be addressed with further research to help bridge the disconnect between imaging and clinical findings. As a first step toward bridging this gap, we propose that more research is needed to establish gold-standard imaging-based and histological grading systems for in vitro and in vivo studies alike. The second step is to establish the accuracy and sensitivity of each measurement technique.
Despite the need for gold-standard classification systems, few NIH-funded studies have utilized the most widely cited systems identified in this review. Funding agencies, such as the NIH, clearly recognize the importance of research surrounding lumbar disc degeneration. For example, the amount of NIH funding for studies involving lumbar disc degeneration has steadily increased since 2000, paralleling the increase in the number of patients diagnosed with these diseases (Fig. 3)75,76. Moreover, in 2009 and 2010, the NIH provided over $8 million of funding for projects that included the term “lumbar disc degeneration” in the title, terms, or abstract75. However, the present study identified very few studies in which radiographic or MRI findings were compared against disc degeneration with use of a histological classification system20,56. Given the large amount of funding for research on spine disorders, we hope that the recommendations of the present study may simplify and improve comparisons among future research studies.
Several different systems have been developed for the grading of lumbar disc degeneration with use of radiographs; however, none of these systems have been established as a single standard by the community. In contrast, there is a general consensus that the grading system developed by Pfirrmann et al.32 is the gold standard for evaluating disc degeneration with use of MRI. However, several other grading systems involving MRI have been developed and continue to be used. Histological analysis is a powerful tool to be used as a reference for the assessment of the accuracy and sensitivity of any imaging-based grading system. Unfortunately, few studies have assessed radiographic or MRI grading systems by comparing the results against histological findings. Future studies should focus on such comparisons, which may provide a greater understanding of each technique and may help to improve the clinical correlation between imaging findings and pain. In turn, unnecessary tests and treatment costs may be avoided while optimizing the selection of surgical candidates.
Source of Funding: No funding was received in support of this work.
Investigation performed at the Orthopaedic Institute for Children, Los Angeles, California
Disclosure: None of the authors received payments or services, either directly or indirectly (i.e., via his or her institution), from a third party in support of any aspect of this work. One or more of the authors, or his or her institution, has had a financial relationship, in the thirty-six months prior to submission of this work, with an entity in the biomedical arena that could be perceived to influence or have the potential to influence what is written in this work. Also, one or more of the authors has had another relationship, or has engaged in another activity, that could be perceived to influence or have the potential to influence what is written in this work. The complete Disclosures of Potential Conflicts of Interest submitted by authors are always provided with the online version of the article.
Copyright © 2015 by The Journal of Bone and Joint Surgery, Incorporated