In the previous posts of this series on how machine learning detects breast cancer in medical imaging, we discussed how computers can learn to interpret mammograms and ultrasounds. In this last article of the series, we discuss how artificial intelligence can assist doctors in interpreting histopathology exams.
Why we need to improve the diagnostic performance in breast biopsy
We previously mentioned that biopsy (or histopathology) is the last technique used for the detection and classification of breast cancer, after mammography and/or ultrasound have been performed. For this analysis, samples of lymph nodes near the breast are collected, and pathologists examine them for possible metastases, i.e. signs that the cancer has spread. Every year, about one in a hundred women in the U.S. (1.6 million) undergoes a breast biopsy. The analysis is this frequent because it is usually considered the gold standard for cancer identification and treatment; a study from 2015 found, however, that its diagnostic performance is weaker than that status suggests. The researchers showed that pathologists agreed on the interpretation of breast biopsies in only 75% of the cases, and that atypia, a "marker" for increased risk of developing cancer, was under-interpreted in roughly 1 out of 3 cases. With these figures, it is clear why pathologists need a way to improve diagnostic performance in breast biopsies and reduce their error rates.
One approach to this issue comes straight out of Google, where researchers developed a solution to automatically detect and localize tumors in breast biopsies. To be fair, the topic was already addressed last year by the Camelyon16 Challenge, the first competition to automate the interpretation of whole-slide images in histopathology, and the results were excellent. The winning Camelyon16 software reached an AUC (area under the ROC curve) of 99.4%, outperforming human pathologists (AUC = 96.6%). But researchers at Google went even further and improved the software’s ability to detect and localize tumor cells.
However, if Google’s new solution is an improvement on the Camelyon16 winning algorithm, how come the best AUC value from Google is below that of the previous solution, i.e. below 99.4%? Because, in certain cases, the AUC is not a sufficient indicator of diagnostic performance, and other metrics better describe how effective a piece of software (or a human physician) is.
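To make the slide-level nature of the AUC concrete, here is a minimal Python sketch. It uses one common reading of the AUC: the probability that a randomly chosen tumor slide receives a higher score than a randomly chosen normal slide. The function name `slide_level_auc` and the scores are invented for illustration and are not taken from Camelyon16 or from Google's work.

```python
from itertools import product


def slide_level_auc(scores_tumor, scores_normal):
    """Fraction of (tumor, normal) slide pairs ranked correctly; ties count as half."""
    pairs = list(product(scores_tumor, scores_normal))
    wins = sum(1.0 if t > n else 0.5 if t == n else 0.0 for t, n in pairs)
    return wins / len(pairs)


# Hypothetical per-slide tumor probabilities (one number per slide).
tumor_scores = [0.9, 0.8, 0.35]
normal_scores = [0.1, 0.4, 0.2]

auc = slide_level_auc(tumor_scores, normal_scores)
print(f"slide-level AUC: {auc:.3f}")
```

Note that each slide contributes a single score here: the computation never looks at where in the slide the tumor was found, which is why a high AUC alone says little about localization.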
When AUC is not sufficient: FROC to evaluate software localization accuracy
In this case, Google gave less importance to the AUC and relied instead on the free-response ROC (FROC), a score defined as the sensitivity, averaged over the six operating points of 0.25, 0.5, 1, 2, 4 and 8 average false positives per tumor-negative slide. The researchers justify this choice: “We focused on the FROC as opposed to the AUC because there are approximately twice as many tumors as slides, which improves the reliability of the evaluation metric”. More precisely, ROC (and AUC) analysis is useful for assessing performance at the slide level (right or wrong diagnosis), whereas FROC analysis evaluates localization accuracy, i.e. whether the software identifies the correct position of the tumor. Being certain that the software correctly locates the lesion is particularly important in computer-aided detection and diagnosis of cancer, as opposed to other conditions, e.g. the detection of a meniscus tear, where the location of a possible lesion is known in advance.
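The FROC definition above can be sketched in a few lines of Python. This is a simplified illustration, not the official Camelyon16 evaluation code: it assumes every detection has already been matched to a tumor (or marked as a false positive), that confidences are distinct, and that false positives are divided by the number of tumor-negative slides, as in the definition quoted in this article. The function name `froc_score` and the toy detections are hypothetical.

```python
def froc_score(detections, n_tumors, n_negative_slides,
               fp_rates=(0.25, 0.5, 1, 2, 4, 8)):
    """detections: list of (confidence, lesion_id); lesion_id is None for a false positive.

    Returns (mean sensitivity, list of sensitivities at each FP rate).
    """
    # Visit detections from most to least confident, which is the same as
    # lowering the decision threshold one detection at a time.
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    found = set()
    false_positives = 0
    curve = [(0.0, 0.0)]  # (average FPs per negative slide, sensitivity)
    for _, lesion_id in ordered:
        if lesion_id is None:
            false_positives += 1
        else:
            found.add(lesion_id)  # a tumor hit twice still counts once
        curve.append((false_positives / n_negative_slides,
                      len(found) / n_tumors))
    # Best sensitivity reachable without exceeding each allowed FP rate.
    sensitivities = [max(s for fp, s in curve if fp <= rate)
                     for rate in fp_rates]
    return sum(sensitivities) / len(sensitivities), sensitivities


# Toy example: 4 tumors in total, 2 tumor-negative slides.
toy_detections = [
    (0.95, "t1"), (0.90, None), (0.85, "t2"), (0.70, None),
    (0.60, "t3"), (0.40, None), (0.30, None), (0.20, "t4"),
]
score, per_rate = froc_score(toy_detections, n_tumors=4, n_negative_slides=2)
print(per_rate)  # sensitivity at 0.25, 0.5, 1, 2, 4, 8 FPs per negative slide
print(score)
```

Unlike the AUC, this score is computed per tumor rather than per slide, so a slide with several tumors is only fully credited when each of them is located.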
It is precisely in the FROC that Google’s new solution outperforms both the Camelyon16 winning algorithm and human pathologists. While these reached FROC values of 80.7 and 73.3 respectively, the highest FROC from Google was 88.5 (AUC = 97.7). The researchers commented that their algorithm succeeded in “reducing the false negative rate to a quarter of a pathologist and less than half of the previous best result (Camelyon16 winner)”. What’s more, the new algorithm revealed that two slides in the Camelyon16 training set had been erroneously labeled as normal!
This last finding illustrates how such results could improve the efficiency of cancer diagnosis in histopathology images and improve the lives of patients whose breast biopsies might otherwise be misdiagnosed, in particular those who would receive a false-negative result.