Researchers in the UK evaluated the performance of a commercially available artificial intelligence (AI) algorithm against human readers of screening mammography using a standardized assessment. The outcomes of their research were published in Radiology, a journal of the Radiological Society of North America (RSNA).
Not all breast cancers are found during mammogram screening, while false-positive interpretations can lead to needless imaging and biopsy in women who do not have breast cancer. One approach to increasing both the sensitivity and specificity of screening mammography is to have two readers evaluate each mammogram.
Double reading increases breast cancer detection rates by 6 to 15% and lowers recall rates. However, this technique is labor-intensive and challenging to implement in times of reader shortages.
To evaluate the performance of human readers and AI, Prof. Chen and her study team employed test sets from the Personal Performance in Mammographic Screening, or PERFORMS, quality assurance assessment used by the UK’s National Health Service Breast Screening Programme (NHSBSP). A single PERFORMS test is made up of 60 challenging exams from the NHSBSP with abnormal, benign, and normal findings. Each reader’s score is compared with the ground-truth results for every test mammogram.
She stated that it was crucial for human readers involved in breast cancer screening to perform to a satisfactory standard. “The same will apply to AI once it is used in clinical settings.”
The research team assessed the performance of the AI programme using data from two consecutive PERFORMS test sets, or 120 screening mammograms. The results of the 552 human readers, including 315 (57%) board-certified radiologists and 237 non-radiologist readers (206 radiographers and 31 breast doctors), were compared to the AI test scores.
“The 552 readers in our study represent 68% of readers in the NHSBSP, so this provides a robust performance comparison between human readers and AI,” Professor Chen stated.
When each breast was treated separately, there were 161/240 (67%) normal breasts, 70/240 (29%) malignant breasts, and 9/240 (4%) benign breasts. The malignant mammographic features most frequently observed were masses (45/70, 64.3%), calcifications (9/70, 12.9%), asymmetries (8/70, 11.4%), and architectural distortions (8/70, 11.4%). Malignant lesions had an average size of 15.5 mm.
Across the 120 examinations, no difference in performance was found between AI and human readers for the detection of breast cancer. Human readers showed a mean sensitivity of 90% and specificity of 76%; AI was comparable, with 91% sensitivity and 77% specificity.
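To make the reported metrics concrete, the sketch below shows how sensitivity and specificity are computed from per-case decisions against ground truth. The function and the sample data are illustrative assumptions, not the study's actual cases or scoring pipeline.

```python
# Illustrative sketch: computing sensitivity and specificity from
# per-case reader decisions versus ground truth (True = cancer).
# The sample data below is hypothetical, not from the study.

def sensitivity_specificity(decisions, truth):
    """Return (sensitivity, specificity) for paired boolean lists."""
    tp = sum(d and t for d, t in zip(decisions, truth))          # true positives
    tn = sum(not d and not t for d, t in zip(decisions, truth))  # true negatives
    fp = sum(d and not t for d, t in zip(decisions, truth))      # false positives
    fn = sum(not d and t for d, t in zip(decisions, truth))      # false negatives
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical example: 10 cases, 4 of which are cancers.
truth     = [True, True, True, True, False, False, False, False, False, False]
decisions = [True, True, True, False, False, False, False, False, True, False]
sens, spec = sensitivity_specificity(decisions, truth)
print(round(sens, 2), round(spec, 2))  # → 0.75 0.83
```

In the study's terms, a higher sensitivity means fewer missed cancers, while a higher specificity means fewer false-positive recalls.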
The study’s findings, according to Prof. Chen, “provide strong supporting evidence that AI for breast cancer screening can perform as well as human readers.”
According to Prof. Chen, additional research is required before AI can be employed as a second reader in clinical practice.
She said, “I think it is too early to say exactly how we will use AI in breast screening in the end.” The major prospective clinical trials currently underway will provide further information. No matter how AI is deployed, its success will depend on the capacity to continuously assess its performance.
It is crucial to understand that algorithms can be affected by changes in the operating environment and that AI performance might deteriorate over time, according to Prof. Chen.
Once AI enters clinical practice, “it’s critical that imaging centers have a process in place to provide ongoing monitoring of AI,” she emphasized. “Since no studies to date have compared the performance of human readers in common quality assurance test sets to that of AI, this study may provide a model for evaluating AI performance in a real-world setting,” the authors write.