HTT Manuscript (Pre-publication)
Abstract:
Purpose:
Validating artificial intelligence (AI) algorithms for clinical use on medical images is challenging due to a lack of standard reference data (ground truth). This topic typically occupies a small portion of the discussion in research papers, since most of the effort is focused on developing novel algorithms. In this work, we present a collaboration to create a validation dataset of pathologist annotations for algorithms that process whole slide images (WSIs). We focus on data collection and on evaluating algorithm performance in the context of estimating the density of stromal tumor-infiltrating lymphocytes (sTILs) in breast cancer.
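For context, the quantity being estimated can be written down explicitly. The abstract does not define sTIL density, so the following is an assumption based on the common operational definition from the International TILs Working Group: the percentage of intratumoral stromal area occupied by mononuclear immune cells.

```latex
% Hedged sketch: common operational definition of sTIL density (not stated
% in this abstract; assumed from International TILs Working Group guidance).
\[
  \text{sTIL density (\%)} \;=\; 100 \times
  \frac{\text{stromal area occupied by mononuclear immune cells}}
       {\text{total intratumoral stromal area}}
\]
```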
Methods:
We digitized 64 glass slides of hematoxylin- and eosin-stained ductal carcinoma core biopsies prepared at a single clinical site. A collaborating pathologist selected 10 regions of interest (ROIs) per slide for evaluation. We created training materials and workflows to crowdsource pathologist image annotations in two modes: an optical microscope and two digital platforms. The microscope platform allows the same ROIs to be evaluated in both modes. For each ROI, the workflows collect the ROI type, a decision on whether the ROI is appropriate for estimating the density of sTILs, and, if it is appropriate, the sTIL density value for that ROI.
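To illustrate what each evaluation captures, here is a minimal sketch of a per-ROI annotation record in Python. The field names, types, and platform labels are assumptions for illustration; the abstract specifies only that the workflows collect the ROI type, an appropriateness decision, and, when appropriate, an sTIL density value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Platform(Enum):
    """Annotation modes described in the abstract (labels are assumed)."""
    MICROSCOPE = "microscope"  # optical microscope mode
    DIGITAL_A = "digital_a"    # first digital platform (hypothetical label)
    DIGITAL_B = "digital_b"    # second digital platform (hypothetical label)


@dataclass
class RoiEvaluation:
    """One pathologist's evaluation of one region of interest (ROI).

    Field names are hypothetical; the abstract states only that ROI type,
    an evaluability decision, and (if evaluable) an sTIL density are collected.
    """
    case_id: str                # slide/case the ROI came from
    roi_id: str                 # ROI identifier within the case
    reader_id: str              # anonymized pathologist identifier
    platform: Platform          # which of the three platforms was used
    roi_type: str               # e.g., "invasive tumor" (assumed label set)
    evaluable: bool             # appropriate for sTIL density estimation?
    stil_density_pct: Optional[float] = None  # 0-100, only if evaluable


# Example record: an evaluable ROI scored at 15% sTIL density.
record = RoiEvaluation(
    case_id="case_001", roi_id="roi_03", reader_id="reader_07",
    platform=Platform.MICROSCOPE, roi_type="invasive tumor",
    evaluable=True, stil_density_pct=15.0,
)
```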
Results:
In total, 19 pathologists made 1,645 ROI evaluations during a data-collection event and the following two weeks. The pilot study yielded an abundance of cases with nominal sTIL infiltration. Furthermore, we found that sTIL densities are correlated within a case and that there is notable pathologist variability. Consequently, we outline plans to improve our ROI and case sampling methods. We also outline statistical methods that account for within-case ROI correlations and pathologist variability when validating an algorithm.
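The abstract does not specify those statistical methods. One plausible sketch, assuming a linear mixed-effects model with crossed random effects for case and pathologist (so that within-case ROI correlation and reader variability are both absorbed), is shown below using statsmodels; all column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table of ROI evaluations: one row per
# (case, ROI, pathologist) with the reported sTIL density.
df = pd.read_csv("roi_evaluations.csv")  # assumed columns: case_id, reader_id, stil_density_pct

# Crossed random effects in statsmodels: place all rows in a single group
# and model case and reader as variance components, so ROIs from the same
# case, and scores from the same pathologist, share random intercepts.
df["one_group"] = 1
vc = {
    "case": "0 + C(case_id)",      # within-case correlation of ROI densities
    "reader": "0 + C(reader_id)",  # pathologist-to-pathologist variability
}
model = smf.mixedlm(
    "stil_density_pct ~ 1",  # overall mean; covariates could be added here
    data=df,
    groups="one_group",
    vc_formula=vc,
    re_formula="0",          # no extra random intercept for the dummy group
)
result = model.fit()
print(result.summary())
```

Under these assumptions, the estimated variance components separate how much of the score variation is attributable to cases versus readers, which is the decomposition an algorithm-validation analysis would need.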
Conclusion:
We have built workflows for efficient data collection and tested them in a pilot study.
As we prepare for pivotal studies, we will consider what it will take for the dataset to be
fit for a regulatory purpose: study size, patient population, and pathologist training and
qualifications. To this end, we will elicit feedback from the FDA via the Medical Device
Development Tool program and from the broader digital pathology and AI community.
Ultimately, we intend to share the dataset, statistical methods, and lessons learned.