HTT Manuscript Pre-publication is available

Abstract:

Purpose:

Validating artificial intelligence (AI) algorithms for clinical use in medical images is a challenging endeavor due to a lack of standard reference data (ground truth). This topic typically occupies a small portion of the discussion in research papers, since most of the effort is focused on developing novel algorithms. In this work, we present a collaboration to create a validation dataset of pathologist annotations for algorithms that process whole slide images (WSIs). We focus on data collection and evaluation of algorithm performance in the context of estimating the density of stromal tumor-infiltrating lymphocytes (sTILs) in breast cancer.


Methods:

We digitized 64 glass slides of hematoxylin- and eosin-stained ductal carcinoma core biopsies prepared at a single clinical site. A collaborating pathologist selected 10 regions of interest (ROIs) per slide for evaluation. We created training materials and workflows to crowdsource pathologist image annotations in two modes: an optical microscope and two digital platforms. The microscope platform allows the same ROIs to be evaluated in both modes. The workflows collect the ROI type, a decision on whether the ROI is appropriate for estimating the density of sTILs, and, if appropriate, the sTIL density value for that ROI.
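
The abstract does not specify the annotation schema; the following is a minimal, hypothetical Python sketch of what a single per-ROI record collected by such a workflow might look like (all field names are assumptions, not taken from the manuscript).

```python
# Hypothetical per-ROI annotation record (field names assumed, not from the manuscript).
# The workflows capture the ROI type, whether the ROI is appropriate for sTIL scoring,
# and, when it is, the sTIL density estimate.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RoiAnnotation:
    case_id: str                    # slide/case identifier (64 slides in the pilot)
    roi_id: int                     # one of the 10 ROIs selected per slide
    pathologist_id: str             # anonymized reader identifier
    platform: str                   # "microscope", "digital_platform_1", or "digital_platform_2"
    roi_type: str                   # tissue category assigned by the reader
    evaluable: bool                 # is the ROI appropriate for estimating sTIL density?
    stil_density_pct: Optional[float] = None  # sTIL density estimate, only if evaluable

    def __post_init__(self) -> None:
        if self.evaluable and self.stil_density_pct is None:
            raise ValueError("An evaluable ROI must carry an sTIL density value.")
        if not self.evaluable and self.stil_density_pct is not None:
            raise ValueError("A non-evaluable ROI should not carry a density value.")
```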


Results:

In total, 19 pathologists made 1,645 ROI evaluations during a data-collection event and the following two weeks. The pilot study yielded an abundance of cases with nominal sTIL infiltration. Furthermore, we found that sTIL densities are correlated within a case and that there is notable pathologist variability. Consequently, we outline plans to improve our ROI and case sampling methods. We also outline statistical methods to account for ROI correlations within a case and for pathologist variability when validating an algorithm.
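
As an illustration only (not the analysis specified by the authors), one common way to account for ROI correlations within a case and for pathologist variability when comparing an algorithm against readers is a linear mixed-effects model with crossed random effects for case and pathologist. Below is a minimal Python/statsmodels sketch, with assumed column names (case_id, pathologist_id, algo_density, reader_density).

```python
# Illustrative sketch only: a linear mixed-effects model for the algorithm-minus-pathologist
# difference in sTIL density, with crossed random effects for case (ROIs within a case are
# correlated) and for pathologist (reader variability). Column names are assumed.
import pandas as pd
import statsmodels.formula.api as smf

annotations = pd.read_csv("roi_annotations.csv")  # hypothetical file of paired scores per ROI
annotations["diff"] = annotations["algo_density"] - annotations["reader_density"]
annotations["all"] = 1  # single group so case and reader enter as crossed variance components

model = smf.mixedlm(
    "diff ~ 1",            # fixed effect: mean algorithm-reader bias
    data=annotations,
    groups="all",
    re_formula="0",        # no random effects beyond the variance components below
    vc_formula={
        "case": "0 + C(case_id)",           # between-case variance (within-case correlation)
        "reader": "0 + C(pathologist_id)",  # between-pathologist variance
    },
)
result = model.fit()
print(result.summary())  # intercept ~ overall bias; variance components ~ case and reader effects
```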


Conclusion:

We have built workflows for efficient data collection and tested them in a pilot study. As we prepare for pivotal studies, we will consider what it will take for the dataset to be fit for a regulatory purpose: study size, patient population, and pathologist training and qualifications. To this end, we will elicit feedback from the FDA via the Medical Device Development Tool program and from the broader digital pathology and AI community. Ultimately, we intend to share the dataset, statistical methods, and lessons learned.
