Truthing & Validation
Artificial intelligence and machine learning (AI/ML) algorithms in digital pathology have enormous potential to increase diagnostic speed and accuracy. However, the performance of these algorithms must be validated against a reference standard, or “ground truth”, before deployment in clinical practice.
Need
Methods and tools to establish and use ground truth for validating AI/ML or other digital pathology tools/applications.
Problem
Importance of establishing ground truth:
The ground truth is difficult to establish, and current methods have drawbacks:
Truth by pathologist:
A committee of experts is used to establish “truth”
Algorithmic performance can be tested for equivalence to experts’ performance accounting for their variability – however, the “truth” is biased towards the experts’ clinical behavior
Truth by Independent assay:
This may destroy or alter specimens, limiting further assessment
When the primary specimen is unavailable, use of adjacent/interleaved samples introduces variability
Since assays might not be considered “gold standard”, this leads to a methods-comparison approach that estimates total error between modalities
Truth by patient outcome:
Comparison against prospectively collected outcomes is time consuming and expensive
Comparison against retrospectively collected samples requires special considerations with respect to sample quality and storage. Further, data on retrospective outcomes might be noisy or incomplete depending on the source and/or purpose of collection (e.g., registries or hospital databases)
Clear application to adaptive algorithms and “leaves room” for people to improve their programs
Eliminates or reduces bias in the current methods of arriving at a “ground truth,” presumably improving overall algorithm performance downstream
A standardized framework resolves many open questions: people can follow established rules to generate data quickly and can easily share datasets with others, since everyone follows the same guidelines (i.e., the datasets are interoperable)
Ground truth datasets improve speed to market for vendors and give patients faster access to treatments/algorithms
Best practices from our efforts to establish ground truth as well as guidelines for future datasets will be shared with the community to standardize dataset development as technology evolves.
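The methods-comparison approach mentioned under truth by independent assay can be illustrated with a small sketch. The example below assumes a Bland-Altman style summary (mean difference plus 95% limits of agreement) as one common way to describe error between two modalities; it is not a method prescribed by the workgroup, and the function name `bland_altman` and the paired measurements are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Summarize agreement between two paired measurement methods.

    Returns (bias, lower_limit, upper_limit), where bias is the mean
    paired difference and the limits are bias +/- 1.96 sample standard
    deviations of the differences (approximate 95% limits of agreement).
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have the same length")
    if len(method_a) < 2:
        raise ValueError("at least two paired measurements are required")

    # Paired differences capture both systematic and random disagreement.
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)            # systematic offset between modalities
    spread = 1.96 * stdev(diffs)  # random component, assuming near-normal differences
    return bias, bias - spread, bias + spread


# Hypothetical paired scores: algorithm output vs. an independent assay.
algorithm_scores = [10, 20, 30, 40]
assay_scores = [12, 18, 33, 37]
bias, lower, upper = bland_altman(algorithm_scores, assay_scores)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

Because neither modality is treated as a gold standard here, the limits describe disagreement between the two methods rather than the error of either one alone.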
Workgroup Focus
The focus of the workgroup is to engage the community of end users (clinicians, professional societies, health providers, and patients) to take an active role in evaluating AI/ML performance. These end users are best positioned to establish truth and help create deliverables.
We will focus on discussing and disseminating methods, tools, frameworks, and pipelines for creating datasets with ground truth. We will encourage the development and demonstration of rigorous statistical methods for validating AI/ML for the diversity of use cases.
We will support the sharing of data and the creation of templates and checklists for reporting AI/ML validation studies. These deliverables are essential elements of FDA submissions.
Current Projects
FDA High-throughput truthing project, lead: Brandon Gallas
Project synopsis: a project to recruit pathologists through crowdsourcing and collect data (images plus pathologist annotations) that can be qualified by the FDA/CDRH Medical Device Development Tools (MDDT) program. The MDDT-qualified data would be available to any algorithm developer to validate their algorithm's performance in a submission to the FDA/CDRH.
Project overview manuscript accepted to JPI [LINK]
Project Webpage [LINK]
Deliverables:
Validation dataset
Statistical methods (pathologist-pathologist agreement, algorithm-pathologist agreement)
Lessons learned while pursuing qualification of the data as an FDA Medical Device Development Tool
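As an illustration of what an agreement statistic can look like, the sketch below computes Cohen's kappa for two raters assigning categorical labels to the same cases. This is an assumption made for illustration: the project's actual statistical methods are not described here, and the function name `cohens_kappa` and the labels are hypothetical. The same calculation applies whether the second rater is another pathologist or an algorithm.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")

    n = len(rater_a)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement if the raters labeled independently,
    # each keeping their own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label for every case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four regions of interest.
pathologist = ["low", "low", "high", "high"]
algorithm = ["low", "high", "high", "high"]
print(cohens_kappa(pathologist, algorithm))
```

Kappa suits nominal categories; ordinal grades or quantitative measurements generally call for other statistics, such as weighted kappa or limits of agreement.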
Looking for projects to highlight and discuss
Email Hetal Marble, PhD (hmarble@mgh.harvard.edu) with proposals or submit to the PIcc Project Proposals page [LINK]:
Projects that address different data types: quantitative measurements, image marks and segmentations, quasi-quantitative and qualitative grading scales
Projects that address variability in ground truth during study design, data collection, and AI/ML validation (statistical analysis, including study sizing and powering)
Projects that address truth by independent assays and truth by prospective or retrospective outcomes
Projects that share data and tools for AI/ML development and validation, especially when the data and tools are for regulatory submissions
Relevant Publications
Presentation: High Throughput Truthing (HTT): Pathologist Agreement from a Pilot Study
Date: May 2021
Authors: Gallas et al.
Link: Slides Available Here
Group Leaders
Hetal Marble, PhD
Hetal Marble leads translational research and clinical trials for the Center for Integrated Diagnostics at MGH. She is primarily responsible for the translation of novel technologies into clinical practice, working in close partnership with clinical colleagues to ensure clinical utility and sustainability of novel implementations. Hetal received her Ph.D. at Brown University, where her graduate work focused on development of biomarkers for lineage-specific stem cell differentiation. Hetal’s current interests are in clinical trial design and incorporation of real-world evidence into trials and clinical practice, and in sustainable implementation of novel technologies into clinical practice.
Brandon Gallas, PhD
Brandon D. Gallas, PhD, provides mathematical, statistical, and modeling expertise to the evaluation of medical imaging devices at the FDA. His main areas of research are image quality, computer-aided diagnosis, imaging physics, and the design, execution, and statistical analysis of reader studies. Recently, he has been investigating pathologist performance and agreement using whole slide imaging devices and the microscope.
Katherine Elfer, PhD, MPH
Coming soon.
Evangelos (Vangelis) Hytopoulos, PhD
Dr. Hytopoulos is currently Senior Director of Data Science at iRhythm Technologies, leading the development of deep learning algorithms that are at the heart of the detection of arrhythmias through iRhythm’s wearable biosensing technology. He has over 20 years of experience in statistical learning, algorithm development, high-performance computing, and visualization. Prior to joining iRhythm, he was Senior Director of Biostatistics and Data Management at Genomic Health (GHI). Prior to joining GHI, Dr. Hytopoulos led the Computational Sciences group that was instrumental in the development of MIRISK VP, Aviir’s Myocardial Infarction risk assessment test. Previously, he held positions at BioSeek (now Eurofins Discovery), leading the development of the information systems, database, and computational methods for the BioMAP technology; at X-Mine, leading the development of cutting-edge microarray analysis algorithms; at CFD Research Inc., promoting the adoption of numerical simulation technologies in the semiconductor and bioengineering (microfluidics) industries; and at Silicon Graphics, where he was a high-performance computing applications expert. He has also served as a consultant and advisor to a number of diagnostic and pharmaceutical companies, providing guidance at all stages of product development and assisting with their submissions to the FDA. Dr. Hytopoulos has Master's and Ph.D. degrees in Aerospace Engineering and an undergraduate degree in Naval Architecture and Marine Engineering.