Clarifying Validation: Collaborative Approach to a Consistent Terminology

This project addresses the inconsistent use of the term "validation" across disciplines. We are compiling relevant publications and aim to create a concise resource that clarifies terminology and supports a peer-reviewed publication on the topic.

We will advocate for a more harmonized, context-aware approach to the term "validation", particularly considering the growing influence of AI/ML technologies in diagnostics.

The aim of the paper will be to raise awareness of the inconsistent and context-dependent use of the word, promote collaboration for better definitions, and ensure clearer, more consistent performance metrics in laboratory practice and related fields.

Please send any relevant publications ahead of this meeting. 
We look forward to this discussion.


Overview of the Term

Based on the literature review and supplementary files, the term "validation" has been used inconsistently across disciplines, particularly in the context of AI/ML, medical devices, pharmaceuticals, and regulatory science. An overview of different definitions of validation across various fields can be found here.








Meetings #4 and #5

Meeting #4 will be on 05/02/2025 at 1:00 PM (EST)

Meeting #5 will be on 05/27/2025 at 2:30 PM (EST)

Meeting #3

The meeting was held on 04/17/2025 and covered the first (original) draft of the manuscript. Three main files were shared with the project participants after the meeting, and a summary of the key discussion points is provided below.

Meeting Minutes

Joe Lennerz. Joe emphasized that vague and inconsistent use of "validation" language creates an unintentional barrier to the clinical implementation of products and AI tools. He briefly introduced the scope of the meeting (for the benefit of some new members of the initiative) and then outlined the agenda: a review of materials by Amanda and a group discussion of so-called ‘consensus statements’.

Amanda Dy. Amanda guided the group through the project folder containing three documents. She went over these documents, explaining their content, structure, and core function. She emphasized the definition of “entries” in the methods section and how these relate to the 5 domains.
Another element she emphasized was the notion of “consensus statements” in the results section.
The supplemental file (an Excel table with multiple subsheets) was discussed extensively, including a supplemental explanation by Joe regarding the selection of entries. Briefly, these entries are recurrently identified terms that can mean something different depending on the context in which they are used. Amanda took notes during the discussion of consensus statement 2 (AI/ML) and encouraged everyone to contribute.

Keith Wharton. Keith suggested that the overturning of the LDT rule should serve as a central catalyst for the manuscript, distinguishing between validating a product (device) and validating an LDT workflow (service). He stressed the importance of defining "validation" clearly and noted that payers focus not just on validation but also on clinical utility.
He proposed that new terms should be assessed for "stickiness", that is, whether they are likely to be adopted across domains. Keith further joked that misalignment isn't just academic, it affects real-world adoption, and contributed several lighthearted comments that helped maintain a lively atmosphere. He also suggested including examples of how to use the more specific and/or proposed terminology.

Sandra Bütow. Sandra described her specific vantage point as a public health official and noted that validation is particularly important in this context. Of note, Sandra provided additional references and a section on public health procedures and infrastructure for the discussion.

Fabienne Lucas. Fabienne recommended using the two extremes — research at the beginning and clinical/patient care at the end — as guideposts to align the terminology in between. She highlighted the importance of reaching non-pathologist clinicians who order tests but may not understand the nuances of validation, suggesting that ambiguity here is a known communication problem. Fabienne also endorsed the idea of exploring linguistic nuances across languages (e.g., German, French, Hindi) as a possible subproject to enrich understanding.

Andy Bredemeyer. Andy appreciated the idea of framing around the LDT context but raised the concern that doing so might downplay the importance of business, communication, AI, and ML realms. He also proposed a tech development pipeline visualization showing how the community uses "validation" differently at various stages. Andy supported incorporating insights from different languages and joked about the session becoming "überdisciplinary."

Staci Kearney. Staci emphasized that vague use of validation terminology presents a barrier to clinical implementation and AI adoption. She contributed to the "who cares" moment, reinforcing that even if misalignment is unintentional, the consequences are serious enough to warrant clear communication across all involved domains.

Shannon Bennett. Shannon pointed out a regulatory nuance: under CLIA, labs must establish performance characteristics but technically are not required to continually check them. She supported the broader discussions about the importance of precise terminology and added to the "who cares" moment regarding the impact of language misalignment.

Monika Lamba Saini. Monika introduced herself as someone working on digital pathology implementation and AI model validation pipelines. She endorsed Fabienne’s idea of exploring linguistic and cultural nuances and emphasized the importance of cross-disciplinary collaboration.

Brandon Gallas. Brandon, joining from FDA, stated that although he might not be able to be a coauthor, he could still participate scientifically. He stressed that for FDA purposes, robust documentation practices during validation are critical, including tracking how and when labels, training data, and tuning decisions are made. He linked the conversation to a recent publication about reproducible reporting of annotations for AI models, reinforcing the regulatory aspect of validation.
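
As a purely hypothetical illustration of the documentation practices Brandon described (the field names below are our own, not an FDA-specified schema), a validation run could carry a structured audit record that fixes how and when labels, training data, and tuning decisions were made:

# Hypothetical audit record for a validation run; field names are
# illustrative assumptions, not an FDA-specified schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ValidationAuditRecord:
    model_version: str          # identifier of the locked model artifact
    training_data_version: str  # snapshot or hash of the training dataset
    label_source: str           # who/what produced the annotations, and when
    tuning_frozen_at: datetime  # timestamp after which no tuning occurred
    notes: list[str] = field(default_factory=list)

record = ValidationAuditRecord(
    model_version="model-2025-04-17-a",
    training_data_version="dataset-v3",
    label_source="two pathologists, adjudicated March 2025",
    tuning_frozen_at=datetime(2025, 4, 1, tzinfo=timezone.utc),
)
record.notes.append("Validation data never used for training or tuning.")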

April Khademi. April recommended separating "tuning" from "validation" explicitly. She pointed out that engineers typically perform "validation" after the model is locked and no longer tunable, which aligns with regulatory standards. April's comment stressed the need to redefine "validation" distinctly to fit regulatory expectations rather than research conventions.
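
To make April's distinction concrete, here is a minimal sketch in Python (using scikit-learn; the split sizes and model are illustrative assumptions, not anything discussed in the meeting). The point is that the ML "validation" split participates in tuning, whereas regulatory-style validation only begins once the model is locked:

# The ML "validation" split tunes the model; regulatory-style
# "validation" evaluates the locked model on data never used for tuning.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Three-way split: training, tuning ("validation" in ML parlance),
# and a held-out set reserved for post-lock evaluation.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_tune, X_held, y_tune, y_held = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Tuning phase: hyperparameters may still change based on X_tune performance.
best_model, best_score = None, -1.0
for c in (0.01, 0.1, 1.0, 10.0):
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    score = accuracy_score(y_tune, model.predict(X_tune))
    if score > best_score:
        best_model, best_score = model, score

# The model is now locked: no further tuning. Only at this point does
# the regulatory-style validation on never-before-seen data take place.
print(f"Post-lock accuracy: {accuracy_score(y_held, best_model.predict(X_held)):.3f}")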

Kim Blenman. Kim noted that clearer and future-proof terminology should be proposed to support future generations of clinical and AI tools, emphasizing the long-term importance of getting the language right now.

Paper shared by Brandon Gallas: https://www.modernpathology.org/article/S0893-3952(24)00019-X/fulltext


Meeting #2

This meeting was held on 04/04/2025 and we reviewed the materials submitted. Thank you for all your contributions and the additional references. The references were sorted by themes/domains, and given the broad range, we discussed submission to a journal with a broader scope.

The bar graph shows the number of references per domain (as assigned in the supplement; status: April 2025).

Based on these domains, we delineated how the term “validation” is interpreted in different contexts. The following graph summarizes the shared notions across the five domains above.

Contextual domains of the term “validation”.

Three pages of notes from the meeting are included in the presentation below, and the next meeting (04/17/2025) will focus on moving toward a manuscript.

Meeting #1

This meeting (01/06/2025) addressed the inconsistent use of the term "validation" across subspecialties and disciplines. We discussed compiling relevant publications and creating a resource in support of a peer-reviewed publication.

Key Takeaways

1. No universal definition of validation exists across disciplines; it varies by field, objective, and regulatory context.

2. AI validation involves both technical (accuracy, robustness) and regulatory (safety, ethics) considerations.

3. Validation is critical in healthcare, pharmaceuticals, engineering, and AI governance to ensure safety, reliability, and compliance.

4. Ongoing efforts (e.g., TRIPOD+AI 2024, British Standard BS30440, FDA AI Validation Guidelines) aim to standardize AI validation approaches.


