PathML: An open-source software toolkit for computational pathology research
Summary:
Imaging datasets in cancer research have grown exponentially in size and information density in recent years, driven chiefly by two trends:
Increasing adoption of digital pathology workflows at departmental and institutional scale (large-n datasets)
Emerging technologies in highly multiplexed imaging and spatial omics (high-dimensional datasets)
The unprecedented scale of today’s datasets could yield new insights for cancer research and clinical care, but only if researchers are equipped with tools that let them apply advanced computational approaches from machine learning and computer vision. PathML is a software toolkit designed to lower the barrier to entry for computational pathology. It enables researchers to develop streamlined, scalable, fully customized end-to-end image analysis pipelines within a unified framework for brightfield, multiplexed immunofluorescence, and spatial omics images, with support for 160+ file formats. Developed at Dana-Farber Cancer Institute and Weill Cornell Medicine, PathML is currently used by 7+ research groups and 2 imaging core facilities across the two institutions. PathML is an open-source project freely available on GitHub, with complete documentation, tutorials, and example vignettes, and it has been downloaded more than 15,000 times worldwide. Anyone interested in collaborating or learning more is welcome to contact us at PathML@dfci.harvard.edu or visit www.pathml.org.