(1098-A) Tackling the JUMP CP dataset: Best practices for navigating high-content multiparametric data
Wednesday, May 24, 2023
13:30 - 14:30 CET
Location: Hall 3
Abstract: High-content, image-based approaches yield rich phenotypic data that can reveal important insights into a candidate drug's mechanism of action or toxicity. Cell Painting is one such phenotypic assay. Originally developed at the Broad Institute, Cell Painting is quickly gaining traction, and over the past few years considerable effort has gone into further optimizing and standardizing the assay.
Spearheaded by the Broad Institute, the JUMP (Joint Undertaking in Morphological Profiling) Cell Painting (CP) consortium was established to take these efforts further, with the aim of enabling a new, data-driven approach to drug discovery. To this end, the consortium has generated a reference Cell Painting dataset (~3 million images, ~75 million single cells, 5,000+ features) covering ~140,000 different genetic and small-molecule perturbations. This unprecedented dataset was made public at the end of last year and has the potential to become an outstanding resource for drug discovery research. However, its size and complexity pose a significant barrier to using it.
Here, we will demonstrate how cloud computing can help overcome some of these challenges. Specifically, we will present a robust, iterative data analytics workflow that allows users to mine the dataset for information relevant to their own drug discovery projects. We will cover the basics of detecting redundant data, quality control, data pre-processing and, finally, calculating phenotypic profiles and grouping samples by profile similarity. We will show how this approach enabled us to uncover underlying biological processes and to investigate the reproducibility of the data.
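The abstract does not specify how each workflow step is implemented, so the following is only a minimal, hypothetical sketch of what such steps commonly look like on a well-level feature matrix. It assumes rows are wells, columns are CellProfiler-style features, every plate carries negative-control wells labelled "DMSO", and it swaps in generic choices that are not taken from the talk: exact-duplicate removal, robust z-scores against plate controls, correlation-based feature pruning (threshold 0.9), median aggregation per perturbation, and grouping by cosine similarity (threshold 0.8). All function names are illustrative, and no cloud-specific tooling is shown.

```python
import numpy as np


def quality_control(X, perts, plates):
    """Hypothetical QC step: drop wells with any missing or non-finite feature value."""
    ok = np.isfinite(X).all(axis=1)
    return X[ok], perts[ok], plates[ok]


def drop_duplicate_wells(X, perts, plates):
    """Redundancy check: keep only the first copy of identical (plate, perturbation, features) rows."""
    seen, keep = set(), []
    for i in range(len(X)):
        key = (plates[i], perts[i], X[i].tobytes())
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return X[keep], perts[keep], plates[keep]


def robust_normalize(X, perts, plates, control="DMSO"):
    """Per plate, robust z-score each feature against that plate's control wells (assumed present)."""
    Z = np.empty_like(X, dtype=float)
    for p in np.unique(plates):
        on_plate = plates == p
        ctrl = X[on_plate & (perts == control)]
        med = np.median(ctrl, axis=0)
        mad = 1.4826 * np.median(np.abs(ctrl - med), axis=0)  # MAD scaled to ~1 SD
        Z[on_plate] = (X[on_plate] - med) / np.maximum(mad, 1e-6)
    return Z


def select_features(Z, var_min=1e-6, corr_max=0.9):
    """Drop near-constant features, then any feature highly correlated with one already kept."""
    idx = np.flatnonzero(Z.var(axis=0) > var_min)
    corr = np.abs(np.corrcoef(Z[:, idx], rowvar=False))
    kept = []
    for j in range(len(idx)):
        if all(corr[j, k] <= corr_max for k in kept):
            kept.append(j)
    return idx[kept]


def aggregate_profiles(Z, perts):
    """Median-aggregate wells into one profile per perturbation."""
    names = np.unique(perts)
    return names, np.vstack([np.median(Z[perts == n], axis=0) for n in names])


def group_by_similarity(P, threshold=0.8):
    """Link profiles whose cosine similarity exceeds `threshold` (union-find); return group ids."""
    U = P / np.maximum(np.linalg.norm(P, axis=1, keepdims=True), 1e-12)
    sim = U @ U.T
    parent = list(range(len(P)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(P)):
        for j in range(i + 1, len(P)):
            if sim[i, j] > threshold:
                parent[find(j)] = find(i)
    roots = [find(i) for i in range(len(P))]
    ids = {r: g for g, r in enumerate(dict.fromkeys(roots))}
    return np.array([ids[r] for r in roots])


def run_pipeline(X, perts, plates, control="DMSO"):
    """Chain the steps; returns treated perturbation names, their group ids, and kept feature columns."""
    X, perts, plates = quality_control(X, perts, plates)
    X, perts, plates = drop_duplicate_wells(X, perts, plates)
    Z = robust_normalize(X, perts, plates, control)
    cols = select_features(Z)
    names, P = aggregate_profiles(Z[:, cols], perts)
    treated = names != control
    return names[treated], group_by_similarity(P[treated]), cols
```

Normalizing to each plate's own controls before any cross-plate comparison is one common way to damp plate and batch effects. The fixed thresholds are placeholders that would need tuning on real JUMP data.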
Taken together, our findings emphasize both the importance and the feasibility of a robust analytics workflow, which can help scientists who wish to take advantage of this public dataset overcome the analytics barrier.