Cancer research isn’t what you think it is

Cracking the Cancer Code

In an age where data are everywhere, harnessing the power of data science can be a catalyst for groundbreaking discoveries in the fight against cancer. Welcome to the Cracking the Cancer Code podcast where we explore the latest in cancer data science. As a part of the ITCR Training Network (itcrtraining.org), we’re a small team of individuals who are working to democratize data science education in the hopes of catalyzing cancer research and ultimately fighting health inequities in cancer.

The ITCR Training Network (and this podcast) is supported by NCI UE5CA254170 but the views expressed on this podcast are those of the individuals who expressed them and do not reflect the views of our funders.

Find out more about the ITCR Training Network at https://www.itcrtraining.org/

Cancer research isn’t what you think it is

September 23, 2024 • ITCR Training Network • Season 1 • Episode 1

0:00 | 21:26

This podcast episode introduces a new series called "Cracking the Cancer Code" that explores cancer research and how data science is revolutionizing the field. The hosts, Dr. Carrie Wright and Candace Savonen, discuss the history of cancer, its complexity as a disease, and how cancer research has evolved over time.

Find out more about the ITCR Training Network at the website
Find out more about ITCR at their website.

Special thanks to the individuals who allowed us to interview them:

Researchers:
Juli Klemm - Director of ITCR program
Jeff Leek - Chief Data Officer of Fred Hutchinson Cancer Center
Drew Jones - Assistant Professor at NYU Langone
Susanna Kiwala - Business and Technological Applications Analyst II - McDonnell Genome Institute
Kartik Singhal - PhD Student at Washington University in St. Louis
Anja Conev - Computer Science PhD Student at Rice University
Mariam Khanfar - PhD Student at Washington University in St. Louis
Chloe Herman - PhD Student at Northern Arizona University

What is Cancer Informatics? Interviewees:
Celina Bowman
Beverly Booth
Alyssa Kivari
Ann Savonen

Credits:
Directed by Candace Savonen and Carrie Wright
Produced by Elizabeth Humphries, Candace Savonen, and MJ Wu
Edited and written by MJ Wu
Also written by Elizabeth Humphries and Candace Savonen
Hosted by Candace Savonen and Carrie Wright

What is cancer research? The number of different people that interact with data systems and biomedicine now is basically everyone in biomedicine. The common thread is people who are doing very data intensive, informatics intensive research. The future is here, it's just unevenly distributed. I think we've learned studying it one gene at a time is not sufficient to understand that complexity.

You're listening to Cracking the Cancer Code. Cancer researchers will increasingly need to be skilled at data science. It's basically everyone. A podcast series about the researchers who use data to fight cancer. Few things have haunted humanity like cancer. Evidence of tumors has been found in 3,000-year-old mummies, and mentions of tumors and how they could be removed have been found in texts that are even older.

Abnormal growths, tumors, and tissue pathology can be plainly seen to the naked eye, and as long as humans have noticed them, they have been trying to treat them. While we don't have a cure for cancer, cancer itself is no longer a death sentence. This is the result of decades of painstaking biomedical research that makes the Emperor of Maladies slightly less scary than it used to be.

How did we get here? And where do we go from here?

This is Cracking the Cancer Code, and in this series we will explore cancer and the researchers using data to fight it. I'm Candace Savonen, a data scientist at the Fred Hutchinson Cancer Center and the tech lead for the ITCR Training Network. I'm Dr. Carrie Wright, a senior staff scientist at the Fred Hutchinson Cancer Center. The ITCR Training Network is a collaborative effort of researchers around the United States aimed at supporting cancer informatics and data science training, funded by the National Cancer Institute.

Carrie and I work closely with a variety of dedicated cancer researchers on the forefront of cancer informatics. What is cancer informatics? Stay tuned and you'll find out. But it's shaping our understanding of the field and its future.

Cancer researcher? First thought would be, I see somebody in a lab coat sitting at a lab, analyzing cells and doing things like that. First thing that pops in my mind, I would think of you sitting in a lab looking in a microscope. Maybe there's some little mice in cages or something. Uh, never actually thought about it.

Um, first thing that comes to mind is probably the classic, like, lab coat sort of situation and all the science stuff that goes with it, the little test tubes and Bunsen burners and whatever, a lot of microscopes. But being, like, realistic and thinking about it, probably not, if you're looking at a lot of data analysis.

So maybe part of their time is in the lab doing tests and things like that. Well, I think there's different kinds of researchers. There's probably the kind that work in the lab. You know, and they have microscopes and then they try different things and stuff, but then there's probably also the kind that collect data and do stuff with the data to figure out what kind of trends or what things they might see that will lend them to some kind of answers.

What is cancer research?

Most people know what cancer is to some degree. Cells in our bodies multiply out of control and metastasize and spread to other parts of the body. The last part about spreading is key, as abnormal cell growth and tumors do not necessarily mean cancer. Cancer specifically refers to instances where these abnormal cell growths breach the barriers between different tissue types into places where they shouldn't be, evading the normal barriers that keep cell types organized.

So, the root of cancer is uncontrolled cellular division. Normal cells will divide in a tidy and controlled manner. In cancer, something in the cell's biology changes, prompting ceaseless division beyond the limits of what should happen. While some cancer researchers have focused on better treatments, others have focused on understanding what changes in the cell to trigger this unstable, unending cell division.

For many of them, this means looking into our DNA, the material that contains the instructions on how our cells work. Their goal is to identify which part of the DNA might cause cancerous growth. Genes are only one piece of the puzzle of why we get cancer. Carrying a copy of a mutated cancer risk gene, called a variant, might increase your risk of developing cancer, but it doesn't guarantee it.

Identifying the risk gene or variant is just the start, but studying a particular gene's effect on cancer is a whole other discipline. We don't fully understand how these mutations eventually lead to cancer, what causes these mutations, and eventually how to predict who will get sick and stop cancer before it happens.

Not everyone who gets cancer has inherited a variant or mutation for cancer risk. Outside forces like viruses or environmental exposures can also trigger the disease. Someone who has no family history of cancer and does not carry any mutations for known cancer risk genes might still develop it, especially if they've been exposed to outside factors.

Likewise, someone might have all the known risk factors and still never develop cancer. Understanding this is one of the many goals of cancer research, but we are still a long way from being able to predict who will get sick and who will not. And this research endeavor requires data, and a whole lot of it.

When you think about the front lines of cancer research, what do you think about? Maybe you think about a doctor in a white coat, test tubes, administering different clinical trials, whatever you think clinical trials might be. You might think of very complex surgeries, you know, somebody's saying, give me the scalpel, whatever you've seen in a doctor show, right?

And you're not wrong. Those elements do come from some truth that is part of reality. But the heart of the battle to understand the cure and the disease, the real front line of cancer, looks very different given it's 2024.

My name is Anja Conev and I'm coming from Rice University in Houston, from the computer science department. My focus is developing algorithms for modeling proteins, modeling protein structure and the interactions of proteins with ligands, and we use structural data to represent the proteins and the drugs or the ligands.

Kartik Singhal. I'm at Washington University in St. Louis. I'm a PhD student. I have two main projects right now.

One of them is focused on DLBCL Atlas. It's like a lot of single-cell RNA sequencing samples from patients with diffuse large B-cell lymphoma. And then the other project focuses more on a neoantigen prediction pipeline for personalized neoantigen cancer vaccines.

Hi, I'm Drew Jones. I'm an assistant professor at NYU Langone Health.

And I direct the metabolomics laboratory there and also carry out research in the Department of Biochemistry and Molecular Pharmacology related to metabolism and metabolism informatics.

My name is Susanna Kiwala. I work at Washington University in St. Louis as a software developer. I have two main projects. One is pVACtools, which is our immunogenomics pipeline, and the other is CIViC, which is a knowledge base for curating variants in cancer.

So my name is Mariam Khanfar. I'm a PhD student at Washington University in St. Louis. I am utilizing single-cell RNA in different projects, such as Hodgkin lymphoma, prostate cancer. And for those, we're really just trying to characterize the cancers. What is the microenvironment in the tumor cells? Can we detect Hodgkin lymphoma using only single-cell RNA? Because it's very hard to detect those Hodgkin cells.

Hi, my name is Chloe Herman, and I work at Northern Arizona University. I'm a PhD candidate there.

I'm currently designing a plugin for QIIME 2 called q2-fmt, which will help people with the methods they need to quantify engraftment extent following fecal microbiota transplants or FMTs. That relates to cancer therapy because it's a common side effect, reoccurring C. difficile after cancer therapy, because it does wipe out your gut microbiome, and then if a microbe like C. difficile gets in there and is able to dominate, you could end up with reoccurring C. difficile infection after you've already gone through your cancer treatment, which no one wants.

We keep talking about cancer as a singular thing, and I think in the public sphere, a lot of people do. But the truth is, cancer isn't one disease. Cancer is a category of disease. Or really, if we're going to be more specific about it, cancer is a description of what we see happening in the body. Every cell type in your body, of which there are many, can become cancerous, some more likely than others, and therefore could develop what we are currently categorizing as a different type of cancer.

But there's no guarantee that this is coming from the same biological mechanism or even related, and even in the same cell, there could be different causes of cancer. So this explodes the complexity when it comes to cancer research. And to give you an idea, here's a list of the different types of cancer compiled by the National Cancer Institute.

Cutaneous T-cell lymphoma, Kaposi sarcoma, melanoma. Note that each of these cancers has numerous more specific subtypes, and this is only our understanding at this point in time. Even without considering specific subtypes, we can count more than 200 unique types of cancer at this point in time. Some of these are common, affecting millions of people each year.

Some of them are rare, striking only a handful of people in the world. Some may attack children. Some may attack the elderly. They have different genetic causes, different environmental risk factors, different growth trajectories, and different treatment needs. Yet they all fall under a single umbrella of cancer.

To give more context, looking at breast cancer alone, researchers divide it into four major subtypes: luminal A, luminal B, HER2-enriched, and triple-negative, sometimes called basal-like. These categories are useful, and what we have right now anyway, for determining what cellular factors may have caused the cancer.

Essentially, the breast cancer subtype tells researchers how three major hormones might be fueling the tumor's growth. Does it respond to estrogen? Does it respond to progesterone? Human epidermal growth factor, which we also call HER2? Or does it respond to none of these? Possible treatment options for breast cancer differ based on which molecular subtype you're treating.

As researchers learn more about how breast cancer develops and how tumor types respond to different treatments, the number of subtypes used to classify breast cancer will expand or change. All 200 different types of cancer behave differently and need to be studied separately to be tackled effectively in order to really get somewhere.

This means more research, different types of research, and more data. In general, the breadth and complexity of cancer research expands as we get closer to understanding its origins. And to get closer, we need something like cancer informatics. Gastrointestinal neuroendocrine tumors, parathyroid cancer, paragangliomas, pheochromocytomas.

What do you think cancer informatics is? Oh boy, uh, well, it has info in the word. So, information, uh. Answer what? Informatics. In-for-matics. Uh, it sounds like it's tied to, like, information. Cancer information. No idea. Is it info or infer? Say the second word again. Informatics. It's like info. Okay. Informatics. I have no idea.

I've never heard that word before, so I don't know. I feel like it's probably just a fancy word for information based on data. Or maybe it's the information that drives the data, that drives the research. I don't know. I think it's when they take the data that they're collecting and they put it in some kind of code so it can be processed and then I assume from there they share it and other people can use it and things like that.

While cancer research itself has been happening for centuries, it's been only for a short time that we've had cancer informatics, which is the process of using data science and computer science to acquire, store, and use data for cancer research. Much larger sets of data.

As our lab techniques have advanced and we've been able to more precisely study and understand cellular processes, the amount of data generated from cancer research has exploded in size. Let's take breast cancer treatment as an example. In 1990, Mary-Claire King and her lab demonstrated that breast cancer was heritable.

She estimated that inherited susceptibility affected 4 percent of families, and that women carrying the susceptible allele had a lifetime risk of developing breast cancer of 82 percent compared to 8 percent of the general population. After demonstrating the possibility of a genetic link to breast cancer, Dr. King was able to pinpoint that some sort of breast cancer risk gene had to be present at the chromosome location 17q21. And really all that means is, picture a map: when we talk about latitude and longitude, chromosomes have their own type of thing. This was based on pedigrees of families with unusually high prevalence of early-onset breast cancer.

Just four years later, the actual gene, BRCA1, Braca, as people call it, was cloned, quickly followed by the related gene, its cousin, BRCA2. Dr. King's work to identify the BRCA1 gene location involved collecting genetic data from 329 individuals across 23 families. For each study participant, the King Lab measured 173 possible genetic markers.

If a study like this were to happen nowadays, researchers would likely generate whole genome sequences for each of the study participants. A single genome can take up to 60 gigabytes of storage space. For a study of 329 individuals, just storing the genomes themselves would require 19,740 gigabytes of space.
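
The storage figure above is a single multiplication, and it can be checked with a short sketch. The function and variable names here are hypothetical, chosen only for illustration; the 60-gigabyte figure is the upper bound quoted in the episode, not a measured value, and real file sizes depend on sequencing depth and file format.

```python
# Assumption taken from the episode: one whole genome can take up to 60 GB.
GB_PER_GENOME = 60


def genome_storage_gb(participants: int, gb_per_genome: int = GB_PER_GENOME) -> int:
    """Upper-bound storage in gigabytes for one whole genome per participant."""
    return participants * gb_per_genome


# The King Lab study enrolled 329 individuals.
king_study_gb = genome_storage_gb(329)

print(f"{king_study_gb:,} GB")           # 19,740 GB
print(f"{king_study_gb / 1000:.2f} TB")  # 19.74 TB, using 1 TB = 1,000 GB
```

Passing a larger participant count to the same function gives a rough sense of how quickly storage needs grow as studies scale up.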

And 329 individuals is a really small scale study nowadays. In the past, genetic studies with humans have been limited in size. The earliest lab techniques were revolutionary, but time intensive and expensive. Studies were small because often researchers didn't have the resources to make them bigger.

As both the laboratory and analysis methods improved, researchers have been able to process and analyze more samples faster and for less cost. Sequencing the original human genome cost 2.7 billion dollars and took 13 years. Now you can get your genome sequenced by commercial labs for 200 dollars and get the results in four to six weeks. Most of the time these days, a study will have tens of thousands of study participants, if not hundreds of thousands.

Some of the biggest research studies aim to enroll one million or more people. Keeping all the data stored and easily accessible in modern genomic studies is not possible without serious computational skills. It's not just genomics research that has experienced this change. Pick any field of cancer research and any type of data and the same pattern emerges.

We are generating far more data now than we have in the past. The analyses themselves also require a lot more computational power and skills. What this means is that cancer research has fundamentally evolved from the stereotypical small-scale wet lab experiments of the past to large-scale, giant, dataset-based research.

The real front line of our battle against cancer is actually computing. So, you can imagine, recording these data on paper is just not a feasible technique anymore. Now cancer research is dealing with a volume of data quite literally incomprehensible to people merely a few decades ago. Now every day, there's an unprecedented demand for even more analytical power.

Over the course of this season, we will explore just how much the era of big data and big computing is changing cancer research. We know what cancer research has looked like in the past, but what will cancer research look like moving forward? Join us as we talk to experts about questions like, how are these data shared?

Who gets to use this stored data and who gets to decide what sort of research is worthy of data access? How can we protect patients in our quest for more data and make sure that their needs and priorities are centered? How can we make sure that cancer research is benefiting everyone, and not just some?

In the next episode of Cracking the Cancer Code, we will dive into just how complex some of the cancer data can be.

When Carrie and I started this podcast, we didn't exactly know what the theme was, but as researchers, we decided that the best way for us to go about this was to do a little research. And by that, I mean we interviewed maybe 20, more than that at this point, 30, 40, uh, researchers, patients, advocates, all folks who've somewhat had some interaction with cancer or cancer research.

And that's been a very enlightening experience. The collection of perspectives that we've gotten to talk to has been really impactful and interesting. Even if we didn't publish this podcast and we selfishly kept all of those conversations for ourselves, Carrie and I would be leaving this experience having a lot of really cool things.

But that being said, it's been really neat. We're really excited to share these conversations with you as well. The conversations we've had with the researchers, support staff, government employees, advocates have been really meaningful. I've learned more from doing this podcast than reading countless papers.

I've learned about the human side of cancer research and I've learned how important that is. I would say this has been a really neat experience because we've gotten to learn from a variety of folks who are very passionate about what they work on. And not only have we gotten to like, be techie and nerd out a bit, um, but we've also gotten to talk about some really societally important problems and I would say overall what I've learned a bit is the tech part of cancer informatics is actually probably the easiest part and the most challenging part generally has to relate to how humans get through it.

So, as much as innovations are blazing forward, sometimes it feels like we have the same problems that we're encountering again and again. In interviewing many different researchers, one of the main things that came up time and time again is the team that people work with. Having a great team means a world of difference.

We've asked everybody, what are they most proud of? I don't think anyone said anything about the tech or tools that they built, although they certainly are excited about that. Every time, they've talked about the impact they've had on someone else. Or the impact someone else has returned to them. So as much as we are talking about tech and data, it's actually very reassuring that it really still comes back to the human condition.

As much as people these days are, like, swirling with anxiety about AI and tools and how everything is going to change, at the same time that everything is changing, everything is also staying the same, which is a very cliche way to just say that humans are still humans. And that's a good and a bad thing.

It's a good thing in that all of the good parts of humanity still do persist and maybe are even getting better. But all of the struggles that humanity has are also persisting, and we need to be cognizant of that, that the tech is not going to solve those things, that we have to do the work on ourselves to fix those things.

This podcast is supported by the National Cancer Institute through the Informatics Technology for Cancer Research Program, grant number UE5CA254170. The views expressed in this podcast do not reflect those of our funders. We want to graciously thank all of the ITCR investigators and other investigator friends who graciously lent us their time for making this podcast.

Without their contributions, it would not be possible. We'd also like to thank Dr. Jeffrey Leek, the Principal Investigator for this grant.