Cracking the Cancer Code

Team Science!

ITCR Training Network Season 1 Episode 6


In this episode we discuss how modern cancer research increasingly requires collaboration between traditional lab scientists and computational experts from the earliest stages of study design. The episode challenges the common misconception that data scientists only analyze data after collection, highlighting how early involvement of computational experts can strengthen experimental design and ultimately lead to more robust cancer research.

0:08: You're listening to Cracking the Cancer Code, a podcast series about the researchers who use data to fight cancer. 
 0:15: I'm Doctor Carrie Wright, a senior staff scientist at the Fred Hutchinson Cancer Center. 
 0:19: I lead content development for the ITCR Training Network, a collaborative effort of researchers around the United States, funded by the National Cancer Institute, aimed at supporting cancer informatics and data science training. 
 0:31: And I'm Candace Savonen. 
 0:32: I'm a data scientist at the Fred Hutchinson Cancer Center and the tech lead for the ITCR Training Network. 
 0:38: We work closely with a variety of dedicated cancer researchers who are shaping the field's future. 
 0:43: Last episode we discussed what rigorous science can look like for informatics research and why practices like code sharing and code review make us all better scientists. 
 0:53: Skill sets that people weren't trained on when they were in school are now key parts of our jobs, and this is true even for folks who went to school 5 or 10 years ago. 
 1:01: Knowledge and research accumulates like a snowball if things are working right. 
 1:05: This means also that as our knowledge becomes more complex, the methodology needed to accumulate more knowledge is also increasingly complex. 
 1:12: During our discussion with Roger Peng, he made a point about the role of experimental design and reproducible science that caught our attention. 
 1:20: One of the challenges that I've been thinking about is that, you know, most people seem to think that experimental design is very important, right? 
 1:25: Whether it's in a laboratory science or even in the observational studies or epidemiologic science, you have to be able to control certain factors and account for various confounders and things like that. 
 1:35: There isn't a corresponding kind of concern or even theory about like what should the analysis be once you've got the data, right? 
 1:42: Cause I think there's a sense that like, well, analysis is cheap if we mess it up, we can just do it again, right? 
 1:46: You can't do that with data collection usually. 
 1:48: So I think there's an increasing appreciation for the idea that like maybe we should think about how to do the analysis before we do it. 
 1:55: Because there's a lot of, I think, elements there that can contribute to if it's done poorly or incorrectly, they can contribute to reproducibility or replication problems. 
 2:03: So I'm more interested now in kind of like what is the analytical design for a given study and what should we think about before we kind of just dive into the data, because the data sets are so big and so complex now, if you just kind of dive right in, you can very easily make a number of kind of errors and things like that. 
 2:19: So, both computational and statistical. 
 2:21: So the bottom line and to my mind is like there's a massive kind of exponential increase in complexity of data analysis, very little time for people to catch up, and so people are kind of out there just doing whatever. 
 2:33: And that can lead to some problems. 
 2:35: Roger makes a really good point. 
 2:37: Data science is about more than just what we do with the data after it's collected. 
 2:41: It's equally important to think about informatics even before we collect a single data point. 
 2:46: Yeah, as it turns out, a lot of us have experienced this, if we've been in research long enough, that data analysis itself can't save poor experimental design. 
 2:55: So if you're starting off your research with an experimental design that a data analysis can't support, it's not going to be something that you can massage later, and as a matter of fact, you shouldn't massage it later. 
 3:08: So, a couple of things for us to think about. 
 3:10: Are we measuring what we think we're measuring when we're designing our experiments? 
 3:15: Are we thinking about this appropriately? 
 3:17: Have we collected and analyzed enough samples to actually measure the effect? 
 3:21: Is it going to be a large enough effect size to really have the statistical power to be able to see that effect? 
 3:28: Did we measure everything that we need in order to control for different confounds? 
 3:33: So if we processed one group of samples on one day and another group of samples on another day, we need to make sure that, for example, both the control group and the experimental group are represented on each day. 
 3:44: We've got to have those things mixed up, or we may not mitigate the impact of bias that comes from humans doing the work, or from a data set that isn't inclusive enough 
 3:56: to really capture what's happening with a variety of different types of people. 
 4:02: And all of these things need to be considered before any data is collected, before any pipettes go anywhere or any kind of samples are collected from any patients, anything that's happening in order to get this data, we need to be considering all of these things upfront and preferably with the help of some statisticians. 
 4:23: Or a data scientist, or an informaticist, any kind of expert who is collaborating with someone, because it's again, a very multidisciplinary field and they've got to have teamwork in order for science to be done properly. 
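The sample-size question raised above is one a statistician can sanity-check before anything is collected. Here is a minimal sketch using the standard normal approximation for a two-group comparison; the function name, effect size, and thresholds are illustrative assumptions, not anything from the episode:

```python
import math
from statistics import NormalDist

def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-group comparison.

    effect_size is Cohen's d: the expected group difference in
    standard-deviation units. Uses the normal approximation, so it
    slightly understates the exact t-test requirement.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # threshold for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A medium effect (d = 0.5) at 80% power needs roughly 63 samples per group;
# halving the effect size roughly quadruples the requirement.
print(samples_per_group(0.5))   # 63
print(samples_per_group(0.25))  # 252
```

This is only the back-of-the-envelope version of the check; a real study design would also account for confounders, batch effects, and multiple testing.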
 4:38: With multidisciplinary teams, we really need to talk early and often, and really get to know what the other experts in our research teams are doing and what they suggest we need to consider before we start doing our research. 
 4:51: We've used the word multidisciplinary, just such a mouthful, and we've used it a few times now, because it is a very good descriptor. 
 5:00: There are so many groups of people of different training of different expertise who are coming together to try to make cancer research move forward for the benefit of cancer patients. 
 5:13: So it's a lot to come together, but what's at the core of it all is communication. 
 5:18: And being able to talk properly and effectively with your collaborator. 
 5:23: I think some of the most successful collaborations I've seen are collaborations where the other person wasn't afraid to say, what, what are we talking about? 
 5:31: I don't really understand. 
 5:33: And let's, let's dive through this together so we can all kind of understand and bring our expertise to the table. 
 5:39: Data scientists like ourselves also often overemphasize the data analysis phase, too. 
 5:46: And in this podcast about data science and cancer research, up to this point, we haven't really talked about the full role that data scientists play in research. 
 5:54: We've mostly talked about data in the analysis phase. 
 5:58: In this episode, we wanted to really dive into the value data scientists bring to all aspects of cancer research and why collaborations between traditional wet lab scientists and computational scientists are integral to every step of a research project. 
 6:12: We've thrown around a few terms: bioinformatics, data science, computational biology. 
 6:19: So what really is the difference between all of these? 
 6:22: Well, there's a bit of a Venn diagram and they do overlap. 
 6:26: Data science is typically broader, describing the use of data in general, so that could be for finance or for biomedicine. 
 6:36: It's basically how do we work with data and try to interpret meaning from data. 
 6:41: Bioinformatics, on the other hand, is using data that's related to biomedicine, and typically related to genomics or imaging. 
 6:51: It can also have to do with biology in general, like looking at ecology, for example. 
 6:57: And finally, computational biology is a little bit more broad than bioinformatics, and that can refer to more things than just sort of interpreting and utilizing data, but has more to do with more complex computer science-based research that we might do. 
 7:15: And here's the thing about these terms, they do have differences for sure, but here is why you don't need to be too hung up on them is that depending on the audience or the grant or the presentation or the person I'm talking to, I might use a different way to refer to myself. 
 7:32: Some people might lean more heavily one way in how they refer to themselves. 
 7:36: Some might think of themselves more as a bioinformaticist, like myself, for example; I think of myself more that way because I work with a lot of genomic data. And some people might think of themselves more as data scientists, maybe because they're not really totally constrained to biology; maybe they think of themselves as a biological data scientist. And sometimes people think of data science, too, as having an overlap with both analysis and methods. 
 8:01: So yeah, they're all the same and they're all different. 
 8:04: The field of cancer informatics might have started small, but there's no doubt that today it's grown into an active large research area that's attracted many scientists with a huge variety of expertise, including people with computational or math backgrounds like Doctor Fertig. 
 8:24: So, I'm Elana Fertig. 
 8:25: I'm the Associate Cancer Center Director for Quantitative Sciences at Johns Hopkins, and I'm a computational biologist. 
 8:33: I came to this field from a background in multimodal data integration and weather forecasting. 
 8:39: And what I found really exciting when I entered cancer biology, and I was the sort of typical arrogant mathematician, I'm like, oh, I'll do this for a few years, and I'll figure it out, and then I'll move back. 
 8:49: I'd say we don't know what the fundamental equations of cancer biology are. 
 8:54: We have a whole bunch of variables. 
 8:56: I'm not even convinced we have the right variables. 
 9:00: So we don't know what the right variables are. 
 9:02: We don't know what the right equations are. 
 9:04: And it's a highly dynamic evolving system. 
 9:06: And so it really does feel like one of the frontiers where, if we could learn that, we could understand the disease and get interceptions a whole lot better, and we have the ability to do fundamental scientific discovery through trying to solve that problem. 
 9:22: So that's what I'm really excited about and that's what's always kept me here. 
 9:26: So we've talked a little bit about how there's been this historical division of the wet lab versus the dry lab. 
 9:32: But really things today are moving in a direction of combining these and people are learning now as graduate students and postdocs to really be more of a hybrid type of researcher who knows aspects about both. 
 9:48: Right, there are labs all across the continuum, really. 
 9:51: Some don't have any pipettes and do everything on the computer, 
 9:54: so they're computationally heavy. 
 9:57: And then there's other labs who everything is in the lab in the traditional sense of when we say the word lab, where they're pipetting things, their hands are on samples of some sort, maybe they work with mice or cell lines, but their lab and their work can exist anywhere on that continuum. 
 10:13: There's some labs who do both. 
 10:14: And I always found that super impressive, to have kind of a leg in each camp, because I think those labs are kind of translating for other labs as well. 
 10:23: So that as a whole community of scientists, everyone's work is the best it can be. 
 10:28: And when Candace says pipette, she's talking about these instruments that pick up a very tiny, tiny amount of liquid. 
 10:37: And in genomics labs, we use these a lot. 
 10:41: A grad student or a postdoc or a lab member will spend a lot of time transferring small amounts of liquid into tiny little tubes. 
 10:50: We have both been there. 
 10:52: Yes. 
 10:53: Carrie and I both started out as wet lab people who kind of found the data and the data heavy side of the world and found it just fascinating. 
 11:03: And I think we both kind of, I won't speak too much for Carrie, but for myself, I found that really fascinating and just continued to pursue the computational part. 
 11:11: But I guess what I'd ask Carrie is, first of all, I would love to hear your experience of the wet and dry labs, but second of all, where do we think things are going to go? 
 11:20: Like is everyone going to be a hybrid lab? 
 11:22: Is everyone going to be like kind of specialized one way or the other? 
 11:26: What do you think? 
 11:26: I think it's going to stay a mixture. 
 11:29: I think there still are going to be specialists on both sides that really focus, but I do think the majority of labs are going to have more of a hybrid approach, because technology 
 11:43: is really leading us in that direction. 
 11:45: There's so many opportunities now to collect really useful different types of data, and we just need that expertise to be able to utilize it. 
 11:54: And so from my experience, I kind of came from a more wet bench side of things; in my undergraduate, I was doing wet bench research. 
 12:02: I started out in a lab doing that in graduate school, and then actually changed to a lab that did a lot more computational research because I was really interested in that. 
 12:11: And then in my postdoc, I kind of had a hybrid approach where I collected my own data but then analyzed it myself, which was really cool to do, but I have since transitioned to being more on the computational side. 
 12:25: Yeah, and I agree with Candace that having both of these experiences has been really valuable for being able to communicate with people on both sides and understand where they're coming from. 
 12:36: Right, and really it makes you kind of practice that communication. 
 12:40: Nobody can be an expert in every single thing, and that's why again, science has to be this teamwork thing. 
 12:46: And I think that's what Dr. 
 12:48: Fertig has been really good at doing is bringing her mathematical background in a way that everyone can apply it to their work. 
 12:58: We asked Dr. 
 12:58: Fertig, why so many people seem to think that computational biologists are really only analyzing data and not working with the data earlier. 
 13:09: Yeah, I mean, when I started, it was definitely predominantly a service field. 
 13:14: There's no question. 
 13:16: It was predominantly, because I started in the microarray era. 
 13:19: The experiments were much smaller, and a lot of the research was focused around how do you work with the technology. 
 13:26: And like, yes, they needed us to do the analysis, but it was much simpler, where you were basically focusing the research efforts on the pre-processing, and then you basically aimed to reduce everything to a t test at the end of the day, more or less. 
 13:39: So I think it sort of grew out of that. 
 13:41: But we really are, I think now in an era where a bioinformatician can act independently as an independent investigator to learn fundamental biology from just data in the public domain. 
 13:54: And I think that's changed a lot of the field. 
 13:56: And I would argue at this point that I think basic bioinformatics, basic differential expression analysis, things like that, are in today's labs as basic a tool as pipetting. 
 14:08: And I think we're getting to a point where I don't know how much of this divide between computational and experimental is an artifact, and the next generation of cancer research is just going to be more hybrid. 
 14:21: I think that's gonna happen, and I think we're at this awkward phase where there are two disciplines trying to find each other and cross over. 
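The "reduce everything to a t test" era Dr. Fertig describes can be pictured with a toy per-gene comparison. This is a hedged sketch with made-up expression values and hypothetical gene names; real differential expression analyses use dedicated tools such as limma or DESeq2 rather than raw t statistics:

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic: the group difference scaled by its standard error."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

# Made-up expression values (log scale) for two hypothetical genes, normal vs. tumor.
normal = [5.1, 5.3, 4.9, 5.2]
expression = {
    "GENE_A": [7.8, 8.1, 7.9, 8.0],  # clearly shifted relative to normal
    "GENE_B": [5.0, 5.4, 5.1, 4.8],  # essentially the same as normal
}
for gene, tumor in expression.items():
    print(gene, round(welch_t(tumor, normal), 2))
```

With data this clean the shifted gene stands out immediately; the hard part of modern analyses is exactly what the episode discusses next, separating real biological signal from technical noise.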
 14:29: As a researcher, Dr. 
 14:31: Fertig and her lab work on methods for analyzing single-cell data as well as developing algorithms that can predict the therapeutic response of a treatment. 
 14:39: In this case, an algorithm usually just means a set of mathematical steps we can apply to find patterns. 
 14:46: She is an expert in working with huge complex multi-omic data sets. 
 14:51: Multi-omic just means that it's got things from RNA and DNA and all kinds of different biological tissue origins, all kind of mixed together in a way that's hopefully meaningful. 
 15:04: And these multi-omic data sets require careful analysis, as you can picture. 
 15:08: They're very complex. 
 15:09: And so in order to parse through all that, you need to be able to identify the biological signal from just technical noise. 
 15:17: I like to say this as a joke, but maybe the tech sneezed and a little bit of the sneeze got in one of the tubes, right? 
 15:24: Like there's things that cause noise. 
 15:27: That's a very visual explanation. 
 15:28: It's not a very typical one, but we can picture that there's all kinds of things that don't really relate to the question that we're analyzing, but might end up in the data. 
 15:36: And so like many bioinformaticians, Dr. 
 15:39: Fertig frequently collaborates with wet lab scientists in addition to running her own research program. 
 15:44: So we asked her with her unique perspective, what's the biggest thing she focuses on when she starts a new collaboration. 
 15:51: Yeah, so, I mean, there's two things that I want to tease apart there. 
 15:55: One is the, oh my God, how do we consider all of this data and the fear of sort of entering data. 
 16:00: And I would like to say that. 
 16:02: Always breaking it down in terms of what's the experiment and what's the unit that you're trying to measure is really valuable both from the analysis side and the interpretation side. 
 16:13: So whenever I sort of start out with colleagues and they start jumping in, and they're like, yeah, I'm gonna run an atlas and I'm going to do all this profiling, I'm always like, OK, back up. 
 16:21: If you were to do flow cytometry, how would you analyze this problem? 
 16:25: I typically, with most collaborators, won't work with them until they can answer that question on a data set, and will encourage them not to generate data until they can answer that question, because then it becomes an issue of: you've tailored this down to a very elegant experimental design, and then you're going higher dimensional in terms of what you profile. 
 16:45: It's always my preference to have a data set that does that, and I realized that's not on the analysis method side, but I think we undervalue the role of data scientists in study design and in what we can do on the onset. 
 17:00: And I would much rather have a data set where I'm almost irrelevant on the analysis side because we've put so much effort into the design that most tools will be able to uncover it. 
 17:10: So I think that's a big part of what I view my role and what I think is a value add of a data scientist. 
 17:16: I think people tend to think of us as, oh, I got my data. 
 17:19: I'm gonna dump this thing on you, and you're going to run like whatever magic thing or blender that you've got, and then you will give me the answer. 
 17:25: And I find those projects never work very well. 
 17:29: The importance of study design, I think is something that is undervalued and the importance of methods too. 
 17:35: I think that's another one that I've seen this one more recently where I'm surprised by this, that for a computational person, it's very obvious if the methods are wrong, that the whole thing is just wrong. 
 17:46: If you have the wrong statistical test, you're not doing the right thing with the data. 
 17:50: And I've seen that the statistical foundations and the computer science and the mathematical assumptions behind the algorithms, I think people tend to black box more than I would like. 
 18:03: And so I think to some extent we may have made the software too easy, that we've taken out some of the, where should you or shouldn't you be using this that does require the computational depth to understand. 
 18:15: Of course, it's not just the wet lab biologists who need to become familiar with new topics in research collaboration. 
 18:21: Scientists from computational backgrounds also have to get comfortable with the nuance and uncertainty that is inherent in biological research. 
 18:29: Biology is very messy, and that can be overwhelming and challenging to someone who's new to the field. 
 18:35: When you come from a computational background, everything's clean. 
 18:38: There is a truth. 
 18:39: You are taught to write things in terms of,, these are your assumptions, you prove it, and there exists a truth. 
 18:46: And there's a solution, and you can find it. 
 18:48: And if you haven't found it, you've done something wrong. 
 18:50: And so I think on the computational side, that can be, I think, one of the biggest learning hurdles for 
 18:55: computational people going into biology, one that I think a lot of people don't appreciate: the idea that there could be a discipline without a right or wrong answer, because having one is just so not the case in biology. 
 19:06: And getting comfortable with that nuance is very hard, I think, for computational people, right? 
 19:12: We have a tendency to underestimate the complexity of specialties we don't know. 
 19:17: Some educational efforts aren't just practical for skill sets, but also improve communication and recognize that sometimes we just don't know what we don't know. 
 19:27: It can be very easy to be like, well, why don't you just rerun the samples, you know, it's just easy, just do the thing, but that's not necessarily accurate, or even respectful in recognizing what someone else's expertise brings to the table. 
 19:41: Yeah, I've definitely seen this on both sides, where wet bench researchers think that computational work should be really quick and easy because it's all on the computer. 
 19:52: And I've seen the opposite, where computational biologists think that the experiment can be repeated, not knowing how many resources go into that experiment or how much time. 
 20:03: Some of these experiments take multiple days that are over 12 hours a day to collect that data. 
 20:10: Both have a lot they bring to the table, but also have things that they're learning too, and it's just as overwhelming, no matter where you're coming from, to pick up a whole new skill set. 
 20:20: And so just being patient with our collaborators, especially if they're coming from a different background than us, is I think just the key to making this all work. 
 20:28: Yeah, and that willingness to learn. 
 20:30: If we open up and start having conversations about the nuances that everybody brings to the table and their awareness of the data, this can really strengthen the rigor of the science. 
 20:42: Dr. 
 20:42: Fertig still thinks there's a place for researchers who specialize in computational work, even as everyone becomes more proficient with basic data science. 
 20:50: Both the specialists and the generalists are important parts of the informatics team, and it's vital to know when to bring in a specialist. 
 20:57: I really appreciate the education network making this more accessible to people. 
 21:01: And I think that's so important to break it down. 
 21:03: But the flip side of that is people learn how to do a thing, and then they think it's easy and that it doesn't require expertise to do. 
 21:12: And I, I don't know where the right balance is because I can't take on all the projects that people need bioinformatics help with. 
 21:18: There's no human world in which that can remotely happen. 
 21:21: The only option is to empower people to do it. 
 21:24: But there also needs to be a point where people are learning their limits and learning the difference between being able to get something up and running and the diversity. 
 21:34: And I don't know what that line is. 
 21:36: In our work, we run into a lot of people who fit into a term called the lonely bioinformatician. 
 21:41: These are researchers or trainees who are the only person in a lab with knowledge about programming or computational work, which means they frequently become the lab's go to for any and all bioinformatics tasks. 
 21:53: The lonely part of the lonely bioinformatician certainly paints a picture, because I think this starts to get to where a research issue is not only, you know, affecting the quality of the work, but is also probably affecting the mental health of the person who is trying to do that work, because it can be very difficult to feel isolated and not know who you can ask questions of, you know. 
 22:18: Academia for better or for worse, does work on this kind of hierarchical structure where somebody has their mentor or their mentee, and that is actually the mentorship part of academia. 
 22:28: It really is beautiful. 
 22:29: However, if a person doesn't have that mentorship or doesn't know where to go, and all the department is depending on them to get their data analyzed, that could be very, very stressful. 
 22:41: So along these lines, we asked Dr. 
 22:42: Fertig what she thought the field could do to support these lonely bioinformaticians. 
 22:47: And why she thinks this role is so common in the cancer research community and really all biological research communities. 
 22:54: Yes, I have a few thoughts on this. 
 22:56: One is I don't like the term lonely bioinformatician, because it implies the fault is on the bioinformatician as opposed to it being a systemic thing. 
 23:05: It's sort of like imposter syndrome where you're putting the fault back on the person who's being isolated. 
 23:10: I prefer the term isolated bioinformatician because I think it's an active thing that the Institutions are doing. 
 23:17: I think we need better institutional supports for people, and we need better homes for people to come to as a community who are in this group. 
 23:25: So, for example, one of the things we've done in our program is we've started a joint lab meeting for anybody doing bioinformatics regardless of if they're in a dry lab or a wet lab, so that there's at least a community they can go to for best practices and have everybody together. 
 23:40: I think there's two issues with this. 
 23:42: One is the community building, and the other is that you're setting up your faculty to have different classes of faculty, where the computational faculty become servants, basically, for the biologists, as opposed to being independent investigators with bi-directional collaboration in their own right. 
 24:01: So, if you're the isolated faculty bioinformatician in that role, what do you lose? 
 24:06: You lose the ability to choose which projects you do because you have to make everybody in your group happy. 
 24:11: You lose the time to do your own work and something that you're going to be recognized for, which is going to hit you at promotion and tenure time. 
 24:19: And there's ways to get around that with team science, but then your skills are also going to stagnate if you're just always doing analysis for other people. 
 24:27: And the focus is on how do I use existing tools as opposed to how do I push the cutting edge on that. 
 24:34: So I do think we need to push back on departments that are asking for that and just ask them, is this a reasonable expectation for a tenure-track faculty member? 
 24:43: What would you say if I were to say as your computational biologist, oh, I need somebody to do experiments for me, go run all my mouse work, go hire a faculty member who can do all the mouse work for my lab. 
 24:52: I think people would very quickly tell me, Oh, build up your lab, get an RA, get somebody non-tenure track to do that. 
 24:57: And I don't know where this notion came in that computational faculty should be serving in that servant role. 
 25:04: So I think really respecting us as scientists, respecting the methods we develop as academic products. 
 25:11: Respecting the work we do in a creative data analysis as a product and really defining what are the rules for meaningful co-first and co-senior authors. 
 25:22: What are the bi-directional collaborations so that I'm not just doing data analysis for somebody, but I need an experiment to validate one of my methods, you'd also be willing to do that in the same way that I'll go to bat for you for analyzing your data. 
 25:36: So, I think that the community expectations desperately need to change. 
 25:45: So this all really keeps coming back to communication and education. 
 25:50: We've talked a lot about this in other episodes and that research is really a community effort. 
 25:56: I think one of the most surprising things about doing this podcast is seeing how often this theme comes back to being able to talk to people with different expertise. 
 26:06: And this is something that's really important to the ITCR training network, and we have materials about this because we really want to support all types of trainees in biomedicine and cancer research. 
 26:18: And really, these people who are doing this type of bioinformatics work and they're kind of alone in their lab, they need support too. 
 26:26: And they need additional mentors outside their lab that can help train them and mentor them to grow in the direction that their career is growing. 
 26:37: So in our next episode, we will explore the importance of an often overlooked part of the cancer informatics research team, and that's the administrative staff. 
 26:49: Thank you for listening to Cracking the Cancer Code. 
 26:51: This podcast is sponsored by the National Cancer Institute through the Informatics Technology for Cancer Research Program, grant number UE5CA254170. 
 27:02: The views expressed in this podcast do not reflect those of our funders or employers. 
 27:06: We especially want to thank Dr. Elana Fertig for her time.