Cracking the Cancer Code

Empowering cancer research through data - ITCR

ITCR Training Network Season 1 Episode 4


This episode discusses the dramatic increase in data use in cancer research over the past two decades. It highlights how the data revolution has democratized access to data, creating new challenges in data management and analysis. The episode features insights from Dr Jeffrey Leek, a Chief Data Officer, and Dr Juli Klemm, the program director for NCI's Informatics Technology for Cancer Research (ITCR) initiative. They discuss with us the importance of data infrastructure and education. The ITCR supports the development of informatics tools for cancer research, addressing the growing need for specialized software and expertise in the field.

0:00: The number of different people that interact with data systems in biomedicine now is basically everyone in biomedicine. 
 0:12: You're listening to Cracking the Cancer Code, a podcast series about the researchers who use data to fight cancer. 
 0:20: I'm Dr Carrie Wright, senior staff scientist at the Fred Hutchinson Cancer Center. 
 0:24: I'm head of content development for the ITCR Training Network, 
 0:28: a collaborative effort, funded by the National Cancer Institute, of researchers around the United States aimed at supporting cancer informatics and data science training. 
 0:36: I'm Candice Savinon. 
 0:37: I'm a data scientist at Fred Hutchinson Cancer Center and I'm the tech lead of the ITCR Training Network. 
 0:43: We work closely with a variety of dedicated cancer researchers on the forefront of cancer informatics who are shaping the field's future. 
 0:50: Last episode, we talked about the challenges and importance of sharing data in cancer research, and how important community is in driving those initiatives. 
 1:00: We also talked about how data science, good data infrastructure and data management are essential to help create effective cancer research programs. 
 1:09: Cancer research has evolved to demand more and more data, and these data need to be shared. The way they're shared isn't necessarily straightforward. Not just the amount of data has grown, as we've seen from previous episodes, but the different ways that we interact with data, as well as the different types of data, have changed over time. 
 1:26: We're still at the very beginning stages of seeing how the data revolution is impacting cancer research, and ultimately how that's going to affect how cancer is treated. To learn more about this, 
 1:36: we spoke with Dr Jeffrey Leek, a data scientist who has witnessed these changes 
 1:41: firsthand. 
 1:44: I'm Jeff Leek. 
 1:44: I'm the Chief Data Officer at the Fred Hutchinson Cancer Center and a professor in the biostatistics program there. 
 1:50: What does a Chief Data Officer do at a research institution like Fred Hutch Cancer Center? The Chief Data Officer is a service role to the institution. 
 1:58: And really the goal is to build an infrastructure so that people can host data in the right places, with all the right controls in place, and data governance to enable people to get access to the data 
 2:08: they need in a way that complies with all the relevant laws and regulations, and then build cool tools and support systems, like the training programs that we develop, so that everybody can take advantage of the data operation, not just the really technical researchers and staff and students and faculty. 
 2:25: And it's a really interesting role for somebody like me, who spent their whole career really being one of the people that benefits from these systems, to be the person that's supposed to be designing and maintaining it. 
 2:35: It has been an interesting change of scenery and challenge. 
 2:38: How have data science and bioinformatics changed in the past 20 years? 
 2:42: Yes, I have observed a lot of those trends. 
 2:44: I think going all the way in the way-back machine to when I really started my career as a beginning graduate student: 
 2:49: there wasn't data that you could access easily for almost anything unless you were part of like a really big project. 
 2:56: You were not getting access to like a big genomic data set. 
 3:00: They were extremely expensive to generate. 
 3:03: Only a small number of people had access to them. 
 3:05: Even the really big genomic data sets would still fit in like a CSV file that you would store on your laptop. 
 3:12: Not like the petabytes of data we're dealing with now. 
 3:16: So let's take a moment to talk about what a petabyte is. An average laptop today, 
 3:24: a typical nice one, has about one terabyte of storage. 
 3:29: This is actually an incredibly large amount of data if we look at the history of computers, but one petabyte is over 1000 terabytes. 
 3:37: So we're talking about a lot of computers' worth of storage. 
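As an aside for readers skimming the transcript, the hosts' storage comparison can be sketched with plain arithmetic. Nothing below comes from the episode itself; it just uses the standard decimal (SI) and binary storage prefixes:

```python
# Storage units using decimal (SI) prefixes, the convention disk vendors use.
TB = 1000**4          # 1 terabyte = 10^12 bytes
PB = 1000**5          # 1 petabyte = 10^15 bytes

# A "typical nice" one-terabyte laptop, per the hosts:
laptops_per_petabyte = PB // TB
print(laptops_per_petabyte)   # 1000 one-terabyte laptops per petabyte

# With binary prefixes instead (1 TiB = 2^40 bytes, 1 PiB = 2^50 bytes)
# the ratio is 1024, which is why "over 1000 terabytes" is safe phrasing.
TiB, PiB = 2**40, 2**50
print(PiB // TiB)             # 1024
```

Either way you count, a petabyte is roughly a thousand laptops' worth of storage.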
 3:43: And so data was hard to get and it was sort of a precious resource in that way. 
 3:49: And it was sort of a challenge to interact with it in any way other than through a system that had already been set up to, to share that data. 
 3:56: Then there was sort of a series of things that changed over time. 
 3:59: And I think one of them was that data became cheaper and cheaper to produce. 
 4:03: So there was just more of it around which changed culturally how things worked. 
 4:07: Like, people who were in labs where before the data was like a western blot, now they have a bunch of sequencing data, and it changes the way that whole lab works. 
 4:16: And I think a lot of different areas have had that happen, where in the span of my career, there have been people that have gone from like no data to like massive amounts of data in their lab. 
 4:25: And you know, that's a pretty abrupt change as science goes. What does this data revolution mean for the people who build data analysis tools? 
 4:34: It has real implications for ITCR-type developers, or other people that develop software, because their user base is much broader, which is great for them, of course. 
 4:43: But it also means their users might be less well trained in using software, less well trained in analyzing data. 
 4:49: And so thinking about how you support broader, more democratic access to data analysis and data management is a total culture shift from how it felt when I started my career. 
 5:02: So yeah, it feels like there's been this consistent trend towards things being different. 
 5:08: And I'll just say, you know, it's felt like an exponential curve. 
 5:12: Like, it felt like at the beginning there were rumblings of this kind of stuff, and now it feels like it's accelerating, and with the AI stuff even more. 
 5:18: So it's really accelerating to a point where it's tough to keep up with all the culture change that's happening. 
 5:23: I think it's making a lot of people like really nervous about how the world works. 
 5:27: How has this data revolution changed biomedical research? 
 5:30: The number of different people that interact with data systems in biomedicine now is basically everyone in biomedicine, whether you're like a person who's interacting with Epic and the charts that it produces, or you're a person that's doing an analysis of a big set of genomic data, and everything in between. 
 5:45: Basically everyone is doing data science work now. 
 5:48: And so because of that, you get this huge variation and some people totally know what they're doing in their context, but they don't quite know all the legal regulations. 
 5:58: So how do you support that person? 
 6:00: Some people know, you know, all of the compliance and legal stuff, but really don't know how the technical systems work. 
 6:05: So that's a big challenge. 
 6:07: And I think one of the interesting things about the group we're building here is trying to find translators, people who can sit at the boundary between really different disciplines and really different kinds of people who are coming to work to do their job, and they really care about doing their job, but data is a part of their day-to-day and they need to figure out how to do it well. 
 6:24: You're a CDO, which stands for Chief Data Officer, and involved at a high level in running a research institute. 
 6:32: Why is someone in your position involved in educational initiatives? 
 6:36: One of the things that's been really interesting has been how much of my role I feel and a lot of the people in our group, their job is education. 
 6:45: I understand this technology. 
 6:47: I understand how it relates to this legal thing. 
 6:49: I'm gonna go to the meeting with a person who's a pure technical person and a legal person. 
 6:54: And my job in that meeting is just to educate as to what this tool is and what it can enable people to do or not enable people to do, and connect the dots between the legal requirements and the technological requirements. 
 7:06: And so yeah, it's interesting how much education plays a role throughout the whole process, everything from like very senior executives who have to make decisions about data, to like really in-the-weeds people who are like, I just wanna know, where do I store the backups for my data? 
 7:22: Which database do I store that on? That has some important implications, and all of that requires education. 
 7:30: So yeah, I think that's part of the reason why I've spent so much of my career thinking about education: I really feel like when you're in the midst of multiple revolutions, the only way to keep up is like very rapid-form educational triage. 
 7:42: So it's way more complicated than just building a database and handing it off to people. 
 7:52: So we've talked about how science today relies on scientific software to handle these massive amounts of really, really important data. With so much data, 
 8:01: a lot more people with a lot less formal training in computer science and writing code are trying to use software, and that can lead to a lot of headaches. 
 8:09: A lot of looking at things online, trying to figure out what's the proper way to do this, for folks who don't necessarily have the fundamentals in computing or all the other kinds of skills that informatics needs. 
 8:23: Yeah, it's really a self taught field in a lot of ways. 
 8:26: And that leads to a lot of what I like to call Swiss cheese education. 
 8:31: So people end up with some holes that they don't even necessarily know about in terms of what they need to know. 
 8:37: Another piece of that also involves skills related to developing software. 
 8:43: A lot of scientists need to create a tool to help them do their research. 
 8:48: And this involves them creating some software, but they are usually quite new at creating software, and they do not necessarily have training on how to create user-friendly, accessible software for others to use. Because, talking about that: a PhD, which is a research-centered degree, generally involves looking into a topic that is very niche, very on the forefront of a field, which means that you're probably going to look at data questions that are very unique. 
 9:18: No one else has probably looked at that. 
 9:19: That's super exciting, but also super tricky, because now you're probably going to have to create software in order to address those data questions that you have that nobody else has asked. 
 9:29: And often when scientists create this type of software, it often is really useful for other scientists trying to do similar research. 
 9:37: And so funding for software is really important. 
 9:39: It takes a lot of time to create software, to maintain the software, to advertise and let other researchers know about it and to help make it more user friendly over time as users use the software and find issues with it. 
 9:53: Software development takes a lot of time. 
 9:55: Sometimes it takes computing resources, or otherwise training and other funds. 
 9:59: But a lot of what it takes is people's time, not only when they create it in the beginning, but to maintain it over time and to make it iteratively better. In the US, the National Cancer Institute, which is part of the National Institutes of Health (NIH), has created a special initiative called ITCR, which stands for Informatics Technology for Cancer Research, to support bioinformatics work for cancer research. 
 10:24: How the ITCR works is that people who are interested in doing informatics-based cancer research, or want to create a new software tool, will apply for funding from the ITCR. 
 10:37: And if they write a proposal that meets the needs of what the community is looking for 
 10:43: at that time, the ITCR program will give them some money to work on their research and their software. 
 10:50: We were very fortunate to speak with one of the program officers of the NCI for our show. Dr Juli Klemm is responsible for running the ITCR program. 
 11:00: So what does ITCR do? 
 11:02: So the primary goal of ITCR, which stands for the Informatics Technology for Cancer Research program, is to support the development of informatics tools and software for cancer research. 
 11:16: And the way we accomplish this goal is by supporting methods and software that are closely tied to the data generation and research activities they support. 
 11:26: Because really if you look at evolution of software for biomedical research and specifically for cancer research, you see that the advances in the software have really paralleled the advances in the measurement technologies and the research directions. 
 11:41: So we feel that creating scientific software, again specifically software for cancer research, that's truly meeting the needs of cutting-edge science needs to really be embedded in where that cutting-edge science is happening. 
 11:56: And so we are really committed to tying the tool development to these driving needs in cancer research. 
 12:03: Why does the NCI find it important to support bioinformatics work like software development? 
 12:08: Yeah, there's increasing demand for informatics expertise and data science expertise 
 12:11: in cancer research, and the approach for getting recognition for that work is a little different from hypothesis-driven research. 
 12:21: How do we ensure that there is an appropriate professional path for individuals who are pursuing a career in scientific software? 
 12:31: Because this is such an important resource for our community. 
 12:34: How do we ensure that they get the credit that they need to continue to advance in their careers? 
 12:40: And how do we ensure that we provide the appropriate incentives for individuals with those skills to continue to work in the academic environment, where so much of the scientific innovation takes place? 
 12:53: It was really in response to the increasing need for software tools for cancer research. 
 12:59: And in fact, Dr Dinah Singer, who's now NCI's Deputy Director for Scientific Strategy, was really one of the visionaries for the inception of this program. 
 13:08: And in 2012, there were a couple of reports that were published that I think also had a particular influence on the shaping of ITCR. 
 13:16: And there were some common recommendations in these two reports, which both recommended, or saw the need for, enhanced support 
 13:24: for community-driven software development; the need for a strategy to support scientific software through the development life cycle; 
 13:33: as well as, and this is what we talked about before, peer review panels with end users and domain expertise that had the appropriate knowledge to review proposals and evaluate them for whether they were meeting an important cancer research need. 
 13:48: And really these key recommendations informed the development of ITCR, and really still underpin the goals of the program today. 
 13:58: How many projects has ITCR supported? 
 14:00: I don't think at the beginning, we realized how much the program would grow and honestly, what the demand would be. 
 14:07: Now, at any one time at steady state, we'll have 60 or so active grants. 
 14:12: At this point, we've funded about 150 awards since 2013. 
 14:17: So I don't think at the time anyone really envisioned how large and successful the program would be. 
 14:24: And I think that's really a testament both to how it was structured and to just the ever-increasing demand for software for cancer research. The ITCR, as well as the NCI, 
 14:38: its umbrella funding institution, is funded by tax dollars. 
 14:44: So the reason you should care is not only because cancer research is important, because cancer is a ubiquitous disease, but also because you're actually helping pay for and fund this research 
 14:56: and these really cool software techniques that are going to hopefully lead to better cancer treatments. 
 15:03: What sort of software and tools does the ITCR fund? When we solicit project proposals to ITCR, we really do our best to reach out to all of these constituencies who are developing informatics tools, who may be looking for dedicated support for tool development. 
 15:24: And so, as such, we get a huge diversity in the types of cancer research projects that are being proposed for informatics support through these projects. 
 15:35: If you look at the portfolio today, you will see, for example, 
 15:39: that we have a lot of tools we have supported in the area of genomic and transcriptomic analysis. 
 15:46: We have a very strong portfolio in radiology imaging and pathology imaging analysis tools. 
 15:53: I think where we'd love to see more applications, and fund more, is probably in the area of epidemiology and population sciences. 
 16:01: I think that's where we have a lot of opportunities to provide more support. 
 16:06: So through ITCR, the National Cancer Institute has focused on supporting the development of many really valuable tools for the cancer research community 
 16:15: and beyond. Some really popular examples are tools like Galaxy or Bioconductor. 
 16:22: If you've dabbled a little bit in cancer research, you might have heard these names before. 
 16:27: Galaxy is a cloud computing service online that doesn't necessarily require programming knowledge, because, as we said, it's tricky to learn programming 
 16:37: if you don't have that background. 
 16:39: However, it does allow you to still write your analysis and handle your data in a way that's transparent for others to look at, and you can share it in your manuscripts really easily. 
 16:50: Bioconductor, 
 16:51: on the other hand, is a large project that's supporting the development and sharing of open-source software for informatics, especially when it involves the programming language R, and it creates all sorts of really helpful software to help with a variety of different informatics challenges. 
 17:12: Yeah, Bioconductor is actually really well named, because what it helps you do is orchestrate your biological data. 
 17:19: So both Galaxy and Bioconductor, as well as many of the other ITCR-funded tools, help not just cancer research but lots of other biomedical research. 
 17:28: So psychiatric research, rare disease research, you name it. 
 17:33: So while the ITCR is supporting huge tools like this, as well as smaller teams of developers developing all kinds of cutting-edge tools, they're also supporting initiatives to build the cancer informatics research community. 
 17:46: So who is a part of that cancer research informatics community? A collection of researchers, clinicians, professors, patient advocates, and also yours truly 
 17:57: and Carrie. ITCR also works to expand the accessibility of cancer informatics by funding projects that help educate learners about informatics. 
 18:09: What makes the ITCR different from other funding initiatives for software? 
 18:14: What's really unique about 
 18:16: ITCR is its focus on developing tools that are intended for use by a broader community of researchers. 
 18:24: Often informatics tools have their genesis within a particular research project based on a very specific need. 
 18:32: And then that need is seen by other scientists in a similar area who then, you know, begin using the tool as well. 
 18:39: That tool is disseminated and becomes more widely used. 
 18:42: Then, as a tool gains in adoption, there's a need to make it more robust and build into the tool those aspects that are really needed 
 18:54: when it's starting to be used more broadly: the QA of the software, making sure that it handles errors correctly, that it doesn't crash easily. 
 19:05: And so that kind of software engineering mindset is something we emphasize in the ITCR, 
 19:11: but opportunities as well, and the need for dedicated support such as training materials, documentation, a website, and those resources that are almost as important as the software itself, in order for it really to be useful to, and usable by, the community it's intended for. 

 I looked at a Twitter thread once, and there was a long thread about: what is a class that you never took, 
 19:50: But now the contents of that class are a core part of your job. 
 19:54: And one of the top answers that I noticed people responding with was programming. 
 19:58: So many people, researchers in particular have to learn how to program because these technologies are rapidly changing. 
 20:05: And maybe by the time that they were in school, the technologies that they're using and the data types that they're using didn't exist. 
 20:11: But now they do, and they need to learn how to program quickly so they can get to the next stages of their research. 
 20:16: And we're not even talking about someone who went to college 30 years ago; we're talking about someone who went to college 10 years ago. The field is rapidly changing. 
 20:27: I went to college, and I'm going to date myself, in 2009. Just not that long ago. 
 20:33: And I too had to teach myself how to program, because there were no data science programs at my undergraduate institution, and not really anything like that. 
 20:42: And my story is not that uncommon for people of all ages. 
 20:46: Yeah. 
 20:46: My career journey has also involved a lot of self training. 
 20:50: I was lucky enough to come to realize that I wanted to do informatics research in graduate school. 
 20:57: So I did actually have some courses related to informatics or computer science. 
 21:03: But even with that, there was a lot of material that the professors would overlook because they assumed that I would have that background just like any other computer science major. 
 21:13: But I was not a computer science major. 
 21:16: I was someone getting a PhD in biomedical sciences taking a programming class. 
 21:21: Yeah. 
 21:21: And it's not even just programming; there are also advanced types of mathematics and other things. I remember taking and auditing a lot of different courses when I was in grad school, trying to pick up different skill sets that my own program wasn't teaching me, because they didn't know they had to teach me them, because I was in a place where we were trying to use these data that were so new. 
 21:42: So I'd be heading all over campus to different departments, because these 
 21:47: types of big data science research are a very multidisciplinary effort. 
 21:50: Bioinformatics continues to become a more and more integral part of biomedical research. 
 21:55: Everyone involved in biomedical research today is working with software in some way. While making sure that software is user-friendly and accessible 
 22:03: goes a long way to help researchers understand the patterns they might see in data and how to fight cancer, 
 22:09: it's really only part of the puzzle. 
 22:11: It's also important to understand how to use these computational tools in a way that is transparent to others and consistently gets the same results. 
 22:19: In our next episode, we'll speak with experts about how rigorous and reproducible science is accomplished in cancer informatics research. 
 22:28: Thank you for listening to Cracking the Cancer Code. 
 22:30: This podcast is sponsored by the National Cancer Institute through the Informatics Technology for Cancer Research Program, grant number UE5CA254170. 
 22:41: The views expressed in this podcast do not reflect those of our funders or employers. We would like to especially thank Dr Jeffrey Leek and Dr Juli Klemm for their time and contributions.