Vega Conversations: A Q&A with AI developer Joe Futoma

At Vega Health, we've assembled a network of collaborators — team members, advisors, investors, and allies — with decades of experience making technology actually work for clinicians and operators. Vega Conversations is a series highlighting their perspectives: what brought this group together, what we're doing differently, and how we believe AI can structurally improve healthcare.

Joe Futoma is a Staff ML Data Scientist and researcher at the wearable health technology company ŌURA. He holds a PhD in Statistical Science from Duke University and previously completed a postdoctoral fellowship at Harvard University. His work spans clinical machine learning, model evaluation, and the development of predictive and foundation models for health data.

Q: How did you first start working with Mark Sendak, Vega Health’s co-founder and CEO?

I met Mark early in grad school at Duke, where I was doing my PhD in statistics. He was in medical school and had started connecting with people in the stats department, including my advisor, Katherine Heller. He had ideas about applying machine learning to healthcare, so we started collaborating on projects together.

Q: What kinds of projects did you work on together?

We worked on a few different efforts, including surgical complications prediction, kidney disease trajectory modeling, and sepsis prediction using electronic health record (EHR) data from Duke Health. My work was more on the technical side of developing and evaluating machine learning models for this kind of EHR data, while Mark was more focused on the data cleaning and implementation work. The sepsis project ended up getting the most traction and gave me early exposure to what it looks like to move from model development toward real-world deployment.

This early work was the foundation for Sepsis Watch, a solution ultimately developed during Sendak’s time at the Duke Institute for Health Innovation.

Q: What did that early experience teach you about healthcare AI?

One of the biggest lessons was that model accuracy is only a small piece of the puzzle. Even a strong model will struggle if it does not fit into clinical workflows, align with the broader care environment, and make sense for end users. I got early exposure to the full process of moving a model toward real-world deployment. On the technical side, it is easy to over-index on the model itself, but my experience with DIHI taught me that the technical piece is only one part of a much larger socio-technical problem. That has shaped how I think about AI ever since.

Q: What was it like working with healthcare data a decade ago?

Messy, but honestly, healthcare data is still messy. One example was that the coding for serum creatinine changed over time when Duke switched to Epic. The naming convention changed, which seems like a small thing, but if you were querying the EHR the old way, it could suddenly look like the data had disappeared. We saw similar issues with other variables too. It was a good lesson in how much of the real work is data cleaning, harmonization, and preprocessing.
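To make the creatinine example concrete, here is a minimal Python sketch of the failure mode and one possible fix. It is illustrative only: the lab names, dates, values, and the helper names `CANONICAL_LAB_NAMES`, `harmonize`, and `counts_by_year` are all hypothetical and do not reflect Duke's actual EHR data or Epic's naming. The idea is to map every known raw name to one canonical name before querying, and to track anything left unmapped.

```python
from collections import Counter

# Hypothetical mapping from raw lab names to one canonical name.
# Both raw labels are invented for illustration.
CANONICAL_LAB_NAMES = {
    "CREATININE, SERUM": "serum_creatinine",  # assumed legacy label
    "CREATININE": "serum_creatinine",         # assumed post-migration label
}


def harmonize(records):
    """Attach a canonical 'lab' key to each record with a known raw name.

    Returns (mapped_records, unmapped_name_counts) so that unknown names
    are surfaced for review instead of being silently dropped.
    """
    mapped, unmapped = [], Counter()
    for rec in records:
        key = rec["raw_name"].strip().upper()
        canonical = CANONICAL_LAB_NAMES.get(key)
        if canonical is None:
            unmapped[key] += 1
        else:
            mapped.append({**rec, "lab": canonical})
    return mapped, unmapped


def counts_by_year(records, lab):
    """Count results per calendar year for one canonical lab name."""
    return Counter(r["date"][:4] for r in records if r["lab"] == lab)


# Invented sample records spanning a naming change.
records = [
    {"raw_name": "Creatinine, Serum", "date": "2012-03-01", "value": 1.1},
    {"raw_name": "Creatinine, Serum", "date": "2013-06-15", "value": 0.9},
    {"raw_name": "CREATININE", "date": "2014-02-10", "value": 1.3},
    {"raw_name": "Potassium", "date": "2014-02-10", "value": 4.2},
]

# Querying "the old way" misses the result recorded under the new name.
naive = [r for r in records if r["raw_name"] == "Creatinine, Serum"]

# Harmonizing first recovers it and reports names that still need mapping.
mapped, unmapped = harmonize(records)
yearly = counts_by_year(mapped, "serum_creatinine")
```

A per-year count like `yearly` is one simple way to spot this kind of problem: a lab that abruptly drops to zero results after a system migration is more likely a naming change than a real change in care.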

Q: How has AI development changed since you started?

When I started grad school in 2013, a lot of model building was much more manual. You might derive training equations by hand and then implement them yourself, and the tooling was much less mature than it is today. Over time, frameworks like PyTorch and TensorFlow automated and standardized a lot of that work. Now we are in another shift, where coding agents and LLMs are raising the level of abstraction even more, so the work is increasingly about specifying intent, reviewing outputs, and debugging higher-level failures.

Q: Is that shift a good thing?

Mostly yes, because it makes powerful tools accessible to more people. But there is a tradeoff. As abstraction increases, it becomes easier for people to build things without really understanding how they work under the hood. That is fine until something fails in a subtle way. Then depth still matters a lot, especially in high-stakes settings like healthcare.

Q: Why is evaluating healthcare AI so difficult right now?

Traditional machine learning is hard enough, but at least you usually have a clear setup with inputs, outputs, and some notion of ground truth. LLMs are much harder to evaluate because they produce open-ended outputs, and there often is not a single obviously correct answer to a prompt. As a result, evaluation ends up relying much more on human judgment, qualitative criteria, and benchmarks that can be surprisingly sensitive to task design and prompting. That makes it much harder to know whether a system is genuinely robust, especially in high-stakes settings like healthcare.

Q: What do you bring to Vega as an advisor?

I think a lot of it is an evaluation mindset: being rigorous, being skeptical, and trying to understand whether a system is being evaluated in the right way. I also really value working with people I trust and genuinely enjoy collaborating with, and that has definitely been part of my relationship with Mark over the years.

Q: How has the healthcare AI market changed since the early days?

Back in 2014 or 2015, health systems often seemed very beholden to large incumbents like Epic. There was a sense that smaller players were not worth betting on because eventually the big vendors would offer something similar. That feels different now. There is more openness to specialized companies, and I think more recognition that incumbents do not always have the best or most innovative solutions.

Q: What excites you most about healthcare AI today?

I am very interested in wearables and the question of how they become more than consumer gadgets. The exciting challenge is figuring out how these tools can actually become useful to clinicians and patients in ways that help detect disease earlier or support better care.

For generative AI, it feels like we are finally seeing these systems start to be used more broadly in the real world. There is still a long way to go, but the potential is real. Healthcare has so many inefficiencies, and if we can build tools that actually improve access, support clinicians, and fit into care delivery in meaningful ways, they can have real impact. What is most exciting to me is the chance to work alongside friends to help build things that solve real problems.
