Driving Invention with the Latest in AWS Generative AI

Executive Summit at AWS re:Invent

In this fireside chat, Dr. Swami Sivasubramanian, VP of AI & Data at AWS, discusses key developments and strategies in artificial intelligence. He emphasizes AWS's comprehensive approach to AI innovation, addressing challenges like model accuracy and data integration. He highlights lessons from implementing AI at scale, including the importance of hands-on experience and adapting management styles. He offers insights on moving AI projects from prototype to production, focusing on real-world effectiveness and cost management. The discussion covers AWS's latest tools for data preparation and AI implementation, strategies for cost optimization, and the future outlook of AI in business. (January 2025)

Transcript of the conversation

Featuring Jonathan Allen, Director, Enterprise Strategy, AWS, and Dr. Swami Sivasubramanian, VP, AI & Data, AWS

Jonathan Allen:
Well, thank you for being here, Swami. It was a busy morning. Congrats on the keynote. Clearly a lot of passion and pride for what your team has built. I counted more than 20 announcements this morning.

Dr. Swami Sivasubramanian:
Alright.

Jonathan Allen:
It was like I was going through it again. That's another one. That's another one. That's pretty awesome. As we reflect on that blizzard of announcements, tremendous amounts of innovation for customers, what are the two or three things which really stand out in your mind?

Dr. Swami Sivasubramanian:
Sure. First of all, thanks for having me and really nice to be here and meet with the group. I would say you're asking me to pick my favorite children. And in the real world, I have two kids and a puppy, so I guess three.

But I would say the key thing in many ways is, if you see the AWS approach to AI, it's way more than one model or one chatbot. We do think in the future, innovation is going to happen in every layer of the stack, in every enterprise too. I'll set this context and then I will answer your question, because it's important. Of course you are going to see every application developer in all our organizations innovate with models and all the tooling and capabilities that Bedrock provides. And of course you're going to have GenAI assistants actually helping developers, business users, and data analysts.

But the other thing people always forget is that every CS grad and undergrad who is graduating right now knows how to build a large language model from scratch. This is already happening. So that means what you're going to see in your workforce, more and more, are people who are capable of actually doing active surgery on these models in open source. So that's why it's important to actually keep innovating in every layer of the stack.

Now I'll dig into a few of my favorites. One is, of course, that I've been at AWS now for like 18 years. So in many ways where we are with GenAI reminds me so much of the early days of AWS, when there were literally only a few of us, probably the size of this room. And not just because of the number of people who were in AWS, but the pace of innovation and how much of a game changer it was.

And in many ways, the pace of innovation in Bedrock reminds me of the times we built DynamoDB or S3 or several others. And we are at an unprecedented time where it is still murky. 2023 was all about prototypes. All of us, our customers, were actually experimenting with which proofs of concept work, what actually does something meaningful, what does not.

2024, now they are seeing what it takes to actually get into production. Then they realize, you know what, it's a lot of work to find which model works for which use case. And then I don't need one model. I need lots of models. And then I need to be able to do it cost effectively. I need the right guardrails and so forth, and I need these agents to work and whatnot.

So that's why in Bedrock, you see we have done innovation in every layer of the stack. Starting from model variety, where we were the first in the industry to say you're not going to launch with only one model, you're going to launch with a wide variety of models. At the time, people thought it was because it's a weakness.

Even RDS didn't launch with a single database. We actually launched with multiple databases. Because one thing that has always been common, time and again, in the industry is that developers want freedom to innovate. They want choice in databases, they want choice in models, they want choice in tooling.

AWS is where it is because we actually have always been about giving developers the freedom of tooling and choice. We doubled down on that in Bedrock. But we also went beyond models and we actually gave all the tooling and you saw all the innovation there.

But some of my favorites are of course in the data automation space. A lot of us actually deal with lots of data sitting around that is completely inaccessible. And now you have GenAI-powered ETL for all your unstructured data without any code to be written. I actually think that is going to be big.

But of course, the other big one I think is Q Business. Again, I am an engineer who at some point ended up managing teams and managing businesses. So I ask a lot of business questions. I work on pricing and various other things. And I'm not good at spreadsheets. I always get impressed with my finance team and how they can do all this magic with pivot tables and so forth.

So I would rather write Python or Perl code to do what they are doing than work on spreadsheets. So that's my type. But what you can do with Q scenarios today is amazing. What used to take three weeks, answering complex business questions and scenario analysis typically done in spreadsheets: if it can be done here, that's like a game changer.

The same thing is true for developers. All these coding assistant wars that happen, whether it's Copilot or any of them, they focus on the 20% problem. Because your developers don't spend 100% of their time writing net new code. That's almost a myth. We wish it were true, but it's not.

They end up spending time writing documentation, unit tests, code reviews, and then performance engineering, ops investigation. So how do you get them to do 100%? You've got to solve the other 80%. That's what we did with Q, with all those launches. Those are actually some of my favorites, and we already see the impact within HAQM.

When we actually put Q in charge of all the software upgrades for Java, we gave the company back 4,500 developer years, just in the first seven months. Think about the productivity improvement there. I know you want to get going, but I will stop right here.

Jonathan Allen:
Yeah, that four and a half thousand, when you stare at that number, your mind has to process that to go, "Is that real?" And then you look at the data, you go through a number of cycles, you see Andy talk about it, you're like, "That's a real number and that's a big number and that really matters."

Jonathan Allen:
Now you and I both work with a lot of customers who have iterated their way through generative artificial intelligence to solve real problems for themselves. I thought Andy talked really eloquently yesterday when he talked about the iteration that you have to go through. But I think you and I have both seen customers get to that, I'm 99% of the way through and then I'm just going to hit pause because of hallucinations.

Dr. Swami Sivasubramanian:
Yeah.

Jonathan Allen:
So obviously, we do a lot of science at HAQM. And I'm truly interested to hear you talk about some of the progress we've made in that.

Dr. Swami Sivasubramanian:
I think this is a fascinating area when you think about what LLMs are capable of. But what many of us forget once in a while is that at the end of the day these are highly probabilistic models. And they end up generating things that are wrong or inaccurate, even if only 1% of the time. That can actually be a real production blocker, especially in regulated industries.

So when we looked at this problem, we actually said, "You know what, what is the best way for us to be able to ground these with more certainty?" And this is where my team has invested in a science called automated reasoning. If your engineers are leveraging our IAM policy tools and authoring, those are already powered by automated reasoning capabilities, to check if that policy is enforced the way we think it is enforced, or you think it is enforced.

It's like a mathematically verifiable construct using formal logic. And we actually thought, why not combine the mathematical proof that is provided by automated reasoning with the creativity and expression of GenAI? To produce something that is extremely creative but also grounded in formal logic.

And this is, I actually think, one of the groundbreaking innovations that can change the landscape. Because hallucination is going to be one of the big blockers in the future for putting GenAI innovations into regulated industries. And we have been working with a few customers in these spaces, and the early results are really, really promising. You give these models a document saying, "Here are the rules, the grounding principles you can never violate." And then give it as a guardrail. And it constructs a semantic model that actually acts as the guardrail the LLM cannot violate.

And when the LLM generates an inaccurate response, it goes back and actually tells the LLM, "You've got to try better, you are violating this one." And then suddenly everything becomes better. When you think about it, these are the kind of techniques we used even in our Java upgrades. And formal logic is not new. It is what powers our computers today in many ways, with things like compilers and so forth.

And that's what we're doing, and we are very excited about it. Of course there is a lot more machine learning-based innovation that is happening. And I can get into that more and more too on how we built Nova and various other things too. But this is something I don't think the industry has woken up to.

I almost view it as the yin and yang, so to speak, because formal logic is all about certainty, whereas GenAI is all about unleashing creativity and accepting imperfection. And if you actually marry these two well, I think the results could be amazing.
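The check-and-retry loop described here can be sketched in miniature. This is an illustrative sketch only: the rule set, the `check` function, and the toy model below are hypothetical stand-ins, not the actual Bedrock Automated Reasoning Checks API.

```python
# Illustrative sketch of a guardrail check-and-retry loop: an LLM draft is
# validated against explicit rules, and any violations are fed back as a hint.
# The rules and the toy "model" are hypothetical stand-ins for illustration.

RULES = {
    "max_apr": lambda resp: resp.get("apr", 0) <= 0.30,              # never quote APR above 30%
    "no_guarantee": lambda resp: not resp.get("guaranteed", False),  # never promise approval
}

def check(response):
    """Return the names of all rules the response violates."""
    return [name for name, rule in RULES.items() if not rule(response)]

def generate_with_guardrail(model, prompt, max_retries=3):
    """Call the model, verify the draft against the rules, and retry
    with the violations appended to the prompt until it passes."""
    hint = ""
    for _ in range(max_retries):
        draft = model(prompt + hint)
        violations = check(draft)
        if not violations:
            return draft
        hint = f" (previous draft violated: {', '.join(violations)} -- try again)"
    raise RuntimeError("could not produce a compliant response")

# Toy "model": the first draft over-promises, the retry is compliant.
def toy_model(prompt):
    if "violated" in prompt:
        return {"apr": 0.19, "guaranteed": False}
    return {"apr": 0.42, "guaranteed": True}

result = generate_with_guardrail(toy_model, "Quote a loan offer.")
print(result)  # → {'apr': 0.19, 'guaranteed': False}
```

The point of the pattern is that the verifier is deterministic: the model may be probabilistic, but the rules it is checked against are not.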

Jonathan Allen:
I think certainly in the regulated industries, mathematical proving is actually pretty well established. But with the automated reasoning element coming in, having listened to customers, I think it's something that they're going to rapidly embrace. I'm super excited about it.

Now when I was a customer in this room before I joined HAQM, I wanted to hear about the lessons learned. You've been at HAQM a long time. You've worked on many machine learning and generative artificial intelligence initiatives across HAQM.

For this group here, what are the two or three most pronounced lessons learned that you've got as you've helped those initiatives come to life? That you could share with this audience where you wouldn't want them to make the same mistake unconsciously or without going into it?

Dr. Swami Sivasubramanian:
Yeah, that's a tough one. I was thinking about the two or three biggest ones. I would say, when the technology is leading edge, one of the mistakes we make as organization owners and product owners is that we end up not investing enough time, from the engineers all the way up to the CXO level, to actually get hands-on and play with it.

To me, I actually think this is one of those where unless you really know and embrace and really experiment with this technology, you don't realize the power and potential of what it can do. And to me, that is often overlooked. So that's why I almost make it a point every week I have a few rules that I set for myself.

I at least meet with three to four customers personally, and I at least read three to four research papers personally. Those are the two rules, and I tell my teams. And then over the weekend I either write code for fun or work with my daughter on a robotics project, one of those two. So that I actually stay close to the ground.

Jonathan Allen:
Yeah, these are good forcing functions, right?

Dr. Swami Sivasubramanian:
Yeah, exactly. Otherwise, you end up operating at a level where you are not grounded.

Now the second thing I'll say is we had to really be attuned to the fact that when the pace of innovation is so rapid, the mechanisms we typically use to organize and manage at scale really do not work. The pace of innovation in technology has suddenly changed to 10X what it was five years ago.

All our existing mechanisms for managing programs and engineering projects are really not fast enough for the pace at which you want to innovate. So it requires a different style of management. You are not always going down through the layers; sometimes you don't need to engage every layer of managers before you get to the person who's actually designing the product and writing the code and so forth. So I actually changed my style pretty drastically in the past three years for that reason as well.

And then the third one goes without saying, it's an HAQM DNA. Work backwards from the customer problem and the business objective you want to accomplish. At the end of the day, as much as I'm passionate about GenAI, AI, and data, it's a means to an end to solve something for your customer and for your business.

This is why, even in the early days of deep learning, and also the early days of GenAI, we all built a lot of proofs of concept, and they all went to the graveyard to die a slow painful death. Because we really didn't check which ones were going to make an impact on the business in terms of return on investment.

And that is more important than ever, especially in the GenAI world where the costs are really, really high. And you saw my launches: a lot of these are all around how do you manage things at scale. If you saw the SageMaker HyperPod task governance launch, that came from a big lesson at HAQM. Last year we had a lot of GenAI projects.

You can call them inference and training and fine-tuning and experimentation. And we internally built a system to automatically move resources between them. Because the challenging thing about inference is, let's say something like our shopping agent requires 1,000 Trainium2 instances at its peak. At nighttime, not that many people shop, but you can't have those instances sitting idle. Ideally you would want to give them to workloads that are not time sensitive, like training or fine-tuning experiments.

This required active resource management, so we ended up building the system. And when we talked to many of our customers, especially CEOs, they said, "I am pouring so much money into this, but my utilization is not that high. Why is it so?" Then we said, "This is what we internally do." They said, "Why can't I have it in SageMaker?"
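The idle-capacity reallocation described here can be reduced to a toy allocator. All numbers and workload names are invented for illustration; real task governance, as in SageMaker HyperPod, involves priorities, preemption, and much more than this.

```python
# Toy sketch of moving accelerator capacity between time-sensitive inference
# and interruptible training as demand changes. The fleet size and demand
# numbers are invented for illustration only.

TOTAL_INSTANCES = 1000

def allocate(inference_demand):
    """Inference gets what it needs (capped at the fleet size); everything
    left over goes to training/fine-tuning instead of sitting idle."""
    inference = min(inference_demand, TOTAL_INSTANCES)
    training = TOTAL_INSTANCES - inference
    return {"inference": inference, "training": training}

peak = allocate(inference_demand=950)   # daytime shopping peak
night = allocate(inference_demand=120)  # overnight lull

print(peak)   # → {'inference': 950, 'training': 50}
print(night)  # → {'inference': 120, 'training': 880}
```

Either way, the fleet stays fully utilized; what changes is how much of it is doing time-sensitive work.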

That was a big lesson. To actually do GenAI at scale, you've got to be exceptionally good at managing costs. That's why you saw so much attention on HyperPod task governance, but also on doing inference at scale with model distillation, and also things like intelligent prompt routing. These are the techniques we use internally, because our GenAI projects especially are reaching a scale where every percentage improvement in utilization and cost really matters.
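Intelligent prompt routing, as mentioned, sends cheap prompts to a small model and hard ones to a capable one. The heuristic and the per-1K-token prices below are made up for illustration; Bedrock's actual router predicts response quality with a model, not a word count.

```python
# Illustrative sketch of intelligent prompt routing: simple prompts go to a
# small, cheap model; long or multi-step prompts go to the large one. The
# prices and the difficulty heuristic are hypothetical stand-ins.

MODELS = {
    "small": {"cost_per_1k_tokens": 0.0002},
    "large": {"cost_per_1k_tokens": 0.0080},
}

def route(prompt):
    """Crude difficulty heuristic: long or explicitly multi-step prompts
    are routed to the large model, everything else to the small one."""
    hard = len(prompt.split()) > 50 or "step by step" in prompt.lower()
    return "large" if hard else "small"

print(route("What is the status of my order?"))                       # → small
print(route("Walk me through, step by step, this contract review."))  # → large
```

Even with a crude router, if most traffic is simple, the blended cost per request drops toward the small model's price.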

Jonathan Allen:
I was meeting with a customer yesterday, and I've seen this a few times: they had identified 100 candidates for GenAI things they want to do. They zero in on a few, and then they have trouble getting going with those one or two. There are blockers, as we know.

Jonathan Allen:
And akin to the early days, and even now, of cloud migration and modernization, what I found as a customer was that getting close to the stand-ups and removing the blockers was super important. But with those generative AI initiatives, when they've identified those one or two things, what do you recommend for customers to take those early experiments and really drive them through to scale in production?

Dr. Swami Sivasubramanian:
I think the path for prototype to production is all around actually paying very close attention to see what works in the wild and what doesn't. That's number one.

Because the interesting thing is, when you put these LLMs in the wild, especially with these projects in front of your customers, sometimes the projects really resonate. Sometimes we might end up overdoing the boundaries, saying, "I don't want it to take a question on any topic except for this." And then your chatbot suddenly can answer only the five questions you have written a rule engine for.

So you have to get the balance right. You've got to keep experimenting. This is where the pace of iteration is super important, especially as you scale into production. That's why, with some of these tools, the ability to keep iterating and tuning things like guardrails and other things matters.

And then the cost of inference is a really big deal. This is something we experimented with a lot, and I touched on it as well. And the third thing is, we sometimes think GenAI is only about building an application to do contact center automation and various others.

We sometimes forget that these GenAI assistants that enable non-developers and business users to get to their data and analytics faster, and actually make data-driven decisions, are a game changer.

In many HAQM businesses, we now have non-technical folks, these can be category buyers or folks who are in ads and whatnot, who are able to use technology like Q to access the data lake and drive analytics. And that I think is a really, really big game changer, because the speed of decision making is going to accelerate in a big way.

Jonathan Allen:
Now you had two or three customers join you on stage this morning, amazing stories. When I've been working with customers, I think the combination of some of these building blocks has worked astonishingly well.

So HAQM Q for Connect, where you're able to make a material difference, making what is a really hard job, actually being a contact center agent, just that little bit easier. And I think the retrieval augmented generation announcements this morning are very exciting, to accelerate that even further.

Jonathan Allen:
What are the two or three customers that sort of sit in your mind though where you're like, "Wow, I wasn't expecting that."

Dr. Swami Sivasubramanian:
I'd say if you see the general pattern with customers, I kind of put them in two to three buckets. The first bucket is they use GenAI to create net new products or net new customer experiences. That's like Autodesk, where they are practically re-imagining 3D design with generative AI. And that is a step change, a game changer for that industry. That's like improving your top line in a way.

Now the second bucket is what I put in the category of using GenAI to make sure you do things faster and cheaper and better for your customers. You can view it as, how do I automate my mortgage application workflow, like what Rocket talked about? Or how do I serve my customers faster? And so forth. That bucket is very, very important too. That's like improving your bottom line.

And then the third bucket I would just say is we already innovative. We also manage a huge software development team. And I could have had another five customers on stage, then it'll be a three-hour keynote. But the other one that deeply resonates with many of our customers is the ability to make software engineers 5X productive.

And I do think we are at the cusp of doing that, especially with all the things we are doing with Q, where we are not just focused on one aspect but all aspects. And that is something that is really resonating with our customers. Like Q helping with ops investigations, Q helping with unit tests and documentation. Every aspect of that is going to be a game changer.

Not to mention, Q is literally the best in the world right now in software code generation. You can go check SWEbench.com and you'll see it is the best assistant. But to me that is just one part of the productivity puzzle.

Jonathan Allen:
Yeah. Now, one of the things you touched on this morning, which I want to double-click on a little bit, and you've already mentioned it: with the disparate data sources across an enterprise, whether they're in operational data stores, parallel processing engines, or data lakes, how do you get your data ready for generative artificial intelligence?

Dr. Swami Sivasubramanian:
I think this is one of the very interesting challenges we found working with lots of customers, internally and externally. When people got excited about generative AI, one of the eye-opening things last year and in the early part of this year was that almost all generative AI and RAG only worked for text data. They were barely usable for your data warehouses and data lakes. And they were also not usable for most multimodal content.

So this actually means most of the generative applications that customers really wanted to build were not possible without them building their own custom SQL engine, like natural language to SQL, or building their own GenAI ETL. They were doing lots of other things instead of actually building what they wanted to build. So this is what led us to say today, "You know what, let's go solve the whole data problem, not just the simple RAG problem."

And so we launched the ability to do structured data retrieval, so that you can actually create what Q in QuickSight does. If you want to create such an experience yourself, say a customer service chatbot that can retrieve data from your relational databases to answer a customer query like "what is the status of my latest order?", you can actually do that today.

And you can take all your multimodal content, from PDFs to everything else, images and video, and automate the ETL process like you would for traditional structured data, without any code whatsoever, with Bedrock Data Automation. And these are going to be real game changers.
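The structured-data retrieval pattern, answering "what is the status of my latest order?" from a relational store, can be sketched like this. The table, the sample data, and the fixed SQL template are hypothetical stand-ins for the natural-language-to-SQL step an LLM would perform.

```python
# Sketch of structured-data retrieval: a natural-language question is mapped
# to SQL and run against a relational store so a chatbot can answer it.
# The schema, data, and the hard-coded query template are illustrative only.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, placed TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", "2024-11-01", "delivered"), ("alice", "2024-12-02", "shipped")],
)

def latest_order_status(customer):
    """The SQL an NL-to-SQL layer might emit for 'status of my latest order?'."""
    row = conn.execute(
        "SELECT status FROM orders WHERE customer = ? ORDER BY placed DESC LIMIT 1",
        (customer,),
    ).fetchone()
    return row[0] if row else None

print(latest_order_status("alice"))  # → shipped
```

The hard part in practice is the translation step itself: generating correct SQL against a real schema, which is what the managed service does for you.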

But the part that is super important is that these are just tools. You still need a coherent data strategy, from governance to various other aspects. This is where we want to help with the SageMaker catalog, which combines cataloging, organization, governance, and discovery, everything built in.

Jonathan Allen:
Cool. Now like security, which is always top of mind for everybody. Cost.

Dr. Swami Sivasubramanian:
Yeah.

Jonathan Allen:
Cost. We want to have our cake and eat it. As we do this, we want to make that difference, we want to roll out to those new customers. Cost is, rightly, a very strong concern for customers in the audience. What are your top tips here?

Dr. Swami Sivasubramanian:
I would say number one: with any GenAI project, when the teams come and demo you something, they always demo with the most capable model. What they don't tell you is that it's also the costliest model, where the economics simply don't work. If the cost saving per user is something like $5, but the operational cost is actually close to $5, it's not helpful.

So I think getting into the details of the unit economics of what you are trying to do is super important. And when you do it at scale, you have to start asking, "How do I make sure I still meet my accuracy bar while actually meeting my cost constraint?" This is why we built techniques like model distillation, so the big costly teacher models can actually teach the smaller models for your use cases.

I explain it like this: these big models are like geniuses with eight PhDs. And you don't need someone with eight PhDs to solve, let's say, how to write software. You need people like me with one PhD in computer science to write a program or whatnot. So you take any domain expertise and actually teach a smaller model to solve for that use case. Model distillation is going to be a game changer.
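The teacher-student idea behind distillation can be shown with stand-in functions. This is only a sketch of the mechanics; real model distillation fine-tunes a smaller network on a large model's outputs at scale, rather than memorizing labels.

```python
# Minimal sketch of distillation: a large "teacher" labels domain-specific
# examples, and a small "student" is fit to imitate it for that narrow use
# case. Both models here are toy stand-in functions for illustration.

def teacher(text):
    """Expensive, very capable model (stand-in): flags risky requests."""
    return "escalate" if "urgent" in text or "dispute" in text else "auto-approve"

def distill(teacher, examples):
    """'Train' the student by recording the teacher's labels on the domain
    examples -- a toy stand-in for fitting a smaller model to them."""
    labeled = {text: teacher(text) for text in examples}
    def student(text):
        return labeled.get(text, "auto-approve")  # cheap default off-distribution
    return student

examples = ["urgent wire transfer", "monthly statement", "billing dispute"]
student = distill(teacher, examples)
print(student("billing dispute"))    # → escalate
print(student("monthly statement"))  # → auto-approve
```

The economics come from serving the cheap student for the narrow use case while paying the teacher's price only once, at distillation time.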

The second thing is the HyperPod task governance. Many projects, across training to inference, are still run directly on GPUs, or increasingly on Trainium. You want that utilization to be as close to, if not above, 90%. And right now many are not. This is really important, and internally we work super hard on it. These are the things I would encourage you to watch, so that you build mechanisms to manage these costs.

Jonathan Allen:
Cool. Now I'd like to push you a little bit on what do you think is going to happen on the future? Now, whenever Jeff Bezos was asked this question, he would famously say, "Let's talk about what's not going to change. Our customers are always going to want it faster. They're going to want more selection, they're going to want it cheaper."

But can I push you a little bit on where do you think this is going in the next 12 months?

Dr. Swami Sivasubramanian:
You almost stopped me from using that one.

Jonathan Allen:
Yeah, that's funny.

Dr. Swami Sivasubramanian:
That's what I was going to say. Jeff taught me well, to say that's not the interesting question; the more interesting question is what's not going to change. But I will take this one and entertain all of us. Not entertain in a joking way.

But I think the key thing is, I almost put it in three P buckets. Not like the MBA three Ps of marketing. I'd say last year was all about prototypes. This year was all about getting to production, and next year is all about productivity.

And I actually think we are going to be at an unprecedented phase where we are going to see increasing productivity in every discipline in our organizations. Companies which really embrace these technologies, and reinvent what they do to make their employees 5X and 10X more productive, are going to be able to innovate faster. And companies which do not embrace them are going to fall behind.

So to me, that is going to be one of the fundamental ones. And it's not just about embracing a GenAI assistant, or putting an assistant in your office tools and so forth. Yes, that might be interesting. It's more about how do they actually fundamentally work better, work differently?

And then stop solving the problems that they don't want to solve. All of us really love taking on extremely hard challenges, and we don't like doing mundane ones. For instance, nobody's complaining that they don't manually switch telephones anymore, once automated switching happened. They all actually love that they don't have to do that job.

But during that time, managing that transition is hard. It's almost a human transformation problem. And I do think we as leaders have to lead our organizations through that effectively, so that we can reinvent how we do things really, really efficiently, but also get all our employees to embrace these technologies to reinvent what we do and how we do it, and do it for better and do it for good.

Jonathan Allen:
Brilliant. Swami, thank you very much for spending half an hour with us and sharing your wisdom. We greatly appreciate it. Thank you very much for joining us.

Dr. Swami Sivasubramanian:
Hey, thanks for having me.

Jonathan Allen:
Thank you.

Dr. Swami Sivasubramanian:
Thank you.

Dr. Swami Sivasubramanian, VP of AI & Data, AWS:

"Work backwards from the customer problem and the business objective you want to accomplish. As much as I'm passionate about GenAI, AI and data, it's a means to an end to solve something for your customer and for your business."
