Making the Case for the Terminal as AI's Workbench: Warp’s Zach Lloyd

Zach Lloyd built Warp to modernize the terminal for professional developers, but the rise of coding agents transformed his company's trajectory. He discusses the convergence of IDEs and terminals into new workbenches built for prompting and agent orchestration, and why he thinks "coding will be solved" within a few years, making human expression of intent the ultimate bottleneck. Zach explains how Warp competes against subsidized tools from Anthropic and OpenAI, and why the terminal's time-based, text-oriented format makes it perfect for managing swarms of cloud agents. Hosted by Sonya Huang, Sequoia Capital

Published: Jan 27, 2026
Uploaded: Jun 11, 2026
File type: POD

Full transcript

AI-generated transcript with timestamped sections.

0:00-1:32

[00:00] Just the general form factor of the terminal is perfect for agentic work because it's [00:04] Everything is time-based... [00:06] It's all about input of text and output of text. You get a log of what you're doing. You can multitask agents in the terminal really easily. And so I think it's been like actually a great stroke of luck for us in a lot of ways that the terminal has become the center of agentic development. It's a huge opportunity for us. [00:40] In this episode, Zach Lloyd, founder of Warp, reveals why the terminal is becoming the center of AI-powered development. [00:47] Zach shares how coding interfaces are converging into a new workbench built for prompting and agent orchestration, and why the next frontier isn't developers typing prompts, but ambient agents running in the background that autonomously respond to system events like server crashes or security incidents. We discuss the brutal competitive dynamics of the coding market and why model providers are racing into the application layer. And finally, Zach shares his thesis that coding is nearly solved and that the ultimate bottleneck for AI will be humans' ability to clearly express intent. [01:17] Enjoy the show. [01:20] Zach, thanks so much for taking the time to join today. [01:23] Thanks for having me on. [01:24] Before we get started, can you tell our audience a little bit about yourself and what is Warp and what company did you set out to build and why?

1:32-3:10

[01:32] Yep. So I am Zach. I'm the CEO and founder of Warp. Warp is a developer-focused startup. Our goal has always been just like, help pro developers ship better software more quickly. The product that we've built, it has an interesting history. We're like five years old. We started off building a modern reimagination of the terminal. [02:02] is one way of thinking about it. Probably the simplest. It's a workbench for building software with agents is kind of the more general way of framing it. [02:12] Awesome. Let's dive right in. What made you decide that the terminal was the right place to build? [02:18] So I've been a developer for a really long time. I've always used the terminal. [02:24] In a prior life, I was a principal engineer at Google. I used to run engineering on Google Docs. I'm not a good terminal user. I always worked with people who were good at using it, and I saw that... [02:35] you get just a ton of stuff done as a developer because of where it sits in the stack. So, [02:41] It's like a... [02:42] super duper powerful thing if you know how to use it right, but the sort of [02:47] stock version or classic version of the terminal, I think, is like a horrible product. It's hard to learn. It's easy to make mistakes in. The mouse doesn't work. And so, you know, I was interested in how do you build something that's impactful for [03:02] developers, how do you build something that helps more good software exist in the world? And trying to reimagine the terminal felt like a cool thing to take on.

3:10-4:43

[03:10] And how much of the thesis is around making the terminal great for single player versus multiplayer? [03:16] It's a good question. So the multiplayer part was going to be what the business model [03:20] was going to be. So it's like, you know, I came from the Google Docs world. I built collaborative software. I think the closest analogy would be something like Postman, where, you know, [03:30] they have a collaborative API platform. We were going to do that around the terminal where you could share commands. You could share runbooks, share incident response manuals. And Warp actually has all that stuff, and it's super... [03:44] It's super useful, not just for people, but for agents at this point to have all that knowledge baked into the product. [03:52] So that was going to be the business model. But where we actually started was just like... [03:57] the [03:58] hands-on-keys interaction with the terminal itself. Could we... [04:03] re-imagine the developer experience of that? And so we spent... [04:08] you know, the first like year, year and a half, [04:11] of just like, how would we like this thing to work? How do we want the input into the terminal to work? How do we want the output to work? [04:16] And, like, how can we make it just, like, easier without... [04:20] diminishing the power of the tool. [04:21] Yeah. Awesome. And you made the decision to focus on rebuilding, reimagining the terminal pre-generative AI, pre-coding models taking off. Do coding models and agents, do they change your answer to the question of like how important is the terminal as the workbench? [04:38] Terminal is ironically more important now. The terminal has become...

4:44-6:19

[04:44] I think the preferred form factor for working with agents. I mean, basically you can work with them in the IDE or you can work with them in, [04:53] in the terminal or you can create some other workbench which [04:58] You can see Warp that way. Actually, Warp started as a terminal and is a broader workbench now for agents. But just the general form factor of the terminal is perfect for agentic work because... [05:08] Everything is like time-based. It's all about input of text and output of text. You get a log of what you're doing. You can multitask agents in the terminal really easily. And so, [05:18] I think it's been like actually a great stroke of luck for us in a lot of ways that the terminal has become the center of agentic development. It's a huge opportunity for us. [05:28] Hmm. I'm curious if you thought that we were headed towards a world where people just weren't going to spend time in the IDE. And do you think that's been accelerated now? [05:36] I think that the [05:37] kind of tools are morphing. Um... [05:40] Yeah. [05:40] And so, you know, pre-agent world, you had... [05:44] pretty clear distinction between terminals and IDEs. Today you have tools like Warp, which are, you know, we've like grown from the terminal and added a bunch of IDE features like, [05:58] code editor and code review features and the file tree. And like we get yelled at on Twitter for having a file tree in Warp, because it's not like a pure terminal thing. But then if you look at like the latest iteration of Cursor, uh, [06:12] which started as an IDE, it looks a lot more like Warp. The primary interface is now more of a chat interface and

6:20-8:04

[06:20] talking to your computer, but you still have all the file editing things. So I don't know if I would be like terminal is going to die or the IDE is going to die. What I do feel strongly about is that there's going to be innovation and there is innovation happening where the form factor is changing to match what the agentic workflow should be. So the form factor is like... [06:43] It should be geared towards prompting. It should be geared towards adding context. It should be geared towards reviewing agent-generated code diffs. I think actually now, like... [06:55] team is even more important, like especially as you have more and more agents that are [07:00] not just launched locally by people, but are coming to... [07:03] be launched by system events. And so I think the workbench is changing and I actually think it will end up looking more like a terminal than an IDE, but probably won't, strictly speaking, be a traditional version of either. [07:15] And is the rough framing of, you know, the reason to use each that, you know, terminal is roughly equivalent to the chat bot, like you can you can chat with a coding agent and hand off tasks and then IDE is roughly equivalent to like a GUI for actually editing and writing code? Is that is that the right mental model? [07:33] Yeah, that's definitely where things have started. Like, yeah, right. The IDE is like Microsoft Word for your code and the... [07:41] The terminal is like chatting with your computer. [07:46] And... [07:48] The... [07:48] If you're doing professional agentic development, which I would distinguish from vibe coding, you kind of want both of those. Like, I don't think for the pro use case, we're at a spot yet where you can be so disconnected from the code.

8:04-9:53

[08:04] that you don't need some way of like, [08:06] falling into hand editing it. I would think of it as like... [08:10] the hand editing... [08:12] has like almost become like a fallback interface or a secondary interface. And the primary interface now is the prompting interface. And so, um, [08:21] Yeah, basically I think that [08:23] That is the right distinction, but I think it's all kind of merging product-wise. Merging, yeah. Interesting. You mentioned pro coders a few times. What was the decision to focus on pro coders about, and how do you think that plays out over the next decade? Will there be pro developers left? Will everybody be a pro developer? [08:41] So, okay, it's a great question. So I think what I really care about is... [08:46] Um, [08:47] helping build software that I use every day. Like there's probably... [08:55] 10 apps or whatever in my Mac Dock and pinned as Chrome tabs, which are apps like Google Docs or Spotify or Notion or Figma or Warp, which are, um, [09:06] hard to build apps that I think, um, [09:10] it's those hard to build apps that the world spends most of their time using. And those are built [09:16] I think... [09:17] more by pros and they're definitely built more by enterprises. And I just want to be a part of like creating that kind of software. Whereas I do feel like the non-pro segment is cool. And I do think it's actually really, [09:33] it's empowering that [09:34] anyone can make an app at this point. But I just think the sort of economic value of the apps that you build with a vibe coding tool, like Lovable or Replit or whatever, is lower than the economic value of the apps built with the tool that's geared towards pros. And it's also just like...

9:53-11:34

[09:53] I don't think I've ever spent my day using an app that's been built in like a no-code, low-code, vibe-code tool, whereas I spend all of my days like literally living in software that is built by pros. It's really like built for like these immersive, hard, important, economically valuable use cases. [10:12] Totally. The world is a museum of passion projects. I think that includes the software we choose to use every day. Yeah. Let's talk about competition in the coding market. This is like the most brutally competitive competition. [10:25] yeah the software I have ever ever seen. Um, and you're playing in interesting waters, right? You're competing with a lot of folks, you're collaborating with a lot of folks. Um, maybe just help orient our audience: where do you see yourselves in the broader competitive landscape in the coding market? Yeah, it's like, it is competitive out there because it's such a [10:48] big important market that a lot of people want to [10:51] play in it. And, you know, where do we sit? So... [10:59] So we are a sort of general purpose agentic development team. [11:04] Workbench. [11:06] Which means you can use us like [11:09] Cursor, you can use us like Claude Code. I think we have a unique product approach [11:16] to doing agentic development where we are [11:20] truly the only, um, [11:22] platform out there that has grown out of the terminal. So there's a lot that have grown out of like IDEs, specifically forking VS Code, and they're all very similar products. And then there's a lot that are just...

11:34-13:05

[11:34] apps that run within the terminal. And so those are like text-based apps. And those are also basically all the same. And so Warp has a very differentiated product approach. [11:47] I think one area where our product approach really shines is for people who are doing like traditionally terminal-heavy workflows. And so that would be things like [11:57] stuff beyond coding. So stuff like the software development lifecycle, it could be setting up projects. It could be deployment. It could be working with like Docker and Kubernetes. It could be incident response. So like backend DevOps, you know, [12:15] SRE, people who do production work, I think Warp is an amazing tool for them because it integrates so well with all of these non-coding terminal workflows. But the truth is, I don't know, Warp is like, at any given moment, we're one of the top five [12:28] agents on SWE-bench. We're typically number one or two on Terminal-Bench. And so it's a great general purpose coding agent. [12:37] And so we're [12:38] We're in the market, but it is really competitive. Yeah. We're trying to compete on the quality of the product. It's like we are in competitive levels. [12:50] There's competitive pressure around cost, which is like a really challenging thing for us. [12:54] Let's talk about that directly. And just to hit it head on, how do you compete when Anthropic can subsidize their tool with model profits? It's Anthropic, it's OpenAI, and it's Google.

13:07-14:51

[13:07] So... [13:09] we have to compete based on the quality of the product for one thing. And so I think like, [13:15] we can be a little bit in the more premium part of the market here, like coding agents and like, [13:22] developer experience. It's not bananas. It's not like a commodity. These aren't like totally fungible things. Like the product experience does actually matter. And so we can get people who care about that. I think... [13:39] That matters. Basically, I also think you want to stay away from certain user segments who are most... [13:47] cost conscious and cost shopping. And so that would be like, [13:51] vibe coders, people who are running like agents, like 24 hours a day making, you know, making [13:56] prototypes. And that's just not actually the usage pattern of a pro developer. And so for a pro developer, [14:06] I think you can make a pretty strong argument that like the actual holistic experience of using the tool might be worth like 20 bucks more, 40 bucks more, um, [14:14] 80 bucks. These are tiny sums compared to the amount of [14:19] like productivity that people are gaining and the amount of software that's being produced. [14:24] I think there's also a way that we are trying to sit above the model providers and, you know, [14:30] I think there's been a positive development here, which is that for about... [14:34] I don't know, three months, maybe three, six months. I think Anthropic was kind of like the main show in town when it came to frontier coding. And now I think Gemini 3 and even like the latest Codex are basically on par with like the latest Claude model. And so...

14:51-16:33

[14:51] there's advantage to being able to let people choose between those or model route amongst them, to model route with like cheaper open source models. So it's not easy. And if you view it as just like, [15:04] we're in like a cost race and all these coding tools are the same. And I think that's like, we have to differentiate way more on the product and also just like the orchestration of these agents. But I don't think it's quite, that's quite the situation. [15:17] Got it. Okay. So you're winning people because they love the overall Warp product, and that includes... I think so. ...but also products like the actual terminal, the better terminal you set out to build. Yeah, it is kind of a funnel. It's like we have a lot of... [15:33] We have like 700,000 developers in Warp actively. And there's like a bit of a funnel from the terminal into the coding use cases and at least into the terminal use cases. [15:45] Yeah. You mentioned being one or two on Terminal-Bench. What goes into that? Like, are you training your own [15:52] terminal models or is this harnesses on top of the existing foundation models? So for us, it's a harness on a mix of models. So we're, [16:01] um, yeah, [16:02] And then like the actual capabilities of the app kind of matter, which is interesting. So what I mean by that is like Terminal-Bench, [16:10] it's not just coding tasks. It's like all sorts of things that you might do in the terminal. And so we have some intrinsic advantage there by actually being [16:19] the terminal and not like an app running within the terminal. And so like, for example, one of the Terminal-Bench tasks was like playing Zork or something, which is like an interactive terminal game. And so we can...

16:34-18:05

[16:34] We can like use the terminal. We can do computer use in the terminal, is probably the easiest way to think of it. So just like [16:41] there's companies that are doing browser use, we can do terminal use at the layer of the terminal as opposed to at the layer of, like, a web page, which is what the equivalent would be for the browser analogy. And so that helps us, like, do certain tasks on that particular eval that are hard for other harnesses to do. [17:01] Hm. [17:02] Got it. [17:04] You recently redid your pricing. What did you learn about developers and how they want to pay for AI? Oh, my God. So we're still, like, not fully out of this. Yeah, I mean, I can just explain, like, the whole thing here. So our initial pricing was basically you do a subscription and you get a fixed amount of AI [17:26] credits every month, uh, and we priced it so that, uh, [17:34] This is when we were at smaller scale. If you fully utilized your plan, it would cost us money, but the hope was that [17:42] on the sort of average utilization we would make money, right? So it's like you have a plan that gives people 50,000 credits and most people only use 20,000. You can kind of price it around that. What happened was like the... [17:56] people just use more and more. And so we got to a point where we were losing more and more money. [18:01] And so from a company strategy standpoint, we had a choice.
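The subscription economics described above can be sketched with a little arithmetic. This is only an illustration: the 50,000-credit plan and 20,000-credit average come from the conversation, while the plan price and the cost per 1,000 credits are hypothetical numbers chosen so that average usage is profitable and full usage is not.

```python
# Illustrative only. PLAN_PRICE_CENTS and COST_CENTS_PER_1K_CREDITS are
# hypothetical; the credit counts are the ones quoted in the conversation.
PLAN_PRICE_CENTS = 2_000           # hypothetical: $20.00 per month
PLAN_CREDITS = 50_000              # from the transcript
AVERAGE_CREDITS_USED = 20_000      # from the transcript
COST_CENTS_PER_1K_CREDITS = 60     # hypothetical: $0.60 of model cost per 1,000 credits


def monthly_margin_cents(credits_used: int) -> int:
    """Subscription revenue minus model cost for one subscriber, in cents."""
    model_cost = credits_used * COST_CENTS_PER_1K_CREDITS // 1_000
    return PLAN_PRICE_CENTS - model_cost


# Credits at which a subscriber stops being profitable.
break_even_credits = PLAN_PRICE_CENTS * 1_000 // COST_CENTS_PER_1K_CREDITS

average_margin = monthly_margin_cents(AVERAGE_CREDITS_USED)   # positive
full_use_margin = monthly_margin_cents(PLAN_CREDITS)          # negative
```

Under these assumed numbers the plan makes money at average utilization and loses money once usage climbs past the break-even point, which is the dynamic described in the answer.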

18:06-19:41

[18:06] We talked to Andrew a bunch about this. Like, we could either kind of play the, like... [18:11] And we're growing really fast. The revenue is growing... [18:15] We're adding like a million in revenue every... [18:17] It's since slowed down a little. It was like every five days or something. And it's like we could play the game, go raise more money, but the margins were really bad. And so we... [18:29] decided that wasn't the smart thing. [18:47] if we are margin positive. And so the way that we have like changed the pricing is so that it's much more consumption based. [18:56] So you now pay for like a base plan of 20 bucks a month and then you buy credits on top of that. [19:03] And we ensure that that like... [19:06] You know, it's like in the old world, we didn't want people fully utilizing their AI because it would cost us money. Now it's like much better if people use more AI. It is more expensive. [19:17] For sure. [19:19] And we've had a lot of user complaints around that. [19:21] Which sucks. If any Warp customers are listening, like it's a bummer. It really does suck. But it's like we just could not afford to keep subsidizing the way that we were. And like all in all, I would say it's gone... [19:36] pretty well. Like, we're still growing pretty well. And now it's like,

19:41-21:30

[19:41] a growth that is [19:43] sustainable and not like a, you know, unsustainable subsidized revenue growth. So tricky thing to do. Would you ever train your own models? [19:51] Yeah. [19:52] I think we would definitely do like what... [19:56] some of our competitors are doing where we would fine tune models and do RL and that type of stuff. [20:02] I, um... [20:04] It's hard for me to imagine us competing with training like a full [20:08] frontier level model, just the amount of capital that costs. We do have a ton of interesting data. [20:15] Like... [20:17] I think it's like actually a really interesting strategic asset for us in terms of like the workflows people are doing in the terminal, how to improve them, how people are interacting with our agent. [20:25] So I think it's likely that we will... [20:28] we will do some sort of RL. And I think it's also very likely that we are going to lean more into a mixture of models and more model routing to try to like, [20:39] give users the best experience when it comes to sort of [20:43] latency, cost, and quality, which are the three vectors here. [20:48] And do you see your role as kind of like optimizing that on behalf of users? Do you want to see yourselves as giving all options to users for them to pick? [20:58] Yeah, so our philosophy has been like, make a great default, but then because these are developers, they want control. So we have like a... [21:08] We actually have a couple variants of default. So we have a default that's geared towards efficiency and one that's geared towards performance. And then after that, we give people the raw choice. The raw choice is a little weird because increasingly we really want to use different models for different things internally. And it doesn't map that cleanly on the using...

21:30-23:04

[21:30] say like, [21:31] you know, GPT-5.2 for everything. So it's a little complicated, but I think it's actually something that developers like is the control. And so I don't see us moving away from that right now. [21:42] Yeah. [21:44] One of the most interesting parts of where you sit is that you can actually see which models different developers are using. Yeah. I'm curious in your user base, [21:52] which models are most popular? Has it evened out a lot? And are there different... [21:57] flavors or personalities of what the different models are good at? [22:00] Like 70 to 80% of our user base will use whatever we set our auto to and not touch it. And what we set the auto to [22:11] currently is like [22:12] Um... [22:13] it's a different one for efficient, a different one for performance. It's a mix of the, um, [22:19] Codex [22:20] Sorry, GPT-5.2, but it's related to Codex. And then Sonnet 4.5. [22:28] When people are opting into choosing a model, lately Gemini 3 Pro... [22:36] has been very popular. It's a really good model. You know, what we will do is like we will test different variants in our auto model and see how people respond, how they engage with them. And I think we'll probably test Gemini 3. I've been impressed with it. Um, [22:51] So, yeah, I would say if I had to like stack rank, I would probably say like the Anthropic models are probably still most popular. And then [22:59] between Gemini and OpenAI, there's a decent amount of people opting into each of those.
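The "great default plus raw choice" arrangement discussed here can be sketched as a small lookup. This is a guess at the general shape, not Warp's implementation: the profile names mirror the efficiency and performance defaults mentioned in the conversation, and the model identifiers are placeholders rather than the models Warp actually routes to.

```python
from typing import Optional

# Placeholder identifiers (assumptions), one per "auto" profile.
AUTO_PROFILES = {
    "efficient": "small-fast-model",
    "performance": "large-frontier-model",
}


def resolve_model(profile: str = "efficient",
                  explicit_choice: Optional[str] = None) -> str:
    """Pick a model: an explicit user choice wins, otherwise use the auto profile."""
    if explicit_choice is not None:
        return explicit_choice
    if profile not in AUTO_PROFILES:
        raise ValueError(f"unknown auto profile: {profile!r}")
    return AUTO_PROFILES[profile]
```

Most users would never pass `explicit_choice`, which matches the observation that 70 to 80% of the user base stays on whatever the auto setting resolves to.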

23:04-24:37

[23:04] What about Grok? [23:06] Grok is not in Warp. [23:09] It could be in Warp. [23:11] They've reached out a bunch of times to put it in Warp. [23:14] I'm not at all opposed to putting it in Warp. It's just like every time we put a model in Warp, we have to [23:20] align it with, like, some concrete benefit to users, because it's a bunch of work to tune our harness to work well with the model. [23:29] Hmm, I see. [23:31] for a second. [23:31] Can you say a word more about that harness? Like what do you do to make your harness good? [23:36] So harness is like how you prompt, what tools you make available, how you manage context. [23:46] And so like the big things that like... [23:49] are determining quality of harness are like, it's literally like the language of the prompting. It's the tool set definition. It's things like handling the context window. So specifically, when do you use something like a sub agent where you go out and [24:08] have something that has a separate context window? When do you summarize? When do you truncate? Like we have things that have like, [24:14] You know, you might run a terminal command that has like gigantic output and you don't want that all in your context window, but you might want some part of it in your context window. [24:22] So how do you sort of pick out the right stuff? How do you do RAG? How do you integrate with MCP? And so... [24:28] It's... [24:29] There's just like some engineering and like alpha that goes into that. The way that you make that good is by measuring.
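One of the context-window decisions mentioned above is what to do with a command whose output is gigantic. A minimal sketch of one common approach follows: keep the first and last few lines and replace the middle with a marker. The function name and the default line counts are assumptions for illustration; nothing in the conversation says this is how Warp's harness does it.

```python
def clip_output(text: str, head: int = 20, tail: int = 20) -> str:
    """Keep the first `head` and last `tail` lines of long command output.

    Output that already fits is returned unchanged. Otherwise the middle is
    replaced by a single marker line stating how many lines were dropped.
    """
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text
    omitted = len(lines) - head - tail
    marker = f"... [{omitted} lines omitted] ..."
    # Slice from an explicit index so that tail=0 yields an empty list.
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])
```

The head usually carries the command's setup messages and the tail usually carries the error or summary, which is why this simple heuristic is a reasonable starting point before reaching for summarization or a sub-agent.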

24:37-26:09

[24:37] Like you can start [24:39] just by like in a pretty like naive way and just give it a bunch of prompts. But the way that you make it really good is by measuring. And by measuring, you can do it with sort of like a fixed set of evals where we know what the results should be and like not all of them should work. And so we have internal evals. You can do it on public benchmarks. [24:58] And that's actually been really good for getting our harness to be awesome. It's just like... [25:02] going through the exercise of making our agent perform well on the public benchmarks. And then you can do it by looking at like, uh, you know, user data. And so we use, um, [25:12] Braintrust. So there's various platforms you can use to like sort of look for patterns and failure modes [25:20] in the agent interaction and then try to tune the harness and sort of replay them as evals. So, you know, that was a big mindset shift for us, like to get to doing that, but that was 100% necessary, to do it all data driven to get something that was good. [25:34] Yeah, got it. And you have your own [25:36] tab autocomplete models? Is that just not even relevant for your product? We don't have that. Um, it's not super duper relevant at the moment. Um, [25:47] Like, I think there would be incremental benefit if we were doing... [25:51] tab completion in the terminal for people. And we do have [25:55] And I think for the hand-editing parts of Warp, [25:58] by far, like the more typical use case is for code review of an agent's code than it is like typing code. But it would be nice to have. It's just not an area of high priority for us.
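The "fixed set of evals where we know what the results should be" idea can be sketched as a pass-rate loop. Everything here is a toy under stated assumptions: the agent is any function from prompt to answer, the three cases are invented, and exact string matching stands in for whatever grading a real harness would use.

```python
from typing import Callable

# Each case pairs a prompt with the answer we expect back (invented examples).
EvalCase = tuple[str, str]


def pass_rate(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases where the agent's answer matches the expected one."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for prompt, expected in cases
                 if agent(prompt).strip() == expected)
    return passed / len(cases)


def toy_agent(prompt: str) -> str:
    """Stand-in agent that knows two of the three answers."""
    canned = {"2+2": "4", "capital of France": "Paris"}
    return canned.get(prompt, "I don't know")


CASES: list[EvalCase] = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("reverse abc", "cba"),
]

score = pass_rate(toy_agent, CASES)  # two of three cases pass
```

Re-running the same fixed cases after each harness change gives a number to compare, and failures observed in real usage can be added to the case list so they are replayed on every run.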

26:10-27:43

[26:10] Yeah, got it. Awesome. I'd love to talk a little bit about how you see the future of coding interfaces and the future workbench evolving. So it seems like very much like you believe there's a convergence happening between kind of the traditional, call it Microsoft Word, IDE, GUI approach to writing code and then the chat with your computer agentic approach, [26:35] kind of style of terminal first approach. And those two are starting to merge. What other kind of UI innovations do you think are happening in terms of how people work with coding agents? [26:47] I think the biggest change that we're going to see in the next year is more and more like [26:54] kind of like cloud agents, or we call them ambient agents, where, um, [26:58] And this is already happening, like we're investing in this at Warp, where rather than a developer sitting at a keyboard giving a prompt, there's some system event that triggers an agent to do something. And that system event could be like you have a server that's crashing, you have a cluster of user reports, someone has filed a security incident against you. And all those things are basically going to serve as context into an agent that gets launched [27:28] and runs not on some individual's machine, but like somewhere in the cloud. And so what I think that [27:34] implies is that you're going to want the sort of like workbench to become more of an orchestration platform, more of a

27:43-29:20

[27:43] like kind of cockpit for managing not just your own agents, but your team's agents. I really think it implies you're going to need like a strong team concept because, you know, [27:54] you know, the, these things aren't, it's not going to be the normal workflow of like, I'm sitting at my desk, like I'm writing a coding change and I push a PR. It's like agents are going to push PRs and then agents are probably going to [28:07] leave an initial round of reviews on PRs, and they're going to file tasks in your task tracking system. And so all this stuff... [28:16] needs like it needs tracking and it needs coordination and you need different ways of integrating it into existing systems. And so whoever like this is like what warps probably like our. [28:30] our biggest product focus for next year is on this type of, of, uh, evolution off of just like interactive agents into the cloud agents. Cause I think it's going to, it's going to be pretty transformative. [28:42] And I imagine that's like a massive infrastructure push. [28:47] to be able to kind of, you know, run up and spin up. It is. It's turning, yeah. So, like, for us, it's turning us much more from, like, a product to a platform. And so the way that we think about building this out is building it out in different layers of the stack where, like, [29:03] you have like an agent SDK, you have agent hosting if you want it. So like, [29:10] If you're a smaller company and you don't want to set up spots in the cloud for your agents to do their work, Warp will host that for you.
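The event-triggered pattern described in this part of the conversation, where a system event rather than a person at a keyboard starts an agent and supplies its context, can be sketched as follows. The event kinds, class names, and prompts are hypothetical and exist only for illustration; they do not describe Warp's SDK or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SystemEvent:
    kind: str       # hypothetical kinds, e.g. "server_crash"
    details: str    # free-form description supplied by the monitoring system


@dataclass
class AgentTask:
    prompt: str
    context: dict = field(default_factory=dict)


# Hypothetical mapping from event kind to the instruction an agent receives.
PROMPTS = {
    "server_crash": "Investigate the crash and open a PR with a proposed fix.",
    "user_reports": "Group the reports, find the likely cause, and file a task.",
    "security_incident": "Triage the incident and summarize affected systems.",
}


def task_for(event: SystemEvent) -> Optional[AgentTask]:
    """Turn a system event into an agent task, or None if no agent handles it."""
    prompt = PROMPTS.get(event.kind)
    if prompt is None:
        return None
    return AgentTask(prompt=prompt,
                     context={"kind": event.kind, "details": event.details})
```

A real platform would hand the resulting task to a hosted agent and track its status and logs; this sketch stops at the point where the event has been converted into a prompt plus context.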

29:20-30:58

[29:20] there's a whole category of startups that are going into this business of agent hosting, which I think is really interesting. It speaks to like, this is like a real thing that's happening. There's an API layer for once you have the agent running, how do you get its status and how do you maybe take it over or see its progress? Where does it write its logs? And then there's like a management layer of like, what are all these things doing? What states are they in? What's the log? [29:50] produce PRs. And so I think it's cool because it's going to be the most impactful way to use these agents a lot. It's just not to have a person like driving them. I don't think that this means that the person driving the agent is going to go away either. I think that there's like, it's the sort of types of tasks that are going to start with these ambient cloud agents are more like toil tasks or things that are one-shotable. And I think harder, more interesting software engineering is still going to be done by a developer like, [30:18] at their workbench. But, um, yeah, this is how I see things evolving in the next year. [30:23] That's awesome. And I think one of the more important scaling law charts is like the METR, like how long can the agent run for? Yeah. What are you seeing in terms of how kind of long horizon these agents can be? [30:37] I don't know. At its max, I would say doing real coding tasks for us now is like... [30:44] 20, 30 minutes, something like that. Maybe like you can have it run longer, just to be clear. But the problem is it will start going in circles still. There's still a context.

30:59-32:31

[30:59] limitations and like [31:01] It's a costly proposition to have agents... [31:06] running without people checking in. [31:09] and guiding them and just by far you get the best results [31:14] when the agent is really steered. So when you do like an upfront plan with the agent, [31:19] When you check in on the agent's work and sell. [31:22] I think... [31:24] Yeah, I think this will just keep on going up, but the... [31:28] I don't know. It's like hours and hours of work. It needs a really clearly defined task in order for that to even make sense to me. It needs to be doing some big code migration or some big, big task. [31:39] Got it. What do you think the product might look like when it's good at kind of [31:44] being this cockpit to manage, you know, swarms of these agents. [31:48] Yeah, so we're building this out right now. And we're having all sorts of internal debates on whether it's [31:55] It should be one product or two products. [31:59] Um... [32:00] The way that we're doing it right now and how I think other people will probably approach this is like a sort of like area of our app, which is about agent orchestration. [32:12] The reason I wonder if it should be a whole separate product for us is because like... [32:17] Um... [32:18] You know, it's [32:20] It very much, it feels more web-centric to me, which we can make Warp work on the web, but it's not the primary interface. It feels like potentially it has like a different user interface.

32:31-34:02

[32:31] some of the time. Yeah. But the advantage of having it bundled into Warp is that it makes the handoff [32:40] from one of these cloud tasks to a developer extremely seamless. And so a very, very common workflow: we have it running in our Slack and our Linear. [32:51] And so what will often happen is, you'll tag something in Slack and you'll be like, you know, can you make this fix for me, change this button position or whatever. And right now you need a developer to, like, tie the loop on that. So it'll do the work in the cloud and then you'll bring it onto your local [33:10] environment, where you can just keep working on it seamlessly. So short answer is I don't know, but it'll be a little bit more like a task management UI. [33:18] I don't think it's going to quite, like... [33:20] I know there's, like, [33:22] thoughts of, like, is task management the primary primitive for developers to be working with? I don't buy that either. I don't think every developer's been doing all their work out of Linear or Jira. But I do think there's some aspect of seeing what the agents are doing across various systems that developers are going to want. [33:38] Yeah. [33:39] Awesome. I'd love to close by maybe talking a bit about the state of agentic development and how the software engineering market will play out. Yeah. [33:48] Sure. I guess maybe for starters, where do you think we are in terms of, like, the frontier model [33:54] you know, capability [33:56] frontier? Like, where are the models good today? Where are they not? Um, yeah,

34:02-35:40

[34:02] are they still producing a lot of errors? Where are we? [34:07] So I'm constantly using this stuff. I'm somewhat biased because I use it on Warp's codebase, which is like a [34:13] very custom, big Rust codebase, but I think that's still an interesting perspective. The agents can do what I would think of as medium-complexity tasks pretty well if you give them a bunch of guidance. [34:28] They can't do, [34:31] like, whole big projects. [34:33] At least, [34:36] we haven't had success [34:37] doing that. I don't trust them to make, like, very fundamental architecture [34:43] decisions for us. [34:47] So it's like you want pretty constrained tasks, but they're well beyond [34:53] doing trivial tasks, like change the button color, tweak the text. Like, they can make apps. They're very good at zero to one. They can solve, like, kind of hard bugs. [35:04] We have a medium-sized feature. Like, I don't know what a good example would be. Like, I was adding a new slash command to Warp the other day, and I just tagged agents to do that, you know, in Slack, and it made a 300-line PR, and it was basically right. And so I think there's a bunch of headroom [35:21] at the upper end. [35:24] If I had to put it on a scale of 0 to 10, I think we're at a 6, maybe. [35:31] So I think it's real. It's game-changing for how people work, but it's not at the level of doing what a full-time engineer on a hard product needs to do.

35:40-37:27

[35:40] Hmm. [35:41] And where do you think the bottlenecks are? Like, is it just, you know, the models don't have enough context, we need to get better at giving them instructions? Is it just we need to keep scaling these things up? [35:51] What are the biggest bottlenecks? [35:53] So I definitely think context window is still a big issue. Um... [35:57] And even with the bigger context windows, having attention over the whole context window in a reasonable way is hard. [36:05] Yeah. [36:06] I think, like, [36:09] there's an issue of it always having to relearn everything. Like, memory is not... [36:14] It just seems like a slow, inefficient, repopulate-the-whole-thing-with-a-bunch-of-files [36:22] thing. Like, there's no continuous learning with it. So yeah, [36:26] it's like this big stateless thing where you're kind of always starting from scratch and have to fill it up before you can set it loose. That sort of stinks. [36:36] I would like to see that solved. [36:38] There's still, like, [36:42] how do you use it effectively as a developer? We're very early. Like, this stuff didn't exist a year ago. And so how should you be doing context engineering? How should you be setting up your projects so that agents can work well with them? That's, like, a problem. If you were to look across how people on our team use Warp to build, it's, like, [37:05] high variance, and [37:08] you know, that's not great, because we have very, very rigorous standards around writing code and almost no standards, I mean, we've tried, around how to use the agents. No one has been taught how to use the agents. There aren't even agreed best practices on how to use the agents. And so I think that's pretty nascent.

37:28-38:57

[37:28] Yeah, got it. My experience, whenever I try to vibe code a little bit, is that the coding models still produce a lot of errors. Yes, that's true. [37:43] Like, did it work or did it not? You should be able to RL it. And it's like, where are we today in terms of the state of, [37:50] you know, how frequently are errors coming out, and can we actually RL that, or am I misunderstanding something? No, I think that they're still definitely producing errors. [38:03] It's... [38:04] It's interesting. So it's pretty [38:06] infrequent that the agent at this point will produce something that doesn't compile for me, [38:11] which I think is an interesting milestone. So, like, [38:14] I don't know, not that long ago, four or five months ago, that was a problem, like getting to a compiling version of the thing. [38:21] It compiles for me about 100% of the time right now, which is amazing. [38:26] Um... [38:27] It produces stuff with bugs and errors [38:31] relatively frequently. I don't think it has a good way [38:36] of [38:37] closing the loop in terms of, does the thing work? And so I think some version of browser use or computer use, [38:46] where [38:48] the agent can not only make the change [38:50] but verify the change from the user's perspective, not the code perspective, is pretty important. Are people doing that yet? Yeah.

38:58-40:37

[38:58] We're working on stuff like that. Like computer use: [39:02] all of the model providers have beta versions of computer use APIs, and [39:09] you know, browser use for sure, computer use we're looking at. I would be surprised if this wasn't a thing. And I think it becomes even more important pretty soon as more work is done remotely, because the real pain in the ass with the remote work is verifying that it works from a user perspective. So I think that's a big part of it. And then I think if you have that loop, it's probably easier to do RL and get to things that are behaviorally correct, not just, like, static compile correct. [39:37] Yeah, yeah, absolutely. Okay, well, looking forward to that. And I guess, do you think that we're going to reach a superintelligence moment here, like where the models are better at coding than the best human coders? [39:49] I have no idea. No idea? What I do think is going to happen is, I think, [39:56] I don't know if this is superintelligence. I do think coding will be solved by models. And what I mean by that is I think that the limiting factor that we're going to come up against [40:07] is just, like, expression of intent [40:09] from humans, in terms of, like, what do you want built? How do you [40:14] Yeah. [40:14] How do you build it? How do you express that clearly? English is ambiguous. [40:20] Isn't coding the truest expression of intent, though? No. [40:23] Yeah, but the problem is we're moving from a world where people speak in code to one where they just speak in English to try to build apps. And so we're, like, reintroducing ambiguity, because developers, people building apps, are no longer

40:38-42:20

[40:38] actually [40:40] directly expressing what they want. They're going through this translation layer of saying to a model what they want, and then the model produces the code. So it's an interesting, um, [40:50] it's like an interesting step backwards there in a sense, but it's also way, way, way more efficient to do it this way. [40:56] Yeah, I think we'll get to a point where you actually don't need to be on the frontier [41:00] to have [41:02] something that produces [41:04] code that is [41:06] as well matched to a person's intent as possible. And I think that actually is an interesting thing from a competitive perspective. Um, [41:14] I wouldn't want to be in the API business for [41:19] coding tokens, because I do think, like, [41:22] at some point, you just won't need to be on the frontier and you're not going to be able to charge a huge [41:27] margin on top of it, which is why I think you actually see [41:31] Anthropic and OpenAI and Google going so hard at the application layer, because there's huge risk at the API layer. It's just for this vertical in particular that I think things are basically solved within a few years. [41:45] Yeah. I don't know that. I'm just [41:47] prognosticating. Well, that's awesome. Do you think that people will ever, are people already thinking about [41:55] the amount they spend on coding tools being, [41:59] you know, the replacement of what they would be spending on, you know, hiring a few software engineers? [42:04] Or are they thinking about it in their heads as buying a tool still? [42:08] So when we talk to enterprises, it is still viewed, by and large, as a productivity boost. And that's the way that it's being evaluated.

42:21-43:55

[42:21] In fact, it's really hard to measure even, like, the effectiveness of this stuff. And so it tends to fall back to subjective measurements from engineers. Like, do you feel like you're getting a bunch of value out of this or not? Or maybe you look at, like, DORA metrics. It's really hard to know. So I don't think that they're viewing it yet, [42:38] by and large at least, as labor spend. And I think today, if you pitch, like, here's a two hundred thousand dollar agent to replace your two hundred thousand dollar engineer or whatever, they would be like, what? Like, no, not even close. Like, yeah. [42:55] So, but I would expect that this starts to change. What do you think will change that? [42:59] It's a great question. [43:02] I think it's, like, increasing the automation use cases. Or maybe another way of thinking is, like, if companies start to launch products without engineers, [43:13] I think that that will be a major proof point. [43:17] And to be clear, I don't want this to happen. I'm an engineer at heart and I don't want people losing their jobs, but there will be projects, products that are launched where there's very, very minimal engineering involved. [43:28] And you're going to look at the spend for that and be like, okay, this was the cost of delivering [43:33] the product. [43:35] And you're going to be like, okay, [43:37] with and without engineers, what's that like? So I think you need more of that to happen. I don't think that's happening very much yet. [43:45] Hm. [43:46] Got it. [43:47] And then maybe last question. I'd love to chat about how you see coding as an art form and therefore, you know, your role in the world evolving.

43:56-45:40

[43:56] You wrote this blog post I loved back in 2023, I think. Everyone should go give it a read. I think it's about the future of productivity interfaces being ask and adjust. Maybe say a word on that and how you think, [44:12] you know, [44:12] three years in, how do you think that's evolved? [44:15] Yeah, so I wrote this pretty shortly after ChatGPT came out and we started, [44:20] like, trying to deeply integrate it into Warp, and [44:24] the idea was... [44:26] This is sounding really obvious right now, but the way that productivity interfaces have always worked in the past was that they were geared towards hand editing, right? And by hand editing, [44:38] it could be, like, you go into Figma and you're [44:40] drawing vectors, or you go into Google Sheets and you're entering cells, or you go into VS Code and you're typing code. And my thesis in that article was, like, that's going to change to a point where the primary interface is a... [44:56] I didn't have the word agentic at the time, but it was, like, AI-based, where you would ask the app to do the thing for you. And then you as a human author would be responsible for [45:13] adjusting, and adjusting might mean, like, reprompting. [45:18] Or, if reprompting fails, that might mean, like, [45:22] going in and treating the prior hand-editing interface as, like, using that to complete your change. And I kind of think that's where we're at right now. Especially for coding, it's really transitioning to: you start by asking for something and then you adjust it.

45:42-47:31

[45:42] And, um, [45:43] another thing I said in that [45:46] article, which I don't know if it's right or not, was that [45:49] I was thinking about, are you going to be able to get rid of the adjustment piece? And my thesis was that [45:55] the area we're going to need the adjustment piece the least is in areas where there's, like, a lot of acceptable solutions. So that would be, like, creative domains. Like, [46:06] you know, if you ask for an image of something, there's probably [46:10] a thousand outcomes, a thousand images that might work for you. And so you can just reprompt, reprompt, reprompt until you get what you want. Whereas for something like [46:18] code [46:20] or a spreadsheet, where there's one thing that needs to be right, you would have to keep that ability to, like, [46:28] get it perfect with a hand-editing interface. So that was the thesis. I think it wasn't bad. I think it's held up okay. [46:34] Not bad. [46:35] Yeah, I guess you didn't coin agentic everything back then. No. Yeah, the thesis was spot on. You want to know something we coined at Warp, [46:45] which we should have trademarked? [46:48] Agent Mode. So we were the first product to launch a branded thing called Agent Mode. And if you look this up on ChatGPT and just ask, like, where did this come from? It came from Warp. And now that's like a [47:01] very common [47:03] way of describing the feature, which I wish we were getting, you know, some kickbacks for or something. Totally. I love it. Well, thanks so much for coming on to share what you're doing and, you know, your observations on the coding market as a whole. It's such a white-hot competitive market, and the way that you think, you know, the terminal will be the workbench of the future and how it's going to evolve. It was awesome to have this chat today. Thanks, Zach. Thanks, Sonya. It's awesome to be here.

47:32-48:01

[47:32] Music.
