The Most Powerful & Accurate AI Coding Assistant: Sourcegraph Cody | SourceForge Podcast, episode #23

By Community Team

In this episode of the SourceForge Podcast, Beyang Liu, CTO & Co-Founder of Sourcegraph, talks about the evolution of their code intelligence platform and its AI-powered coding assistant, Cody. They discuss the challenges developers face with legacy code, the importance of context in coding, and how Sourcegraph aims to make coding accessible to everyone. Beyang shares insights from his professional journey, the competitive landscape of AI coding assistants, and the innovative features that set Cody apart, including its model agnosticism and advanced context retrieval capabilities.

In this conversation, Beyang discusses the evolution and future of Sourcegraph, emphasizing the importance of automation in software development. He highlights the challenges developers face with large codebases and the need for tools that can streamline their work. Beyang envisions a future where AI significantly reduces the toil in coding, allowing developers to focus on creativity and innovation. He also shares exciting new features being developed at Sourcegraph and offers his perspective on the current AI hype in the industry.

Watch the podcast here:

Listen to audio only here:


Learn more about Sourcegraph.

Interested in appearing on the SourceForge Podcast? Contact us here.


Show Notes

Takeaways

  • Sourcegraph’s mission is to make coding accessible to everyone.
  • Cody combines AI with code search for better context retrieval.
  • Beyang’s experience at Palantir shaped Sourcegraph’s approach to enterprise software.
  • Cody is the first coding assistant to integrate context-aware chat.
  • Model agnosticism allows users to choose the best LLM for their needs.
  • Context is crucial for generating high-quality code.
  • Sourcegraph serves large enterprises with tailored solutions.
  • The evolution of AI has influenced the development of coding tools.
  • Cody’s retrieval augmented generation (RAG) approach is industry standard.
  • Understanding user needs drives Sourcegraph’s product development.
  • Sourcegraph was built to effectively handle large codebases.
  • Automation in software development is a long-term vision.
  • Professional developers will remain essential in guiding AI tools.
  • The majority of a developer’s time is spent on repetitive tasks.
  • Sourcegraph aims to automate 99% of developer toil.
  • The creative aspect of development should be prioritized.
  • New features like prompt library enhance AI code generation.
  • Open context protocol simplifies integration of various tools.
  • AI’s impact on codebases will be significant and transformative.
  • The current AI landscape is filled with hype and overselling.

Chapters

00:00 – Introduction to Sourcegraph and Cody
02:55 – The Evolution of Sourcegraph’s Products
10:40 – Beyang’s Professional Journey and Its Influence
15:06 – Positioning Cody in a Competitive Landscape
19:37 – Model Agnosticism and Its Benefits
22:08 – Understanding Context in Cody
26:15 – Scaling Sourcegraph with Growing Organizations
28:03 – The Future of Automation in Software Development
32:39 – The Evolving Role of Developers
37:06 – Exciting New Features in Sourcegraph
40:53 – Industry Hot Takes on AI and Automation

Transcript

Beau Hamilton (00:05)
Hello everyone and welcome to the SourceForge Podcast. Thanks for joining us today. I’m your host, Beau Hamilton, Senior Editor and Multimedia Producer here at SourceForge, the world’s most visited software comparison site where B2B software buyers compare and find business software solutions. Today we’re talking with Beyang Liu, CTO and Co-Founder of Sourcegraph, a code intelligence platform that claims to revolutionize the way developers understand, fix and automate their code.

With their coding assistant, aptly named Cody, they’re harnessing the power of artificial intelligence to navigate large codebases, find relevant snippets of code, and get historical context. They also have a code search platform called Code Search, designed to help users quickly fix bugs, refactor code and improve performance, among other things. So to talk more about Sourcegraph, let me introduce Beyang Liu. Beyang, welcome to the podcast. Thanks for being here.

Beyang Liu (00:54)
Hey, Beau, thanks for having me. It’s awesome to be here.

Beau Hamilton (00:58)
Awesome. Well, I want to know what it’s like being the chief technology officer at a major enterprise software company. So could you introduce yourself and just kind of give us an overview of your position at Sourcegraph?

Beyang Liu (01:10)
Yeah, so, you know, I’ve been at Sourcegraph since the beginning. I co-founded it with Quinn, our CEO, back in the kind of ancient year of 2013 now, but it’s been an amazing journey. It’s really been a place where I’ve had kind of like a frontline seat to the evolution of software development over the past 10 years, in a variety of roles. So, as you probably know, as startup founders, you wear many different hats over the course of the evolution of the business. So my role has evolved from everything from coding a lot of the initial search and code graph architecture, to implementing a lot of the initial UI. I have been frontline support for many early customers. I’ve been kind of like the people manager for product and engineering.

And these days, I’m mostly pretty hands-on with the code. So I’m pretty in the weeds with Cody. I helped lead the sort of like tiger team that we had internally to spin up that project. It was kind of like a new effort at the time. And so these days I kind of split my time between maybe like 50% coding and 50% just like spending time with users and customers, getting to understand their challenges and how they’re using our products.

Beau Hamilton (02:28)
Ok. I’m sure that’s pretty rewarding being really hands-on with the code.

Beyang Liu (02:33)
Yeah. Yeah. Those days are kind of like the best days when you have like a large, kind of like block, chunked off to really sit down and focus.

Beau Hamilton (02:42)
Nice. Yeah. So you guys have two products available, which I briefly described in my opening monologue. Could you kind of just describe your company as a whole for our listeners and break down maybe some of the key features of Cody and Code Search?

Beyang Liu (02:58)
Yeah, totally. So our company exists to make code kind of accessible to everyone. So our mission statement is to make it so everyone can code. And that starts with professional developers: making their jobs easier, making it easier to like build useful stuff. And so the way we started on that journey was, both Quinn and I were actually working inside this very large, messy enterprise code base early on in our careers. We were a part of Palantir in those days, and we were part of like a small crack team inside Palantir that was drop shipping into some of the Fortune 500 companies that Palantir was trying to land as new customers at that point. So we were essentially building a lot of stuff in the field for these big enterprises in the context of their code bases.

And the key pain point that we discovered there, and this is something that I think like every software developer has felt in some way, shape or form is the pain point of dealing with legacy code. So everyone has legacy code. If your application is in production, you have legacy code because you have an existing code base that’s powering users where you don’t want to break things. You want to move quickly, but not break the thing for existing users. And so the initial piece of technology that we built to solve this problem was Code Search.

Code search effectively addressed the key challenges of retrieving and gathering the context from the broader code base and paging that into the working state in your human monkey brain. That, to us, was 90% of the challenge of working inside large existing code bases. Because once you have the context paged in, that contextual knowledge tells you what exactly to write. And the actual writing of the feature and debugging it, it may not always be quick, but at least it’s fun. And because it’s fun, it seems to go quicker. It’s the long, arduous slog of pulling all the different pieces of context and reading through all the existing packages and worrying that maybe you missed something or maybe the thing that you’re writing has already been written somewhere. That’s the thing that we wanted to accelerate with Code Search.

And so that was the initial product. We shipped it to a lot of the old customers that we used to work with at Palantir, as well as a lot of new ones along the way. And then over the years, we kept watching the AI ecosystem kind of evolve. And so my background is actually in AI. I studied it in college at Stanford. My undergrad thesis advisor was Daphne Koller, who’s gone on to do some amazing things in AI. And so it was always kind of like in the back of my mind: you know, when is this technology going to mature and reemerge in a way that we can take full advantage of it? And really for us, that journey probably started around like 2019, 2020, when we started to see some signs of like, wow, there have been some recent advancements in large language models and things like that that seemed really powerful. And we started to experiment with rolling more of those signals at first into our Code Search product, but eventually it grew into this new effort called Cody.

And what Cody has become today is really tying together the best in class context retrieval mechanisms that we built over the years inside Code Search. So essentially plugging in Code Search and all the code graph capabilities that we gave human developers previously, and plugging that into the sort of like context retrieval or RAG engine of Cody, so that AI can also have access to the same enterprise code context as our Code Search tool.

And so what we do now is we combine the best in class LLMs, whether it’s Claude 3.5 Sonnet or GPT-4o or Gemini or any of the other open source models like Llama or DeepSeek. We make those all available in Cody. And then we combine that with the power of our Code Search and code context. And together, those sort of form this kind of unified platform that is really essential, we found, for making these AI tools actually useful in the context of existing code bases.

Beau Hamilton (06:58)
Hmm. That’s really interesting. Yeah. I imagine when you were working with AI at Stanford, was it, kind of branded and marketed as AI or was it something else? You know, cause I’ve talked with other founders and they’re like, yeah, I mean, this is AI before it was AI.

Beyang Liu (07:13)
Yeah, that’s a great question. Like the terminology has sort of evolved. Like AI was always sort of like the broad umbrella term for everything related to, I mean, it covered everything from like machine learning to robotics, to motion planning, to a bunch of things. It was always, I think, in those days viewed as like a fuzzier term. We liked to call what we did in our lab machine learning because that was more to the point of what we were doing. We were kind of like training and evaluating models on large data sets. And by and large, I think that has evolved into like the AI that is very popular today. It’s all kind of like machine learning based. Yeah.

Beau Hamilton (07:54)
Yeah. It’s very like all encompassing. Interesting. Yeah. Thanks for that distinction there. Cause it’s, you know, I think AI is just, it’s taken over, and for anything remotely similar to automation, it’s just AI slapped on top of it. Now, as far as the product roadmap for Cody: you know, you guys were founded in 2013. Was this product always on the roadmap from day one, or was it kind of conceived relatively recently?

Beyang Liu (08:23)
It was conceived in probably the 2019, 2020 era. And the conception was, you know, as with a lot of things, it wasn’t like super concrete at the time. It was just like this general idea of like, Hey, there are these things called large language models. They seem to be evolving quickly. They can do some neat things right now. You know, 2020 was still well before the release of ChatGPT. So, you know, it was not obvious to everyone how these things could be useful. But we started poking around, and the first ways in which we found this could be useful was as signals into our search ranking and Code Search engine.

So that’s where our experimentation started. And then those kind of initial early technical spikes eventually evolved and crystallized into a thing called Cody, which took shape as first an editor extension. Now it’s also available in web application form. And increasingly, it’s an API that powers sort of like a long tail of tools that our customers are building. But in the beginning, it was just a humble editor extension.

And it was really focused around two tasks. One was kind of like inline autocomplete. So completing the next line or next couple of lines that a developer was typing. And then the other aspect of it, which was new at the time, was this context aware chat. So we were the first coding extension, AI coding extension, to combine LLM code generation with retrieving context from your broader code base. And we were able to do that because we had an existing retrieval engine in the form of Code Search.

Beau Hamilton (09:58)
Interesting. Ok, so that’s kind of how it sort of evolved to meet the demands of modern software development, I suppose. Now, you have quite the resume. I see you worked at Google. I mean, you mentioned some of this. You worked at Google as a software engineering intern, a research assistant at Stanford. And for almost two years, up until founding Sourcegraph, you worked as a software engineer at Palantir.

You know, I’m sure there’s a bunch of other products or projects you were involved in as well. So how did all this experience sort of influence the design and functionality of Sourcegraph’s products, and specifically with Cody, because I think that’s really your kind of bread and butter and feature-rich product.

Beyang Liu (10:37)
Yeah. So I took inspiration from kind of all those stages of my professional life, I suppose. So you know, the experiences from Palantir really informed the way we thought about building a tool for day-to-day development. It gave us firsthand exposure to large, very large, very messy, very old code bases. And so that gave us a sense of like the scale of what we needed to tackle, both in building up the original Sourcegraph Code Search engine and in building Cody to scale to these large code bases and these large enterprises.

The experience at Google was also formative because Google has a really good internal developer tooling ecosystem. And if you talk to anyone who’s ever worked inside Google before, they will say that that experience probably changed the way they think about what is possible in terms of developer tools. And so broadly speaking, there are kind of like two buckets of developers. When we think about like, you know, how we’re kind of like marketing our product, actually, it’s like those who’ve worked inside Google, or some, you know, environment similar to that, and those who haven’t, because those two kinds of developers actually have kind of like a different way of thinking about tools. And in particular, if you ask any ex-Google dev what is the developer tool you miss the most, I think invariably they will say Google Code Search. It’s Google’s internal Code Search engine that really saved them a ton of time. And so we definitely took a lot of inspiration from that at sort of like a high level, in terms of wanting to bring that level of developer experience to every company and every organization in the world.

And then lastly, like the experience in the AI research lab, I think it really trained me a lot in just like the general way of thinking about how to develop with AI. In particular, one of the principles that I took away from my experience there was: always do the dumb and simple thing first, always establish a baseline in your experiments. Don’t go chase the like shiny new thing that has, you know, the hype around it necessarily. Always be grounded in the way that you approach the problem.

And so that really informed how we built Cody. You know, if you rewind to November 2022, when, you know, ChatGPT first landed and there was this like huge hype wave, I think if you go back and like, look at the discourse then, a lot of people felt that, like, you know, training foundation models from scratch was the only way you could create serious differentiation in the space. Or at the very least you wanted to invest in fine tuning, like something involving training, because training models is cool, right? Like everyone wants their own like branded model. And, you know, we looked at the problem kind of differently. You know, we were not out to raise like a huge, you know, hype round from VCs. We were really focused on serving the needs of our existing enterprise customers. And when it came down to brass tacks and making this stuff actually useful day to day, what we found was the quickest path forward to value, and also the quickest thing to iterate on, was building a retrieval augmented generation engine. So a RAG engine that took advantage of our existing Code Search and our code graph, plugging those in as context fetchers into the LLM context window.

And I think these days, if you ask the people working at the frontier, this is like a common opinion. It’s like, yeah, RAG is the way to go because, you know, LLMs these days are pretty good at in-context learning. And there’s a bunch of nuances around, you know, combining RAG with LLMs where they actually outperform models that have been fine tuned or in some cases like trained from scratch. And I think like everyone else has kind of realized this through a lot of trial and error. For us, it was just like those quick early experiments that guided our path forward. And we just didn’t, we had no real inclination to chase the hype. We just wanted to build something that was useful. And so we started with the simple dumb thing, which is RAG. And now that has become kind of like the industry standard for applying AI to production code.
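
The RAG flow Beyang describes can be sketched roughly like this. This is an editorial illustration only, with hypothetical function and object names (`retriever`, `llm`, `answer_with_rag`), not Sourcegraph's actual API:

```python
# Minimal sketch of retrieval augmented generation (RAG) over a codebase:
# fetch relevant snippets first, then hand them to the LLM as context.
# The retriever and LLM client here are hypothetical stand-ins.

def answer_with_rag(question, retriever, llm, k=5):
    """Fetch relevant code snippets, then ask the LLM with them in context."""
    snippets = retriever.search(question, top_k=k)  # e.g. code search results
    context = "\n\n".join(
        f"File: {s['path']}\n{s['text']}" for s in snippets
    )
    prompt = (
        "You are answering a question about this codebase.\n"
        f"Relevant snippets:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm.complete(prompt)
```

The point of the design is that the generation model stays generic: all codebase-specific knowledge arrives through retrieval rather than through fine-tuning.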

Beau Hamilton (15:01)
Interesting. Yeah. I’m glad you mentioned the RAG, because I think that’s really fascinating, just the model you have with that API. But first I kind of want to talk about some of the competition. You know, I know there’s a bunch of competing AI coding assistants. There’s one that comes to mind, GitHub Copilot. There’s Tabnine. I know Amazon CodeWhisperer is gaining some traction. You mentioned also Google’s Code Search. How have you positioned Cody as the de facto leading AI coding assistant in this space?

Beyang Liu (15:33)
Yeah, so Cody is really the only coding assistant that does two things. One is we’re pushing the frontiers of innovation. So we were the first to release code base aware chat and code generation. And we have the most advanced context engine today. So in terms of like context quality, if you go in and you use the sort of like code base aware retrieval mechanisms of any other assistant, they are far inferior to ours. And we found that it just like matters a lot in the context of an existing code base, because just like you as a developer can’t be effective if you can’t actually go and read the code and understand the code effectively. Imagine I told you to go build a feature inside your existing work code base here, your production code base, but you’re not actually allowed to read the code. You can only search Stack Overflow and read through open source code. That’s effectively what you’re doing when you talk to an LLM and ask it to generate code without really good context fetching abilities. And so we’ve been pushing the frontier there, and Cody is far and away, like, the best solution in the market.

And at the same time, we’re the only one that’s pushing the frontier that has also created the ability to serve very large enterprises. So, you know, we’ve been serving large enterprises, Fortune 500 companies, for a good chunk, probably like eight of the ten years that we’ve been building this company. And that really comes from, you know, Quinn’s and my background at Palantir. We were there early on in our careers, working with enterprise customers and developers working in these very large code bases. And we took from that experience the pain points and challenges that those devs feel. And so we were very enterprise first in the way that we delivered Sourcegraph. Our first customer was actually Twitter, which, at that point, had probably several thousand developers, and we deployed to the entire organization. And I don’t think we would have been able to do that if we hadn’t had experience working with large development organizations from the get-go.

And so these days, what we’re essentially able to do is we can take the latest innovations at the model layer. We can take all the improvements that we’ve made in terms of context quality that make, you know, Sourcegraph the best Code Search engine in the world. And we basically funnel those through to Fortune 500 companies, large enterprises, even government agencies at this point. And that innovation funnel, you know, the ability to take basically bleeding edge technology and put it in a form that satisfies all of the security and compliance and scale requirements of these large institutions: we are the only company and we’re the only tool that’s doing that effectively right now.

Beau Hamilton (18:23)
Okay, interesting. So building on that last question, you know, I mentioned the competition. I want to point out for the listeners, you know, many of these assistants, they’re using LLMs, large language models with their own capabilities and limitations. I mean, that’s the most simplified way to say it. You know, some excel by supporting a wide range of programming languages, but struggle with maintaining full context, I would say in longer and more complex files.

Some produce bogus code. Some are more ethical than others. I think the ethics around data scraping is becoming more important for users. When I was researching Cody, I discovered something I never knew about, and it sounds like a real game changer. It’s called being model agnostic, which is when a tool isn’t tied to any specific model or algorithm and can switch between LLMs as needed. And that way you kind of get the best out of each LLM, essentially.

So why did you decide to make this or go this route with Cody and what are all the different ways developers gain from this approach?

Beyang Liu (19:22)
Yeah, so that’s a really good call out, and thanks for doing your homework there. So we made Cody model agnostic from the get-go because of really two things. One was just looking out for our users. We wanted to give our users freedom of choice to use the very best frontier models. And at the time we created Cody, there was kind of like this one player that was very much in the lead.

But already we were starting to see signs where some other models were good or better in specific use cases. And so we didn’t want to make a choice for users there. We felt that it was going to be a competitive race ahead of time. And we wanted our users to be part of the learning and exploration process. It’s kind of like you don’t want to pick the horse for your users. You want to bring your users into experiencing the latest and greatest and being able to choose the LLM that’s best for their use case.

And that kind of rolls into the second reason, which is we didn’t feel confident that there would be one clear winner. We thought it would be a competitive race. And I think now, here in 2024, it absolutely is. There are at least three players in the market that are all pushing the frontiers. And we’re already seeing a ton of specialization. Some people do the long context thing better. Some people do better on reasoning. Some people do better on certain types of code generation tasks. And so we don’t want to be too opinionated about choosing the best model for the task. And this is especially true inside enterprises, because every enterprise code base is different. And there’s a long tail of customizations that you may want to build.

And that sort of fits within our general philosophy. Like if you look at what system Sourcegraph plays nicely with, it plays nicely with every code host. It plays nicely with every editor, every language, and now every major model provider. So freedom of choice, that’s a big principle for us as a company. And it totally made sense when it came to model selection in Cody.
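
The model agnosticism Beyang describes usually comes down to coding against one interface and adapting each backend to it. A minimal sketch, with hypothetical class and provider names (the real implementation would call each vendor's API):

```python
# Sketch of a model-agnostic design: the assistant only depends on
# ChatModel, and each provider adapts a different backend to it.
from abc import ABC, abstractmethod

class ChatModel(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class AnthropicProvider(ChatModel):
    def complete(self, prompt: str) -> str:
        # Would call the Anthropic API here; stubbed for illustration.
        return f"[claude] {prompt}"

class OpenAIProvider(ChatModel):
    def complete(self, prompt: str) -> str:
        # Would call the OpenAI API here; stubbed for illustration.
        return f"[gpt] {prompt}"

PROVIDERS = {"claude": AnthropicProvider, "gpt": OpenAIProvider}

def get_model(name: str) -> ChatModel:
    """Swap models without touching the rest of the assistant."""
    return PROVIDERS[name]()
```

With this shape, adding a new frontier model is a matter of registering one more adapter, which is what lets users pick the best model per task.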

Beau Hamilton (21:33)
Yeah, it really just seems like the best of both worlds, you know, having that option. And so you mentioned context in your answer. You mentioned it early on too. Could you kind of just explain more about, you know, Cody’s approach to context and what does the context mean in this case and how do you, I guess, build it into Cody?

Beyang Liu (21:52)
Yeah, so I think the best way to explain this is think about what you do as a human developer. Like if I hand you an issue, like a bug to fix or a feature to build, and then I plop you into the middle of a very large existing code base, you don’t immediately start coding. You don’t even know immediately where to start coding. A bunch of your pre-work, probably the thing that takes more time than the actual coding, actually, is searching through that code base, finding different breadcrumbs, walking the reference graph, you know, clicking go to definition or find references a bunch of times, and then paging all of this into working state. Like you’re building a mental model of the code by reading through a lot of it. And you’re trying to page as much of that context into essentially like the L1 cache of your brain, so to speak. So like your working state. So it’s all kind of like readily available. And once you’ve done that, then it becomes much clearer: Ok, I need to modify this file or this class. Here’s the thing that I want to add. It’s got to play nicely with this other API. And so that’s the journey that you take as a human.

LLMs have to take a similar journey. You can’t just ask the naked underlying LLM to write something that’s useful in the context of that code and expect it to work. If you do that, it will generate something that is at best taken from the median of open source code or Stack Overflow, but doesn’t make sense in your code base. Or worse, if you tell it to use your internal APIs, it will just hallucinate APIs that fit your written description but don’t actually map to how those APIs behave in practice. And so really, you have to give LLMs the same capability to read through and search through relevant snippets of code and pull the most relevant snippets into the context window. The context window for the LLM is really the analog of kind of like the working state of the human brain.

And so like the way we do that isn’t exactly the same way as a human does it, but there’s a lot of parallels, right? Like there’s a searching capability so that we can find these like needles in a haystack. There’s also like a code graph capability to allow us to, like, pull references and things like that. And then we’ve also built these kinds of integrations for all these other sorts of context sources, like the long tail of other tools that you might use, whether it’s like Jira or Confluence or your issue tracker or your Google Docs or whatnot. We’ve built integrations to pull context from those as well, because again, think about it as a human: you might pop into some of those tools to gather the necessary context to come up with a solution to your current issue.

So all that stuff we plug in to this kind of context engine. And then the context engine is responsible for taking the kind of proposed relevant snippets from each context source, sorting through them, and pulling the most relevant pieces into whatever space you have available in the context window. Because each LLM has a different context window length that’s available. So our engine is designed to make best use of that space, while still preserving enough of that space for the user to actually describe what they’re doing and for the LLM to generate its response.
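
The packing step Beyang describes, taking scored candidate snippets from several sources and fitting the best ones into a limited context window while reserving room for the user's message and the reply, can be sketched as a simple greedy budget fill. This is an editorial illustration with a crude word count standing in for a real tokenizer, not Sourcegraph's actual engine:

```python
# Sketch of context packing: greedily keep the highest-scoring snippets
# that fit the window, after reserving space for the user's prompt and
# the model's response. Token cost here is a naive word count.

def pack_context(snippets, window_tokens, reserved_tokens):
    """Pick the highest-scoring snippets that fit the remaining budget."""
    budget = window_tokens - reserved_tokens
    chosen = []
    for snip in sorted(snippets, key=lambda s: s["score"], reverse=True):
        cost = len(snip["text"].split())  # stand-in for a real tokenizer
        if cost <= budget:
            chosen.append(snip)
            budget -= cost
    return chosen
```

Because each LLM exposes a different window length, the same ranked candidates can produce different packings per model, which is one reason the ranking and the packing are kept as separate steps.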

Beau Hamilton (25:13)
Interesting. Ok, huh, that’s such a, I appreciate that breakdown, that technical breakdown there, because it seems pretty clear that, yeah, context is huge. I mean, if you don’t have the context awareness, I mean, you can’t even really get started on your project.

Beyang Liu (25:30)
Yeah, exactly. It’s like garbage in, garbage out. If you don’t have context, or you have poor context, then the generated code or the answer that you get from the LLM is going to be very low quality.

Beau Hamilton (25:42)
Yeah, that’s fascinating. I think you really have the right approach here and it makes a lot of sense. Now, I want to switch gears a little bit, and focus on the business side of things. How does Sourcegraph scale and adapt as organizations grow and scale their own code bases?

Beyang Liu (25:56)
Yeah, so this is something that we had to figure out very early on in the company’s history, because like I said, our first customer was Twitter and they deployed to like all their developers. And so even before that, actually, we were thinking towards scale, because the challenge that we were really trying to solve was: how do you make developers effective in large code bases? So we thought the easiest way to tackle that was just to index as much of the open source world as possible initially. So even before we had our first customer, we were indexing a good chunk of the open source code that exists in the world.

And so these days, Sourcegraph scales really nicely, actually. We’ve built it from the ground up to be horizontally scalable. So there’s kind of like a multi-service architecture. I hesitate to use the term microservices because I think that’s a bit overused. But there’s definitely like different functionality split out to different services so that they can be horizontally scaled. Like, there’s this search indexing process. There’s like an in-memory searcher. Those can scale independently of the kind of like API server, which is more of like a lightweight wrapper.

And all these architectural decisions kind of evolved so that we could kind of like make most efficient use of the resources that we have in any given environment while still scaling out the components that needed to be sort of like massively parallelized in the context of many users working on a very large code base.

Beau Hamilton (27:23)
Okay. So let’s go like a little bit more macro even. You know, you’re clearly a really smart guy. You’ve got a lot of vision. What are some future trends that we should look out for in the software development industry? You know, AI is perhaps the most obvious answer, but that’s already in the here and now. What’s next on the horizon, and how is Sourcegraph kind of preparing for that?

Beyang Liu (27:44)
Yeah. So I think long-term we see a path to more and more automation in the software development life cycle. And I think the analogy we like to draw is with something like self-driving cars where, first of all, the vision is something that seems amazing. It’s like, you just don’t have to worry about driving anywhere. Transportation is fully automated. You just get in the vehicle and it takes you where you want to go. I think the analog for that in software is you envision the app, you can describe it, and then a human has to do very little work to make that software a reality and get it into production.

And I think in our vision it’s going to be a long journey on the path to full automation. So for now, what we’re focused on is really making AI effective for everyday use by professional developers. And we think that the professional developer is going to remain a key role, because it’s really the professional developer who has the expertise to guide the AI in a way that makes it useful. AI is a tool that a professional dev can wield at this point to amplify what they’re able to do and what they’re able to build.

And I think the path forward here is: how can we target more and more of the toil that developers experience? Because a developer’s job right now is probably 99% toil in some way, shape, or form. It’s only the 1% that is truly creative spark. That’s the fun part. That’s what gets us out of bed. That’s probably the reason why we became developers in the first place: the thrill of creation. But the day-to-day of professional software engineering is filled up with all these rote tasks, like acquiring context from the codebase, or writing a function that’s probably been written somewhere else before, or is very similar to something else that’s been written before: boilerplate generation, expanding your unit test suite. All these things are candidates for automation. And so that’s our vision forward. We want to eventually automate 99% of what you do as a professional dev.

But here’s where I would draw a distinction from the self-driving car analogy. With things like cars, if the car can get me from point A to B, what role do I have left as a human? None, really. I just get in the car as a passenger, I’m just trying to get to my destination, and then I get out. I think software is a much richer field than transportation at the end of the day. The guiding force of software is not as simple as just getting from point A to point B; it’s figuring out what B looks like. What is the destination you actually want to arrive at? What’s the best thing for your users? What’s the best way to combine the different pieces of technology that you have available to you? And so there’s always going to be this creative aspect of software development that’s going to persist. What’s happening now is that creative aspect has been squeezed into this 1% of your day where you actually have time to get around to doing that stuff. And what we want to do is expand that 1% to be, you know, 90% of your job. So essentially make professional software engineering much more enjoyable day to day. And as part of that, we’ll also make it much quicker to build things and much more efficient to ship things into production.

Beau Hamilton (31:21)
Interesting. Yeah, the self-driving car analogy is super relevant. I mean, Tesla and Elon sure think that’s the future. I think it is; it’s just a matter of the timeline, right? He’s kind of famous for being optimistic with timelines, but at the end of the day, it’s automation, and it all kind of boils down to code and codebases.

So I’m curious to see where it goes. But could you talk more about where you see the future developer, like, how is that role shifting with this automation? I mean, you touched on it, but if everything’s automated, what are the main roles developers are going to be concerned with?

Beyang Liu (32:07)
Yeah, so I would answer this question with basically a question to developers: think about how you spend your time today. How much of your time do you spend on writing unit tests, on reading through existing code, on pulling context from various sources, on mechanical tasks that seem boring and tedious but you have to get through, on context acquisition, like pinging folks on your team for the context that they have, or fielding questions from your teammates about the code? So add all those things up, and then the next question is: how much time do you actually spend thinking about data structures and algorithms, mapping that to user needs, and then actually writing the code that implements that and delivers the value to the end user?

And I think for 99% of developers, the stuff on my left hand here, the toilsome stuff, far outweighs the fun part of the job, the actual creating and building stuff that’s valuable to end users. And so what we want to do is expand the creative, fun part of the job as much as possible. So Sourcegraph and Cody will change what it means to be a software developer. They will automate the vast majority of the stuff that takes up time today. So in some sense, we’re automating the job of a professional developer, but I don’t view that as a bad thing. I think it’s actually getting the reality of software engineering back to the core essence of what we all want software creation, software engineering, to be, which is that thrill of creation, that thing that got us into development in the first place: being able to dream up an idea, write some code, hit enter, have it run, and have it do the thing that you wanted it to do.

So no more time wasted on boilerplate or these menial, mechanical tasks. It’s just thinking creatively about what you want to build and then going and building it. That’s the future that we want to bring into reality for software engineering. And that’s going to require tools that are able to automate as much of this as possible, and also tools that can effectively fetch context from massive, massive codebases. Because if you think there’s a ton of code in the world right now, just you wait. We are on day one of the AI revolution, and this is going to expand the amount of source code that exists in the world by orders of magnitude.

And if you go to a senior engineer and say, “I’m going to expand the size of your codebase by orders of magnitude,” I think the reaction you get is not excitement but fear: holy crap, that’s going to slow us down. But what we want to do is enable the code to expand rapidly, because a lot of that code is going to be powering new user experiences and new applications, and at the same time make all that code tractable, make it easy to work with, and make it so that you can continue to build cool new things on top of all that existing code without having to deal with all the boilerplate and the tedium that crowds out your day now.

Beau Hamilton (35:29)
Yeah. Listeners, if you’re tired of artificial intelligence and AI and all the marketing around it, just you wait; we’re only on day one. It’s only going to become increasingly intertwined with our daily lives. And it makes me think of the legacy codebases you mentioned that you have to work with. There are just so many legacy businesses out there that still have to be updated and automated.

Now, are there any specific new developments or features Sourcegraph is cooking up that you’re particularly excited about and want to share with us?

Beyang Liu (36:06)
Yeah, so we’re moving fast. We’ve shipped a lot of features in the past month or so. There are actually a number of things getting a lot of excitement. The first that comes to mind is this thing we call the prompt library. Essentially, what we found is that a lot of people who are new to AI code generation aren’t actually sure how to prompt LLMs. Sometimes they ask questions like, “count the number of files in this repository,” and LLMs aren’t really great at counting yet; they’d have to be able to use external tools to execute that. And so we built this prompt library feature into Cody that allows people who are more experienced with AI, or who are exploring which prompts are effective, to produce those prompts and share them with the rest of the team.

So we’ve had people create prompts that are specifically targeted toward unit test generation, or writing docs, or explaining code in detail, or explaining code succinctly, and just a long tail of different tasks and sources of toil that they hope to automate with large language models and AI. And so now those users are able to not only make those things for themselves, but also share them with others and receive ideas from others as well. And this is something that’s gaining quite a lot of traction.

Another element that we’re really excited about is this protocol called OpenCtx. So I touched upon this before, but it’s kind of like an LSP for context, if listeners are familiar with that. It’s this narrow-waist protocol that is meant to be a common layer on top of any source of technical context, whether you need to pull context from your issue tracker, or your production logs, or even your corporate chat. The challenge with integrating all those context sources into your developer tool, into your editor these days, is that each tool requires a distinct integration. What we’ve done is create a protocol that is a common layer on top of all those tools, all those context sources. So you build an integration to the protocol, and then automatically that context source is available to any consumer of that protocol as well.

And we already have a number of consumers of that protocol. You know, Cody itself; Sourcegraph, the web UI that allows you to navigate the code. And then our customers are also building a bunch of clients against that API that can take advantage of this long tail of context that you have inside your organization.
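To illustrate the “common layer” idea, here is a hypothetical sketch in the spirit of what Beyang describes: each context source implements one small provider interface, and any consumer (an editor, a chat assistant, a web UI) can aggregate them through it. The `ContextProvider` interface and all names below are invented for illustration; the real OpenCtx protocol is richer than this.

```typescript
// A context item returned by any source: a title plus some content.
interface ContextItem {
  title: string;
  content: string;
}

// The narrow-waist interface: every source implements this one shape.
interface ContextProvider {
  name: string;
  // Return context items relevant to a query, e.g. a file or symbol name.
  fetch(query: string): ContextItem[];
}

// One integration per source...
const issueTracker: ContextProvider = {
  name: "issues",
  fetch: (q) => [{ title: `Issue mentioning ${q}`, content: "..." }],
};

const prodLogs: ContextProvider = {
  name: "logs",
  fetch: (q) => [{ title: `Log lines for ${q}`, content: "..." }],
};

// ...and any consumer aggregates every source through the same interface,
// so adding a new source requires no changes on the consumer side.
function gatherContext(
  providers: ContextProvider[],
  query: string
): ContextItem[] {
  return providers.flatMap((p) => p.fetch(query));
}

console.log(gatherContext([issueTracker, prodLogs], "auth"));
```

The payoff is the N+M property Beyang mentions: N sources and M consumers need N+M integrations against the protocol rather than N×M pairwise ones.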

Beau Hamilton (38:39)
Yeah, are these features already out now, or are they coming soon?

Beyang Liu (38:46)
These are out now.

Beau Hamilton (38:48)
Out now. Okay. Do you have a quarterly release cycle or is it kind of just like as these features are done and ready, roll them out?

Beyang Liu (38:56)
We cut a new release on a regular basis. Actually, if you’re on the pre-release build of Cody, it’s daily. So if you’re on the nightly release, you’ll get updates as they come. For those who want the more stable release, it’s on a bi-weekly cadence. So the stable release gets more testing; it’s just two weeks behind the latest and greatest.

Beau Hamilton (39:23)
Wow, okay, so yeah, you guys are cranking out updates. I love that. Well, coming down to the last couple of questions for you, really the last meaty question I have for you, and I think it’s a good one because it kind of gives you the opportunity to let loose and tell us how you really feel. Listeners, I think you can sort of think of this as like a little reward for sticking through to the end of this episode.

So my question, Beyang, is do you have any industry hot takes? And if so, what are they and what will their impact be?

Beyang Liu (39:55)
Yeah. It’s a funny question to ask, but most of my hot takes fall largely under the umbrella of distinguishing between AI hype and what’s actually helpful in AI. I just think that there’s a ton of hype right now. And this may seem like a strange thing for the CTO of an AI-forward developer tools company to say.

But I actually think that most tools in the market right now are overselling the value proposition. It’s all too easy to create a snazzy demo of something that seems very impressive. Like: I’ve automated the entire SDLC. You no longer need developers because this thing just writes the code for you. Throw out your CS degree because that’s a thing of the past.

I think all those companies that are pushing these solutions are kind of overselling. And to me, it’s sometimes unclear what audience they’re trying to speak to: whether they’re actually speaking to professional devs who’ve got to solve real problems in production codebases day to day, or whether they’re trying to speak to some sort of investor class that just wants to ride the current hype wave.

And so that is essentially my spicy take: the most useful applications of AI are giving it the right context and making it applicable in the context of working production codebases. Because there’s a lot of code in the world, and most of it is actually private, inside organizations. And that percentage only grows if you talk about software that’s actually backing live applications, serving millions and billions of users, or supporting trillions of dollars of revenue. All that is in production, quote-unquote “legacy” code. It’s really just existing code that you have to sift through.

And I think the right focus that we’ve chosen is really to make professional developers effective in those codebases, starting with the basic building blocks of choosing the right model and choosing the right context. That’s the thing that has the most immediate impact on actual productivity today. And from there, we’re going to build our way up. So we’ll gradually build, first from the inner-loop use cases where the human is actively directing what the LLM is doing. The next phase is to add some very minor agentic feedback and tool use into the picture. But for the folks that are trying to sell you this dream that somehow we’re going to leapfrog this whole journey, that we’re going to jump straight from the DARPA Grand Challenge to robotaxis in the next 18 months, my prediction is that that will not materialize. I think all that stuff is just overhyped at the moment.

Beau Hamilton (43:10)
Yeah, I can totally see that. It makes me think of the dot-com bubble. Do you think we’re in a bubble right now? I feel like it’s hard to deny that we are, because there are so many similarities and parallels with the dot-com bubble, when the internet was really taking off and all these companies were starting up and promising all these different things. You can order pet food, you can order pizzas online, you can essentially have anybody fix an issue with your home, like TaskRabbit. And granted, those things ultimately did come true and they’re commonplace now, but it took a lot longer. So do you see a similar kind of situation happening?

Beyang Liu (43:56)
You know, if I could predict the stock market, I’d be a hyper-rich man, so I don’t know how the macro environment will evolve. What I can say, just from where I stand, is that in terms of the customers we’re talking to, we’re talking to a lot of the largest enterprises and organizations in the world. A lot of folks there have already been burned a little bit by overhyped things. And so a lot of times when we engage with customers, it’s because they’ve already chatted with a couple of other players that were selling a grand vision or a dream that just failed to materialize.

And so, yeah, I do think that there’s a lot of hype in the system. I think a lot of companies will quickly learn that there’s a lot more to building a compelling user experience that’s robust and effective for day-to-day use than building a snazzy demo. And so I think there will be a correction. I think there are a lot of companies that just haven’t thought through the hard challenge of connecting that initial demonstration of potential to something that actually gets strong day-to-day use.

Beau Hamilton (45:07)
Yeah, I guess the question is always when, and that’s kind of impossible to tell. But the human nature of hyping things up and speaking in hyperbole is natural; it’s also kind of our not-so-great side. It gets us in trouble. You can look at politics and everything around that.

Beyang Liu (45:31)
I think the tragedy, or maybe not the tragedy, is that timing is everything. And I think there are actually a lot of visions that are directionally right; it’s just a matter of how quickly we’re going to get there. If you go back and look at Webvan, there are comps to Webvan that have successfully executed on that vision, that exist today and are very large companies. But Webvan was just too early. They oversold too much and they ran out of runway before they could actually deliver on that eventual vision.

I think we share a lot of the same vision of automating as much of the toil out of software development as possible. But our approach is really to work with our customers, figure out what we can deliver today and tomorrow, and iterate with them in lockstep, such that along the way we deliver value. So it’s not this all-or-nothing dream that forever recedes into the future.

Beau Hamilton (46:32)
Yeah, I think that’s a good approach. Well, you guys heard it here first. On that note, I want to ask you, where can developers learn more about Sourcegraph and how can they get in touch with you and your team?

Beyang Liu (46:43)
Yeah, so if you want to learn about Cody, you can go to cody.dev. That’s the landing page for Cody. It’ll tell you how to install any of our free editor extensions. And if you want to learn more about Sourcegraph, you can go to sourcegraph.com. If you want to get in touch with me, I’m on X at x.com/beyang. I respond to DMs. Also feel free to email me or reach out to hi@sourcegraph.com. That’s an all-purpose email handle that we maintain. I look forward to hearing from you.

Beau Hamilton (47:17)
Cool. Yeah. Well, listeners, you’ve got some homework and some links to click. So go check them out, go follow.

Well, thank you so much for, for taking the time out of your busy day to sit down and talk with us about Sourcegraph. Appreciate it.

Beyang Liu (47:31)
Cool. Thanks so much for having me, Beau. This is a great conversation.

Beau Hamilton (47:35)
Well, if you enjoyed this episode, we’ll have another episode with Sourcegraph coming up very shortly, talking with the head of programming and one of the masterminds behind Cody, Steve Yegge. So stay tuned for that. With that said, thank you all for listening to the SourceForge Podcast. I’m your host, Beau Hamilton. Make sure to subscribe to stay up to date with all of our upcoming B2B software-related podcasts. I’ll talk to you in the next one.