Interviewing Software Engineers in the Age of AI
an open letter to anyone hiring software engineers now or in the near future
At the beginning of the year, one of my optimistic, if perhaps more unrealistic, predictions was:
At least one, but certainly not the majority of major tech companies dramatically changes hiring methodology in light of AI
I’d be happy to be wrong about this, but if it’s happening, I certainly haven’t heard anything even plausibly in this direction. I know we’re only about halfway through the year, but it seemed like a good time to start checking in on some of these predictions. I’m also moved by accounts like this: there but for fortune go you or I.
I’m not huge into credentials and mostly think that arguments should speak for themselves, but I’ll start by giving a bit of background on my experience as both an interviewer and interviewee in tech, and then we’ll get into existing problems which are largely well known. After that, we can break a bit of new ground by talking about how AI dramatically exacerbates these existing problems and then propose a new path forward. I recommend skipping ahead for anyone for whom this is well-trod ground.
Also, my hope is that while this is highly specific to tech hiring, it generalizes to other fields in terms of the impact of AI, but I don’t think I’m the person to make those connections. Consider this a disclaimer that this post is a bit more specific than a lot of the writing I’ve been doing, so it’s easier to opt out sooner.
My primary thesis is that traditional software engineering interviewing practices were already deeply strained, but AI is a multi-ton whale that breaks them and completely lays bare their inadequacy.
Interviewer
I’m not one of those folks who worked at Amazon for a decade, became a “bar raiser” and literally interviewed on the order of 1,000 people (if you are interested in hearing from someone like that check out A Life Engineered — best software engineer career advice I’ve come across). But I have interviewed on the order of 100 people and taken the task quite seriously. The entire trajectories of people’s lives can be influenced by the jobs they get. It’s a big responsibility — to those interviewing, to your team and to the business.
And it’s hard. It’s commonly underestimated how hard it is to be good at interviewing, and it’s borderline bizarre to assume that a typical software engineer with a few hours of training and a half dozen shadow interviews will be competent in this area. We don’t select for this skill at all in the hiring process, so why would engineers be good at this? It’s almost an entirely disparate skill set.
Some things I’m proud of having gotten good at:
helping nervous, shy or introverted candidates feel more comfortable verbalizing their thought processes thereby giving them a substantially better chance at success
deciding when personality traits like narcissism, combativeness or condescension outweigh the benefits of strong technical performance
guiding candidates to their strengths — everyone has something to offer and it’s up to an interviewer to find out what it is and if it’s a match for what’s needed
having the strength of conviction in my own judgment to advocate for candidates who are under-rated by others (or against when they are over-rated by others)
developing strong opinions about interview question design
This was a long road — I started off pretty bad at all those things. I’m grateful to all my mentors along the way. And I’m sorry to the candidates where I got it wrong; I did try.
Interviewee
Pre-2023 my experience was pretty limited here. My path was pretty much warm lead after warm lead by following colleagues from one startup to the next. This diverged slightly in 2017 when I interviewed at Twitter without knowing anyone I’d worked with before. I’d had very limited traditional big tech interviewing experience, was pretty over-confident and mostly I think got a bit lucky to have passed.
Then 2023 rolled around. I’d been on sabbatical for a few months post voluntary severance at Twitter and I started prepping for interviews. I spent about 20 hours a week for months primarily doing interview prep — with a bit of job searching and applying thrown in.
For system design I’d whiteboard a couple problems from these books (1, 2) per week and then compare my solutions to those offered in the books. I also watched all the videos on this youtube channel (it’s only six videos but disproportionately useful compared to most content available in this vein at the time). In addition to getting better at interviewing, I did learn some useful things from this process.
But the rest of the time was primarily grinding leetcode questions. I absolutely hit my peak coding fluency and do not expect to ever be able to return to those levels again.
This did not make me a better software engineer. Not even a little bit. Was I able to code from memory arbitrary problems featuring themes like linked lists, depth first and breadth first search and graph traversals? Sure. Could I estimate the big-O runtime of these solutions? Yep. Did I also know that if I got a dynamic programming problem that I would almost certainly flail around and fail the entire loop? Most likely.
It was a challenging time in the industry to find a job. The response rate to my cold applications was around 2%, but with referrals it was closer to 50%. I don’t remember the stats on my phone screen pass rate to full loop and then the full loop pass rate. But I got one offer and I took it. I’d spent roughly 250 hours doing interview prep basically as a part time job. It is a deeply disappointing use of my human capital to invest so heavily in a purely instrumental skill set.
And while you might not think it given those stats, I am a good engineer. Not the best one that people will ever work with, but solidly above replacement level for most teams. I don’t have a good sense of where I fall on the distribution at the skill of being interviewed — I think it’s roughly comparable, but probably lower despite all this investment.
A representative anecdote:
Select the kth element from an unsorted array
I’d never solved this problem before, but I was able to come up with a heap based solution that runs in O(n log k) which is a substantial improvement for most values of k over the naive sort first solution which is O(n log n). I felt pretty good about this solution.
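For the curious, here’s a minimal sketch of the kind of heap-based solution I mean, in Python and assuming “kth element” means kth smallest (the function name and example values are mine, not from the actual interview):

```python
import heapq

def kth_smallest(nums: list[int], k: int) -> int:
    """Return the kth smallest element (1-indexed) of an unsorted list.

    Keeps a max-heap of the k smallest values seen so far, giving O(n log k)
    instead of the O(n log n) of sorting everything. Python's heapq is a
    min-heap, so values are negated to simulate a max-heap.
    """
    if not 1 <= k <= len(nums):
        raise ValueError("k is out of range")
    heap: list[int] = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        elif -heap[0] > x:  # x is smaller than the current kth smallest
            heapq.heapreplace(heap, -x)
    return -heap[0]

print(kth_smallest([7, 2, 9, 4, 1, 8], k=3))  # 4
```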
But what the interviewers wanted was an implementation of quickselect (Hoare’s algorithm), which is expected to run in O(n). In the remaining 5 minutes of the phone screen, I could not re-derive a challenging approach that was literally published in a journal. I did not move on to the full interviewing loop.
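For contrast, here’s roughly the shape of what they were after: a quickselect sketch with expected O(n) runtime. This version uses a random pivot and builds new lists for readability rather than doing Hoare’s in-place partitioning, so treat it as an illustration, not the canonical algorithm.

```python
import random

def quickselect(nums: list[int], k: int) -> int:
    """Return the kth smallest element (1-indexed) in expected O(n) time.

    Partition around a random pivot, then recurse only into the side
    that must contain the answer, discarding the rest.
    """
    assert 1 <= k <= len(nums)
    pivot = random.choice(nums)
    smaller = [x for x in nums if x < pivot]
    equal = [x for x in nums if x == pivot]
    larger = [x for x in nums if x > pivot]
    if k <= len(smaller):
        return quickselect(smaller, k)
    if k <= len(smaller) + len(equal):
        return pivot
    return quickselect(larger, k - len(smaller) - len(equal))

print(quickselect([7, 2, 9, 4, 1, 8], k=3))  # 4
```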
We’ll revisit this theme in more detail later, but here is o3’s detailed response (at least my solution made the list).
Existing Problems
Not only have I never needed to implement quickselect or quicksort in a work context — it would be professionally negligent. Libraries exist with these approaches that have been battle-tested by use in millions of projects. They work, they are reliable, and they are unlikely to have a mistake. And they are so common that they are probably part of the standard library in most languages and don’t even incur the overhead of a dependency to import.
Such a task is far removed from the day to day work of a software engineer. This is well-known and I think pretty uncontroversial.
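To make that concrete, here is what the “kth element” task collapses to when you lean on a battle-tested standard library (Python shown here; other ecosystems have equivalents, like C++’s std::nth_element):

```python
import heapq

nums = [7, 2, 9, 4, 1, 8]
k = 3

# What you'd actually write at work: let the standard library do it.
kth_smallest = heapq.nsmallest(k, nums)[-1]  # O(n log k)

# Or, when the data is small and clarity wins:
kth_smallest_simple = sorted(nums)[k - 1]    # O(n log n)
```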
What the job actually is (non-exhaustive, but representative):
Taking ambiguous requirements and turning them into solutions and products
Communicating project status clearly and regularly to stakeholders
Writing and reviewing design documents
Reviewing code
Routine coding tasks like adding endpoints, updating state in a database, adding metrics or alerting, creating a new data pipeline
Mentoring other engineers / being mentored by others
Improving the operational characteristics of a system
Extending unit or end to end test coverage
Responding to incidents, outages, failures or bugs by triangulating logs, metrics and code
Deciding what not to build or at least asking questions in this vein
Updating, improving or extending code that someone else has written — maybe years ago with a very different mental model, constraints, and motivations
Negotiating scope
Choosing an appropriate technology or language for a feature or project
Interviewing candidates
Attending various meetings
Supporting internal customers
Creating or improving runbooks and internal documentation
Gathering feedback on technical decisions that have clear tradeoffs and collectively assessing them
Sunsetting a service or project that has outlived its usefulness
Informing product managers about the technical effort to achieve a goal or other efforts to assist with planning
Feeling really dumb some days, or if not dumb, very frustrated with the inability to make progress in particular contexts
Feeling deeply satisfied at having learned something new, accomplished a big goal, or cleverly simplified a complicated thing
Almost never on this list is novel or difficult algorithm implementation. Even writing code from scratch rather than adding to existing code bases is relatively uncommon (at least at larger companies; I know this is different in early stage startups).
Difficult algorithmic questions like those sometimes asked in interviews do not represent the skills needed to do the tasks on this list. I understand that they are intended as a proxy, but such a crude proxy they are. They are likely the result of large tech companies being able to afford huge numbers of false negatives — excluding large swathes of capable engineers. If you can set an arbitrarily high bar and there continues to be a sufficient supply of people who can pass it, it’s possible to continue this system.
The one thing I’ll say as a positive of this approach is that while I’m hesitant to say it’s created a meritocracy, it is notably anti-credentialist. My background is in philosophy rather than computer science and this has never been a deterrent. I do think this is both unusual and a positive. So I’d like any proposed solution to preserve at least this characteristic (selfishly, if nothing else).
It’s important to note that of course coding proficiency helps with the job. Of course. But it’s one small decision after another every day for years. It’s consistency, diligence, thoughtfulness, and curiosity rather than brilliance, memorization, and hyper-specialized practice.
How can I simplify this? Hmmm, this passes over the data twice, but it’s more readable and the data is small, so that seems better. Is an Int here ok, or does the domain space require a Long? What are the query patterns on this data and what indexes do we need? How did this code ever work? What metrics do we need here? How should we format the error message we’re logging here? Oh that map you wrote should be a foreach since it’s just side-effecting.
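A toy illustration of that first question, with hypothetical data, just to show the flavor of the tradeoff:

```python
records = [("a", 3), ("b", 7), ("c", 5)]  # hypothetical (key, value) pairs

# Two passes over the data, but each line states exactly one intent.
total = sum(value for _, value in records)
largest = max(value for _, value in records)

# One pass, marginally faster, noticeably noisier. With small data,
# the readable version above is usually the better call.
total2, largest2 = 0, float("-inf")
for _, value in records:
    total2 += value
    largest2 = max(largest2, value)
```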
It’s mundane. It’s knowing a bunch of conventions, standards, and idioms with enough judgment to know when they should be ignored. It’s a team sport.
Exacerbated Problems
(source)
This chart shows o3 and o4-mini in the top 200 in the world in competitive coding, with an 800 point jump from o1. If the next release has a similar bump in capabilities, its Elo would be ~3500, which would put it into the top 50. And one more leap would eclipse Gennady Korotkevich (tourist on codeforces), the upcoming Garry Kasparov or Lee Sedol of competitive coding. We’re going to speed run 1996 Deep Blue Match 1 to 1997 Match 2 to unbeatable Stockfish in potentially the next 6-12 months.
This bears repeating and emphasizing. At competitive coding, the models are already much better than you and me. Shortly, they will be better than any human. And by better, it’s important to note that I don’t mean just a bit better, but roughly as much better as Magnus Carlsen is than I am at chess. Which is…a lot.
Competitive coding is commoditized. It’s going the way of the calculator and spell check.
And for context, Codeforces and other competitive coding questions are generally substantially harder than leetcode questions. So if competitive coding is solved from an AI perspective, so too is leetcode.
Why then are software companies still hiring software engineers?
Because software engineering is almost orthogonal to competitive coding. An analogy that might do some work here is that of a spelling bee champion versus the winner of a literary prize. Hopefully the above section on “what the job actually is” convinced you of this. If it didn’t, we probably have too different a world view to agree on this. But understanding the clear points of genuine disagreement on an issue is under-rated, so if that’s all that has been accomplished, that’s still valuable.
Competitive coding is to software engineering as spelling bees are to literary writing.
It’s worth calling out some of my biases here. I’m rarely, if ever, the best pure coder on a team. And it’s not an area where I’m actively investing in much because I think the diminishing returns are substantial. This was a broad trend for me even before AI began changing everything. There are many areas where thresholds matter more than maximization and I’m convinced that for the majority of software engineering jobs, this particular subset of coding ability is one of them.
A further complication though is the following:
Unlike competitive coding, the SWE-Bench Verified software engineering benchmark is intended to capture more holistically the tasks a software engineer actually needs to do. It’s still not that representative of the job, but it is directionally correct overall. More benchmarking in this spirit would be of high value.
The general picture this paints seems relatively clear to me. Competitive coding is virtually solved and rapid progress is being made on more representative software engineering tasks. Someone will come out with a great benchmark that is more and more similar to what software engineering is day to day — initially the models will be quite bad at it. Hurray, an unsaturated benchmark. But then, rapid progress will be made again.
While the main argument so far has been against competitive-coding interviews, this anticipation of substantial progress on more relevant software engineering tasks is ultimately an even more important component of the problem and deeply informs the suggestions below.
A proposed path forward
Before digging into some potential concrete suggestions here, let’s try to uncover what problem we’re trying to solve and also why it’s so difficult right now. Let’s start with some assumptions:
We’re concerned with full time software engineering jobs (not temporary contractors) where the business intends to employ them for 2+ years
Changing an interview process at the company takes at least a month
It currently takes about 90 days from posting a listing to having an engineer accept an offer
There’s a roughly 2-4 week lag between accepting an offer and starting a new position
Onboarding to the new position takes roughly a month (this is a pretty optimistic scenario for a lot of companies)
If you start changing your process right now for a position you currently need filled, that engineer will be onboarded to the new team in about 6 months. So, you want to be thinking about how this person will be helpful to the team in a window of 6-30 months from now. Given how hard it is even for highly engaged people to stay on top of the frontier capabilities of AI, we should probably add another 3-6 months of lag in understanding. If this plausibly reflects your situation, you should be projecting your current understanding of AI models almost an entire year into the future when thinking about hiring. A year is an eternity in the current AI landscape. Critically, this also only impacts the time around their start date and first project — it neglects the longer landscape of their continued years of employment.
Why does all this matter so much? It highlights what appears to me to be one of the consistently largest mistakes in thinking about the impact of AI. Most people aren’t skating to where the puck will be and they aren’t even skating to where the puck is, they’re skating to where the puck was 3 months ago (please forgive the sports analogy).
I admit that this is a conceptually difficult task, but we must try. Here are some concrete examples from my experience that illustrate the difficulty here:
I was an early adopter of in-IDE Github co-pilot chat — I thought it was great. Then I became frustrated by its lack of ability to absorb the context that was needed — frequently getting worse results than just cutting and pasting into a model directly (my expectations change so rapidly). I went from using this consistently throughout the day as part of my regular workflow to less than once a day.
A few months ago, Deep Research came out and I thought it was amazing, but with how good o3 is, I rarely use it anymore (I still think it’s great for its specialized use case).
A couple weeks ago, I started using codex-cli and think it’s great.
Two days ago, I was having dinner with a friend and pointed out that one of the limitations of codex-cli was how inefficiently it searches large text files, but that I was sure they’d fix that. Yesterday, OpenAI shipped an update that now defaults to the `codex-mini-latest` model rather than `o4-mini`, which, in conjunction with some other updates, appears to fix exactly this problem.
I do not expect to be regularly using codex-cli in 3-6 months. I just think there will be something better. I mean, OpenAI did release Codex just yesterday…
With that framing in mind, we can get to what we should be looking for in our near future software engineers. It can’t be experience with using specific tools, they will be outdated too quickly. As argued above, it shouldn’t be algorithmic wizardry. So what should it be?
As a valuer of caveats, here’s one more. I’m much more confident that we need to rethink this problem holistically and that the status quo is woefully insufficient than I am about my particular recommendations and ideas here. Ideally, this will be part of a larger conversation — what’s crucial is that we collectively have that conversation.
The problem we’re ultimately trying to solve is identifying which characteristics and skills will be consistently helpful across the ever-evolving AI landscape. What abilities are invariant to the shifting tools and capabilities?
The top of my list is actually about temperament and disposition: curiosity, engagement, growth mindset, adaptability, creativity — maybe grit and pragmatism. I believe these traits will quickly eclipse the importance of specific knowledge. A currently great engineer who is completely uninterested in adopting AI tooling will both (1) soon no longer be a great engineer and (2) be a strong no from me. We’re moving up abstraction layers at a completely unprecedented rate. It’s important not to miss the train leaving the station.
Next up is AI fluency and taste. While the model capabilities and interfaces change quickly, there does seem to be a pretty large gap in what different people are able to get out of the models. Have they spent enough time working with AI systems to know how to provide sufficient context? Or to redirect when the output isn’t what they want? Have they worked with enough different systems to understand their strengths and weaknesses? Can they turn whatever theoretical AI knowledge they have into concrete results?
Rounding this out are traditional software engineering skills and attributes that have a longer time horizon: things like system design, problem disambiguation, stakeholder communication, debugging complex distributed systems, and incident remediation.
Concrete Recommendations
Thanks for sticking with me or pragmatically skipping ahead. I’ve started an initial repo that was largely based off this o3 response to this question. It’s not good, but it’s a start — I very much welcome collaboration there. I aspire to update it with some of the more fleshed-out thinking here.
Based on the sketch above, we have the following categories: mindset, AI fluency, and higher level traditional software engineering skills. I’m hesitant to propose any weightings here because hyper local factors may dominate and they may also change rather quickly. I have mostly been making a critical assumption: the company is actively supporting efforts in adopting AI tooling. There will be strong external forcing functions to make this more and more the case. Some firms will be able to hold out longer than others, but I’ll be pretty surprised if market forces aren’t quite strong here in the medium term.
Mindset
A lot of behavioral questions already try to get at some of the key ideas here, so there is definitely some prior art. The main thing I want to advocate here is that however much you currently value this it’s just simply more important now. Part of why I wouldn’t hire an otherwise great engineer with no interest in AI tooling right now is not just the productivity of that engineer in the longer term, but also the cultural gravity that they will have. Great engineers get a lot of respect from other engineers and their voices carry a lot of weight. One bad hire in this area could slow adoption and use of AI tooling substantially for an entire team or org.
Never has growth mindset mattered more. And exactly what someone knows right now has never mattered less.
AI Fluency
Two ideas here.
One is that a take home assessment could become much more accessible. Often I’ve seen four hour windows used, which is a substantial time commitment for many folks and challenging to fit into a modern schedule. It seems entirely plausible to cut this to two hours and have similar expectations (or maybe even increased expectations). Increasing accessibility here and focusing on the output a person can produce rather than how they produce it seems like a nice win.
An equity concern: it’s critical that applicants without access to the best AI systems are able to compete on equal footing. Companies should provide API keys so that candidates can practice with these systems as needed beforehand and then also use them for the take home itself.
Take home assessments could still be greenfield projects, but the ability of AI to generate code should dramatically lower the burden on hiring engineers to produce more realistic scenarios. Things like: add these three features, fix these two bugs (with bonus points for finding one or more bugs that aren’t mentioned).
Also, if the tools have a session log (such as `~/.codex/history.json`), have candidates include that as well — or otherwise provide a user-generated log of the process. Getting insight into how someone prompts the models could be a really valuable signal (although we’ll all be terribly calibrated on this at first). Similarly, did they spend time setting up good context for the project (in a codex.md file, for example)? In order to be a bit better calibrated on this, I recently created Difficult Coworker Bench, primarily time-boxed to about two hours of effort as of this post (I might continue to iterate on it in the future).
The other idea is to do a pairing session on a similar problem as above, but with the full expectation that the candidate can use whatever AI tools they want. Seeing the realtime interaction with the tools and hearing the candidate’s thought process will be valuable. Are they completely dependent on model output? How do they evaluate the output? How efficient is their work loop? Like above, I think having engineering teams invest in non-trivial environments that more accurately represent day to day tasks will be much better than blank slate efforts like a leetcode or codeforces problem.
High Level Software Engineering
System design interviews still seem of critical importance here. Part of why is that, in addition to assessing industry experience, pragmatism, and communication, they also give great insight into how a candidate thinks about tradeoffs. Thinking in terms of tradeoffs feels like a skill that will be valuable for a long time. It also highlights things like problem disambiguation.
I’m pretty uncertain what role AI should play here. I do think a quick prompt like:
“I’m looking for a db that is highly available, affordable and favors a high number of reads to writes as an access pattern — I’d rather have a cloud native solution, but am open to hosting a great fit as needed.”
Seems fine. The signal is in how the candidate specifies the characteristics they are looking for. For example, a better prompt would have included whether they were looking for a relational db, key-value store, or document store (but AI will remind them of these distinctions…). It doesn’t matter much if they don’t happen to know all the latest cloud offerings.
And more complex usage is probably ok as well. Just as hiring teams can create more realistic code-based assessments, they could use AI to generate realistic architectures. Rather than creating a system from scratch, which happens less often, the candidate would be asked how to assess and improve the existing system. With the limitations of the current system in mind, how would they add this functionality? Can it be done entirely with the existing components? If we add a new component, what should it be?
The tradeoffs conversation could be quite robust here.
At the point where these skills are no longer useful, I’m not sure what being a “software engineer” will even mean. I do worry about a longer term future where all jobs collapse into a similar feeling of managing and orchestrating AI models to do tasks that we don’t have great insight into. But, I also think such a future will be pretty transitory, because how much longer would it be then before the models are better still at managing and orchestrating?
Code Fluency
As mentioned in the AI Fluency section, one of the upsides of the role of AI in hiring is the possibility for interviewers to much more easily create realistic scenarios to assess things like debugging. I think you could potentially fit many of the following into a one-hour assessment (a toy sketch of what the first item could look like follows the list):
debug a failing test
add a new feature
refactor some code
add more tests
maybe even review a patch against the code they’ve just spent some time working in
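Here’s a deliberately tiny, hypothetical sketch of what the “debug a failing test” item could look like; the point is that the scaffolding can be small while still resembling real work:

```python
# pagination.py: a hypothetical snippet the candidate is handed, bug included
def paginate(items: list, page: int, page_size: int) -> list:
    """Return the given 1-indexed page of items."""
    start = page * page_size  # bug: should be (page - 1) * page_size
    return items[start:start + page_size]

# test_pagination.py: the failing test the candidate starts from
def test_first_page():
    assert paginate([1, 2, 3, 4, 5], page=1, page_size=2) == [1, 2]
```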
Closing Thoughts
To reiterate, my strongest conviction is that change is needed here. I’ve tried to gesture at some concrete ideas and opinions that might make things better. I hope for us all that people on the ground close to these issues are empowered to experiment and report back what works and what doesn’t.
The best time to have made changes to hiring practices was 6-12 months ago. The second best time is now.
This isn’t the most important issue with AI and software engineering, but it does feel like a neglected topic. And hiring is the gateway to labor markets, so understanding the impact on hiring is perhaps useful to thinking about the impact on labor more broadly. In particular, the uneven impact of AI across tasks and responsibilities within a traditional role makes it challenging to understand what that role should even look like going forward. The general contention here is that the strongest pre-AI software engineers are unlikely to be the strongest post-AI software engineers. The weighting of required skill sets and characteristics is simply too different.
I’ve mostly been focused on hiring at mid-sized and large tech companies. While the timelines for hiring are shorter at startups, there is a counterbalancing, disproportionate impact of AI on smaller companies:
easier to adopt new tools and processes
less legacy code to navigate
dramatically fewer engineers so output per engineer is essential
In addition to any comments about this industry in particular, I’m also very curious to hear from those who think some of these ideas might generalize across industries. The uneven impact of AI makes this difficult to anticipate, and the more cross-industry sharing of insights the better.