In some cases, workers are also being asked to automate the parts of their jobs they enjoy most, Hinds said on the podcast, pointing to customer-service employees who enjoy building relationships but are increasingly expected to supervise AI agents instead.
"That's what gives you joy and meaning at work," she said. "That is very dangerous."
What's a 20% productivity gain if I constantly feel deflated by work that used to energize me? That's going to give back the productivity gain and more, while also decreasing my quality of life.
When I was given a semi-ultimatum (a "use AI or get fired" kind of thing) for writing code, I had a brief bout of depression/sadness. My friend, by contrast, doesn't care and says "I get paid to not work". I have gotten past it; now I'm just like, I'll do what I need to do to get paid, since unfortunately I'm in a lot of debt and need this job. I learned to code in 2013, so I like typing the code myself, but now it seems like a waste of time. I still write my own code for myself/my hardware hobby.
Where did the 20% number come from? I’d argue it’s way more than that (or variable, i.e. dependent on who’s using it/how it’s being used/what it’s being used on).
Having said that, the number, to me, doesn’t even matter. You could replace that with 200%, and it’d be just as true.
6 hours a week is low, unless it's the average spread across industries. I think I spend more time in Claude Code via the CLI than in any other app on my laptop.
Like others said, the frustration is when it gets something so wrong you just think "wow, how'd you mess that up?", but when it gets it right it's kind of nice. I also don't like that I basically tell Claude what to do and then either go do busy work or waste time on the internet.
I kind of enjoy exploring black boxes, testing how different inputs map to differences in outputs. It's kind of like hacking. The problem is, they keep altering the box.
My challenge has been trying to manage my higher-level context. I've gotten a pretty good setup where I have project-level orchestrator agents that can spin up workers to implement tasks with minimal oversight, and the resulting work is usually quite good (especially after I give it the mandatory "make the comments less verbose" refining, etc.). But that means I'm doing even more context-switching. I've gotten to the point where I have a half-dozen draft PRs that just need my review before I tag my colleagues, and trying to dig up the context from all of those tasks can be paralyzing.
This kind of reminds me of an article I saw on HN ages back: there's a subset of office workers who automated their Excel jobs and just show up to work, read books, and do literally anything else while Excel does their work for them and they collect their paycheck.
i've seen a number of articles claiming things like "devs self-report they're +x% more productive with AI, but actually they're -y% LESS efficient!", and i think this is the explanation for why.
as a boss (or researcher), i'm going to measure productivity as output per hour that i'm paying you; as a worker, i'm going to measure productivity as output relative to the effort i'm putting in.
so what may be happening is that bosses see that output is at 80% (productivity down!) but workers see that they can give that 80% output with 40% effort (productivity up!).
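A minimal sketch of the two measurements described above, assuming paid hours stay the same while effort drops. The `Week` class and both function names are hypothetical, and the 80%/40% figures are the comment's illustrative numbers, not measured data. Normalized to a pre-AI baseline, the same week scores 0.8 on the employer's metric and 2.0 on the worker's:

```python
from dataclasses import dataclass


@dataclass
class Week:
    """One week of work, normalized so the pre-AI baseline is 1.0 for each quantity."""
    output: float      # units of work delivered
    paid_hours: float  # hours the employer pays for
    effort: float      # effort the worker actually spends


def boss_productivity(week: Week) -> float:
    # Employer's view: output per paid hour.
    return week.output / week.paid_hours


def worker_productivity(week: Week) -> float:
    # Worker's view: output per unit of effort actually spent.
    return week.output / week.effort


# Hypothetical numbers from the comment: 80% of the old output,
# the same paid hours (an assumption), and 40% of the old effort.
baseline = Week(output=1.0, paid_hours=1.0, effort=1.0)
with_ai = Week(output=0.8, paid_hours=1.0, effort=0.4)

for label, week in [("baseline", baseline), ("with AI", with_ai)]:
    print(f"{label}: boss sees {boss_productivity(week):.2f}, "
          f"worker sees {worker_productivity(week):.2f}")
```

Both numbers describe the same week; they disagree only because the denominators differ.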
Not sure among devs, but I do know that in other positions in typical corporate bureaucracy, people have a propensity to not report their own automations or productivity gains upward, because the reward structure isn't there.
Early on in my days as a sysadmin, I automated a ton of my role while the rest of the team was still doing ClickOps. The reward for doing so was more work and expectations, without a pay increase to match my newfound productivity. That happens all over the workforce, so people just keep it to themselves. I learned my lesson at that first job real fast: if I'm able to produce the same or greater output in half the time, I keep that to myself so I can use the automation to free up my own time instead of having it filled by the company.
I wonder how much of that is happening now with AI in non-technical roles.
> so what may be happening is that bosses see that output is at 80% (productivity down!) but workers see that they can give that 80% output with 40% effort (productivity up!).
So why is it that the bosses are the ones that are so enthusiastic about adoption?
I've found that setting good guardrails, and running in a sandbox so that the agent doesn't keep asking tedious permission questions, makes things go a LOT smoother.
Generally, I spend anywhere between 15 mins and an hour setting things up (depending on how well the project is set up for AI work), and then set the agent going, coming back in a half-hour to an hour to check its progress. Generally, the tooling keeps it honest (for golang, forbidigo is AWESOME). 80% of the questions the agent asks me require a lot of thought. 20% of what it does needs correction.
The other thing to remember with LLMs is that they are NOT human, and won't react in a human way. So you'll see strokes of "brilliance" followed by the absolutely bizarre. But good guardrails keep that to a minimum.
> sandbox so that the agent doesn't keep asking tedious permission questions
> 80% of the questions the agent asks me require a lot of thought. 20% of what it does needs correction.
I've found even the permissions questions give me veto power over fruitless lines of exploration, especially in planning mode. For instance, it wants to use tools I don't have installed to access information that I have made available elsewhere? I get a chance to override this decision by declining the permissions check and redirecting it. Feels tedious, but helps me understand what information sources are influencing it. I head off a lot of bugs this way.
I never let it go into planning mode, other than to output a plan file that I can audit before giving it the go-ahead to implement. After that I don't want to be bothered, so --dangerously-skip-permissions keeps all but real questions out of the loop, and I can do something else while it works rather than babysit.
Your experience pretty much mirrors my own. I hate to be the 'they're holding it wrong' guy but there's certainly a lot of people out there that have no real idea how to effectively leverage AI.
That’s a problem with the tool, not the people. AI is marketed as literally writing one sentence and having some app come out perfectly. Just check any of the landing pages for Claude Code or Codex or GitHub Copilot…
AI should be assisting us; instead it's doing the job and we're the ones assisting it. This is a monumental shift in how knowledge work is changing that people seem to be missing, and it goes beyond mere coding.
Guardrails, prompts, whatever: it's us helping it do the job, not the other way around.
Opus 4.6 was the last genuinely good assistant LLM. Since then it's been quite clear that the training/reinforcement is focused on "given prompt -> do task", so its behavior is more and more about doing the task itself, not helping you. If you try to use it as an assistant, it just sucks and is perma-wired into finding the solution. Many times I want it to help me investigate, and its answer will still be focused on the fix, not on answering my questions.
4.7 first, then 4.8, and Fable are absolute disasters as assistants.
Fable in particular is so "intelligent" that it will push very strong, intelligent-sounding takes even when it is completely wrong.
Wow... Our experiences have been very different, then. I've found each upgrade of Opus to be a noticeable improvement in its complex reasoning and delegation capabilities over its predecessor.
To me, this feels in many ways like a technical manager or team lead's job, where I guide the process along using my knowledge and experience, and then let the agent fill in the rest (to the best of its ability).
The agent can't really learn from its mistakes (at least, not without consuming precious context), so I apply a blameless postmortem process, updating the guardrails whenever it goes astray in the same way more than once.
And really, I'd rather be contemplating the more difficult and interesting questions of architecture, environment, ergonomics and market fit, so it suits me fine.
I think this is just a misunderstanding of how most technology has always worked?
Consider what is happening on most construction sites. The heavy work is absolutely done by the technology on site, but without people there to oversee it and keep it working, it would fail.
And that is almost certainly true at any industrial site. Indeed, look up videos of high-tech looms: a large portion of the technology added to them is there so that the operators can locate a fault and fix it.
The problem (okay, one of the problems) with renting other people's models is, as you mentioned, that they can and will change out the model without notifying you ahead of time, and you don't always get to control which model you use. (They might decide to retire it, and you won't be able to get it back if they do).
Which is why (well, part of why) I think the long-term trend will be towards self-hosting models. Right now the frontier models are far enough ahead of the self-hosted ones that there are lots of people willing to pay by the token to rent someone else's model, because they get more value for money from that than from self-hosting models.
But the frontier companies won't be able to keep up their current levels of expenditure forever. At some point the investors are going to say "Hey, so, um, when am I going to see some return on my investment?" and then the current subsidized subscriptions (including the one my employer uses) are going to go away, much like what happened with Copilot this month.
And then the locally-hosted models are going to suddenly look like a more attractive picture. Because where you might have been willing to spend $100/month/employee to rent time on models in someone else's data center, you might suddenly balk at spending $500/month/employee. You might say "Hey, you know what? A $50,000 up-front capital investment is only, what, one month's worth of subscriptions for our 100 employees? Yeah, okay, I'll approve the hardware purchase. Get that self-hosted model set up and then we'll cancel the subscription and switch over."
Not everyone is going to do that. But once the locally hosted models are good enough, the first few people who do so and report success are going to start a snowball effect. It will likely be driven by money first, but it will also have a side effect that people will slowly discover: you can better predict the behavior of the model you're using. It will continue to work next year the same way it works this year; or if it doesn't, it's because you chose to install the new version.
And when that happens (I'm saying "when", not "if" because although it might take some time, I think it's inevitable in the long run), the frontier-model rental companies are going to struggle to stay afloat. Except for the ones who saw this coming and transitioned to a non-subscription income source somehow (maybe by selling licenses to self-host their frontier models for $$BIGNUM), or who have some other revenue stream besides renting out models.
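The break-even arithmetic in the self-hosting comment can be checked with a small sketch. It assumes a one-time hardware cost, a flat per-seat subscription price, and an optional monthly running cost (power, hosting, admin time) that the comment does not estimate. `breakeven_months` is a hypothetical helper, and the $5,000/month running cost below is a purely illustrative placeholder, not a real figure:

```python
import math


def breakeven_months(hardware_cost: float,
                     price_per_seat_month: float,
                     seats: int,
                     running_cost_month: float = 0.0) -> float:
    """Months until a one-time hardware purchase costs less than renting.

    running_cost_month is a hypothetical allowance for power, hosting,
    and admin time; the original comment does not estimate it.
    """
    monthly_savings = price_per_seat_month * seats - running_cost_month
    if monthly_savings <= 0:
        return math.inf  # self-hosting never pays for itself
    return hardware_cost / monthly_savings


# Figures from the comment: a $50,000 purchase vs. 100 seats.
print(breakeven_months(50_000, 100, 100))  # today's $100/seat: 5.0 months
print(breakeven_months(50_000, 500, 100))  # after a hike to $500/seat: 1.0 month
# Illustrative running cost only: roughly 1.1 months.
print(breakeven_months(50_000, 500, 100, running_cost_month=5_000))
```

At $500/seat for 100 seats, the $50,000 purchase pays for itself in one month, matching the comment; at $100/seat it takes five. Any real running cost pushes the break-even point out, and if running costs exceed the subscription bill, self-hosting never pays off.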
Well... as a human software engineer, I've been the one with very strong, intelligent, completely wrong takes. The question is, are the LLMs improving faster than you can improve a junior dev? And is their ceiling as high?
I just started using Claude Code for my work as a sysadmin. For my work, it's great. I don't need to wrestle with MySQL joins; Claude gets even the most complex ones right WAY faster than I would. Same with new Terraform stuff. Things that would have taken me a day are cut to less than an hour.
So for my work, it's made me much better at my job. Much faster and more accurate.
I spend at least 6 hours a week arguing with bots owned by other teams, since I can't reach a human until I get past their bot. 10k-person company, and clients are paying for my time.
I don't see a lot of talk about how AI development breaks the old feedback loop of write code, watch it run, change it, repeat. I really hate sitting around waiting for the agent to finish planning, reading the plan, then waiting for the agent to finish coding. It's those 5-10 minute windows when it's working that really sap my patience and suck all the fun out of our jobs. Writing code by hand is just more fun.
This is something that I don't see discussed a lot in these conversations, but it's true for a ton of folks.
I didn't end up with a career in tech because I wanted to tell a bot to do the fun part of my job for me, leaving me only with the boring, tedious parts. I didn't sign up to be a full-time code reviewer, and I certainly never wanted to be a manager, let alone a manager of bots.
It also can't help but spark feelings of "Why am I getting paid 6 figures for this??" and that makes me nervous for the future.
I imagine the engineers and assemblers in factories pre-assembly line felt the same when things started getting automated there. There's an element of craftsmanship that gets taken away as the product moves from being artisanal, hand crafted to mass produced.
I wonder if it's too late for me to pivot to hardware.
Understanding what is going on with AI productivity is … frustrating to say the least.
The best I can say is that genAI yields a self-reported 20% efficiency boost, and for a very (very) small group of people, it’s maybe a 2-3x boost. (And if you are at a frontier lab, you fall into the big bucket of exceptions.)
At this point, for most use cases, AI productivity is either the equivalent of giving people 3D printers, and seeing little benefit, or signing up for an outsourcing service, just without the development of human capital anywhere.
I think it depends on how you measure the boost. If you’re talking about generating a first draft, then yes, the boost is there. If you’re talking about completing the project, well tested and well architected in every aspect, then overall there really isn’t a boost.
6 hours of debugging and reading docs is not equal to 6 hours of prompt fiddling. Beyond the few fixes applied, the fiddling returns almost no value.
Yeah, Amazon warehouses are just the same. Humans are only used for tasks beyond the comprehension or physical ability of a machine at that point in time.
The problem is, we haven't had the debate at a societal level about whether we want to go the Star Trek route (aka, we do our darn best to automate everything so that humans have the time to do whatever they want) or the "real communism" route (we ward off automation so that we have jobs for people).
Because that debate hasn't happened, we get the third possible outcome: rabid capitalism automates everything as soon as it is profitable and lays off the humans, squeezing higher margins out of fewer people if need be. The best example of that, IMHO, is Disneyland or Vegas going on ridiculous nickel-and-diming sprees. In the end, however, there will be no one left who has a job, and we'll be in for quite the riots.
I couldn’t care less about bot-sitting (haven’t we always written our own automation?), but it’s bot-sitting the unverified slop that people send you that fuels the frustration. I thought I worked with competent people who respected me.
Our product lead/manager recently sent me an AI-generated PRD (complete with a Claude Code spec!) to build a core feature that we have had for over 2 years (and that is the feature our customers use most).
I just can't imagine tanking my trust with my coworkers by doing something like that.
So we're now in this world where everyone is instantly 10x more productive at turning their thoughts into code. Now, think about the coworkers you've had that are middling to mediocre. Do you want them to have a tool that makes them 10x more productive?
That's what I wonder about, what happens to all those folks.
Your coworkers haven't changed. What changed is that people can hand off work they never had to think through themselves. So you don't know what they checked and you don't know what you need to. You just have to read the whole thing.
It may be fun to look at inputs and outputs, but it's not hackable and trying to map one into the other is more like astrology than a science.
(I spent too long by the horse racing track)
I have never disliked our job more.
Are you getting LLMsplained? :)
It's actually kinda pleasant, especially when I consider all the tickets I'm not excited about doing.
This is all normal. It’s also well worth the time spent learning.
Welcome to the factory!
I’ve been told before.
Managers will be sure to tell you how much they respect you. Ask them if they respect the work and you'll get a blank stare.