
Hard Fork

The New York Times

The Rise of Token Leaderboards

From ‘A.I.-Washing’ Layoffs? + Why L.L.M.s Can’t Write Well + Tokenmaxxing, Mar 20, 2026

Excerpt from Hard Fork


I just read the most heartwarming news this morning that I wanted to share with you, Kevin. What's that? The U.K. government has withdrawn a proposal to let AI companies train on copyrighted works after a backlash from artists like Dua Lipa. Did you see this? No. Dua Lipa said, don't start now with this AI. My sugarboo? She's Levitating, Kevin. She's making some New Rules, and she's saying, we're not gonna train on my copyrighted works. Wow. And that's why she is a queen. And so, Dua Lipa, if you're listening, we salute you. Yeah, Dua Lipa, you're a Dua Keepa. Period. Dua Lipa said: artists' rights! Wow. I'm Kevin Roose, a tech columnist at The New York Times. I'm Casey Newton from Platformer. And this is Hard Fork. This week, a big wave of tech layoffs is raising the question: has AI job loss truly begun? Then, writer Jasmine Sun is here to help us answer the question: why are chatbots bad at writing? And finally, it's tokenmaxxing time: why tech companies are building leaderboards to measure who is spending the most on AI. Well, Casey, for years now we've been monitoring for signs of an AI job apocalypse. Yeah, we've been monitoring the situation. It's true. And over the past few weeks, I think we've gotten some early indications that something is happening in the labor market, especially for tech workers. Yeah, we have certainly heard CEOs of companies announcing layoffs and invoking AI as a reason that it is happening. And so that has gotten our attention. Yeah, so just a couple of examples from the last few weeks. Last week, Atlassian announced a 10 percent reduction in its staff, about 1,600 jobs, which it said would help fund further investment in AI and enterprise sales. That came on the heels of a big round of layoffs at Block, the financial tech company formerly known as Square, which said that it was cutting its staff by about 40 percent, or about 4,000 jobs, saying that it was shifting the way it worked to use smaller and flatter teams. And then the big one that folks are expecting, maybe as soon as this week, is that Meta is reportedly poised to lay off 20 percent or more of the entire company. This was reported by Reuters last Friday, whose sources said that Meta was preparing to cut as many as 16,000 jobs, the largest layoffs at that company since late 2022 and early 2023, when it laid off about 20,000 people. So as of this recording, that hasn't happened yet, that we know of, but I know that people at Meta are very on edge and are awaiting further news. Meta, after this story came out, told Reuters that it was, quote, speculative reporting. Which, if you're not familiar with the language deployed by Meta communications staffers, means: this is happening, but we don't want to tell you it's happening yet. Correct. So, Casey, I want to hear what you make of these layoffs, but first we should do our disclosures. I work for The New York Times, which is suing OpenAI, Microsoft, and Perplexity. And my fiancé works at Anthropic.
So, okay, Casey, what do you make of the fact that all these companies are referencing AI in some way as a reason for their layoffs? Well, I think it's a little different at each company, Kevin. And I think we can make a decent case for and against the idea that AI is really driving the show at each of them. So maybe we should get into that. But at the highest level, I would say companies do continue to tell us that AI is a significant factor in the reduction of these workforces. And sooner or later, I do think we're going to have to believe them. Yeah, I think this is the early warning sign for a lot of people, especially in the tech industry, who are, I think it's fair to say, going to be some of the first people to see their jobs change or disappear because of these new AI tools. But let's get into some of the specifics here. So, Casey, let's start with Atlassian, the first company I mentioned. Their CEO, Mike Cannon-Brookes, said in a company blog post that the bar for what great looks like for software companies, on growth, on profitability, on speed, on value creation, has gone up. He said, we are choosing to adapt thoughtfully, decisively, and quickly to drive durable, profitable growth. He claimed that AI was not replacing people, but he said it would be disingenuous to pretend that AI doesn't change the mix of skills we need or the number of roles required in certain areas. Yeah, so I take him at his word. It seems like he himself is trying to walk a middle path there, right? Sort of not denying that AI is a factor here, but also not saying this is the only reason this is happening. I think some other context that is worth having is that Atlassian is one of the companies that could be part of what we've been calling the SaaS-pocalypse around here, right? This is a company that makes tools for businesses. A lot of its products are essentially structured workflows, and there are those who believe that sooner or later you're just going to be able to code your own pretty cheaply. Now, maybe you will still choose to buy a product from a company like Atlassian, but maybe you're not gonna be willing to pay nearly as much as you would have before. And so the company's stock price has just been battered over the past year. And I think that has left them, one, hurting for cash a little bit, but two, and probably more importantly, looking for a different story they can tell the stock market about what they're doing. And so today that story is: we're gonna get rid of some of these workers, and we're gonna figure out how to make our remaining workers more productive. So there's this term that's been floating around called AI-washing, which is basically when a company wants to lay a bunch of people off, or maybe doesn't feel like it needs as many people, and uses AI as the cover story. And I thought it was when a software engineer finally took a shower. And basically the thesis is: these aren't really layoffs about AI. This is just a convenient excuse that these companies are using. Do you think Atlassian qualifies as AI-washing? I would like to get a little bit more detail on exactly who they are laying off here, which is a detail that we do have about some of these other companies, and it helps us answer that question. So I don't know exactly how it is happening inside of Atlassian, but I think that their CEO was relatively straightforward, as these things go, in saying: it's a little bit about AI, it's not entirely about AI, but, like, yes, keep your eye on AI.
So to me, that just reads as honest, and so I'm gonna give them a pass. Okay. Let's talk about Block. Jack Dorsey, the CEO of Block, gave an explanation of their layoffs. He said, quote, we're not making this decision because we're in trouble. Our business is strong, but something has changed. I had two options: cut gradually over months or years as this shift plays out, or be honest about where we are and act on it now. I chose the latter. Casey, your take. So, something to know about me and Jack Dorsey is that I have a bit of a bias against him, as a former Twitter user who misses that website dearly. At this point in 2026, I would not hire Jack Dorsey to run a lemonade stand. Okay. But if you want to talk about Block specifically, this is a company that tripled its headcount from about 3,800 people in 2019, in what seems like just kind of classic inattention to what was happening in the business during pandemic-era boom times, right? And I wonder if you saw this detail, because it truly took me out, Kevin. Five months before they announced the layoffs, Block spent $68 million to fly 8,000 people to an in-person event with Jay-Z. Come on. Yeah. So that's the kind of famous attention to detail that has turned Jack Dorsey into one of the greatest visionaries in tech. So look, is this about AI? Again, you know, what does Block really do? They have those little iPads at the coffee shop, and then they have Cash App. Okay, how many people do you really need to run those products? Probably fewer than 10,000. Is that about AI? I don't know. Maybe if you squint. But again, this is a company whose stock price was cratering. They needed a different story to tell the market. And I do think you can make a case that AI will make the remaining workers more productive. So again, this is another one where you could use AI to justify what's happening, but you also could just say this company has been mismanaged for a while now. Yeah, you could call it AI-washing, or Jay-Z-washing, which seems to be what they are doing here. Yes. So this did seem to have an effect on their stock price. In fact, the day after Jack Dorsey announced the layoffs, Block's stock shot up 17 percent. It's gone down a little bit since then, but it's still up from where it was before these layoffs. And I think we should just say that this is also a part of the equation here, right? These are companies, largely public ones, that have investors' attention. And right now there's this narrative power around AI, where if you seem like a company that is investing heavily in the AI tools and the AI way of working, your investors say, oh, that company is really forward-looking. They must have a plan for how to navigate this transition. And so I think they're seeing the power in telling the story that all of this is related to AI. Yeah, which, by the way, reminds me of the peak of cryptomania, when some publicly traded companies would just add a crypto term to their name and their stock price would shoot up by, like, 40,000 percent. Yes. It turns out that the public markets actually can just be tricked that easily. Yes. That would give me some relief if I were a CEO, just knowing that I could fool people like that. But anyways. So let's talk about the third large tech company that is reportedly conducting layoffs: Meta.
We don't know exactly who or what teams are being affected by these layoffs, but this is a significant part of their workforce, and they seem to be saying that AI is part of the reason. Yeah. On a recent earnings call, Mark Zuckerberg said that, quote, projects that used to require big teams now can be accomplished by a single very talented person. And we should also say that this cut is coming alongside a massive AI infrastructure investment, right? They're gonna spend $135 billion on capital expenditures this year. And even for a company of Meta's size, that is real money, right? So I know that they're trying to be careful, again, trying not to spook the stock markets too much. This is obviously the biggest bet in the company's history. And I think that making some substantial cuts is a way to signal to the market: hey, don't worry, we're not completely losing our minds here. We're going to keep some of these expenses under control. Yeah, I think that's a really important point, because what we're seeing at some of these companies is that they are not actually cutting costs in the aggregate by using these tools. They are just shifting the cost from human labor to AI. Right. They are plowing the money that they are going to save by laying off these thousands of people into the building of data centers and other AI infrastructure. And basically the bet they're making is: these new AI workers are going to be faster, more efficient, maybe cheaper in the long run, maybe not, but they are going to be able to do the work that used to require many thousands of people. And that is a profound shift in the way that companies are talking about their workers. I recently talked to a venture capitalist who said that a lot of the AI startups he sees, the most AI-native companies, are spending more on AI tools than they are on payroll. And that may be an outlier, but I think that is where these companies believe we are headed: where the majority of your expenses will not go to paying the salaries of human workers. It will go toward buying the AI tools and the tokens that your company runs on. Yes, I think that's absolutely the bet that they're making. I also just think it is worth noting that this is still mostly speculative, right? In the case of Meta specifically, this is a company that has arguably been struggling when it comes to AI. They had to abandon their last model, Behemoth, because it wasn't very good. The Times reported last week that it's delaying the release of its latest model, Avocado, because it hasn't been hitting its performance targets. It has apparently barely outperformed Gemini 2.5. What is that, last March? Yeah, that model is really the pits. That's an avocado joke. That's very good. Thank you. So again, this is not as simple as saying they're able to cut 20 percent of their workforce because they've just made these massive gains. I'm sure there are individuals there who have made massive gains, but as a company, it still seems like it is somewhat mired in dysfunction. They just did yet another partial reorg of their AI teams. And that just always makes me raise my eyebrows. Yeah, I will say one thing that's been surprising to me about this recent round of layoffs is that the companies that are making them are not the ones on the frontier, right? It is not the OpenAIs, the Anthropics, the Googles.
Those companies are not laying off people en masse because of these AI tools, which they are building and presumably have even better models than the ones they're releasing to the public. So you have to think that part of this is just companies that are lagging behind their competition saying, well, maybe if we just use a bunch of AI, it'll help us catch up. Yes, but also, OpenAI and Anthropic are much smaller companies than some of the ones we've been talking about today, at least in number of workers, right? I think it is interesting that Atlassian is bigger than OpenAI in terms of the number of people who work there, when you look at the relative value of what they're generating. DocuSign has 7,000 employees. There's no funnier sentence that is true in all of tech journalism. As somebody who has a paid subscription to DocuSign that I truly resent paying for: get to work over there, people. Or get not to work. Get not to work. Here's another question that I would ask, Kevin. Okay, so we're seeing a bunch of layoffs. Are these AI-related or not? Does it actually matter, if the effect on workers is the same? Right, like, if you're the worker, whether it's about AI or not, you're still out of a job. Yeah, and it's not clear to me what workers can or should be doing to protect themselves against these layoffs. One person I talked to, who works at one of these big tech companies, said, well, there's just a lot of jostling and fear and anxiety right now. People don't know if they should be using the AI tools a ton, because then it shows that they're getting with the program, or whether that just means they're proving that their work can be automated. I think there's a lot of fear and suspicion and mistrust inside these companies right now. And for good reason. Their executives are planning to lay them off. Yes, and by the way, I think at least at some of these companies, that is maybe not an explicit reason for these layoffs, but some of the executives there would see it as a positive byproduct, right? Because, you know, if you're Mark Zuckerberg, you lived through the 2020 era. You had these restive employees that wanted a lot of things from you. They wanted to have a lot of control over what the company could and could not do, and how it did it. And I just know that executives over there really resented that sort of thing. And once Meta entered this new era of massive layoffs, employees over there did get really scared, for all of the reasons that you would assume. They were like, oh God, maybe I actually am gonna lose my job. And all of a sudden they got a lot more quiet, and you started to see a lot fewer protests over there. So I'm not gonna say that these occasional mass layoffs are a way of keeping the workforce in line, but I have noticed that it seems to be having that effect. Totally. And it makes me wonder whether something that I predicted a year or two ago, which did not happen, the sudden mass unionization of workers at these companies, may actually start to happen in the next year or two.
I think one major difference between what's happening now at these tech companies and what has been happening for decades at manufacturing companies, car companies, you know, factory workers, is that those workers were by and large unionized. And so when the employers said, hey, we're gonna lay a bunch of you off, they were able to negotiate. They were able to say, hey, maybe instead of laying us all off, you could find other jobs for us. If our jobs are being automated, maybe we should be allowed to retrain to do something else. And that was largely successful. There were still layoffs, of course, but not the number that we're seeing today at these tech companies. So do you think there's any possibility of that? Or is that just sort of a union fever dream? Here's what I will say. I cannot think of anything that would make Mark Zuckerberg more mad than a union of software engineers at Meta. And I think the software engineers at Meta should use that information how they will. You think that would make him more mad than getting booed at a UFC fight? Absolutely. I think that probably just made him really sad. Well, there you have it. If you want to make Mark Zuckerberg mad, Meta employees, sign your union card. When we come back: why aren't chatbots as good at writing as I am? We'll ask Jasmine Sun. Well, Casey, over the last couple of years, we've talked on this show about how AI models are getting better at so many things. They are getting better at coding, at competition math, at solving novel physics problems, mass domestic surveillance, autonomous weapons. And I think the story of the last few years in AI has been one of rapid, steady progress. But these systems are still jagged, and they have flaws and weaknesses. And one place where they arguably haven't improved that much is in writing. Now, that's our domain. Yes. At least, that is the argument that Jasmine Sun made in The Atlantic this week. She is a freelance journalist.
Her piece was called "The Human Skill That Eludes AI." And it's her attempt to understand why, despite so much progress in all these different areas, the models of today don't seem to be writing anything particularly good or compelling. Yeah. And while I think the question of whether LLMs are good at writing is highly subjective and dependent on the use case, I do think Jasmine makes a really interesting technical case for why these models write the way they do. Yes. And we should say, before we bring her in: Jasmine is a friend of mine. She has also been my researcher on the upcoming book that I'm working on. And I just think she's one of the best people writing about AI today. She writes on her Substack every week. Bring on one of your enemies. Okay, let's bring her in. Jasmine Sun, welcome to Hard Fork. Thanks for having me. I'm excited. Hi, Jasmine. So you wrote this great piece in The Atlantic this week about the human skill that eludes AI, and I want to start by challenging the subtitle of your piece. Why can't language models write well? Can't language models write well? So, I do say in the piece that most writing, period, is very bad. And so I think that language models are definitely better at writing and language than most humans are. But the question I was really curious about is why they can't write at a literary, creative-fiction level. Because the thing is, if you listen to these AI leaders talk about their aspirations, they say, we're gonna cure cancer, we're gonna solve physics, we're gonna build a superhuman coder. They are not shy about it. They're not saying, oh, our AI models are gonna be better than 75 percent of human coders. They're saying, no, we will literally build a self-replicating factory tomorrow. And then Tyler Cowen asked Sam Altman in an interview from last October, when do you think GPT will be able to write a Neruda poem? And Sam Altman says, maybe in the future, ChatGPT will be able to write, quote, a real poet's okay poem. So that was the thing that fascinated me: even these guys who are more bullish than anybody else about the capabilities of their technology are very reserved about how much literary writing their models can do. And so that was the gap that I was really interested in. And you start your piece with this interesting provocation, which is that in some ways, GPT-2 was the peak of AI when it comes to creative writing. So explain that. Part of what got me interested in this piece was that I was actually doing research for your book. And I was going through all of these previous generations of models and reading the outputs. And the thing that really shocked me is that, in a way, I found the writing style of GPT-2 and GPT-3 so much more compelling than ChatGPT today. It doesn't have any of the annoying tics. It doesn't have the em dashes, the tripartite lists, the it's-not-this-but-that. The tone was much more variable. It would actually surprise you. It would be funny. It would be poetic. And that shocked me, to go back a few generations and realize that, you know, maybe they were also lying all the time and all sorts of other things, but from a writing-style perspective, I kind of preferred it, and I wanted to investigate. They were weird. That shocks me. To me, talking to GPT-2 was like talking to somebody who had just fallen down the stairs. You know what I mean? Where it was like, do we need to get you to the hospital? Do you smell toast?
Yeah, there are these amazing prompts from this early OpenAI prompt library where they would say, like, I just won $175,000 in Las Vegas. What do I need to know about taxes? And GPT-2 would just start writing some short story about, like, an orphanage. They were surprising. They were nutty. They were weird. They would absolutely be a terrible corporate assistant, a horrible coding intern. It can't do any of the things that modern LLMs can do that I'm very grateful for. But from a pure writing-style perspective, they're very good. So GPT-3 in particular: I found this set of samples that some guy did where it was like, oh, write in the style of Paul Graham, write in the style of Richard Dawkins, whatever. And it could style-match much better than modern LLMs can. And particularly because so much of literary writing comes from voice and style, that was one of the things I was really interested in. What did we lose, that the LLMs can no longer emulate Paul Graham's style, or whoever's style? Because I would put the same exact prompt that this guy gave GPT-3 into ChatGPT 5.4 Thinking or whatever, and it would be god-awful. And I was like, that's really weird. So tell us about what you learned, about what happened after the GPT-2 and 3 era that changed the way these models respond to us. Yeah, I think the answer is basically post-training. So they started adding a post-training layer, which is basically saying: we have these crazy, unpredictable, nut-job, concussed models, and they need to learn how to behave, because a model that can't behave is a very bad corporate assistant. And so the AI researchers give them example dialogues and scripts to learn from. They give them words that they can and can't say. They do RLHF, which is a process by which human graders will rate which response is the most helpful-sounding, or something like this. And so now these post-trained models have been trapped, in a way, or trained, or guided, toward a very particular character or persona that is a very helpful assistant, but might be very bad at writing in creative and surprising ways. I mean, the way that you described it is that there is a phase within post-training where these AI models are evaluated by humans. And that's part of what they call RLHF, or reinforcement learning from human feedback. And what struck me in your reporting is that you've actually talked to some people who have done this kind of feedback for the models, who say that they're just being asked to grade things in ways that don't make sense. Yeah. Right. Tell us about that. Yeah. I mean, this is super interesting, because these job listings, you'll see them on places like Mercor, or xAI, Elon's company, will list them directly. It'll be like: creative writing expert, $45 an hour, must be a New York Times bestseller and have a starred Kirkus review, or something like this. Have you ever gotten a starred Kirkus review, Roose? I think so. Okay, good job. Sure. All right. You might qualify to help Elon, to help Ani from Grok, write a little bit better. Yeah, we're gonna get you in on that job listing. But okay, you were saying.
Um, yeah, so anyway, these companies realize that these AI researchers are really good at knowing what good coding is, but they don't actually know what good writing is. So they're like, why don't we hire some humans to find out? And so they'll commission MFAs and published authors, and sometimes just random guys with a blog or whatever. And one of the people I talked to, who was a contractor for Scale AI as a writing evaluator, and he was doing this for one of the bigger labs, he said that the rubrics just didn't make any sense. He would be told things like: you have to grade them based on the number of exclamation marks. And so if something has three exclamation marks, that's too many, and you have to ding that one. Yeah. And I have to say, generally not bad writing advice. I mean, I guess it depends on the length of the text, but three feels like a lot for many scenarios. This is what they tell women in business communications. It's like, take all those exclamation marks, replace them with periods. Like, we are just gonna remove all of the ideas. We teach women to shrink themselves. Exactly. But yeah, so he was being asked to grade these things. Or, another one: he got a bunch of fan fictions, and he was supposed to grade them on their factuality, since that was one of the criteria. I do imagine that one could devise better rubrics than this particular evaluator was given, but I think it does show, at least, that some of these very big, very well-resourced companies simply do not know how to think about what good writing is. Briefly, I wanna underline that, because to me that seems like the whole story. We are taking the entire internet and we are grading it on factuality. And so the LLM that you're gonna get out of that is just probably not gonna be all that creative. Well, and I wonder how much of it is related to this sort of verifiable-reward system that a lot of these companies are using, where you have a system generate a bunch of code, and then you have another evaluator model check the code to see whether it's good or not. And that works in domains like programming, where the code either runs or it doesn't. But creative writing doesn't work that way. You can't have an evaluator tell you with any sort of consistency whether something is good or not. And so it may just come down to preference. And so I guess I'm curious: do you see this as a technical problem that the labs are frustrated trying to solve? Or is this just demand-related? Is this just what people want chatbots to sound like? And in every test where they pit different models against one another, the one that sounds like a bland corporate assistant wins, and so they go with that. I think both are true. The majority of the writing that we are asking the models to do is: write this email for me, right? And they excel at that. They are truly great corporate email writers. They are much better at the whole passive-aggressive thing than I am. At the same time, I do think, like you said, there is a technical challenge that has to do largely with verifiability. There are people who have spent decades of their lives attempting to articulate what makes Shakespeare Shakespeare, or what makes a Neruda poem a Neruda poem, and they will still not know in any kind of certain way. They will still get into debates with their fellow academics and literary critics about which writer is better than the other. Because these things are subjective, because they are ineffable, because they are hard to put in a rubric. That is the nature of art.
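To make the rubric problem concrete, here is a minimal sketch, in Python, of the kind of mechanical grading the contractor describes. This is hypothetical, not Scale AI's actual tooling; the point is that every check below is easy to verify, and none of them tracks whether the prose is any good.

```python
# Hypothetical sketch of a surface-level writing rubric, loosely based on
# the anecdote above. Not any lab's real grading tool.

def rubric_score(text: str) -> float:
    score = 1.0

    # The rule from the anecdote: three exclamation marks is "too many,"
    # so the response gets dinged.
    if text.count("!") >= 3:
        score -= 0.3

    # Penalize long sentences as "unclear" -- which also penalizes a lot
    # of perfectly good literary style.
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    avg_words = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    if avg_words > 30:
        score -= 0.2

    # Reward tidy structure (bullet points), which nudges models toward
    # the tripartite-list house style Jasmine complains about.
    if any(line.lstrip().startswith(("-", "*")) for line in text.splitlines()):
        score += 0.1

    return max(score, 0.0)

print(rubric_score("Call me Ishmael."))                        # short, plain: scores fine
print(rubric_score("We did it! Amazing! So proud of this team!"))  # three '!' -> dinged
```

A reward built from checks like these is perfectly consistent, which is exactly the trouble: it optimizes for blandness, because blandness is what is verifiable.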
And to that point, you know, you started this segment by talking about Sam Altman saying, hey, we just basically can't write a great poem yet. Sam Altman, a year ago, said the company had trained a good creative-writing model and posted a short story on X. Many people found it compelling. Is Sam Altman just not being consistently candid with us, Jasmine? Ooh. It wouldn't be the first time. But that short story, if you remember, I'm sure you guys recall, had some great lines, like the one about the seams of mirrors, or Thursday, what was it? The liminal day that tastes of almost-Friday, or something like that. Yeah, the liminal thing. Sometimes it gets there with the metaphors, but the language is not grounded in a life. And that was my other thing. Aside from the verifiability, fundamentally, when I think about the writers who I really love, whether they're journalists or poets or whatever, they are writing from life, right? A journalist goes out and talks to people, and they see stuff and observe the color of the sky in a particular way. Or a poet is thinking about personal experiences that they've had. Their writing has stakes; it comes from an emotional place. And the fact that LLMs, while being very talented, grammatically pristine, whatever, don't have lives means that all of the metaphors they choose, all of the words they choose, the examples they choose, are just ungrounded, right? It's not coming from a point of view or a particular experience or a particular community that makes the writing believable. I think part of what voice and style is, is that it is very specific to the life that a person has had, and LLMs cannot get there, in the same way a human who hasn't really lived that life cannot get there. I don't know. I feel like it's case-dependent. You know, I'm a big music fan. And over the past few months, I have enjoyed putting questions about music, and in particular the sounds of certain bands, to an LLM, which sounds like a joke prompt, because an LLM has never heard anything, right? And yet I find that, in general, the models can have good conversations with me about the sound of music. Now, it may be that they are just pattern-matching based on a bunch of public writing on the internet by people who do have ears and have heard, right? I'm very open to that. But I have just been struck by the way it is able to write about sensory topics in an evocative way that, at least to me, surpasses what I would have predicted. Yeah. I want to pose a couple of objections that I think someone might make to your article. One of them is: this is cope. This is Jasmine, a very talented writer, finding the things that AI, in her view, is not good at yet, and saying it will be very hard for AI to do these things. This is the same reaction that software engineers had when models started getting really good at code. They would say, oh, well, it can't do these other 10 things that I do. And basically: just wait a few years, and the models will be better than all of us at everything, including writing.
I would love for it to be cope, because I try to automate myself away all the time. I have no deep attachment here. Like, I like writing, but I have tried over and over and over for the past three years to automate my own job away and to get Claude to do my job for me. It cannot do it. This is very frustrating. It's not for a lack of trying, you know? And again, I'm going back to the CEOs themselves and the things that they themselves are saying. It's not just me, a writer. It's Sam Altman saying this thing will cure cancer and solve physics, but it will not write better than a real poet's okay poem. And so I think that suggests there is something that is at least perceived as a little bit different. I think it's very possible the models will get much better at writing over the next few years. I don't think it's a never thing. I do think that reporting is hard to replicate. I think that having life experiences that are real and verifiable is hard to replicate. I think the style stuff can be improved, especially if you fine-tune the models. But I think what's also interesting to me about this piece is that it shows how the market incentives, the demand incentives, of these companies do shape what we see their abilities as today. The other objection I imagine people might have, people who are very AI-pilled, is that this is all in the eye of the beholder, right? There have been several studies now showing that if you give people a blind taste test of AI writing versus human writing, they prefer the AI writing, until you tell them that it's AI writing, and then the value in their eyes plummets. I did one of these in a New York Times quiz just recently. So is it possible that the models have already become superhuman at writing, but that the minute we learn that they are AI models generating text, and not humans writing words with their fingers, we lose all interest in it, just because of the source, not because of the quality of the writing? I mean, I think it's definitely interesting and true that people don't want to like AI writing. And that is part of what bothers them when they see AI text that is obviously AI, even though, like you said, in these quizzes and tests, AI can outperform human writers in those narrow scenarios. My quibble with a lot of these quizzes and tests is: as a writer, and you guys are writers, too, how much of your job is actually text generation? I think AI is a superhuman text generator, right? In my job, I am generating text probably 25 percent of the hours in my day. I spend a lot of time interviewing people. I spend a lot of time coming up with ideas. I spend a lot of time reading, and not just reading indiscriminately, but reading the very particular sources that feel like the right ones. And usually, at the point that you are doing one of these tests, you're saying: generate one paragraph, very specifically, about why Trump won the 2016 election, 500 words or less. You've already given the prompt, and deciding what you're gonna write about is a critical part of writing. You've often supplied some of the evidence and the guidance and the form, saying 500 words or less. And at that point, I do think that AI is probably a better text generator than almost all humans. But again, AI is still very bad at coming up with ideas for articles.
It is still very bad at reporting. The non-text-generation parts of the role feel further away from automation. Again, I'm sort of never-say-never; maybe you'll get there. I would be totally happy if Claude were able to give me good ideas for my next essays, but it's not there yet. Well, we're already seeing the LLMs make huge progress in genre fiction, right? Recently on the show, we talked to the author of a story in The Times about how authors of romance novels are now able to generate dozens of novels a year using LLMs. In fact, much of the discussion we had was about how you just have to prompt them differently, and relentlessly, in order to get what you want. Your piece, Jasmine, made me wonder how much of getting a model to just write weird can be achieved by repeatedly telling it, in different ways: hey, be a little weirder. Some of it, but not all of it. So I talked to, for example, James Yu, who is the co-founder of Sudowrite, which is one of the earliest creative-fiction AI writing assistants. I talked to some other folks who similarly were in the fiction-writing LLM space. And like you said, to an extent, a lot of writers are already leaning on LLMs to generate large amounts of text, and it can be very successful, and it can meet readers' needs and whatever. But even these people I was talking to were describing to me how freaking hard it is to undo all of the post-training that the labs have done. They are applying immense amounts of engineering effort, and it clearly frustrates them, in my conversations with them, that it is so hard to get these models to stop being so chirpy, so sycophantic, so PG-13 and everything, in order to get them back to this sort of base-model state where they're able to be weird again. So I think it's certainly possible, but I think the labs have made it quite challenging, just because of the way these models are trained. The other thing that I think is important is that I tend to think that writing, and a lot of creative work, is actually the perfect use case for these centaur models, right? The idea that the human-plus-AI collaboration is where you can get the furthest. And when I listened to the interview that you guys did with the fiction authors, I was thinking: this is a centaur model, right? Because without the human prompting and bullying the AI into getting weird and getting sensual and whatever, it was not gonna do that on its own. And I myself do use LLMs as a research assistant. I wrote about that in the Atlantic piece, about the way that Claude has now helped me edit my own work in a way that I found incredibly useful. But I do feel like the collaborative element is important for any domain where the personal perspective, lived experience, whatever, really matters. Talk about that a little bit. You mentioned your editing process. How are you using AI to help you edit your work, and are you finding it useful? Yeah, so I feel like I really cracked this over the last couple of months, which I'm very excited about. Because, again, I've tried to make these things write and edit for me over and over and over, and they've never really been able to do it.
So the thing that I realized was: if I make Claude into an editor that is not just trying to grade and give feedback on my work against some genericized standard of what good writing is, but actually grades it against what my personal, Jasmine's personal, aspirations for writing are, it can give feedback that I find much, much more helpful. So what I did was, basically, I fed Claude my entire Substack archive of the writing that I've previously done, as well as some of my freelance work. And just to get real specific, is this inside a Claude Project, or how have you set this up? Because I know our listeners are gonna want to try this. Yes. I did it in a Project, on Claude's advice. I was like, do I need to Claude Code something? And Claude was like, no, that's overkill. So you don't need to code or anything. So, in a Claude Project, I gave it my whole archive of writing. I also personally write retro notes to myself after everything I publish. So I have a notes app that's just me writing what was good and bad about everything I've ever written, just a few bullet points. This is why Jasmine's gonna be our boss. I mean, these are very low-quality bullet points. But I also gave it that, because I wanted it to learn my taste. I wanted it to learn what I aspire to be, where I see myself falling short, and what I am proud of, right? And so from those two things, plus a little bit more information, like, here's my audience, this is my beat, these are my goals, we were able to co-develop a rubric where, instead of, how many exclamation marks does it have, it would say things like: does this take advantage of your, quote unquote, insider-anthropologist position in Silicon Valley? Because that's one of the things that Claude and I think distinguish my voice. Or it'll notice: oh, Jasmine, you tend to move between registers. You'll switch between startup jargon and internet slang and whatever. And the fact that you can do the high-low, or move from policy to a personal scene, is something that is characteristic of your writing. So again, we're co-developing these qualitative criteria. And then I split it into phases: an ideation phase, a structure rubric, a prose rubric, final fact-checking. And so what I do now: I put this all in a Claude Project, and I said, your job is to evaluate my drafts based on these criteria, but not to do the writing for me, and to make sure to prompt out of me what I can do better. I dump a draft into Claude. Claude will run, like, phase two, structure, on it. It'll say things like: your conclusion is just a summary, and this is really boring. In fact, in your piece about this and that, you actually ended on a scene, and I thought that was much more powerful. So why don't you try ending this one on a scene? And rather than inventing a scene, Claude will say: what were you thinking when the plane took off? What were you feeling inside? Can you think of a scenario where you had a conversation with, say, a kids'-safety advocate about AI that really resonated with you? Because right now it sounds like a dry policy explainer. And that feedback I actually found incredibly useful. I'm still applying my own judgment to say whether I take it or not. But this is about me becoming the best version of myself as a writer. It's about me self-improving, and Claude pushing me to do that, which I found much, much more helpful.
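For readers who want to try something similar outside the Claude app: Jasmine did all of this in a Claude Project, with no code. A rough sketch of the same idea against the Anthropic API might look like the following; the file names, rubric wording, and model choice are illustrative placeholders, not her actual setup.

```python
# A rough approximation of Jasmine's editor setup, using the Anthropic
# Python SDK (pip install anthropic). She used a Claude Project in the
# web app; this just shows one way to reproduce the idea in code.
# File names, rubric text, and model id are placeholder assumptions.
import pathlib
import anthropic

archive = pathlib.Path("substack_archive.txt").read_text()  # past published work
retros = pathlib.Path("retro_notes.txt").read_text()        # self-critique notes

system_prompt = f"""You are my editor. Evaluate my drafts against MY aspirations,
not a generic standard of good writing. Do not rewrite anything for me.
Instead, ask questions that prompt better material out of me.

My past work, for voice and taste:
{archive}

My retro notes on what worked and what didn't:
{retros}

Run one phase at a time: ideation, structure rubric, prose rubric, fact-check."""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
draft = pathlib.Path("draft.md").read_text()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; use whatever model you have
    max_tokens=2000,
    system=system_prompt,
    messages=[{"role": "user", "content": f"Phase: structure rubric.\n\n{draft}"}],
)
print(response.content[0].text)
```

The key design choice mirrors hers: the archive, the retro notes, and the instruction to ask questions rather than rewrite all live in the system prompt, so the model grades against her taste instead of a generic standard.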
I want to ask you both a question, as fellow writers. Do you feel the impulse to make your writing weirder because of AI, to stand out from the sea of slop? Because I find myself feeling this tug of, oh, that's a little weird aside that I probably should cut, but I think I'm gonna leave it in, because Claude would never do that, right? It's a marker that I am typing these words, and I feel like that's sort of my imprimatur that I'm leaving. My answer to you is yes, I absolutely feel that way. And I've gone back and tried to edit sentences to make them feel a little bit more weird, or, in particular, to make them sound colloquial in a way that I know an LLM generally would not. And yes, it is for that reason. I think that, right now, many of us are on such high alert for the prospect that we might be reading slop that, if you're a writer who does not want to be producing slop, you should be asking yourself that question. I think it makes me a lot more comfortable writing the way I want to write in the first place. Maybe unlike both of you, I didn't come up through newsrooms, where I would have been learning a very specific house style and all of these norms. I can do news writing now. It's something I've learned. But I'm actually much more, quote unquote, internet- and blogging-native, which is a form that is voicy and irreverent and not as pristine, and will make inappropriate jokes. It's just a looser form of writing. And so I think what this has actually done is made me more comfortable doing the bloggy thing, instead of always trying to write in a more professionalized journalistic tone. So I think we should leave this with a question for you, Jasmine. Your piece makes the case very convincingly that today's AIs are not very good at the kind of writing that I think we all value. Do you think they will get there? And what should the companies do to make their models better at writing? I think that if we separate out text generation from reporting, which I am not that bullish on the models doing, and we are just talking about, say, literary fiction, or, here's a bunch of interview transcripts, write a magazine feature or something, I think that if they applied as many resources toward that task as they do toward coding agents and the things that actually make the money, they could get there. Will the companies ever find it financially advisable to spend all their resources on that instead of automating 23-year-old software engineers? Probably not. I would be grateful for that world. I don't need them to take my job, or these folks' jobs. But I think it's possible. Look, they're gonna get around to it eventually, okay? I hear that. Have you seen what writers make in this economy, Casey? That's not gonna pay for a lot of data centers. No, there is economic value in writing, and eventually the AI companies will want all of it to themselves. You know what would be a very funny outcome of this, taking your point about the guardrails of the models? Maybe the next great American novel will be written by Grok. Oh, God. And with that, Jasmine Sun, thank you for joining us. Thank you. Thank you very much, Kevin and Casey. When we come back: everyone's spending money on tokens, Kevin.
Great, keep going. You've gotta be token. Tokenmaxxing, that is. Keep going. Yes, and? When we come back: what are you talking about? That's the question being asked by a leaderboard that's sweeping Silicon Valley. The sweeping! It's really sweeping. In The New York Times. How does it feel to see your name in print again? Feels great. Hasn't happened yet, but when it does, it'll be great. Well, I got to take an early read of a story that you are publishing about the fact that tech companies have now created leaderboards to show which employees are using the most AI tokens in their work. Yes, it's a token frenzy out there, and the employees of these companies are competing among their colleagues, sort of informally and sort of for fun, but they're taking it very seriously. They want to be the people at their company who are using the most AI tokens. So let me just ask a basic question, for listeners who may not be familiar. What is a token? And why is that something you might start keeping track of? So, a token is the basic atomic unit of AI labor. It's basically a fragment of a word, and it is how AI model providers measure their consumption. So if you type in a prompt, you know, help me write this essay, an old model might have given you a couple hundred tokens in response. That would be a couple hundred words. And what has been happening over the past year or so, as these agentic coding tools have started taking off, is that the models are just much more token-hungry. You can now use hundreds of thousands or even millions of tokens in a single session. And so that is what is propelling these leaderboards: the idea that the more coding you're doing, the more agentic tools you're using, the more simultaneous processes you're running, the higher your token count will be. One measurement I found useful was that it apparently takes about 10,000 tokens to generate 7,500 words, if that helps to ground you at all. But as you just said, and I want to hear more about this, the more advanced systems are using way more tokens than that. So tell me about some of the numbers that these token all-stars are putting up on the boards. So, I don't know all of the exact numbers, but I did learn that at OpenAI, where they do track this kind of leaderboard, the highest employee token count over a seven-day period recently was a guy who used 210 billion tokens.
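For a hands-on sense of what is being counted, here's a minimal sketch using OpenAI's open-source tiktoken tokenizer. Other labs tokenize differently, and leaderboard totals include things like cached tokens, so treat this purely as a feel for the units.

```python
# Minimal illustration of tokens, using OpenAI's open-source tiktoken
# library (pip install tiktoken). Tokenizers vary by lab and model, so
# the exact counts here are illustrative, not what any leaderboard shows.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "A token is the basic atomic unit of AI labor."
tokens = enc.encode(text)
print(f"{len(text.split())} words -> {len(tokens)} tokens")

# The column's rule of thumb: ~10,000 tokens per 7,500 words of English,
# i.e. roughly 0.75 words per token.
words_per_token = 7_500 / 10_000
weekly_record = 210e9  # the 210-billion-token week mentioned above
print(f"That week is on the order of {weekly_record * words_per_token:,.0f} words")
```

Run the arithmetic and that record week works out to roughly 150 billion words, which squares with the Wikipedia-scale comparison that follows.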
And for rough scale, that's about 33 Wikipedias' worth of text. Now, not all of that is typing a prompt and receiving a response; some of that is what they call cached tokens. So it's not all being extruded from the model for the first time. But these are the kinds of numbers that I think even a year ago would have sounded completely insane. Right. Now, was this guy working on a new mass domestic surveillance program for the Department of Defense? I don't know, and OpenAI did not make him available for interviews. But what I wanted to do in writing this column was to call up a bunch of people who are in this sort of billion-token club, the extreme power users, and just ask them: hey, how are you guys using all those tokens? And isn't that very expensive? And how are you paying for it all? And I learned a lot. Yeah, well, okay, so tell us, first of all, just how expensive it is. Very expensive. In fact, I heard that the top individual user of Claude Code, as measured by Anthropic, spent more than $150,000 on tokens last month. So extrapolate that. That is like an employee making more than a million dollars a year. Yeah. And they are burning that in a month. And I heard similar figures from some of these other extreme coders, who are spending something on the order of thousands of dollars a day on tokens from these models. Now, we should also say: the employees of the AI companies get their tokens for free, right? So they're not shelling out for this themselves. But at other companies, this is starting to become an issue, because these engineers are outstripping their budgets for these things. So there are companies where there are engineers who legitimately are costing their employers maybe $150,000 a month, because they're getting tokens from one of the big providers? Yeah, I talked to a software engineer in Sweden who said that he probably spends more than his salary on Claude. So this is essentially becoming a very expensive job perk for some of these coders. So talk to me about why employers want to create leaderboards to promote this to employees. Because I could see other companies saying, if you spent $150,000 on tokens last month, you actually don't work at this company anymore. Because we're bankrupt. Right. So this was a big question that I had: why is this going on? And it seems to be some combination of employee motivation and worker tracking, right? There are executives at these companies who think that the more tokens you use, the more productive you probably are. And as we discussed in a previous segment on this show, these companies are very eager to have their workers start embracing the AI tools. And so at a number of these companies, I talked to people who said, yeah, this is basically them trying to see who is really all in on the new way of programming. And you've talked to a number of people who are ranking high on these leaderboards. I realize you probably haven't dug deep into their code, but what is your sense of how productive they actually are? What is the relationship between token usage and taking my company to the next level? I mean, it's very unclear, right? Some of these people may just be generating worthless projects.
I think the thing that worries a lot of the people I talked to about these leaderboards is that they just incentivize you to run up your token count. Right. Yes. Because then you look like the special 10x engineer, or 100x engineer, who's outperforming all your colleagues. So I think there are a number of companies that see this leaderboard business as a little strange and maybe counterproductive, but I do think that there is a feeling among the heaviest token users that they are being productive. Yeah, I have to say, when I read your column, I thought, this just seems like it would create the worst incentives. Right. There's this idea of Goodhart's law, right? When a measure becomes a target, it ceases to be a good measure. I can't think of a better way to ensure that token usage becomes a bad measure than creating a leaderboard for it. Totally. What are the people inside the companies saying about that? Well, some of them are opposed to this whole leaderboard thing. I also talked with some folks who defended the leaderboards. They said, look, it's never been all that easy to track the productivity of programmers. Some people have had their productivity measured by how many lines of code they generated, or how many pull requests they made. These are imperfect proxies for how hard are you working, how much are you doing. But the employees of these companies also see this, I think wisely, as a key to their own success. A number of these companies are now using AI token use and consumption as part of the performance review cycle. So you go in for your annual review, and your boss says, hey, it looks like you only used, you know, 70 million tokens last month. What's going on? And so I think the engineers at these companies are getting wise to the fact that, if they want to have a long, successful career, they had better start using some tokens. Yeah, but I imagine that some of them are really nervous about that, right? Because it seems clear to me that at least some of these companies want to incentivize token usage because the companies themselves suspect that the more we can get them using this stuff, the less time we will have to employ the humans. Maybe. Although I think it's less about the AI systems replacing the humans and more about it just being a radically different way of working, right? These are people, most of whom have had long careers in software engineering. They grew up writing code by hand. Maybe they grew up using some sort of AI assistant, like GitHub Copilot. And what people at these companies are saying is that these agentic engineering systems are just really different. You have to approach them in a different way. You have to spend a lot of time with them to understand what they're good and not good at. And to them, this is a way of motivating their employees: hey, go out and try the new thing. Yeah. I don't know. I've been thinking a lot about this question of, if I were an engineer at one of these companies and I had this incentive to get on the leaderboard, how would I approach it? And I do think that, with the instinct to just waste a bunch of tokens to rise higher on the leaderboard, ultimately, if you rise too high, people are gonna ask you what you did with all the tokens.
Yeah, and I actually did talk to one person who speculated that the people at the top of the leaderboards are all doing side projects. They're starting their side hustles. They start a new company with the boss's money. And if you're doing that, I just want to say: I salute you. That is the right way to work. Yeah, maybe don't be number one on the leaderboard if you're doing that. Maybe try to stick around six or seven. Yeah, middle of the pack is kind of where you want to aim yourself. I mean, let me ask: is there any kind of token tracking that you think offers a reasonable signal? Like, do you think that if you're a tech company, you should create a leaderboard? No, I think that's a bad idea, for all the reasons that we just talked about, including Goodhart's law. I think this is just going to lead to people wasting tokens and doing side projects. But if I'm the budget manager at a company and I'm seeing that people are spending multiples of their salary on AI tokens, I'm asking them some questions about what they're doing with all of that. And if their answer is not "I built an amazing new product that's gonna generate billions of dollars a year in revenue," I'm going to say, hey, could you maybe use a little less next month? Yeah. I have to say, I've been struck by how this idea of the token leaderboard just represents a new incarnation of something the software industry has been trying to figure out for a long time, which is: how can I tell if my software engineers are productive? You know, I was talking recently about your column with this very handsome software engineer who I'm engaged to. And he was telling me that he used to be evaluated on how many lines of code he contributed, and he told me about all the games people used to play back in the day: oh, you know, I wrote a quick algorithm to translate a bunch of stuff into some new languages, and it's completely worthless, but it makes me look like I had a very productive week. And so I went back and looked into this, and they were doing this in the sixties and seventies. And there's this saying that eventually arises from the early days of computer programming: quote, measuring programming progress by lines of code is like measuring aircraft building progress by weight. And I have to say, I think the same thing kind of applies here, right? Yes, if you squint, at the right level of abstraction, it's probably true that some people who are using a lot of tokens are more productive than some people who aren't. It just doesn't quite seem like the right way to measure these things. And I just wonder how quickly the industry is going to figure that out. Yeah, I think it's going to be pretty soon, in part because the budgets are just getting very ridiculous, and especially because the AI model providers are now seeing individual users consuming amounts of their services that entire companies would have consumed just a few months ago. You know, maybe the last question I have for you about this is just: what implications do you think it has for the broader economy, right?
Because we know that in so many different sectors of the economy, managers are saying, I want to incentivize my employees to use AI, and I want to track how they're using AI. So do you think that as knowledge of these leaderboards spreads, we're going to see people in non-technical fields try to adopt their own version of them? I hope not. I think it's really a bad move, not just for tracking actual productivity and output, but just for morale, right? Like, I remember years ago when Gawker would have a traffic leaderboard at their office, so you could see how many clicks your stories were getting relative to other people. And I don't think anyone who worked there at the time thought that it was incentivizing the right things or creating high morale among employees. Basically, everyone was just competing with each other all the time. And I think in this case it's even worse, because it's not necessarily even correlated with any success. It's just pure, you know, how many agents can you run in a parallel swarm to work 24/7 doing tasks of uncertain value? Which is a great question to ask on a first date in San Francisco, too, by the way. But anyways, I have to say I worry that this idea of tokenmaxxing is going to spread into the broader economy. I was talking with somebody who works in marketing this week, and she was telling me that her job used to be evaluated solely on creativity. And then recently the performance review got a new AI section, and everyone is being evaluated on how much AI they use. And from her perspective, she was kind of like: this was working fine. I didn't need an AI tool to help me. But now, you know, my bonus might be based on how much of it I use.
