What We’re Reading (Week Ending 29 December 2024)

The best articles we’ve read in recent times on a wide range of topics, including investing, business, and the world in general.

We’ve constantly been sharing a list of our recent reads in our weekly emails for The Good Investors.

Do subscribe for our weekly updates through the orange box in the blog (it’s on the side if you’re using a computer, and all the way at the bottom if you’re using mobile) – it’s free!

But since our readership for The Good Investors is wider than our subscriber base, we think sharing the reading list regularly on the blog itself can benefit even more people. The articles we share touch on a wide range of topics, including investing, business, and the world in general.

Here are the articles for the week ending 29 December 2024:

1. Quantum Computers Cross Critical Error Threshold – Ben Brubaker

In the 1990s, researchers worked out the theoretical foundations for a way to overcome these errors, called quantum error correction. The key idea was to coax a cluster of physical qubits to work together as a single high-quality “logical qubit.” The computer would then use many such logical qubits to perform calculations. They’d make that perfect machine by transmuting many faulty components into fewer reliable ones…

…This computational alchemy has its limits. If the physical qubits are too failure-prone, error correction is counterproductive — adding more physical qubits will make the logical qubits worse, not better. But if the error rate goes below a specific threshold, the balance tips: The more physical qubits you add, the more resilient each logical qubit becomes.

Now, in a paper published today in Nature, Newman and his colleagues at Google Quantum AI have finally crossed the threshold. They transformed a group of physical qubits into a single logical qubit, then showed that as they added more physical qubits to the group, the logical qubit’s error rate dropped sharply…

…At first, many researchers thought quantum error correction would be impossible. They were proved wrong in the mid-1990s, when researchers devised simple examples of quantum error-correcting codes. But that only changed the prognosis from hopeless to daunting.

When researchers worked out the details, they realized they’d have to get the error rate for every operation on physical qubits below 0.01% — only one in 10,000 could go wrong. And that would just get them to the threshold. They would actually need to go well beyond that — otherwise, the logical qubits’ error rates would decrease excruciatingly slowly as more physical qubits were added, and error correction would never work in practice…

…That variation, called the surface code, is based on two overlapping grids of physical qubits. The ones in the first grid are “data” qubits. These collectively encode a single logical qubit. Those in the second are “measurement” qubits. These allow researchers to snoop for errors indirectly, without disturbing the computation.

This is a lot of qubits. But the surface code has other advantages. Its error-checking scheme is much simpler than those of competing quantum codes. It also only involves interactions between neighboring qubits — the feature that Preskill found so appealing.

In the years that followed, Kitaev, Preskill and a handful of colleagues fleshed out the details of the surface code. In 2006, two researchers showed that an optimized version of the code had an error threshold around 1%, 100 times higher than the thresholds of earlier quantum codes. These error rates were still out of reach for the rudimentary qubits of the mid-2000s, but they no longer seemed so unattainable…

…Fowler, Martinis and two other researchers wrote a 50-page paper that outlined a practical implementation of the surface code. They estimated that with enough clever engineering, they’d eventually be able to reduce the error rates of their physical qubits to 0.1%, far below the surface-code threshold. Then in principle they could scale up the size of the grid to reduce the error rate of the logical qubits to an arbitrarily low level. It was a blueprint for a full-scale quantum computer…

…When you put the theory of quantum computing into practice, the first step is perhaps the most consequential: What hardware do you use? Many different physical systems can serve as qubits, and each has different strengths and weaknesses. Martinis and his colleagues specialized in so-called superconducting qubits, which are tiny electrical circuits made of superconducting metal on silicon chips. A single chip can host many qubits arranged in a grid — precisely the layout the surface code demands.

The Google Quantum AI team spent years improving their qubit design and fabrication procedures, scaling up from a handful of qubits to dozens, and honing their ability to manipulate many qubits at once. In 2021, they were finally ready to try error correction with the surface code for the first time. They knew they could build individual physical qubits with error rates below the surface-code threshold. But they had to see if those qubits could work together to make a logical qubit that was better than the sum of its parts. Specifically, they needed to show that as they scaled up the code — by using a larger patch of the physical-qubit grid to encode the logical qubit — the error rate would get lower.

They started with the smallest possible surface code, called a “distance-3” code, which uses a 3-by-3 grid of physical qubits to encode one logical qubit (plus another eight qubits for measurement, for a total of 17). Then they took one step up, to a distance-5 surface code, which has 49 total qubits. (Only odd code distances are useful.)
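
As a quick aside on those qubit counts (our own illustration, not from the article): a distance-d surface code uses d² data qubits plus d² - 1 measurement qubits, which reproduces the 17, 49, and 97 totals quoted in the piece. A minimal sketch:

```python
# Total qubits in a distance-d surface code: d*d data qubits plus d*d - 1
# measurement qubits, matching the 17 / 49 / 97 figures in the article.
def surface_code_qubits(distance: int) -> int:
    assert distance % 2 == 1, "only odd code distances are useful"
    data_qubits = distance ** 2
    measurement_qubits = distance ** 2 - 1
    return data_qubits + measurement_qubits  # = 2*d^2 - 1

for d in (3, 5, 7):
    print(f"distance-{d}: {surface_code_qubits(d)} total qubits")
# distance-3: 17, distance-5: 49, distance-7: 97
```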

In a 2023 paper, the team reported that the error rate of the distance-5 code was ever so slightly lower than that of the distance-3 code. It was an encouraging result, but inconclusive — they couldn’t declare victory just yet…

…At the beginning of 2024, they had a brand-new 72-qubit chip, code-named Willow, to test out. They spent a few weeks setting up all the equipment needed to measure and manipulate qubits…

…Then a graph popped up on the screen. The error rate for the distance-5 code wasn’t marginally lower than that of the distance-3 code. It was down by 40%. Over the following months, the team improved that number to 50%: One step up in code distance cut the logical qubit’s error rate in half…

…The team also wanted to see what would happen when they continued to scale up. But a distance-7 code would need 97 total qubits, more than the total number on their chip. In August, a new batch of 105-qubit Willow chips came out…

…When the group returned the following morning, they saw that going from a distance-5 to a distance-7 code had once again cut the logical qubit’s error rate in half. This kind of exponential scaling — where the error rate drops by the same factor with each step up in code distance — is precisely what the theory predicts. It was an unambiguous sign that they’d reduced the physical qubits’ error rates well below the surface-code threshold…
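
To make that exponential scaling concrete (again our own sketch, not data from the paper): if each step up in code distance divides the logical error rate by a constant factor (roughly 2 in the result described above), the error rate falls geometrically with distance. The distance-3 starting rate below is purely hypothetical.

```python
# Exponential error suppression: each step up in code distance (d -> d + 2)
# divides the logical error rate by a constant factor. The factor of 2 mirrors
# the halving described above; the distance-3 starting rate is hypothetical.
suppression_per_step = 2.0
logical_error = 1e-2  # hypothetical logical error rate at distance 3

for distance in range(3, 16, 2):
    print(f"distance-{distance:2d}: logical error ~ {logical_error:.1e}")
    logical_error /= suppression_per_step
```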

…At the same time, researchers recognize that they still have a long way to go. The Google Quantum AI team only demonstrated error correction using a single logical qubit. Adding interactions between multiple logical qubits will introduce new experimental challenges.

Then there’s the matter of scaling up. To get the error rates low enough to do useful quantum computations, researchers will need to further improve their physical qubits. They’ll also need to make logical qubits out of something much larger than a distance-7 code. Finally, they’ll need to combine thousands of these logical qubits — more than a million physical qubits.

2. History: Kodak & Fujifilm – Find Value

Ultimately, Kodak couldn’t adapt to the changing world and filed for bankruptcy in 2012.

In the game for over 100 years, Kodak survived two World Wars and the Great Depression and helped humans photograph the moon and Mars. Like Coca-Cola and McDonald’s, it used to be one of the most recognized brands in the world…

…Faced with a sharp decline in sales from its cash-cow product, Fujifilm acted swiftly and changed its business through innovation and external growth. Under Shigetaka Komori (who became president in 2000), Fujifilm quickly carried out massive reforms. In 2004, Komori came up with a six-year plan called VISION75.

The management restructured its film business by downscaling the production lines and closing redundant facilities. In the meantime, the R&D departments moved to a newly built facility to unify the research efforts and promote better communication and innovation culture among engineers.

Realizing that the digital camera business, with its low margins, would not replace the lucrative film business, Fujifilm performed a massive diversification based on capabilities and innovation.

Even before launching the VISION75 plan, Komori had taken stock of the company’s technologies and compared them with the demands of the international market. The R&D team then came up with a chart listing all the existing in-house technologies that could match future markets.

For instance, Fujifilm was able to predict the boom of LCD screens and invested heavily in this market. Leveraging its photo film technology, it created FUJITAC, a variety of high-performance films essential for making LCD panels for TVs, computers, and smartphones. Today, FUJITAC owns 70% of the market for protective LCD polarizer films.

Fujifilm also targeted unexpected markets like cosmetics. The rationale behind cosmetics comes from 70 years of experience in gelatin, the chief ingredient of photo film, which is derived from collagen. Human skin is 70% collagen. Fujifilm also possessed deep knowledge of oxidation, a process connected both to the aging of human skin and to the fading of photos over time.

When promising technologies didn’t exist internally, Fujifilm proceeded by mergers and acquisitions. Based on technological synergies, it acquired Toyama Chemical in 2008 to enter the drug business. Delving further into the healthcare segment, Fujifilm also bought a radio-pharmaceutical company now called Fujifilm RI Pharma. It also reinforced its position in existing joint ventures such as Fuji Xerox, which became a consolidated subsidiary in 2001 after Fujifilm purchased an additional 25% share in this partnership.

Fast forward nine years after the peak of film sales: by 2010, Fujifilm was a new company. In 2000, 60% of sales and 70% of profits came from the film ecosystem; by 2010, the “Imaging segment” accounted for less than 16% of sales. Fujifilm managed to emerge victorious through a restructuring and diversification strategy…

…Unlike Fujifilm which recognized early on that photography was a doomed business and tackled new markets with a completely different portfolio, Kodak made multiple wrong moves and persisted in the decaying film industry.

It was not that Kodak didn’t want to change; it tried hard, but it went about it the wrong way. Kodak’s management didn’t fully recognize that the rise of digital imaging would have dire consequences for the future of photo printing. It tried to replicate the film print business model in the digital world. In 2004, Facebook launched, and people simply weren’t going to print pictures anymore.

Interestingly, Kodak understood the impact of digitalization and predicted that pictures would be shared online. It acquired a photo-sharing website called Ofoto in 2001. Unfortunately, the company used Ofoto to push people to print digital pictures. It failed to realize that online photo sharing was the new business, not just a way to expand printing sales…

…While Fujifilm invested heavily in the pharmaceutical and healthcare sector to reduce its exposure to the challenging photo industry, Kodak sold its highly profitable Healthcare Imaging branch in 2007 to put more resources into its losing consumer camera division.

3. One Bed, Two Dreams: Building Silicon Valley Bank in China with Ken Wilcox (Transcript here) – Bernard Leong and Ken Wilcox

Wilcox: In the US, banks sometimes fail. When I started my career 40 years ago in banking, we had 18,000 banks. Today we have about 5,000. What happened to all of them? Where did 13,000 banks go? Some of them got acquired, but many of them failed. When a bank makes too many bad loans, the Federal Reserve causes it to fail and it disappears. In China, banks don’t fail. First of all, banks are fundamentally owned by the government and when they make too many bad loans, they don’t typically fail. Usually the government, the regulators, come and somebody gets arrested and the government re-capitalizes the bank. It’s often very quiet – it’s not even necessarily announced to the world – and the bank keeps on going. What does that mean? That means that Chinese banks can take more risk than US banks can. In the US, we had almost no competitors because everybody thought “Lending to technology companies is way too risky, so we’ll just let Silicon Valley Bank do it. None of the rest of us will try.” In China, many, many, many banks want to copy us and do the same thing, because they’re not worried about what happens if we lose too much money. So that’s another big difference there…

…Wilcox: After I’d been there for several months, it occurred to me one day that my main conversation partner, the guy who is the Chairman, who was from Shanghai Pudong Development Bank, it occurred to me that he actually wears three hats. The only hat I wear is banker / businessman. But he had a banker / businessman hat, and he had a party hat, and he had a government hat. Then I started to wonder, when I’m talking with him, which hat is he wearing? It took me a long time before I figured out he doesn’t even think he has three hats. He thinks they’re all the same hat, so he’s not even thinking about it the same way I was. So I think that’s quite confusing. 

It’s also confusing when a US company comes to China and finds out that it’s going to get a Party Committee in its organization. They get very confused because they don’t know what a Party Committee is. If you ask people in government or in the party, “What’s a Party Committee? We’re going to have one, but I don’t understand what it is,” it’s hard for them to explain. You get multiple definitions and then you don’t know what is actually going to happen. Some people will tell me, “When you get a Party Committee, it’ll be so good because all the employees in your organization who are members of the party will have a place to gather once a month and discuss things.” Then somebody else says, “When you get a Party Committee, it’ll be so much easier because the Party Committee will help you put on social events for the employees, all of the employees.” But then somebody else told me, “No, when you get a Party Committee, it’ll be like another board, but a secret board. You won’t know who’s on it and they will influence what the real board does – or what I would call the real board.” Then other people told me, “Don’t pay any attention. That’s all silliness. There is no such thing as a Party Committee.” So it’s very, very confusing…

…Wilcox: I’ll give you the best example and that is that I believe based on the years I spent in China, that ultimately the main reason they wanted us in China – and they actually were very determined to get us to come to China. I remember that early on, a couple of years before my wife and I moved to China, I had a series of meetings with a very high-level government official who’s also got a lot of status in the party. He was saying to me, “Ken, we really want you to bring your bank to China. Your bank is more important than any bank we’ve ever met. You’re more important than – he explicitly said this – he says, You’re more important than Morgan Stanley and more important than Goldman Sachs. And by the way Ken, you’re one of the smartest Americans we’ve met.” So you think to yourself, “Well this is an exaggeration, but it does feel nice.” He obviously is going to help me get established in China. But what I didn’t realize is that the main reason they wanted us in China was so that they could study our business model and figure out how to copy it over time. That was something I wasn’t expecting, but I should have if I were less naive. If I were better prepared, I would have realized that was the intention. So the original title, the working title I had for my book, which I had to change because the publisher didn’t like it, my original title was, “One Bed, Two Dreams”, because that’s a phrase that most Chinese are familiar with. It explains why it didn’t work well, because my dream was working with all these Chinese technology companies and helping them do business with the rest of the world, and their dream was learning our business model.

The result was that when they gave us our license, they also told us that we would not be able to use Chinese currency for three years. That made it almost impossible to do business for the first three years. The people that said these things were both members of the government and members of the party. So I don’t know which one was talking. But they said, “We understand that you won’t be able to do much business for the first three years because the companies that you want to work with all want renminbi, they don’t want US dollars. But you can still be a good citizen. You can do what we would do, and that is we here in China help each other. So you can be helpful and prove that you care about China by teaching other banks your business model during the three years when you can’t really do much business. We’ll give you subsidy to help support you during the three years when you can’t earn much money because you can’t really do any business.” Then at the end of the three years when they gave us permission to use renminbi, they said to us, “We are so happy that you came to China and we really admire your business model and we admire it so much that we’re starting a bank of our own using your business model. Would you mind staying a little longer and being an advisor to this new bank that’s going to use your business model?” It felt like they were stealing my intellectual property but I’m not sure they thought of it that way…

…Wilcox: General Motors when it went over to China in 1985, the Chinese really didn’t have an auto industry. They wanted General Motors there not because they wanted General Motors to make a lot of money. It was because they wanted to learn about automobile manufacturing and because it took so long to build up the knowledge base, General Motors was welcome for about 30 years. But now General Motors is slowly losing market share and it’s probably going to withdraw from China. Then what will happen is China has made so much progress partially because they’re hardworking and smart, partially because they had General Motors there to learn from them, and then once General Motors retracts and goes back to the US, the auto industry in China will begin exporting and competing globally. I think actually the Chinese have done such a good job of first of all, learning from foreign automakers, but then on top of that, taking it further that the foreign automakers are in huge trouble. I think China’s automobile industry will dominate in the future. 

4. Weekend thoughts: crypto, mania, and reflexivity follow up – Andrew Walker

When I first saw the “BTC yield” metric, I thought it was pretty crazy. MSTR is trading for approaching 3x the value of their bitcoin; if they issue stock and use all the stock to buy bitcoin, of course it’s going to cause their bitcoin holdings per share to go up…. and even more so if they issue debt and use that to buy bitcoin and then judge themselves on a per share basis! Taken to its extreme, if you thought BTC yield was truly the be all, end all of value creation, and the higher the BTC yield the better, then any company following a pure BTC yield strategy should lever themselves up to the maximum amount possible, no matter the terms, and use all of the proceeds to buy BTC. Obviously no one does that because it would be insanity and eventually banks would stop lending, but I illustrate that only to show that purely maximizing BTC yield is clearly not value maximizing….

But, if you look at the fine print, BTC yield is even crazier than simply suggesting increasing BTC per share is the only value creation metric that matters. If you really look at the MSTR BTC yield table above or read their disclosures, you’ll notice that the BTC yield assumes that all of their convertible debt converts…

…So, go back to MSTR’s BTC yield table; they have a set of 2029 converts that convert at $672.40/share. Those are far, far out of the money (MSTR’s stock trades for ~$400/share as I write this)…. yet MSTR’s BTC yield assumes those converts are in the money / will convert for their BTC yield.

That is an insane assumption that casually assumes MSTR’s shares almost double. And, again, by taking this assumption to its extreme, we can see how wild it is. Like all things, convert debt involves different trade-offs; for example, you could get a higher strike price by taking on a higher interest rate (i.e. if your strike price is ~$670 at a 0% interest rate, you could probably push it up to $770 by taking on a 3% interest rate or $870 by taking on a 6% interest rate). MSTR has issued all of these convert debt deals at 0% interest rates, which is a great pitch (“we’re borrowing for free, we don’t have to pay a carry to buy BTC, etc”)…. but if BTC yield is all that matters, MSTR could start issuing convertible debt with really high interest rates, which would jack the strike price of the convert up, thus decreasing dilution and increasing the BTC yield…

…MSTR fans would say “but raising converts with interest doesn’t make sense; it’s no longer free money / now it has a carry cost.” And I understand that argument…. but convertible debt isn’t free money either, and I just do this to highlight how insane BTC yield is as a be all / end all metric!…
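
To see mechanically why the conversion assumption flatters the metric, here is a toy calculation of our own (every number is hypothetical except the ~$672.40 conversion price quoted above): “BTC yield” tracks bitcoin per assumed-diluted share, so booking a far out-of-the-money convert as shares issued at $672.40 produces a large “yield”, while the same raise produces none if you net the debt against the bitcoin it bought.

```python
# Toy sketch of the "BTC yield" conversion assumption. All inputs are
# hypothetical except the ~$672.40 conversion price from the post.
shares_outstanding = 100_000_000   # hypothetical
btc_held = 50_000                  # hypothetical
btc_price = 100_000                # hypothetical USD per BTC
convert_raise = 3_000_000_000      # hypothetical 0% convertible raise, USD
conversion_price = 672.40          # from the post (2029 converts)

btc_bought = convert_raise / btc_price
before = btc_held / shares_outstanding

# "Assume conversion" view: the debt is treated as shares issued at $672.40.
diluted_shares = shares_outstanding + convert_raise / conversion_price
assumed = (btc_held + btc_bought) / diluted_shares

# Conservative view: the convert stays debt that may have to be repaid by
# selling bitcoin, so net the liability against the bitcoin it bought.
net_of_debt = (btc_held + btc_bought - convert_raise / btc_price) / shares_outstanding

print(f"BTC yield, assuming conversion: {assumed / before - 1:+.1%}")      # about +53%
print(f"BTC yield, netting out debt:    {net_of_debt / before - 1:+.1%}")  # 0.0%
```

Neither view is “right”; the point is only that the headline number leans heavily on treating deeply out-of-the-money converts as equity.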

…The BTC yield that all of these companies present assumes that their convert debt converts, and that is a big / crazy assumption…. but it’s interesting to think about what will happen in five years. There is, of course, a world where BTC goes to $250k (or higher) and all of these stocks moon. In that world, the converts will be well in the money, and all of this worry will sound silly…. but there is also a world where BTC stalls out or drops over the next few years, and that world is really interesting. All of these companies are raising converts with 5-7 year maturities, so if BTC doesn’t moon and the converts aren’t in the money, you’re going to have all of the BTC standard companies facing a maturity wall at the same time. What happens then? I doubt they can roll the converts at anything close to the same terms (remember, cheap converts require high volatility, and if the stocks have stalled out for five years vol is going to be a lot lower), so they’ll either need to sell a ton of equity to pay down the debt (which will be tough; there probably won’t be much enthusiasm for the stock, and I’m not sure the market would be able to absorb the hypothetical amount of stock they’d need to issue without some enthusiasm)…. or you’ll have a wave of BTC standard companies all looking to sell down some of their bitcoin to pay off converts at the exact same time.

5. Satya Nadella | BG2 (Transcript here) – Bill Gurley, Brad Gerstner, and Satya Nadella

Gerstner: Shifting maybe to enterprise AI, Satya. The Microsoft AI business has already been reported to be about $10 billion. You’ve said that it’s all inference and that you’re not actually renting raw GPUs to others to train on, because your inference demand is so high. As we think about this, there’s a lot of skepticism out there in the world as to whether or not major workloads are moving. If you think about the key revenue products that people are using today and how it’s driving that inference revenue for you today, and how that may be similar or different from Amazon or Google, I’d be interested in that.

Nadella: The way for us this thing has played out is, you got to remember most of our training stuff with OpenAI is sort of more investment logic. It’s not in our quarterly results – it’s more in the other income, based on our investment.

Gerstner: Other income or loss right?

Nadella: That is right. That’s how it shows up. So most of the revenue or all the revenue is pretty much our API business or in fact, to your point, ChatGPT’s inference costs are there, so that’s a different piece. The fact is the big-hit apps of this era are ChatGPT, Co-Pilot, GitHub Co-Pilot, and the APIs of OpenAI and Azure OpenAI. In some sense, if you had to list out the 10 most hit apps, these would probably be four or five of them and so therefore that’s the biggest driver.

The advantage we have had, and OpenAI has had, is we’ve had two years of runway pretty much uncontested. To your point, Bill made the point about everybody’s awake and it might be. I don’t think there will be ever again maybe a two-year lead like this, who knows? It’s all you say that and somebody else drops some sample and suddenly blows the world away. But that said, I think it’s unlikely that that type of lead could be established with some foundation model. But we had that advantage, that was the great advantage we’ve had with OpenAI. OpenAI was able to really build out this escape velocity with ChatGPT.

But on the API side, the biggest thing that we were able to gain was.. Take Shopify or Stripe or Spotify. These were not customers of Azure, they were all customers of GCP or they were customers of AWS. So suddenly we got access to many, many more logos, who are all “digital natives” who are using Azure in some shape or fashion and so on. So that’s sort of one. When it comes to the traditional enterprise, I think it’s scaling. Literally it is people are playing with Co-Pilot on one end and then are building agents on the other end using Foundry. But these things are design wins and project wins and they’re slow, but they’re starting to scale. Again, the fact that we’ve had two years of runway on it, I think…

I like that business a lot more, and that’s one of the reasons why the adverse selection problems here would have been lots of tech startups all looking for their H100 allocations in small batches. Having watched what happened to Sun Microsystems in the dotcom, I always worry about that. You just can’t chase everybody building models. In fact, even the investor side, I think the sentiment is changing, which is now people are wanting to be more capital-light and build on top of other people’s models and so on and so forth. If that’s the case, everybody who was looking for H100 will not want to look for it more. So that’s what we’ve been selective on.

Gerstner: You’re saying for the others that training of those models and those model clusters was a much bigger part of their AI revenue versus yours? 

Nadella: I don’t know. This is where I’m speaking for other people’s results. It’s just I go back and say, “What are the other big-hit apps?” I don’t know what they are. What models do they run? Where do they run them? When I look at the DAU numbers of any of these AI products, there is ChatGPT, and then there is – even Gemini, I’m very surprised at the Gemini numbers, obviously I think it’ll grow because of all the inherent distribution. But it’s kind of interesting to say that they’re not that many. In fact, we talk a lot more about AI scale, but there is not that many hit apps. There is ChatGPT, Github Co-Pilot, there’s Co-Pilot, and there’s Gemini. I think those are the four I would say, in a DAU, is there anything else that comes to your mind?…

…Gurley: Satya, on the enterprise side, obviously the coding space is off to the races and you guys are doing well and there’s a lot of venture-backed players there. On some of the productivity apps, I have a question about the Co-Pilot approach and I guess Marc Benioff’s been obnoxiously critical on this front, calling it Clippy 2 or whatever. Do you worry that someone might think first-principles AI from the ground up, and that some of the infrastructure, say in an Excel spreadsheet, isn’t necessary to know if you did an AI-first product? The same thing by the way could be said about the CRM right? There’s a bunch of fields and tasks that may be able to be obfuscated for the user.

Nadella: It’s a very, very, very important question. The SaaS applications or biz apps, let me just speak of our own Dynamics thing. The approach at least we’re taking is, I think the notion that business applications exist, that’s probably where they’ll all collapse in the agent era. Because if you think about it, they are essentially CRUD databases with a bunch of business logic. The business logic is all going to these agents, and these agents are going to be multi-repo CRUD. They’re not going to discriminate between what the back-end is, they’re going to update multiple databases, and all the logic will be in the AI tier so to speak. Once the AI tier becomes the place where all the logic is, then people will start replacing the backends right? In fact it’s interesting, as we speak, I think we are seeing pretty high rates of wins on Dynamics backends and the agent use, and we are going to go pretty aggressively and try and collapse it all, whether it’s in customer service, whether it is in…

By the way, the other fascinating thing that’s increasing is not just CRM, but even what we call finance and operations, because people want a more AI-native biz app. That means the biz app, the logic tier, can be orchestrated by AI and AI agents. So in other words, Co-Pilot to agent to my business application should be very seamless.

Now in the same way, you could even say, “Why do I need Excel?” Interestingly enough, one of the most exciting things for me is Excel with Python, is like GitHub with Co-Pilot. So what we’ve done is, when you have Excel – by the way this would be fun for you guys – which is you should just bring up Excel, bring up Co-Pilot, and start playing with it. Because it’s no longer like – it is like having a data analyst, so it’s no longer just making sense of the numbers that you have. It will do the plan for you. It will literally – like how GitHub Co-Pilot Workspace creates the plan and then it executes the plan – this is like a data analyst who is using Excel as a sort of row/column visualization to do analysis scratch pad. So it kind of tools you. So the Co-Pilot is using Excel as a tool with all of its action space because it can generate and it has a Python interpreter. That is in fact a great way to reconceptualize Excel. At some point you could say, “I’ll generate all of Excel” and that is also true. After all, there’s a code interpreter, so therefore you can generate anything.

So yes, I think there will be disruption. The way we are approaching, at least our M365 stuff is, one is build Co-Pilot as that organizing layer UI for AI, get all agents, including our own agents – you can say Excel is an agent to my Co-Pilot, Word is an agent, they’re kind of specialized canvases, which is I’m doing a legal document, let me take it into Pages and then to Word and then have the Co-Pilot go with it, go into Excel and have the Co-Pilot go with it. That’s sort of a new way to think about the work in workflow…

…Gurley: Satya, there’s been a lot of talk about model scaling and obviously there was talk, historically, about 10x-ing the cluster size that you might do, over and over again, not once and then twice. X.AI is still making noise about going in that direction. There was a podcast recently where they flipped everything on its head and they said, “If we’re not doing that anymore, it’s way better because we can just move on to inference which is getting cheaper and you won’t have to spend all this capex.” I’m curious, those are two views of the same coin. But what’s your view on LLM model scaling and training cost, and where we’re headed in the future?

Nadella: I’m a big believer in scaling laws I’ll first say. In fact, if anything, the bet we placed in 2019 was on scaling laws and I stay on that. In other words, don’t bet against scaling laws. But at the same time, let’s also be grounded on a couple of different things.

One is these exponentials on scaling laws will become harder, just because as the clusters become harder, the distributed computing problem of doing large scale training becomes harder. That’s one side of it. But I would just still say – and I’ll let the OpenAI folks speak for what they’re doing – but they are continuing to – pre-training I think is not over, it continues. But the exciting thing, which again OpenAI has talked about and Sam has talked about, is what they’ve done with o1. This Chain of Thought with autograding is just fantastic. In fact, basically, it is test-time compute or inference-time compute as another scaling law. You have pre-training, and then you have effectively this test-time sampling that then creates the tokens that can go back into pre-training, creating even more powerful models that then are running on your inference. So therefore, that’s I think a fantastic way to increase model capability.

The good news of test-time or inference-time compute is sometimes, running of those o1 models means… There’s two separate things. Sampling is like training, when you’re using it to generate tokens for your pre-training. But also customers, when they are using o1, they’re using more of your meters, so you are getting paid for it. Therefore, there is more of an economic model, so I like it. In fact, that’s where I said I have a good structural position with 60-plus data centers all over the world.

Gurley: It’s a different hardware architecture for one of those scaling versus the other, for the pre-training versus…

Nadella: Exactly. I think the best way to think about it is, it’s a ratio. Going back to Brad’s thing about ROIC, this is where I think you have to really establish a stable state. In fact, whenever I’ve talked to Jensen, I think he’s got it right, which is you want to buy some every year. Think about it, when you depreciate something over 6 years, the best way is what we have always done, which is you buy a little every year and you age it, you age it, you age it. You use the leading node for training and then the next year it goes into inference, and that’s sort of the stable state I think we will get into across the fleet for both utilization and the ROIC and then the demand meets supply.

Basically, to your point about everybody saying, “Have the exponentials stopped?” One of the other things is the economic realities will also stop, right? At some point everybody will look and say, “What’s the economically rational thing to do?” Which is, “Even if I double every year’s capability but I’m not able to sell that inventory,” and the other problem is the Winner’s Curse, which is – you don’t even have to publish a paper, the other folks have to just look at your capability and do either a distillation… It’s like piracy. You can sign off all kinds of terms of use, but like it’s impossible to control distillation. That’s one. Second thing is, you don’t even have to do anything, you just have to reverse engineer that capability and you do it in a more compute-efficient way. So given all this, I think there will be a governor on how much people will chase. Right now a little bit of everybody wants to be first. It’s great, but at some point all the economic reality will set in on everyone and the network effects are at the app layer, so why would I want to spend a lot on some model capability when the network effects are all on the app?…

…Gurley: Does your answer to Brad’s question about the balancing of GPU ROI, does that answer the question as to why you’ve outsourced some of the infrastructure to Coreweave in that partnership that you have?

Nadella: That we did because we all got caught with the hit called ChatGPT. It was impossible. There’s no supply chain planning I could have done. None of us knew what was going to happen. What happened in November of ‘22, that was just a bolt from the blue, therefore we had to catch up. So we said, “We’re not going to worry about too much inefficiency.” That’s why whether it’s Coreweave or many others – we bought all over the place. That is a one time thing and then now it’s all catching up. That was just more about trying to get caught up with demand.

Gerstner: Are you still supply-constrained Satya?

Nadella: Power, yes. I am not chip supply-constrained. We were definitely constrained in ‘24. What we have told the street is that’s why we are optimistic about the first half of ‘25, which is the rest of our fiscal year and then after that I think we’ll be in better shape going into ‘26 and so on. We have good line of sight.


Disclaimer: The Good Investors is the personal investing blog of two simple guys who are passionate about educating Singaporeans about stock market investing. By using this Site, you specifically agree that none of the information provided constitutes financial, investment, or other professional advice. It is only intended to provide education. Speak with a professional before making important decisions about your money, your professional life, or even your personal life. We currently have a vested interest in Alphabet (parent of Google), Amazon, Meta Platforms (parent of Facebook), and Microsoft. Holdings are subject to change at any time.
