We’ve constantly been sharing a list of our recent reads in our weekly emails for The Good Investors.
Do subscribe for our weekly updates through the orange box in the blog (it’s on the side if you’re using a computer, and all the way at the bottom if you’re using mobile) – it’s free!
But since our readership for The Good Investors is wider than our subscriber base, we think sharing the reading list regularly on the blog itself can benefit even more people. The articles we share touch on a wide range of topics, including investing, business, and the world in general.
Here are the articles for the week ending 19 January 2025:
1. OpenAI o3 Breakthrough High Score on ARC-AGI-Pub – François Chollet
OpenAI’s new o3 system – trained on the ARC-AGI-1 Public Training set – has scored a breakthrough 75.7% on the Semi-Private Evaluation set at our stated public leaderboard $10k compute limit. A high-compute (172x) o3 configuration scored 87.5%.
This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models. For context, ARC-AGI-1 took 4 years to go from 0% with GPT-3 in 2020 to 5% in 2024 with GPT-4o. All intuition about AI capabilities will need to get updated for o3…
…The high-efficiency score of 75.7% is within the budget rules of ARC-AGI-Pub (costs <$10k) and therefore qualifies as 1st place on the public leaderboard!
The low-efficiency score of 87.5% is quite expensive, but still shows that performance on novel tasks does improve with increased compute (at least up to this level).
Despite the significant cost per task, these numbers aren’t just the result of applying brute force compute to the benchmark. OpenAI’s new o3 model represents a significant leap forward in AI’s ability to adapt to novel tasks. This is not merely incremental improvement, but a genuine breakthrough, marking a qualitative shift in AI capabilities compared to the prior limitations of LLMs. o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance in the ARC-AGI domain.
Of course, such generality comes at a steep cost, and wouldn’t quite be economical yet: you could pay a human to solve ARC-AGI tasks for roughly $5 per task (we know, we did that), while consuming mere cents in energy. Meanwhile o3 requires $17-20 per task in the low-compute mode. But cost-performance will likely improve quite dramatically over the next few months and years, so you should plan for these capabilities to become competitive with human work within a fairly short timeline.
o3’s improvement over the GPT series proves that architecture is everything. You couldn’t throw more compute at GPT-4 and get these results. Simply scaling up the things we were doing from 2019 to 2023 – take the same architecture, train a bigger version on more data – is not enough. Further progress is about new ideas…
…Passing ARC-AGI does not equate to achieving AGI, and, as a matter of fact, I don’t think o3 is AGI yet. o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence.
Furthermore, early data points suggest that the upcoming ARC-AGI-2 benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute (while a smart human would still be able to score over 95% with no training). This demonstrates the continued possibility of creating challenging, unsaturated benchmarks without having to rely on expert domain knowledge. You’ll know AGI is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible…
…To adapt to novelty, you need two things. First, you need knowledge – a set of reusable functions or programs to draw upon. LLMs have more than enough of that. Second, you need the ability to recombine these functions into a brand new program when facing a new task – a program that models the task at hand. Program synthesis. LLMs have long lacked this feature. The o series of models fixes that.
For now, we can only speculate about the exact specifics of how o3 works. But o3’s core mechanism appears to be natural language program search and execution within token space – at test time, the model searches over the space of possible Chains of Thought (CoTs) describing the steps required to solve the task, in a fashion perhaps not too dissimilar to AlphaZero-style Monte-Carlo tree search. In the case of o3, the search is presumably guided by some kind of evaluator model. To note, Demis Hassabis hinted back in a June 2023 interview that DeepMind had been researching this very idea – this line of work has been a long time coming.
So while single-generation LLMs struggle with novelty, o3 overcomes this by generating and executing its own programs, where the program itself (the CoT) becomes the artifact of knowledge recombination. Although this is not the only viable approach to test-time knowledge recombination (you could also do test-time training, or search in latent space), it represents the current state-of-the-art as per these new ARC-AGI numbers.
Effectively, o3 represents a form of deep learning-guided program search. The model does test-time search over a space of “programs” (in this case, natural language programs – the space of CoTs that describe the steps to solve the task at hand), guided by a deep learning prior (the base LLM). The reason why solving a single ARC-AGI task can end up taking up tens of millions of tokens and cost thousands of dollars is because this search process has to explore an enormous number of paths through program space – including backtracking.
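To make "deep learning-guided program search" a little more concrete, here is a toy best-first search in Python. It only illustrates the general technique Chollet describes. It is not how o3 actually works, which he says is speculation. The "programs" are short sequences of hypothetical arithmetic steps rather than chains of thought. A hand-written `score` heuristic stands in for both the base LLM and the evaluator model. All names here (`STEPS`, `score`, `guided_search`) are made up for the example.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical primitive steps; a real system would search over CoT steps instead.
STEPS: Dict[str, Callable[[int], int]] = {
    "+1": lambda x: x + 1,
    "*2": lambda x: x * 2,
    "-3": lambda x: x - 3,
}


def score(value: int, target: int, length: int) -> int:
    """Toy stand-in for an evaluator model: lower means more promising.

    Distance to the goal plus program length, so long detours are penalised.
    """
    return abs(target - value) + length


def guided_search(start: int, target: int, max_len: int = 8) -> Tuple[Optional[List[str]], int]:
    """Best-first search over step sequences; returns (program, nodes_expanded)."""
    # Frontier entries: (score, tie_breaker, current_value, program_so_far).
    frontier: List[Tuple[int, int, int, List[str]]] = [(score(start, target, 0), 0, start, [])]
    seen = {start}  # values already queued, a simple form of pruning
    tie_breaker = 1
    expanded = 0
    while frontier:
        _, _, value, program = heapq.heappop(frontier)
        expanded += 1
        if value == target:
            return program, expanded
        if len(program) >= max_len:
            continue  # abandon this branch; the heap hands us the next-best one (backtracking)
        for name, step in STEPS.items():
            nxt = step(value)
            if nxt in seen:
                continue
            seen.add(nxt)
            new_program = program + [name]
            heapq.heappush(frontier, (score(nxt, target, len(new_program)), tie_breaker, nxt, new_program))
            tie_breaker += 1
    return None, expanded


program, expanded = guided_search(start=2, target=11)
print(program, expanded)
```

Swapping `score` for a learned model and `STEPS` for model-proposed reasoning steps gives roughly the shape of the idea. The cost Chollet mentions comes from how quickly the frontier grows when every node has many possible next steps.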
2. Energy Cheat Sheet – Brian Potter
Most energy we consume gets wasted. Of the 93.6 quads (~27,400 TWh) the US consumed in 2023, only around 1/3rd of that went towards producing useful work. The rest was lost due to various inefficiencies, such as heat engine and transmission losses…
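The conversion of 93.6 quads to about 27,400 TWh is easy to check. A quad is 10^15 British thermal units (Btu). The sketch below assumes the International Table Btu (about 1,055 J) and uses 1 TWh = 3.6 × 10^15 J. The "one third useful" split is the article's rough figure, not something the code derives.

```python
# Unit constants (International Table Btu assumed).
JOULES_PER_BTU = 1055.05585
BTU_PER_QUAD = 1e15
JOULES_PER_TWH = 3.6e15  # 1e12 Wh * 3,600 J per Wh


def quads_to_twh(quads: float) -> float:
    """Convert quadrillion Btu (quads) to terawatt-hours."""
    return quads * BTU_PER_QUAD * JOULES_PER_BTU / JOULES_PER_TWH


us_2023_twh = quads_to_twh(93.6)
useful_twh = us_2023_twh / 3  # "only around 1/3rd" went to useful work (rough figure)
print(f"{us_2023_twh:,.0f} TWh total, roughly {useful_twh:,.0f} TWh useful")
```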
…Another obvious fact is that despite the burgeoning construction of renewable energy infrastructure, the majority of our energy still comes from burning hydrocarbons. Petroleum, coal, and natural gas combined are responsible for roughly 82% of total energy consumption in the US.
Related to this fact is that electricity generation is a relatively small fraction of our energy system: roughly ⅓ of energy inputs go towards generating electricity. For residential and commercial consumption, only around half of energy use comes from electricity. For industrial and transportation energy (the two largest consuming sectors), electricity accounts for around 13% and less than 0.1%, respectively.
What this chart makes clear, but also sort of abstracts away, is the enormous amount of infrastructure we’ve built for moving around hydrocarbons. The US has close to 1 million oil and natural gas wells, 3 million miles of natural gas pipeline, 145,000 gas stations, and capacity to refine 18.4 million barrels of oil a day.
This is why environmental advocates often focus on electrifying everything: decarbonizing energy infrastructure requires much more than just building low-carbon sources of energy like solar panels and wind turbines — it requires fundamentally reworking how our society moves energy around. It’s also why eliminating roadblocks and bottlenecks to energy infrastructure construction is so important.
We can also dive deeper and look at a sector-by-sector breakdown of energy use. The residential sector uses around 11.5 quads (3370 TWh) of energy, a little over 12% of total US energy consumption…
…One major takeaway here is that most residential energy consumption goes into heating things up: Space heating (5.74 quads), water heating (1.69 quads), and clothes dryers (0.26 quads) together account for ⅔ of residential energy consumption. You sometimes see air conditioners decried as wasteful by energy-minded environmentalists, but air conditioning is a much smaller share of energy consumption than heating…
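Both the "⅔" heating share and the earlier "a little over 12%" residential share follow directly from the quoted figures. This quick Python check redoes that arithmetic using only numbers from the excerpt:

```python
# Residential end uses quoted above, in quads.
RESIDENTIAL_TOTAL_QUADS = 11.5
US_TOTAL_QUADS = 93.6
heating_uses = {
    "space heating": 5.74,
    "water heating": 1.69,
    "clothes dryers": 0.26,
}

heating_quads = sum(heating_uses.values())
heating_share = heating_quads / RESIDENTIAL_TOTAL_QUADS
residential_share_of_us = RESIDENTIAL_TOTAL_QUADS / US_TOTAL_QUADS

print(f"Heating: {heating_quads:.2f} quads, {heating_share:.0%} of residential use")
print(f"Residential: {residential_share_of_us:.1%} of total US energy")
```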
…Most transportation energy in the US is consumed in the form of gasoline and diesel fuel, with a relatively small amount of jet fuel. If we look at it by transportation mode, most energy (~78%) is consumed by cars, trucks, and motorcycles…
…The huge amount of energy used by transportation also means that households are using a lot of energy that isn’t captured by the residential energy consumption statistics above. In fact, in a year, the average US household consumes more energy from burning gasoline (~24,000 kilowatt-hours) than what’s used by the entire rest of the house (~22,500 kilowatt-hours).
The commercial sector is not that different from the residential sector: heating air and water uses the largest fraction, and cooling and ventilation (i.e., moving air around) also use large fractions. As with residential, its energy consumption is roughly split between electricity and natural gas…
…With industrial energy use, we see a lot of the same patterns that we see in other sectors. One is that utility electricity is a relatively small share of industrial energy consumption (less than 20%). Most industrial energy comes from burning fuel (mostly natural gas) directly. Once again, we see that heating things up accounts for a huge fraction of energy consumption: roughly half of all manufacturing energy goes into process heating. If we add process heat to residential and commercial air and water heating, we find that roughly 20% of total US energy consumption goes towards heating things up…
…It’s clear that most energy used in the US is ultimately wasted, with only a small fraction being used to perform useful work (moving cars, heating homes, operating electronics, and so on). Moving energy around and changing its form can’t be done perfectly efficiently (thanks in part to the 2nd law of thermodynamics), and all those conversions we require to get energy where it needs to be and in the form we need it whittle away the energy available to get things done…
…The biggest source of losses is probably heat engine inefficiencies. In our hydrocarbon-based energy economy, we often need to transform energy by burning fuel and converting the heat into useful work. There are limits to how efficiently we can transform heat into mechanical work (for more about how heat engines work, see my essay about gas turbines).
The thermal efficiency of an engine is the fraction of heat energy it can transform into useful work. A coal power plant typically operates at around 30 to 40% thermal efficiency. A combined cycle gas turbine will hit closer to 60% thermal efficiency. A gas-powered car, on the other hand, operates at around 25% thermal efficiency. The large fraction of energy lost by heat engines is why some thermal electricity generation plants list their capacity in MWe, the power output in megawatts of electricity…
…The low thermal efficiency of heat engines in general, and internal combustion engine (ICE) cars in particular, together with the high efficiency of electrical equipment (especially things like heat pumps), is the biggest counterweight to the high energy density of hydrocarbons. The gas tank on an ICE car technically stores much more energy than a Tesla battery pack, but only a small fraction of that gasoline energy can be converted into useful motion. Switching to EVs, even if that electricity is still provided by burning fossil fuels, could save large amounts of energy (and thus carbon emissions), as it could mean switching from a 25% efficient gasoline engine to a 60% efficient combined cycle gas turbine. And of course, with electric vehicles, there's the possibility of powering them by non-carbon emitting sources of electricity like solar or wind.
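Here is a rough sketch of the 25% versus 60% comparison. The fuel needed to deliver a fixed amount of useful work is that work divided by the product of the efficiencies along the conversion chain. Only the two efficiencies quoted above are used. The extra losses between a power plant and an EV's wheels (transmission, charging, the motor) are not modelled, because the article does not quantify them. The saving shown is therefore an upper bound on what this comparison implies. The 100 kWh of useful work is an arbitrary example amount.

```python
from math import prod


def fuel_energy_needed(useful_kwh: float, *stage_efficiencies: float) -> float:
    """Primary (fuel) energy needed to deliver `useful_kwh` through a chain of conversion stages."""
    for eff in stage_efficiencies:
        if not 0 < eff <= 1:
            raise ValueError(f"efficiency must be in (0, 1], got {eff}")
    return useful_kwh / prod(stage_efficiencies)


USEFUL_KWH = 100.0  # arbitrary amount of useful work, for comparison only

gasoline_car = fuel_energy_needed(USEFUL_KWH, 0.25)  # ~25% efficient engine
# Assumption: only the power-plant stage is modelled; grid, charging and
# motor losses are left out because the article does not quantify them.
ev_on_ccgt = fuel_energy_needed(USEFUL_KWH, 0.60)

savings = 1 - ev_on_ccgt / gasoline_car
print(f"ICE: {gasoline_car:.0f} kWh of fuel, EV on CCGT: {ev_on_ccgt:.0f} kWh, saving {savings:.0%}")
```

You can pass extra stages to see how much the gap narrows. For example, `fuel_energy_needed(USEFUL_KWH, 0.60, 0.9)` adds a made-up 90% efficiency for everything downstream of the plant.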
3. Stocks Are More Expensive Than They Used to Be – Michael Batnick
In January 2018, they wrote an article, CAPE Fear: Why CAPE Naysayers Are Wrong. The article featured yours truly…
…It’s hard to believe seven years have passed since this article. It’s harder to believe that the S&P 500 is up almost 100% since their article came out, and delivered the highest 7-year performance for any CAPE starting at 33x. I did not see this coming. At all.
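For a sense of scale, "up almost 100%" over seven years works out to roughly 10% a year. The minimal Python check below treats the move as exactly +100%. It also treats it as a price-only return, so dividends are not included.

```python
def annualized_return(total_return: float, years: float) -> float:
    """Compound annual growth rate implied by a cumulative return over `years`."""
    if years <= 0:
        raise ValueError("years must be positive")
    return (1 + total_return) ** (1 / years) - 1


# Assumption: "up almost 100%" rounded to exactly +100% over 7 years.
rate = annualized_return(1.0, 7)
print(f"{rate:.1%} per year")
```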
My whole thing was, yes, valuations are high. But companies are better today and deserve the premium multiple. I was not saying that a high CAPE is bullish. In fact, I ended most of my posts on this topic with the message of, “Expect lower returns.” I’ve never been happier to be wrong.
I want to return to some of the arguments I made, and what the CAPE zealots missed.
To use a long-term average that goes back to the late 1800s is foolish for three reasons. First, we didn't have CAPE data back in 1929. It was first "discovered" in the late 90s. The discovery of data in financial markets changes the very essence of it. Markets are not governed by the laws of physics. They're alive. They adapt and evolve and adjust, like a microorganism.
Second, the CAPE ratio has been rising over time since the 1980s. We’ve only visited the long-term average once in the last 25 years, and that was at the bottom of the GFC. If that’s what it takes to return to the long-term average, maybe you should reconsider what an appropriate comp level really is.
Third, and most important, the companies are far better today than they were in the past.
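For readers new to the metric, the CAPE (cyclically adjusted price-to-earnings) ratio divides today's price by the average of the past ten years of inflation-adjusted earnings. The sketch below uses annual figures and made-up inputs for simplicity, whereas Shiller's published series works from monthly data. Treat it as an illustration of the formula, not a reproduction of the official numbers.

```python
from typing import Sequence


def cape_ratio(price: float, annual_eps: Sequence[float], cpi_by_year: Sequence[float], cpi_now: float) -> float:
    """Cyclically adjusted P/E: price over the 10-year average of real earnings.

    Each year's earnings are restated in today's dollars by scaling with cpi_now / cpi_that_year.
    """
    if len(annual_eps) != 10 or len(cpi_by_year) != 10:
        raise ValueError("expected exactly 10 years of earnings and CPI values")
    real_eps = [eps * cpi_now / cpi for eps, cpi in zip(annual_eps, cpi_by_year)]
    return price / (sum(real_eps) / len(real_eps))


# Hypothetical inputs: nominal earnings of 10 in each of the past ten years,
# CPI of 50 in each of those years and 100 today.
eps = [10.0] * 10
cpi = [50.0] * 10
print(cape_ratio(price=300.0, annual_eps=eps, cpi_by_year=cpi, cpi_now=100.0))
```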
4. AI’s Uneven Arrival – Ben Thompson
What o3 and inference-time scaling point to is something different: AIs that can actually be given tasks and trusted to complete them. This, by extension, looks a lot more like an independent worker than an assistant: ammunition, rather than a rifle sight. That may seem an odd analogy, but it comes from a talk Keith Rabois gave at Stanford:
So I like this idea of barrels and ammunition. Most companies, once they get into hiring mode…just hire a lot of people, you expect that when you add more people your horsepower or your velocity of shipping things is going to increase. Turns out it doesn’t work that way. When you hire more engineers you don’t get that much more done. You actually sometimes get less done. You hire more designers, you definitely don’t get more done, you get less done in a day.
The reason why is because most great people actually are ammunition. But what you need in your company are barrels. And you can only shoot through the number of unique barrels that you have. That’s how the velocity of your company improves is adding barrels. Then you stock them with ammunition, then you can do a lot. You go from one barrel company, which is mostly how you start, to a two barrel company, suddenly you get twice as many things done in a day, per week, per quarter. If you go to three barrels, great. If you go to four barrels, awesome. Barrels are very difficult to find. But when you have them, give them lots of equity. Promote them, take them to dinner every week, because they are virtually irreplaceable. They are also very culturally specific. So a barrel at one company may not be a barrel at another company because one of the ways, the definition of a barrel is, they can take an idea from conception and take it all the way to shipping and bring people with them. And that’s a very cultural skill set.
The promise of AI generally, and inference-time scaling models in particular, is that they can be ammunition; in this context, the costs — even marginal ones — will in the long run be immaterial compared to the costs of people, particularly once you factor in non-salary costs like coordination and motivation…
…What will become clear once AI ammunition becomes available is just how unsuited most companies are for high precision agents, just as P&G was unsuited for highly-targeted advertising. No matter how well-documented a company’s processes might be, it will become clear that there are massive gaps that were filled through experience and tacit knowledge by the human ammunition.
SaaS companies, meanwhile, are the ad agencies. The ad agencies had value by providing a means for advertisers to scale to all sorts of media across geographies; SaaS companies have value by giving human ammunition software to do their job. Ad agencies, meanwhile, made money by charging a commission on the advertising they bought; SaaS companies make money by charging a per-seat licensing fee. Look again at that S-1 excerpt I opened with:
Our business model focuses on maximizing the lifetime value of a customer relationship. We make significant investments in acquiring new customers and believe that we will be able to achieve a positive return on these investments by retaining customers and expanding the size of our deployments within our customer base over time…
The positive return on investment comes from retaining and increasing seat licenses; those seats, however, are proxies for actually getting work done, just as advertising was just a proxy for actually selling something. Part of what made direct response digital advertising fundamentally different is that it was tied to actually making a sale, as opposed to lifting brand awareness, which is a proxy for the ultimate goal of increasing revenue. To that end, AI, particularly AIs like o3 that scale with compute, will be priced according to the value of the task they complete; the amount that companies will pay for inference time compute will be a function of how much the task is worth. This is analogous to digital ads that are priced by conversion, not CPM.
The companies that actually leveraged that capability, however, were not, at least for a good long while, the companies that dominated the old advertising paradigm. Facebook became a juggernaut by creating its own customer base, not by being the advertising platform of choice for companies like P&G; meanwhile, TV and the economy built on it stayed relevant far longer than anyone expected. And, by the time TV truly collapsed, both the old guard and digital advertising had evolved to the point that they could work together.
If something similar plays out with AI agents, then the most important AI customers will primarily be new companies, and probably a lot of them will be long tail type entities that take the barrel and ammunition analogy to its logical extreme. Traditional companies, meanwhile, will struggle to incorporate AI (outside of wholesale job replacement à la the mainframe); the true AI takeover of enterprises that retain real world differentiation will likely take years.
None of this is to diminish what is coming with AI; rather, as the saying goes, the future may arrive but be unevenly distributed, and, contrary to what you might think, the larger and more successful a company is the less they may benefit in the short term. Everything that makes a company work today is about harnessing people — and the entire SaaS ecosystem is predicated on monetizing this reality; the entities that will truly leverage AI, however, will not be the ones that replace them, but start without them.
5. Don’t let interest-rate predictions dictate your investment decisions – Chin Hui Leong
A little over a year ago, the US Federal Reserve signalled its intention to cut interest rates three times in 2024. This commentary sparked a flurry of predictions, with market watchers vying to outguess the Fed on the number, timing, and size of these cuts. Goldman Sachs, for instance, boldly predicted five cuts.
We ended up with just three interest-rate cuts in 2024 – a significant miss, to say the least…
…According to Visual Capitalist, four firms – Morgan Stanley, Bank of America, Citigroup and Nomura – pencilled in a one-percentage-point cut for 2024. Credit should be given where it’s due: their forecasts were right.
However, did getting these predictions right matter in the end? As it turns out, not so much.
Morgan Stanley, Bank of America and Citi set 2024’s S&P 500 price targets at 4,500, 5,000 and 5,100 respectively…
…The S&P 500, of course, closed the year at 5,881…
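How far off were those targets? Here is a short Python check that uses only the figures quoted above:

```python
# 2024 year-end S&P 500 targets quoted above, and the actual close.
targets = {"Morgan Stanley": 4500, "Bank of America": 5000, "Citi": 5100}
ACTUAL_CLOSE_2024 = 5881


def gap_vs_target(actual: float, target: float) -> float:
    """How far the actual level finished above (positive) or below (negative) the target."""
    return actual / target - 1


for firm, target in targets.items():
    print(f"{firm}: closed {gap_vs_target(ACTUAL_CLOSE_2024, target):+.1%} versus its target of {target:,}")
```

Even the most optimistic of the three targets was beaten by more than 15%, despite these firms getting the rate call right.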
…Forecasts and expectations may look similar, but they are different. My friend Eugene Ng puts it best: Forecasts rely on knowing when something will occur. Expectations, on the other hand, are the acknowledgement of what’s likely to occur without professing insight into when it will happen.
For example, it’s reasonable to expect the stock market to fall by 10 per cent or more sometime in the future. After all, history has shown that corrections are a common occurrence…
…In my eyes, calmness can be achieved by having the right expectations, and preparing well for any market turbulence even when we don’t know when the market will fall.
If you are prepared, you will have fewer worries. If you worry less, you will stand a better chance of doing better than average. And that’s more than any investor can hope for, whether the forecasts are right or wrong.
Disclaimer: The Good Investors is the personal investing blog of two simple guys who are passionate about educating Singaporeans about stock market investing. By using this Site, you specifically agree that none of the information provided constitutes financial, investment, or other professional advice. It is only intended to provide education. Speak with a professional before making important decisions about your money, your professional life, or even your personal life. We currently have a vested interest in Alphabet (parent of DeepMind), Meta Platforms (parent of Facebook), and Tesla. Holdings are subject to change at any time.