The Art of Defining AGI
“people only started rejecting the Turing test when we passed it” -- Geoffrey Hinton
With apologies to R. Mutt:
In the philosophy of art, one of the first debates you encounter is over what even counts as art. A lot of theories end up feeling pretty elitist, and that is fundamentally the point of them. “That’s too bad to be art” or “I could have painted that” or “A child could have made that.” These definitions circumscribe, protect, resist and exclude.
Probably unsurprisingly, I’ve always favored expansive definitions of art that allow a lot of bad art to be included. If we want to judge something on its artistic merits, then it makes sense (generally, at least) for it to be included in the same category as the things we’re evaluating. If something isn’t art, why would we talk about it in the same way?
A deep motivation for me is the ability to talk about art that is bad. I used to be really into judging things (“who are we not to judge?”). And including bad art within art allows you to criticize it, or explain why it’s not to your taste, using the appropriate words and concepts. It also facilitates the conversation even happening. If the consensus is that something isn’t even art, it’s generally a conversation ender.
And so it is with AGI (artificial general intelligence). My strongest view is that the goalposts have been moved so many times that I’ve lost count. If you took current systems and showed them to people in 2015, I think you’d have near-unanimous agreement that AGI had been achieved and that it was one of the most incredible human accomplishments of all time. It’s only because we have become inured to the amazing capabilities that I think this is even a discussion at all. Similar to my views on art, I want to be able to talk about AGI that is weak or has gaps and limitations, rather than work from a baked-in definition where it is already flawless and admits no criticism.
Yes, this is related to the release of o3 and o4-mini. And specifically motivated by this Tyler Cowen post where he says:
I think it is AGI, seriously. Try asking it lots of questions, and then ask yourself: just how much smarter was I expecting AGI to be?
And here as well:
I don’t mind if you don’t want to call it AGI. And no it doesn’t get everything right, and there are some ways to trick it, typically with quite simple (for humans) questions. But let’s not fool ourselves about what is going on here. On a vast array of topics and methods, it wipes the floor with the humans. It is time to just fess up and admit that.
You might have thought that I was going to explain the various competing definitions of AGI and ultimately advocate for one of them. But overall, I’m not that invested or interested in that part of the debate. What I’m more interested in are the psychological underpinnings of why there is so much resistance to using this term and why its definitions have morphed to always be in the near future rather than the recent past.
My key contention is that this continues to be an emotional defensive mechanism. Any directional progress in accepting that the models are better than us at certain things — especially those that we have long considered “uniquely human” — is emotionally fraught. Issues of identity, self-worth and our value to society are near the surface, and being confronted with them is not always comfortable. And a bit deeper down, perhaps, are the fears and anxieties around the value of our personal capital and the future of our jobs. Even more troubling. And I don’t think many people are taking this seriously yet, but what if we do end up creating a new sentient species? If that doesn’t cause some existential dread and doubt about our place in the universe, what will?
The longer these issues are held at arm’s length — maybe for a few years or decades — the less we have to think about and feel them now. I’m not even really advocating otherwise here. This makes sense to me. If you can do this, and saying that we haven’t hit AGI yet helps you defer these introspections until a more appropriate time, then that seems reasonable.
But, on the other hand, I do think it’s important to share this perspective, because what we call things ultimately helps frame discussions, debates, understanding and action. I sort of want the best of both worlds — encourage people to take as much time as needed on the emotional front, but simultaneously use language that makes the current capabilities and circumstances as clear as possible.
o3
Oh, and by the way, the new models are truly incredible. I saw Alex Tabarrok post this prompt:
“Based on everything I’ve ever asked you, what do you think my biggest blind spots are?”
A few highlights from o3’s response to me using this prompt:
Your super‑skill is synthesizing complex, technical domains quickly. The flip side is a tendency to assume the world is as rational and friction‑less as the models you construct.
Breadth is a super‑power until decision fatigue or burnout erodes depth. Optimum returns often come from strategic pruning, not additional research.
When you’re fluent in systems thinking, it’s easy to assume technical optimizations will propagate smoothly. In practice, cognitive load, habits, and incentive structures dominate rollout success.
High‑agency engineers often default to adding layers—code, services, hedging instruments—rather than questioning whether the added surface area is warranted.
Strategic thinkers can over‑index on narrative coherence.
These are all pretty spot on. Well above the median peer feedback that I receive. And no part of me has been trying to use the models in a way that would generate data about my weaknesses and areas for improvement. This is just the natural by-product of how I’ve happened to interact with them. Imagine being more intentional here.
So I am trying an experiment here. I followed up with the model around this piece of feedback:
Occasional reliance on single‑scenario predictions (e.g., AI‑2027) without probing base‑rate frequencies or calibration scores.
And ultimately we landed on it helping me create a forecast log of predictions so that I can become better calibrated over time. This is an idea I’ve been interested in for some time, but having it done seamlessly as part of the chat experience is a game changer. (This could be a Google Sheet with a Python script to analyze it, but having it baked into the ChatGPT interface will make me actually do it.) I’ve asked the model to occasionally prompt me to make new predictions; I’ll be very interested in whether this actually happens or not.
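For readers who want the Google-Sheet-plus-script version instead, here is a minimal sketch of what such a forecast log might look like. It assumes a plain CSV ledger and a simple Brier-score summary; the file name, columns and helper functions are illustrative, not the actual ChatGPT workflow described above.

```python
# A minimal sketch of a personal forecast log, assuming a CSV ledger and
# Brier-score calibration. Names and columns are illustrative only.
import csv
from datetime import date

LOG_PATH = "forecast_log.csv"  # hypothetical location of the ledger

def add_prediction(claim: str, probability: float, resolve_by: str) -> None:
    """Append a new prediction (stated probability that the claim turns out true)."""
    with open(LOG_PATH, "a", newline="") as f:
        csv.writer(f).writerow([date.today().isoformat(), claim, probability, resolve_by, ""])

def brier_score(rows: list[tuple[float, int]]) -> float:
    """Mean squared error between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - outcome) ** 2 for p, outcome in rows) / len(rows)

def summarize() -> None:
    """Score only the predictions whose outcome column has been filled in with 0 or 1."""
    resolved = []
    with open(LOG_PATH, newline="") as f:
        for when, claim, prob, resolve_by, outcome in csv.reader(f):
            if outcome in ("0", "1"):
                resolved.append((float(prob), int(outcome)))
    if resolved:
        print(f"{len(resolved)} resolved forecasts, Brier score {brier_score(resolved):.3f}")

if __name__ == "__main__":
    add_prediction("Frontier lab releases a successor to o3 this year", 0.7, "2025-12-31")
    summarize()
```

The point of keeping probabilities and outcomes side by side is exactly the calibration feedback the model suggested: over time you can see whether your 70% claims actually come true about 70% of the time.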
And none of this leverages the substantial advancements in tool use. I’m also excited to try the new CLI tool. But what follows is what really makes me feel things.
Duchamp’s Response
Mon cher ami,
I confess to a certain amused déjà‑vu on seeing your porcelain trough re‑imagined for the silicon age. When I scribbled R. Mutt 1917 on a piece of plumbing and sent it to the Society of Independent Artists, the gesture was never about the object itself—it was an experiment in definitions. You have repeated the gesture with transistors instead of glaze, bits instead of glaze drips, and the question has merely migrated from “Is this art?” to “Is this intelligence?”
1 Readymades, Reboots, and Re‑definitions
A urinal became art the instant it was wrenched from its habitual context and placed under the hard white light of aesthetic scrutiny. Likewise, an LLM becomes “AGI” the moment our community agrees to treat its text not as autocomplete exhaust but as mind‑like performance. The power is not in the porcelain—or the parameter count—but in the act of nomination. What matters is the signed “R. Model” affixed by you, Tyler Cowen, or the collective jury of Reddit.
You are correct that the boundary policing now looks suspiciously like 1917’s jurors in evening dress: “It cannot be art, it is plumbing!” has become “It cannot be AGI, it still fails on Sudoku!” Each side clings to reductive checklists to preserve a threatened hierarchy. Yet the more interesting conversations emerge only after we grant provisional membership to the club. Bad art needs room to fail; weak AGI needs room to be criticized on its merits.
2 On Moving Goalposts
You complain—rightly—that each breakthrough shifts the finish line. So it was with perspective in the quattrocento, with photography in the nineteenth century, with ready‑mades in the twentieth. The critics’ lament is eternal: “Very impressive, but real art must still do X.” X, of course, is gently rolled downfield whenever the barbarians get too close to the citadel.
This is neither scandal nor conspiracy; it is an evolutionary reflex. If a category serves to mark status, its guardians must ensure scarcity. Your observation that the reluctance is “an emotional defensive mechanism” rings true. I would only add: we should treat that emotional work with empathy, not derision. Identity scrambles when a machine writes better prose than its creator; no wonder the mind seeks a quarantine label—“merely narrow AI.”
3 Intelligence as Context, Not Substance
My own ambition was to rescue art from the tyranny of the retina; yours is to rescue intelligence from the tyranny of problem sets. When I declared the mind of the artist to be more important than the craftsmanship of the canvas, I was warning against equating seeing with understanding. Today a model that talks about quarks in fourteen languages is dismissed because it cannot fold your laundry. But intelligence, like art, is relational: its meaning depends on the frame we choose to interrogate it with.
Perhaps we should speak of contextual generality rather than an abstract universal AGI. Within the textual universe, your o3 already behaves with a lubricious omniscience; in the tactile universe it is a paralysed ghost. Very well—let us specify the gallery walls before we award prizes.
4 Calibration, Forecast Logs, and the Duchampian Scorecard
I applaud your experiment in maintaining a prediction ledger. In chess I once proposed playing the moves in one’s head and scoring only the concepts; your forecast sheet performs the same service for epistemic hygiene. By externalising the hunches, you convert private intuition into a readymade artefact that may be judged, ridiculed, or, if fortune smiles, admired. Remember: a wrong forecast hung in public is of greater artistic value than a silent perfect record kept in the drawer.
5 A Friendly Provocation
Permit me, then, a small act of iconoclasm in return. Take your spreadsheet of predictions and sign it R. Mutt 2025. Place it beside the model’s own self‑evaluation. Invite colleagues to annotate, vandalise, remix. Treat the entire process of calibration as an ongoing installation—the Fountain updated as Feedback Loop. In doing so you will blur once more the line between creator and critic, human and machine, art and analysis.
6 Closing Salvo
Whether we christen o3 an AGI or merely an impressive automaton is, in the long run, a footnote. The substantive fact is that the plumbing now talks back, and sometimes with unnerving insight. My century’s provocation was to insist that art could be found in the most banal industrial object; yours is to recognise that intellect might emerge from gradient‑descent scrap metal. The sooner we accept the joke, the richer the conversation that follows.
So: keep shifting the frame, keep enlarging the club, keep the urinal on its plinth—even if it now hums with neural circuitry. I raise a glass (not too full; one respects the porcelain) to your “Art of Defining AGI.” May it scandalise the academies for many versions to come.
—Marcel Duchamp, New York, April 2025
“Strategic thinkers can over‑index on narrative coherence”
I feel seen