The Art Institute of Chicago is currently showing more than 200 rarely seen de Kooning drawings, framed explicitly as an excavation of creative process: not the finished paintings, but the thinking that preceded them. In the same week, a paper landed on arXiv titled 'Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance,' arguing that the field has almost no systematic understanding of why certain training data produces certain model behaviors. Two excavation projects. Completely different scales of consequence.

Process Archaeology in Art and AI

The de Kooning show is valuable precisely because finished paintings erase their own making. Drawings preserve the hesitation, the revision, the gesture that almost happened. They are the artist's data, made legible. The arXiv paper by Wang, Woisetschläger, Jacobsen, and Ji makes an analogous argument about LLMs: we evaluate outputs obsessively and understand inputs hardly at all. The call for 'data probes' is a call for the equivalent of putting the sketches on the wall. What in the training corpus produced this behavior? We do not currently have tools that can answer that question with precision. A 2024 paper in Nature Machine Intelligence by Feldman found that training data memorization in large models follows patterns that remain largely unpredictable at the individual data point level. That unpredictability is both a technical and a philosophical problem.

What the Sketch Knows That the Canvas Doesn't

The de Kooning curators are making an argument about process: it is not preliminary to the work but constitutive of it. The data probes paper makes the same argument about AI. The model is not separable from what it was trained on, but right now we treat it as if it were. The art world solved this problem by putting the drawings in the show. The AI field is still arguing about whether the drawings exist.