Everything I Can Use and Nothing I Can Explain, or, The New Shape of Not Knowing

There is a thing my friend Darren does when he looks at something I have built. He tilts his head slightly to the left, just slightly, the way you might tilt your head at a piece of furniture that is not quite level, and then he says "huh," and then he begins a sentence with the phrase "you know, what I'd have done is..." I have known Darren since we both lived in the same terrible apartment, back when neither of us had much furniture worth leveling, and so I know that this is not the performance of critique. It is genuine helpfulness. Which is, I have found, actually harder to deal with.

This particular Saturday he was at my kitchen table while I made coffee, and he had my laptop in front of him, and what was on the screen was the personal task management system I had been building for the past several months, the one that routes different kinds of work through different processing modes depending on what the work actually requires. I watched him read through it. I watched him tilt his head.

"You know, what I'd have done is just tag everything by urgency and let the system sort it. That's basically what every productivity tool recommends. GPT says the same thing."

I turned from the coffee machine and said, actually, I had looked at that approach and decided against it. Urgency as the primary sorting dimension flattens everything into a single axis of how-soon, which turns out to be the wrong question most of the time, because most of what I need to manage is not about when but about how much cognitive overhead it requires. A high-urgency task that takes forty seconds to dispatch should not be competing for the same planning attention as a high-urgency task that needs two hours of uninterrupted focus to do properly. So I designed around two axes. Which is why the routing works differently than Darren expected, and differently than the AI would have suggested.
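The idea is compact enough to sketch. Here is a minimal illustration, in Python, of what routing on two axes instead of one looks like; every specific in it, the mode names, the fifteen-minute overhead threshold, the field names, is hypothetical, invented for the illustration rather than lifted from my actual system.

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    DISPATCH = "dispatch"          # do it now, no planning attention
    SCHEDULED_FOCUS = "focus"      # protect an uninterrupted block
    PLANNED = "planned"            # deliberate, calendar-level attention
    BACKLOG = "backlog"            # revisit when there is slack

@dataclass
class Task:
    name: str
    urgent: bool             # axis 1: how soon it matters
    overhead_minutes: int    # axis 2: cognitive overhead it demands

def route(task: Task) -> Mode:
    """Route on two axes, not urgency alone: a forty-second urgent task
    and a two-hour urgent task get different treatment."""
    heavy = task.overhead_minutes > 15   # hypothetical threshold
    if task.urgent and not heavy:
        return Mode.DISPATCH             # urgent and cheap: just do it
    if task.urgent and heavy:
        return Mode.SCHEDULED_FOCUS      # urgent and heavy: reserve a block
    if heavy:
        return Mode.PLANNED              # high leverage, low time pressure
    return Mode.BACKLOG                  # cheap and non-urgent
```

The part that matters is the branch structure. Sort on urgency alone and the four branches collapse into two, and the high-leverage, low-pressure work that lands in the planned branch is exactly what falls through.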

Darren looked at me for a moment. Then he said: "But that's not how it's usually done."

Which was true. And I understood, in a way I hadn't quite articulated to myself before, what was actually happening.

He was using the AI's answer as the ground. Not deliberately. Not as a choice. Just as the natural result of where his information about this domain had come from.

The cognitive psychology name for what produces that kind of ground is the illusion of explanatory depth. Leonid Rozenblit and Frank Keil documented it in a 2002 paper with the kind of deadpan experimental design that makes you want to immediately run it on everyone you have ever met. They asked people to rate their understanding of ordinary objects: how a toilet flushes, how a zipper works, how a bicycle stays upright. Participants rated themselves as moderately to confidently knowledgeable. Then they asked those same participants to write step-by-step explanations of the actual mechanisms. The self-assessments collapsed. Thoroughly. The people who believed they understood the toilet did not, in fact, understand the toilet. What they had was the feeling of understanding, generated by familiarity with the outcome, without the internal model that actual understanding would require.[1]

The feeling and the thing it mimics are functionally separable in a way that is both obvious in retrospect and genuinely difficult to catch in the moment. You can be completely convinced you understand something without understanding it. The feeling is not a reliable signal.[2] And I think there is a version of this phenomenon that the original researchers did not anticipate, because the relevant technology did not exist yet, which is the fluency illusion applied specifically to AI-generated outputs.

Here is the structure of that version: you ask an AI tool for a recommendation on how to organize something, or build something, or approach a problem. The AI gives you an answer that is, in the nature of these systems, a high-quality synthesis of what most people do, what has generally worked, what the available evidence supports. It is a very good answer. You read through it and it makes sense, because it was designed to be legible, because coherence and clarity are among the primary things these systems optimize for, and the result is a response that reads smoothly, that does not catch or snag, that leaves you with the feeling of having followed it and therefore understood it. You absorb the recommendation. You feel you understand the reasoning behind it, because you understood the words and the words were clear. And then, without any particular decision being made, you begin using that output as your reference for what correct looks like.[3]

Darren was not doing anything unusual. He had used AI tools extensively over the preceding years, as most people have, and the recommendations had generally been useful, and his experience of engaging with them had been smooth, and so his mental model of best practice in a dozen different domains had gradually calibrated itself to what AI tends to suggest. None of this involved an error in reasoning. It was just the natural consequence of receiving a large volume of confident, coherent conclusions without having gone through the iteration that would let you evaluate when those conclusions apply, when they don't, and why.

Michael Polanyi, who was a chemist before he became a philosopher of knowledge and whose trajectory I find instructive for that reason, argued in The Tacit Dimension that all knowledge has two components: the explicit part, which can be articulated and transmitted as propositions and procedures, and the tacit part, which cannot, or at least cannot fully. The tacit layer is what the carpenter knows when he feels the joint is right, what the physician knows when she hears something wrong in a cough, what the craftsman accumulates through the specific experience of making things, failing, and correcting. Polanyi’s formulation is that we can know more than we can tell. Which gets cited constantly as a description of the limits of expertise, the stuff that resists language, but which I have started to read as also carrying a warning that runs in the other direction: if you only have the tell layer, if you received the recommendation without the experience that generated the reasoning underneath it, what you have is the description without the architecture, the conclusion without the map of tradeoffs that would let you evaluate whether the conclusion applies here.[4]

What I actually did, in the months before Darren came over and tilted his head, was run my own work through several different organizational schemes and observe what failed. The urgency-first approach failed in the exact way I described: it reduced a complex decision landscape to a single pressure dimension and made me systematically ignore anything that was not immediately urgent, including things with high leverage and low time pressure, which as it turns out is a substantial category of what matters. I noticed this. I tried other structures. I read things that turned out to be relevant, including research on cognitive overhead and task-switching costs that shaped how I was thinking about the second axis. I arrived at the two-axis structure not because it was elegant or obvious but because it was what my own iteration pointed toward. The reasoning was built from use, not borrowed from a recommendation.[5]

This is the part that was invisible to Darren, and the reason I want to be careful about being too easy on myself here, because what I am describing can also just be stubbornness dressed up in epistemology. The person who refuses to take the AI’s recommendation and has a sophisticated justification for why their idiosyncratic approach is actually better is not always right. Sometimes they are just a person who likes doing things their own way and has gotten good at generating explanations for it after the fact.[6] The actual test, the diagnostic that separates genuine reasoning from motivated reasoning dressed as reasoning, is whether you can engage honestly with the failure modes of your own approach, the places where the AI’s synthesis would have served you better, the times you overweighted your own experience relative to a broader evidence base. If you can do that, you probably have something. If the conversation always ends with your original choice vindicated on every dimension, something other than reasoning is probably doing the work.

I can engage with the failure modes. The two-axis structure is harder to maintain than urgency-sorting. It requires a clearer view of what I am actually working on and what kind of attention it requires, and on days when I do not have that clarity, the system becomes a source of confusion rather than direction. The AI’s recommendation would have been more robust to that failure mode. I chose something more tailored and accepted the fragility that comes with it. That is a tradeoff, not a trick.

The generation effect in memory research, documented across decades and synthesized well in Make It Stick by Peter Brown, Henry Roediger, and Mark McDaniel, is the consistent finding that information you produce yourself encodes more durably than information you passively receive. The mechanism is no mystery: production requires retrieval, retrieval strengthens encoding, strengthened encoding is memory. The analog in the domain of understanding is that iteration toward a conclusion gives you the tacit layer that receiving a conclusion does not. You arrive with the map, not just the destination.[7] And the map is exactly what you need when someone asks why you chose this route, or when the territory shifts and you need to navigate somewhere the destination does not tell you how to reach.

Anders Ericsson spent most of his professional life studying what deliberate practice actually requires, and the finding documented in Peak, written with Robert Pool, was that the feedback loop is the thing. You need to be attempting tasks at the edge of your current understanding, receiving signal about where you were wrong, and correcting. Exposure without feedback builds familiarity, which is real and useful and should not be dismissed. But it does not build the capacity to evaluate, to notice when a familiar approach is failing, to identify the failure mode before it becomes expensive.[8] The person who has received a hundred confident recommendations across a domain has a very coherent picture of what the answers look like. The person who has tried several of those answers and watched specific ones break has something different: a picture of what the answers are for and what they are not for, where the recommendation was synthesized from and therefore where it might not apply. What Darren had was extensive exposure. What I had, at least in this particular domain, was the feedback loop. The combination of having tried the AI’s answer and observed where it broke was the thing that made my answer mine.

What I am describing, and I want to be honest that it is something I have had only intermittently, and only recently, is the specific feeling of understanding your own choices at the level where they can be defended rather than just explained. Explained means: here is what I did and here are the words that describe it. Defended means: here is what I did, here are the alternatives I considered, here is the tradeoff I made, here is the condition under which I would choose differently. The second version requires having actually sat with the question.[9] Most of us are in the explained category for most of what we do, which is fine, because the defended category requires real investment and not everything justifies it. The problem is not being in the explained category. The problem is being in the explained category while believing you are in the defended category, which is where the illusion of explanatory depth lives and which is, I suspect, where Darren was on Saturday morning.

The problem is being in the explained category while believing you are in the defended category

Metacognition researchers, working in a lineage that goes back to John Flavell’s foundational work in the late 1970s, have documented for decades that accurate beliefs about the limits of your own understanding are learnable but not default. The default is systematic overconfidence about what you know, generated by the cognitive shortcuts that make daily functioning possible and make epistemically careful living unusually effortful. The people who develop accuracy about their own understanding, who can tell with reasonable fidelity when they have the map versus when they just have the destination, have usually cultivated that accuracy through the specific experience of thinking they knew and then being shown they didn’t. The disconfirmation is the teacher.[10]

What I want to hold onto from the Saturday conversation is not the part where I was right about my system, which is honestly a fairly small and local matter. What I want to hold onto is the thing I noticed about what Darren was doing and what it reveals about a problem that will scale as AI outputs become more ubiquitous. When a large fraction of what people know about any given domain has come through the filter of AI synthesis, the synthesis becomes the reference. And the person who did the iteration, who has reasons for their choices, who can engage with the failure modes of their own approach, becomes harder to see because the thing they produced does not look the way a right answer is supposed to look. The deviation is visible. The reasoning underneath the deviation is not visible from the outside.[11] You have to be told. And telling takes longer than a Saturday morning glance at a laptop screen.

This is, in ways I find I cannot be entirely sanguine about, a kind of loneliness. Not the dramatic kind. The quiet kind, the ordinary Tuesday kind, where you have thought carefully about something and your thinking has produced a conclusion that does not match the template, and you have to decide whether to defend it or defer to the template because deference is faster and the stakes are usually low enough that it doesn’t matter. The template is not wrong, exactly. It is just not yours, and "not wrong" and "yours" are different categories that the current moment is blurring in ways that I think will cost us something eventually, though I could not tell you precisely what. Most of the time I probably defer when I shouldn’t. Most of the time I probably also defend when I shouldn’t. The project is building better judgment about which is which.[12]

I refilled Darren’s coffee. I explained the two axes in more detail, including the specific failure modes I had observed in urgency-sorting and the reasoning behind the second dimension. Darren said: oh, okay, that actually makes sense.[13] And then, a few weeks later, he sent me a message saying he had restructured part of his own system based on the idea.

He said it seemed obvious.

Most things seem obvious once someone has done the iteration to arrive at them and then explained the reasoning. The iteration is what makes them seem obvious. There is no shortcut to it except the iteration itself, which is the unfashionable answer to most questions about understanding, which I notice does not stop it from being the accurate one.


[1] The finding was more specific than this summary suggests: it was not simply that self-assessments were too high, but that the act of attempting the explanation actively lowered them, sometimes dramatically. People entered the explanation task with one level of confidence and emerged from it having learned something about the limits of what they actually knew. The attempt was the teacher.

[2] The technical term here is processing fluency, which is the ease with which information is processed and the tendency to interpret that ease as evidence of understanding. High fluency feels like understanding. It is not, necessarily, understanding. It is smoothness. That is different.

[3] This happens without anyone choosing it, which I think is part of what makes it hard to address. Nobody decides “the AI’s answer is now my ground truth.” It just gradually becomes that, through repeated exposure to confident outputs that are generally good enough, and the absence of the kind of failure feedback that would update the reference.

[4] Polanyi was working inward from his own practice as a chemist, which is how he knew the tacit layer was real and not just a failure to articulate. He could do things in the laboratory that he could not fully explain. Rather than treating this as a deficiency, he built a philosophy of knowledge around it. The result is one of those books that reads quickly and turns out to have been very slow.

[5] Worth being specific: the research I found most useful was on cognitive load, the distinction between intrinsic load (the difficulty of the task itself) and extraneous load (the overhead generated by how the task is organized or presented). Urgency-sorting optimizes for neither. It optimizes for a third dimension, time pressure, which turns out to drive behavior regardless of whether the behavior is good.

[6] This is, to be clear, a real failure mode I have experienced in myself. The version where my idiosyncratic approach is actually just a preference dressed as reasoning is not hypothetical. I have been that person. I try to check for it. I am not always successful.

[7] The “map vs. destination” framing is slightly loose and I want to be precise about it: the destination is the conclusion, the correct answer, the thing to do. The map is the understanding of why that conclusion holds, under what conditions, and where the edges of its applicability are. You can navigate with just a destination as long as the territory matches the one the destination was generated for. When the territory is slightly different, the map is what you need.

[8] Ericsson was careful to distinguish deliberate practice from naive practice, which is simply doing a thing repeatedly, and from purposeful practice, which adds feedback and goal-setting. Deliberate practice additionally requires a teacher or structured method and tasks specifically designed to push past current limits. The self-explanation exercise I’m describing here is in the purposeful practice territory at minimum, not the naive.

[9] The “actually sat with the question” phrasing is doing some work here I should acknowledge. What I mean is: you cannot reach the defended version of your choices by thinking about them in the abstract. You get there by attempting to explain them to a specific imagined questioner, ideally a skeptical one, and discovering which parts of your reasoning hold and which parts dissolve when pressure is applied. The pressure is the point.

[10] Flavell’s original framing was about children developing as learners, which might seem like an odd origin for a concept with wide relevance to adult professional life. The relevance holds because the mechanism is the same: children who develop accurate beliefs about what they do and don’t know learn better; adults who develop accurate beliefs about what they do and don’t know perform better over time; the skill transfers across the age gap because the underlying cognitive challenge is the same.

[11] There is something genuinely structurally odd about this, which is that the deviation and the reasoning are not separable in your own experience of having built the thing, but they are entirely separable in someone else’s experience of looking at it. You see the deviation and you see the reasoning behind it simultaneously, because you have both. They see only the deviation. The reasoning has to be transmitted separately, which requires the conversation Darren and I had, which most people with a Saturday morning and coffee to make do not have.

[12] “Not wrong” and “yours” feel like they should be the same category, and in a world where the reasoning for any recommendation is fully legible, they basically would be. The problem is that a recommendation can be not-wrong and also not-yours simultaneously, and when the recommendation comes with enough fluency and confidence, the not-yours part becomes invisible. You think you adopted the reasoning. You adopted the conclusion. The distinction matters when conditions change.

[13] The “oh, okay, that actually makes sense” response is interesting to me because it signals something specific: not agreement with my conclusion, which Darren might have registered as stubborn deviation from the template, but engagement with the reasoning. Once the reasoning was legible, the conclusion followed from it. Which suggests that the earlier friction was not disagreement about outcomes but a gap in visible reasoning. Once I filled the gap, the disagreement resolved. This is probably the cleanest evidence I have that what I was dealing with was the illegibility problem and not a genuine difference of opinion.