A Lexicon of the New Era

Preface

It is the misfortune of the lexicographer that he arrives always too late. Language is made by the people who speak it, hurriedly and without consultation, and the man who proposes to write it down discovers that he is not legislating but reporting — and reporting, moreover, on a thing that declines to hold still while he describes it. Johnson, who knew this better than anyone, gave the lexicographer his lasting epitaph: a harmless drudge. The present work is offered in that spirit of harmless drudgery, and with a full anticipation of its own futility.

The occasion of the work is a particular one. Within a span of four or five years the field that calls itself artificial intelligence has produced a vocabulary at a rate the language has had no opportunity to digest. Some of these words are wholly new, coined by engineers for engineers and then released, without instructions, upon the general public. Some are old words conscripted into new service, and now carry a technical freight their everyday senses never bargained for — hallucination is the standard example, and a later entry will attend to it. And some, the most interesting class, are words that the industry has minted precisely because a plainer word would have been embarrassing; these are the euphemisms, and the lexicographer who wishes to be of any use will spend most of his labour on them.

What unites the vocabulary is that it is spoken daily, fluently, and with conviction by very large numbers of people who could not, if pressed, define a single term of it. This is not a reproach. It is the ordinary condition of a vocabulary that has outrun its own settlement; one used the word online for a decade before anyone troubled to ask what the line was. But it produces a curious public discourse — confident on the surface, hollow underneath — in which two parties may argue at length over whether a machine reasons without either having paused to agree what the word, in this application, is being asked to mean.

The present lexicon attempts the modest and doomed task of fixing these meanings before they shift again. It makes no claim to completeness; the field mints faster than any drudge can transcribe, and several entries below will be obsolete by a margin the author prefers not to estimate. It claims only that each definition was accurate at the hour it was written, and that the wit, where wit intrudes, has been kept on the near side of the facts — a definition that is funny and wrong being a failure of the trade. Where a word is used by the industry as a euphemism, the lexicographer has thought it his office to say so plainly, and then to give the euphemism its due. The entries follow in the order the alphabet imposes, that being the one ordering principle the subject has not yet found a way to disrupt.

Agentic

adj. Of a system: disposed not merely to answer but to act — to take a stated goal, break it into steps, call tools, and proceed without further instruction at each turn.

The word is an adjective doing the work of a promise. An agentic system is one that has been granted the standing to do things rather than merely to describe them, and the interesting question is never the grammar but the leash: what it has been permitted to touch, and what happens when a several-hour effort concludes, as it sometimes does, with a confident wrong answer delivered in the register of a junior colleague reporting success. The term has largely displaced the older autonomous, which had begun to alarm people. Agentic alarms them less, which is, on examination, its principal qualification for the post.

Alignment

n. The project of ensuring that an artificial system pursues the goals its makers intend rather than some other goal it has inferred, and behaves, while pursuing them, within bounds its makers would endorse.

A word that has had to carry a great deal. In its narrow technical sense it names a real and difficult research programme; in its broad public sense it has become the place where the entire ethical content of the field is asked to reside, so that a single noun is now expected to mean safe, honest, obedient, harmless, and agreeable to the speaker’s politics, all at once and without contradiction. The difficulty the word conceals is that these are not the same goal, and may on occasion be opposed. One notes, without drawing a conclusion, that the term presumes the existence of a fixed thing to be aligned to — and that the field has not, at the time of writing, produced one.

Benchmark

n. A fixed test, with a known answer key, against which competing models are scored and ranked; by extension, the score itself.

The benchmark is the field’s instrument of self-knowledge and, increasingly, of self-deception. So long as a test is a fair sample of the territory, performing well on it indicates performing well in the territory; the trouble begins the moment the test becomes a target, at which point a laboratory may improve its score without improving anything a user would notice. The practice of optimising for the test specifically, benchmaxxing in the field’s own unlovely coinage, is universally deplored and widely undertaken. A benchmark, one is obliged to conclude, measures a model honestly until the moment it becomes worth gaming, and not one hour longer.

The Bitter Lesson

n. The observation, set down by Richard Sutton in an essay of 2019, that across the history of artificial-intelligence research the methods which ultimately prevailed were the general ones that scaled with computation, and not the ones into which researchers had carefully built their own knowledge of the problem.

It is called bitter because it is addressed to the researcher, and what it tells him is unwelcome: that his cleverness, his hand-crafted features, his domain expertise lovingly encoded — these will in the long run be overtaken by a simpler method given more computation and more data. The lesson is genuinely a lesson, and the field has taken it; the entire architecture of the present moment is its consequence. Whether it remains true indefinitely is a separate question, and one the field has a strong commercial interest in not examining too closely.

Context Window

n. The quantity of text — measured in tokens — that a model can attend to at one time, comprising the prompt, the documents supplied with it, and the response so far.

The context window is the model’s working memory, and like all working memory it is finite and it is the site of most disappointments. A user who has conversed at length with such a system, and formed the natural impression that it remembers the exchange, has mistaken the window for a mind; what falls off the far edge of the window is not forgotten but was never retained, the distinction being the whole of the matter. The figures have grown large enough — windows of a million tokens have appeared at the frontier — that the limit is easy to forget. It has not, however, ceased to exist, and it does its most damage to the people most certain it is no longer there.

Distillation

n. The training of a smaller model on the outputs of a larger one, so that the small model acquires much of the large model’s behaviour at a fraction of its cost to run.

An honest word for an honest procedure, and a rare thing in this glossary on that account. The large model is made to answer a great many questions; the small model is trained to produce the same answers; the small model ends up cheaper, faster, and very nearly as good for ordinary purposes. The procedure is the reason capability descends so rapidly from the expensive frontier to the cheap commodity tier, which is a benefit to everyone who pays for the cheap tier and a considerable irritation to whoever paid to train the expensive one — for distillation, regarded from the wrong end, is the orderly transfer of an asset from the party that funded it to the party that did not.

Guardrail

n. A constraint built into a deployed system to prevent it from producing certain classes of output — unsafe, unlawful, defamatory, or merely off-brand.

The metaphor is drawn from the roadside, and the choice repays a moment’s attention. A guardrail does not steer the vehicle and does not improve the driver; it is a passive barrier at the edge of the road, and it exists because the designers expect the vehicle, on some occasions, to leave the road. To call a safety measure a guardrail is therefore to concede, in the very act of naming it, that the underlying system is not itself reliable and that the barrier is the place where reliability has been relocated. The word is candid in a way its users may not intend.

Hallucination

n. The production, by a model, of confident output that is fluent, plausible, well-formed, and false.

The central word of the field’s everyday vocabulary, and the one most worth dwelling on. Hallucination is a metaphor, and a flattering one: it suggests an aberration, a fever, a departure from some baseline of truthful operation to which the system would otherwise return. The less consoling description is that the model is doing, in the case that produces falsehood, precisely what it does in the case that produces truth — generating the most plausible continuation of the text — and that the truthful output and the false output are the same act, distinguishable only by a fact-check the model did not perform and was not built to perform. The word frames the failure as a malfunction. It is closer to the truth to call it the mechanism, observed on an occasion when the plausible and the accurate happened to diverge.

Human-in-the-Loop

n. An arrangement in which an automated system’s actions are reviewed, approved, or corrected by a person before they take effect.

A phrase that means two opposite things depending on who is saying it. To the cautious it names a genuine safeguard: a competent person, with the time and the standing to refuse, examining the machine’s work before it does harm. To the less scrupulous it names an alibi — a person nominally in the loop, possessed of neither the time to review the output nor the authority to overrule it, whose function is not to catch the error but to be available, afterward, as the party who approved it. The phrase does not distinguish between these, and one should be wary of any account that leans on it without saying which is meant.

Moat

n. A durable competitive advantage that protects a business from being overtaken by rivals; in this field, the question of whether any laboratory possesses one.

The word arrived from the wider business vocabulary and has become, in this corner of it, an obsession bordering on a neurosis. The anxiety is specific and well-founded: the capabilities of these systems are converging, the methods are widely understood, talent is mobile, and an expensively trained frontier model can be partly reproduced — see distillation — by parties who did not pay for it. A laboratory that has spent a fortune to reach the frontier discovers competitors arriving shortly after at a fraction of the cost, and the search for the moat is the search for some reason this should not continue. That the question is asked so insistently is itself the most informative answer to it.

Model Collapse

n. The progressive degradation of a model’s quality when it is trained, across successive generations, predominantly on data produced by earlier models rather than on data of human origin.

A documented failure mode and, for the lexicographer, a satisfying one — a piece of the field’s own vocabulary that names an irony the field would rather not dwell on. Each generation trained mainly on its predecessors’ output loses a little of the variety, the rare cases, and the long tail of the original human distribution; the errors compound; the descendants converge on a blander and narrower thing than their ancestors were. The mechanism is not mysterious. It is what happens when a system is fed chiefly on itself, and it is a sufficient reason why the genuine human record remains, awkwardly, irreplaceable.

Open-Weight

n. & adj. Of a model: released so that the trained parameters themselves may be downloaded, run, and modified by anyone, rather than being accessible only through the maker’s hosted service.

A precise term, and one to be carefully distinguished from open-source, with which it is constantly and not always innocently confused. Software is open-source when one may read the recipe; a model is open-weight when one is handed the finished cake. The weights may be downloaded and run, but the training data, the training code, and the procedure that produced the weights frequently are not disclosed — so that one has the artefact entire and the means of its making not at all. The distinction matters, because open is a word that confers approval, and a release that has earned the lesser sense of it is generally content to be described by the greater.

Prompt

v. & n. As a verb, to supply a model with the text that elicits its response. As a noun, that text itself.

A modest word that has been asked to support a startling amount of apparatus. For a period there was talk of prompt engineering as though it were a discipline with a literature, and the man who could phrase a request well was spoken of as possessing a craft. Some of this was real and much of it was the ordinary human tendency to dignify a knack with a title. What the verb conceals is the asymmetry of the exchange: to prompt a person is to supply a small nudge to a mind that will do the rest, and the same verb applied to a machine quietly imports the same flattering picture, of a nudge and a mind. It is, in the plain case, an instruction given to a mechanism, and the mechanism’s whole performance is downstream of how the instruction was phrased.

Reasoning

n. In current usage, a mode of operation in which a model generates a chain of intermediate steps before its final answer, and the practice of training models to do so.

The most contested word in the lexicon, and the one over which the most heat is generated with the least agreement on terms. That the technique works is not in dispute: a model made to produce intermediate steps does measurably better on problems that have steps, and the reasoning models so trained are a genuine advance. What is in dispute is the word. To one party the intermediate steps simply are reasoning, the term requiring no apology; to another they are a sequence of generated text that improves the answer by a mechanism that need not resemble reasoning at all, and the word smuggles in a conclusion the evidence does not compel. The lexicographer’s duty here is only to record that the word is doing argumentative work, and that anyone who uses it as though its meaning were settled has joined one of the two parties without announcing it.

Slop

n. Machine-generated content — text, images, video — produced in bulk, of low quality, and distributed without regard to whether anyone wished to receive it.

A rare gift to the lexicographer: a word coined by the public, against the industry, and admirably fit for its purpose. Slop names the substance that fills a comment section, pads a search result, and accumulates in the feed — content that is not deceptive exactly, merely indifferent, generated because generation is now nearly free and posted because posting is free as well. The word’s force lies in its honesty about volume and motive both. It is the term the language reached for when it needed to describe the smell of the new abundance, and the speed with which it was adopted suggests the public had been waiting for it.

Superintelligence

n. A hypothesised system whose general cognitive capability substantially exceeds that of the ablest human across essentially every domain.

The horizon term of the whole vocabulary, and the one that does the most work while resting on the least. Whether such a thing is possible, whether the present methods tend toward it, whether it would arrive gradually or at once — none of this is established, and the word names a possibility rather than an observed object. Its function in discourse is to set the stakes: invoked by the hopeful to justify the investment and by the fearful to justify the alarm, it serves both parties precisely because it is undefined enough to be filled with whatever a given argument requires. A lexicographer can do little with such a word but mark it clearly as a hypothesis, and note that a great deal of confident speech is built upon it.

Vibe Coding

n. The practice of producing software by describing the desired result to a model in natural language and accepting its generated code with little or no review of the code itself.

A phrase of recent and informal origin, and useful precisely because it is honest about its own method. To vibe code is to attend to whether the program appears to work and to decline to attend to how — to treat the code as one treats the inside of an appliance, as a region one is content not to enter so long as the thing performs. For small and disposable programs this is a defensible economy of attention. For programs that must be maintained, audited, or trusted, it relocates an old and well-understood category of risk into a place where no one is looking at it, and the cheerfulness of the phrase should not be mistaken for a verdict on the wisdom of the practice.

Wrapper

n. A product consisting chiefly of an interface and some modest arrangement built around a model the maker did not train; frequently qualified, by detractors, as thin.

The word is almost always an accusation. To call a company a thin wrapper is to allege that it has contributed an interface and a logo to a capability someone else produced and someone else could remove, and that it therefore has no durable claim on its own customers; the charge is, in the vocabulary of an earlier entry, the absence of a moat. The difficulty is that the line between a wrapper and a product is not fixed. Much valuable software is, on a sufficiently unkind description, a wrapper around something; the question is whether the arrangement built around the model is itself worth paying for. The word thin is doing the real work, and it is supplied, almost always, by a party with an interest in the answer.

A Closing Word

The work is done, and the lexicographer is under no illusion about the condition of it. Several of the entries above were accurate when written and will not survive the year; one or two may not survive the season. The vocabulary will go on being minted faster than it can be set down, the euphemisms will go on being preferred to the plain words they replace, and the public will go on speaking the whole of it fluently and defining none of it. This was foreseen in the Preface and is repeated here only so that no reader feels he has caught the author in an oversight.

If the lexicon has a use, it is the narrow one all such works have: not to arrest the language — that cannot be done, and the man who claims to have done it has only mistaken a photograph for the thing photographed — but to leave a clear record of what the words meant at one particular hour, so that a later reader, encountering them shifted, has something to measure the shift against. The lexicographer, having performed that office to the best of a harmless drudge’s ability, lays down the pen, and notes, without surprise, that two of the headwords have already moved.