Thoughts on "Good Sample Code"


Recently, while reading technical books, I've often encountered "code designed for the writer's convenience, reverse-engineered from the final product."
This often clashes with my own learning process, and since just complaining isn't productive, I've decided to summarize what I consider to be "good sample code."

A disclaimer: even I can't satisfy all these requirements at once. Furthermore, when writing for myself, I often cut corners. Even when investing effort, there is always a tradeoff between labor and feasibility once the content reaches a certain length.

My Idea of Good Sample Code

  • Start small
  • Verifiable at each step via some form of dynamic or static testing. TDD is preferred.
  • A limit of around 20 lines for each code addition.
  • The module works if you copy and paste snippets from top to bottom.

Minimal Start

This is rough sample code, assuming some kind of compiler, that highlights both good and bad points.

// BAD
// Defining the entire argument object first, reverse-engineered from the final form
// Mostly unused at the initial stage
type Options = {
  output: string;
  input: string;
  debug: boolean;
  useXXX: boolean;
  useYYY: boolean;
  //...
}

// BAD: Making the reader write all node definitions from the start.
// It's just reverse-engineered from the final version; hard to grasp initially
type AstNode = {
  type: 'program',
  statements: AstNode[]
} //...

// GOOD: Minimal code exists. It will fail once test code is added
function parse(input: string): AstNode {
  return {
    type: 'program'
  };
}

// GOOD: Shows it's unimplemented via an exception, but the type signature is clear. Type checking passes formally.
// BAD: Unnecessary at the first stage
function compile(node: AstNode): string {
  throw new Error("wip")
}

// GOOD: Explaining usage at the entry point
function run(options: Options) {
  const ast = parse(options.input)
  const code = compile(ast);
  return code;
}
run({ input: 'x:int = 1', output: 'out.js', debug: false, useXXX: false, useYYY: false });

I wrote both good and bad points, but I understand why one would write it this way.

If you present the code as diffs, the reader has to fix or delete what was previously written, effectively applying a "transaction script," which makes it hard for them to maintain a mental model. That is why authors want to present the final form from the start, even though in the initial state it is mostly dead code.

While presenting an interface reverse-engineered from the final product might be kind to the writer, it is hard on the reader. They are suddenly confronted with a massive concept, only to move forward without even using it.

Minimal Code

So, how should you write it?

Here is an example of how I would handle the first step. Assuming that the conceptual explanations of parse, compile, and AST are handled separately, I will first declare that we are starting with parse.

type AstNode = {
  type: string;
}
function parse(input: string): AstNode {
  return {
    type: 'program'
  }
}
console.log(parse('aaa'));
  • The only player introduced at this point is parse(input: string): AstNode.
  • Since even the concept of a test runner can be noise at the very beginning, we start with just print debugging.

Depending on the language, it is desirable for static analysis to pass at this stage.

Next, we create a state where the test passes for this.

// sample.ts

type AstNode = {
  type: 'program'
}
function parse(input: string): AstNode {
  return {
    type: 'program'
  }
}

// For this article, the tests are written directly in the same file as the implementation
import {test} from "node:test";
import assert from "node:assert";
test("parse ''", () => {
  assert.deepEqual(parse(''), {
    type: 'program'
  })
});

Let's run this.

$ node --test --experimental-strip-types --experimental-transform-types --no-warnings=ExperimentalWarning sample.ts
✔ parse '' (1.019053ms)
ℹ tests 1
ℹ suites 0
ℹ pass 1
ℹ fail 0
ℹ cancelled 0
ℹ skipped 0
ℹ todo 0
ℹ duration_ms 79.61329

From here, repeat the cycle of adding tests and modifying the implementation to make them pass.

Strictly speaking, this first step isn't TDD, because we are writing a test intended to pass immediately. However, if you need to explain the concept of a test runner, I think it is safer to have it pass initially to provide the reader with a sense of accomplishment.

Tradeoffs

I've already made several tradeoffs here. To run tests written in TS, I'm requiring experimental APIs of Node.js 22.x.

I had the following options:

  • node --experimental-strip-types:
    • Pros: Built-in feature, no additional installation required. In the future, it should work just by removing --experimental (planned).
    • Cons: Requires the latest version of Node. There's a high possibility the API will change in the future.
  • vitest:
    • Pros: Likely what would be chosen for a general project today, making the knowledge highly transferable.
    • Cons: Requires concepts of package managers and how to add them. It takes up space in the text.
  • deno test:
    • Pros: All-in-one, so I could have explained this case with just Deno.test(...); and deno test.
    • Cons: Requires explaining the differences between Deno and Node, and how to install Deno.

What to choose here honestly depends on the concept at the time of writing. It's a matter of what you decide to spend time explaining beforehand.

Adding Code (TDD)

I'll skip the explanation of how to build the compiler itself this time, but let's say we are implementing parse -> compile.

// sample.ts

// ...
function compile(node: AstNode): string {
  throw new Error('WIP');
}

//...
test("compile x:int = 1 to x=1", () => {
  const parsed = parse('x:int = 1');
  const compiled = compile(parsed);
  assert.equal(compiled, 'x=1')
});
  • Add a test and explain its specification.
  • Prepare the test interface. The implementation is not yet written.

This part effectively becomes an implicitly added "transaction script." Unless the reader has a solid grasp of the previous section, it becomes difficult to keep up.

Have you ever followed along with a book, gotten stuck because the code wouldn't run, skipped ahead to the next section, and found that it still doesn't work? That's what external repositories are for: provide the final code there, along with the command to run the tests.

ch0/
  01-sample.ts
  02-sample.ts
Makefile
$ make test ...

Ideally, copying and pasting from there should restore a working state even if the local code has become a mess.
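For illustration, the Makefile in such a companion repository could be as small as the following. The file names come from the listing above and the flags are the ones used earlier in this article; everything else is an assumption:

```makefile
# Run each step's sample as a test, in order.
test:
	node --test --experimental-strip-types --experimental-transform-types \
		--no-warnings=ExperimentalWarning ch0/*.ts
```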

Setting aside whether it's "good code," from a learner's perspective, it's easier to manage when everything is packed into a single file. However, language or library constraints sometimes make that impossible.

Balancing good code and educational code is difficult. Conversely, one could say that educational code should probably not be used in production.

Tradeoffs in Presenting Code Snippets

In reality, while I would like to show as much of the full code as possible with every change, it can be physically impossible on printed pages. This is another tradeoff.

Including too much code every time can also become noise. Possible presentation methods include changing the background color during editing or using a diff format (though the latter means you can no longer simply copy and paste).

If you design the code so that it can be built from small fragments without needing rewrites, the "copy-paste and it works" experience for the learner is excellent, but the design itself becomes distorted as a result.

While I've cheated here by using "In-Source Testing"—writing both tests and implementation in a single file—if you were to split it into multiple files, you would need to organize the mutual import paths. For the writer, this is trivial thanks to auto-completion, but when a reader is manually transcribing code, they cannot easily follow that process, making it another high-load task.

I believe the limit for code that can be added in a single step without passing static or dynamic tests is about 20 lines. Assume that anything beyond that exceeds the reader's cognitive load. When the cognitive load is exceeded, learners stop thinking and start copy-pasting without understanding, which diminishes the learning effect.

Conclusion

Since everyone's understanding differs, this isn't necessarily always the correct approach, but I'm sharing this as one example.

Discussion

objectx

Tradeoffs

"Node.js 22.x の experimental なAPIを要求して いまいます"

Should "いまいます" be "しまいます"?

Minimal Code

"まずは parse 入る ということを宣言します。"

Should this read "parse が入る"?

eihigh

As you may already know, a book that handles this extremely well is "Writing An Interpreter In Go". It always proceeds in the order: write the necessary and sufficient tests for the step, confirm they fail, implement, confirm they pass. I highly recommend it as a reference.
https://interpreterbook.com/