AI & Engineering

Lessons from the Two LLMs in Our House

What my children accidentally taught me about prompt engineering.

A few weeks ago I asked my three-year-old son Theo to tidy his toys.

Five minutes later he proudly told me he had finished.

Technically he was not wrong. One toy had been placed neatly in a box. A few others had been moved to a different part of the floor. The rest were still spread across the room in what could generously be described as their original configuration.

From Theo's perspective the task was complete.

From mine it was a reminder of something I spend quite a lot of time thinking about professionally. It turns out that instructions are interpreted very differently depending on the assumptions of the system receiving them.

Anyone who works with large language models will recognise the pattern immediately.

You give an instruction.
The system follows it.
The result is not quite what you expected.

Over the last few years I have noticed that many of the ideas people now describe as prompt engineering techniques appear in ordinary conversations with small children. In our case the two systems currently running in production are Isla (5) and Theo (3).

The comparison is obviously light-hearted. Children are learning humans, not algorithms. Still, the overlap is interesting. When a system is learning how language works, clarity matters.

Here are a few lessons from the two most energetic "LLMs" in our house.

Ambiguous prompts produce creative outcomes

One of the first things you learn when communicating with both children and language models is that instructions tend to be interpreted quite literally.

If I say:

"Theo, can you tidy your toys?"

there are many ways that instruction might be understood.

He might put one toy away and assume the job is complete. He might move the toys into a different pile. He might agree enthusiastically and then become distracted by something else entirely.

None of those responses are unreasonable. They are simply different interpretations of a vague request.

Language models behave in much the same way. They generate responses that match the statistical patterns of the words they see, not the intention that exists in the user's head.

A clearer instruction might be:

"Theo, please put the cars in the blue box and the dinosaurs on the shelf."

Now the prompt contains specific objects and a destination. The definition of success becomes much clearer.

People working in AI would describe this as adding structure and constraints to the prompt.
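The same contrast can be written out as literal prompt strings. This is a generic sketch, not any particular library's API; the wording of the prompts is borrowed from the toy-tidying example above.

```python
# A vague prompt leaves success undefined; a structured one names
# the objects, the destinations, and the definition of "done".
vague_prompt = "Tidy your toys."

structured_prompt = (
    "Put the cars in the blue box.\n"
    "Put the dinosaurs on the shelf.\n"
    "You are finished when no toys remain on the floor."
)

# The structured version spells out objects, destinations, and a
# stopping condition -- the three things the vague version omits.
for line in structured_prompt.splitlines():
    print(line)
```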

Context changes everything

Large language models rely heavily on context. The words that appear around a prompt influence how the model interprets it.

Children rely on context just as much.

If I say:

"Put your shoes away."

Isla will quite reasonably ask where they should go.

Once the instruction becomes:

"Put your shoes on the rack next to the door."

the task becomes simple.

The difference is not intelligence. It is missing information.

In AI systems, context might include prior conversation, examples, formatting rules, or system instructions. Without context the model has to infer meaning.
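In chat-style interfaces, that context usually arrives as a list of messages with distinct roles. The structure below follows the common role/content convention used by several providers, though exact field names vary; the shoe-rack dialogue is the example from above.

```python
# Context is supplied explicitly rather than left for the model to infer:
# a system message sets the rules, and earlier turns carry the details.
messages = [
    {"role": "system",
     "content": "You are a helpful assistant. Answer in one sentence."},
    {"role": "user",
     "content": "Put your shoes away."},
    {"role": "assistant",
     "content": "Where should the shoes go?"},
    {"role": "user",
     "content": "On the rack next to the door."},
]

# Without the later turns, the model -- like Isla -- would have to guess.
print(len(messages), "messages of context")
```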

Children do exactly the same thing. When they have to infer intent, the result may be creative.

Breaking tasks into steps works better

Anyone who has tried to run a bedtime routine with a three-year-old will recognise the limitations of complex instructions.

If Theo hears something like:

"Go upstairs, brush your teeth, put on your pyjamas, and get into bed."

there is a strong chance the process stops halfway up the stairs because something more interesting appears.

Breaking the task into steps tends to work better.

First go upstairs.
Then brush teeth.
Then put on pyjamas.

In AI research this approach appears as task decomposition or chain-of-thought prompting. Instead of asking a model to jump straight to the final answer, you guide it through intermediate reasoning steps.

Researchers have repeatedly shown that language models become more accurate when reasoning is structured in this way.

Theo also becomes more reliable when the process is broken into manageable pieces.
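The bedtime routine can be sketched as task decomposition: each step is issued and confirmed in turn rather than bundled into one compound instruction. The step-checking here is a hypothetical placeholder; in a real pipeline each step would be a separate model call (or a separate request to Theo).

```python
# Task decomposition: issue one step at a time and confirm it finished
# before moving on, instead of sending one long compound instruction.
bedtime_routine = [
    "Go upstairs.",
    "Brush your teeth.",
    "Put on your pyjamas.",
    "Get into bed.",
]

def run_steps(steps):
    """Execute steps in order, stopping at the first failure."""
    completed = []
    for step in steps:
        # In a real system this would call the model (or the child)
        # and verify the outcome; here every step simply succeeds.
        completed.append(step)
    return completed

done = run_steps(bedtime_routine)
print(f"{len(done)}/{len(bedtime_routine)} steps completed")
```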

Reasoning often appears when you guide it

One of the more interesting discoveries in recent LLM research is that reasoning improves when models are encouraged to think step by step.

Prompts such as:

"Let's work through this step by step."

often produce better answers.

The model has not suddenly become more intelligent. Instead, it is revealing intermediate reasoning that might otherwise stay hidden.
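In practice the step-by-step cue is usually just appended to the question. A minimal sketch, with a made-up arithmetic question for illustration:

```python
# Zero-shot chain-of-thought: append a cue asking the model to show
# its intermediate reasoning before giving the final answer.
question = (
    "A toy box holds 6 cars. Theo adds 2 and removes 3. "
    "How many cars are in the box?"
)
cot_prompt = question + "\n\nLet's work through this step by step."

print(cot_prompt)
```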

Something similar happens with children.

If Isla is asked why the moon changes shape, the first answer may simply be a guess. If the conversation continues with smaller questions, the reasoning starts to appear.

What do we know about the moon?
What happens when the Earth moves?
What might that change?

The knowledge is already there in fragments. The prompt simply helps organise it.

Examples are surprisingly powerful

Children learn an enormous amount through observation.

If I ask Isla to draw a castle she will happily produce one. If I show her a picture of a castle first, the drawing becomes much closer to the idea I had in mind.

Language models behave in a similar way.

Providing examples before asking for an answer is called few-shot prompting. The model learns the structure of the task by recognising patterns.

For example:

Input: London → Output: City
Input: Banana → Output: Fruit

After seeing a few examples the pattern becomes obvious.

Humans are very good at learning through examples. Language models rely on the same principle.
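Assembling a few-shot prompt is mostly string formatting. The sketch below builds on the Input/Output pairs above, with "Everest" and "Thames" added as hypothetical extra items; the trailing blank output is what the model is asked to complete.

```python
# Few-shot prompting: show the model the input/output pattern first,
# then leave the final output blank for it to complete.
examples = [
    ("London", "City"),
    ("Banana", "Fruit"),
    ("Everest", "Mountain"),
]

def few_shot_prompt(pairs, query):
    """Format example pairs, then the query with its output left open."""
    lines = [f"Input: {inp} -> Output: {out}" for inp, out in pairs]
    lines.append(f"Input: {query} -> Output:")
    return "\n".join(lines)

prompt = few_shot_prompt(examples, "Thames")
print(prompt)
```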

Feedback shapes behaviour

Modern language models are often refined using reinforcement learning from human feedback. Humans review responses and provide signals about which outputs are helpful or correct.

Children experience something similar through encouragement and guidance.

When Isla completes a task and receives recognition for it, she is more likely to repeat that behaviour.

Both children and AI systems also show a tendency to optimise the reward signal.

Theo has occasionally realised that if putting away toys results in praise, putting away the minimum number required may produce the same outcome.

AI researchers sometimes call this reward hacking.

Parents usually call it something else.

Different models require different prompts

Another lesson from AI development is that different models have different strengths.

Some models are excellent at reasoning. Others are better at writing. Others excel at code generation.

Our two in-house models also show clear differences.

Isla enjoys explanations and hypothetical questions. She often talks through her thinking. Theo is enthusiastic but prefers shorter instructions and faster activities.

Prompt strategies that work perfectly for Isla sometimes need to be simplified for Theo.

Working with AI systems often requires the same adjustment.

Confident guesses happen everywhere

One of the most discussed challenges with language models is hallucination. The model generates something that sounds convincing but is not correct.

Children sometimes do something similar, though imagination is usually the more charitable term.

Isla once confidently explained that dinosaurs probably ate pizza.

When asked how she knew this, the explanation was simple. She thought they might like it.

Language models occasionally produce similar answers when asked about topics outside their training data.

In both cases the most useful response is curiosity.

How do you know that?
Where did you learn it?
What makes you think that?

Questions like these encourage reasoning rather than confident guessing.

Humans appear to have temperature settings too

Language models often include a parameter called temperature, which controls randomness in the generated output.

Higher temperature leads to more creative responses. Lower temperature produces more predictable ones.

Children appear to have something similar.

Isla frequently produces elaborate stories involving dragons, explorers, and improbable adventures. This is excellent when the task involves imagination.

It is less helpful when the task involves putting socks on.

Simpler instructions usually lead to more predictable outcomes.
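Under the hood, temperature rescales the model's raw scores before sampling: logits are divided by the temperature before the softmax, which sharpens or flattens the resulting distribution. A self-contained sketch with three made-up token scores:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # raw scores for three candidate tokens

cool = softmax_with_temperature(logits, 0.5)  # low temperature
warm = softmax_with_temperature(logits, 2.0)  # high temperature

# Low temperature concentrates probability on the top token;
# high temperature spreads it across the alternatives.
print(f"T=0.5 top token: {cool[0]:.2f}, T=2.0 top token: {warm[0]:.2f}")
```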

Iteration is usually the real solution

Perhaps the most useful lesson from both prompt engineering and parenting is that communication rarely works perfectly the first time.

The process usually looks like this.

Give instruction.
Observe response.
Clarify if necessary.
Try again.

Prompt engineering works in exactly the same way. Even very capable models often require refinement through several attempts.

Children learning how the world works are not so different.

Final thoughts

There is a lot of excitement around large language models and prompt engineering right now. Some of it makes the field sound more mysterious than it really is.

In practice many of the principles come down to simple communication.

Be clear about what you want.
Provide context.
Break complex tasks into steps.
Give examples.
Offer useful feedback.

Those ideas improve interactions with modern AI systems.

They also work quite well with Isla (5) and Theo (3), the two endlessly curious systems currently running in our house.

Although only one of those systems occasionally requests a bedtime story about dinosaurs that might have eaten pizza.

So far at least, it is not the AI.