How we make our puzzles

The Technology Behind Our Puzzles

When you sit down with a puzzle, you’re usually focused on the challenge: spotting patterns, trying strategies, and getting that little “aha!” moment when something clicks. What you don’t see is the machinery running in the background to create those puzzles in the first place.

This is a peek behind the covers to tell you all about how we make our puzzles, the challenges we face, and how long it takes to make the huge number of puzzles we need every year for PuzzleMadness. We dive a little in to the technical details, but not too much!

Apart from Picross (which are hand-crafted by our lovely users), all of our puzzles are generated by software we’ve written ourselves. We are very fussy about what makes a puzzle fun. To make sure they play the way we want them to, we decided early on that we needed complete control over the generation process.

Why We Built Our Own Puzzle Generator

We didn’t want to rely on existing puzzle generators (there are plenty out there). While many of them will happily spit out playable puzzles, they don’t always produce puzzles that are enjoyable. A lot of the time they just brute-force a solution into place. That technically works, but it doesn’t give you any guarantee that the solving path feels good or that the difficulty matches what you’re aiming for.

By writing everything ourselves, we have complete control over our puzzles. It means we can control how a puzzle flows from start to finish, we can control exactly which solving techniques you will need for each of the difficulty levels, and we can avoid relying on those solving techniques that just makes a puzzle feel tedious.

It was also very important to us that our difficulty levels felt consistent. If you go to the Medium Sudoku puzzle for today, you know you will get a Medium difficulty Sudoku. We are puzzlers ourselves, so we know how frustrating it is when a “Medium” puzzle is sometimes too easy or too hard.

C++ for the win!

When we first started building PuzzleMadness, we were working in game development on the Nintendo DS and Wii. C++ was what we used every day, it was the language we knew best, so it was the natural choice.

It’s also a good fit. Puzzle generation is heavy on data structures and algorithms — lots of grids, some graph problems — and C++ gives us the performance and control we need. Strong typing also helps when you’re working with complex algorithmic code.

If we were starting now, we’d probably consider Go (which we use in other projects) or Rust (which looks interesting, though we haven’t touched it yet). But with all the experience we have in C++ and the amount of reusable code we’ve built up, sticking with it makes the most sense for us.

At the time of writing, our C++ project weighs in at around 120,000 lines of code. The resulting executable is deliberately CPU-intensive — it’s multithreaded and tuned to use about 75% of total CPU capacity. Despite that, it’s surprisingly light on memory, using only about 1 MB of RAM.

Why so little? Well, puzzles like Sudoku simply don’t require much storage. Everything you need to represent a standard 9x9 Sudoku (grid, candidates, and state) fits comfortably into 500 bytes.

Carrying Over the Game Dev Mindset

Working on the Nintendo DS left a mark on how we think about performance. That system had 4MB of RAM and no floating-point multiplication support — everything had to be done in fixed-point, which was rare even 20 years ago. If you didn’t write careful, efficient code, you were stuck.

That mindset carried over into PuzzleMadness. Even though today’s hardware is far more powerful, we still:

Choose algorithms carefully. Algorithm choice is by far the most important factor when it comes to the length of required time to generate a puzzle. C++ is one of the most high-performance languages available, but that counts for nothing if you are using the wrong algorithm.
Avoiding memory allocations. Memory allocations are fast — but no allocation at all is even faster. By deciding up front on the largest puzzle size we’ll support, we can statically size all of our state arrays and workspaces. This approach not only eliminates allocation overhead, it also makes the code simpler. We back it up with plenty of assertions to ensure the arrays are always large enough.
Selective Use of STL Containers: We only reach for STL containers when they genuinely make the code easier to read or maintain. For performance-critical sections, we stick with statically sized arrays and workspaces, which ties back to our goal of minimizing memory allocations. That said, we’re not dogmatic about it — there are plenty of places where dynamic allocation and STL containers make sense, and we use them when they’re the right tool for the job.

It pays off. We need to generate around 150,000 puzzles every year, and we can do that on consumer hardware in just two to three days.

Making Solvers Think Like People

One of our strongest beliefs is that puzzle solvers should behave the way people do, not the way a computer would. A lot of solvers rely on brute-force backtracking. It’s fast, it’s easy to code, but it tells you nothing about how difficult the puzzle feels to play.

We write our solvers to follow human techniques. In Sudoku, for example, once you’ve learned the basics like Single Candidate and Single Position, the next step is usually “Naked Pairs.” A computer could skip all that and just brute-force the answer, but then you’d have no idea what the actual solving path looks like. We think that’s important, so our algorithmic solvers follow the same steps a human would.

The only time we allow backtracking is right near the end of a puzzle, where a person might simply see the answer without working it out step by step. That’s intuition, and you can’t code intuition. So in that one narrow situation, we let the solver use backtracking.

Not All Puzzles Are Created Equally

One thing we’ve learned over the years is that different types of puzzles vary hugely in how long they take to generate. On the surface they may all look like “grids with rules,” but under the hood the algorithmic complexity can be very different.

Sudoku is lightning fast. On average it only takes about 1.7 milliseconds to generate a valid, human-solvable medium-difficulty puzzle. That means we can churn through a thousand puzzles in the time it takes you to shout “I love Sudoku!”
Most of our puzzles fall into the 10–200 millisecond range. That’s still quick in computer terms, but when you’re generating hundreds of thousands of puzzles a year, that difference adds up fast.
LITS is in a league of its own. A single puzzle takes about 20 seconds to generate, and about 99.8% of candidate puzzles are discarded because they don’t meet our criteria for solvability, uniqueness, or fun. In practice, that means hundreds of generated puzzles vanish behind the scenes for every one that makes it to players.

This is where our obsession with efficiency really pays off. Generating a valid solution is usually easy; generating a puzzle that’s fair and fun to solve, is what takes time. The stricter your quality bar, the more puzzles you have to throw away along the way.

Massive puzzle generation capabilities

One of the biggest advantages of our custom-built puzzle engine is sheer scale. We can generate an enormous number of puzzles in a very short time. And that power lets us do things that would be almost impossible otherwise.

A good example is our Sudoku Academy. If you’re new to Sudoku or want to level up and take on tougher challenges, the Academy is where you’ll find step-by-step guides for every major solving technique. Each technique has its own page, complete with an explanation and 10 carefully chosen practice puzzles designed to help you master it.

But here’s the catch: those 10 puzzles don’t come easy.

Take the Empty Rectangle technique. To make sure the practice puzzles only require that technique (plus the beginner basics of Single Candidate and Single Position), we can’t just throw in any old Sudoku. Most generated puzzles get rejected because they either require other techniques or don’t highlight the concept clearly enough.

So how many do we throw away? For Empty Rectangle alone, we had to generate over 300,000 puzzles just to find 10 that hit our standards.

That’s the power of our tech: we’re not just generating puzzles, we’re generating them at a scale that lets us be incredibly selective. The result is clean, focused practice puzzles that feel like they were handcrafted for you - even though they were carved out of hundreds of thousands of candidates.

Quality Control

We’re big believers in automated testing. In practice, that means we don’t just write puzzle generation code, we also write extra code to test that our algorithms behave exactly as they should.

For example, when we implement a new solving technique, we’ll handcraft a puzzle where that single technique must be applied. We then feed that puzzle into the function responsible for the technique and check that it gets used correctly. Just as importantly, we also test the opposite: making sure a technique is not applied in situations where it doesn’t belong.

This ensures our solvers are applying the right techniques at the right time.

But automated testing only goes so far. It can’t tell us whether a puzzle is satisfying to solve or aesthetically pleasing. For that, there’s no substitute for manual testing. We create and play-test hundreds of puzzles, across multiple sizes and difficulties, to make sure the end result feels exactly the way we want.

Discovering Problems

Even with all of this in place, issues still crop up from time to time, it’s the nature of writing complex algorithms.

If you’ve ever looked at the network traffic when completing a puzzle on PuzzleMadness, you may have noticed that the solution is included in the data sent back to our servers. There are two reasons for this.

Preventing cheating. In the past, some people tried to cheat by making direct API calls to mark puzzles as complete. By requiring the full solution to be submitted, that loophole is closed.
Catching errors. By comparing the submitted solution with the one we generated, we can detect when something has gone wrong. If the two don’t match, it usually means either the puzzle has multiple valid solutions or the generated solution doesn’t actually follow the rules.

It’s rare, but it does happen. The last time we ran into this was with Shingoki, a loop-based puzzle. Our solver occasionally produced answers where the loop wasn’t closed, leaving the lines open. Out of more than 30,000 Shingoki puzzles, only 15-20 were affected, but we wouldn’t have spotted it without automatically checking player solutions against our own.

That kind of automated feedback loop is essential. It lets us catch subtle issues that no amount of manual play-testing could reliably uncover.

Grids vs Graphs

At PuzzleMadness, most of our puzzles fall into one of two broad categories: grid-based puzzles and graph-based puzzles.

Grid-Based Puzzles

The majority of our puzzles are 2D grid-based. Sudoku is the most obvious example, but even puzzles like Network, Slitherlink, and Masyu (which look like they’re about tracing a path) are still fundamentally built on a grid.

Conceptually, grid puzzles are relatively straightforward to reason about. As a solver (human or computer), there are clear, well-defined techniques to apply. For developers, the task becomes codifying those techniques into code. The real challenges usually lie in making sure puzzles are:

Unique (only one valid solution)
Appropriately difficult
Fun to solve

From a technical perspective, grid-based puzzles are approachable - a great entry point if you’re learning how to write puzzle solvers.

Graph-Based Puzzles

A smaller number of our puzzles are graph-based. In computer science terms, a graph is a set of nodes (points) connected by edges (lines). A familiar analogy is a map: cities are nodes, and the roads between them are edges. Graph algorithms are how we calculate things like the shortest path between two cities.

An example of a graph-based puzzle is BallSort. You start with a set of test tubes, each containing balls of different colors. The goal is to rearrange them so that each tube holds only one color, but there are restrictions on which balls you can move and where they can go.

In graph terms:

The initial state of the puzzle is a node.
Moving one ball from one tube to another is an edge leading to a new node.
The complete graph is all possible states (nodes) and the valid moves (edges) between them.

The solver’s job (whether human or computer) is to navigate across this graph to reach the solved state.

Challenges with Graph Puzzles

Graph-based puzzles introduce different challenges:

Uncertainty of moves: Unlike in Sudoku, you can’t always say if a given move is “right.” It may bring you closer to the solution, or trap you in a dead end.
State explosion: The number of possible states is enormous, far too large to store or search exhaustively.

To handle this, we use heuristics — functions that score each state and prioritize the most promising ones when exploring solve paths. We also impose limits on the number of states we’ll explore. If a puzzle requires too much brute force to solve, we consider it not enjoyable and discard it.

Advent of Code

We’re big fans of Advent of Code here at PuzzleMadness. If you haven’t come across it before, it’s essentially an advent calendar for programmers: every day in December, a new puzzle is released that can be solved in any programming language.

The first challenge drops on December 1st, and new ones appear daily right up until Christmas. The puzzles start out fairly easy but ramp up in difficulty as the month goes on. At their heart, they’re all data structures and algorithms problems, which makes them perfect if you enjoy problem-solving and flexing your programming skills.

The medium and hard challenges are the ones that feel closest to the kind of work we do when generating and solving puzzles. In fact, one of the puzzles from a few years back was exactly the Towers puzzle, just reimagined in a whimsical “elves in the forest” storyline.

Storing Puzzle Data

Of course, generating puzzles is only half the story, we also need to store them somewhere.

We organize puzzles by type, difficulty, and month. Each file contains exactly 31 puzzles, no matter how many days are in the month. That way, we always have a full set ready to go without worrying about how many days are in each month.

Fun fact: back in the early days of PuzzleMadness, some sharp players figured out they could access “extra” puzzles by manually tweaking the URL - for example, jumping to February 31st. At the time, we only published one puzzle per day. A few dedicated puzzlers who had already solved everything used this trick to grab bonus puzzles, rack up more points, and vault themselves into the top spots on the all-time leaderboard.

Behind the scenes, we store everything in a custom binary format. That decision was deliberate:

Parsing binary is extremely fast and efficient.
It lets us include a table of contents at the start of each file, so we can quickly seek and load just one puzzle without reading the entire dataset.
JSON might be more common on the web, but extracting one puzzle from a large JSON file would require parsing the whole thing - far less efficient.

Today, all our puzzle data files combined take up just over 1 GB on disk - pretty compact, considering the sheer volume of puzzles PuzzleMadness serves up every year.

Lessons and Next Steps

Looking back, the combination of in-house tools, C++, and the game dev mindset has worked really well for us. It’s given us:

A fast, efficient generator
A codebase we know inside out
Puzzles that feel fun and fair to solve

Looking ahead, we’re curious about experimenting with other languages like Go or Rust. Whatever direction we take, the focus will stay the same: puzzles should always feel like they’re made for humans, not machines.

Wrapping Up

Behind every PuzzleMadness puzzle, there’s a lot of engineering you don’t see: algorithms tuned for fun, code shaped by old game dev constraints, and solvers designed to think like real people.

That’s what makes our puzzles work the way they do. And it’s why we’re still excited to keep building more.

Thanks for reading!