In modern software engineering, data validation is often treated as an implementation detail. It’s something we do at the edges of a system: sanitize user inputs, check constraints, and move on. However, when designing robust systems grounded in solid object-oriented principles, validation deserves deeper consideration. A particularly important lens for this discussion is the Liskov Substitution Principle (LSP), one of the five SOLID principles.
The Liskov Substitution Principle states:
Objects of a superclass should be replaceable with objects of a subclass without breaking the correctness of the program.
In simpler terms, if you have a base type `T`, any subtype `S` should be usable anywhere a `T` is expected, without surprising behavior.
This principle is frequently discussed in terms of class hierarchies and inheritance, but it also provides powerful insights into how we think about validated data models.
The Problem with Implicit Validation
Most developers perform validation inline, something like the following sketch (the `register_user` function and its checks are illustrative):
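```python
# Validation is mixed directly into the business logic.
def register_user(name: str, age: int) -> dict:
    if not name or not name.strip():
        raise ValueError("name must be a non-empty string")
    if age < 0:
        raise ValueError("age must be non-negative")
    # Only now does the actual business logic begin.
    return {"name": name.strip(), "age": age}
```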
Here, validation is intertwined with business logic. The function checks inputs every time it is called. If the `name` or `age` is invalid, an exception is raised on the spot.
This approach seems fine at first, but it becomes messy:
- Every function that manipulates `name` or `age` must repeat validation or trust that upstream validation occurred.
- There’s no way to distinguish between validated and unvalidated data in the type system.
- Incorrect data can leak through and break invariants in downstream code.
This design violates the spirit of LSP because we cannot substitute one “kind” of data for another safely. For example, raw input strings are being used interchangeably with guaranteed valid user names. Functions that expect valid names can’t assume their preconditions hold, forcing defensive programming everywhere.
Relating This To The Liskov Substitution Principle
LSP is fundamentally about preserving contracts. If a function accepts a `User` object, it should be able to rely on the guarantees that come with that type. If a subtype or alternative representation of `User` can sneak in without satisfying those guarantees, you have a substitution problem.
Consider a function (a hypothetical `greet_user` that simply trusts its input):
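```python
# A hypothetical consumer that trusts whatever dict it receives.
def greet_user(user: dict) -> str:
    # Assumes "name" exists and is non-empty -- nothing enforces that here.
    return f"Hello, {user['name'].upper()}!"
```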
If `user` can be any dict, the function is unsafe. Continuing the sketch, someone might call:
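```python
# Nothing prevents callers from passing unvalidated or malformed data.
greet_user({"name": "", "age": -3})   # silently produces "Hello, !"
greet_user({"age": 30})               # raises KeyError at runtime
```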
In an LSP-compliant design, we want a guarantee: if something is called a `ValidatedUser`, it must meet all preconditions (like a non-empty name). This is exactly where modeling validated data as a separate type becomes valuable.
Modeling Validated Data As a Separate Type
Instead of treating validation as ad hoc logic scattered throughout the codebase, we can enforce it at the type level.
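A minimal sketch, using a frozen dataclass whose constructor performs the checks (the exact fields mirror the earlier example and are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ValidatedUser:
    name: str
    age: int

    def __post_init__(self) -> None:
        # Construction is the single place where the invariants are enforced.
        if not self.name or not self.name.strip():
            raise ValueError("name must be a non-empty string")
        if self.age < 0:
            raise ValueError("age must be non-negative")
```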
Now, any function that accepts a `ValidatedUser` can safely assume the data is valid. Validation occurs once at construction, and the type system encodes that guarantee.
This design is LSP-friendly because any valid subtype or refinement of `ValidatedUser` is guaranteed to meet the same constraints, and therefore any function consuming `ValidatedUser` will behave correctly.
Why This Preserves LSP
Let’s break this down explicitly:
- Base contract: A `ValidatedUser` has a non-empty name and non-negative age.
- Subtype compatibility: Any refinement, like `PremiumUser`, must still meet those constraints.
- Substitution safety: A `PremiumUser` can be used anywhere a `ValidatedUser` is required, without extra validation (see the sketch after this list).
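As a sketch, reusing the `ValidatedUser` class from above, with a hypothetical `PremiumUser` refinement and `greet_validated` consumer:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PremiumUser(ValidatedUser):
    tier: str = "gold"  # illustrative extra field

def greet_validated(user: ValidatedUser) -> str:
    # No defensive re-checks: construction already guaranteed the invariants.
    return f"Hello, {user.name}!"

greet_validated(PremiumUser(name="Ada", age=36))  # substitutes cleanly for ValidatedUser
```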
Contrast that with a dictionary-based approach. If you use raw dicts, there is no compile-time or construction-time guarantee. Any code expecting a “valid user” could receive invalid data, breaking assumptions and violating LSP.
Going Beyond Basic Examples
This concept applies to any domain where invariants matter. For example, suppose we need to model bank accounts:
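(A minimal sketch; the owner and balance fields, and their invariants, are assumptions for illustration.)

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BankAccount:
    owner: str
    balance: float

    def __post_init__(self) -> None:
        # Invalid accounts cannot be constructed in the first place.
        if not self.owner or not self.owner.strip():
            raise ValueError("owner must be a non-empty string")
        if self.balance < 0:
            raise ValueError("balance must be non-negative")
```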
Once again, this makes invalid states unrepresentable. Any function accepting `BankAccount` can safely assume both fields are valid without rechecking, which avoids the risk of writing defensive functions like the following sketch (the `apply_interest` helper is illustrative):
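```python
# Without strong types, every consumer must defensively re-validate its inputs.
def apply_interest(account: dict, rate: float) -> float:
    balance = account.get("balance")
    if balance is None or balance < 0:
        raise ValueError("invalid account balance")
    if not account.get("owner"):
        raise ValueError("invalid account owner")
    return balance * (1 + rate)
```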
With strong types, you focus only on domain logic, not repetitive validation.
Advanced Technique: Using Smart Constructors
Sometimes you don’t want to expose direct constructors that raise exceptions. Instead, you can use smart constructors that return a safe type or an error result:
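(A sketch: a hypothetical `parse_user` wrapper around the `ValidatedUser` from above that returns either the validated value or an error message, instead of raising.)

```python
from typing import Union

def parse_user(name: str, age: int) -> Union[ValidatedUser, str]:
    # Smart constructor: callers get a valid value or an error, never an invalid user.
    try:
        return ValidatedUser(name=name, age=age)
    except ValueError as exc:
        return str(exc)

result = parse_user("", -1)
if isinstance(result, ValidatedUser):
    print(f"created {result.name}")
else:
    print(f"rejected: {result}")
```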
This approach works well in languages with richer type systems (like Rust or Haskell), where you can encode success/failure using `Result` or `Either` types. In Python, you can simulate it using exceptions or unions.
The key idea remains: validation is centralized and validated types are distinct from raw types.
Conclusion
Data validation is not just an input hygiene task—it’s fundamentally about preserving invariants in your system. When you scatter validation logic, you risk inconsistency, duplication, and violations of the Liskov Substitution Principle. Any code that expects “valid data” might unexpectedly receive “raw data,” breaking assumptions and introducing bugs.
By modeling validated data as a separate type, you make invalid states unrepresentable. This design:
- Moves validation to a single point of responsibility.
- Ensures that functions can trust their inputs.
- Eliminates defensive programming boilerplate.
- Enforces LSP compliance by making type guarantees explicit.
Think of validated types as contracts made tangible in code. Instead of just hoping everyone remembers the rules, you make it impossible to break the rules without the compiler (or runtime constructor) complaining. Whether you’re building web forms, APIs, or complex domain models, this approach leads to cleaner, safer, and more maintainable software.
In short: Treat validation as part of your type system, not just part of your logic. By separating validated data from raw input, you build systems where the Liskov Substitution Principle isn’t just a theory—it’s a guarantee.