In modern software engineering, data validation is often treated as an implementation detail. It’s something we do at the edges of a system: sanitize user inputs, check constraints, and move on. However, when designing robust systems grounded in solid object-oriented principles, validation deserves deeper consideration. A particularly important lens for this discussion is the Liskov Substitution Principle (LSP), one of the five SOLID principles.
The Liskov Substitution Principle states:
Objects of a superclass should be replaceable with objects of a subclass without breaking the correctness of the program.
In simpler terms, if you have a base type `T`, any subtype `S` should be usable anywhere a `T` is expected, without surprising behavior.
This principle is frequently discussed in terms of class hierarchies and inheritance, but it also provides powerful insights into how we think about validated data models.
The Problem with Implicit Validation
Most developers perform validation inline, something like the following sketch (the `register_user` function and its checks are illustrative):
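```python
# Validation is mixed directly into the business logic.
def register_user(name: str, age: int) -> dict:
    if not name or not name.strip():
        raise ValueError("name must be a non-empty string")
    if age < 0:
        raise ValueError("age must be non-negative")
    # Only now does the actual business logic begin.
    return {"name": name.strip(), "age": age}
```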
Here, validation is intertwined with business logic. The function checks inputs every time it is called. If the `name` or `age` is invalid, an exception is raised on the spot.
This approach seems fine at first, but it becomes messy:
- Every function that manipulates `name` or `age` must repeat validation or trust that upstream validation occurred.
- There’s no way to distinguish between validated and unvalidated data in the type system.
- Incorrect data can leak through and break invariants in downstream code.
This design violates the spirit of LSP because we cannot substitute one “kind” of data for another safely. For example, raw input strings are being used interchangeably with guaranteed valid user names. Functions that expect valid names can’t assume their preconditions hold, forcing defensive programming everywhere.
Relating This To The Liskov Substitution Principle
LSP is fundamentally about preserving contracts. If a function accepts a `User` object, it should be able to rely on the guarantees that come with that type. If a subtype or alternative representation of `User` can sneak in without satisfying those guarantees, you have a substitution problem.
Consider a function (a hypothetical `greet_user` that simply trusts its input):
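```python
# A hypothetical consumer that trusts whatever dict it receives.
def greet_user(user: dict) -> str:
    # Assumes "name" exists and is non-empty -- nothing enforces that here.
    return f"Hello, {user['name'].upper()}!"
```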
If `user` can be any dict, the function is unsafe. Continuing the sketch, someone might call:
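```python
# Nothing prevents callers from passing unvalidated or malformed data.
greet_user({"name": "", "age": -3})   # silently produces "Hello, !"
greet_user({"age": 30})               # raises KeyError at runtime
```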
In an LSP-compliant design, we want a guarantee: if something is called a `ValidatedUser`, it must meet all preconditions (like a non-empty name). This is exactly where modeling validated data as a separate type becomes valuable.
Modeling Validated Data As a Separate Type
Instead of treating validation as ad hoc logic scattered throughout the codebase, we can enforce it at the type level.
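A minimal sketch, using a frozen dataclass whose constructor performs the checks (the exact fields mirror the earlier example and are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ValidatedUser:
    name: str
    age: int

    def __post_init__(self) -> None:
        # Construction is the single place where the invariants are enforced.
        if not self.name or not self.name.strip():
            raise ValueError("name must be a non-empty string")
        if self.age < 0:
            raise ValueError("age must be non-negative")
```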
Now, any function that accepts a `ValidatedUser` can safely assume the data is valid. Validation occurs once at construction, and the type system encodes that guarantee.
This design is LSP-friendly because any valid subtype or refinement of `ValidatedUser` is guaranteed to meet the same constraints, and therefore any function consuming `ValidatedUser` will behave correctly.
Why This Preserves LSP
Let’s break this down explicitly:
- Base contract: A `ValidatedUser` has a non-empty name and non-negative age.
- Subtype compatibility: Any refinement, like `PremiumUser`, must still meet those constraints.
- Substitution safety: A `PremiumUser` can be used anywhere a `ValidatedUser` is required, without extra validation (see the sketch after this list).
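As a sketch, reusing the `ValidatedUser` class from above, with a hypothetical `PremiumUser` refinement and `greet_validated` consumer:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PremiumUser(ValidatedUser):
    tier: str = "gold"  # illustrative extra field

def greet_validated(user: ValidatedUser) -> str:
    # No defensive re-checks: construction already guaranteed the invariants.
    return f"Hello, {user.name}!"

greet_validated(PremiumUser(name="Ada", age=36))  # substitutes cleanly for ValidatedUser
```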
Contrast that with a dictionary-based approach. If you use raw dicts, there is no compile-time or construction-time guarantee. Any code expecting a “valid user” could receive invalid data, breaking assumptions and violating LSP.
Going Beyond Basic Examples
This concept applies to any domain where invariants matter. For example, suppose we need to model bank accounts:
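(A minimal sketch; the owner and balance fields, and their invariants, are assumptions for illustration.)

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BankAccount:
    owner: str
    balance: float

    def __post_init__(self) -> None:
        # Invalid accounts cannot be constructed in the first place.
        if not self.owner or not self.owner.strip():
            raise ValueError("owner must be a non-empty string")
        if self.balance < 0:
            raise ValueError("balance must be non-negative")
```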
Once again, this makes invalid states unrepresentable. Any function accepting `BankAccount` can safely assume both fields are valid without rechecking, which avoids the risk of writing defensive functions like the following sketch (the `apply_interest` helper is illustrative):
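```python
# Without strong types, every consumer must defensively re-validate its inputs.
def apply_interest(account: dict, rate: float) -> float:
    balance = account.get("balance")
    if balance is None or balance < 0:
        raise ValueError("invalid account balance")
    if not account.get("owner"):
        raise ValueError("invalid account owner")
    return balance * (1 + rate)
```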
With strong types, you focus only on domain logic, not repetitive validation.
Advanced Technique: Using Smart Constructors
Sometimes you don’t want to expose direct constructors that raise exceptions. Instead, you can use smart constructors that return a safe type or an error result:
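(A sketch: a hypothetical `parse_user` wrapper around the `ValidatedUser` from above that returns either the validated value or an error message, instead of raising.)

```python
from typing import Union

def parse_user(name: str, age: int) -> Union[ValidatedUser, str]:
    # Smart constructor: callers get a valid value or an error, never an invalid user.
    try:
        return ValidatedUser(name=name, age=age)
    except ValueError as exc:
        return str(exc)

result = parse_user("", -1)
if isinstance(result, ValidatedUser):
    print(f"created {result.name}")
else:
    print(f"rejected: {result}")
```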
This approach works well in languages with richer type systems (like Rust or Haskell), where you can encode success/failure using `Result` or `Either` types. In Python, you can simulate it using exceptions or unions.
The key idea remains: validation is centralized and validated types are distinct from raw types.
Conclusion
Data validation is not just an input hygiene task—it’s fundamentally about preserving invariants in your system. When you scatter validation logic, you risk inconsistency, duplication, and violations of the Liskov Substitution Principle. Any code that expects “valid data” might unexpectedly receive “raw data,” breaking assumptions and introducing bugs.
By modeling validated data as a separate type, you make invalid states unrepresentable. This design:
- Moves validation to a single point of responsibility.
- Ensures that functions can trust their inputs.
- Eliminates defensive programming boilerplate.
- Enforces LSP compliance by making type guarantees explicit.
Think of validated types as contracts made tangible in code. Instead of just hoping everyone remembers the rules, you make it impossible to break the rules without the compiler (or runtime constructor) complaining. Whether you’re building web forms, APIs, or complex domain models, this approach leads to cleaner, safer, and more maintainable software.
In short: Treat validation as part of your type system, not just part of your logic. By separating validated data from raw input, you build systems where the Liskov Substitution Principle isn’t just a theory—it’s a guarantee.