1 Data Oriented Programming
This chapter introduces data-oriented programming as a pragmatic shift in emphasis: model the information in your domain “as data,” independent of classes and behaviors, so the code itself communicates what things are. By focusing first on meaning and representation, programs become smaller, clearer, and easier to reason about. Objects are not discarded; they remain valuable for boundaries and resource management. The change is simply to elevate data representation to the center and use objects as a tool where they fit best.
The chapter shows how representation drives clarity and correctness, starting with a one-line example: replacing a vague String id with a concrete UUID id eliminates ambiguity and makes illegal states unrepresentable, removing defensive code and tests. A larger example refactors a ScheduledTask that previously encoded meaning implicitly (scheduledAt and attempts) into explicit domain data (RetryImmediately, ReattemptLater, Abandon). This makes semantics self-describing across the codebase and guards against “semantic drift.” The approach complements object-oriented design: richer data clarifies public interfaces (e.g., isAbandoned) while preserving encapsulation.
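As a minimal sketch of the one-line example, assume a hypothetical `Customer` record and a hypothetical `fromRequest` boundary method (neither name comes from the chapter). Typing the field as `UUID` means the text is parsed once, at the boundary, and nothing downstream has to wonder whether the id is well formed.

```java
import java.util.UUID;

// Hypothetical domain type for illustration; only the String-to-UUID change comes from the chapter.
record Customer(UUID id, String name) {}

class UuidExample {
    // Raw text is converted once, at the boundary. Text that UUID.fromString
    // rejects never becomes a Customer, so later code needs no format checks.
    static Customer fromRequest(String rawId, String name) {
        return new Customer(UUID.fromString(rawId), name);
    }

    public static void main(String[] args) {
        Customer ok = fromRequest("123e4567-e89b-12d3-a456-426614174000", "Ada");
        System.out.println(ok.id());
        try {
            fromRequest("hello", "Bob");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at the boundary");
        }
    }
}
```

With `String id`, every method that receives a `Customer` would have to either trust the value or re-validate it. With `UUID id`, the type carries that guarantee for us.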
Orienting around data reshapes APIs and implementations: method signatures begin to speak in terms of domain types (e.g., reschedule(FailedTask) → RetryDecision), and bodies become expression-oriented transformations of input data to output data. Modeling data also reveals deeper domain distinctions (ScheduledTask vs. FailedTask vs. CompletedTask), guiding design that feels “inevitable.” Although the book leverages modern Java features (records, pattern matching, sealed types), the principles are tool-agnostic and workable on older JDKs with conventions and libraries. The teaching style uses small, realistic examples—including missteps—to build intuition. The core steps are simple: identify what the data really is, encode that meaning in its representation, and let the resulting clarity ripple outward through the program.
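The shape described above might look roughly like the following sketch. The names `RetryDecision`, `RetryImmediately`, `ReattemptLater`, `Abandon`, `FailedTask`, and `reschedule` come from the chapter; the record fields, the `Scheduler` class, the `describe` helper, and the retry policy itself are assumptions made only for illustration.

```java
import java.time.Duration;
import java.time.Instant;

// Decision names come from the chapter; their fields are assumptions.
sealed interface RetryDecision permits RetryImmediately, ReattemptLater, Abandon {}
record RetryImmediately() implements RetryDecision {}
record ReattemptLater(Instant at) implements RetryDecision {}
record Abandon() implements RetryDecision {}

// Hypothetical shape of a failed task; attempts counts the failures so far.
record FailedTask(String name, int attempts, Instant failedAt) {}

class Scheduler {
    // Assumed policy: retry once right away, once more after a delay, then give up.
    static RetryDecision reschedule(FailedTask task) {
        return switch (task.attempts()) {
            case 1 -> new RetryImmediately();
            case 2 -> new ReattemptLater(task.failedAt().plus(Duration.ofMinutes(5)));
            default -> new Abandon();
        };
    }

    // Exhaustive over the sealed hierarchy, so no default branch is needed.
    static String describe(RetryDecision decision) {
        return switch (decision) {
            case RetryImmediately r -> "retry now";
            case ReattemptLater later -> "retry at " + later.at();
            case Abandon a -> "abandon";
        };
    }

    public static void main(String[] args) {
        FailedTask task = new FailedTask("send-email", 2, Instant.parse("2024-01-01T00:00:00Z"));
        System.out.println(describe(reschedule(task)));
    }
}
```

Both methods are single expressions from input data to output data. Because `RetryDecision` is sealed, the compiler flags `describe` if a new kind of decision is added and not handled.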
Objects and how they communicate are our focus during object-oriented design
The representation of our data is the primary focus during data-oriented design
Being explicit about what a task can transition to after failing
Representing each decision as a piece of standalone data
Focusing on just the data makes us question our representation
Clarifying what we’re talking about
Analyzing the data drives a deeper exploration of the domain
How data-oriented programs tend to be shaped
Summary
- Data-oriented programming is about programming with data “as data.”
- Data is more than just a collection of values. It has an inherent meaning.
- Modeling “data as data” lets us focus on capturing that meaning in isolation from other concerns.
- Before asking “what does it do?”, data orientation starts with a more basic question: “what is it?” We want to understand what the things in our domain are at a fundamental level.
- Data orientation is not a replacement for object orientation, functional programming, or any other paradigm. We view all of them as useful tools.
- The representations we choose for our data affect our programs as a whole.
- Good representations eliminate whole classes of bugs by making it impossible to create invalid data.
- Bad representations introduce problems that ripple outward through our codebase and force us to spend effort working around them.
- Instead of reasoning about what vague variable assignments mean, we can represent that meaning with a concrete data type.
- Focusing on the data inside our objects, rather than just their interfaces, makes our objects as a whole more understandable.
- When we do a good job of modeling the data, the rest of the code feels like it’s writing itself. We just have to follow where the data leads.
- Data-oriented programs tend to be built around functions that take data as input and return new data as output.
- We’ll use Java 21 throughout the book, though you can still follow along with Java 8.
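To illustrate the last point, here is one way a piece of domain data could be written by convention on Java 8, without records. The class name `ReattemptLater` comes from the chapter; its single `at` field and everything else in this sketch are assumptions.

```java
import java.time.Instant;
import java.util.Objects;

// Java 8-compatible stand-in for what a record would generate on newer JDKs.
// The `at` field is an assumed shape, not taken from the book.
final class ReattemptLater {
    private final Instant at;

    ReattemptLater(Instant at) {
        // Reject null up front so every instance carries a real time.
        this.at = Objects.requireNonNull(at, "at");
    }

    Instant at() {
        return at;
    }

    // Value semantics: two decisions with the same time are equal.
    @Override
    public boolean equals(Object o) {
        return o instanceof ReattemptLater && at.equals(((ReattemptLater) o).at);
    }

    @Override
    public int hashCode() {
        return at.hashCode();
    }

    @Override
    public String toString() {
        return "ReattemptLater[at=" + at + "]";
    }
}
```

The class is immutable and compared by value, which is the convention that matters here. It is more typing than a record, but it describes the same data.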
FAQ
What is Data-Oriented Programming (DOP) in Java?
DOP is a way of organizing programs around the data they manage. It emphasizes modeling data explicitly, independent of classes, operations, or behaviors, so the code clearly communicates “what it is.” Precise data representations make programs smaller, simpler, and easier to understand.
Does DOP replace object-oriented programming?
No. DOP doesn’t abandon objects; it changes where and how much we use them. Objects excel at managing stateful resources and enforcing boundaries. DOP uses objects at those boundaries and relies on plain, explicit data to represent domain meaning in the core logic.
What does “data as data” mean?
It means modeling domain information as ordinary values whose meaning is explicit in their representation, not implied by object behavior. The focus moves from “what does it do?” to the foundational “what is it?” so the semantics are visible directly in code.
Why is the representation of data so important?
Representation communicates semantics. Imprecise representations create ambiguity and bugs. Precise ones describe intent, reduce guesswork, and make code self-explanatory. They also limit what states can be constructed, shrinking the space of possible errors.
How does the String vs UUID example illustrate DOP?
Representing an identifier as String allows infinitely many invalid values; representing it as UUID encodes the semantics in the type. With UUID, illegal states (non-UUIDs) can’t be constructed, removing defensive checks and related tests.
How does DOP reduce illegal states and defensive code?
By aligning types with meaning, code can only construct valid states. This eliminates scattered validations, preconditions, and “anti-corruption” layers that compensate for vague types, reducing both production code and tests.
How does DOP clarify business logic in the scheduling example?
Instead of inferring meaning from fields like scheduledAt and attempts, DOP models explicit decisions as data (e.g., RetryImmediately, ReattemptLater, Abandon). The code becomes expressive and unambiguous (e.g., checking task.getStatus() instanceof Abandon instead of scheduledAt == null), improving semantic integrity.
Is using instanceof acceptable here?
While instanceof often signals missing polymorphism in OOP, in DOP it’s used over data representations, not behavior-rich objects. For plain data, such checks can be appropriate and will be justified with patterns explored later.
How does focusing on data reshape APIs and method design?
Methods tend to accept data and return data with expressive signatures (e.g., RetryDecision reschedule(FailedTask)). Implementations become expression-oriented (e.g., switch expressions) and objects act like pipelines that manage and vend data.
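The scheduling answers above contrast scheduledAt == null with task.getStatus() instanceof Abandon. A rough before-and-after sketch follows. The names `ScheduledTask`, `getStatus`, `isAbandoned`, and the three status names come from the chapter; `ImplicitTask`, the `Status` interface, and all fields are hypothetical.

```java
import java.time.Instant;
import java.util.Objects;

// Before (hypothetical): meaning is implied by a combination of field values.
final class ImplicitTask {
    Instant scheduledAt; // assumed convention: null means "abandoned"
    int attempts;

    boolean isAbandoned() {
        return scheduledAt == null; // the reader has to know the convention
    }
}

// After: each state is its own piece of data. Fields are assumptions.
sealed interface Status permits RetryImmediately, ReattemptLater, Abandon {}
record RetryImmediately() implements Status {}
record ReattemptLater(Instant at) implements Status {}
record Abandon() implements Status {}

final class ScheduledTask {
    private final Status status;

    ScheduledTask(Status status) {
        this.status = Objects.requireNonNull(status, "status");
    }

    Status getStatus() {
        return status;
    }

    // The check now names the state it is asking about.
    boolean isAbandoned() {
        return status instanceof Abandon;
    }
}
```

The object still encapsulates its state. What changes is that the data inside it says what it means, so `isAbandoned` no longer depends on an unwritten rule about null.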