Sven Balnojan

Dr. Sven Balnojan is a data technologist and product person focused on helping the world extract more value from the exponentially growing amount of data. He’s passionate about all things data, machine learning, AI, business intelligence, and many related fields. His endeavors include managing internal data teams and their transitions from service-oriented to platform-oriented teams, as well as getting his hands dirty as a data developer in the fields of machine learning, data engineering, and Data DevOps. Sven holds a PhD in mathematics with a thesis in the field of singularity theory. He is the author of an opinionated newsletter, “Three Data Point Thursday.” Additionally, he blogs at https://datacisions.com and appears in talks across the internet.

books & projects by Sven Balnojan

Getting Started with Data Mesh

  • August 2022
  • ISBN 9781633438668
  • 66 pages

Getting Started with Data Mesh features two chapters from Manning’s Data Mesh in Action, by expert data professionals Jacek Majchrzak, Sven Balnojan, and Marian Siwiak. Inside, you’ll discover what the data mesh architecture is, and its key concepts and advantages. You’ll get guidance from the experts on how data mesh stacks up against other data architectures and how your organization will benefit. With this small ebook, get the big data mesh picture and discover if this revolutionary data architecture is for you.

Build a Small Dockerized Data Mesh

5 weeks · 5-7 hours per week average · INTERMEDIATE

You’re a consultant working for Messflix, a movie and TV-show streaming platform. Despite having a goldmine of data, Messflix has been unsuccessful in creating a recommendation system. You’ve discovered the problem: the right data is not flowing to the right use cases. Messflix agrees with your suggestion of implementing a data mesh to decentralize data and treat it as a product instead of a byproduct.

You’ll build a Python prototype to explore a self-serve data platform and add functionality for publishing data products. You’ll create a derived recommendation data product that shows a list of recommended movies. Taking on a data product management perspective, you’ll learn to solve and prevent breaking changes. Last but not least, you’ll implement federated computational governance that balances the usefulness, interoperability, and security aspects of data products with the benefits of the data mesh. By the end of this series, you’ll have learned key principles of the data mesh and worked through all its major use cases.

Ensure Computational Governance

1 week · 3-5 hours per week · INTERMEDIATE

The development teams at Messflix, a movie and TV-show streaming platform, are pushing domain data products through their newly implemented data mesh. Now the CTO has tasked you, their consultant, with striking a balance between the benefits of the data mesh, the freedoms the data products have, their usefulness, their interoperability, and aspects of their security. You’ll use Python and pandas to write policies that check the registration and pushed data, helping users provide all the required registration information. You’ll create tooling to classify data into categories for improved data labeling, and protect sensitive data with pseudonymization functions. When you’re done, you’ll have learned skills for federated computational governance that balance the benefits of data products with the benefits of the data mesh.
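
As a rough illustration of the two ideas named above (a registration policy and a pseudonymization function), here is a minimal sketch. It is not taken from the project: it uses only the Python standard library rather than pandas, and the required field names, the function names, and the choice of keyed HMAC-SHA256 hashing are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical policy: fields every data product registration must provide.
REQUIRED_FIELDS = ("name", "owner", "description")


def missing_registration_fields(registration, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a registration dict."""
    missing = []
    for field in required:
        value = registration.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def pseudonymize(value, secret):
    """Replace a sensitive value with a keyed hash (HMAC-SHA256, hex encoded).

    The same value and secret always give the same pseudonym, so records
    can still be joined without exposing the original value.
    """
    return hmac.new(secret.encode(), str(value).encode(), hashlib.sha256).hexdigest()
```

A policy check of this kind can reject a registration and tell the user exactly which fields are still missing, while the keyed hash keeps pseudonyms stable across data products that share the same secret.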

Manage Data Products

1 week · 6-8 hours per week · INTERMEDIATE

As a consultant for Messflix Inc., a movie and TV-show streaming platform, you’ll investigate and discover why Messflix’s recommender system breaks. You’ll brainstorm options for changes that don’t break the system, explore their pros and cons, and choose and implement one of your options. Then you’ll create an internal versioning strategy to support all the great product changes Messflix has planned for the future.

Publish Data Products

1 week · 6-8 hours per week · INTERMEDIATE

Messflix Inc., a movie and TV-show streaming platform, wants to build a recommendation system for its movies and shows, but currently, its data landscape is too complex. As a consultant, your task is to implement a data mesh for an improved, accurate flow of its data. Using Python and JSON, you’ll help the data engineering teams sift more easily through Messflix’s data by creating separate, structured data products that can be pushed to the central data platform. From the organized data products, you’ll create a list of recommended movies, tailored to Messflix’s customers’ preferences.

Push Data into Data Products

1 week · 6-8 hours per week · INTERMEDIATE

Messflix Inc., a movie and TV-show streaming platform, is implementing a data mesh. So far, it has a self-serve data platform prototype where development teams can register their domain data products. As a consultant, your task is to build on that basic platform prototype with additional functionality: You’ll write a script in Python that will enable the development teams to push fresh data into their existing data products, write a function that adds support for versioned data, and implement a function that automatically calculates specific metadata (like row count and latest timestamp), then prints it to the screen. When you’re finished, you’ll have built a well-functioning, feature-rich, self-serve data platform, and be familiar with the requirements data-producing and data-consuming teams face daily and how to fulfill them.
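
The three pieces described above (pushing data, versioning it, and calculating metadata) might look roughly like the following sketch. It is an illustration under assumptions, not the project's solution: the directory layout `<product>/v<version>.csv`, the function names, and the `updated_at` column holding ISO 8601 timestamps are all hypothetical, and the standard library `csv` module is used for simplicity.

```python
import csv
from datetime import datetime
from pathlib import Path


def push_data(base_dir, product, version, rows):
    """Write rows (a list of dicts) to <base_dir>/<product>/v<version>.csv."""
    if not rows:
        raise ValueError("nothing to push")
    target = Path(base_dir) / product / f"v{version}.csv"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return target


def compute_metadata(path, timestamp_field="updated_at"):
    """Return the row count and latest timestamp of a pushed CSV file."""
    with Path(path).open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    # Assumes every row carries an ISO 8601 timestamp in `timestamp_field`.
    timestamps = [datetime.fromisoformat(row[timestamp_field]) for row in rows]
    return {
        "row_count": len(rows),
        "latest_timestamp": max(timestamps).isoformat() if timestamps else None,
    }
```

Keeping each version in its own file means a new push never overwrites data that consumers of an older version still depend on.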

Build a Self-Serve Data Platform

1 week · 6-8 hours per week · INTERMEDIATE

You’re a consultant working for Messflix Inc., a movie and TV-show streaming platform. Your task is to set up a Python prototype that rolls out the technical components of a data mesh. Using Python and pandas, you’ll write a function that creates an empty CSV file with the predefined attributes of your data product, build a data catalog by creating functions that write to the CSV file, and set up standardized access to the CSV datasets. When you’re done, you’ll have hands-on experience building a minimal self-serve data platform using simple techniques.
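
To give a feel for how small such a platform can be, here is a minimal sketch of the three steps described above. It is an assumption-laden illustration rather than the project's code: it uses the standard library `csv` module instead of pandas, and the function names and catalog columns (`name`, `owner`, `location`) are hypothetical.

```python
import csv
from pathlib import Path

# Hypothetical catalog columns; the real project may track different attributes.
CATALOG_FIELDS = ["name", "owner", "location"]


def create_data_product(path, attributes):
    """Create an empty CSV file that contains only the header row."""
    with Path(path).open("w", newline="") as handle:
        csv.writer(handle).writerow(attributes)


def register_in_catalog(catalog_path, name, owner, location):
    """Append one data product entry to the catalog, creating it if needed."""
    catalog = Path(catalog_path)
    is_new = not catalog.exists()
    with catalog.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=CATALOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"name": name, "owner": owner, "location": str(location)})


def read_dataset(path):
    """Standardized read access: return the rows of any CSV dataset as dicts."""
    with Path(path).open(newline="") as handle:
        return list(csv.DictReader(handle))
```

Because the catalog is itself a CSV file, the same `read_dataset` function serves both for browsing the catalog and for reading any registered data product.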

Data Mesh in Action

  • December 2022
  • ISBN 9781633439979
  • 328 pages
  • printed in black & white
  • Available translations: Russian

Data Mesh in Action teaches you pragmatic ways to decentralize your data and organize it into an effective data mesh. You’ll start by building a minimum viable data product, which you’ll expand into a self-service data platform, chapter-by-chapter. You’ll love the book’s unique “sliders” that adjust the mesh to meet your specific needs. You’ll also learn processes and leadership techniques that will change the way you and your colleagues think about data.