Simplify data science infrastructure to give data scientists an efficient path from prototype to production.
In Effective Data Science Infrastructure, you will learn how to:
Design data science infrastructure that boosts productivity
Handle compute and orchestration in the cloud
Deploy machine learning to production
Monitor and manage performance and results
Combine cloud-based tools into a cohesive data science environment
Develop reproducible data science projects using Metaflow, Conda, and Docker
Architect complex applications for multiple teams and large datasets
Customize and grow data science infrastructure
Effective Data Science Infrastructure: How to make data scientists productive is a hands-on guide to assembling infrastructure for data science and machine learning applications. It reveals the processes used at Netflix and other data-driven companies to manage their cutting-edge data infrastructure. In it, you’ll master scalable techniques for data storage, computation, experiment tracking, and orchestration. You’ll also learn how to collaborate with data scientists to deliver exactly what they need to succeed.
The author is donating proceeds from this book to charities that support women and underrepresented groups in data science.
about the technology
Turning data science projects from small prototypes into sustainable business processes requires scalable and reliable infrastructure. This book lays out the workflows, components, and methods of the full infrastructure stack for data science, from data warehousing and scalable compute to modeling frameworks.
about the book
Effective Data Science Infrastructure: How to make data scientists productive is a guide to building infrastructure that will supercharge data science projects and data scientists. Based on state-of-the-art practices that power the massive data operations of Netflix, this book offers techniques and patterns relevant to companies of all shapes and sizes. You’ll learn how you can make data scientists more productive with your existing cloud infrastructure, a stack of open source software, and idiomatic Python.
As you work through this easy-to-follow guide, you’ll set up end-to-end infrastructure from the ground up, with a fully customizable process you can easily adapt to your company. You’ll build a cloud-based development environment that covers local prototyping and deployment to production, set up infrastructure that supports a real-world machine learning application, and handle a large-scale application for processing hundreds of gigabytes of data. Throughout, you’ll follow a human-centric approach focused on user experience and meeting the unique needs of data scientists.
about the reader
For infrastructure and DevOps engineers, and engineering-minded data scientists, who are familiar with Python.
about the author
Ville Tuulos has been developing tools and infrastructure for data science and machine learning for over two decades. At Netflix, he designed and built Metaflow, a full-stack framework for data science. Currently, he is the CEO of a startup focusing on data science infrastructure.
Useful book that provides tactical guidance on how to use Metaflow to streamline data science workflows but also includes great frameworks and abstractions to consider when defining your data science infrastructure stack.
This is the ultimate book to learn how to handle infrastructure in data science!
If you need a workflow management tool to glue your data code, look at Metaflow. It's simple yet efficient.