These easy-to-learn, easy-to-apply software engineering techniques will radically improve collaboration, scaling, and deployment in your data science projects.
In Software Engineering for Data Scientists, you’ll learn to improve performance and efficiency by:
Using source control
Handling exceptions and errors in your code
Improving the design of your tools and applications
Scaling code to handle large data efficiently
Testing model and data processing code before deployment
Scheduling a model to run automatically
Packaging Python code into reusable libraries
Generating automated reports for monitoring a model in production
Software Engineering for Data Scientists presents important software engineering principles that will radically improve the performance and efficiency of data science projects. Author and Meta data scientist Andrew Treadway has spent over a decade guiding models and pipelines to production. This practical handbook is full of his sage advice, which will change the way you structure your code, monitor model performance, and work with software engineering teams.
about the technology
Many basic software engineering skills apply directly to data science! As a data scientist, learning the right software engineering techniques can save you a world of time and frustration. Source control simplifies sharing, tracking, and backing up code. Testing helps reduce future errors in your models and pipelines. Exception handling lets your code respond to unexpected events as they crop up. Using established engineering conventions makes it easy to collaborate with software developers. This book teaches you to handle these situations and more in your data science projects.
about the book
In Software Engineering for Data Scientists, you’ll find tested software engineering techniques that will make your daily life easier as a data scientist. You’ll quickly get up to speed on how software engineering can solve common problems, then dive straight into source control, object-oriented programming, code testing, and packaging. Hands-on examples make it easy to see how new principles can be put into practice in a data science context.
Improve code structuring and reusability in a customer churn prediction model
Learn to scale data processing code by experimenting with Spotify data
Build a lightweight web app to monitor a machine learning model
Master the software design conventions that make your code easy to share and modify
…and much more!
Every chapter comes with focused exercises and downloadable code for you to experiment with and explore. You’ll be amazed at how a few changes in your process can make your data science projects so much easier to create and maintain.
about the reader
For data scientists who know the basics of Python.
about the author
Andrew Treadway is a Senior Research Data Scientist at Meta with over a decade of experience in data science. He has taught data science and programming courses through Baruch College, 365 Data Science, and Interview Kickstart. He runs a blog at TheAutomatic.net, covering Python and R topics.