Some people like to believe that all data is ready to be used immediately. Not so! Data in the wild is hard to track and harder to understand, and the first job of a data scientist is to identify and prepare data so it can be used. To find your way through the data jungle successfully, you need the right perspective and guidance. (There's no point hacking at overgrowth with a spoon, after all!) Identify and prepare your data well, and you'll be well set to create insight from chaos and discover important analytic patterns that set your business on the right track.
Exploring the Data Jungle: Finding, Preparing, and Using Real-World Data is a collection of three hand-picked chapters introducing you to the often-overlooked art of putting unfamiliar data to good use. Brian Godsey, author of Think Like a Data Scientist, has selected these chapters to help you navigate data in the wild and identify and prepare raw data for analysis, modeling, machine learning, or visualization. As you explore the data jungle, you'll discover real-world examples in Python, R, and other languages suitable for data science.
Think Like a Data Scientist teaches you a step-by-step approach to solving real-world data-centric problems. By breaking down carefully crafted examples, you'll learn to combine analytic, programming, and business perspectives into a repeatable process for extracting real knowledge from data. As you read, you'll discover (or remember) valuable statistical techniques and explore powerful data science software. More importantly, you'll put this knowledge together using a structured process for data science. When you've finished, you'll have a strong foundation for a lifetime of data science learning and practice.