GitHub is stuffed full of free and open-source development tools, all with no guarantee that they’re stable, up-to-date, or even still maintained! Companies need a way to ensure that the GitHub projects they’re adopting come with a clean bill of health — and for that, they turn to you!
You’re the CTO of GitHub Health, a unique startup that provides reports and analysis of GitHub software projects. Your findings aren’t based on gossip; they’re based on data. In this liveProject, you’ll build a serverless data system that can extract meaningful data from GitHub, store it in a database, and display the statistics using Google Cloud. Once you’re done, you’ll generate a report for a new client on which of three open-source projects is the best choice for its new DevOps team.
This project is designed for learning purposes and is not a complete, production-ready application or solution.
This liveProject is for early-career data analysts and programmers looking to experiment with serverless data platforms. The project is language-agnostic, and skeleton solutions for Python, Go, Node.js and SQL are available for those who need them. To begin this liveProject, you will need to be familiar with:
- UNIX shell/terminal
- Basic Git
- Basic SQL
- HTTP request/response handling
- Basic summary statistics
You will learn
In this liveProject, you’ll learn how to extract insights from data without having to provision and manage data processing infrastructure. The serverless mindset you develop will help boost your productivity and prove a valuable asset to your career in data science. While this liveProject uses Google Cloud for its solution, the techniques developed are equally applicable to Amazon Web Services and Microsoft Azure.
- Querying GitHub to extract data
- Executing the code and storing the results of the queries
- Loading query results into a database
- Transforming data using BigQuery SQL functions
- Using Data Studio to draw up reports
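
To give a flavor of the first step in the list above, here is a minimal sketch of reducing a repository record to a few health indicators before loading it into a database. It assumes the record has the shape returned by the GitHub REST API's repository endpoint (`full_name`, `stargazers_count`, `open_issues_count`, `pushed_at`). The function name `summarize_repo`, the choice of indicators, and the sample values are hypothetical and are not taken from the project's skeleton solutions.

```python
from datetime import datetime, timezone


def summarize_repo(repo: dict, now: datetime) -> dict:
    """Reduce a repository JSON object to a few health indicators."""
    # pushed_at is an ISO 8601 UTC timestamp such as "2024-01-01T12:00:00Z".
    pushed_at = datetime.strptime(
        repo["pushed_at"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "name": repo["full_name"],
        "stars": repo["stargazers_count"],
        "open_issues": repo["open_issues_count"],
        # Whole days elapsed since the most recent push to the repository.
        "days_since_last_push": (now - pushed_at).days,
    }


# Hypothetical sample payload; the values are made up for illustration.
sample = {
    "full_name": "example-org/example-repo",
    "stargazers_count": 1200,
    "open_issues_count": 45,
    "pushed_at": "2024-01-01T12:00:00Z",
}

summary = summarize_repo(
    sample, now=datetime(2024, 1, 31, 12, 0, 0, tzinfo=timezone.utc)
)
print(summary)
```

Passing `now` in explicitly keeps the function deterministic, which makes it easier to test and to rerun inside a serverless function. In a real pipeline the dictionary would come from an HTTP response rather than a hard-coded sample.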