In this liveProject, you’ll step into the shoes of a developer at PharmaChain Iberia, a pharmaceutical supply chain company. A large global competitor has recently set up shop in your region and has been steadily gaining market share, thanks to hyper-efficient business process automation in its supply lines. Its automation service can receive and read delivery notes ten times faster than the admin staff at your company, and you need to innovate fast to keep your position as the regional market leader. You’ve been tasked with implementing a company-wide automation solution in Python. To achieve this, you’ll write Python scripts that iterate through subfolders of text-based PDFs and PDF form files and extract their field keys and values. You’ll also write scripts that identify the field IDs on the web forms that correspond to those PDF documents, and then automatically populate the web forms with the information extracted from your PDFs. No manual data entry needed! Finally, you’ll prepare a report detailing your conclusions for your manager at PharmaChain.
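As a rough illustration of the first step, here is a minimal sketch that uses only the Python standard library to walk a folder tree and pick out PDF files. It assumes a file counts as a PDF when its first bytes are the `%PDF-` header, rather than trusting the `.pdf` extension. The names `is_pdf` and `find_pdfs` are hypothetical and are not taken from the project. Extracting the form field keys and values themselves would need a third-party PDF library (pypdf is one option), which is not shown here.

```python
from pathlib import Path

# PDF files begin with this header; checking it avoids trusting file extensions.
PDF_SIGNATURE = b"%PDF-"


def is_pdf(path: Path) -> bool:
    """Return True if the file's first bytes match the PDF header."""
    with path.open("rb") as fh:
        return fh.read(len(PDF_SIGNATURE)) == PDF_SIGNATURE


def find_pdfs(root: Path) -> list[Path]:
    """Recursively search root and all of its subfolders, returning PDFs sorted by path."""
    return sorted(p for p in root.rglob("*") if p.is_file() and is_pdf(p))
```

For example, `find_pdfs(Path("delivery_notes"))` would return every PDF under a hypothetical `delivery_notes` folder, including files in nested subfolders. A file with a `.pdf` extension but no PDF header is skipped.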
This project is designed for learning purposes and is not a complete, production-ready application or solution.
The liveProject is for intermediate Python programmers who are familiar with the basics of information extraction. To begin this liveProject, you will need to be familiar with:
- Intermediate Python
- Basic pip
- Basic browser development tools
You will learn
In this liveProject, you’ll learn to automate repetitive, manual data-processing tasks using Python and widely used Python automation libraries.
- File type identification
- Text and field extraction from PDFs
- Web page resource identification
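Web page resource identification includes finding the IDs of the form fields you want to populate. Browser developer tools are one way to do this by hand. As a complementary, standard-library-only sketch, the hypothetical `FormFieldCollector` class below lists the tag, `id`, and `name` of every `input`, `select`, and `textarea` element in a page's HTML. It assumes the form is present in the static HTML, so fields injected later by JavaScript will not appear. This is not necessarily how the project itself locates fields.

```python
from html.parser import HTMLParser
from typing import Optional

# Form controls whose IDs we may want to map to PDF field keys.
FIELD_TAGS = {"input", "select", "textarea"}


class FormFieldCollector(HTMLParser):
    """Record the tag, id, and name of every form control in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: list[dict[str, Optional[str]]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        # HTMLParser lowercases tag and attribute names before calling this method,
        # and self-closing tags such as <input ... /> are routed here as well.
        if tag in FIELD_TAGS:
            attr_map = dict(attrs)
            self.fields.append(
                {"tag": tag, "id": attr_map.get("id"), "name": attr_map.get("name")}
            )


def collect_form_fields(html: str) -> list[dict[str, Optional[str]]]:
    """Parse an HTML string and return its form controls in document order."""
    parser = FormFieldCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

For example, `collect_form_fields('<input id="batch" name="batch_no">')` returns `[{'tag': 'input', 'id': 'batch', 'name': 'batch_no'}]`. Fields without an `id` attribute get `None`, which flags them for identification by `name` or another selector.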