Overview

4 Using a Coding Companion with Python

This chapter shifts from SQL to Python to show how an AI coding companion can accelerate common data engineering tasks. By iterating with clear, targeted prompts, you can generate templates, refine logic, and quickly converge on working code for real-world problems. The emphasis is on step-by-step, few-shot workflows that let you “speak the AI’s language” while keeping control of the outcome—useful for chores like calling APIs, parsing nested JSON, and crafting regex, which are flexible but often tedious and error-prone.

Through hands-on examples, the chapter demonstrates fetching and processing data from public APIs, starting with a zero-shot NumbersAPI script that builds requests, parses JSON, populates pandas DataFrames, and adds basic error handling. It then layers in retries, console feedback, and logging, and shows how uploading API documentation helps the model use optional parameters accurately—while warning about pitfalls like misread specs, hallucinated parameters, and model-to-model variability. For nested data, it uses JSONPlaceholder to flatten structures safely with chained .get() calls, combine fields (like lat/lng), and build resilient pipelines that tolerate missing or changing schema.

Regex work illustrates how AI can quickly draft patterns to extract, normalize, and structure phone numbers, including splitting components for DataFrames or JSON documents, while highlighting the brittleness of regex and the need for test cases and validation. A final lab with the Open Brewery DB ties everything together: pulling records, cleaning phone numbers, and extracting domains to practice prompt clarity, incremental refinement, and verification. The core habits reinforced throughout are to be explicit about execution context, treat outputs as drafts, test against real samples, and iterate deliberately for production-ready reliability.
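
As one illustration of the domain-extraction step mentioned for the lab, here is a minimal sketch using only the standard library. The function name `extract_domain` and the sample URLs are hypothetical, and stripping a leading `www.` is an assumption about what "domain" should mean for the exercise, not something the chapter specifies.

```python
from urllib.parse import urlsplit


def extract_domain(url):
    """Return the lowercase host of a URL without a leading 'www.', or None."""
    if not url:
        return None
    # urlsplit only fills in the host when a scheme or "//" is present,
    # so bare values such as "example.org/about" get a "//" prefix first.
    candidate = url if "//" in url else f"//{url}"
    host = urlsplit(candidate).hostname
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


# Hypothetical sample values, not real API output.
samples = ["http://www.example-brewery.com/beers", "example.org/about", None]
for value in samples:
    print(value, "->", extract_domain(value))
```

Treat the result as a draft: real records may contain malformed or empty URLs, so it is worth running the function against a sample of actual responses before relying on it.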

Interacting with an API: This diagram shows how a client constructs a full request URL by combining a base endpoint with query parameters that define the data request (e.g., filters, limits, and format). The server receives the request, handles tasks like authentication and data lookup, and returns a structured JSON response. This client-server pattern is central to data engineering workflows, where APIs often serve as the primary source of external or cloud-based data. Knowing how to shape requests and interpret responses is essential for working with modern data pipelines.
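
The request-building step described above can be sketched with the standard library. The endpoint `https://api.example.com/v1/records` and the parameter names (`city`, `limit`, `format`, `cursor`) are placeholders chosen for illustration; a real API defines its own.

```python
from urllib.parse import urlencode


def build_url(base_endpoint, params):
    """Join a base endpoint and query parameters into one request URL."""
    # Drop parameters with no value so they don't appear as "key=None".
    clean = {key: value for key, value in params.items() if value is not None}
    if not clean:
        return base_endpoint
    # urlencode escapes special characters (a space becomes "+").
    return f"{base_endpoint}?{urlencode(clean)}"


# Hypothetical endpoint and parameter names, for illustration only.
url = build_url(
    "https://api.example.com/v1/records",
    {"city": "San Diego", "limit": 5, "format": "json", "cursor": None},
)
print(url)
# https://api.example.com/v1/records?city=San+Diego&limit=5&format=json
```

Printing the final URL before sending it is a cheap way to check that an AI-generated request uses only the parameters you intended.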
Result of the zero-shot prompt response against the NumbersAPI
Result of the one-shot prompt response against the NumbersAPI
Result of the API documentation enhanced response against the NumbersAPI
Result of the simple JSON unpacking response against the JSONPlaceholder API
Result of the complex JSON unpacking response against the JSONPlaceholder API
Structured breakdown of phone numbers using regex and pandas
Results for Lab Answers 1
Results for Lab Answers 2
Results for Lab Answers 3

FAQ

How is collaborating with an AI coding companion in Python different from using it for SQL?
Python workflows are iterative and stateful: you build scripts step by step, run cells, and refine logic. This makes few-shot prompting effective: ask, run, review, and iterate. SQL prompts often aim for one-shot queries, while Python benefits from progressive prompting for structure, retries, error handling, and data shaping.
What should I include in my prompt to get notebook-friendly code instead of overengineered scripts?
Be explicit about context and intent. Say “Python code to run in a Jupyter notebook,” ask for print statements for real-time feedback, and avoid words like “script” if you don’t want files, main blocks, or production logging. Specify libraries, inputs, and desired outputs (e.g., “return a pandas DataFrame”).
How can an AI help me interact with APIs like NumbersAPI using Python?
It can scaffold requests with the requests library, construct URLs with query parameters, parse JSON, handle errors, and load results into a pandas DataFrame. With clear prompts, it can also add timeouts, retries, and concise console messages for interactive runs.
How do I add robust retry logic and timeouts to API requests?
Ask the AI to implement retries for transient errors (like timeouts), with a limited number of attempts, a delay between attempts, and a per-request timeout. Also request clear console prints for progress and outcomes, plus optional logging for post-run diagnostics.
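
A minimal sketch of that retry pattern follows. To stay runnable without network access, it wraps any zero-argument callable and uses a simulated flaky function; the names `fetch_with_retries` and `flaky_fetch` are hypothetical. With the requests library, the callable would typically wrap `requests.get(url, timeout=...)`, and the `transient` tuple would hold `requests.exceptions.Timeout` and `requests.exceptions.ConnectionError` instead of the built-in exceptions assumed here.

```python
import time


def fetch_with_retries(fetch, max_attempts=3, delay_seconds=1.0,
                       transient=(TimeoutError, ConnectionError)):
    """Call fetch() until it succeeds or max_attempts is used up."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = fetch()
            print(f"Attempt {attempt}: success")
            return result
        except transient as exc:
            # Only errors listed in `transient` are retried; others propagate.
            last_error = exc
            print(f"Attempt {attempt} of {max_attempts} failed: {exc!r}")
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"Giving up after {max_attempts} attempts") from last_error


# Simulated endpoint that times out twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return {"text": "ok"}


result = fetch_with_retries(flaky_fetch, max_attempts=3, delay_seconds=0)
print(result)
```

Keeping the retry loop separate from the request itself makes it easy to test the loop with a fake function, as above, before pointing it at a live API.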
What are common pitfalls when asking AI to use API documentation, and how do I prevent them?
LLMs may misread docs, invent parameters, skip required headers, or mix path and query params. Mitigate by uploading the docs, asking the model to list required versus optional parameters and to explain each part of the generated URL, and verifying with curl/Postman and sample responses before coding further.
How do I flatten nested JSON from APIs (e.g., JSONPlaceholder) into a DataFrame?
Prompt the AI to safely access nested keys with chained .get(), select the fields you need (like address.city, company.name, company.catchPhrase), and combine fields when useful (e.g., “(lat, lng)”). Have it return a clean pandas DataFrame with clear column names.
How can I make JSON parsing resilient to missing or inconsistent fields?
Use chained .get() with default fallbacks, guard parent objects that may be None, test against multiple records, and print or log a sample of the raw JSON before parsing. Ask the AI to include error handling and defaults for optional fields so the code doesn’t crash when the schema shifts.
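
The flattening and fallback advice in the two answers above can be sketched as follows. The records are made-up samples shaped like the nested user objects described here (address, geo, company), not real API output, and the helper name `flatten_user` and the output column names are assumptions.

```python
def flatten_user(user):
    """Flatten one nested user record into a single-level dict."""
    # "or {}" guards parents that are missing or explicitly None,
    # so the chained .get() calls below never hit a non-dict.
    address = user.get("address") or {}
    geo = address.get("geo") or {}
    company = user.get("company") or {}
    lat, lng = geo.get("lat"), geo.get("lng")
    has_coords = lat is not None and lng is not None
    return {
        "id": user.get("id"),
        "name": user.get("name", "unknown"),
        "city": address.get("city", "unknown"),
        "coordinates": f"({lat}, {lng})" if has_coords else None,
        "company_name": company.get("name", "unknown"),
        "catch_phrase": company.get("catchPhrase", ""),
    }


# Made-up sample records; the second one has a None parent and no company.
users = [
    {
        "id": 1,
        "name": "Ada Example",
        "address": {"city": "Springfield", "geo": {"lat": "10.5", "lng": "-20.25"}},
        "company": {"name": "Example Co", "catchPhrase": "Sample phrase"},
    },
    {"id": 2, "name": "No Address", "address": None},
]

rows = [flatten_user(user) for user in users]
for row in rows:
    print(row)
# With pandas installed, pd.DataFrame(rows) turns these rows into a DataFrame.
```

Including a deliberately incomplete record in the sample, as the second one is here, is a quick way to confirm the parser tolerates missing fields.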
When should I use print versus logging in notebooks?
Use print for real-time, in-notebook progress and status messages; it’s simple and visible. Use logging when you need structured, persistent records (e.g., files) for debugging or production. You can combine them: console prints during development and logging for deeper diagnostics.
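
Here is a small sketch of combining the two. It writes log records to an in-memory buffer so it runs anywhere; in practice a `logging.FileHandler` pointed at a file of your choosing would replace the `StreamHandler`. The logger name `api_demo` and the `process` function are hypothetical.

```python
import io
import logging

log_buffer = io.StringIO()  # stand-in for a log file

logger = logging.getLogger("api_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep records out of the notebook's root logger
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)


def process(records):
    """Keep records that have an id; print progress, log the details."""
    print(f"Processing {len(records)} records...")
    kept = []
    for record in records:
        if record.get("id") is None:
            logger.warning("Skipping record without id: %r", record)
            continue
        logger.debug("Keeping record id=%s", record["id"])
        kept.append(record)
    print(f"Done: kept {len(kept)}, skipped {len(records) - len(kept)}")
    return kept


kept = process([{"id": 1}, {"name": "no id"}, {"id": 3}])
print(log_buffer.getvalue())
```

The prints give a short status summary while the cell runs, and the log keeps the per-record detail for later inspection.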
How can an AI help me write and refine regex for phone numbers, and what are the limits?
The AI can draft regex to match multiple formats, use capture groups to extract parts, and propose normalization steps (e.g., +1-XXX-XXX-XXXX). Limits: regex is brittle on messy, real-world input. Provide examples and edge cases in your prompt, validate on real data, or switch to specialized libraries (like phonenumbers) when patterns get complex.
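
As a sketch of what such a draft might look like, the pattern below handles a handful of common US-style formats and normalizes them to +1-XXX-XXX-XXXX. The pattern, the group names, and the helper `normalize_phone` are illustrative assumptions, and the pattern is deliberately narrow: it does not validate area codes, extensions, or non-US numbers.

```python
import re

PHONE_PATTERN = re.compile(
    r"""
    ^\s*
    (?:\+?1[\s.-]*)?         # optional country code, e.g. "+1 " or "1-"
    \(?(?P<area>\d{3})\)?    # area code, parentheses optional
    [\s.-]*
    (?P<exchange>\d{3})      # next three digits
    [\s.-]*
    (?P<line>\d{4})          # last four digits
    \s*$
    """,
    re.VERBOSE,
)


def normalize_phone(raw):
    """Return the number as +1-XXX-XXX-XXXX, or None if it doesn't match."""
    match = PHONE_PATTERN.match(raw or "")
    if match is None:
        return None
    parts = match.groupdict()
    return f"+1-{parts['area']}-{parts['exchange']}-{parts['line']}"


# Test cases, including one that should be rejected.
samples = ["(555) 123-4567", "555.123.4567", "+1 555 123 4567", "5551234567", "12345"]
for sample in samples:
    print(f"{sample!r} -> {normalize_phone(sample)}")
```

The named groups make it straightforward to split a number into separate columns or JSON fields, and the rejected sample shows why returning None is safer than guessing at malformed input.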
Should I output to a DataFrame or JSON for downstream systems?
Choose DataFrame for tabular analysis and pandas-driven workflows; choose JSON for document-oriented systems (e.g., MongoDB, REST payloads). You can convert between them (DataFrame to JSON via df.to_json), but starting in the target format often simplifies integration.
