NLP Entity Linking for Medical Transcripts

prerequisites
intermediate Python • basic Jupyter Notebook • basic pandas • basic natural language processing (NLP)
skills learned
data preparation with pandas • leverage services from the IBM Project Debater API to assess data quality, identify key points, and expand terms based on Wikipedia • data visualization with Seaborn and Matplotlib • construct interactive dashboards in Streamlit for NLP analysis
Paco Nathan
1 week · 6-10 hours per week · INTERMEDIATE
This title has been retired and is no longer for sale.

I like how well thought-out and well-defined the instructions were! This liveProject was very informative.

Roman Bicherschii

In this liveProject, you’re a data scientist at a healthcare provider that deals with large volumes of incoming text. Your task is to analyze a large dataset containing medical transcriptions. Leveraging technologies including pandas, the IBM Project Debater API, and Seaborn, you’ll explore a Kaggle dataset, segment text data into known categories, and extract key points.

You’ll finish by building an interactive data visualization dashboard for analysis in the open-source framework Streamlit. When you’re done, you’ll have leveled up your NLP toolbox with skills that are highly sought-after not only in healthcare but also in law, customer support, market intelligence, media, and many other fields.

Special thanks to the IBM team of data scientists who crafted this scenario and allowed it to be shared as a broader educational tool. Authors: Álvaro Corrales Cano, Yamini Rao, and Adam Green.

The IBM Data Science Community is a place dedicated to people supporting the practice of data science in their businesses, for practitioners by practitioners. Whether you’re a data scientist, machine learning engineer, AI developer, or someone working on the AI lifecycle, the community lets you connect with others, engage on timely topics, and share your expertise.

This liveProject gives you a wide variety of tools to work through. It teaches you how to use a cloud NLP API to analyze text data and takes you into how you can use that to build a dashboard.

Cass Petrus

I like how this course incorporated a brand new technology (IBM Project Debater) and also included some more established technologies like Streamlit and Seaborn. All were new to me, and I felt like it gave a good introduction to the type of work you can do with these technologies and left me excited to explore them more on my own.

Samantha Berk

project author

Paco Nathan
Paco Nathan is a managing partner at Derwen, Inc. With more than 40 years of experience in the tech industry, ranging from Bell Labs to early-stage start-ups, he’s an advisor for Amplify Partners, Recognai, and KUNGFU.AI, and he’s a lead committer on PyTextRank and kglab. Formerly, he was the director of community evangelism for Apache Spark at Databricks.

prerequisites

This liveProject is for intermediate data scientists. To begin this liveProject, you’ll need to be familiar with the following:

TOOLS
  • Intermediate Python
  • Basic Jupyter Notebook
  • Data visualization libraries (such as Seaborn and Matplotlib)
TECHNIQUES
  • Install a Python library from a local source
  • Data preparation
  • Load a CSV file into a pandas dataframe
  • Save a pandas dataframe as a CSV file
  • Reshape data using list comprehensions
  • Render data from a dataframe as a bar chart
  • Basic natural language processing (NLP)
  • Build dashboards with Streamlit
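
For orientation, the pandas techniques listed above (loading a CSV, reshaping with a list comprehension, saving a CSV, and rendering a bar chart) can be sketched together in a few lines. This is a minimal sketch, not part of the liveProject's own materials. It assumes the Kaggle transcriptions CSV has medical_specialty and keywords columns (an assumption; check your copy of the data), and it uses a few hypothetical in-memory rows instead of the real file. The helper names load_transcripts, split_keywords, count_by_specialty, save_counts, and plot_counts are illustrative only.

```python
import io

import pandas as pd

# Hypothetical toy rows standing in for the Kaggle medical transcriptions CSV.
# The column names are an assumption; rename them to match your copy of the data.
RAW_CSV = """medical_specialty,keywords
 Surgery,"surgery, appendectomy, appendix"
 Cardiovascular / Pulmonary,"cardiology, echocardiogram"
 Surgery,"surgery, laparoscopic"
"""


def load_transcripts(source) -> pd.DataFrame:
    """Load a CSV (path or file-like object) and trim whitespace around category labels."""
    df = pd.read_csv(source)
    df["medical_specialty"] = df["medical_specialty"].str.strip()
    return df


def split_keywords(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape the comma-separated keywords column into lists with a list comprehension."""
    out = df.copy()
    out["keyword_list"] = [
        [kw.strip() for kw in cell.split(",") if kw.strip()]
        if isinstance(cell, str)
        else []  # missing keywords (NaN) become an empty list
        for cell in out["keywords"]
    ]
    return out


def count_by_specialty(df: pd.DataFrame) -> pd.Series:
    """Count transcripts per medical specialty, most frequent first."""
    return df["medical_specialty"].value_counts()


def save_counts(counts: pd.Series, path: str) -> None:
    """Save the counts as a two-column CSV without the pandas index."""
    table = counts.rename_axis("medical_specialty").reset_index(name="count")
    table.to_csv(path, index=False)


def plot_counts(counts: pd.Series):
    """Render the counts as a bar chart; Matplotlib is imported only when this is called."""
    import matplotlib.pyplot as plt

    ax = counts.plot.bar()
    ax.set_xlabel("medical specialty")
    ax.set_ylabel("transcripts")
    plt.tight_layout()
    return ax


if __name__ == "__main__":
    transcripts = split_keywords(load_transcripts(io.StringIO(RAW_CSV)))
    print(transcripts[["medical_specialty", "keyword_list"]])
    print(count_by_specialty(transcripts))
```

With the real dataset, you would pass its file path to load_transcripts, write the per-specialty counts out with save_counts(counts, "specialty_counts.csv"), and, in a Jupyter notebook with Matplotlib installed, call plot_counts(counts) to draw the bar chart. Keeping the Matplotlib import inside plot_counts lets the data-preparation steps run even where plotting libraries are unavailable.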

you will learn

In this liveProject, you’ll learn NLP skills widely used by data scientists in fields such as healthcare, law, customer support, market intelligence, and media.

  • Data preparation with pandas
  • Leverage services from the IBM Project Debater API (includes sentiment analysis, argument mining, and narrative generation) to assess data quality, identify key points, and expand terms based on Wikipedia
  • Data visualization with Seaborn and Matplotlib
  • Construct interactive dashboards in Streamlit for NLP analysis

features

Self-paced
You choose the schedule and decide how much time to invest as you build your project.
Project roadmap
Each project is divided into several achievable steps.
Get Help
Get help from other participants directly within the liveProject platform.
Compare with others
For each step, compare your deliverable to the solutions by the author and other participants.