Aneesha Bakharia

Aneesha Bakharia completed her PhD in interactive topic modeling at Queensland University of Technology in Australia. She is currently the Manager of Learning Analytics at The University of Queensland where she leads a team of programmers and data scientists.

Projects by Aneesha Bakharia

Traditional and Neural Topic Modeling

3 weeks · 6-8 hours per week average · INTERMEDIATE

In this series of liveProjects, you’ll explore different techniques for topic modeling. Topic modeling is an unsupervised machine learning technique that finds topics in text without any manual labelling. It’s a great way to quickly derive insights from text data and share them with key stakeholders. You’ll work with a variety of text corpora to go hands-on with NMF algorithms from scikit-learn, LDA algorithms from Gensim, and even new neural network techniques using the OCTIS (Optimizing and Comparing Topic Models is Simple!) library.

Neural Topic Models

1 week · 6-8 hours per week · INTERMEDIATE

In this liveProject, you’ll use the neural network-inspired Contextual Topic Model to identify and visualize the topics across all of the articles in a scientific magazine’s back catalog. This cutting-edge technique is made easy by the OCTIS (Optimizing and Comparing Topic Models is Simple!) library. Once you’ve established your text-processing pipeline, you’ll use coherence and diversity metrics to evaluate the output of your topic models, tune your neural network’s hyperparameters to improve results, and visualize your results for printing on posters and other media.
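
To give a feel for the diversity metric mentioned above, here is a minimal sketch in plain Python. It assumes the common definition of topic diversity as the fraction of unique words among the top words of all topics. The function name `topic_diversity` and the sample topics are hypothetical and are not taken from the liveProject or from OCTIS.

```python
def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    A value near 1.0 means topics share few words; a value near 0.0
    means the topics are largely redundant.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


# Hypothetical topics, each a ranked list of words (illustrative data only).
topics = [
    ["climate", "carbon", "emissions", "energy"],
    ["genome", "dna", "protein", "cell"],
    ["energy", "solar", "battery", "carbon"],
]

score = topic_diversity(topics, top_k=4)
print(round(score, 3))
```

Here the first and third topics share "carbon" and "energy", so only 10 of the 12 top words are unique. A library implementation may differ in its details, so treat this as an illustration of the idea only.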

Latent Dirichlet Allocation

1 week · 6-8 hours per week · INTERMEDIATE

In this liveProject, you’ll use the latent Dirichlet allocation (LDA) algorithm from the Gensim library to model topics from a magazine’s article back catalog. Thanks to your work on topic modeling, the new Policy and Ethics editor will be better equipped to strategically commission new articles for under-represented topics. You’ll build your text preprocessing pipeline, use topic coherence to choose a suitable number of topics, and visualize and curate the algorithm’s output so your stakeholders can easily read it.
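
As a rough illustration of what a topic coherence score measures, the sketch below computes the UMass coherence of a single topic from document co-occurrence counts in plain Python. UMass is only one of several coherence measures, and it may not be the one used in the liveProject. The function name `umass_coherence` and the tiny corpus are hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over all pairs in which w_j is
    ranked above w_i, where D counts the documents containing the words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for j, i in combinations(range(len(top_words)), 2):  # j < i
        w_j, w_i = top_words[j], top_words[i]
        d_j = doc_freq(w_j)
        if d_j == 0:
            continue  # skip words that never occur in the corpus
        score += math.log((doc_freq(w_i, w_j) + 1) / d_j)
    return score


# Hypothetical tokenized corpus (illustrative data only).
documents = [
    ["privacy", "data", "ethics"],
    ["privacy", "data", "policy"],
    ["galaxy", "telescope", "star"],
    ["star", "galaxy", "data"],
]

coherent = umass_coherence(["privacy", "data"], documents)
mixed = umass_coherence(["privacy", "galaxy"], documents)
print(coherent > mixed)
```

Words that tend to appear in the same documents score higher than words that never co-occur. One way to pick the number of topics, under these assumptions, is to train a model for each candidate number, average the coherence over its topics, and compare the averages.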

Non-negative Matrix Factorization

1 week · 6-8 hours per week · INTERMEDIATE

In this liveProject, you’ll use scikit-learn’s non-negative matrix factorization algorithm to perform topic modeling on a dataset of Twitter posts. You’ll step into the role of a data scientist tasked with summarizing Twitter discussions for an airline’s customer support team, and use this powerful algorithm to rapidly make sense of a large and complex text corpus. You’ll build a text preprocessing pipeline from scratch, visualize topic models, and finally compile a report of support topics for the customer support team.
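
A text preprocessing pipeline for tweets might start with something like the sketch below. It is an assumption about what such a pipeline could contain, not the liveProject's actual solution: the function name `preprocess_tweet`, the tiny stopword list, and the sample tweet are all hypothetical.

```python
import re

# A deliberately tiny, hypothetical stopword list for illustration.
STOPWORDS = {"the", "a", "an", "is", "my", "to", "and", "was", "for", "on", "i"}


def preprocess_tweet(text):
    """Lowercase a tweet, strip URLs and @mentions, and return clean tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    tokens = re.findall(r"[a-z]+", text)       # keep alphabetic tokens only
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


# Hypothetical tweet (illustrative data only).
tweet = "@ExampleAir my flight to Boston was delayed AGAIN! https://example.com/x #fail"
tokens = preprocess_tweet(tweet)
print(tokens)
```

The cleaned tokens could then be joined back into strings and passed to a vectorizer before factorization. A real pipeline would likely use a fuller stopword list and may also handle emoji, hashtags, and lemmatization.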