
Python for Data Science: Pandas 102 – melt to tidy data & merge to JOIN

[Online] Python can be a great option for exploring, analyzing, and visualizing tabular data, such as spreadsheets and CSV files, if you know which tools to use and how to get started. This workshop will take you through some practical examples of using Python, and specifically the Pandas module, to load data from files and transform it into a standard “tidy” format so it's ready to analyze and visualize. We will visualize some tidy data in Seaborn, and also learn how to merge two datasets, like a database JOIN.
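As a rough preview of the two operations named in the title, here is a minimal sketch of melting a wide table into tidy form and then merging it with a second table. The data, column names, and variable names are all made up for illustration and are not the workshop's actual materials.

```python
import pandas as pd

# Hypothetical "wide" table: one row per city, one column per year
wide = pd.DataFrame({
    "city": ["Durham", "Raleigh"],
    "2020": [10, 20],
    "2021": [11, 21],
})

# melt: reshape so each row is a single (city, year) observation
tidy = wide.melt(id_vars="city", var_name="year", value_name="count")

# Hypothetical lookup table with one row per city
regions = pd.DataFrame({
    "city": ["Durham", "Raleigh"],
    "county": ["Durham", "Wake"],
})

# merge: attach the county to every tidy row, like a SQL LEFT JOIN on city
merged = tidy.merge(regions, on="city", how="left")
print(merged)
```

A tidy table like this, with one observation per row, is the shape Seaborn plotting functions generally expect.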

This workshop is a continuation of the very basic Pandas workshop I gave last semester, so a few notes on what to expect:

  • While none of the content will be very advanced, if you don't have any programming experience, or you have never used Python at all before, the material may be too confusing to be useful – I won't be teaching the language itself. 
  • If you have at least a tiny bit of Python exposure, but haven't used Pandas much or at all, I would advise watching last semester's Pandas 101 – Intro to Tabular Data in Python and JupyterLab video before you attend. 
  • You will be expected to arrive with Python and Pandas already installed on the machine you're Zooming from.
    • Instructions are listed below for installation!
    • I will hold open Zoom walk-in hours for an hour before the workshop to help remotely troubleshoot installation issues. Email me at emonson@duke.edu to get the URL.

Anaconda Python distribution (Individual Edition): 
https://www.anaconda.com/products/individual

I strongly recommend that you install the Anaconda Python Distribution to use in class. In principle, if you have Python 3.8 or so, or newer, plus all the necessary modules, everything should work fine. However, the Anaconda Distribution is packaged nicely, can be installed without admin privileges, and comes with everything you’ll need. If you have another version of Python already installed and you’re going to install Anaconda, it’s best to uninstall the other version first. Having multiple versions of Python installed on one machine can get messy.

Go to the link above, hit Download, and choose the version for your operating system. I recommend installing just for “yourself”, not for all users of the machine, since that way everything is installed in your Users/username folder and no admin privileges are required.

If you’re on Mac and aren’t comfortable with shell scripts on the command line, choose the Graphical Installer. 

On Windows, I would choose the 64-bit installer, unless you know you’re still running a 32-bit version of Windows on an older machine. 

If you’re sticking with your non-Anaconda version of Python, make sure you have JupyterLab, Pandas, Seaborn, and all of their respective dependencies installed.

Please try to launch Python and JupyterLab before class to make sure they’re working! JupyterLab can be started from the Anaconda Navigator application, or from the Anaconda Prompt (Windows) or a Terminal (Mac) by typing (without quotes) “jupyter lab” and hitting return. From a Python notebook or an interactive Python prompt, you can test out the main modules you’ll need by typing this and executing the code cell:

import pandas as pd
import seaborn as sns
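
If you would rather see a list of what is missing than hit an import error, a small check along these lines may help. This is only a sketch: the helper name `missing_modules` is made up here, and the module names in `required` are assumed to be the import names of the packages mentioned above.

```python
import importlib.util

def missing_modules(names):
    """Return the names from `names` that Python cannot find on this machine."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Assumed import names for the packages this workshop relies on
required = ["pandas", "seaborn", "jupyterlab"]

missing = missing_modules(required)
print("Missing:", missing if missing else "none")
```

Anything listed as missing would need to be installed before class.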

A Zoom link to join the workshop will be sent via email to registered participants.

The content of the workshop may be recorded. If you are uncomfortable with a recording being published, please contact the instructor at any time prior to the conclusion of the workshop.

Data Science Data Visualization

Date:
Thursday, November 17, 2022
Time:
10:00am - 12:00pm
Campus:
n/a
Categories:
Data and Visualization  
Registration has closed.

Event Organizer

Eric Monson
Center for Data and Visualization Sciences