Python for Data Science: Pandas 103 – groupby and aggregation

[Online] Data exploration in Python using grouping and aggregation. This is an intermediate-level, live teaching session where you will learn how to use the Pandas module to explore tabular (spreadsheet) data with the groupby() and pivot_table() functions, along with some visualizations of the results.

Python can be a great option for exploration, analysis and visualization of tabular data, such as spreadsheets and CSV files, if you know which tools to use and how to get started. This workshop builds upon the introductory Pandas 101 & 102 workshops I gave in Spring 2022 & Fall 2022. (Code repository. See below for recordings.) In Pandas 101, I covered the very basics of how to access your data in a Pandas DataFrame and do some basic plotting. In Pandas 102, I introduced how to get data into a "tidy" form and merge datasets (like doing an SQL JOIN). In Pandas 103, I will show you some of the ways you can explore patterns in data by aggregating across categories and time. This is similar to the process of data exploration in Tableau, but here with Python, Pandas and JupyterLab.
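To give a flavor of what aggregating across categories and time can look like, here is a minimal sketch. It is not taken from the workshop materials: the small sales table and its column names (date, region, amount) are invented purely for illustration.

```python
import pandas as pd

# Hypothetical sales records; the data and column names are made up for this sketch.
sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-17", "2023-02-25"]
    ),
    "region": ["East", "West", "East", "West", "East"],
    "amount": [100, 150, 200, 50, 25],
})

# Aggregate across a category: total and mean amount for each region.
by_region = sales.groupby("region")["amount"].agg(["sum", "mean"])

# Aggregate across time: total amount for each calendar month.
by_month = sales.groupby(sales["date"].dt.to_period("M"))["amount"].sum()

# Cross-tabulate time against category: one row per month, one column per region.
sales["month"] = sales["date"].dt.to_period("M").astype(str)
pivot = sales.pivot_table(
    index="month", columns="region", values="amount",
    aggfunc="sum", fill_value=0,
)

print(by_region)
print(by_month)
print(pivot)
```

Here groupby() answers "how much per region?" or "how much per month?" one grouping at a time, while pivot_table() lays both groupings out in a single grid, much like a spreadsheet pivot table.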

  • If you don't have any programming experience, or you have never used Python at all before, the material may be too confusing to be useful. I won't be teaching the language itself. 
  • If you have at least a little bit of Python exposure, but haven't used Pandas much or at all, I would advise watching at least the Fall 2022 Pandas 102 video before you attend. If you find that too advanced, or want a more complete introduction to Pandas, start instead with the Spring 2022 Pandas 101 video.

Expectations: 

  • You will be expected to have your video on for at least part of the session, although we won't be doing any group work or sharing.
  • If you need help with something during the session, you'll be expected to share your screen.
  • You will be expected to arrive with the Anaconda Python distribution already installed on the machine you're Zooming from if you want to work along with me or do the exercises during the workshop!
    • They now call this the Anaconda Individual Edition, available for Mac, Windows, or Linux.
    • I would advise installing just for yourself, not for all users (this installs in your Users directory and doesn't need administrator privileges).
    • I will hold open Zoom walk-in hours for an hour before the workshop to help remotely troubleshoot installation issues. Email me at emonson@duke.edu to get the URL.

This event is offered virtually. A Zoom link to join the workshop will be sent via email to registered participants.

The content of the workshop may be recorded. If you are uncomfortable with a recording being published, please contact the instructor at any time prior to the conclusion of the workshop.

Data Science, Data Visualization

Date:
Thursday, April 6, 2023
Time:
10:00am - 12:00pm
Campus:
n/a
Categories:
Data and Visualization  
Registration has closed.

Event Organizer

Eric Monson
Center for Data and Visualization Sciences