Pandas Cheat Sheet: Essential Guide for Python Data Analysis

Master your data analysis skills with this comprehensive Pandas cheat sheet. Learn key functions, coding examples, and download the Pandas cheat sheet PDF for quick reference.


Introduction

Pandas is a powerful Python library widely used for data manipulation and analysis. It offers flexible data structures, such as Series and DataFrame, for handling structured data efficiently. This Pandas cheat sheet is a quick reference to the most commonly used Pandas functions, with coding examples that show each one in action.

What is Pandas?

Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools. It's built on top of NumPy and is an essential tool for data scientists working in Python.

Why Use a Pandas Cheat Sheet?

A Pandas cheat sheet is a handy reference guide that helps you quickly recall and implement Pandas functions and methods. Whether you're a beginner or an experienced user, having a cheat sheet can save you time and enhance your productivity.

Getting Started with Pandas

Installing Pandas

To install Pandas, you can use pip:

pip install pandas

Importing Pandas

Once installed, you can import Pandas into your Python script:

import pandas as pd

Data Structures in Pandas

Series

A Series is a one-dimensional labeled array capable of holding any data type.

import pandas as pd

# Creating a Series
s = pd.Series([1, 3, 5, 7, 9])
print(s)
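
A Series can also carry explicit labels instead of the default 0, 1, 2, ... index. The sketch below is a minimal illustration of label-based versus position-based access; the labels 'a' through 'e' are made up for this example and do not come from the data above.

```python
import pandas as pd

# Same values as above, but with explicit string labels
# (the labels 'a'..'e' are illustrative assumptions)
s = pd.Series([1, 3, 5, 7, 9], index=['a', 'b', 'c', 'd', 'e'])

by_label = s['c']        # look up a value by its label
by_position = s.iloc[0]  # look up a value by its integer position
total = s.sum()          # reduce all values to a single number

print(by_label, by_position, total)
```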

DataFrame

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

# Creating a DataFrame
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
}
df = pd.DataFrame(data)
print(df)

Creating DataFrames

From Dictionaries

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
}
df = pd.DataFrame(data)
print(df)

From Lists

data = [['John', 28], ['Anna', 24], ['Peter', 35], ['Linda', 32]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

From CSV Files

df = pd.read_csv('data.csv')
print(df.head())
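
If you do not have a 'data.csv' file at hand, you can try read_csv on an in-memory string. This is a sketch under assumptions: the CSV contents are invented to match the Name/Age data used in this cheat sheet, and usecols and dtype are optional parameters shown only to illustrate how loading can be controlled.

```python
import io

import pandas as pd

# In-memory stand-in for a file such as 'data.csv' (contents are made up)
csv_text = "Name,Age\nJohn,28\nAnna,24\nPeter,35\nLinda,32\n"

df = pd.read_csv(
    io.StringIO(csv_text),   # any file path or file-like object works here
    usecols=['Name', 'Age'], # load only the listed columns
    dtype={'Age': 'int64'},  # set the column type explicitly
)
print(df.head(2))
```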

Data Inspection

Viewing Data

# Display the first 5 rows
print(df.head())

# Display the last 5 rows
print(df.tail())

Getting Data Info

# Display DataFrame info (info() prints its report and returns None, so no print() is needed)
df.info()

Descriptive Statistics

# Get descriptive statistics
print(df.describe())
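
Alongside head(), info(), and describe(), a few attributes give a quick structural overview. The following is a small sketch using the same Name/Age data as earlier sections; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
})

shape = df.shape             # (number of rows, number of columns)
columns = list(df.columns)   # column labels
mean_age = df['Age'].mean()  # a single statistic for one column

print(shape)
print(columns)
print(df.dtypes)             # data type of each column
print(mean_age)
```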

Data Selection

Selecting Columns

# Select a single column
print(df['Name'])

# Select multiple columns
print(df[['Name', 'Age']])

Selecting Rows

# Select rows by integer position
print(df.iloc[0])    # First row
print(df.iloc[1:3])  # Second and third rows (the end position is excluded)

# Select rows by index label (0 is a label here, from the default index)
print(df.loc[0])
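
With the default index, iloc and loc can look interchangeable. The difference becomes visible once the index holds non-integer labels. The sketch below assumes a hypothetical string index 'a' through 'd' that is not part of the original data.

```python
import pandas as pd

# Same data as above, with an illustrative string index
df = pd.DataFrame(
    {'Name': ['John', 'Anna', 'Peter', 'Linda'], 'Age': [28, 24, 35, 32]},
    index=['a', 'b', 'c', 'd'],
)

row_by_label = df.loc['c']        # the row labeled 'c'
slice_by_position = df.iloc[1:3]  # positions 1 and 2; the end is excluded
slice_by_label = df.loc['b':'d']  # labels 'b' through 'd'; the end is included

print(row_by_label['Name'])
print(slice_by_position)
print(slice_by_label)
```

Note that position slices exclude the end point while label slices include it, which is a common source of off-by-one surprises.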

Conditional Selection

# Select rows based on condition
print(df[df['Age'] > 30])
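
Conditions can be combined. The sketch below shows three common patterns on the same Name/Age data: combining boolean masks with & (each condition needs its own parentheses), membership tests with isin(), and the string-based query() method. The thresholds and names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
})

# Combine conditions with & (and) or | (or); wrap each one in parentheses
between = df[(df['Age'] > 25) & (df['Age'] < 35)]

# Keep rows whose value appears in a list
selected = df[df['Name'].isin(['Anna', 'Peter'])]

# The same kind of filter written as a query string
over_30 = df.query('Age > 30')

print(between)
print(selected)
print(over_30)
```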

Data Cleaning

Handling Missing Values

# Fill missing values with 0 (modifies df in place)
df.fillna(0, inplace=True)

# Alternatively, drop rows that contain any missing value
df.dropna(inplace=True)
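
Before filling or dropping, it usually helps to count what is missing. The sketch below uses a small DataFrame with deliberately missing entries (the gaps are invented for this example) and shows per-column filling and selective dropping. It returns new DataFrames rather than using inplace=True, so the original stays available for comparison.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing name and one missing age
df = pd.DataFrame({
    'Name': ['John', 'Anna', None, 'Linda'],
    'Age': [28, np.nan, 35, 32],
})

# Count missing values in each column
missing_per_column = df.isna().sum()

# Fill only the Age column, here with the mean of the known ages
filled = df.fillna({'Age': df['Age'].mean()})

# Drop only the rows where Name is missing
dropped = df.dropna(subset=['Name'])

print(missing_per_column)
print(filled)
print(dropped)
```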

Removing Duplicates

# Remove duplicate rows
df.drop_duplicates(inplace=True)
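
By default, drop_duplicates() removes only rows that match on every column. The sketch below, using made-up rows with repeated names, shows how to flag duplicates first with duplicated() and how the subset and keep parameters change which rows survive.

```python
import pandas as pd

# Illustrative data: row 2 repeats row 0 exactly; row 3 repeats only the name
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'John', 'Anna'],
    'Age': [28, 24, 28, 25],
})

# True for each row that exactly repeats an earlier row
duplicate_mask = df.duplicated()

# Remove exact duplicate rows
exact = df.drop_duplicates()

# Treat rows with the same Name as duplicates and keep the last one
by_name = df.drop_duplicates(subset=['Name'], keep='last')

print(duplicate_mask)
print(exact)
print(by_name)
```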

Data Transformation

Adding New Columns

# Add a new column (the list length must match the number of rows)
df['Salary'] = [50000, 60000, 70000, 80000]
print(df)

Applying Functions

# Apply a function to a column
df['Age'] = df['Age'].apply(lambda x: x + 1)
print(df)
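
For simple arithmetic, a vectorized expression gives the same result as apply() and is typically faster on large columns; apply() remains useful for logic that has no direct vectorized form. The sketch below illustrates both, with column names (AgePlusOne, AgeGroup, Initial) invented for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
})

# Vectorized arithmetic: equivalent to apply(lambda x: x + 1)
df['AgePlusOne'] = df['Age'] + 1

# apply() with custom branching logic
df['AgeGroup'] = df['Age'].apply(lambda age: 'under 30' if age < 30 else '30+')

# String methods are available through the .str accessor
df['Initial'] = df['Name'].str[0]

print(df)
```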

Merging Data

# Merge DataFrames
df1 = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})
df2 = pd.DataFrame({'Name': ['John', 'Anna'], 'Salary': [50000, 60000]})
merged_df = pd.merge(df1, df2, on='Name')
print(merged_df)
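
The merge above is an inner join, which keeps only names present in both DataFrames. The sketch below uses slightly different, made-up inputs where the names do not fully overlap, to show how the how parameter changes the result and how indicator=True records where each row came from.

```python
import pandas as pd

# Illustrative inputs: Peter has no salary, Linda has no age
df1 = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]})
df2 = pd.DataFrame({'Name': ['John', 'Anna', 'Linda'], 'Salary': [50000, 60000, 80000]})

# Inner join (the default): only names found in both
inner = pd.merge(df1, df2, on='Name')

# Left join: every row of df1; missing salaries become NaN
left = pd.merge(df1, df2, on='Name', how='left', indicator=True)

# Outer join: every name from either DataFrame
outer = pd.merge(df1, df2, on='Name', how='outer')

print(inner)
print(left)
print(outer)
```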

Grouping and Aggregating

GroupBy

# Group by a column
grouped_df = df.groupby('Age')
print(grouped_df.size())

Aggregation Functions

# Aggregate data
print(df.groupby('Age')['Salary'].sum())
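
Because every age in the sample data is unique, grouping by Age yields one row per group. Grouping is more informative on a column with repeated values. The sketch below assumes a hypothetical Dept column, which is not part of the original data, and uses named aggregation to compute several statistics at once.

```python
import pandas as pd

# Sample data extended with an illustrative 'Dept' column
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Dept': ['Sales', 'Sales', 'IT', 'IT'],
    'Age': [28, 24, 35, 32],
    'Salary': [50000, 60000, 70000, 80000],
})

# Named aggregation: output_column=(input_column, function)
summary = df.groupby('Dept').agg(
    total_salary=('Salary', 'sum'),
    mean_age=('Age', 'mean'),
    headcount=('Name', 'count'),
)

print(summary)
```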

Data Visualization

Plotting with Pandas

import matplotlib.pyplot as plt

# Plotting a column
df['Age'].plot(kind='bar')
plt.show()

Pandas Coding Examples

Example 1: Creating DataFrames

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
}
df = pd.DataFrame(data)
print(df)

Example 2: Data Inspection

print(df.head())
df.info()  # prints its own report; wrapping it in print() would add a stray "None"
print(df.describe())

Example 3: Data Selection

print(df['Name'])
print(df.iloc[1:3])
print(df[df['Age'] > 30])

Example 4: Handling Missing Values

df.fillna(0, inplace=True)
print(df)

Example 5: Removing Duplicates

df.drop_duplicates(inplace=True)
print(df)

Example 6: Adding New Columns

df['Salary'] = [50000, 60000, 70000, 80000]
print(df)

Example 7: Applying Functions

df['Age'] = df['Age'].apply(lambda x: x + 1)
print(df)

Example 8: Merging Data

df1 = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})
df2 = pd.DataFrame({'Name': ['John', 'Anna'], 'Salary': [50000, 60000]})
merged_df = pd.merge(df1, df2, on='Name')
print(merged_df)

Example 9: Grouping Data

print(df.groupby('Age').size())

Example 10: Plotting Data

import matplotlib.pyplot as plt

df['Age'].plot(kind='bar')
plt.show()

Downloading the Pandas Cheat Sheet PDF

For your convenience, you can download the complete Pandas cheat sheet in PDF format from the following link:

Download Pandas Cheat Sheet PDF

FAQs

What is a Pandas cheat sheet?

A Pandas cheat sheet is a quick reference guide that includes the most commonly used functions and methods in the Pandas library, helping users quickly recall and implement them in their data analysis tasks.

How can I install Pandas?

You can install Pandas using pip with the command pip install pandas.

What is the difference between a Series and a DataFrame in Pandas?

A Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

How do I handle missing values in Pandas?

You can handle missing values by using the fillna() method to fill them or the dropna() method to remove them.

Can I plot data using Pandas?

Yes, Pandas has built-in plotting capabilities using the plot() method, which integrates with Matplotlib.

What are aggregation functions in Pandas?

Aggregation functions in Pandas, such as sum(), mean(), and count(), are used to perform operations on groups of data.



