Master your data analysis skills with this comprehensive Pandas cheat sheet. Learn key functions, coding examples, and download the Pandas cheat sheet PDF for quick reference.
Introduction
Pandas is a powerful Python library widely used for data manipulation and analysis. It offers flexible data structures, such as Series and DataFrame, for handling structured data efficiently. This Pandas cheat sheet is a quick reference to the most commonly used Pandas functions, with coding examples that show each one in action.
What is Pandas?
Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools. It is built on top of NumPy and is an essential tool for data scientists working in Python.
Why Use a Pandas Cheat Sheet?
A Pandas cheat sheet is a handy reference guide that helps you quickly recall and implement Pandas functions and methods. Whether you're a beginner or an experienced user, having a cheat sheet can save you time and enhance your productivity.
Getting Started with Pandas
Installing Pandas
To install Pandas, you can use pip:
pip install pandas
Importing Pandas
Once installed, you can import Pandas into your Python script:
import pandas as pd
Data Structures in Pandas
Series
A Series is a one-dimensional labeled array capable of holding any data type.
import pandas as pd
# Creating a Series
s = pd.Series([1, 3, 5, 7, 9])
print(s)
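The list above gets the default integer labels 0 to 4. The "labeled" part of a Series is easier to see with explicit labels; in this small sketch the values and the labels 'a', 'b', 'c' are arbitrary choices.

```python
import pandas as pd

# A Series pairs each value with an index label; here the labels are set explicitly
s = pd.Series([1, 3, 5], index=['a', 'b', 'c'])

print(s['b'])     # access by label -> 3
print(s.iloc[0])  # access by position -> 1
print(s * 2)      # arithmetic is applied element-wise and keeps the labels
```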
DataFrame
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.
# Creating a DataFrame
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 24, 35, 32]
}
df = pd.DataFrame(data)
print(df)
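Because a DataFrame can hold columns of different types, it helps to check what Pandas actually inferred for each column. A short sketch using the same Name/Age data:

```python
import pandas as pd

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
}
df = pd.DataFrame(data)

# Each column has its own dtype: 'Name' holds text (object dtype, or a
# dedicated string dtype in newer pandas releases) and 'Age' holds integers
print(df.dtypes)
print(df.shape)  # (number of rows, number of columns)
```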
Creating DataFrames
From Dictionaries
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 24, 35, 32]
}
df = pd.DataFrame(data)
print(df)
From Lists
data = [['John', 28], ['Anna', 24], ['Peter', 35], ['Linda', 32]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
From CSV Files
df = pd.read_csv('data.csv')
print(df.head())
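The snippet above assumes a file named data.csv exists in the working directory. To try read_csv without creating a file, you can pass it an in-memory file-like object via io.StringIO; the CSV text below is made up to match the earlier examples.

```python
import io

import pandas as pd

# Stand-in for a file on disk: read_csv also accepts file-like objects
csv_text = """Name,Age
John,28
Anna,24
Peter,35
Linda,32
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```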
Data Inspection
Viewing Data
# Display the first 5 rows
print(df.head())
# Display the last 5 rows
print(df.tail())
Getting Data Info
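# Display a concise summary: columns, non-null counts, dtypes, and memory usage
# (info() prints directly and returns None, so no print() is needed)
df.info()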
Descriptive Statistics
# Get descriptive statistics
print(df.describe())
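describe() returns a DataFrame of its own, so individual statistics can be read back by row label. A sketch with the Name/Age data; note that the text column is left out because, by default, only numeric columns are summarized.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
})

# By default describe() summarizes numeric columns only ('Name' is left out)
stats = df.describe()
print(stats)

# Rows are statistics (count, mean, std, min, quartiles, max); columns are data columns
print(stats.loc['mean', 'Age'])  # 29.75
print(stats.loc['max', 'Age'])   # 35.0
```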
Data Selection
Selecting Columns
# Select a single column
print(df['Name'])
# Select multiple columns
print(df[['Name', 'Age']])
Selecting Rows
# Select rows by integer position
print(df.iloc[0]) # First row
print(df.iloc[1:3]) # Second and third rows
# Select a row by index label (with the default integer index, label 0 is the first row)
print(df.loc[0])
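With the default integer index, loc and iloc can look interchangeable. Using the names as a (hypothetical) index makes the difference between labels and positions visible, including how the two handle slice end points.

```python
import pandas as pd

# Names as the index, so labels and positions no longer coincide
df = pd.DataFrame(
    {'Age': [28, 24, 35, 32]},
    index=['John', 'Anna', 'Peter', 'Linda']
)

print(df.loc['Anna'])           # row whose label is 'Anna'
print(df.iloc[1])               # second row by position (also Anna)
print(df.loc['Anna':'Peter'])   # label slices include the end label
print(df.iloc[1:2])             # position slices exclude the end position
```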
Conditional Selection
# Select rows based on condition
print(df[df['Age'] > 30])
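Conditions can be combined with & (and) and | (or), and isin() checks membership in a list. Each comparison needs its own parentheses because & binds more tightly than > and <. A sketch on the same Name/Age data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
})

# Both conditions must hold; parentheses around each comparison are required
middle = df[(df['Age'] > 25) & (df['Age'] < 33)]
print(middle)

# Keep rows whose Name appears in the given list
chosen = df[df['Name'].isin(['Anna', 'Linda'])]
print(chosen)
```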
Data Cleaning
Handling Missing Values
# Fill missing values with 0 (returns a new DataFrame)
df_filled = df.fillna(0)
# Or drop every row that contains a missing value
df_dropped = df.dropna()
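These methods only change something when the data actually contains gaps, which the four-row Name/Age example does not. A sketch with made-up missing ages, marked with np.nan:

```python
import numpy as np
import pandas as pd

# Two of the ages are missing
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, np.nan, 35, np.nan]
})

print(df.isna().sum())  # number of missing values per column

filled = df.fillna({'Age': 0})  # fill gaps in the 'Age' column only
dropped = df.dropna()           # keep only rows without missing values
print(filled)
print(dropped)
```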
Removing Duplicates
# Remove duplicate rows
df.drop_duplicates(inplace=True)
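By default a row counts as a duplicate only if every column matches an earlier row; the subset parameter restricts the comparison to chosen columns. A sketch with made-up data in which one row is an exact repeat and another repeats only the name:

```python
import pandas as pd

# Row 2 repeats row 0 exactly; row 3 repeats only the name from row 1
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'John', 'Anna'],
    'Age': [28, 24, 28, 30]
})

print(df.duplicated())  # True only for the exact repeat (row 2)

unique_rows = df.drop_duplicates()  # drops row 2 only
unique_names = df.drop_duplicates(subset=['Name'], keep='first')  # first row per name
print(unique_rows)
print(unique_names)
```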
Data Transformation
Adding New Columns
# Add a new column
df['Salary'] = [50000, 60000, 70000, 80000]
print(df)
Applying Functions
# Apply a function to a column
df['Age'] = df['Age'].apply(lambda x: x + 1)
print(df)
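For simple arithmetic like adding 1, a vectorized expression gives the same result as apply() and is usually faster. apply() is most convenient when the logic is easiest to write as an ordinary Python function; the age_band function below is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({'Age': [28, 24, 35, 32]})

# The same transformation written two ways
via_apply = df['Age'].apply(lambda x: x + 1)
vectorized = df['Age'] + 1
print(via_apply.equals(vectorized))  # True

# Hypothetical helper: map each age to a text label
def age_band(age):
    return 'under 30' if age < 30 else '30+'

df['Band'] = df['Age'].apply(age_band)
print(df)
```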
Merging Data
# Merge DataFrames
df1 = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})
df2 = pd.DataFrame({'Name': ['John', 'Anna'], 'Salary': [50000, 60000]})
merged_df = pd.merge(df1, df2, on='Name')
print(merged_df)
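The merge above uses the default inner join, and both frames contain the same names, so the join type makes no difference there. Adding a made-up third person with no salary record shows how the how parameter changes the result:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]})
df2 = pd.DataFrame({'Name': ['John', 'Anna'], 'Salary': [50000, 60000]})

# Default how='inner': keep only names present in both frames
inner = pd.merge(df1, df2, on='Name')

# how='left': keep every row of df1; Peter's missing salary becomes NaN
left = pd.merge(df1, df2, on='Name', how='left')
print(inner)
print(left)
```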
Grouping and Aggregating
GroupBy
# Group by a column (every age in this example is unique, so each group has one row)
grouped_df = df.groupby('Age')
print(grouped_df.size())
Aggregation Functions
# Aggregate data
print(df.groupby('Age')['Salary'].sum())
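Grouping is more informative when keys repeat. In this sketch the Department column and its values are hypothetical, and agg() computes several aggregation functions per group in a single call:

```python
import pandas as pd

# Hypothetical data with repeated group keys
df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'IT', 'IT', 'IT'],
    'Salary': [50000, 60000, 70000, 80000, 90000]
})

# One row per department, one column per aggregation function
summary = df.groupby('Department')['Salary'].agg(['sum', 'mean', 'count'])
print(summary)
```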
Data Visualization
Plotting with Pandas
import matplotlib.pyplot as plt
# Plotting a column
df['Age'].plot(kind='bar')
plt.show()
Pandas Coding Examples
Example 1: Creating DataFrames
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 24, 35, 32]
}
df = pd.DataFrame(data)
print(df)
Example 2: Data Inspection
print(df.head())
df.info()  # info() prints its summary directly and returns None
print(df.describe())
Example 3: Data Selection
print(df['Name'])
print(df.iloc[1:3])
print(df[df['Age'] > 30])
Example 4: Handling Missing Values
df.fillna(0, inplace=True)
print(df)
Example 5: Removing Duplicates
df.drop_duplicates(inplace=True)
print(df)
Example 6: Adding New Columns
df['Salary'] = [50000, 60000, 70000, 80000]
print(df)
Example 7: Applying Functions
df['Age'] = df['Age'].apply(lambda x: x + 1)
print(df)
Example 8: Merging Data
df1 = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})
df2 = pd.DataFrame({'Name': ['John', 'Anna'], 'Salary': [50000, 60000]})
merged_df = pd.merge(df1, df2, on='Name')
print(merged_df)
Example 9: Grouping Data
print(df.groupby('Age').size())
Example 10: Plotting Data
import matplotlib.pyplot as plt
df['Age'].plot(kind='bar')
plt.show()
Downloading the Pandas Cheat Sheet PDF
For your convenience, you can download the complete Pandas cheat sheet in PDF format from the following link:
FAQs
What is a Pandas cheat sheet? A Pandas cheat sheet is a quick reference guide that includes the most commonly used functions and methods in the Pandas library, helping users quickly recall and implement them in their data analysis tasks.
How can I install Pandas? You can install Pandas using pip with the command pip install pandas.
What is the difference between a Series and a DataFrame in Pandas? A Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional labeled data structure with columns of potentially different types.
How do I handle missing values in Pandas? You can handle missing values by using the fillna() method to fill them or the dropna() method to remove them.
Can I plot data using Pandas? Yes, Pandas has built-in plotting capabilities through the plot() method, which integrates with Matplotlib.
What are aggregation functions in Pandas? Aggregation functions in Pandas, such as sum(), mean(), and count(), reduce each group of data to a single summary value, for example the total salary per group after a groupby().