# Tabular Data Wrangling with Pandas

In this notebook, we'll manipulate and analyze tabular data with pandas, a Python library for working with structured (row-and-column) data. We'll create a DataFrame, select and filter its contents, sort it, add a column, group it, and handle missing values.

First, let's import the pandas library and create a sample DataFrame to work with.

In [None]:
import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'San Francisco', 'London', 'Paris'],
    'Salary': [50000, 75000, 80000, 65000]
})

print(df)
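
Before transforming a DataFrame, it helps to check its size and column types. The cell below is a minimal sketch of that inspection step; it rebuilds the same sample data under the illustrative name `people` (not used elsewhere in this notebook) so it can run on its own.

```python
import pandas as pd

# Same sample data as above, rebuilt so this cell is self-contained
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'San Francisco', 'London', 'Paris'],
    'Salary': [50000, 75000, 80000, 65000]
})

n_rows, n_cols = people.shape          # (number of rows, number of columns)
column_names = list(people.columns)    # column labels in order

print(n_rows, n_cols)
print(people.dtypes)     # data type of each column
print(people.head(2))    # first two rows
```

`shape`, `dtypes`, and `head` only read the data, so they are safe to run at any point in an analysis.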

Now, let's explore some basic operations on our DataFrame, such as selecting columns and filtering rows.

In [None]:
# Select specific columns
print(df[['Name', 'Age']])

# Filter rows based on a condition
print(df[df['Age'] > 30])
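
Filters can also combine several conditions, and `.loc` can pick rows and columns in a single step. The following is a small sketch of both ideas; the DataFrame name `people` and the thresholds are chosen only for illustration.

```python
import pandas as pd

# Sample data rebuilt so this cell is self-contained
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'San Francisco', 'London', 'Paris'],
    'Salary': [50000, 75000, 80000, 65000]
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
older_high_earners = people[(people['Age'] > 26) & (people['Salary'] >= 75000)]

# .loc takes a boolean row mask and a list of column labels together
names_over_26 = people.loc[people['Age'] > 26, ['Name', 'City']]

print(older_high_earners)
print(names_over_26)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.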

Next, let's sort the DataFrame and add a new column. Note that `sort_values` returns a sorted copy and leaves `df` unchanged, whereas assigning to `df['Bonus']` modifies `df` itself.

In [None]:
# Sort the DataFrame by Age
print(df.sort_values('Age'))

# Add a new column
df['Bonus'] = df['Salary'] * 0.1
print(df)
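
If we'd rather not modify the original DataFrame, `assign` returns a new DataFrame with the extra column, and `sort_values` accepts `ascending=False` for descending order. Here is a short sketch of both, again using an illustrative copy of the data named `people`.

```python
import pandas as pd

# Sample data rebuilt so this cell is self-contained
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'San Francisco', 'London', 'Paris'],
    'Salary': [50000, 75000, 80000, 65000]
})

# Sort from highest to lowest salary; `people` itself is not reordered
sorted_desc = people.sort_values('Salary', ascending=False)

# assign() returns a new DataFrame that includes the Bonus column
with_bonus = people.assign(Bonus=people['Salary'] * 0.1)

print(sorted_desc)
print(with_bonus)
```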

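Grouping splits the rows by the values in a column and computes statistics for each group. Let's group our data by City and calculate the mean Age and Salary. Because every city appears only once in this sample, each group's mean is simply that row's own value.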

In [None]:
# Group by City and calculate mean Age and Salary
city_stats = df.groupby('City').agg({
    'Age': 'mean',
    'Salary': 'mean'
})
print(city_stats)
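
Grouping becomes more informative when several rows share a group. The sketch below uses a hypothetical six-person dataset named `staff` (made up for illustration, not part of the sample data above) in which each city appears three times, and uses named aggregation to label the output columns.

```python
import pandas as pd

# Hypothetical data with repeated cities, invented for this example
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'City': ['London', 'Paris', 'London', 'Paris', 'London', 'Paris'],
    'Age': [25, 30, 35, 28, 30, 32],
    'Salary': [50000, 75000, 80000, 65000, 65000, 70000]
})

# Named aggregation: output_column=(input_column, aggregation)
city_summary = staff.groupby('City').agg(
    mean_age=('Age', 'mean'),
    mean_salary=('Salary', 'mean'),
    headcount=('Name', 'count'),
)

print(city_summary)
```

The result is indexed by City, with one row per group and one column per named statistic.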

Finally, let's demonstrate how to handle missing data, which is common in real-world datasets.

In [None]:
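# Introduce some missing values.
# Integer columns cannot store NaN, and recent pandas versions warn about
# (or reject) the implicit upcast, so convert the columns to float first.
df = df.astype({'Age': 'float64', 'Salary': 'float64'})
df.loc[1, 'Salary'] = np.nan
df.loc[3, 'Age'] = np.nan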
print(df)

# Fill missing values
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())
df['Age'] = df['Age'].fillna(df['Age'].median())
print(df)
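
Before filling gaps, it is usually worth counting them, and sometimes dropping incomplete rows is the better choice. The following sketch shows both alongside a dictionary-based `fillna`; the DataFrame `records` is a small illustrative stand-in with the same missing cells as above.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing Age and one missing Salary
records = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25.0, 30.0, 35.0, np.nan],
    'Salary': [50000.0, np.nan, 80000.0, 65000.0]
})

# Count the missing values in each column
missing_per_column = records.isna().sum()

# Keep only the rows that have no missing values
complete_rows = records.dropna()

# Fill each column with its own statistic in one call
filled = records.fillna({
    'Age': records['Age'].median(),
    'Salary': records['Salary'].mean(),
})

print(missing_per_column)
print(complete_rows)
print(filled)
```

Whether to fill or drop depends on the analysis: filling keeps every row but adds estimated values, while dropping keeps only observed values at the cost of fewer rows.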

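This notebook covered the fundamental pandas operations for tabular data wrangling: creating a DataFrame, selecting columns, filtering rows, sorting, adding columns, grouping, and filling missing values. Practice these techniques on your own datasets to become proficient in data manipulation with pandas!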