Filtering data is a common operation in data analysis. Pandas allows us to filter data based on different conditions.
We can filter the data in Pandas in two main ways:
- By column names (Labels)
 - By the actual data inside (Values)
 
Filter Data By Labels
We can use the filter() function to select columns by their names or labels. Let's look at an example.
import pandas as pd
# create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Department': ['HR', 'Marketing', 'Marketing', 'IT'],
        'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
# display the original DataFrame
print("Original DataFrame:")
print(df)
print("\n")
# use the filter() method to select columns based on a condition
filtered_df = df.filter(items=['Name', 'Salary']) 
# display the filtered DataFrame
print("Filtered DataFrame:")
print(filtered_df)
Output
Original DataFrame:
      Name Department  Salary
0    Alice         HR   50000
1      Bob  Marketing   60000
2  Charlie  Marketing   55000
3    David         IT   70000
Filtered DataFrame:
      Name  Salary
0    Alice   50000
1      Bob   60000
2  Charlie   55000
3    David   70000
In this example, we used filter() to select the columns Name and Salary using their columns names.
To learn more, visit Pandas filter().
Filter Data By Values
We can also filter data by values. Some of the common ways to filter data by values are:
- Using logical operator
 - The 
isin()method - The 
strAccessor - The 
query()method 
Logical Operators
You can filter rows based on column values using logical operators. For example,
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Department': ['HR', 'Marketing', 'Marketing', 'IT'],
        'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
# display the original DataFrame
print("Original DataFrame:")
print(df)
print("\n")
# use logical operators to filter
filtered_df = df[df.Salary > 55000]
# display the filtered DataFrame
print("Filtered DataFrame:")
print(filtered_df)
Output
Original DataFrame:
      Name Department  Salary
0    Alice         HR   50000
1      Bob  Marketing   60000
2  Charlie  Marketing   55000
3    David         IT   70000
Filtered DataFrame:
    Name Department  Salary
1    Bob  Marketing   60000
3  David         IT   70000
In the above example, we selected the rows based on the condition Salary > 55000 using logical operator >.
The isin() Method
The isin() method provides another way to filter data using column values. Let's look at an example.
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Department': ['HR', 'Marketing', 'Marketing', 'IT'],
        'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
# display the original DataFrame
print("Original DataFrame:")
print(df)
print("\n")
# use isin() method
departments = ['HR', 'IT']
filtered_df = df[df.Department.isin(departments)]
# display the filtered DataFrame
print("Filtered DataFrame:")
print(filtered_df)
Output
Original DataFrame:
      Name Department  Salary
0    Alice         HR   50000
1      Bob  Marketing   60000
2  Charlie  Marketing   55000
3    David         IT   70000
Filtered DataFrame:
    Name Department  Salary
0  Alice         HR   50000
3  David         IT   70000
In this example, we selected the rows whose Department values are present in the departments list.
The str Accessor
We can effectively filter rows based on string values using the str accessor. Let's look at an example.
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Department': ['HR', 'Marketing', 'Marketing', 'IT'],
        'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
# display the original DataFrame
print("Original DataFrame:")
print(df)
print("\n")
# use str accessor
filtered_df = df[df.Department.str.contains('Market')]
# display the filtered DataFrame
print("Filtered DataFrame:")
print(filtered_df)
Output
Original DataFrame:
      Name Department  Salary
0    Alice         HR   50000
1      Bob  Marketing   60000
2  Charlie  Marketing   55000
3    David         IT   70000
Filtered DataFrame:
      Name Department  Salary
1      Bob  Marketing   60000
2  Charlie  Marketing   55000
Here, we filtered the rows based on a string value. We only selected the rows whose Department values contained the string Market.
The query() Method
This is the most flexible method for filtering a dataframe based on column values.
A query containing the filtering conditions can be passed as a string to the query() method.
Let's look at an example.
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Department': ['HR', 'Marketing', 'Marketing', 'IT'],
        'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
# display the original DataFrame
print("Original DataFrame:")
print(df)
print("\n")
# use query method
filtered_df = df.query('Salary > 55000 and Department == "Marketing"')
# display the filtered DataFrame
print("Filtered DataFrame:")
print(filtered_df)
Output
Original DataFrame:
      Name Department  Salary
0    Alice         HR   50000
1      Bob  Marketing   60000
2  Charlie  Marketing   55000
3    David         IT   70000
Filtered DataFrame:
  Name Department  Salary
1  Bob  Marketing   60000
In this example, we selected the rows with Salary > 55000 and Department == "Marketing" using the query() method.