Important Pandas concepts for Machine Learning & Data Science Interviews✨ (Updated)

Pandas concepts that you must know to ace your coding interviews & excel in your Data Science journey.

Karan Kaul | カラン
Python in Plain English

Pandas apply method, groupby & lambda tutorial
Image by Author

These are the 3 concepts that I think are the most essential when it comes to Pandas.

1. Know how to use the “apply” method

This method lets you “apply” a function to your data, either to the whole data frame (one row or one column at a time) or to a single chosen column.

It comes in handy when you want to quickly modify a column, for example, “lowering” each value in it or “removing stop words” from it.

Here is a piece of code that demonstrates the usage of the “apply” method on a given column in pandas—

# assume your dataframe has the column 'usernames'
# let's lower each value in this column
data["usernames"] = data["usernames"].apply(str.lower)

# calculate number of characters & save that info in a new column
data["character_count"] = data["usernames"].apply(len)

Note that “lower” is a string method that we usually call on a value directly, like myString.lower(). Here we are not calling it ourselves: we hand the function to “apply”, which calls it once for every value in the column. That is why we refer to it through the “str” class, as str.lower.
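To try this out end to end, here is a minimal sketch. The sample usernames are made up for illustration, and the “label” column is a hypothetical extra that is not part of the snippet above. It also shows the pandas “.str” accessor, which gives the same result as “apply” for these two string operations, and an “apply” over the whole data frame with axis=1.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the snippet above
data = pd.DataFrame({"usernames": ["Alice", "BOB", "ChArLiE"]})

# Same result as data["usernames"].apply(str.lower), via the .str accessor
data["usernames"] = data["usernames"].str.lower()

# Same result as data["usernames"].apply(len)
data["character_count"] = data["usernames"].str.len()

# apply on the whole data frame: with axis=1 the function gets one row at a time
data["label"] = data.apply(
    lambda row: f"{row['usernames']} ({row['character_count']})", axis=1
)

print(data)
```

For simple string operations like these, the “.str” accessor is usually the more idiomatic choice; “apply” is the one to reach for when the logic gets more custom.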

2. Using the “GroupBy” statement to Group data

Many times we need to group our data based on some particular values. For example, maybe you have a data frame that looks like this —

Image: the example data frame (this is a transposed view, cols & rows are interchanged)

The data frame tells us:

  • House “a” has “2” rooms on floor “1”
  • House “a” has “2” rooms on floor “2”
  • House “b” has “2” rooms on floor “1”
  • House “b” has “1” room on floor “2”
  • and so on…
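If you want to follow along, the data frame can be rebuilt roughly like this. This sketch only includes the four rows spelled out in the bullets; the original table has more rows (the “and so on”), which are not reproduced here.

```python
import pandas as pd

# Only the four rows listed in the bullets above; the full table is larger
df = pd.DataFrame({
    "house": ["a", "a", "b", "b"],
    "floor": [1, 2, 1, 2],
    "rooms": [2, 2, 2, 1],
})

print(df)    # one row per (house, floor) pair
print(df.T)  # transposed view: cols & rows interchanged
```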

If you want to get the total number of rooms in each house, you will need to group this data by “house” & then “sum” the “rooms” across all the floors of that house.

Here is the code to group this data by “house” & add the number of rooms to get the total —

# pass the operation by name; newer pandas versions warn about the builtin sum
df.groupby(by="house", as_index=False).agg({
    "rooms": "sum"
})
Image: # of rooms in each house

Inside the aggregate (agg) function, we specify which operation to perform on which column once the data is grouped. In our case, we want to “sum” the “rooms”, so we used “rooms” as the column & “sum” as the operation.
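Here is a runnable sketch of the grouping by “house”. It assumes a small data frame holding only the four rows listed earlier (the original table has more), so the totals below apply to this reduced sample only. The operation is passed by name as the string "sum".

```python
import pandas as pd

# Reduced sample: only the four rows listed in the bullets
df = pd.DataFrame({
    "house": ["a", "a", "b", "b"],
    "floor": [1, 2, 1, 2],
    "rooms": [2, 2, 2, 1],
})

# Group by house, then sum the rooms within each group
rooms_per_house = df.groupby(by="house", as_index=False).agg({"rooms": "sum"})
print(rooms_per_house)  # house a -> 4 rooms, house b -> 3 rooms

# Shorthand for the same thing when only one column is aggregated
shorthand = df.groupby("house", as_index=False)["rooms"].sum()
```

Setting as_index=False keeps “house” as a regular column in the result instead of turning it into the index.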

To understand this better, let’s also group the data by “floor” & see how many “rooms” each floor has, regardless of the “house” —

# same idea, grouped by floor instead of house
df.groupby(by="floor", as_index=False).agg({
    "rooms": "sum"
})
Image: # of rooms on each floor regardless of house
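The same grouping by “floor” can be sketched with more than one aggregation at a time. This uses pandas named aggregation; the output column names “total_rooms” and “houses” are my own illustrative choices, and the data is again just the four rows listed earlier.

```python
import pandas as pd

# Reduced sample: only the four rows listed in the bullets
df = pd.DataFrame({
    "house": ["a", "a", "b", "b"],
    "floor": [1, 2, 1, 2],
    "rooms": [2, 2, 2, 1],
})

# Named aggregation: new_column=(source_column, operation)
per_floor = df.groupby(by="floor", as_index=False).agg(
    total_rooms=("rooms", "sum"),   # rooms on this floor across all houses
    houses=("house", "nunique"),    # how many distinct houses have this floor
)
print(per_floor)  # floor 1 -> 4 rooms, floor 2 -> 3 rooms
```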

3. Using custom “lambda” functions to Filter or Process data

In point #1 about the “apply” method, we only used built-in functions to process our data. Many times these built-in functions are not enough to give us the output we expect.

To help us with this, we can use custom functions in the “apply” method using “lambda”. Here is how —

# assume your dataframe has the column 'query', which contains numbers as well as strings
# lowering each value like we did in point #1 fails here,
# because str.lower cannot be called on a number:
# data["query"] = data["query"].apply(str.lower)  # raises TypeError

# let's use a lambda function to avoid this error
# we will lower a value only if it is a string & leave everything else as it is
data["query"] = data["query"].apply(lambda q: q.lower() if isinstance(q, str) else q)
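To see both the error and the fix in action, here is a small self-contained sketch. The values in the “query” column are invented for illustration, and the “raised” flag is just a helper for this demo.

```python
import pandas as pd

# Hypothetical mixed-type column: strings and numbers together
data = pd.DataFrame({"query": ["Best LAPTOP", 42, "Python Pandas", 3.14]})

# Applying str.lower directly fails as soon as it reaches a number
raised = False
try:
    data["query"].apply(str.lower)
except TypeError as err:
    raised = True
    print("apply(str.lower) failed:", err)

# The lambda lowers strings and passes every other value through unchanged
data["query"] = data["query"].apply(
    lambda q: q.lower() if isinstance(q, str) else q
)
print(data["query"].tolist())
```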

You can also define a custom function separately & then call that inside the “apply” method like this —

# define a function
def lower_query(query):
    # lower only strings; return any other value unchanged
    if isinstance(query, str):
        return query.lower()
    return query

# call the defined function inside apply
data["query"] = data["query"].apply(lower_query)
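The examples so far process values; a lambda can also be used to filter rows. A minimal sketch, assuming the same kind of mixed “query” column (the sample values are made up): “apply” returns a True/False value per row, and that result is used as a mask to select rows.

```python
import pandas as pd

# Hypothetical mixed-type column: strings and numbers together
data = pd.DataFrame({"query": ["Best LAPTOP", 42, "Python Pandas", 3.14]})

# One boolean per row: True where the value is a string
is_text = data["query"].apply(lambda q: isinstance(q, str))

# Use the boolean Series as a mask to keep only the matching rows
text_only = data[is_text]
print(text_only)
```

The original row labels are kept in the filtered result; call reset_index(drop=True) on it if you want them renumbered from zero.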

There is a lot more to Pandas than what we covered in this article, but these 3 things I think are the most essential & must-know concepts in Pandas for anyone.

Thanks for reading! Please drop some claps, comment & share the article if you found my writing helpful. 🖤👏🏻

✨ Machine Learning Consultant. Writes about Programming/Machine Learning! Instagram? @_krnk97