Big Data – Hojoong Chung

After finishing the first assignment of the Udacity Data Analyst Nanodegree, I decided to summarize and record the most commonly used Pandas commands here. The first topic I would like to cover is the lambda function.

A lambda function can be used in Pandas through the apply() method, as shown below:

import pandas as pd
df = pd.DataFrame({'A':[1,2],'B':[3,4]})
df.apply(lambda x : x + 1)

The code above adds 1 to every value in the DataFrame named ‘df’. The result looks like this:

__| A B
0 | 2 4
1 | 3 5
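To see what apply() hands to the lambda, here is a minimal sketch using the same df as above. The helper name add_one and the list seen_types are hypothetical, introduced only for illustration. With the default axis=0 each column arrives as a Series, so x + 1 is a vectorized addition; with axis=1 each row arrives as a Series instead.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

seen_types = []  # hypothetical: records the type of each argument apply() passes in

def add_one(x):
    # hypothetical helper that behaves like `lambda x: x + 1`
    seen_types.append(type(x).__name__)
    return x + 1

by_column = df.apply(add_one)                    # default axis=0: called with each column
by_row = df.apply(lambda row: row + 1, axis=1)   # axis=1: called with each row

print(set(seen_types))  # the kind of object the function received
print(by_column)
```

For a simple element-wise operation like this one, df + 1 produces the same values without apply().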

A lambda function can be applied to a single column with the code below:

df['A'].apply(lambda x : x + 1)

However, the result is a Series, so it shows only the index and values, without a column header:

0 | 2
1 | 3
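One difference from the DataFrame case is worth checking: when apply() is called on a single column, the function receives the values one at a time rather than a whole Series. The sketch below illustrates this; add_one and got_series are hypothetical names used only for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

got_series = []  # hypothetical: True if the function was ever handed a Series

def add_one(x):
    # hypothetical helper that behaves like `lambda x: x + 1`
    got_series.append(isinstance(x, pd.Series))
    return x + 1

result = df['A'].apply(add_one)   # Series.apply: called once per value

print(any(got_series))   # whether any argument was a Series
print(result.name)       # the column name is kept as the Series name
```

The column name is not lost; it is stored in the name attribute of the returned Series rather than shown as a header.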

To keep the column name as a header, pass a dictionary that maps the column name to the function:

df.apply({'A': lambda x : x + 1})

And the result will look like this:

__| A
0 | 2
1 | 3
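As an alternative sketch (not the approach used above), selecting the column with a list of names returns a one-column DataFrame, so the header is kept without passing a dictionary. The variable name one_col is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# df[['A']] (double brackets) is a DataFrame, while df['A'] is a Series
one_col = df[['A']].apply(lambda x: x + 1)

print(type(one_col).__name__)
print(one_col)
```

Here the lambda again receives the whole column as a Series, just as it does when apply() is called on the full DataFrame.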

A lambda function can also be used in combination with groupby():

df = pd.DataFrame({'Year':[2016,2016,2017,2017],'Student':['Paul','Jack', 'Paul', 'Jack'], 'Score':[90,80,100,70]})
df.groupby('Student').apply(lambda x : x[x.Score == x.Score.max()])

The code above applies the lambda function to each group after the data is grouped by the values in the ‘Student’ column, in this case ‘Paul’ and ‘Jack’. Within each group, the lambda function keeps only the rows whose Score equals the maximum Score of that group. The result looks like this:

__________| Score Student Year
Student
Jack    1 | 80    Jack    2016
Paul    2 | 100   Paul    2017
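The same rows can also be picked out without a lambda. The following is a minimal sketch of a different technique, based on idxmax(), which returns the index label of the highest Score in each group; the variable names best_idx and best_rows are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2016, 2016, 2017, 2017],
                   'Student': ['Paul', 'Jack', 'Paul', 'Jack'],
                   'Score': [90, 80, 100, 70]})

# Index label of the row holding the maximum Score for each student
best_idx = df.groupby('Student')['Score'].idxmax()

# Look those rows up in the original DataFrame
best_rows = df.loc[best_idx]
print(best_rows)
```

One behavioral difference: idxmax() returns only the first matching row when several rows tie for the maximum, whereas the lambda version keeps every tied row.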

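A similar summary can be produced by calling agg() after groupby():

df.groupby('Student').agg([lambda x : x.max()])

This code returns one row per student, which is cleaner than the previous result. Note that agg() takes the maximum of each column independently, so the Year shown is the latest year in the group, not necessarily the year of the highest score (Jack scored 80 in 2016, but his Year column shows 2017):

          | Score    Year
__________|<lambda> <lambda>
Student
Jack      | 80       2017
Paul      | 100      2017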
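The lambda-derived column labels in this output are not very descriptive. As a hedged sketch, named aggregation (available in pandas 0.25 and later) lets each output column be given a readable name; the labels max_score and latest_year are hypothetical, chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2016, 2016, 2017, 2017],
                   'Student': ['Paul', 'Jack', 'Paul', 'Jack'],
                   'Score': [90, 80, 100, 70]})

# Each keyword is an output column name; each tuple is (input column, aggregation)
summary = df.groupby('Student').agg(
    max_score=('Score', 'max'),
    latest_year=('Year', 'max'),
)
print(summary)
```

Each aggregation is still computed on its own column, so the two values in a row need not come from the same original row.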


Pandas – Lambda function