Posts Sort multiple variables
Post
Cancel

Sort multiple variables

Usually, we are proficient at sorting the data frame/table by one variable. But there are cases that we need a second variable to break the ties. In this post, I will summarize how to do this in Python, R, and SQL.

The question considered here is a table of temperature per day. We are asked to sort the temperature in descending order and while there is a lie, sort the data in ascending order.

record_idtemperaturedate
1321987-01-01
2331987-01-02
3331987-01-03
4291987-01-04
5321987-01-05
6401987-01-06

Python tuple

1
2
3
4
5
6
7
import pandas as pd

rawdata = pd.read_csv('/Users/lucyliu/Desktop/smallData.csv')
rawdata['date'] = pd.to_datetime(rawdata.date)
dataTuple = list(rawdata.itertuples(index = False, name = None))

sorted(dataTuple, key=lambda x: (-x[1], x[2]))
1
2
3
4
5
6
[(6, 40, Timestamp('1987-01-06 00:00:00')),
 (2, 33, Timestamp('1987-01-02 00:00:00')),
 (3, 33, Timestamp('1987-01-03 00:00:00')),
 (1, 32, Timestamp('1987-01-01 00:00:00')),
 (5, 32, Timestamp('1987-01-05 00:00:00')),
 (4, 29, Timestamp('1987-01-04 00:00:00'))]

In the sorted function, we used anonymous lambda function to select the variables, temperature and date. Also, there is a trick that adding - is equivalent to sorting in descending order.

Python Pandas

Pandas is a standard package to do data wrangling. The sort_values method in pandas is efficient and easy to use.

1
2
3
4
from tabulate import tabulate

sortedData = rawdata.sort_values(by=['temperature', 'date'], ascending = [False, True])
print(tabulate(sortedData, tablefmt="pipe", headers="keys", showindex = False))
record_idtemperaturedate
6401987-01-06 00:00:00
2331987-01-02 00:00:00
3331987-01-03 00:00:00
1321987-01-01 00:00:00
5321987-01-05 00:00:00
4291987-01-04 00:00:00

SQL

In SQL, we usually sort the table at the end of the command using ORDER BY.

1
ORDER BY temperature DESC, date ASC

R programming

In R, the library dplyr has similar functions as in Pandas.

1
2
3
4
library(dplyr)
smalldata %>% 
        arrange(desc(temperature), date) %>%
        kable(format = 'markdown')
record_idtemperaturedate
6401987-01-06
2331987-01-02
3331987-01-03
1321987-01-01
5321987-01-05
4291987-01-04
OLDER POST NEWER POST

Comments powered by Disqus.

Search Results