Basic code
import pandas as pd
import numpy as np
student_list = [
    {'name': 'John', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Nate', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Abraham', 'major': "Physics", 'sex': "male"},
    {'name': 'Brian', 'major': "Psychology", 'sex': "male"},
    {'name': 'Janny', 'major': "Economics", 'sex': "female"},
    {'name': 'Yuna', 'major': "Economics", 'sex': "female"},
    {'name': 'Jeniffer', 'major': "Computer Science", 'sex': "female"},
    {'name': 'Edward', 'major': "Computer Science", 'sex': "male"},
    {'name': 'Zara', 'major': "Psychology", 'sex': "female"},
    {'name': 'Wendy', 'major': "Economics", 'sex': "female"},
    {'name': 'Sera', 'major': "Psychology", 'sex': "female"},
]
df = pd.DataFrame(student_list, columns = ['name', 'major', 'sex'])
groupby_major = df.groupby('major')
groupby_major.groups
{'Computer Science': [0, 1, 6, 7], 'Economics': [4, 5, 9], 'Physics': [2], 'Psychology': [3, 8, 10]}
for name, group in groupby_major:
    print(name + ": " + str(len(group)))
    print(group)
    print()
Computer Science: 4
       name             major     sex
0      John  Computer Science    male
1      Nate  Computer Science    male
6  Jeniffer  Computer Science  female
7    Edward  Computer Science    male

Economics: 3
    name      major     sex
4  Janny  Economics  female
5   Yuna  Economics  female
9  Wendy  Economics  female

Physics: 1
      name    major   sex
2  Abraham  Physics  male

Psychology: 3
     name       major     sex
3   Brian  Psychology    male
8    Zara  Psychology  female
10   Sera  Psychology  female
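
The loop above visits every group. When only one group is needed, get_group() returns it directly as a DataFrame. The following is a minimal sketch, assuming a shortened stand-in table; the names students, by_major, and economics are illustrative and not part of the original code.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the student table used above.
students = pd.DataFrame({
    'name': ['John', 'Nate', 'Abraham', 'Janny', 'Yuna'],
    'major': ['Computer Science', 'Computer Science', 'Physics',
              'Economics', 'Economics'],
})

by_major = students.groupby('major')

# get_group() returns the rows of a single group, keeping the original index.
economics = by_major.get_group('Economics')
print(economics)
```

The rows keep their original index labels, just as they do in the loop output.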
The sizes of the groups can be collected into a new DataFrame.
df_major_cnt = pd.DataFrame( {'count' : groupby_major.size()} )
df_major_cnt
                  count
major
Computer Science      4
Economics             3
Physics               1
Psychology            3
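In the output above, the major and count headers appear on different lines, because major is the name of the index rather than a regular column.
To make major an ordinary column, call reset_index().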
df_major_cnt = pd.DataFrame( {'count' : groupby_major.size()} ).reset_index()
df_major_cnt
              major  count
0  Computer Science      4
1         Economics      3
2           Physics      1
3        Psychology      3
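
The same table can probably be built a little more directly. The sketch below assumes a shortened stand-in table (students and major_counts are illustrative names) and uses the name argument of Series.reset_index() to label the counts column, so no intermediate dict is needed.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the student table used above.
students = pd.DataFrame({
    'name': ['John', 'Nate', 'Abraham', 'Janny', 'Yuna'],
    'major': ['Computer Science', 'Computer Science', 'Physics',
              'Economics', 'Economics'],
})

# size() returns a Series indexed by major. reset_index(name=...) moves the
# index into a column and names the values column in the same call.
major_counts = students.groupby('major').size().reset_index(name='count')
print(major_counts)
```

This is equivalent in result to wrapping size() in pd.DataFrame({'count': ...}) and then calling reset_index(); which form reads better is a matter of taste.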