가영 | Notion

Dog Breed Identification

#1. Load the data
import pandas as pd

# Read the CSV files and show the first five rows
label = pd.read_csv('/content/gdrive/MyDrive/SAI_dog_breed/labels.csv')
sample_submission = pd.read_csv('/content/gdrive/MyDrive/SAI_dog_breed/sample_submission.csv')
label.head()

# (2) Inspect the 'breed' column (which breeds exist and how many there are)

# label['breed'].value_counts().index
print(len(label['breed'].unique()))
label['breed'].unique()
#(3) Store the image path for each id in the 'imgpath' field

filePath = '/content/gdrive/MyDrive/SAI_dog_breed/train/'
f = lambda x: filePath + x + '.jpg'

label['imgpath'] = label['id'].apply(f)
label.head()

#(4) Read each image in 'imgpath' (with load_img), convert it to an array (with img_to_array), and store it in the 'imgArray' field
from keras.preprocessing.image import img_to_array, load_img, ImageDataGenerator
img = load_img('/content/gdrive/MyDrive/SAI_dog_breed/train/e7af8f590b4fbdca0779f5e606ef91a1.jpg', target_size = (100, 100))
img = img_to_array(img)  # convert to array
print(img.shape)
# Note: every image is resized to 10x10 here, not the 100x100 used for the single image above
ff = lambda x: img_to_array(load_img(x, target_size = (10, 10)))
label['imgArray'] = label['imgpath'].apply(ff)
print(label['imgArray'][0].shape)
print(label)

<aside> 💡 Load the data

</aside>

Check the first 5 rows

Inspect the breed column (which breeds, how many)

Store the image path for each id in the imgpath field

Read the images at imgpath, convert them to arrays, and store them in imgArray

⇒ Print label
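
The breed check and the path-building step above can be tried on a small stand-in table before touching the real files. This is only a sketch: the ids, breeds, and the `/tmp/train/` folder are made up for illustration, and `file_path`, `n_breeds`, and `breed_counts` are names chosen here, not from the notebook.

```python
import pandas as pd

# Toy stand-in for labels.csv; the ids and breeds are invented for illustration.
label = pd.DataFrame({
    "id": ["a1", "b2", "c3", "d4"],
    "breed": ["beagle", "pug", "beagle", "collie"],
})

# Hypothetical image folder, not the real Drive path.
file_path = "/tmp/train/"

# Number of distinct breeds, and how many rows each breed has.
n_breeds = label["breed"].nunique()
breed_counts = label["breed"].value_counts()

# Build the image path for every id with vectorized string concatenation.
label["imgpath"] = file_path + label["id"] + ".jpg"

print(n_breeds)
print(breed_counts)
print(label.head())
```

`value_counts()` answers both "which breeds" and "how many of each" in one call, which is why it is used here next to `nunique()`.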

#2. Data preprocessing
#(1) X: normalize the image data and assign it
import numpy as np
# Stack the per-row arrays into one (N, H, W, C) array before scaling pixels to [0, 1]
X_train = np.stack(label['imgArray'].values) / 255
#(2) Y: convert the target labels (breed) to one-hot encoding
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
a = le.fit_transform(label['breed'])  # breed names -> integer class ids
print(a)
Y_train = to_categorical(a)
Y_train[0]
#(3) Split into train and validation sets
from sklearn.model_selection import train_test_split
x_train, x_valid, y_train, y_valid = train_test_split(X_train, Y_train, test_size=0.2, random_state = 5)
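
To see what a shuffled 80/20 split does to X and Y, here is a rough NumPy-only sketch. It is not how scikit-learn's `train_test_split` is implemented and will not reproduce its exact shuffle; `split_train_valid` is a hypothetical helper written for this note, and the data is a made-up 10-row array.

```python
import numpy as np

def split_train_valid(x, y, valid_fraction=0.2, seed=5):
    """Shuffle the row indices once and split x and y with the same order."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_valid = int(round(len(x) * valid_fraction))
    valid_idx, train_idx = order[:n_valid], order[n_valid:]
    return x[train_idx], x[valid_idx], y[train_idx], y[valid_idx]

# Made-up data: row i of x holds the value i, and its label is also i,
# so it is easy to check that inputs and labels stay aligned.
x = np.arange(10).reshape(10, 1).astype("float32")
y = np.arange(10)

x_train, x_valid, y_train, y_valid = split_train_valid(x, y)
print(x_train.shape, x_valid.shape)
```

The point to notice is that the same index order is applied to both arrays, so each image keeps its own label after shuffling.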

<aside> 💡 Data preprocessing

</aside>

X - normalize the image data and assign it
Y - convert the target labels (breed) to one-hot encoding
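
The two steps above can be sketched with NumPy alone, which makes the shapes easier to see. This is an illustration under stated assumptions: the three 2x2 "images" and the breed list are made up, and the one-hot part uses an identity matrix instead of the Keras and scikit-learn helpers from the notebook.

```python
import numpy as np

# Three fake 2x2 RGB "images" with pixel values in 0..255 (made-up data).
images = [np.full((2, 2, 3), v, dtype="float32") for v in (0, 127.5, 255)]
breeds = ["pug", "beagle", "pug"]

# Stack into one (N, H, W, C) array, then scale the pixels to [0, 1].
X = np.stack(images) / 255.0

# Map each breed to an integer index over the sorted class names,
# then pick rows of the identity matrix to get one-hot vectors.
classes = sorted(set(breeds))
index = np.array([classes.index(b) for b in breeds])
Y = np.eye(len(classes), dtype="float32")[index]

print(X.shape, X.min(), X.max())
print(classes)
print(Y)
```

Stacking first matters because a column of separate arrays is an object array of shape (N,), while a model expects a single numeric array of shape (N, H, W, C).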