Google Colaboratory

GitHub - perfume-reconmendation/word-embedding_hyun at master

word2vec_similarity.py

if __name__ == "__main__":
    user_sentence = 'The guitarist of the band Sensual and sexy Wearing a shirt and ripped jeans Sweet and drowsy eyes He soaked in sweat in the heat of the stage'
    label = 2
    print(word2vec_similarity(user_sentence, label))

Result: the DataFrame below is converted to a dictionary and returned.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/fe35addc-e14a-4807-89fc-c5b61e717dde/Untitled.png
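
The DataFrame-to-dictionary step can be sketched as follows. This is a minimal illustration, assuming the conversion uses pandas' `DataFrame.to_dict()` with its default orientation; the column names (`name`, `similarity`) and the values are hypothetical, since the actual columns appear only in the screenshot.

```python
import pandas as pd

# Hypothetical top-3 result; column names and values are illustrative only.
top3_df = pd.DataFrame({
    "name": ["perfume_a", "perfume_b", "perfume_c"],
    "similarity": [0.91, 0.87, 0.85],
})

# With the default orient="dict", to_dict() maps
# column name -> {row index -> value}.
top3_df_dic = top3_df.to_dict()
print(top3_df_dic)
```

A nested dictionary of this shape can be turned back into a DataFrame with `pd.DataFrame(top3_df_dic)`, which is convenient when a later step needs to add columns.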

highlighter.py

if __name__ == "__main__":
    import word2vec_similarity
    user_sentence = 'The guitarist of the band Sensual and sexy Wearing a shirt and ripped jeans Sweet and drowsy eyes He soaked in sweat in the heat of the stage'
    label = 2
    top3_df_dic = word2vec_similarity.word2vec_similarity(user_sentence, label)
    model_path = './model/w2v_10window'
    print(keyword_highlighter(user_sentence, top3_df_dic, model_path))

Result: a column specifying colors is added to the DataFrame returned by word2vec_similarity, and the resulting DataFrame is converted to a dictionary and returned.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/288cdd8d-05da-4ecb-b135-dc12b9b09860/Untitled.png
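
Adding a color column to the returned dictionary might look like the sketch below. It is not the actual `keyword_highlighter` implementation: the input columns, the `color` column name, and the palette values are all assumptions made for illustration, and the word2vec model lookup is left out entirely.

```python
import pandas as pd

# Hypothetical dictionary in the shape produced by DataFrame.to_dict().
top3_df_dic = {
    "name": {0: "perfume_a", 1: "perfume_b", 2: "perfume_c"},
    "similarity": {0: 0.91, 1: 0.87, 2: 0.85},
}

# Rebuild the DataFrame so that a new column can be attached.
df = pd.DataFrame(top3_df_dic)

# Hypothetical palette: one highlight color per row.
palette = ["#ff9999", "#99ccff", "#99ff99"]
df["color"] = palette[:len(df)]

# Convert back to a dictionary, mirroring the return format described above.
highlighted_dic = df.to_dict()
print(highlighted_dic)
```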

Issue

The large dataset and model files could not be uploaded to GitHub.

Of the files shown here, only the .py files have been pushed to the master branch on GitHub.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/fb0873ea-c8b3-45fa-9ac5-2f7848c90410/Untitled.png

dataset

dataset_210626_215600.csv

stopwords.json