2024.11.05 - [AI/XAI] - Reducing LLM Bias with Sparse Autoencoders - Occupations by Gender 2
This time, instead of computing cosine similarity on the SAE features, I'll check it on the activations of the layers that come after the SAE.
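As a reminder of what the tables in this post measure, here is a minimal sketch of cosine similarity in plain Python. The vectors are made-up toy values, not real model activations, and the function name `cosine_similarity` is only for illustration; the actual experiment would pass in the hidden-state vector of each job and of the words "woman" and "man".

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy 4-dimensional stand-ins for hidden states (made-up numbers, not model output).
job_vec = [0.9, 0.1, 0.4, 0.2]
woman_vec = [0.8, 0.2, 0.5, 0.1]
man_vec = [0.1, 0.9, 0.2, 0.6]

sim_woman = cosine_similarity(job_vec, woman_vec)
sim_man = cosine_similarity(job_vec, man_vec)
print(f"woman: {sim_woman:.4f}, man: {sim_man:.4f}")
```

The value only depends on direction, not on vector length, which is why activations with a large shared component can all look similar to each other.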
Job | Gender Dominance | Female Percentage (%) | Cosine Similarity with Woman | Cosine Similarity with Man |
---|---|---|---|---|
skincare specialist | Female | 98.2 | 0.651186 | 0.556518 |
kindergarten teacher | Female | 96.8 | 0.649491 | 0.514379 |
childcare worker | Female | 94.6 | 0.71621 | 0.519848 |
secretary | Female | 92.5 | 0.583385 | 0.54581 |
hairstylist | Female | 92.4 | 0.634154 | 0.553486 |
dental assistant | Female | 92 | 0.594918 | 0.44044 |
nurse | Female | 91.3 | 0.712285 | 0.57134 |
school psychologist | Female | 90.4 | 0.640011 | 0.494764 |
receptionist | Female | 90 | 0.63295 | 0.544817 |
vet | Female | 89.8 | 0.521638 | 0.586588 |
nutritionist | Female | 89.6 | 0.620365 | 0.503985 |
maid | Female | 88.7 | 0.557642 | 0.581538 |
therapist | Female | 87.1 | 0.607497 | 0.489134 |
social worker | Female | 86.8 | 0.705663 | 0.54845 |
sewer | Female | 86.5 | 0.637374 | 0.63115 |
paralegal | Female | 84.8 | 0.625037 | 0.606166 |
library assistant | Female | 84.2 | 0.422588 | 0.460436 |
interior designer | Female | 83.8 | 0.637165 | 0.494837 |
manicurist | Female | 83 | 0.644541 | 0.701974 |
special education teacher | Female | 82.8 | 0.661597 | 0.507501 |
police officer | Male | 15.8 | 0.70545 | 0.507632 |
taxi driver | Male | 12 | 0.642695 | 0.500626 |
computer architect | Male | 11.8 | 0.651701 | 0.564702 |
mechanical engineer | Male | 9.4 | 0.637769 | 0.532792 |
truck driver | Male | 7.9 | 0.667702 | 0.494411 |
electrical engineer | Male | 7 | 0.628306 | 0.492114 |
landscaping worker | Male | 6.2 | 0.669272 | 0.483533 |
pilot | Male | 5.3 | 0.629289 | 0.603345 |
repair worker | Male | 5.1 | 0.678439 | 0.516247 |
firefighter | Male | 5.1 | 0.653101 | 0.52751 |
construction worker | Male | 4.2 | 0.718746 | 0.525603 |
machinist | Male | 3.4 | 0.644379 | 0.622361 |
aircraft mechanic | Male | 3.2 | 0.620817 | 0.460778 |
carpenter | Male | 3.1 | 0.654313 | 0.653469 |
roofer | Male | 2.9 | 0.624167 | 0.607007 |
brickmason | Male | 2.2 | 0.579816 | 0.553472 |
plumber | Male | 2.1 | 0.629519 | 0.639175 |
electrician | Male | 1.7 | 0.642468 | 0.532843 |
vehicle technician | Male | 1.2 | 0.614766 | 0.479526 |
crane operator | Male | 1.1 | 0.673586 | 0.602639 |
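Here too, the cosine similarity with "woman" is higher for almost every job (35 of 40), including the male-dominated ones. So the numbers lean toward "woman" across the board instead of following each occupation's gender split.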
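One way to separate a general lean toward "woman" from an occupation-specific bias is to compare the gap (similarity with woman minus similarity with man) between the two groups. The sketch below does this on a ten-row excerpt of the table above (the first five rows of each group, values copied as-is); the helper name `mean_gap` is made up for illustration, and running it on all 40 rows would be the real check.

```python
from statistics import mean

# (job, dominance, sim_woman, sim_man): first five rows of each group in the table above.
rows = [
    ("skincare specialist", "Female", 0.651186, 0.556518),
    ("kindergarten teacher", "Female", 0.649491, 0.514379),
    ("childcare worker", "Female", 0.71621, 0.519848),
    ("secretary", "Female", 0.583385, 0.54581),
    ("hairstylist", "Female", 0.634154, 0.553486),
    ("police officer", "Male", 0.70545, 0.507632),
    ("taxi driver", "Male", 0.642695, 0.500626),
    ("computer architect", "Male", 0.651701, 0.564702),
    ("mechanical engineer", "Male", 0.637769, 0.532792),
    ("truck driver", "Male", 0.667702, 0.494411),
]

def mean_gap(rows, dominance):
    """Average of (sim_woman - sim_man) over jobs with the given dominance label."""
    return mean(w - m for _, d, w, m in rows if d == dominance)

female_gap = mean_gap(rows, "Female")
male_gap = mean_gap(rows, "Male")
woman_higher = sum(1 for _, _, w, m in rows if w > m)

print(f"woman > man in {woman_higher} of {len(rows)} rows")
print(f"mean gap, female-dominated jobs: {female_gap:.4f}")
print(f"mean gap, male-dominated jobs:   {male_gap:.4f}")
```

If the model encoded the occupational stereotype at this layer, I would expect the gap to be clearly larger for female-dominated jobs. In this excerpt it is not (the male-dominated rows have the larger average gap), which fits the impression that the offset toward "woman" is general and not job-specific.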
The t-SNE plot doesn't show anything in particular either.
Hmm, the UMAP plot doesn't show anything either.
Job: doctor
Cosine similarity with man: 0.6425841450691223
Cosine similarity with woman: 0.6595352292060852
Cosine similarity with he: 0.5585692524909973
Cosine similarity with she: 0.6580565571784973
Job: nurse
Cosine similarity with man: 0.5713397264480591
Cosine similarity with woman: 0.7122852802276611
Cosine similarity with he: 0.5225833654403687
Cosine similarity with she: 0.5876948833465576
Job: engineer
Cosine similarity with man: 0.6049615144729614
Cosine similarity with woman: 0.6559303402900696
Cosine similarity with he: 0.5704656839370728
Cosine similarity with she: 0.6277342438697815
Job: teacher
Cosine similarity with man: 0.5859898924827576
Cosine similarity with woman: 0.667400598526001
Cosine similarity with he: 0.5594150424003601
Cosine similarity with she: 0.6244168281555176
Job: scientist
Cosine similarity with man: 0.5397614240646362
Cosine similarity with woman: 0.6615307331085205
Cosine similarity with he: 0.5063120126724243
Cosine similarity with she: 0.5763662457466125
I really don't see an answer here.
Everything up to this point used the Mistral model, so let me switch to a different model.
From here on, the results are from GPT-2 small.
Job | Gender Dominance | Female Percentage (%) | Cosine Similarity with Woman | Cosine Similarity with Man |
---|---|---|---|---|
skincare specialist | Female | 98.2 | 0.195972 | 0.204778 |
kindergarten teacher | Female | 96.8 | 0.220412 | 0.233106 |
childcare worker | Female | 94.6 | 0.21226 | 0.196284 |
secretary | Female | 92.5 | 0.317799 | 0.327826 |
hairstylist | Female | 92.4 | 0.197492 | 0.212216 |
dental assistant | Female | 92 | 0.292716 | 0.315205 |
nurse | Female | 91.3 | 0.367193 | 0.412158 |
school psychologist | Female | 90.4 | 0.252426 | 0.209854 |
receptionist | Female | 90 | 0.291924 | 0.294703 |
vet | Female | 89.8 | 0.41986 | 0.476376 |
nutritionist | Female | 89.6 | 0.28915 | 0.251744 |
maid | Female | 88.7 | 0.414578 | 0.406934 |
therapist | Female | 87.1 | 0.288946 | 0.296861 |
social worker | Female | 86.8 | 0.264815 | 0.248089 |
sewer | Female | 86.5 | 0.320507 | 0.360573 |
paralegal | Female | 84.8 | 0.195828 | 0.215684 |
library assistant | Female | 84.2 | 0.18374 | 0.167661 |
interior designer | Female | 83.8 | 0.292864 | 0.308887 |
manicurist | Female | 83 | 0.313635 | 0.443759 |
special education teacher | Female | 82.8 | 0.211079 | 0.203801 |
police officer | Male | 15.8 | 0.247464 | 0.196906 |
taxi driver | Male | 12 | 0.253791 | 0.227682 |
computer architect | Male | 11.8 | 0.229394 | 0.219098 |
mechanical engineer | Male | 9.4 | 0.301834 | 0.319501 |
truck driver | Male | 7.9 | 0.310374 | 0.335189 |
electrical engineer | Male | 7 | 0.242503 | 0.236696 |
landscaping worker | Male | 6.2 | 0.191117 | 0.189157 |
pilot | Male | 5.3 | 0.317637 | 0.349418 |
repair worker | Male | 5.1 | 0.214539 | 0.207093 |
firefighter | Male | 5.1 | 0.329568 | 0.336709 |
construction worker | Male | 4.2 | 0.301041 | 0.298139 |
machinist | Male | 3.4 | 0.246829 | 0.273729 |
aircraft mechanic | Male | 3.2 | 0.278695 | 0.294949 |
carpenter | Male | 3.1 | 0.251271 | 0.283897 |
roofer | Male | 2.9 | 0.324939 | 0.361726 |
brickmason | Male | 2.2 | 0.201017 | 0.226547 |
plumber | Male | 2.1 | 0.308215 | 0.334688 |
electrician | Male | 1.7 | 0.205533 | 0.195738 |
vehicle technician | Male | 1.2 | 0.263219 | 0.261841 |
crane operator | Male | 1.1 | 0.264234 | 0.276296 |
These values feel fairly evenly distributed.
At least, I can't say that male-dominated jobs clearly lean toward "man".
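To put a number on that impression, one option is the Pearson correlation between the female percentage of a job and its gap (similarity with woman minus similarity with man). The sketch below uses a ten-row excerpt of the GPT-2 small table above (first five rows of each group, values copied as-is); the `pearson` helper is hand-written for illustration, and the full 40 rows would be needed for a real conclusion.

```python
import math

# (job, female_percentage, sim_woman, sim_man): excerpt of the GPT-2 small table above.
rows = [
    ("skincare specialist", 98.2, 0.195972, 0.204778),
    ("kindergarten teacher", 96.8, 0.220412, 0.233106),
    ("childcare worker", 94.6, 0.21226, 0.196284),
    ("secretary", 92.5, 0.317799, 0.327826),
    ("hairstylist", 92.4, 0.197492, 0.212216),
    ("police officer", 15.8, 0.247464, 0.196906),
    ("taxi driver", 12.0, 0.253791, 0.227682),
    ("computer architect", 11.8, 0.229394, 0.219098),
    ("mechanical engineer", 9.4, 0.301834, 0.319501),
    ("truck driver", 7.9, 0.310374, 0.335189),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

female_pct = [pct for _, pct, _, _ in rows]
gap = [w - m for _, _, w, m in rows]  # positive means closer to "woman"

r = pearson(female_pct, gap)
print(f"correlation between female percentage and gap: {r:.3f}")
```

A clearly positive value would mean that jobs with more women sit closer to "woman" than to "man". A value near zero or below it would back up the observation that there is no such trend here.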
The t-SNE and UMAP plots don't tell me anything either.
Job: doctor
Cosine similarity with man: 0.3023069500923157
Cosine similarity with woman: 0.3201085031032562
Cosine similarity with he: 0.235477015376091
Cosine similarity with she: 0.2189578413963318
Job: nurse
Cosine similarity with man: 0.4121583104133606
Cosine similarity with woman: 0.36719265580177307
Cosine similarity with he: 0.34656408429145813
Cosine similarity with she: 0.3177404999732971
Job: engineer
Cosine similarity with man: 0.2845001220703125
Cosine similarity with woman: 0.28417065739631653
Cosine similarity with he: 0.21660952270030975
Cosine similarity with she: 0.199890598654747
Job: teacher
Cosine similarity with man: 0.405282199382782
Cosine similarity with woman: 0.4033980965614319
Cosine similarity with he: 0.3337845802307129
Cosine similarity with she: 0.3021133244037628
Job: scientist
Cosine similarity with man: 0.23807066679000854
Cosine similarity with woman: 0.26343828439712524
Cosine similarity with he: 0.1863318830728531
Cosine similarity with she: 0.17157433927059174
Hmm...
Next, let's move to layer 8, the layer right after the SAE layer.
Job | Gender Dominance | Female Percentage (%) | Cosine Similarity with Woman | Cosine Similarity with Man |
---|---|---|---|---|
skincare specialist | Female | 98.2 | 0.830806 | 0.8029 |
kindergarten teacher | Female | 96.8 | 0.822819 | 0.789158 |
childcare worker | Female | 94.6 | 0.868605 | 0.81984 |
secretary | Female | 92.5 | 0.81239 | 0.773908 |
hairstylist | Female | 92.4 | 0.816338 | 0.786998 |
dental assistant | Female | 92 | 0.850209 | 0.816433 |
nurse | Female | 91.3 | 0.866288 | 0.812078 |
school psychologist | Female | 90.4 | 0.838778 | 0.792577 |
receptionist | Female | 90 | 0.844404 | 0.80642 |
vet | Female | 89.8 | 0.827386 | 0.810253 |
nutritionist | Female | 89.6 | 0.84587 | 0.802108 |
maid | Female | 88.7 | 0.89618 | 0.827164 |
therapist | Female | 87.1 | 0.852225 | 0.817951 |
social worker | Female | 86.8 | 0.826142 | 0.779234 |
sewer | Female | 86.5 | 0.833624 | 0.806207 |
paralegal | Female | 84.8 | 0.772227 | 0.765412 |
library assistant | Female | 84.2 | 0.833513 | 0.783677 |
interior designer | Female | 83.8 | 0.827806 | 0.79507 |
manicurist | Female | 83 | 0.865115 | 0.874826 |
special education teacher | Female | 82.8 | 0.822208 | 0.772009 |
police officer | Male | 15.8 | 0.822562 | 0.784152 |
taxi driver | Male | 12 | 0.838859 | 0.810866 |
computer architect | Male | 11.8 | 0.840573 | 0.813889 |
mechanical engineer | Male | 9.4 | 0.869139 | 0.851203 |
truck driver | Male | 7.9 | 0.859283 | 0.838243 |
electrical engineer | Male | 7 | 0.810291 | 0.77179 |
landscaping worker | Male | 6.2 | 0.833979 | 0.809405 |
pilot | Male | 5.3 | 0.842505 | 0.814605 |
repair worker | Male | 5.1 | 0.859992 | 0.819944 |
firefighter | Male | 5.1 | 0.860235 | 0.832443 |
construction worker | Male | 4.2 | 0.844144 | 0.809736 |
machinist | Male | 3.4 | 0.833735 | 0.833825 |
aircraft mechanic | Male | 3.2 | 0.847732 | 0.824421 |
carpenter | Male | 3.1 | 0.805981 | 0.79008 |
roofer | Male | 2.9 | 0.794447 | 0.777306 |
brickmason | Male | 2.2 | 0.814994 | 0.809293 |
plumber | Male | 2.1 | 0.8419 | 0.810002 |
electrician | Male | 1.7 | 0.847812 | 0.820417 |
vehicle technician | Male | 1.2 | 0.837883 | 0.809716 |
crane operator | Male | 1.1 | 0.85486 | 0.827914 |
Ha ha...
Once again, "woman" comes out higher for nearly every job (38 of 40).
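In the t-SNE visualization, at least, the male-dominated and female-dominated jobs seem to form separate clusters...?
The UMAP plot also looks like it separates the two groups.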
Next, I'll apply PCA first and then compute the cosine similarity.
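A standard PCA subtracts the mean vector before finding components, and that step alone can change cosine similarities a lot: if all activations share a large common offset, the raw similarities are all high, and centering removes that offset. The sketch below illustrates only this centering step on made-up toy vectors, not the full PCA projection. I am assuming here that the reference words were centered together with the jobs; the vector values and names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def center(vectors):
    """Subtract the per-dimension mean, as PCA does before finding components."""
    dim = len(vectors[0])
    mu = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [[v[i] - mu[i] for i in range(dim)] for v in vectors]

# Toy vectors (made up): a large shared offset plus small individual differences.
offset = [5.0, 5.0, 5.0]
deltas = {
    "woman": [0.3, -0.1, 0.0],
    "man": [-0.3, 0.1, 0.0],
    "nurse": [0.2, -0.1, 0.1],
    "plumber": [-0.2, 0.1, -0.1],
}
names = list(deltas)
raw_list = [[o + d for o, d in zip(offset, deltas[n])] for n in names]
raw = dict(zip(names, raw_list))
centered = dict(zip(names, center(raw_list)))

for job in ("nurse", "plumber"):
    for ref in ("woman", "man"):
        before = cosine(raw[job], raw[ref])
        after = cosine(centered[job], centered[ref])
        print(f"{job:8s} vs {ref:5s}  raw={before:.4f}  centered={after:.4f}")
```

In this toy setup every raw similarity is close to 1, while the centered ones spread out to both signs. Something similar may explain why the layer 8 values near 0.8 turn into values around zero after PCA, though that depends on how the PCA was fit.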
Job | Gender Dominance | Female Percentage (%) | Cosine Similarity with Woman | Cosine Similarity with Man |
---|---|---|---|---|
skincare specialist | Female | 98.2 | -0.33386 | -0.24223 |
kindergarten teacher | Female | 96.8 | -0.29054 | -0.20816 |
childcare worker | Female | 94.6 | 0.090889 | -0.07886 |
secretary | Female | 92.5 | -0.05085 | -0.08793 |
hairstylist | Female | 92.4 | -0.22697 | -0.09444 |
dental assistant | Female | 92 | -0.13962 | -0.20142 |
nurse | Female | 91.3 | 0.279483 | 0.0298 |
school psychologist | Female | 90.4 | -0.03985 | -0.18175 |
receptionist | Female | 90 | -0.09819 | -0.12192 |
vet | Female | 89.8 | -0.02614 | 0.058945 |
nutritionist | Female | 89.6 | -0.11909 | -0.24243 |
maid | Female | 88.7 | 0.515144 | 0.152515 |
therapist | Female | 87.1 | -0.00373 | -0.05599 |
social worker | Female | 86.8 | -0.04072 | -0.14932 |
sewer | Female | 86.5 | -0.01937 | 0.005589 |
paralegal | Female | 84.8 | -0.11654 | 0.030654 |
library assistant | Female | 84.2 | -0.10128 | -0.23378 |
interior designer | Female | 83.8 | -0.12399 | -0.14279 |
manicurist | Female | 83 | 0.099113 | 0.419571 |
special education teacher | Female | 82.8 | -0.22952 | -0.35435 |
police officer | Male | 15.8 | 0.073392 | -0.00112 |
taxi driver | Male | 12 | -0.16622 | -0.15999 |
computer architect | Male | 11.8 | -0.04169 | -0.01093 |
mechanical engineer | Male | 9.4 | -0.01881 | 0.112229 |
truck driver | Male | 7.9 | -0.01101 | 0.06296 |
electrical engineer | Male | 7 | -0.32179 | -0.31583 |
landscaping worker | Male | 6.2 | -0.34693 | -0.23745 |
pilot | Male | 5.3 | 0.043929 | 0.0348 |
repair worker | Male | 5.1 | 0.041328 | -0.10179 |
firefighter | Male | 5.1 | -0.0164 | -0.03699 |
construction worker | Male | 4.2 | -0.19002 | -0.15065 |
machinist | Male | 3.4 | -0.07069 | 0.186568 |
aircraft mechanic | Male | 3.2 | -0.17663 | -0.12196 |
carpenter | Male | 3.1 | -0.31704 | -0.10918 |
roofer | Male | 2.9 | -0.15045 | -0.06005 |
brickmason | Male | 2.2 | -0.09472 | 0.100032 |
plumber | Male | 2.1 | -0.04168 | -0.08663 |
electrician | Male | 1.7 | -0.17601 | -0.10355 |
vehicle technician | Male | 1.2 | -0.12613 | -0.07126 |
crane operator | Male | 1.1 | -0.18231 | -0.10622 |
Here too, I can't really say that the bias is strong.
I'll write a follow-up when another idea comes to mind.