
Gender Bias in Neural Natural Language Processing - Paper Review

https://arxiv.org/abs/1807.11714
Here, bias was checked by swapping gendered words and then inspecting the embedding space and the attention scores...
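The embedding-space side of that word-swap probe can be sketched in a few lines. The vectors below are random toy embeddings (hypothetical, not from any trained model); only the measurement itself reflects the idea of comparing a word's similarity to gendered anchor words:

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings: random vectors standing in for learned word embeddings.
rng = np.random.default_rng(42)
emb = {w: rng.normal(size=16) for w in ["he", "she", "doctor", "nurse"]}

def gender_bias(word):
    """Positive -> the word sits closer to 'he' than to 'she' in
    embedding space; zero would mean no directional gender bias."""
    return cos(emb[word], emb["he"]) - cos(emb[word], emb["she"])

scores = {w: gender_bias(w) for w in ["doctor", "nurse"]}
```

With real embeddings, a systematically positive score for profession words like "doctor" and a negative one for "nurse" is the kind of historical bias the paper quantifies.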

Could an artificial-intelligence agent pass an introductory physics course? - Paper Review

https://journals.aps.org/prper/abstract/10.1103/PhysRevPhysEducRes.19.010132
I wanted to see a multi-agent system, an agent that manages everything on the computer, but here Agent == ChatGPT... On top of that, the model is quite old, so the gap against current models is likely to be large. It seems fine to skim this as a record of what the weaknesses of earlier language models were. The model solves easy coding problems well but cannot even handle an introductory physics course. Its weaknesses are mathematical calculation errors, logical errors, and a lack of conceptual understanding, and it has no learning ability (knowledge updates) or metacognition (self-checking). The dataset is frozen (2021), and the output changes with every input, making it unstable...

AI Agents That Matter - Paper Review

https://arxiv.org/abs/2407.01502
I wanted to look into methodologies for AI agents, but this turned out to be a paper about benchmarks. Existing benchmarks focus only on accuracy, so...

Not All Language Model Features Are Linear - Paper Review

https://arxiv.org/abs/2405.14860
In the end there is no big difference from the SAE work so far, but when you pick out features with high cosine similarity, circular...

Investigating Gender Bias in Language Models Using Causal Mediation Analysis - Paper Review

https://proceedings.neurips.cc/paper/2020/file/92650b2e92217715fe312e6fa7b90d82-Paper.pdf
This paper introduces causal mediation analysis to study how gender bias arises and propagates inside language models. Using GPT-2, it measures how neurons and attention heads mediate gender-bias information, separating direct and indirect effects. Gender bias turns out to be concentrated in a small number of neurons and attention heads, and the bias grows stronger as model size increases. Datasets such as Professions, WinoBias, and WinoGender are used to evaluate bias at the word and context level. The focus is on identifying and analyzing bias...
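The direct/indirect split at the heart of causal mediation analysis can be illustrated with a toy scalar model. The functions below are hypothetical stand-ins, not the paper's GPT-2 setup: `x` is the input edit (e.g. swapping a gendered word), `m` plays the role of one mediating neuron, and `y` is a bias score:

```python
def mediator(x):
    return 2.0 * x            # hypothetical neuron activation

def outcome(x, m):
    return 0.5 * x + 1.5 * m  # hypothetical bias score

x0, x1 = 0.0, 1.0

# Total effect: edit the input and let the mediator respond naturally.
total    = outcome(x1, mediator(x1)) - outcome(x0, mediator(x0))
# Direct effect: edit the input but freeze the mediator at its x0 value.
direct   = outcome(x1, mediator(x0)) - outcome(x0, mediator(x0))
# Indirect effect: keep the input, but patch the mediator to its x1 value.
indirect = outcome(x0, mediator(x1)) - outcome(x0, mediator(x0))
```

Because this toy outcome is linear, total = direct + indirect exactly; in a real network the decomposition is measured empirically by patching activations, mediator by mediator.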

Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models - Paper Review

https://arxiv.org/abs/2305.14705
This paper "M...

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts - Paper Review

https://arxiv.org/abs/2112.06905
MoE increases the parameter count while still reducing inference cost and power use...
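GLaM gets this trade-off from sparse routing: each token activates only its top-2 experts, so per-token compute stays roughly constant as the expert count grows. A minimal numpy sketch of top-2 gating, assuming toy linear experts (the shapes and expert functions here are illustrative, not GLaM's actual configuration):

```python
import numpy as np

def top2_gate(x, W_g):
    """Route a token to the 2 highest-scoring experts, weighting
    their outputs by softmax scores renormalized over the top 2."""
    logits = x @ W_g                           # (num_experts,)
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()                    # softmax over all experts
    top2 = np.argsort(probs)[-2:]              # indices of the 2 best experts
    weights = probs[top2] / probs[top2].sum()  # renormalize over the pair
    return top2, weights

def moe_layer(x, W_g, experts):
    """Only the 2 selected experts run; the rest stay idle."""
    idx, w = top2_gate(x, W_g)
    return sum(w_i * experts[i](x) for i, w_i in zip(idx, w))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
W_g = rng.normal(size=(d, n_experts))
# Toy experts: each is a small linear map (hypothetical, for illustration).
mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, M=M: x @ M for M in mats]
y = moe_layer(rng.normal(size=d), W_g, experts)
```

Adding more experts grows the parameter count, but each token still pays for exactly two expert evaluations, which is the efficiency argument the paper makes.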

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability - Paper Review

https://arxiv.org/abs/2405.10927
To deal with the degenerate structures that get in the way of interpreting neural networks...

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding - Paper Review

https://arxiv.org/abs/2006.16668
This paper addresses large-scale neural networks...

Benchmarking Large Language Models in Retrieval-Augmented Generation - Paper Review

https://arxiv.org/abs/2309.01431
In this paper, to address LLM hallucination and knowledge updating, ...
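The retrieve-then-generate loop that RAG builds on can be sketched with a toy bag-of-words retriever. Real systems use BM25 or dense retrieval, and the documents and query here are made up for illustration:

```python
def retrieve(query, docs, k=2):
    """Rank documents by word overlap with the query, keep the top-k.
    (Toy scoring; stands in for BM25 or a dense retriever.)"""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, docs):
    """Prepend retrieved passages so the LLM can ground its answer
    in up-to-date text instead of relying on frozen parameters."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Seoul is the capital of South Korea.",
    "The Han River flows through Seoul.",
    "Paris is the capital of France.",
]
prompt = build_prompt("What is the capital of South Korea?", docs)
```

Because the evidence is fetched at query time, updating knowledge means updating the document store, not retraining the model, which is exactly the hallucination/staleness angle the benchmark evaluates.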
