반응형

소프트웨어 1039

Adversarial Attacks in NLP 관련 논문 정리 - 6

https://arxiv.org/abs/2503.11517 Prompt Injection Detection and Mitigation via AI Multi-Agent NLP FrameworksPrompt injection constitutes a significant challenge for generative AI systems by inducing unintended outputs. We introduce a multi-agent NLP framework specifically designed to address prompt injection vulnerabilities through layered detection and enforcemarxiv.org이 것도 Agent 구조인데...결국 많은 필..

Adversarial Attacks in NLP 관련 논문 정리 - 5

https://aclanthology.org/2025.findings-naacl.123/ Attention Tracker: Detecting Prompt Injection Attacks in LLMsKuo-Han Hung, Ching-Yun Ko, Ambrish Rawat, I-Hsin Chung, Winston H. Hsu, Pin-Yu Chen. Findings of the Association for Computational Linguistics: NAACL 2025. 2025.aclanthology.org이 논문은 Attention 패턴 관점에서 prompt injection 공격 메커니즘을 분석합니다.black box 모델에선 불 가능한 조건이 되는 거죠...원래는 Instruction에 높은 ..

Adversarial Attacks in NLP 관련 논문 정리 - 4

https://arxiv.org/abs/2401.15897 Red-Teaming for Generative AI: Silver Bullet or Security Theater?In response to rising concerns surrounding the safety, security, and trustworthiness of Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red-teaming as a key component of their strategies for identifying and mitigating tharxiv.orgSurvey 논문 이네요 미 백악관에서 행정명령으로 발표한 AI..

Adversarial Attacks in NLP 관련 논문 정리 - 3

https://www.semanticscholar.org/paper/A-Survey-of-Adversarial-Defenses-and-Robustness-in-Goyal-Doddapaneni/83cebf919635504786fc220d569284842b0f0a09 https://www.semanticscholar.org/paper/A-Survey-of-Adversarial-Defenses-and-Robustness-in-Goyal-Doddapaneni/83cebf919635504786fc220d569284842b0f0a09 www.semanticscholar.org 서베이 논문은 너무 길어서 적당히 보고 넘어 가는 것으로...방어 방법에 대한 논문이었습니다학습 - 데이터 증강, 정규화, GAN, VAT,..

Adversarial Attacks in NLP 관련 논문 정리 - 2

https://arxiv.org/abs/2004.14174 Reevaluating Adversarial Examples in Natural LanguageState-of-the-art attacks on NLP models lack a shared definition of a what constitutes a successful attack. We distill ideas from past work into a unified framework: a successful natural language adversarial example is a perturbation that fools the model anarxiv.org여기선 문장의 의미, 문법, 가시적인지를 확인하며 공격을 진행합니다. 단어 유사도, ..

Adversarial Attacks in NLP 관련 논문 정리 - 1

https://arxiv.org/abs/2312.04730 DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language InstructionsWith the advancement of Large Language Models (LLMs), significant progress has been made in code generation, enabling LLMs to transform natural language into programming code. These Code LLMs have been widely accepted by massive users and organizations. Hoarxiv.org 이 ..

Uncertainty estimation 관련 논문 정리 - 2

https://arxiv.org/abs/2112.13776 Transformer Uncertainty Estimation with Hierarchical Stochastic AttentionTransformers are state-of-the-art in a wide range of NLP tasks and have also been applied to many real-world products. Understanding the reliability and certainty of transformer model predictions is crucial for building trustable machine learning applicatiarxiv.org 이 논문은 기존 transformer구조가 Un..

728x90
728x90