Building a Chatbot with Hugging Face Chat UI and vLLM

이게될까 2024. 10. 28. 22:09

GitHub - huggingface/chat-ui: Open source codebase powering the HuggingChat app

https://github.com/huggingface/chat-ui

https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html

Chat UI provides the web front end for chatting with a model, and vLLM seems to be a tool for serving the model quickly.

First, MongoDB also needs to be installed and running.

mongod --dbpath ~/data/db --port {MongoDB port number}

Start it in a tmux session and leave it running.
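As a rough sketch of the tmux step, the helper below starts mongod in a detached tmux session so it keeps running after the terminal is closed. The function name start_mongo, the session name mongo, and port 27017 (mongod's default) are assumptions for illustration; substitute the values you actually use.

```shell
# Hypothetical helper: run mongod inside a detached tmux session.
# Arguments (all optional): session name, data directory, port.
start_mongo() {
  local session="${1:-mongo}"
  local db_path="${2:-$HOME/data/db}"
  local port="${3:-27017}"

  # Reuse an existing session instead of starting a second mongod.
  if tmux has-session -t "$session" 2>/dev/null; then
    echo "tmux session '$session' already exists"
    return 0
  fi

  # mongod refuses to start if the data directory is missing.
  mkdir -p "$db_path"
  tmux new-session -d -s "$session" "mongod --dbpath '$db_path' --port $port"
}
```

Call it as `start_mongo mongo ~/data/db 27017`, and reattach later with `tmux attach -t mongo` to check the logs.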

The following goes into a .sh file that we will run later.

#!/bin/bash
# Load conda into this shell
source ../miniconda3/bin/activate

# Deactivate the currently active environment
conda deactivate

# Activate the 'llm' environment
conda activate llm

# Start the vLLM server in the background
CUDA_VISIBLE_DEVICES=1 python3 -m vllm.entrypoints.openai.api_server \
  --host localhost \
  --port {internal port number} \
  --model {model path} \
  --tokenizer {tokenizer path} \
  --max-model-len 120000 \
  --gpu-memory-utilization 0.9 &

# Give the server time to start
sleep 10

# Move into the 'chat-ui' directory
cd chat-ui

# Start the UI server in the background
npm run dev -- --host 0.0.0.0 --port {external port number} &

cd ../

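The numeric values at the end of the vLLM command (maximum model length and GPU memory utilization) need to be adjusted to your GPU memory and model size.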
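The fixed sleep 10 in the script may be too short for a large model, since loading the weights can take considerably longer. As a hedged alternative, the sketch below polls the server until it answers. wait_for_server is a hypothetical helper name, and the sketch assumes curl is installed and that the vLLM OpenAI-compatible server answers on /v1/models.

```shell
# Hypothetical helper: block until a URL answers with HTTP success,
# or give up after a timeout (in seconds, default 300).
wait_for_server() {
  local url="$1"
  local timeout="${2:-300}"
  local waited=0

  # --fail makes curl return non-zero on HTTP errors as well as on
  # connection failures, so the loop only ends on a real success.
  until curl --silent --fail --max-time 5 --output /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "server at $url not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

In the script, the sleep line could then become something like `wait_for_server "http://localhost:8000/v1/models" 600 || exit 1`, with 8000 standing in for your internal port number.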

If the model was fine-tuned with PEFT, the LoRA module has to be loaded as well, like this:

CUDA_VISIBLE_DEVICES=1 python3 -m vllm.entrypoints.openai.api_server \
  --host localhost \
  --port {internal port number} \
  --model {base model} \
  --tokenizer {tokenizer} \
  --enable-lora \
  --lora-modules '{"name": "choose a name", "path": "path to the adapter"}' \
  --served-model-name {the name chosen above} \
  --max-model-len 120000 \
  --gpu-memory-utilization 0.9 &
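Before wiring up the UI, it can help to talk to the server directly. The sketch below lists the model names the server exposes and sends one completion request, which is a quick way to confirm that the name you chose is registered. The helper names completion_payload, list_models, and ask_model are hypothetical, and the sketch assumes curl is installed and that the base URL ends in /v1.

```shell
# Build a minimal /v1/completions request body.
# Assumption: model and prompt contain no double quotes, backslashes,
# or newlines, because this sketch does no JSON escaping.
completion_payload() {
  local model="$1" prompt="$2" max_tokens="${3:-64}"
  printf '{"model": "%s", "prompt": "%s", "max_tokens": %d}' \
    "$model" "$prompt" "$max_tokens"
}

# List the model names the server exposes.
list_models() {
  curl --silent "$1/models"
}

# Send one completion request to the given base URL.
ask_model() {
  local base_url="$1" model="$2" prompt="$3"
  curl --silent "$base_url/completions" \
    --header "Content-Type: application/json" \
    --data "$(completion_payload "$model" "$prompt")"
}
```

For example, `list_models http://localhost:8000/v1` followed by `ask_model http://localhost:8000/v1 my-lora "Hello"`, where 8000 and my-lora are placeholders for your own port and model name.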

vLLM ran fine for me after a plain install like this:

pip install vllm

Next, git clone Chat UI from the GitHub repository linked above.

Then create a .env.local file inside the cloned chat-ui directory.
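Put together, the clone and file-creation steps might look like the sketch below. setup_chat_ui is a hypothetical helper name, and the npm install step is an addition based on Chat UI's README, since npm run dev needs the dependencies installed first.

```shell
# Hypothetical helper: fetch Chat UI and prepare an empty .env.local.
# Optional argument: target directory (default: chat-ui).
setup_chat_ui() {
  local target="${1:-chat-ui}"

  # Clone only if the directory is not there yet.
  if [ ! -d "$target" ]; then
    git clone https://github.com/huggingface/chat-ui "$target" || return 1
  fi

  # Install the Node dependencies that `npm run dev` needs.
  (cd "$target" && npm install) || return 1

  # Create .env.local if missing; its contents are written by hand.
  touch "$target/.env.local"
}
```

Running `setup_chat_ui` from the directory that holds the launch script leaves a chat-ui folder next to it, matching the `cd chat-ui` line in that script.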

Because we are using vLLM, the setup differs slightly from the README, so only that part needs care.

MODELS=`[
  {
    "name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
    "tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "preprompt": "",
    "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
    "parameters": {
      "stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
      "temperature": 0.7,
      "max_new_tokens": 1024,
      "truncate": 3071
    },
    "endpoints": [{
      "type" : "llamacpp",
      "baseURL": "http://localhost:8080"
    }],
  },
]`

This is the .env.local example from the README; the endpoints entry is the part that changes.

MODELS=`[
  {
    "name": "model name",
    "id": "model name",
    "displayName": "name to display",
    "description": "Write a description here if you have one.",
    "websiteUrl": "URL you want to link to",
    "preprompt": "Write your system prompt here, if any",
    "chatPromptTemplate": " {{preprompt}}{{#each messages}}{{#ifUser}}[INST]{{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}} {{content}} [/INST]{{/ifUser}}{{#ifAssistant}} {{content}}</s> {{/ifAssistant}}{{/each}}",
    "parameters": {
      "temperature": 0.5,
      "truncate": 8000,
      "max_new_tokens": 5000,
      "stop": ["<eos>"],
    },
    "endpoints": [
      {
        "type": "openai",
        "completion": "completions",
        "baseURL": "http://localhost:{internal port}/v1",
      },
    ],
  },
]`

PUBLIC_ORIGIN={address to serve on}
PUBLIC_VERSION=1.0

Note that chatPromptTemplate also differs from model to model, so check Prompts.md and pick the prompt template that matches yours.
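One setting the .env.local above does not show is the database connection. Chat UI's README describes a MONGODB_URL variable for this; assuming mongod was started on localhost as in the earlier step, the line would look roughly like the following. Port 27017 is only mongod's default and stands in for whatever port you passed to --port.

```shell
# Assumed addition to .env.local: point Chat UI at the local mongod.
# Replace 27017 with the port given to `mongod --port`.
MONGODB_URL=mongodb://localhost:27017
```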

Once this is written, run the sh file created earlier!

You can see that it works as expected!
