Since I don't know much about the Driving Scenario Toolbox or reinforcement learning yet, I'm going to feed in documentation on both, alternating between them.
I'll collect the URLs one by one...
I still haven't quite figured out web crawling....
https://kr.mathworks.com/help/driving/ref/drivingscenariodesigner-app.html
https://www.mathworks.com/help/driving/ref/scenarioreader.html
The Driving Toolbox pages have more content than the reinforcement learning ones, so I'll mix in one driving link for every two reinforcement learning links.
https://www.mathworks.com/help/driving/ref/roadrunnerscenarioreader.html
https://www.mathworks.com/help/reinforcement-learning/ug/design-dqn-using-rl-designer.html
https://www.mathworks.com/help/reinforcement-learning/ug/what-is-reinforcement-learning.html
https://www.mathworks.com/help/driving/ref/roadrunnerscenariowriter.html
https://www.mathworks.com/help/reinforcement-learning/ug/create-custom-simulink-environments.html
https://www.mathworks.com/help/driving/ref/roadrunnerscenario.html
https://www.mathworks.com/help/reinforcement-learning/ref/rlsimulinkenv.html
https://www.mathworks.com/help/driving/ug/trajectory-follower-with-roadrunner-scenario.html
https://www.mathworks.com/help/reinforcement-learning/ref/createintegratedenv.html
https://www.mathworks.com/help/driving/ref/scenariosimulation.html
https://www.mathworks.com/help/driving/ref/roadrunner.openscenario.html
https://www.mathworks.com/help/driving/ref/roadrunner.html
https://www.mathworks.com/help/reinforcement-learning/ref/rl.env.basicgridworld.getactioninfo.html
https://www.mathworks.com/help/reinforcement-learning/ug/define-reward-and-observation-signals.html
https://www.mathworks.com/help/driving/ref/ultrasonicdetectiongenerator-system-object.html
https://www.mathworks.com/help/reinforcement-learning/ug/train-reinforcement-learning-agents.html
https://www.mathworks.com/help/driving/ref/drivingradardatagenerator-system-object.html
https://www.mathworks.com/help/reinforcement-learning/ref/generaterewardfunction.html
https://www.mathworks.com/help/driving/ref/lidarpointcloudgenerator-system-object.html
https://www.mathworks.com/help/driving/ref/visiondetectiongenerator-system-object.html
https://www.mathworks.com/help/reinforcement-learning/ug/ddpg-agents.html
https://www.mathworks.com/help/driving/ref/objectdetection.html
https://www.mathworks.com/help/reinforcement-learning/ref/rl.option.rlddpgagentoptions.html
https://www.mathworks.com/help/vdynblks/ref/vehiclebody3dof.html
https://www.mathworks.com/help/reinforcement-learning/ug/proximal-policy-optimization-agents.html
https://www.mathworks.com/help/reinforcement-learning/ref/rl.agent.rlppoagent.html
https://www.mathworks.com/help/vdynblks/ref/vehiclebody3doflongitudinal.html
https://www.mathworks.com/help/reinforcement-learning/ref/rl.function.rlcontinuousgaussianactor.html
Up to this point it comes to about 645,117 characters.
I'll run one training pass and then take another look.
I thought I had done a lot of crawling.....
but it's less than I expected.....ㅠ
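For reference, here's a minimal sketch of how pages like these can be fetched and their character counts totaled. This is hypothetical: I'm assuming requests and BeautifulSoup here, and it's not the exact script behind the number above.

import requests
from bs4 import BeautifulSoup

urls = [
    "https://kr.mathworks.com/help/driving/ref/drivingscenariodesigner-app.html",
    # ... the rest of the URLs listed above
]

texts = []
for url in urls:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Drop the HTML tags and keep only the visible text
    texts.append(soup.get_text(separator="\n", strip=True))

print(sum(len(t) for t in texts))  # total characters across all pages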
Anyway, let's give it a try.
This time I'll write the whole thing in one go.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)
from peft import (
    LoraConfig,
    prepare_model_for_kbit_training,
    get_peft_model,
)
import os, torch
from datasets import load_dataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device
base_model = "/kaggle/input/llama-3/transformers/8b-hf/1"
new_model = "llama-3-8b-chat-matlab"

dataset = load_dataset('csv', data_files='/kaggle/input/clean-long-matlabdata/cleaned_long_matlab_data.csv')
torch_dtype = torch.float16
attn_implementation = "eager"

# QLoRA config: load the base model in 4-bit NF4 with double quantization,
# computing in float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)
# Load the base model in 4-bit
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    attn_implementation=attn_implementation
)

# Load tokenizer; Llama ships without a pad token, so reuse EOS
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
# LoRA config: adapters on every attention and MLP projection layer
peft_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.2,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
)

# Prepare the 4-bit model for training, then wrap it once with the adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)

def tokenize_function(examples):
    # Cap the sequence length explicitly (512 here as an example);
    # padding='max_length' alone pads out to the model's full context window
    return tokenizer(examples['Text'], padding='max_length', truncation=True, max_length=512)

tokenized_dataset = dataset.map(tokenize_function, batched=True)
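Before setting up the trainer, a quick sanity check that wasn't in my original run: PEFT can report how many parameters the LoRA wrap actually left trainable.

# With r=32 adapters on all seven projection layers, only a small
# fraction of the 8B parameters should show up as trainable
model.print_trainable_parameters()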
training_args = TrainingArguments(
    output_dir=new_model,
    overwrite_output_dir=True,
    num_train_epochs=1,
    optim="paged_adamw_32bit",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,  # effective batch size of 4 per device
    save_steps=10_000,
    save_total_limit=2,
    prediction_loss_only=True,
    # evaluation is left off: no eval_dataset is passed to the Trainer,
    # and evaluation_strategy="steps" without one raises an error
    logging_steps=1,
    warmup_steps=10,
    logging_strategy="steps",
    learning_rate=2e-4,
    fp16=False,
    bf16=False,
    group_by_length=True,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset['train'],
    # mlm=False gives the plain causal-LM objective (labels are the shifted inputs)
    data_collator=DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=False,
    ),
)

os.environ["WANDB_DISABLED"] = "true"
trainer.train()
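If the run does finish, the adapter can be saved on its own. This step isn't in the notebook above; save_pretrained on a PEFT-wrapped model writes only the LoRA weights, not the full 8B model.

# Save just the LoRA adapter and the tokenizer (MBs, not GBs)
trainer.model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)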
The GPU keeps running out of memory and erroring.......
I'll try it with two T4s.
That errors out too............
Ugh..........
The GPU memory just keeps blowing up....
I'll stop here for today,
and try again tomorrow when my head is working properly.
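A note to tomorrow-me: a few knobs that usually cut memory on a 16 GB T4. This is untested on my side, just the standard first things to try: a micro-batch of 1 with more gradient accumulation, gradient checkpointing, and fp16 mixed precision.

# Untested memory-saving tweaks for next time
model.gradient_checkpointing_enable()  # recompute activations instead of storing them
model.config.use_cache = False         # the KV cache is dead weight during training

training_args = TrainingArguments(
    output_dir=new_model,
    num_train_epochs=1,
    optim="paged_adamw_32bit",
    per_device_train_batch_size=1,   # smallest possible micro-batch
    gradient_accumulation_steps=8,   # keep an effective batch size of 8
    fp16=True,                       # mixed precision; the T4 has no bf16 support
    learning_rate=2e-4,
    logging_steps=1,
)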