1. VPN Application
To access the laboratory resources remotely, you must first apply for VPN access. Please fill out the form below.
2. Start Training (CLI)
The environment is pre-configured, so you can start a training session directly with the phisonai2 command-line launcher.
Before running it, make sure your env_config.yaml and exp_config.yaml are ready (see Section 3).
# Basic Syntax
phisonai2 --env_config <path/to/env_config.yaml> --exp_config <path/to/exp_config.yaml>
# Example usage:
phisonai2 --env_config ~/Desktop/aiDAPTIV2/commands/env_config/env_config.yaml --exp_config ~/Desktop/aiDAPTIV2/commands/exp_config/exp_config.yaml
Note: Run long training jobs inside a screen or tmux session so they keep running if your terminal or VPN connection drops.
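You may want to launch runs from a Python script, for example to queue several experiments. In that case the command can be assembled programmatically. The helper below is hypothetical and is not part of phisonai2. `build_phisonai2_command` expands `~`, optionally checks that both YAML files exist, and returns an argument list for `subprocess.run`. It uses only the two flags documented above.

```python
import os
import subprocess
from typing import List


def build_phisonai2_command(env_config: str, exp_config: str,
                            check_exists: bool = True) -> List[str]:
    """Build the argv list for a phisonai2 run (hypothetical helper).

    Only the --env_config and --exp_config flags shown in this guide are used.
    """
    env_path = os.path.expanduser(env_config)
    exp_path = os.path.expanduser(exp_config)
    if check_exists:
        for path in (env_path, exp_path):
            if not os.path.isfile(path):
                raise FileNotFoundError(f"config file not found: {path}")
    return ["phisonai2", "--env_config", env_path, "--exp_config", exp_path]


def launch(env_config: str, exp_config: str) -> int:
    """Run phisonai2 in the foreground and return its exit code."""
    cmd = build_phisonai2_command(env_config, exp_config)
    # A list argv (no shell=True) avoids quoting problems with paths.
    return subprocess.run(cmd).returncode
```

For example, `launch("~/Desktop/aiDAPTIV2/commands/env_config/env_config.yaml", "~/Desktop/aiDAPTIV2/commands/exp_config/exp_config.yaml")` runs the same command as the example above. For long jobs, start the script itself inside screen or tmux.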
3. Configuration Settings
You need to configure two YAML files before running the training:
A. Environment Config (env_config.yaml)
Defines paths for models, datasets, and SSD caching.
path_settings:
  model_name_or_path: "/home/$USER/Desktop/llm/Llama-3.1-8B-Instruct"
  data_path:
    - ./dataset_config/text-generation/QA_dataset_config.yaml
  nvme_path: "/mnt/nvme0"  # Path to aiDAPTIVCache
  output_dir: "/home/$USER/output"
  log_name: "training_log.log"
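Standard YAML parsers do not expand shell variables such as `$USER`. This guide does not say whether phisonai2 expands them itself, so an absolute path is the safe choice if in doubt.

The sketch below is a hypothetical pre-flight check, not part of aiDAPTIV+. It assumes the file has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. It then does three things:

- expands `$USER` and `~` with `os.path.expandvars` and `os.path.expanduser`;
- confirms that the keys listed above are present;
- reports model and cache paths that do not exist on disk.

```python
import os
from typing import Any, Dict, List

# Keys taken from the path_settings example in this guide.
REQUIRED_KEYS = ("model_name_or_path", "data_path", "nvme_path",
                 "output_dir", "log_name")


def check_env_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems in a parsed env_config (hypothetical helper)."""
    settings = cfg.get("path_settings")
    if not isinstance(settings, dict):
        return ["missing 'path_settings' section"]

    problems: List[str] = []
    for key in REQUIRED_KEYS:
        if key not in settings:
            problems.append(f"missing key: path_settings.{key}")

    # Expand $USER / ~ the way a shell would, then check the paths exist.
    for key in ("model_name_or_path", "nvme_path"):
        if key in settings:
            path = os.path.expanduser(os.path.expandvars(str(settings[key])))
            if not os.path.exists(path):
                problems.append(f"{key} does not exist: {path}")

    data_path = settings.get("data_path", [])
    if not isinstance(data_path, list) or not data_path:
        problems.append("data_path should be a non-empty list")
    return problems
```

An empty list means nothing obviously wrong was found. Relative `data_path` entries are not checked, because they depend on the directory you launch from.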
B. Experiment Config (exp_config.yaml)
Defines hyperparameters, GPU settings, and task types.
process_settings:
  num_gpus: 1
  specify_gpus: 0
run_settings:
  task_type: "text-generation"
  task_mode: "train"
  num_train_epochs: 1
  per_device_train_batch_size: 1
  learning_rate: 0.000007
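A quick sanity check of the experiment settings can catch mistakes before a long run starts. The sketch below is hypothetical and works on the parsed dict. Two of its rules are assumptions, not documented aiDAPTIV+ behavior:

- `specify_gpus` names one GPU index per GPU counted in `num_gpus`. The sketch accepts a single integer, a list, or a comma-separated string such as `"0,1"`.
- The effective batch size is `per_device_train_batch_size × num_gpus`, as in plain data-parallel training without gradient accumulation.

It also flags a real YAML pitfall. YAML 1.1 parsers such as PyYAML read `7e-6` (no decimal point) as a string. Writing the learning rate as `0.000007`, as above, or as `7.0e-6` avoids this.

```python
from typing import Any, Dict, List, Tuple


def parse_gpu_ids(value: Any) -> List[int]:
    """Normalize specify_gpus into a list of ints (accepted formats are assumptions)."""
    if isinstance(value, int):
        return [value]
    if isinstance(value, str):
        return [int(part) for part in value.split(",") if part.strip()]
    return [int(v) for v in value]


def check_exp_config(cfg: Dict[str, Any]) -> Tuple[List[str], int]:
    """Return (problems, effective_batch_size) for a parsed exp_config."""
    problems: List[str] = []
    proc = cfg.get("process_settings", {})
    run = cfg.get("run_settings", {})

    num_gpus = proc.get("num_gpus", 1)
    gpu_ids = parse_gpu_ids(proc.get("specify_gpus", 0))
    if len(gpu_ids) != num_gpus:
        problems.append(
            f"num_gpus={num_gpus} but specify_gpus lists {len(gpu_ids)} GPU(s)")

    # PyYAML turns "7e-6" into a str, so insist on a real float here.
    lr = run.get("learning_rate")
    if not isinstance(lr, float):
        problems.append(f"learning_rate should be a float, got {lr!r}")

    # Assumption: plain data parallelism, no gradient accumulation.
    effective_batch = run.get("per_device_train_batch_size", 1) * num_gpus
    return problems, effective_batch
```

With the example settings above, the check reports no problems and an effective batch size of 1.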
4. Python Code Integration
If you are writing your own training script, import the phisonlib middleware to enable aiDAPTIVLink.
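# 1. Import Phison middleware
import os
from phisonlib.moirai import initialize, save_model, MoiraiConfig
from transformers import AutoTokenizer  # assumption: a Hugging Face tokenizer is used
# prepare_bf16_hf_model_init_stream() and prepare_config() are NOT provided by the
# imports above; import them from the helper module shipped with your aiDAPTIV+ setup.
MODEL_PATH = os.path.expanduser("~/Desktop/llm/Llama-3.1-8B-Instruct")  # match model_name_or_path
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
# 2. Initialize the model (stream mode)
model = prepare_bf16_hf_model_init_stream(
    model_name_or_path=MODEL_PATH,
    tokenizer=tokenizer,
)
# 3. Apply the middleware; initialize() returns the model together with an optimizer
moirai_config = prepare_config()
model, optimizer = initialize(module=model, config=moirai_config)
# Now proceed with your standard training loop...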
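The snippet above stops where your own training loop begins. The sketch below shows one conventional shape for that loop with a Hugging Face-style model, whose forward pass returns an object with a `.loss` attribute when the batch includes labels.

It is written against duck-typed objects and rests on two assumptions this guide does not confirm:

- the `optimizer` returned by `initialize()` exposes the usual PyTorch-style `step()` and `zero_grad()` methods;
- gradients come from `loss.backward()`.

If phisonlib expects a different call sequence, follow its documentation instead. `train_one_epoch` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List


def train_one_epoch(model: Any, optimizer: Any,
                    batches: Iterable[Dict[str, Any]]) -> float:
    """Run one pass over `batches` and return the mean training loss.

    Assumes PyTorch-style semantics: model(**batch).loss, loss.backward(),
    optimizer.step(), optimizer.zero_grad().
    """
    model.train()                     # enable training mode (dropout, etc.)
    losses: List[float] = []
    for batch in batches:
        outputs = model(**batch)      # forward pass; batch includes labels
        loss = outputs.loss
        loss.backward()               # compute gradients
        optimizer.step()              # update parameters
        optimizer.zero_grad()         # clear gradients for the next step
        losses.append(float(loss.item()))
    return sum(losses) / len(losses) if losses else float("nan")
```

You would call it once per epoch, for example `for epoch in range(num_train_epochs): print(epoch, train_one_epoch(model, optimizer, train_dataloader))`, where `train_dataloader` is your own iterable of tokenized batches.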
5. Monitoring Logs
The log file is written to the directory where you run phisonai2. Use tail -f to follow its progress in real time.
# Replace 'your_log_name.log' with the actual filename defined in env_config.yaml
tail -f your_log_name.log
Watch the loss value: a steadily decreasing trend indicates that the model is learning as expected.
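To check the trend without reading every line, you can extract the loss values from the log and fit a straight line through them. A negative least-squares slope means the loss is decreasing on average. `read_losses` and `loss_slope` are hypothetical helpers.

The exact log format is not documented here, so the regular expression below is an assumption. It matches entries such as `loss: 1.234`, `Loss=1.234` or `'loss': 1.234`; adjust it to what your log actually contains. It also matches names such as `eval_loss`, so filter further if your log mixes training and evaluation losses.

```python
import re
from typing import Iterable, List

# Assumed format: "loss" (any case, optionally quoted), then ':' or '=', then a number.
LOSS_RE = re.compile(
    r"loss['\"]?\s*[:=]\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)", re.IGNORECASE)


def read_losses(lines: Iterable[str]) -> List[float]:
    """Extract every loss value found in the given log lines, in order."""
    values: List[float] = []
    for line in lines:
        for match in LOSS_RE.finditer(line):
            values.append(float(match.group(1)))
    return values


def loss_slope(values: List[float]) -> float:
    """Least-squares slope of loss vs. entry index (negative = decreasing)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two loss values")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var
```

For example, `with open("training_log.log") as f: print(loss_slope(read_losses(f)))` prints a negative number while training is progressing normally. Short-term noise does not change the sign as long as the overall trend is downward.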