Training Environment Setup
Version information:
ml-agents: 0.30.0,
ml-agents-envs: 0.30.0,
Communicator API: 1.5.0,
PyTorch: 2.0.1+cu117
Training command:
```bash
mlagents-learn ./DodgeBall.yaml --run-id=fifthrun --initialize-from=fouthRun --resume
```
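Trainer configuration (nested sections shown as dotted keys):

Parameter | Value |
---|---|
trainer_type | poca |
hyperparameters.batch_size | 2048 |
hyperparameters.buffer_size | 20480 |
hyperparameters.learning_rate | 0.0003 |
hyperparameters.beta | 0.005 |
hyperparameters.epsilon | 0.2 |
hyperparameters.lambd | 0.95 |
hyperparameters.num_epoch | 3 |
hyperparameters.learning_rate_schedule | constant |
hyperparameters.beta_schedule | constant |
hyperparameters.epsilon_schedule | constant |
network_settings.normalize | False |
network_settings.hidden_units | 512 |
network_settings.num_layers | 3 |
network_settings.vis_encode_type | simple |
network_settings.memory | None |
network_settings.goal_conditioning_type | none |
network_settings.deterministic | False |
reward_signals.extrinsic.gamma | 0.999 |
reward_signals.extrinsic.strength | 1.0 |
reward_signals.extrinsic.network_settings.normalize | False |
reward_signals.extrinsic.network_settings.hidden_units | 128 |
reward_signals.extrinsic.network_settings.num_layers | 2 |
reward_signals.extrinsic.network_settings.vis_encode_type | simple |
reward_signals.extrinsic.network_settings.memory | None |
reward_signals.extrinsic.network_settings.goal_conditioning_type | hyper |
reward_signals.extrinsic.network_settings.deterministic | False |
init_path | None |
keep_checkpoints | 40 |
checkpoint_interval | 2000000 |
max_steps | 500000000 |
time_horizon | 1000 |
summary_freq | 50000 |
threaded | False |
self_play.save_steps | 500000 |
self_play.team_change | 1000000 |
self_play.swap_steps | 200000 |
self_play.window | 100 |
self_play.play_against_latest_model_ratio | 0.5 |
self_play.initial_elo | 1200.0 |
behavioral_cloning | None |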
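A few quantities follow directly from the table. This assumes the update buffer holds exactly `buffer_size` experiences when an update fires and that each epoch walks the shuffled buffer in minibatches of `batch_size` (my reading of the PPO-style update loop that POCA also uses); the horizon line is the usual 1/(1-γ) rule of thumb, not an exact cutoff.

$$
\begin{aligned}
\text{minibatches per epoch} &= \frac{\text{buffer\_size}}{\text{batch\_size}} = \frac{20480}{2048} = 10 \\
\text{gradient steps per update} &= 10 \times \text{num\_epoch} = 10 \times 3 = 30 \\
\text{effective reward horizon} &\approx \frac{1}{1-\gamma} = \frac{1}{1-0.999} = 1000 \text{ steps} \\
\text{opponent swaps per team change} &= \frac{\text{team\_change}}{\text{swap\_steps}} = \frac{1\,000\,000}{200\,000} = 5
\end{aligned}
$$

The roughly 1000-step horizon implied by γ = 0.999 lines up with `time_horizon: 1000`.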
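Training Scene
Scene definition:
Outer walls + obstacle walls + airdrop supplies.
Blue airdrop supplies respawn every 10 s and refill all ammo; yellow ones refill all ammo and also increase weapon damage by 1.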
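A minimal sketch of the two pickup rules, written as plain C# outside Unity. The function names and the `maxAmmo` cap are hypothetical (the text gives no ammo limit), and only the blue crate's 10 s respawn is modelled because no timing is given for the yellow one.

```csharp
using System;

// Effect of picking up a crate, returned as the agent's new (ammo, damage).
// Both colours refill ammo to the cap; only yellow adds +1 weapon damage.
// maxAmmo is a hypothetical parameter: the text does not give an ammo cap.
static (int Ammo, int Damage) ApplyPickup(bool isYellow, int damage, int maxAmmo) =>
    (maxAmmo, isYellow ? damage + 1 : damage);

// A blue crate collected at time pickedUpAt (seconds) is back once 10 s have passed.
static bool BlueCrateAvailable(float now, float pickedUpAt) => now - pickedUpAt >= 10f;

var afterYellow = ApplyPickup(isYellow: true, damage: 1, maxAmmo: 4);
Console.WriteLine($"after yellow crate: ammo={afterYellow.Ammo}, damage={afterYellow.Damage}");
Console.WriteLine($"blue crate back 6 s after pickup: {BlueCrateAvailable(6f, 0f)}");
```

Returning a new tuple instead of mutating agent state keeps the rule easy to test in isolation.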
Observations do not depend on the y axis, so nine copies of the scene are stacked vertically to speed up training.
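The reason stacking is safe can be checked with a small sketch: if position features use only x and z relative to the arena, the same spot in any of the nine copies yields the same input. The spacing value and function names below are hypothetical; the real arena offsets are not given.

```csharp
using System;

// Vertical gap between arena copies: a hypothetical value, the text does not give one.
const float ArenaSpacingY = 50f;

// World-space origin of arena copy `index` (0..8) when the copies are stacked along y.
static (float X, float Y, float Z) ArenaOrigin(int index, float spacing) => (0f, index * spacing, 0f);

// Position feature built from x and z only, relative to the arena origin; y is never read,
// so it is identical for the same spot in any of the stacked copies.
static float[] PlanarOffset((float X, float Y, float Z) position, (float X, float Y, float Z) arenaOrigin) =>
    new[] { position.X - arenaOrigin.X, position.Z - arenaOrigin.Z };

for (int i = 0; i < 9; i++)
{
    var origin = ArenaOrigin(i, ArenaSpacingY);
    var agentPosition = (origin.X + 3f, origin.Y + 1f, origin.Z - 2f);
    Console.WriteLine($"arena {i}: origin y = {origin.Y}, feature = [{string.Join(", ", PlanarOffset(agentPosition, origin))}]");
}
```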
Agent Settings
HP: 2
Base attack: 1
Dash cooldown: 1
Move speed: 20
Dash speed: 30
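A sketch of what these numbers imply for combat and movement. The function names are hypothetical, and the dash cooldown is assumed to be in seconds since the text gives no unit.

```csharp
using System;

// Values from the agent settings above. The unit of the dash cooldown is not
// stated; seconds are assumed here.
const int MaxHp = 2;
const int BaseDamage = 1;
const float DashCooldown = 1f;
const float MoveSpeed = 20f;
const float DashSpeed = 30f;

// Hits needed to take an agent from full health to 0 (integer ceiling of hp / damage).
static int HitsToKill(int hp, int damage) => (hp + damage - 1) / damage;

// Speed while dashing vs. normal movement.
static float CurrentSpeed(bool dashing, float moveSpeed, float dashSpeed) => dashing ? dashSpeed : moveSpeed;

// Dash becomes available again once the cooldown has elapsed since the last dash.
static bool DashReady(float now, float lastDashTime, float cooldown) => now - lastDashTime >= cooldown;

Console.WriteLine($"hits to kill at base damage: {HitsToKill(MaxHp, BaseDamage)}");
Console.WriteLine($"hits to kill after a yellow crate (+1 damage): {HitsToKill(MaxHp, BaseDamage + 1)}");
Console.WriteLine($"speed while dashing: {CurrentSpeed(true, MoveSpeed, DashSpeed)}");
Console.WriteLine($"dash ready 0.5 after dashing: {DashReady(0.5f, 0f, DashCooldown)}");
```

With 2 HP and 1 base damage an agent survives one hit, while a single yellow crate is enough to make every hit lethal.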
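Agent Sensor Settings
- Ammo sensor: detects projectile trajectories and airdropped ammo crates
- Player sensor: detects teammates and opponents, and which way they are facing
- Wall sensor: detects obstacles
- Rear sensor: detects what is behind the agent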
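If each of these sensors is a `RayPerceptionSensorComponent3D` (an assumption; the text does not name the component), the ML-Agents documentation gives its observation size as `stacks * (1 + 2 * raysPerDirection) * (numDetectableTags + 2)`. The sketch below only evaluates that formula; the ray counts and tag names are hypothetical placeholders, not this project's settings.

```csharp
using System;

// Observation size of one ray perception sensor, following the formula in the
// ML-Agents docs: (1 + 2 * raysPerDirection) rays, each reporting a one-hot over
// the detectable tags plus two extra values (a hit/miss indicator and the
// normalized hit distance), repeated for each stacked observation.
static int RaySensorObservationSize(int raysPerDirection, int numDetectableTags, int stacks) =>
    stacks * (1 + 2 * raysPerDirection) * (numDetectableTags + 2);

// Hypothetical configurations for the four sensors; ray counts and tag names
// are placeholders, not the project's real settings.
var sensors = new (string Name, int RaysPerDirection, string[] Tags)[]
{
    ("ammo", 10, new[] { "bullet", "ammoCrate" }),
    ("player", 10, new[] { "teammate", "opponent" }),
    ("wall", 10, new[] { "wall" }),
    ("rear", 3, new[] { "bullet", "opponent", "wall" }),
};

int total = 0;
foreach (var sensor in sensors)
{
    int size = RaySensorObservationSize(sensor.RaysPerDirection, sensor.Tags.Length, stacks: 1);
    total += size;
    Console.WriteLine($"{sensor.Name}: {size} floats");
}
Console.WriteLine($"total: {total} floats");
```

Adding a tag grows every ray of that sensor by one value, so splitting tags across several narrow sensors can be cheaper than one sensor that detects everything.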
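Final result:
BufferSensor
Stores the `PlayerInfo` of the remaining teammates, with a maximum size of 8×3 (each entry is the 8-float vector built by `GetOtherAgentData`).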
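Reading 8×3 as at most 3 entries of 8 floats, the sketch below is a simplified model of such a buffer observation: rows beyond the cap are dropped, and unused rows are zero-filled with a mask of 0. `PackEntities` is a hypothetical helper for illustration, not ML-Agents' internal code.

```csharp
using System;
using System.Collections.Generic;

// Simplified model of a buffer observation: up to maxEntries rows of entrySize
// floats; rows past the cap are dropped, and unused rows stay zero with mask 0
// so a downstream attention layer could ignore them.
static (float[,] Rows, float[] Mask) PackEntities(IReadOnlyList<float[]> entities, int entrySize, int maxEntries)
{
    var rows = new float[maxEntries, entrySize];
    var mask = new float[maxEntries];
    int count = Math.Min(entities.Count, maxEntries);
    for (int i = 0; i < count; i++)
    {
        if (entities[i].Length != entrySize)
            throw new ArgumentException($"entity {i} has {entities[i].Length} values, expected {entrySize}");
        for (int j = 0; j < entrySize; j++) rows[i, j] = entities[i][j];
        mask[i] = 1f;
    }
    return (rows, mask);
}

// Two visible teammates, each described by 8 floats (the GetOtherAgentData layout).
var teammates = new List<float[]>
{
    new float[] { 1f, 0.2f, 0.5f, 0f, 0f, 0f, 0.1f, 0f },
    new float[] { 0.5f, -0.3f, 0.1f, 0f, 0f, 1f, 0f, 0f },
};
var (packedRows, packedMask) = PackEntities(teammates, entrySize: 8, maxEntries: 3);
Console.WriteLine($"rows = {packedRows.GetLength(0)}, mask = [{string.Join(", ", packedMask)}]");
```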
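```csharp
using UnityEngine;

// Nested in DodgeBallGameController: per-player bookkeeping.
public class PlayerInfo
{
    public DodgeBallAgent Agent;
    public int HitPointsRemaining;
    public Vector3 StartingPos;
    public Quaternion StartingRot;
    public Rigidbody Rb;
    public Collider Col;
    public int TeamID;
}

// Member of DodgeBallAgent: packs one other player into the 8 floats appended to the BufferSensor.
private float[] GetOtherAgentData(DodgeBallGameController.PlayerInfo info)
{
    var otherAgentdata = new float[8];
    // [0] remaining hit points as a fraction of the maximum.
    otherAgentdata[0] = (float)info.Agent.HitPointsRemaining / (float)NumberOfTimesPlayerCanBeHit;
    // [1], [2] position in this agent's local frame (x and z only), normalized.
    var relativePosition = transform.InverseTransformPoint(info.Agent.transform.position);
    otherAgentdata[1] = relativePosition.x / m_LocationNormalizationFactor;
    otherAgentdata[2] = relativePosition.z / m_LocationNormalizationFactor;
    // [3] 0 for a teammate, 1 for an opponent.
    otherAgentdata[3] = info.TeamID == teamID ? 0.0f : 1.0f;
    // [4] whether the other agent carries the enemy flag, [5] whether it is stunned.
    otherAgentdata[4] = info.Agent.HasEnemyFlag ? 1.0f : 0.0f;
    otherAgentdata[5] = info.Agent.Stunned ? 1.0f : 0.0f;
    // [6], [7] the other agent's velocity expressed in this agent's frame, divided by 30 (the dash speed).
    var relativeVelocity = transform.InverseTransformDirection(info.Agent.AgentRb.velocity);
    otherAgentdata[6] = relativeVelocity.x / 30.0f;
    otherAgentdata[7] = relativeVelocity.z / 30.0f;
    return otherAgentdata;
}
```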
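`InverseTransformPoint` and `InverseTransformDirection` express a world-space point or direction in the observing agent's local frame. Assuming the agent only rotates about the y axis and has unit scale (an assumption that fits a flat arena), the x/z values used in `otherAgentdata[1..2]` and `[6..7]` reduce to a 2D rotation by the agent's yaw. This sketch reproduces that math outside Unity; the helper names are hypothetical, and `normalization` stands in for `m_LocationNormalizationFactor`, whose value is not given.

```csharp
using System;

// World -> local for an agent that only rotates about y (yaw in degrees; in Unity's
// convention positive yaw turns +z towards +x) and has unit scale.
// Returns the local (x, z) of a world-space offset (dx, dz).
static (double X, double Z) ToLocalPlanar(double dx, double dz, double yawDegrees)
{
    double rad = yawDegrees * Math.PI / 180.0;
    double cos = Math.Cos(rad), sin = Math.Sin(rad);
    return (dx * cos - dz * sin, dx * sin + dz * cos);
}

// Counterpart of otherAgentdata[1] and [2]: the other agent's position in the observer's frame, normalized.
static (double X, double Z) RelativePositionFeature(double selfX, double selfZ, double selfYaw,
                                                    double otherX, double otherZ, double normalization)
{
    var (lx, lz) = ToLocalPlanar(otherX - selfX, otherZ - selfZ, selfYaw);
    return (lx / normalization, lz / normalization);
}

// Counterpart of otherAgentdata[6] and [7]: the other agent's velocity in the observer's frame, divided by 30.
static (double X, double Z) RelativeVelocityFeature(double velX, double velZ, double selfYaw)
{
    var (lx, lz) = ToLocalPlanar(velX, velZ, selfYaw);
    return (lx / 30.0, lz / 30.0);
}

// Observer at the origin facing +x (yaw 90); other agent 10 units along +x; hypothetical normalization of 20.
var feature = RelativePositionFeature(0, 0, 90, 10, 0, 20);
Console.WriteLine($"local x = {feature.X:F3}, local z = {feature.Z:F3}");
```

Here the other agent lands straight ahead of the observer (local z = 0.5, local x close to 0), which is exactly what the network sees regardless of where in the arena the pair stands.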
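Observation Space

Index | Type | Description | Notes |
---|---|---|---|
0 | bool | Whether the agent can currently shoot | |
1 | bool | Whether the agent is stunned | |
2-6 | float | Number of magazines currently held, one-hot encoded | |
7 | float | Current HP / max HP | |
8 | float | Forward velocity component | |
9 | float | Lateral velocity component | |
10-12 | vector3 | Offset from the starting position | y component removed |
13 | bool | Dash cooldown; 1 = dash available | |
16 | bool | Redundant; always 0 in the current input | Removed |
17 | float | Fraction of enemies remaining | |
18-19 | vector2 | Redundant; always (0, 0) | Removed |

After later optimization, the observation space was reduced to 16 dimensions.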
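A sketch of the one-hot magazine encoding (indices 2-6) and of the dimension count. It assumes the magazine count ranges over 0..4 (one slot per value) and that indices 14-15, which the table does not list, are still present, so the original vector has 20 values; `OneHot` is a hypothetical helper.

```csharp
using System;

// One-hot encoding of the magazine count into 5 slots (observation indices 2-6).
// Assumption: the count ranges over 0..4, one slot per value.
static float[] OneHot(int value, int size)
{
    if (value < 0 || value >= size)
        throw new ArgumentOutOfRangeException(nameof(value), $"expected 0..{size - 1}, got {value}");
    var encoded = new float[size];
    encoded[value] = 1f;
    return encoded;
}

// Indices 0-19 give 20 values. Removing index 16, indices 18-19 and the
// y component of 10-12 leaves 16, matching the optimized size.
const int OriginalSize = 20;
const int Removed = 1 /* index 16 */ + 2 /* indices 18-19 */ + 1 /* y of 10-12 */;

Console.WriteLine($"magazines = 2 -> [{string.Join(", ", OneHot(2, 5))}]");
Console.WriteLine($"optimized size = {OriginalSize - Removed}");
```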