Multiplayer FPS Reinforcement Learning: Training Log

Training Environment Setup

Version information:
ml-agents: 0.30.0,
ml-agents-envs: 0.30.0,
Communicator API: 1.5.0,
PyTorch: 2.0.1+cu117

Training command:

mlagents-learn ./DodgeBall.yaml --run-id=fifthrun --initialize-from=fouthRun --resume
Parameters:
trainer_type: poca
hyperparameters:
  batch_size: 2048
  buffer_size: 20480
  learning_rate: 0.0003
  beta: 0.005
  epsilon: 0.2
  lambd: 0.95
  num_epoch: 3
  learning_rate_schedule: constant
  beta_schedule: constant
  epsilon_schedule: constant
network_settings:
  normalize: False
  hidden_units: 512
  num_layers: 3
  vis_encode_type: simple
  memory: None
  goal_conditioning_type: none
  deterministic: False
reward_signals:
  extrinsic:
    gamma: 0.999
    strength: 1.0
    network_settings:
      normalize: False
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
      memory: None
      goal_conditioning_type: hyper
      deterministic: False
init_path: None
keep_checkpoints: 40
checkpoint_interval: 2000000
max_steps: 500000000
time_horizon: 1000
summary_freq: 50000
threaded: False
self_play:
  save_steps: 500000
  team_change: 1000000
  swap_steps: 200000
  window: 100
  play_against_latest_model_ratio: 0.5
  initial_elo: 1200.0
behavioral_cloning: None
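A few derived numbers help when reading this config. The Python sketch below copies the values from the parameter list and computes the gradient steps per policy update, the rule-of-thumb effective horizon 1/(1 - gamma), the number of opponent swaps per learning-team period (assuming equal-sized teams), the trainer steps spanned by the self-play snapshot pool, and how many checkpoints are written over the run. The function name `derived_quantities` is hypothetical.

```python
# Values copied from the training parameters listed above.
CONFIG = {
    "batch_size": 2048,
    "buffer_size": 20480,
    "num_epoch": 3,
    "gamma": 0.999,
    "time_horizon": 1000,
    "max_steps": 500_000_000,
    "checkpoint_interval": 2_000_000,
    "keep_checkpoints": 40,
    "save_steps": 500_000,
    "team_change": 1_000_000,
    "swap_steps": 200_000,
    "window": 100,
}


def derived_quantities(cfg: dict) -> dict:
    """Back-of-the-envelope numbers implied by a POCA + self-play trainer config."""
    minibatches = cfg["buffer_size"] // cfg["batch_size"]
    return {
        # Each update sweeps the buffer num_epoch times in batch_size chunks.
        "minibatches_per_epoch": minibatches,
        "gradient_steps_per_update": minibatches * cfg["num_epoch"],
        # Rule of thumb: rewards much further than ~1/(1 - gamma) steps away are heavily discounted.
        "effective_horizon": 1.0 / (1.0 - cfg["gamma"]),
        # Opponent snapshot swaps per learning-team period (equal-sized teams assumed).
        "swaps_per_team_change": cfg["team_change"] // cfg["swap_steps"],
        # Trainer steps spanned by the pool of `window` saved snapshots.
        "snapshot_pool_span": cfg["save_steps"] * cfg["window"],
        # Checkpoints written over the full run; only keep_checkpoints of them are retained.
        "checkpoints_written": cfg["max_steps"] // cfg["checkpoint_interval"],
    }


if __name__ == "__main__":
    for name, value in derived_quantities(CONFIG).items():
        print(f"{name}: {value}")
```

With these values the effective horizon (about 1000 steps) lines up with `time_horizon: 1000`, each update performs 30 gradient steps, and the 40 retained checkpoints are a small fraction of the 250 written over 500M steps.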

Training Scene

[Figure: training scene]

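Scene definition:

Outer boundary walls + obstacle walls + airdropped supplies.

Blue supply drops respawn every 10 s and refill all ammunition; yellow supply drops refill all ammunition and also increase weapon damage by 1.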
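The supply-drop rules can be sketched as a small state machine. This is a Python sketch under stated assumptions: the class names and the `max_ammo` capacity are hypothetical, and applying the same 10 s respawn timer to yellow drops is an assumption, since the text only gives the 10 s figure for blue drops.

```python
from dataclasses import dataclass

RESPAWN_SECONDS = 10.0  # blue drops respawn after 10 s (assumed to apply to yellow too)


@dataclass
class Shooter:
    ammo: int
    max_ammo: int       # hypothetical capacity; not specified in the text
    damage: int = 1     # base attack from the agent settings


@dataclass
class SupplyDrop:
    kind: str           # "blue" or "yellow"
    active: bool = True
    cooldown: float = 0.0

    def tick(self, dt: float) -> None:
        """Advance the respawn timer; the drop reappears once it runs out."""
        if not self.active:
            self.cooldown -= dt
            if self.cooldown <= 0.0:
                self.active = True
                self.cooldown = 0.0

    def pick_up(self, shooter: Shooter) -> bool:
        """Apply the drop's effect; returns False if it has not respawned yet."""
        if not self.active:
            return False
        shooter.ammo = shooter.max_ammo      # both colors refill all ammunition
        if self.kind == "yellow":
            shooter.damage += 1              # yellow additionally adds +1 weapon damage
        self.active = False
        self.cooldown = RESPAWN_SECONDS
        return True


if __name__ == "__main__":
    s = Shooter(ammo=0, max_ammo=5)
    drop = SupplyDrop("yellow")
    print(drop.pick_up(s), s)   # first pickup succeeds and updates the shooter
    print(drop.pick_up(s))      # fails: the drop is still respawning
    drop.tick(RESPAWN_SECONDS)
    print(drop.active)
```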

[Figure]

Because the observations do not depend on the y axis, nine copies of the scene are stacked vertically to speed up training.
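Stacking works because the observations are built from x and z only, so moving a whole arena along y changes nothing the agent sees, provided the horizontal sensors in one layer cannot reach another layer (an assumption about the sensor setup). A minimal Python check of that invariance; the 50-unit layer spacing and all helper names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]

NUM_ARENAS = 9          # nine copies of the scene
LAYER_SPACING = 50.0    # hypothetical vertical gap between stacked arenas


def arena_offset(index: int) -> Vec3:
    """World offset of arena `index`: arenas differ only in their y position."""
    return (0.0, index * LAYER_SPACING, 0.0)


def place(local_pos: Vec3, index: int) -> Vec3:
    """Convert an arena-local position into world space for arena `index`."""
    ox, oy, oz = arena_offset(index)
    return (local_pos[0] + ox, local_pos[1] + oy, local_pos[2] + oz)


def planar_offset(agent: Vec3, target: Vec3) -> Tuple[float, float]:
    """Observation-style relative position: keep x and z, drop y."""
    return (target[0] - agent[0], target[2] - agent[2])


if __name__ == "__main__":
    agent, enemy = (1.0, 0.5, 2.0), (4.0, 0.5, -3.0)
    views = {planar_offset(place(agent, i), place(enemy, i)) for i in range(NUM_ARENAS)}
    print(views)  # {(3.0, -5.0)}: every arena produces the same observation
```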

Agent Settings

Hit points: 2

Base attack: 1

Dash cooldown: 1

Movement speed: 20

Dash speed: 30
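With 2 hit points and a base attack of 1, an agent goes down after two hits; after a yellow supply (+1 damage) one hit is enough. The Python sketch below encodes these settings and a dash cooldown timer whose ready state corresponds to a "dash available" flag. The class names are hypothetical, and treating the cooldown as seconds is an assumption, since the text gives no unit.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSettings:
    hit_points: int = 2
    base_attack: int = 1
    dash_cooldown: float = 1.0   # unit assumed to be seconds
    move_speed: float = 20.0
    dash_speed: float = 30.0


def hits_to_kill(hit_points: int, damage: int) -> int:
    """Number of hits needed to bring hit_points down to zero."""
    return math.ceil(hit_points / damage)


class DashTimer:
    """Tracks the dash cooldown; ready() is the 'dash available' flag."""

    def __init__(self, cooldown: float) -> None:
        self.cooldown = cooldown
        self.remaining = 0.0

    def ready(self) -> bool:
        return self.remaining <= 0.0

    def try_dash(self) -> bool:
        """Start a dash if the cooldown has elapsed; otherwise refuse."""
        if not self.ready():
            return False
        self.remaining = self.cooldown
        return True

    def tick(self, dt: float) -> None:
        self.remaining = max(0.0, self.remaining - dt)


if __name__ == "__main__":
    cfg = AgentSettings()
    print(hits_to_kill(cfg.hit_points, cfg.base_attack))      # 2
    print(hits_to_kill(cfg.hit_points, cfg.base_attack + 1))  # 1
```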

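Agent Sensor Settings

  • Ammo sensor: detects fired bullets (their trajectories) and airdropped ammo crates

    [Figures: ammo sensor]

  • Player sensor: detects teammates and opponents, along with their facing directions

    [Figures: player sensor]

  • Wall sensor: detects obstacles

    [Figures: wall sensor]

  • Rear sensor: detects what is behind the agent

    [Figure: rear sensor]

Final result:

[Figures: final result]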
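The sensor figures are not preserved, but if these are ML-Agents ray perception sensors (as in the DodgeBall example this project builds on, judging by the DodgeBall class names), each ray contributes `len(detectable_tags) + 2` floats: a one-hot of the tag it hit, a flag set when the ray hit nothing, and the hit distance as a fraction of the ray length. The Python sketch below reproduces that layout as I understand it; the tag names in the usage example are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# A ray result: None if the ray hit nothing, else (tag of the hit object, hit fraction in [0, 1]).
RayHit = Optional[Tuple[str, float]]


def encode_rays(hits: Sequence[RayHit], detectable_tags: Sequence[str]) -> List[float]:
    """Flatten per-ray results into one observation vector, (num_tags + 2) floats per ray."""
    num_tags = len(detectable_tags)
    out: List[float] = []
    for hit in hits:
        block = [0.0] * (num_tags + 2)
        if hit is None:
            block[num_tags] = 1.0       # "hit nothing" flag
            block[num_tags + 1] = 1.0   # fraction 1.0: the ray travelled its full length
        else:
            tag, fraction = hit
            if tag in detectable_tags:
                block[detectable_tags.index(tag)] = 1.0  # one-hot of the detected tag
            block[num_tags + 1] = fraction  # untagged hits only report the distance
        out.extend(block)
    return out


if __name__ == "__main__":
    tags = ["bullet", "ammoBox"]  # hypothetical tags for the ammo sensor
    print(encode_rays([None, ("ammoBox", 0.25), ("wall", 0.5)], tags))
```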

  • BufferSensor

    • Stores the PlayerInfo of the remaining teammates, up to 8 × 3 (8 floats per entry, at most 3 entries)

      // Nested in DodgeBallGameController: per-player state tracked by the game controller.
      public class PlayerInfo
      {
          public DodgeBallAgent Agent;
          public int HitPointsRemaining;
          [HideInInspector]
          public Vector3 StartingPos;
          [HideInInspector]
          public Quaternion StartingRot;
          [HideInInspector]
          public Rigidbody Rb;
          [HideInInspector]
          public Collider Col;
          [HideInInspector]
          public int TeamID;
      }

      // Agent method: encodes one other player as the 8 floats appended to the BufferSensor.
      private float[] GetOtherAgentData(DodgeBallGameController.PlayerInfo info)
      {
          var otherAgentdata = new float[8];
          // [0] remaining hit points, normalized to [0, 1]
          otherAgentdata[0] = (float)info.Agent.HitPointsRemaining / (float)NumberOfTimesPlayerCanBeHit;
          // [1], [2] position in this agent's local frame; only x and z are used (y is dropped)
          var relativePosition = transform.InverseTransformPoint(info.Agent.transform.position);
          otherAgentdata[1] = relativePosition.x / m_LocationNormalizationFactor;
          otherAgentdata[2] = relativePosition.z / m_LocationNormalizationFactor;
          // [3] 0 = teammate, 1 = opponent
          otherAgentdata[3] = info.TeamID == teamID ? 0.0f : 1.0f;
          // [4] 1 if that agent carries the enemy flag
          otherAgentdata[4] = info.Agent.HasEnemyFlag ? 1.0f : 0.0f;
          // [5] 1 if that agent is stunned
          otherAgentdata[5] = info.Agent.Stunned ? 1.0f : 0.0f;
          // [6], [7] velocity in this agent's local frame, divided by 30 (the same value as the dash speed)
          var relativeVelocity = transform.InverseTransformDirection(info.Agent.AgentRb.velocity);
          otherAgentdata[6] = relativeVelocity.x / 30.0f;
          otherAgentdata[7] = relativeVelocity.z / 30.0f;
          return otherAgentdata;
      }
  • Observation space inputs

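    Index   Type      Description                           Notes
    0       bool      Can shoot right now
    1       bool      Is stunned
    2-6     float     Magazines currently held (one-hot)
    7       float     Current HP / max HP
    8       float     Forward velocity component
    9       float     Lateral velocity component
    10-12   Vector3   Offset from the starting position     y component removed
    13      bool      Dash cooldown (1 = dash available)
    16      bool      Redundant; always 0                   Removed
    17      float     Fraction of enemies remaining
    18-19   Vector2   Redundant; always (0, 0)              Removed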

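    After later optimization, the observation space was reduced to 16 values: removing the y component of the position offset (1 value), the redundant bit at index 16 (1 value), and the redundant bits at 18-19 (2 values) takes the original 20 values (indices 0-19) down to 16.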
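To sanity-check the BufferSensor entries outside Unity, here is a Python port of GetOtherAgentData plus the zero-padding implied by a buffer of at most 3 entries. It assumes agents only rotate about the y axis, so Unity's InverseTransformPoint/InverseTransformDirection reduce to a 2D rotation by the agent's yaw (Unity convention: yaw 90 turns +z toward +x). The names `OtherAgent`, `world_to_local_xz`, `other_agent_data`, and `pad_entities` are hypothetical, and treating all-zero rows as padding mirrors how, as far as I know, the ML-Agents trainer masks empty BufferSensor slots.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class OtherAgent:
    hit_points: int
    x: float
    z: float
    vx: float
    vz: float
    team_id: int
    has_enemy_flag: bool = False
    stunned: bool = False


def world_to_local_xz(dx: float, dz: float, yaw_deg: float) -> Tuple[float, float]:
    """Express a world (x, z) vector in the frame of an agent rotated yaw_deg about y."""
    t = math.radians(yaw_deg)
    c, s = math.cos(t), math.sin(t)
    return dx * c - dz * s, dx * s + dz * c


def other_agent_data(me_x: float, me_z: float, me_yaw: float, me_team: int,
                     other: OtherAgent, max_hits: int, loc_norm: float) -> List[float]:
    """Port of GetOtherAgentData: 8 floats describing one other agent."""
    lx, lz = world_to_local_xz(other.x - me_x, other.z - me_z, me_yaw)
    lvx, lvz = world_to_local_xz(other.vx, other.vz, me_yaw)
    return [
        other.hit_points / max_hits,
        lx / loc_norm,
        lz / loc_norm,
        0.0 if other.team_id == me_team else 1.0,
        1.0 if other.has_enemy_flag else 0.0,
        1.0 if other.stunned else 0.0,
        lvx / 30.0,
        lvz / 30.0,
    ]


def pad_entities(rows: Sequence[List[float]], max_entities: int = 3, obs_size: int = 8):
    """Zero-pad to a fixed (max_entities, obs_size) block; rows beyond the limit are dropped.
    Returns the block and a mask that is True for padded (all-zero) rows."""
    block = [[0.0] * obs_size for _ in range(max_entities)]
    for i, row in enumerate(rows[:max_entities]):
        block[i] = list(row)
    mask = [all(v == 0.0 for v in row) for row in block]
    return block, mask


if __name__ == "__main__":
    enemy = OtherAgent(hit_points=1, x=10.0, z=0.0, vx=30.0, vz=0.0, team_id=1)
    # Agent at the origin facing +x (yaw 90): the enemy is straight ahead, moving forward.
    entry = other_agent_data(0.0, 0.0, 90.0, 0, enemy, max_hits=2, loc_norm=10.0)
    block, mask = pad_entities([entry])
    print(entry)
    print(mask)
```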
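The 16-value layout left after the optimization can be sketched as follows, under assumptions: the remaining entries keep the table's order with the removed ones dropped, magazine slot k means k magazines held (counts clamped to 0-4), and the two values at original indices 14-15, which the table does not describe, are passed through unchanged. The function and parameter names are hypothetical.

```python
from typing import List, Tuple

NUM_MAGAZINE_SLOTS = 5  # original indices 2-6


def build_observation(can_shoot: bool, stunned: bool, magazines: int,
                      hp: int, max_hp: int, v_forward: float, v_lateral: float,
                      offset_xz: Tuple[float, float], dash_ready: bool,
                      unlisted_14_15: Tuple[float, float],
                      enemies_left: int, enemies_total: int) -> List[float]:
    """Assemble the 16-float vector: the 20 original values minus y, bit 16 and bits 18-19."""
    one_hot = [0.0] * NUM_MAGAZINE_SLOTS
    one_hot[min(max(magazines, 0), NUM_MAGAZINE_SLOTS - 1)] = 1.0  # assumed slot mapping
    obs = [
        float(can_shoot),             # 0
        float(stunned),               # 1
        *one_hot,                     # 2-6
        hp / max_hp,                  # 7
        v_forward,                    # 8
        v_lateral,                    # 9
        offset_xz[0], offset_xz[1],   # 10-11 (y removed)
        float(dash_ready),            # 12: 1 = dash available
        *unlisted_14_15,              # 13-14: not described in the table
        enemies_left / enemies_total, # 15
    ]
    assert len(obs) == 16
    return obs


if __name__ == "__main__":
    print(build_observation(True, False, 2, 1, 2, 0.5, -0.25, (0.1, 0.2),
                            True, (0.0, 0.0), 3, 4))
```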