AlphaZero
- class srl.algorithms.alphazero.Config(observation_mode: Literal['', 'render_image'] = '', override_env_observation_type: srl.base.define.SpaceTypes = <SpaceTypes.UNKNOWN: 0>, override_observation_type: Union[str, srl.base.define.RLBaseTypes] = <RLBaseTypes.NONE: 1>, override_action_type: Union[str, srl.base.define.RLBaseTypes] = <RLBaseTypes.NONE: 1>, action_division_num: int = 10, observation_division_num: int = 1000, frameskip: int = 0, extend_worker: Optional[Type[ForwardRef('ExtendWorker')]] = None, processors: List[ForwardRef('RLProcessor')] = <factory>, render_image_processors: List[ForwardRef('RLProcessor')] = <factory>, enable_rl_processors: bool = True, enable_state_encode: bool = True, enable_action_decode: bool = True, window_length: int = 1, render_image_window_length: int = 1, render_last_step: bool = True, render_rl_image: bool = True, render_rl_image_size: Tuple[int, int] = (128, 128), enable_sanitize: bool = True, enable_assertion: bool = False, dtype: str = 'float32', num_simulations: int = 100, discount: float = 1.0, sampling_steps: int = 1, batch_size: int = 32, memory: srl.rl.memories.replay_buffer.ReplayBufferConfig = <factory>, lr: float = 0.002, lr_scheduler: srl.rl.schedulers.lr_scheduler.LRSchedulerConfig = <factory>, root_dirichlet_alpha: float = 0.3, root_exploration_fraction: float = 0.25, c_base: float = 19652, c_init: float = 1.25, input_image_block: srl.rl.models.config.input_image_block.InputImageBlockConfig = <factory>, value_block: srl.rl.models.config.mlp_block.MLPBlockConfig = <factory>, policy_block: srl.rl.models.config.mlp_block.MLPBlockConfig = <factory>, value_type: Literal['rate', 'linear'] = 'linear')
- num_simulations: int = 100
Number of simulations
- discount: float = 1.0
Discount rate
- sampling_steps: int = 1
Number of steps at the start of an episode during which moves are selected stochastically
- batch_size: int = 32
Batch size
- memory: ReplayBufferConfig
- lr: float = 0.002
Learning rate
- lr_scheduler: LRSchedulerConfig
- root_dirichlet_alpha: float = 0.3
Root prior exploration noise: alpha (concentration) parameter of the Dirichlet noise.
- root_exploration_fraction: float = 0.25
Root prior exploration noise: fraction of the root prior that is replaced by the noise.
- c_base: float = 19652
PUCT exploration constant (base term)
- c_init: float = 1.25
PUCT exploration constant (initial term)
- input_image_block: InputImageBlockConfig
- value_block: MLPBlockConfig
<MLPBlock> value block
- policy_block: MLPBlockConfig
<MLPBlock> policy block
- value_type: Literal['rate', 'linear'] = 'linear'
"rate" or "linear"