SimpleDistributedRL

Monte Carlo tree search

class srl.algorithms.mcts.Config(observation_mode: Literal['', 'render_image'] = '', override_env_observation_type: srl.base.define.SpaceTypes = <SpaceTypes.UNKNOWN: 0>, override_observation_type: Union[str, srl.base.define.RLBaseTypes] = <RLBaseTypes.NONE: 1>, override_action_type: Union[str, srl.base.define.RLBaseTypes] = <RLBaseTypes.NONE: 1>, action_division_num: int = 10, observation_division_num: int = 1000, frameskip: int = 0, extend_worker: Optional[Type[ForwardRef('ExtendWorker')]] = None, processors: List[ForwardRef('RLProcessor')] = <factory>, render_image_processors: List[ForwardRef('RLProcessor')] = <factory>, enable_rl_processors: bool = True, enable_state_encode: bool = True, enable_action_decode: bool = True, window_length: int = 1, render_image_window_length: int = 1, render_last_step: bool = True, render_rl_image: bool = True, render_rl_image_size: Tuple[int, int] = (128, 128), enable_sanitize: bool = True, enable_assertion: bool = False, dtype: str = 'float32', num_simulations: int = 10, expansion_threshold: int = 5, discount: float = 1.0, uct_c: float = np.float64(1.4142135623730951))
num_simulations: int = 10

Number of simulations.

expansion_threshold: int = 5

Expansion threshold.
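As a rough illustration of what an expansion threshold commonly does in MCTS, the sketch below leaves a leaf unexpanded until its visit count reaches the threshold, then creates its children. The `Node` class and the `visit` function are hypothetical names made up for this example, and the exact comparison used by `srl.algorithms.mcts` may differ.

```python
from dataclasses import dataclass, field
from typing import Dict

EXPANSION_THRESHOLD = 5  # mirrors the default of Config.expansion_threshold


@dataclass
class Node:
    """Hypothetical tree node for illustration (not the srl data structure)."""

    visit_count: int = 0
    children: Dict[int, "Node"] = field(default_factory=dict)

    @property
    def expanded(self) -> bool:
        return bool(self.children)


def visit(node: Node, num_actions: int, threshold: int = EXPANSION_THRESHOLD) -> bool:
    """Record one visit and expand the node once it has `threshold` visits.

    Returns True only for the visit that triggers the expansion.
    """
    node.visit_count += 1
    if not node.expanded and node.visit_count >= threshold:
        # Assumption: expansion adds one child per available action.
        node.children = {action: Node() for action in range(num_actions)}
        return True
    return False


leaf = Node()
# Visit the same leaf seven times and record which visit caused the expansion.
expanded_at = [i for i in range(1, 8) if visit(leaf, num_actions=3)]
print(expanded_at)
```

A higher threshold keeps the tree smaller and relies on more rollouts per leaf before committing memory to its children; a threshold of 1 expands every leaf on its first visit.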

discount: float = 1.0

Discount rate.
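To make the role of the discount rate concrete, here is a minimal sketch of a discounted return, where the reward received t steps ahead is weighted by discount ** t. With the default value of 1.0 the rewards are simply summed. `discounted_return` is a hypothetical helper written for this example, not a function from srl.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], discount: float = 1.0) -> float:
    """Sum rewards, weighting the reward at step t by discount ** t."""
    total = 0.0
    # Work backwards so each step needs only one multiply and one add.
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


rewards = [0.0, 0.0, 1.0]  # e.g. a win that arrives two steps in the future
undiscounted = discounted_return(rewards, discount=1.0)
discounted = discounted_return(rewards, discount=0.9)
print(undiscounted, discounted)
```

A discount of 1.0 suits short episodic games where only the final outcome matters; values below 1.0 make earlier rewards count more than later ones.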

uct_c: float = np.float64(1.4142135623730951)

Exploration constant C used in the UCT score. The default is √2.
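For reference, the textbook UCT score is the mean value of a child plus an exploration bonus, Q + C * sqrt(ln(N_parent) / N_child). The sketch below implements that standard formula; `uct_score` and `select_action` are hypothetical helpers, and the exact expression used inside `srl.algorithms.mcts` is assumed rather than confirmed here.

```python
import math
from typing import Dict, Tuple


def uct_score(total_value: float, visit_count: int, parent_visits: int,
              uct_c: float = math.sqrt(2)) -> float:
    """Standard UCT: mean value plus an exploration bonus scaled by uct_c."""
    if visit_count == 0:
        # Unvisited children are tried first.
        return math.inf
    exploit = total_value / visit_count
    explore = uct_c * math.sqrt(math.log(parent_visits) / visit_count)
    return exploit + explore


def select_action(stats: Dict[int, Tuple[float, int]], uct_c: float) -> int:
    """Pick the action with the highest UCT score.

    `stats` maps action -> (total value, visit count).
    """
    parent_visits = sum(n for _, n in stats.values())
    return max(stats, key=lambda a: uct_score(stats[a][0], stats[a][1],
                                               parent_visits, uct_c))


stats = {0: (3.0, 5), 1: (1.0, 2)}  # action 0 has the better mean, action 1 fewer visits
explorative = select_action(stats, uct_c=math.sqrt(2))
greedy = select_action(stats, uct_c=0.0)
print(explorative, greedy)
```

A larger `uct_c` favours rarely visited actions, while `uct_c = 0` reduces the selection to picking the best mean value seen so far.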

© Copyright 2022, poco.
