Q-Learning

class srl.algorithms.ql.Config(observation_mode: Literal['', 'render_image'] = '', override_env_observation_type: srl.base.define.SpaceTypes = <SpaceTypes.UNKNOWN: 0>, override_observation_type: Union[str, srl.base.define.RLBaseTypes] = <RLBaseTypes.NONE: 1>, override_action_type: Union[str, srl.base.define.RLBaseTypes] = <RLBaseTypes.NONE: 1>, action_division_num: int = 10, observation_division_num: int = 1000, frameskip: int = 0, extend_worker: Optional[Type[ForwardRef('ExtendWorker')]] = None, processors: List[ForwardRef('RLProcessor')] = <factory>, render_image_processors: List[ForwardRef('RLProcessor')] = <factory>, enable_rl_processors: bool = True, enable_state_encode: bool = True, enable_action_decode: bool = True, window_length: int = 1, render_image_window_length: int = 1, render_last_step: bool = True, render_rl_image: bool = True, render_rl_image_size: Tuple[int, int] = (128, 128), enable_sanitize: bool = True, enable_assertion: bool = False, dtype: str = 'float32', test_epsilon: float = 0, epsilon: float = 0.1, epsilon_scheduler: srl.rl.schedulers.scheduler.SchedulerConfig = <factory>, lr: float = 0.1, lr_scheduler: srl.rl.schedulers.scheduler.SchedulerConfig = <factory>, discount: float = 0.9, q_init: Literal['', 'random', 'normal'] = '')
test_epsilon: float = 0

ε for ε-greedy action selection during testing (evaluation). The default of 0 makes the test policy fully greedy.

epsilon: float = 0.1

ε for ε-greedy action selection during training
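As a rough illustration of how an ε value such as epsilon or test_epsilon is typically used, the sketch below implements a generic ε-greedy choice over a list of Q-values. The function name select_action and its tie-breaking rule are assumptions made for this example, not taken from the srl source.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Pick an action index with an epsilon-greedy rule (hypothetical helper)."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest Q-value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon set to 0, as in the test_epsilon default, the random branch is never taken and the choice is always greedy.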

epsilon_scheduler: SchedulerConfig

Scheduler settings for epsilon. See: Scheduler

lr: float = 0.1

Learning rate (step size of the Q-value update)

lr_scheduler: SchedulerConfig

Scheduler settings for lr. See: Scheduler

discount: float = 0.9

Discount factor (γ) applied to future rewards
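For orientation, the textbook tabular Q-learning update shows where lr and discount enter. This is a minimal sketch under the assumption that the Q-table is a dict mapping each state to a list of action values; the function q_update and that table layout are hypothetical and may differ from what srl does internally.

```python
def q_update(q, state, action, reward, next_state, done, lr=0.1, discount=0.9):
    """Apply one tabular Q-learning update in place and return the TD error."""
    if done:
        # Terminal transition: there is no future value to bootstrap from.
        target = reward
    else:
        # Bootstrap from the best action value in the next state.
        target = reward + discount * max(q[next_state])
    td_error = target - q[state][action]
    # Move the current estimate a fraction lr toward the target.
    q[state][action] += lr * td_error
    return td_error
```

A larger lr moves each estimate further toward the new target, while a discount closer to 1 gives more weight to rewards expected later in the episode.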

q_init: Literal['', 'random', 'normal'] = ''

How to initialize the Q-table

Parameters:
  • "" -- 0 (every entry starts at zero)

  • "random" -- random.random()

  • "normal" -- np.random.normal()
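
The three q_init options can be pictured as follows. This is only a sketch: init_q_values is a hypothetical helper, and to keep the example free of third-party imports it uses random.gauss(0.0, 1.0) as a stand-in for np.random.normal(), whose defaults are also mean 0 and standard deviation 1.

```python
import random


def init_q_values(n_actions, q_init=""):
    """Return initial Q-values for one state (hypothetical helper)."""
    if q_init == "random":
        # Uniform samples in [0.0, 1.0).
        return [random.random() for _ in range(n_actions)]
    if q_init == "normal":
        # Standard normal samples; stand-in for np.random.normal().
        return [random.gauss(0.0, 1.0) for _ in range(n_actions)]
    # Default "": all zeros.
    return [0.0] * n_actions
```

Non-zero initial values break ties between actions that have not been tried yet, whereas zero initialization keeps the starting table deterministic.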