Q-Learning
- class srl.algorithms.ql.Config(
      observation_mode: Union[str, srl.base.define.ObservationModes] = <ObservationModes.ENV: 1>,
      override_observation_type: srl.base.define.SpaceTypes = <SpaceTypes.UNKNOWN: 0>,
      override_action_type: Union[str, srl.base.define.RLBaseActTypes] = <RLBaseActTypes.NONE: 1>,
      action_division_num: int = 10,
      observation_division_num: int = 1000,
      frameskip: int = 0,
      extend_worker: Optional[Type['ExtendWorker']] = None,
      parameter_path: str = '',
      memory_path: str = '',
      use_rl_processor: bool = True,
      processors: List['RLProcessor'] = <factory>,
      render_image_processors: List['RLProcessor'] = <factory>,
      enable_state_encode: bool = True,
      enable_action_decode: bool = True,
      enable_reward_encode: bool = True,
      enable_done_encode: bool = True,
      window_length: int = 1,
      render_image_window_length: int = 1,
      enable_sanitize: bool = True,
      enable_assertion: bool = False,
      test_epsilon: float = 0,
      epsilon: Union[float, srl.rl.schedulers.scheduler.SchedulerConfig] = 0.1,
      lr: Union[float, srl.rl.schedulers.scheduler.SchedulerConfig] = 0.1,
      discount: float = 0.9,
      q_init: str = '')
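A minimal construction sketch, assuming the import path implied by the class name above; the chosen values are illustrative, and only the parameter names and defaults shown in the signature are taken as given:

```python
from srl.algorithms import ql

# Illustrative hyperparameter values; parameter names and defaults
# are taken from the Config signature above.
rl_config = ql.Config(
    epsilon=0.1,      # ε-greedy exploration rate during training
    lr=0.1,           # learning rate for the Q-table update
    discount=0.9,     # discount factor for future rewards
    q_init="random",  # initialize Q-values via random.random()
)
```

The individual fields are documented below.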
- test_epsilon: float = 0
  ε-greedy exploration rate used during testing (evaluation)
- epsilon: float | SchedulerConfig = 0.1
  <Scheduler> ε-greedy exploration rate used during training; accepts a plain float or a SchedulerConfig schedule
- lr: float | SchedulerConfig = 0.1
  <Scheduler> Learning rate for the Q-table update
- discount: float = 0.9
  Discount factor applied to future rewards
- q_init: str = ''
  How to initialize the Q-table (see the sketch below)
  Parameters:
      "" -- all Q-values start at 0 (default)
      "random" -- random.random() (uniform samples in [0, 1))
      "normal" -- np.random.normal() (standard normal samples)