Rainbow
- class srl.algorithms.rainbow.rainbow.Config(framework: str = 'auto', batch_size: int = 32, memory_capacity: int = 100000, memory_warmup_size: int = 1000, memory_compress: bool = True, memory_compress_level: int = -1, observation_mode: str | ~srl.base.define.ObservationModes = ObservationModes.ENV, override_observation_type: ~srl.base.define.SpaceTypes = SpaceTypes.UNKNOWN, override_action_type: str | ~srl.base.define.RLBaseActTypes = <RLBaseActTypes.NONE: 1>, action_division_num: int = 10, observation_division_num: int = 1000, frameskip: int = 0, extend_worker: ~typing.Type[ExtendWorker] | None = None, parameter_path: str = '', memory_path: str = '', use_rl_processor: bool = True, processors: ~typing.List[RLProcessor] = <factory>, render_image_processors: ~typing.List[RLProcessor] = <factory>, enable_state_encode: bool = True, enable_action_decode: bool = True, enable_reward_encode: bool = True, enable_done_encode: bool = True, window_length: int = 1, render_image_window_length: int = 1, enable_sanitize: bool = True, enable_assertion: bool = False, test_epsilon: float = 0, actor_epsilon: float = 0.4, actor_alpha: float = 7.0, epsilon: float | ~srl.rl.schedulers.scheduler.SchedulerConfig = 0.1, lr: float | ~srl.rl.schedulers.scheduler.SchedulerConfig = 0.001, discount: float = 0.99, target_model_update_interval: int = 1000, enable_reward_clip: bool = False, enable_double_dqn: bool = True, enable_noisy_dense: bool = False, enable_rescale: bool = False, multisteps: int = 3, retrace_h: float = 1.0, dummy_state_val: float = 0)
Bases: <PriorityExperienceReplay>, <RLConfigComponentFramework>, <RLConfigComponentInput>
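A minimal usage sketch (the Runner API and the "Grid" environment ID follow the library's README examples; treat them as illustrative):

    import srl
    from srl.algorithms import rainbow

    rl_config = rainbow.Config(
        lr=0.001,
        multisteps=3,
        enable_double_dqn=True,
    )
    runner = srl.Runner("Grid", rl_config)
    runner.train(timeout=10)   # train for 10 seconds
    print(runner.evaluate())   # rollout using test_epsilon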
- test_epsilon: float = 0
ε-greedy exploration rate used during testing (evaluation)
- actor_epsilon: float = 0.4
Base ε for ε-greedy exploration during distributed learning; actor \(i\) of \(N\) uses \(\epsilon_i = \epsilon^{1 + \frac{i}{N-1} \alpha}\) (the Ape-X schedule)
- actor_alpha: float = 7.0
See actor_epsilon; this is the exponent α in the formula above
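A plain-Python sketch of this Ape-X schedule (independent of the library):

    def actor_epsilon_i(i: int, N: int, epsilon: float = 0.4, alpha: float = 7.0) -> float:
        # eps_i = eps ** (1 + i/(N-1) * alpha); actor 0 explores the most,
        # the last actor is nearly greedy.
        if N <= 1:
            return epsilon
        return epsilon ** (1 + (i / (N - 1)) * alpha)

    print([round(actor_epsilon_i(i, 4), 4) for i in range(4)])
    # [0.4, 0.0472, 0.0056, 0.0007]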
- epsilon: float | SchedulerConfig = 0.1
<Scheduler> ε-greedy exploration rate used during training
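epsilon may be a fixed float or a SchedulerConfig; the scheduler API itself is not reproduced here, but a linear decay schedule behaves roughly like this sketch:

    def linear_epsilon(step: int, start: float = 1.0, end: float = 0.1, decay_steps: int = 10_000) -> float:
        # Anneal epsilon linearly from `start` to `end` over `decay_steps` steps.
        frac = min(step / decay_steps, 1.0)
        return start + (end - start) * frac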
- lr: float | SchedulerConfig = 0.001
<Scheduler> Learning rate
- hidden_block: DuelingNetworkConfig = <factory>
<DuelingNetwork> hidden layer configuration
- discount: float = 0.99
Discount factor (γ) for future rewards
- target_model_update_interval: int = 1000
Interval (in training steps) at which the target network is synchronized with the online network
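Conceptually, a hard sync happens every target_model_update_interval training steps (set_weights/get_weights are a hypothetical Keras-style model API):

    if train_count % target_model_update_interval == 0:
        q_target.set_weights(q_online.get_weights())  # copy online weights to target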
- enable_reward_clip: bool = False
If True, clip rewards to the three values {-1, 0, 1}
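The three-valued clip is simply the sign of the reward; a minimal sketch:

    import numpy as np

    def clip_reward(r: float) -> float:
        # Maps any reward to -1, 0, or 1 (DQN-style reward clipping).
        return float(np.sign(r))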
- enable_double_dqn: bool = True
If True, enable Double DQN (decouple action selection from action evaluation when computing targets)
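Double DQN selects the next action with the online network but evaluates it with the target network; a NumPy sketch of the one-step target (the Q-value arrays are assumed inputs):

    import numpy as np

    def ddqn_target(reward, done, q_online_next, q_target_next, discount=0.99):
        # q_online_next, q_target_next: Q-values of the next state, shape (num_actions,)
        a_star = int(np.argmax(q_online_next))  # select with the online net
        q_next = q_target_next[a_star]          # evaluate with the target net
        return reward + (1.0 - float(done)) * discount * q_next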
- enable_noisy_dense: bool = False
If True, use noisy dense layers (NoisyNet) for exploration in place of ε-greedy
- enable_rescale: bool = False
If True, enable value rescaling of Q-targets
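The rescaling function commonly used for this (from "Observe and Look Further", also used in R2D2) is sketched below; that srl uses exactly this form is an assumption:

    import numpy as np

    def rescale(x, eps=1e-3):
        # h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x
        return np.sign(x) * (np.sqrt(np.abs(x) + 1.0) - 1.0) + eps * x

    def rescale_inv(y, eps=1e-3):
        # Exact inverse of h; applied to target Q-values before bootstrapping.
        return np.sign(y) * (
            ((np.sqrt(1.0 + 4.0 * eps * (np.abs(y) + 1.0 + eps)) - 1.0) / (2.0 * eps)) ** 2 - 1.0
        )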
- multisteps: int = 3
Number of steps (n) used for multi-step learning
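With multisteps = n, the target sums the next n discounted rewards before bootstrapping; a plain sketch:

    def n_step_return(rewards, bootstrap_q, discount=0.99):
        # rewards: the next n rewards; bootstrap_q: Q-value used after n steps.
        ret = bootstrap_q
        for r in reversed(rewards):
            ret = r + discount * ret
        return ret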
- retrace_h: float = 1.0
Retrace parameter h, the multiplier applied to the clipped importance ratio in the multi-step correction
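In Retrace, each step of the multi-step correction is weighted by a clipped importance ratio, with h as the multiplier (a sketch of the standard Retrace(λ) coefficient, with h in the role of λ):

    def retrace_coeff(pi_prob, mu_prob, h=1.0):
        # c_t = h * min(1, pi(a|s) / mu(a|s)); h = 1.0 gives the standard clipping.
        return h * min(1.0, pi_prob / max(mu_prob, 1e-8))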