Runner(Base)

class srl.runner.runner_base.RunnerBase(name_or_env_config: str | EnvConfig | None = None, rl_config: TRLConfig | None = None, context: RunContext | None = None, delay_make_env: bool = False)

Bases: Generic[TRLConfig]

Provides the execution environment.

name_or_env_config: str | EnvConfig | None = None: The EnvConfig to use (a plain string ID is also accepted).

rl_config: TRLConfig | None = None: The RLConfig to use. If None, a dummy algorithm is used.

set_seed(seed: int | None = None, seed_enable_gpu: bool = True)

Sets the random seed.

Parameters:

seed (Optional[int], optional) -- Random seed. Defaults to None.
seed_enable_gpu (bool, optional) -- Also set the GPU seed (this may slow execution). Defaults to True.
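
As a rough illustration of what fixing a seed buys you, the sketch below seeds only Python's standard `random` module and shows that two runs with the same seed produce the same sequence. `set_seed_sketch` and `sample` are hypothetical names, not part of this library, and the GPU-related behavior of `seed_enable_gpu` is not modeled here.

```python
import random


def set_seed_sketch(seed=None):
    """Hypothetical helper: seed Python's stdlib RNG when a seed is given."""
    if seed is None:
        return  # no seed: runs stay non-deterministic
    random.seed(seed)


def sample(n):
    """Draw n pseudo-random integers from the stdlib RNG."""
    return [random.randint(0, 99) for _ in range(n)]


set_seed_sketch(42)
first = sample(5)
set_seed_sketch(42)
second = sample(5)
print(first == second)  # same seed, same sequence
```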

enable_stats(): ハードウェアの統計情報に関する設定を有効にします。

disable_stats(): ハードウェアの統計情報に関する設定を無効にします。

save_parameter(path: str, compress: bool = True, **kwargs)

Saves the parameters.

Parameters:

path (str) -- Save path.
compress (bool, optional) -- Whether to compress the file. Compression uses the lzma format. Defaults to True.

load_parameter(path: str): Loads the parameters.
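
To make the `compress` option concrete, here is a minimal sketch of an lzma-compressed save and load round trip using only the standard library. `save_blob` and `load_blob` are hypothetical helpers. The use of `pickle` and the magic-byte check on load are assumptions made for illustration, not a description of the actual on-disk format.

```python
import lzma
import os
import pickle
import tempfile

XZ_MAGIC = b"\xfd7zXZ\x00"  # first bytes of every .xz stream


def save_blob(path, obj, compress=True):
    """Hypothetical helper: pickle obj, optionally lzma-compressing the bytes."""
    data = pickle.dumps(obj)
    if compress:
        data = lzma.compress(data)  # default container format is xz
    with open(path, "wb") as f:
        f.write(data)


def load_blob(path):
    """Load a file written by save_blob, detecting compression from the header."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(XZ_MAGIC):
        data = lzma.decompress(data)
    return pickle.loads(data)


params = {"weights": [0.1, 0.2, 0.3], "step": 100}
with tempfile.TemporaryDirectory() as tmp_dir:
    for compress in (True, False):
        path = os.path.join(tmp_dir, "params.dat")
        save_blob(path, params, compress=compress)
        print(compress, load_blob(path) == params)
```

Detecting the format on load keeps the loader free of a `compress` argument, which matches the shape of `load_parameter(path)` above, though whether the library works this way is not stated here.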

save_memory(path: str, compress: bool = True, **kwargs)

Saves the memory.

Parameters:

path (str) -- Save path.
compress (bool, optional) -- Whether to compress the file. Compression uses the lzma format. Defaults to True.

load_memory(path: str): Loads the memory.

set_device(device: str = 'AUTO', enable_tf_device: bool = True, set_CUDA_VISIBLE_DEVICES_if_CPU: bool = True, tf_enable_memory_growth: bool = True)

Sets the device.

"AUTO", "": automatic assignment. "CPU", "CPU:0": use the CPU. "GPU", "GPU:0": use the GPU.

Parameters:

device (str, optional) -- The main device. Without distributed training, this is the only device used. Defaults to "AUTO".
enable_tf_device (bool, optional) -- For TensorFlow, use 'with tf.device()'. Defaults to True.
set_CUDA_VISIBLE_DEVICES_if_CPU (bool, optional) -- When the device is the CPU, set CUDA_VISIBLE_DEVICES to -1. Defaults to True.
tf_enable_memory_growth (bool, optional) -- For TensorFlow, call 'set_memory_growth(True)'. Defaults to True.
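
The device strings and the `CUDA_VISIBLE_DEVICES` handling described above can be sketched as follows. `resolve_device` is a hypothetical helper. In particular, the rule that "AUTO" picks the GPU whenever one is available is an assumption, and GPU availability is passed in as a plain flag rather than detected.

```python
import os


def resolve_device(device="AUTO", gpu_available=False, hide_gpu_if_cpu=True):
    """Hypothetical helper: normalize a device string and hide GPUs in CPU mode."""
    name = device.strip().upper()
    if name in ("", "AUTO"):
        # Assumption: AUTO prefers the GPU when one is available.
        name = "GPU" if gpu_available else "CPU"
    kind = name.split(":")[0]  # "GPU:0" -> "GPU"
    if kind not in ("CPU", "GPU"):
        raise ValueError(f"unknown device: {device!r}")
    if kind == "CPU" and hide_gpu_if_cpu:
        # Hides all CUDA devices from frameworks initialized afterwards.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return name


print(resolve_device("cpu"))
print(os.environ["CUDA_VISIBLE_DEVICES"])
print(resolve_device("AUTO", gpu_available=True))
```

Setting `CUDA_VISIBLE_DEVICES` only has an effect on libraries that initialize CUDA after the variable is set, so a call like this belongs before any framework touches the GPU.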

set_progress(start_time: int = 1, interval_limit: int = 120, single_line=True, env_info: bool = False, train_info: bool = True, worker_info: bool = True, worker: int = 0, max_actor: int = 5, enable_eval: bool = False, eval_shuffle_player: bool = False, eval_episode: int = 1, eval_timeout: float = -1, eval_max_steps: int = -1, eval_players: Sequence[None | str | Tuple[str, dict] | RLConfig | Tuple[RLConfig, Any]] = [])

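Progress display options.

Parameters:

start_time (int, optional) -- Seconds until progress is first displayed. Defaults to 1.
interval_limit (int, optional) -- Maximum interval between progress displays, in seconds. Defaults to 120.
single_line (bool, optional) -- Whether to display progress on a single line. Defaults to True.
env_info (bool, optional) -- Whether to include env info in the progress display. Defaults to False.
train_info (bool, optional) -- Whether to include train info in the progress display. Defaults to True.
worker_info (bool, optional) -- Whether to include worker info in the progress display. Defaults to True.
worker (int, optional) -- Index of the worker shown in the progress display. Defaults to 0.
max_actor (int, optional) -- Number of workers shown in the progress display. Defaults to 5.
enable_eval (bool, optional) -- Whether to run an evaluation simulation when displaying progress. Defaults to False.
eval_shuffle_player (bool, optional) -- Whether to shuffle players during evaluation. Defaults to False.
eval_episode (int, optional) -- Number of episodes per evaluation. Defaults to 1.
eval_timeout (float, optional) -- Time limit for one episode during evaluation. Defaults to -1.
eval_max_steps (int, optional) -- Maximum number of steps for one episode during evaluation. Defaults to -1.
eval_players (List[Union[None, str, Tuple[str, dict], RLConfig]], optional) -- Players used during evaluation. Defaults to [].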
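
One way to read `start_time` and `interval_limit` together is as a schedule whose gaps grow until they hit the limit. The sketch below assumes the gap doubles after each display. That growth rule is purely an assumption for illustration, and `progress_times` is a hypothetical function.

```python
def progress_times(start_time=1, interval_limit=120, until=600):
    """Hypothetical schedule: elapsed seconds at which progress would be shown.

    Assumption: the gap between displays doubles each time and is capped
    at interval_limit.
    """
    times = []
    t = start_time
    interval = max(start_time, 1)  # avoid a zero gap when start_time is 0
    while t <= until:
        times.append(t)
        interval = min(interval * 2, interval_limit)
        t += interval
    return times


print(progress_times())
```

Under this assumption, output is frequent at the start of a run, when feedback matters most, and settles to one display per `interval_limit` seconds afterwards.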

set_history_on_memory(interval: float | int = 1, interval_mode: Literal['time', 'step'] = 'time', enable_eval: bool = False, eval_episode: int = 1, eval_timeout: float = -1, eval_max_steps: int = -1, eval_players: Sequence[None | str | Tuple[str, dict] | RLConfig | Tuple[RLConfig, Any]] = [], eval_shuffle_player: bool = False)

Specifies the settings for keeping the training history in memory.

Parameters:

interval (float | int, optional) -- Interval at which the training history is recorded. Defaults to 1.
interval_mode (str, optional) -- Unit of the recording interval ("time": seconds, "step": steps). Defaults to "time".
enable_eval (bool, optional) -- Run an evaluation simulation whenever the training history is recorded. Defaults to False.
eval_episode (int, optional) -- Number of episodes per evaluation. Defaults to 1.
eval_timeout (float, optional) -- Time limit for one episode during evaluation. Defaults to -1.
eval_max_steps (int, optional) -- Maximum number of steps for one episode during evaluation. Defaults to -1.
eval_players (List[Union[None, str, Tuple[str, dict], RLConfig]], optional) -- Players used during evaluation. Defaults to [].
eval_shuffle_player (bool, optional) -- Whether to shuffle players during evaluation. Defaults to False.
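
The difference between the two `interval_mode` values can be sketched with a small recorder that measures the interval either in wall-clock seconds or in steps. `HistoryRecorder` is a hypothetical class written for this illustration and does not reflect how the library stores its history.

```python
import time


class HistoryRecorder:
    """Hypothetical recorder that samples metrics every `interval` seconds or steps."""

    def __init__(self, interval=1, interval_mode="time", clock=time.monotonic):
        if interval_mode not in ("time", "step"):
            raise ValueError("interval_mode must be 'time' or 'step'")
        self.interval = interval
        self.interval_mode = interval_mode
        self.clock = clock
        self.history = []
        # Reference point on the chosen axis: current time, or step 0.
        self._last = clock() if interval_mode == "time" else 0

    def on_step(self, step, metrics):
        # Position on the chosen axis: wall-clock seconds or the step count.
        now = self.clock() if self.interval_mode == "time" else step
        if now - self._last >= self.interval:
            self.history.append({"step": step, **metrics})
            self._last = now


recorder = HistoryRecorder(interval=3, interval_mode="step")
for step in range(1, 11):
    recorder.on_step(step, {"reward": step * 0.5})
print([h["step"] for h in recorder.history])
```

Step mode gives reproducible sampling points across runs, while time mode keeps the recording overhead roughly constant regardless of how fast steps execute.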

set_history_on_file(save_dir: str = '', interval: float | int = 1, interval_mode: Literal['time', 'step'] = 'time', add_history: bool = False, write_system: bool = False, enable_eval: bool = False, eval_episode: int = 1, eval_timeout: float = -1, eval_max_steps: int = -1, eval_players: Sequence[None | str | Tuple[str, dict] | RLConfig | Tuple[RLConfig, Any]] = [], eval_shuffle_player: bool = False)

Specifies the settings for saving the training history to files.

Parameters:

save_dir (str, optional) -- Directory to save to. If "", a tmp folder is created.
interval (float | int, optional) -- Interval at which the training history is saved. Defaults to 1.
interval_mode (str, optional) -- Unit of the saving interval ("time": seconds, "step": steps). Defaults to "time".
add_history (bool, optional) -- Append to an existing training history. Defaults to False.
write_system (bool, optional) -- Also save CPU/memory information. Defaults to False.
enable_eval (bool, optional) -- Run an evaluation simulation whenever the training history is saved. Defaults to False.
eval_episode (int, optional) -- Number of episodes per evaluation. Defaults to 1.
eval_timeout (float, optional) -- Time limit for one episode during evaluation. Defaults to -1.
eval_max_steps (int, optional) -- Maximum number of steps for one episode during evaluation. Defaults to -1.
eval_players (List[Union[None, str, Tuple[str, dict], RLConfig]], optional) -- Players used during evaluation. Defaults to [].
eval_shuffle_player (bool, optional) -- Whether to shuffle players during evaluation. Defaults to False.

set_checkpoint(save_dir: str, is_load: bool, interval: int = 600, enable_eval: bool = True, eval_episode: int = 1, eval_timeout: float = -1, eval_max_steps: int = -1, eval_players: Sequence[None | str | Tuple[str, dict] | RLConfig | Tuple[RLConfig, Any]] = [], eval_shuffle_player: bool = False)

Saves the model at regular intervals.

Parameters:

save_dir (str) -- Directory to save to.
interval (int, optional) -- Save interval in seconds. Defaults to 600 (60*10).
enable_eval (bool, optional) -- Run an evaluation simulation whenever the model is saved. Defaults to True.
eval_episode (int, optional) -- Number of episodes per evaluation. Defaults to 1.
eval_timeout (float, optional) -- Time limit for one episode during evaluation. Defaults to -1.
eval_max_steps (int, optional) -- Maximum number of steps for one episode during evaluation. Defaults to -1.
eval_players (List[Union[None, str, Tuple[str, dict], RLConfig]], optional) -- Players used during evaluation. Defaults to [].
eval_shuffle_player (bool, optional) -- Whether to shuffle players during evaluation. Defaults to False.

Runner

class srl.runner.runner.Runner(name_or_env_config: str | srl.base.env.config.EnvConfig | NoneType = None, rl_config: Optional[+TRLConfig] = None, context: srl.base.context.RunContext | None = None, delay_make_env: bool = False)

Bases: Generic[TRLConfig], RunnerBase[TRLConfig]

play(enable_progress: bool = True)

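Plays using the currently configured context as is.

setup_context: Set to False when calling repeatedly with the same settings (e.g. eval).

play_direct(): Plays using the currently configured context as is, without checks.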

train(max_episodes: int = 0, timeout: float = 0, max_steps: int = 0, max_train_count: int = 0, max_memory: int = 0, players: Sequence[None | str | Tuple[str, dict] | RLConfig | Tuple[RLConfig, Any]] = [], shuffle_player: bool = True, train_interval: int = 1, train_repeat: int = 1, enable_progress: bool = True, callbacks: List[RunCallback] = [])

Parameters:

max_episodes (int, optional) -- Number of episodes before stopping. Defaults to 0.
timeout (float, optional) -- Time before stopping, in seconds. Defaults to 0.
max_steps (int, optional) -- Total number of steps before stopping. Defaults to 0.
max_train_count (int, optional) -- Number of training updates before stopping. Defaults to 0.
max_memory (int, optional) -- Memory size before stopping. Defaults to 0.
players (PlayerTypes, optional) -- Algorithms for the other players in environments with two or more players.
shuffle_player (bool, optional) -- Whether to shuffle players. Defaults to True.
train_interval (int, optional) -- Training interval, in steps. Defaults to 1.
train_repeat (int, optional) -- Number of training updates per training step. Defaults to 1.
enable_progress (bool, optional) -- Whether to display progress. Defaults to True.
callbacks (List[RunCallback], optional) -- Callbacks. Defaults to [].
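
How the stopping conditions interact with `train_interval` and `train_repeat` can be sketched with a stripped-down loop. `run_training_loop` is hypothetical and covers only `max_steps` and `max_train_count`. The sketch assumes that a non-positive limit means "no limit" and that limits are checked once per step, neither of which is stated in the parameter list above.

```python
def run_training_loop(max_steps=0, max_train_count=0, train_interval=1, train_repeat=1):
    """Hypothetical loop: returns (steps taken, number of trainer updates)."""

    def reached(value, limit):
        # Assumption: a non-positive limit disables that stopping condition.
        return limit > 0 and value >= limit

    if max_steps <= 0 and max_train_count <= 0:
        raise ValueError("set at least one stopping condition")

    step = train_count = 0
    while not (reached(step, max_steps) or reached(train_count, max_train_count)):
        step += 1  # one environment step
        if step % train_interval == 0:
            for _ in range(train_repeat):
                train_count += 1  # one trainer update
    return step, train_count


print(run_training_loop(max_steps=10, train_interval=2, train_repeat=3))
```

With these settings, training runs on every second step and performs three updates each time, so ten steps yield fifteen updates.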

rollout(max_episodes: int = -1, timeout: float = -1, max_steps: int = -1, max_memory: int = -1, players: Sequence[None | str | Tuple[str, dict] | RLConfig | Tuple[RLConfig, Any]] = [], shuffle_player: bool = True, enable_progress: bool = True, callbacks: List[RunCallback] = []): collect_memory

train_only(timeout: float = -1, max_train_count: int = -1, enable_progress: bool = True, callbacks: List[RunCallback] = []): Only the Trainer trains; no simulation is run by Workers.

train_mp(actor_num: int = 1, queue_capacity: int = 1000, trainer_parameter_send_interval: float = 1, actor_parameter_sync_interval: float = 1, actor_devices: str | List[str] = 'AUTO', enable_mp_memory: bool = True, train_to_mem_queue_capacity: int = 100, mem_to_train_queue_capacity: int = 5, return_memory_data: bool = False, return_memory_timeout: int = 3600, initial_parameter_sharing: bool = True, initial_memory_sharing: bool = False, timeout: float = -1, max_train_count: int = -1, players: Sequence[None | str | Tuple[str, dict] | RLConfig | Tuple[RLConfig, Any]] = [], shuffle_player: bool = True, enable_progress: bool = True, callbacks: List[RunCallback] = [], used_context: RunContext | None = None): Trains with distributed learning using multiprocessing.

evaluate(max_episodes: int = 10, timeout: float = -1, max_steps: int = -1, players: Sequence[None | str | Tuple[str, dict] | RLConfig | Tuple[RLConfig, Any]] = [], shuffle_player: bool = True, enable_progress: bool = True, callbacks: List[RunCallback] = []) → List[float] | List[List[float]]

Runs simulations and returns the rewards.

Parameters:

max_episodes (int, optional) -- Number of episodes before stopping. Defaults to 10.
timeout (float, optional) -- Time before stopping, in seconds. Defaults to -1.
max_steps (int, optional) -- Total number of steps before stopping. Defaults to -1.
players (PlayerTypes, optional) -- Algorithms for the other players in environments with two or more players.
shuffle_player (bool, optional) -- Whether to shuffle players. Defaults to True.
enable_progress (bool, optional) -- Whether to display progress. Defaults to True.
callbacks (List[RunCallback], optional) -- Callbacks. Defaults to [].

Returns:

List[float] if there is one player, or List[List[float]] if there are several.

Return type:

Union[List[float], List[List[float]]]
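
Because the return value is either flat or nested, calling code usually has to branch on its shape. The sketch below shows one way to do that. `summarize_rewards` is a hypothetical helper, and since the text above does not say what each inner list represents in the multi-player case, the helper only reports the mean of each inner list without interpreting it.

```python
from statistics import mean


def summarize_rewards(rewards):
    """Hypothetical helper: mean of a flat list, or one mean per inner list."""
    if not rewards:
        return []
    if isinstance(rewards[0], (list, tuple)):
        # Nested result (several players): one mean per inner list.
        return [mean(inner) for inner in rewards]
    # Flat result (single player): a single mean.
    return [mean(rewards)]


print(summarize_rewards([1.0, 0.0, 1.0, 1.0]))
print(summarize_rewards([[1.0, -1.0], [0.0, 0.0]]))
```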

model_summary(expand_nested: bool = True, **kwargs) → RLParameter

Returns a summary of the model. This is equivalent to:

>>> parameter = runner.make_parameter()
>>> parameter.summary()

Parameters: expand_nested (bool) -- TensorFlow option.
Returns: RLParameter
Return type: RLParameter