rlhfblender.data_collection package

Submodules

rlhfblender.data_collection.babyai_connector module

rlhfblender.data_collection.demo_session module

rlhfblender.data_collection.demo_session.check_socket_connection(session_id)
Parameters:

session_id (str)

Return type:

bool
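This reference does not say how `check_socket_connection` maps a session id to a port, or what counts as a live connection. The sketch below shows the underlying technique under two assumptions: the check is a plain TCP connect attempt, and the host and port are already known. The helper `can_connect` and its parameters are hypothetical and are not part of rlhfblender.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper illustrating a connection check; not rlhfblender's code.
    """
    try:
        # create_connection raises an OSError subclass (e.g. ConnectionRefusedError
        # or a timeout) when nothing accepts connections on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A connect probe like this only shows that something is listening. It does not confirm that the listener is the expected environment process.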

rlhfblender.data_collection.demo_session.close_demo_session(session_id, pid)

Send a close command via the socket to the environment, return the results, and terminate the environment process.

Parameters:
  • session_id (str) – The unique id of the session.

  • pid (int) – The process id of the environment.
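`close_demo_session` also terminates the environment process by its PID. The sketch below covers only that termination step, using the standard library. It assumes the environment runs as a separate OS process whose PID is known. `terminate_pid` is a hypothetical helper, and rlhfblender may stop the process differently.

```python
import os
import signal


def terminate_pid(pid: int) -> bool:
    """Ask the process with the given PID to exit by sending SIGTERM.

    Returns False if no such process exists (POSIX behaviour).
    Hypothetical helper; not rlhfblender's code.
    """
    try:
        # On Windows, os.kill with SIGTERM calls TerminateProcess instead.
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        # The process has already exited, or the PID was never valid.
        return False
    return True
```

Sending the close command first gives the environment a chance to shut down cleanly. SIGTERM then acts as a fallback if the process is still running.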

async rlhfblender.data_collection.demo_session.create_new_session(session_id, gym_env, seed)

Create a new session as an asynchronous process. In the process, initialize a gym environment and wait for commands via a pipe.

Parameters:
  • session_id (str) – The unique id of the session.

  • gym_env (str) – The gym environment id.

  • seed (str | int) – The seed for the environment.
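The docstring describes an asynchronous process that initializes an environment and then waits for commands via a pipe. The sketch below shows one way to realize that pattern, and only that pattern:

- It uses `asyncio.create_subprocess_exec` to start a child Python process.
- It exchanges one JSON message per line over the child's stdin/stdout pipes.
- The child is a toy step counter standing in for a gym environment.

The command names (`step`, `close`) and the helpers `start_session` and `send` are assumptions, not rlhfblender's protocol.

```python
import asyncio
import json
import sys

# Toy child program (hypothetical): reads one JSON command per line from stdin and
# writes one JSON reply per line to stdout. A real session would hold a gym env here.
CHILD_SOURCE = r"""
import json, sys
steps = 0
while True:
    line = sys.stdin.readline()
    if not line:
        break
    command = json.loads(line)
    if command["command"] == "close":
        print(json.dumps({"closed": True}), flush=True)
        break
    steps += 1
    print(json.dumps({"step": steps, "action": command["action"]}), flush=True)
"""


async def start_session() -> asyncio.subprocess.Process:
    """Start the toy environment process with piped stdin and stdout."""
    return await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_SOURCE,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )


async def send(proc: asyncio.subprocess.Process, message: dict) -> dict:
    """Write one JSON command to the child and await its one-line JSON reply."""
    proc.stdin.write((json.dumps(message) + "\n").encode())
    await proc.stdin.drain()
    line = await proc.stdout.readline()
    return json.loads(line)


async def demo() -> list:
    """Start a session, perform one step, close it, and return both replies."""
    proc = await start_session()
    replies = [
        await send(proc, {"command": "step", "action": 1}),
        await send(proc, {"command": "close"}),
    ]
    proc.stdin.close()
    await proc.wait()
    return replies
```

`asyncio.run(demo())` should return `[{"step": 1, "action": 1}, {"closed": True}]`. Because the child runs in its own process, the event loop that started it is never blocked by environment steps.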

rlhfblender.data_collection.demo_session.demo_perform_step(session_id, action)

Send a step command via the socket to the environment and return the results.

Parameters:
  • session_id (str) – The unique id of the session.

  • action (int | list[float]) – The action to take in the environment.

Return type:

dict
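The wire format behind `demo_perform_step` is not documented here. The sketch below assumes a simple protocol: one JSON object per line over an already connected socket. Under that assumption, a step request and its dict reply could look like this. `perform_step` and the message fields are hypothetical.

```python
import json
import socket


def perform_step(sock: socket.socket, action) -> dict:
    """Send a step command over a connected socket and return the decoded reply.

    Hypothetical newline-delimited JSON protocol; not rlhfblender's actual format.
    Assumes the peer sends exactly one reply line per request.
    """
    # json.dumps handles both documented action types: int and list[float].
    message = {"command": "step", "action": action}
    sock.sendall((json.dumps(message) + "\n").encode("utf-8"))
    # Read until the newline that terminates the reply.
    buffer = b""
    while not buffer.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("environment closed the connection")
        buffer += chunk
    return json.loads(buffer)
```

Newline framing is the simplest choice for a request/response exchange. A length prefix would be needed if replies could contain raw newlines, for example embedded binary renders.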

rlhfblender.data_collection.demo_session.find_available_port(start_port=65432, max_attempts=100)

Find an available port, starting from the given port.

Parameters:
  • start_port – The starting port number to check.

  • max_attempts – The maximum number of ports to try.

Returns:

An available port number.
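A common way to implement this kind of port search is to try binding each candidate port in turn and return the first one that succeeds. The sketch below assumes a successful bind on localhost is an adequate availability test. `find_free_port` mirrors the documented parameters but is a hypothetical stand-in, not rlhfblender's code.

```python
import socket


def find_free_port(start_port: int = 65432, max_attempts: int = 100) -> int:
    """Return the first port in [start_port, start_port + max_attempts) that can be bound."""
    # Clamp to the valid TCP port range (ports stop at 65535).
    for port in range(start_port, min(start_port + max_attempts, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind(("127.0.0.1", port))
            except OSError:
                # Port already in use (or not permitted); try the next one.
                continue
            # The probe socket is closed on leaving the with-block, freeing the port.
            return port
    raise RuntimeError(f"no free port found starting at {start_port}")
```

This check has an inherent race. Another process can claim the port between the probe and the real bind, so callers should still handle a failed bind.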

rlhfblender.data_collection.demo_session.run_env_session(session_id, demo_number, gym_env, seed)

Blocking loop that initializes a gym environment and waits for commands via a socket.

Parameters:
  • session_id (str) – The unique id of the session (used for the socket port).

  • demo_number (int) – The index of the demo.

  • gym_env (str) – The gym environment id.

  • seed (str | int) – The seed for the environment.
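A blocking command loop like the one described for `run_env_session` can be sketched over an already connected socket. The sketch makes several assumptions:

- Messages use a hypothetical framing of one JSON object per line.
- `CounterEnv` is a toy stand-in for a gym environment.
- The commands `reset`, `step`, and `close` and the helper `serve_commands` are illustrative, not rlhfblender's implementation.
- Binding the port derived from `session_id` is omitted.

```python
import json
import socket


class CounterEnv:
    """Toy stand-in for a gym environment: the observation counts steps since reset."""

    def __init__(self, seed: int = 0):
        self.seed = seed  # Stored for illustration; the toy dynamics ignore it.
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3  # Episodes end after three steps.
        return self.t, 1.0, done


def serve_commands(conn: socket.socket, env: CounterEnv) -> int:
    """Handle reset/step/close commands until 'close' arrives; return how many were handled."""
    handled = 0
    with conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:  # Blocks until the next command line arrives.
            command = json.loads(line)
            handled += 1
            if command["command"] == "close":
                reply = {"closed": True}
            elif command["command"] == "reset":
                reply = {"obs": env.reset()}
            else:
                obs, reward, done = env.step(command["action"])
                reply = {"obs": obs, "reward": reward, "done": done}
            conn.sendall((json.dumps(reply) + "\n").encode("utf-8"))
            if command["command"] == "close":
                break
    return handled
```

Because the loop blocks on reads, it is typically run in its own process or thread, one per session.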

rlhfblender.data_collection.environment_handler module

rlhfblender.data_collection.episode_recorder module

rlhfblender.data_collection.feedback_model module

rlhfblender.data_collection.feedback_model_handler module

rlhfblender.data_collection.feedback_translator module

rlhfblender.data_collection.framework_selector module

rlhfblender.data_collection.imitation_connector module

rlhfblender.data_collection.metrics_processor module

rlhfblender.data_collection.metrics_processor.process_metrics(benchmark_results)

Compute additional metrics on a per-model, per-benchmark basis.

Parameters:

benchmark_results (RecordedEpisodesContainer) – Container of benchmark results.

Returns:

The computed metrics.

Return type:

dict
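This reference does not list which metrics `process_metrics` computes. As an illustration only, the sketch below assumes the wanted metrics are summary statistics over the container's `episode_rewards` and `episode_lengths` fields. `summarize_episodes` and the metric names are hypothetical.

```python
import statistics
from typing import Sequence


def summarize_episodes(episode_rewards: Sequence[float], episode_lengths: Sequence[int]) -> dict:
    """Return summary statistics over per-episode rewards and lengths."""
    rewards = [float(r) for r in episode_rewards]  # Also accepts 1-D NumPy arrays.
    lengths = [int(n) for n in episode_lengths]
    if not rewards or len(rewards) != len(lengths):
        raise ValueError("need matching, non-empty reward and length sequences")
    return {
        "mean_reward": statistics.fmean(rewards),
        # pstdev is the population standard deviation, defined even for one episode.
        "std_reward": statistics.pstdev(rewards),
        "min_reward": min(rewards),
        "max_reward": max(rewards),
        "mean_length": statistics.fmean(lengths),
        "num_episodes": len(rewards),
    }
```

For rewards `[1.0, 3.0]` and lengths `[10, 20]`, this yields a mean reward of 2.0, a standard deviation of 1.0, and a mean length of 15.0.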

rlhfblender.data_collection.sampler module

rlhfblender.data_collection.sb_zoo_connector module

Module contents

class rlhfblender.data_collection.RecordedEpisodesContainer(obs: numpy.ndarray, rewards: numpy.ndarray, dones: numpy.ndarray, actions: numpy.ndarray, infos: numpy.ndarray, renders: numpy.ndarray, features: numpy.ndarray, probs: numpy.ndarray, episode_rewards: numpy.ndarray, episode_lengths: numpy.ndarray, additional_metrics: numpy.ndarray)

Bases: object

Parameters:
  • obs (ndarray)

  • rewards (ndarray)

  • dones (ndarray)

  • actions (ndarray)

  • infos (ndarray)

  • renders (ndarray)

  • features (ndarray)

  • probs (ndarray)

  • episode_rewards (ndarray)

  • episode_lengths (ndarray)

  • additional_metrics (ndarray)

actions: ndarray
additional_metrics: ndarray
dones: ndarray
episode_lengths: ndarray
episode_rewards: ndarray
features: ndarray
infos: ndarray
obs: ndarray
probs: ndarray
renders: ndarray
rewards: ndarray
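The container holds per-step arrays (such as `obs`, `rewards`, `dones`, and `actions`) alongside per-episode arrays (`episode_rewards` and `episode_lengths`). The sketch below derives the per-episode values from the per-step `rewards` and `dones`. It assumes a flat step layout in which `dones[i]` is True on the last step of each episode, which this reference does not confirm. `episode_stats_from_steps` is a hypothetical helper, not part of rlhfblender.

```python
import numpy as np


def episode_stats_from_steps(rewards: np.ndarray, dones: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split flat per-step rewards at episode ends; return (episode_rewards, episode_lengths).

    Assumes dones[i] is True on the final step of each episode. A trailing
    unfinished episode (one with no closing done flag) is dropped.
    """
    rewards = np.asarray(rewards, dtype=float)
    ends = np.flatnonzero(np.asarray(dones, dtype=bool))  # Indices of final steps.
    starts = np.concatenate(([0], ends[:-1] + 1))  # Each episode starts after the previous end.
    lengths = ends - starts + 1
    # A prefix sum lets each episode's total reward be read off as a difference.
    cumulative = np.concatenate(([0.0], np.cumsum(rewards)))
    returns = cumulative[ends + 1] - cumulative[starts]
    return returns, lengths
```

With rewards `[1, 1, 1, 2, 2]` and dones `[False, False, True, False, True]`, this gives episode rewards `[3.0, 4.0]` and episode lengths `[3, 2]`.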