botorch.models
Lazy-loading module for botorch.models.
Submodules are imported on first access via __getattr__ (PEP 562),
so heavy transitive dependencies (e.g. JAX via fully_bayesian) are
never loaded unless explicitly requested.
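The lazy-loading mechanism described above can be sketched with the standard library alone. This is a minimal illustration of a PEP 562 module-level __getattr__, not BoTorch's actual loader; the package name lazy_pkg and the attribute-to-module map are hypothetical.

```python
import importlib
import types

# Hypothetical mapping from attribute name to the module that backs it.
_LAZY_SUBMODULES = {"json_tools": "json", "math_tools": "math"}


def make_lazy_package(name: str) -> types.ModuleType:
    """Build a module whose listed attributes are imported on first access."""
    pkg = types.ModuleType(name)

    def __getattr__(attr: str):
        # Python calls this only when normal attribute lookup on the module fails.
        try:
            target = _LAZY_SUBMODULES[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        module = importlib.import_module(target)
        setattr(pkg, attr, module)  # cache, so later lookups skip __getattr__
        return module

    pkg.__getattr__ = __getattr__
    return pkg


lazy_pkg = make_lazy_package("lazy_pkg")
loaded_before = "json_tools" in vars(lazy_pkg)
json_mod = lazy_pkg.json_tools  # triggers the import
loaded_after = "json_tools" in vars(lazy_pkg)
```

Caching the imported module on the package means the import cost is paid once, and attributes that are never touched are never imported.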
Model APIs
Base Model API
Abstract base module for all BoTorch models.
This module contains Model, the abstract base class for all BoTorch models,
and ModelList, a container for a list of Models.
- class botorch.models.model.Model(*args, **kwargs)[source]
Bases: Module, ABC
Abstract base class for BoTorch models.
The Model base class cannot be used directly; it only defines an API for other BoTorch models.
Model subclasses torch.nn.Module. While a Module is most typically encountered as a representation of a neural network layer, it can be used more generally: see the documentation on custom NN Modules (https://pytorch.org/tutorials/beginner/examples_nn/polynomial_module.html).
Module provides several pieces of useful functionality: a Model’s attributes of Tensor or Module type are automatically registered so they can be moved and/or cast with the to method, automatically differentiated, and used with CUDA.
- Parameters:
args (Any)
kwargs (Any)
- _has_transformed_inputs
A boolean denoting whether train_inputs are currently stored as transformed or not.
- Type:
bool
- _original_train_inputs
A Tensor storing the original train inputs for use in _revert_to_original_inputs. Note that this is necessary since the transform / untransform cycle introduces numerical errors which lead to upstream errors during training.
- Type:
torch.Tensor | None
- _is_fully_bayesian
Returns True if this is a fully Bayesian model.
- _is_ensemble
Returns True if this model consists of multiple models that are stored in an additional batch dimension. This is true for the fully Bayesian models.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- abstractmethod posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]
Computes the posterior over model outputs at the provided points.
- Note: The input transforms should be applied here using self.transform_inputs(X) after the self.eval() call and before any model.forward or model.likelihood calls.
- Parameters:
X (Tensor) – A b x q x d-dim Tensor, where d is the dimension of the feature space, q is the number of points considered jointly, and b is the batch dimension.
output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.
observation_noise (bool | Tensor) – For models with an inferred noise level, if True, include observation noise. For models with an observed noise level, this must be a model_batch_shape x 1 x m-dim tensor or a model_batch_shape x n' x m-dim tensor containing the average noise for each batch and output. The noise must be in the outcome-transformed space if an outcome transform is used.
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
- Returns:
A Posterior object, representing a batch of b joint distributions over q points and m outputs each.
- Return type:
Posterior
- property batch_shape: Size
The batch shape of the model.
This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.
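The shape arithmetic in the batch_shape description can be checked with plain tuples. The sketch below assumes standard right-aligned broadcasting semantics; broadcast_shapes and posterior_output_shape are hypothetical helper names and are not part of BoTorch.

```python
from itertools import zip_longest


def broadcast_shapes(a: tuple, b: tuple) -> tuple:
    """Broadcast two shapes, aligning dimensions from the right."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # A size-1 dimension stretches to match the other one.
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


def posterior_output_shape(test_batch_shape, model_batch_shape, q, m):
    """Shape broadcast(test_batch_shape, model_batch_shape) x q x m."""
    return broadcast_shapes(test_batch_shape, model_batch_shape) + (q, m)


# A 5 x 1 x q x d input evaluated on a model with batch shape (3,).
shape = posterior_output_shape((5, 1), (3,), q=4, m=2)
```

Here the test batch shape (5, 1) and the model batch shape (3,) broadcast to (5, 3), so the output shape is 5 x 3 x 4 x 2.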
- property num_outputs: int
The number of outputs of the model.
- subset_output(idcs)[source]
Subset the model along the output dimension.
- Parameters:
idcs (list[int]) – The output indices to subset the model to.
- Returns:
A Model object of the same type and with the same parameters as the current model, subset to the specified output indices.
- Return type:
Model
- condition_on_observations(X, Y, **kwargs)[source]
Condition the model on new observations.
- Parameters:
X (Tensor) – A batch_shape x n' x d-dim Tensor, where d is the dimension of the feature space, n' is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
Y (Tensor) – A batch_shape' x n' x m-dim Tensor, where m is the number of model outputs, n' is the number of points per batch, and batch_shape' is the batch shape of the observations. batch_shape' must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
kwargs (Any)
- Returns:
A Model object of the same type, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).
- Return type:
Model
- classmethod construct_inputs(training_data)[source]
Construct Model keyword arguments from a SupervisedDataset.
- Parameters:
training_data (SupervisedDataset) – A SupervisedDataset, with attributes train_X, train_Y, and, optionally, train_Yvar.
- Returns:
A dict of keyword arguments that can be used to initialize a Model, with keys train_X, train_Y, and, optionally, train_Yvar.
- Return type:
dict[str, BotorchContainer | Tensor]
- transform_inputs(X, input_transform=None)[source]
Transform inputs.
- Parameters:
X (Tensor) – A tensor of inputs
input_transform (Module | None) – A Module that performs the input transformation.
- Returns:
A tensor of transformed inputs
- Return type:
Tensor
- train(mode=True)[source]
Put the model in train mode. Reverts to the original inputs if in train mode (mode=True) or sets transformed inputs if in eval mode (mode=False).
- Parameters:
mode (bool) – A boolean denoting whether to put the model in train or eval mode. If False, the model is put in eval mode.
- Return type:
Self
- property dtypes_of_buffers: set[dtype]
- class botorch.models.model.FantasizeMixin[source]
Bases: ABC
Mixin to add a fantasize method to a Model.
Example
class BaseModel:
    def __init__(self, …):
    def condition_on_observations(self, …):
    def posterior(self, …):
    def transform_inputs(self, …):

class ModelThatCanFantasize(BaseModel, FantasizeMixin):
    def __init__(self, args):
        super().__init__(args)

model = ModelThatCanFantasize(…)
model.fantasize(X)
- abstractmethod condition_on_observations(X, Y)[source]
Classes that inherit from FantasizeMixin must implement a condition_on_observations method.
- Parameters:
X (Tensor)
Y (Tensor)
- Return type:
Self
- abstractmethod posterior(X, *args, observation_noise=False)[source]
Classes that inherit from FantasizeMixin must implement a posterior method.
- Parameters:
X (Tensor)
observation_noise (bool)
- Return type:
Posterior
- abstractmethod transform_inputs(X, input_transform=None)[source]
Classes that inherit from FantasizeMixin must implement a transform_inputs method.
- Parameters:
X (Tensor)
input_transform (Module | None)
- Return type:
Tensor
- fantasize(X, sampler, observation_noise=None, **kwargs)[source]
Construct a fantasy model.
Constructs a fantasy model in the following fashion: (1) compute the model posterior at X, including observation noise. If observation_noise is a Tensor, use it directly as the observation noise to add. (2) sample from this posterior (using sampler) to generate “fake” observations. (3) condition the model on the new fake observations.
- Parameters:
X (Tensor) – A batch_shape x n' x d-dim Tensor, where d is the dimension of the feature space, n' is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
sampler (MCSampler) – The sampler used for sampling from the posterior at X.
observation_noise (Tensor | None) – A model_batch_shape x 1 x m-dim tensor or a model_batch_shape x n' x m-dim tensor containing the average noise for each batch and output, where m is the number of outputs. The noise must be in the outcome-transformed space if an outcome transform is used. If None and using an inferred noise likelihood, the noise will be the inferred noise level. If using a fixed noise likelihood, the mean across the observation noise in the training data is used as observation noise.
kwargs (Any) – Will be passed to model.condition_on_observations
- Returns:
The constructed fantasy model.
- Return type:
Self
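The three fantasize steps can be illustrated with a toy, pure-Python stand-in. Everything here is hypothetical: ToyFantasizeMixin and ToyMeanModel are not BoTorch classes, the "posterior" is just a constant mean with a fixed noise level, and transform_inputs is left out for brevity. It is meant only to show how a mixin can build fantasize out of posterior and condition_on_observations.

```python
import random
from abc import ABC, abstractmethod


class ToyFantasizeMixin(ABC):
    """Hypothetical mixin: composes two abstract methods into fantasize."""

    @abstractmethod
    def posterior(self, X, observation_noise=False): ...

    @abstractmethod
    def condition_on_observations(self, X, Y): ...

    def fantasize(self, X, rng):
        # (1) posterior at X, including observation noise
        mean, std = self.posterior(X, observation_noise=True)
        # (2) sample "fake" observations from that posterior
        fake_Y = [rng.gauss(mu, std) for mu in mean]
        # (3) condition the model on the fake observations
        return self.condition_on_observations(X, fake_Y)


class ToyMeanModel(ToyFantasizeMixin):
    """Predicts the mean of the observed Y everywhere (toy model)."""

    def __init__(self, X, Y, noise_std=0.1):
        self.X, self.Y, self.noise_std = list(X), list(Y), noise_std

    def posterior(self, X, observation_noise=False):
        mu = sum(self.Y) / len(self.Y)
        std = self.noise_std if observation_noise else 0.0
        return [mu for _ in X], std

    def condition_on_observations(self, X, Y):
        # Returns a new model; the original is left untouched.
        return ToyMeanModel(self.X + list(X), self.Y + list(Y), self.noise_std)


model = ToyMeanModel([0.0, 1.0], [1.0, 3.0])
fantasy = model.fantasize([0.5, 0.7], random.Random(0))
```

The fantasy model holds the two original observations plus two sampled ones, while the original model is unchanged.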
- class botorch.models.model.ModelList(*models)[source]
Bases: Model
A multi-output Model represented by a list of independent models.
All BoTorch models are acceptable as inputs. The cost of this flexibility is that ModelList does not support all methods that may be implemented by its component models. One use case for ModelList is combining a regression model and a deterministic model in one multi-output container model, e.g. for cost-aware or multi-objective optimization where one of the outcomes is a deterministic function of the inputs.
- Parameters:
*models (Model) – A variable number of models.
Example
>>> m_1 = SingleTaskGP(train_X, train_Y)
>>> m_2 = GenericDeterministicModel(lambda x: x.sum(dim=-1, keepdim=True))
>>> m_12 = ModelList(m_1, m_2)
>>> m_12.posterior(test_X)
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]
Computes the posterior over model outputs at the provided points.
- Note: The input transforms should be applied here using self.transform_inputs(X) after the self.eval() call and before any model.forward or model.likelihood calls.
- Parameters:
X (Tensor) – A b x q x d-dim Tensor, where d is the dimension of the feature space, q is the number of points considered jointly, and b is the batch dimension.
output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.
observation_noise (bool | Tensor) – If True, add the observation noise from the respective likelihoods to the posterior. If a Tensor of shape (batch_shape) x q x m, use it directly as the observation noise (with observation_noise[...,i] added to the posterior of the i-th model). observation_noise is assumed to be in the outcome-transformed space, if an outcome transform is used by the model.
posterior_transform (Callable[[PosteriorList], Posterior] | None) – An optional PosteriorTransform.
- Returns:
A Posterior object, representing a batch of b joint distributions over q points and m outputs each.
- Return type:
Posterior
- property batch_shape: Size
The batch shape of the model.
This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.
- property num_outputs: int
The number of outputs of the model.
Equal to the sum of the number of outputs of the individual models in the ModelList.
- subset_output(idcs)[source]
Subset the model along the output dimension.
- Parameters:
idcs (list[int]) – The output indices to subset the model to. Relative to the overall number of outputs of the model.
- Returns:
A Model (either a ModelList or one of the submodels) with the outputs subset to the indices in idcs.
- Return type:
Model
Internally, this drops (if single-output) or subsets (if multi-output) the constituent models and returns them as a ModelList. If the result is a single (possibly subset) model from the list, returns this model (instead of forming a degenerate single-model ModelList). For instance, if m = ModelList(m1, m2) with m1 a two-output model and m2 a single-output model, then m.subset_output([1]) will return the model m1 subset to its second output.
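The index bookkeeping behind this behavior can be sketched without any models: global output indices are mapped to a constituent model and a local output index. The helper group_output_indices is hypothetical and only mirrors the description above, not the library's implementation.

```python
def group_output_indices(num_outputs_per_model, idcs):
    """Map global output indices to {model position: [local output indices]}."""
    # offsets[k] is the global index of model k's first output.
    offsets, total = [], 0
    for n in num_outputs_per_model:
        offsets.append(total)
        total += n

    grouped = {}
    for idx in idcs:
        if not 0 <= idx < total:
            raise IndexError(f"output index {idx} out of range for {total} outputs")
        # The owning model is the last one whose offset does not exceed idx.
        pos = max(k for k, off in enumerate(offsets) if off <= idx)
        grouped.setdefault(pos, []).append(idx - offsets[pos])
    return grouped


# m1 has two outputs, m2 has one, as in the example above.
only_m1 = group_output_indices([2, 1], [1])
both = group_output_indices([2, 1], [0, 2])
```

For idcs = [1] only the first model is involved (its second output), which corresponds to the case where a single subset model is returned instead of a one-element ModelList.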
- transform_inputs(X)[source]
Individually transform the inputs for each model.
- Parameters:
X (Tensor) – A tensor of inputs.
- Returns:
A list of tensors of transformed inputs.
- Return type:
list[Tensor]
- load_state_dict(state_dict, strict=True, keep_transforms=True, assign=False)[source]
Initialize the fully Bayesian models before loading the state dict.
- Parameters:
state_dict (Mapping[str, Any])
strict (bool)
keep_transforms (bool)
assign (bool)
- Return type:
None
- fantasize(X, sampler, observation_noise=None, evaluation_mask=None, **kwargs)[source]
Construct a fantasy model.
Constructs a fantasy model in the following fashion: (1) compute the model posterior at X (including observation noise if observation_noise=True). (2) sample from this posterior (using sampler) to generate “fake” observations. (3) condition the model on the new fake observations.
- Parameters:
X (Tensor) – A batch_shape x n' x d-dim Tensor, where d is the dimension of the feature space, n' is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
sampler (MCSampler) – The sampler used for sampling from the posterior at X. If evaluation_mask is not None, this must be a ListSampler.
observation_noise (Tensor | None) – A model_batch_shape x 1 x m-dim tensor or a model_batch_shape x n' x m-dim tensor containing the average noise for each batch and output, where m is the number of outputs. The noise must be in the outcome-transformed space if an outcome transform is used. If None, then the noise will be the inferred noise level.
evaluation_mask (Tensor | None) – An n' x m-dim tensor of booleans indicating which outputs should be fantasized for a given design. This uses the same evaluation mask for all batches.
kwargs (Any)
- Returns:
The constructed fantasy model.
- Return type:
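To make the evaluation_mask layout concrete, the sketch below reads an n' x m boolean mask (rows are designs, columns are outputs) and lists, for each output, the designs that would be fantasized. It uses nested lists instead of tensors, and designs_per_output is a hypothetical helper, so treat it as an illustration of the mask semantics only.

```python
def designs_per_output(evaluation_mask):
    """For each output j, return the design indices i with mask[i][j] set."""
    n_designs = len(evaluation_mask)
    n_outputs = len(evaluation_mask[0]) if n_designs else 0
    return {
        j: [i for i in range(n_designs) if evaluation_mask[i][j]]
        for j in range(n_outputs)
    }


# n' = 3 designs (rows), m = 2 outputs (columns).
mask = [
    [True, False],
    [True, True],
    [False, True],
]
selected = designs_per_output(mask)
```

In this example output 0 is fantasized at designs 0 and 1, and output 1 at designs 1 and 2.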
- class botorch.models.model.ModelDict(**models)[source]
Bases: ModuleDict
A lightweight container mapping model names to models.
Initialize a ModelDict.
- Parameters:
models (Model) – An arbitrary number of models. Each model can be any type of BoTorch Model, including multi-output models and ModelList.
GPyTorch Model API
Abstract model class for all GPyTorch-based botorch models.
To implement your own, simply inherit from both the provided classes and a GPyTorch Model class such as an ExactGP.
- class botorch.models.gpytorch.GPyTorchModel(*args, **kwargs)[source]
Bases: Model, ABC
Abstract base class for models based on GPyTorch models.
The easiest way to use this is to subclass a model from a GPyTorch model class (e.g. an ExactGP) and this GPyTorchModel. See e.g. SingleTaskGP.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
args (Any)
kwargs (Any)
- likelihood: Likelihood
- property batch_shape: Size
The batch shape of the model.
This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.
- property num_outputs: int
The number of outputs of the model.
- posterior(X, observation_noise=False, posterior_transform=None, **kwargs)[source]
Computes the posterior over model outputs at the provided points.
- Parameters:
X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.
observation_noise (bool | Tensor) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q). It is assumed to be in the outcome-transformed space if an outcome transform is used.
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
kwargs (Any)
- Returns:
A GPyTorchPosterior object, representing a batch of b joint distributions over q points. Includes observation noise if specified.
- Return type:
- condition_on_observations(X, Y, noise=None, **kwargs)[source]
Condition the model on new observations.
- Parameters:
X (Tensor) – A batch_shape x n' x d-dim Tensor, where d is the dimension of the feature space, n' is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
Y (Tensor) – A batch_shape' x n' x m-dim Tensor, where m is the number of model outputs, n' is the number of points per batch, and batch_shape' is the batch shape of the observations. batch_shape' must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
noise (Tensor | None) – If not None, a tensor of the same shape as Y representing the associated noise variance.
kwargs (Any) – Passed to self.get_fantasy_model.
- Returns:
A Model object of the same type, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).
- Return type:
Model
Example
>>> train_X = torch.rand(20, 2)
>>> train_Y = torch.sin(train_X[:, :1]) + torch.cos(train_X[:, 1:])
>>> model = SingleTaskGP(train_X, train_Y)
>>> model.eval()
>>> test_X = torch.rand(10, 2)
>>> # Need to evaluate once to fill test independent caches
>>> # so that condition_on_observations works.
>>> model(test_X)
>>> new_X = torch.rand(5, 2)
>>> new_Y = torch.sin(new_X[:, :1]) + torch.cos(new_X[:, 1:])
>>> model = model.condition_on_observations(X=new_X, Y=new_Y)
- load_state_dict(state_dict, strict=True, keep_transforms=True, assign=False)[source]
Load the model state.
- Parameters:
state_dict (Mapping[str, Any]) – A dict containing the state of the model.
strict (bool) – A boolean indicating whether to strictly enforce that the keys in state_dict match the keys returned by this module's state_dict() function.
keep_transforms (bool) – A boolean indicating whether to keep the input and outcome transforms. Doing so is useful when loading a model that was trained on a full set of data, and is later loaded with a subset of the data.
assign (bool) – When set to False, the properties of the tensors in the current module are preserved, whereas setting it to True preserves properties of the Tensors in the state dict. The only exception is the requires_grad field of Parameter, for which the value from the module is preserved. Default: False.
- Return type:
None
- class botorch.models.gpytorch.BatchedMultiOutputGPyTorchModel(*args, **kwargs)[source]
Bases: GPyTorchModel
Base class for batched multi-output GPyTorch models with independent outputs.
This model should be used when the same training data is used for all outputs. Outputs are modeled independently by using a different batch for each output.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
args (Any)
kwargs (Any)
- static get_batch_dimensions(train_X, train_Y)[source]
Get the raw batch shape and output-augmented batch shape of the inputs.
- Parameters:
train_X (Tensor) – An n x d or batch_shape x n x d (batch mode) tensor of training features.
train_Y (Tensor) – An n x m or batch_shape x n x m (batch mode) tensor of training observations.
- Returns:
2-element tuple containing
- The input_batch_shape
- The output-augmented batch shape: input_batch_shape x (m)
- Return type:
tuple[Size, Size]
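The two returned shapes can be illustrated on plain shape tuples. The sketch assumes that the parentheses in input_batch_shape x (m) mean the output dimension is appended only when m > 1; that reading, and the helper name batch_dimensions, are assumptions rather than a statement of the library's exact behavior.

```python
def batch_dimensions(train_X_shape, train_Y_shape):
    """Return (input_batch_shape, output-augmented batch shape) as tuples."""
    # Everything before the trailing n x d dimensions is the input batch shape.
    input_batch_shape = tuple(train_X_shape[:-2])
    m = train_Y_shape[-1]
    # Assumption: the output dimension is only appended for multi-output data.
    aug_batch_shape = input_batch_shape + ((m,) if m > 1 else ())
    return input_batch_shape, aug_batch_shape


single_output = batch_dimensions((20, 2), (20, 1))
multi_output = batch_dimensions((20, 2), (20, 3))
batched_multi_output = batch_dimensions((4, 20, 2), (4, 20, 3))
```

With batch_shape = (4,) and m = 3, the augmented shape is (4, 3): one independent batch element per output, which matches the "different batch for each output" description of this class.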
- property batch_shape: Size
The batch shape of the model.
This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]
Computes the posterior over model outputs at the provided points.
- Parameters:
X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.
output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.
observation_noise (bool | Tensor) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q x m).
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
- Returns:
A GPyTorchPosterior object, representing batch_shape joint distributions over q points and the outputs selected by output_indices each. Includes observation noise if specified.
- Return type:
- condition_on_observations(X, Y, **kwargs)[source]
Condition the model on new observations.
- Parameters:
X (Tensor) – A batch_shape x n' x d-dim Tensor, where d is the dimension of the feature space, n' is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
Y (Tensor) – A batch_shape' x n' x m-dim Tensor, where m is the number of model outputs, n' is the number of points per batch, and batch_shape' is the batch shape of the observations. batch_shape' must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
kwargs (Any)
- Returns:
A BatchedMultiOutputGPyTorchModel object of the same type with n + n' training examples, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).
- Return type:
Example
>>> train_X = torch.rand(20, 2)
>>> train_Y = torch.stack(
...     [torch.sin(train_X[:, 0]), torch.cos(train_X[:, 1])], -1
... )
>>> model = SingleTaskGP(train_X, train_Y)
>>> new_X = torch.rand(5, 2)
>>> new_Y = torch.stack([torch.sin(new_X[:, 0]), torch.cos(new_X[:, 1])], -1)
>>> model = model.condition_on_observations(X=new_X, Y=new_Y)
- class botorch.models.gpytorch.ModelListGPyTorchModel(*models)[source]
Bases: ModelList, GPyTorchModel, ABC
Abstract base class for models based on multi-output GPyTorch models.
This is meant to be used with a gpytorch ModelList wrapper for independent evaluation of submodels. Those submodels can themselves be multi-output models, in which case the task covariances will be ignored.
- Parameters:
*models (Model) – A variable number of models.
Example
>>> m_1 = SingleTaskGP(train_X, train_Y)
>>> m_2 = GenericDeterministicModel(lambda x: x.sum(dim=-1, keepdim=True))
>>> m_12 = ModelList(m_1, m_2)
>>> m_12.posterior(test_X)
- property batch_shape: Size
The batch shape of the model.
This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.
- load_state_dict(state_dict, strict=True, assign=False)[source]
Initialize the fully Bayesian models before loading the state dict.
- Parameters:
state_dict (Mapping[str, Any])
strict (bool)
assign (bool)
- Return type:
None
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]
Computes the posterior over model outputs at the provided points. If any model returns a MultitaskMultivariateNormal posterior, then that will be split into individual MVNs per task, with inter-task covariance ignored.
- Parameters:
X (Tensor) – A b x q x d-dim Tensor, where d is the dimension of the feature space, q is the number of points considered jointly, and b is the batch dimension.
output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.
observation_noise (bool | Tensor) – If True, add the observation noise from the respective likelihoods to the posterior. If a Tensor of shape (batch_shape) x q x m, use it directly as the observation noise (with observation_noise[...,i] added to the posterior of the i-th model).
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
- Returns:
- If no posterior_transform is provided and the component models have no outcome_transform, or if the component models only use linear outcome transforms like Standardize (i.e. not Log), returns a GPyTorchPosterior or GaussianMixturePosterior object, representing batch_shape joint distributions over q points and the outputs selected by output_indices each. Includes measurement noise if observation_noise is specified.
- If no posterior_transform is provided and component models have nonlinear transforms like Log, returns a PosteriorList with sub-posteriors of type TransformedPosterior.
- If posterior_transform is provided, that posterior transform will be applied and will determine the return type. This could be any subclass of Posterior, but common choices give a GPyTorchPosterior.
- Return type:
- condition_on_observations(X, Y, **kwargs)[source]
Condition the model on new observations.
- Parameters:
X (Tensor) – A batch_shape x n' x d-dim Tensor, where d is the dimension of the feature space, n' is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
Y (Tensor) – A batch_shape' x n' x m-dim Tensor, where m is the number of model outputs, n' is the number of points per batch, and batch_shape' is the batch shape of the observations. batch_shape' must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
kwargs (Any)
- Returns:
A Model object of the same type, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).
- Return type:
Model
- class botorch.models.gpytorch.MultiTaskGPyTorchModel(*args, **kwargs)[source]
Bases: GPyTorchModel, ABC
Abstract base class for multi-task models based on GPyTorch models.
This class provides the posterior method to models that implement a “long-format” multi-task GP in the style of MultiTaskGP.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
args (Any)
kwargs (Any)
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]
Computes the posterior over model outputs at the provided points.
- Parameters:
X (Tensor) – A tensor of shape batch_shape x q x d or batch_shape x q x (d + 1), where d is the dimension of the feature space (not including task indices) and q is the number of points considered jointly. The + 1 dimension is the optional task feature / index. If given, the model produces the outputs for the given task indices. If omitted, the model produces outputs for tasks in self._output_tasks (specified as output_tasks while constructing the model), which can be overwritten using output_indices.
output_indices (list[int] | None) – A list of task values over which to compute the posterior. Only used if X does not include the task feature. If omitted, defaults to self._output_tasks.
observation_noise (bool | Tensor) – If True, add observation noise from the respective likelihoods. If a Tensor, specifies the observation noise levels to add.
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
- Returns:
A GPyTorchPosterior object, representing batch_shape joint distributions over q points. If the task features are included in X, the posterior will be single output. Otherwise, the posterior will be single or multi output corresponding to the tasks included in either the output_indices or self._output_tasks.
- Return type:
Deterministic Model API
Deterministic Models: Simple wrappers that allow the usage of deterministic mappings via the BoTorch Model and Posterior APIs.
Deterministic models are useful for expressing known input-output relationships
within the BoTorch Model API. This is useful e.g. for multi-objective
optimization with known objective functions (e.g. the number of parameters of a
Neural Network in the context of Neural Architecture Search is usually a known
function of the architecture configuration), or to encode cost functions for
cost-aware acquisition utilities. Cost-aware optimization is desirable when
evaluations have a cost that is heterogeneous, either in the inputs X or in a
particular fidelity parameter that directly encodes the fidelity of the
observation. GenericDeterministicModel supports arbitrary deterministic
functions, while AffineFidelityCostModel is a particular cost model for
multi-fidelity optimization. Other use cases of deterministic models include
representing approximate GP sample paths, e.g. Matheron paths obtained
with get_matheron_path_model, which allows them to be substituted in acquisition
functions or in other places where a Model is expected.
- class botorch.models.deterministic.DeterministicModel(weights=None)[source]
Bases: EnsembleModel
Abstract base class for deterministic models.
Initialize the ensemble model.
- Parameters:
weights (Tensor | None) – Optional weights for the ensemble members. If None, the model weights will default to uniform in the corresponding mixture posterior.
- class botorch.models.deterministic.GenericDeterministicModel(f, num_outputs=1, batch_shape=None)[source]
Bases: DeterministicModel
A generic deterministic model constructed from a callable.
Example
>>> f = lambda x: x.sum(dim=-1, keepdim=True)
>>> model = GenericDeterministicModel(f)
- Parameters:
f (Callable[[Tensor], Tensor]) – A callable mapping a batch_shape x n x d-dim input tensor X to a batch_shape x n x m-dimensional output tensor (the outcome dimension m must be explicit, even if m=1).
num_outputs (int) – The number of outputs m.
batch_shape (torch.Size | None)
- property batch_shape: Size | None
The batch shape of the model.
- class botorch.models.deterministic.AffineDeterministicModel(a, b=0.01)[source]
Bases: DeterministicModel
An affine deterministic model.
Affine deterministic model from weights and offset terms.
A simple model of the form
y[…, m] = b[m] + sum_{i=1}^d a[i, m] * X[…, i]
- Parameters:
a (Tensor) – A d x m-dim tensor of linear weights, where m is the number of outputs (must be explicit if m=1).
b (Tensor | float) – The affine (offset) term. Either a float (for single-output models or if the offset is shared), or an m-dim tensor (with different offset values for the m different outputs).
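The affine formula can be evaluated by hand on small inputs. The sketch below uses nested Python lists in place of tensors, and affine_forward is a hypothetical helper written only to mirror y[…, m] = b[m] + sum_i a[i, m] * X[…, i]; it does not use the BoTorch class.

```python
def affine_forward(X, a, b):
    """Evaluate y[n][j] = b[j] + sum_i a[i][j] * X[n][i] for each row of X."""
    d, m = len(a), len(a[0])
    # A scalar offset is shared across all m outputs.
    offsets = [float(b)] * m if isinstance(b, (int, float)) else list(b)
    return [
        [offsets[j] + sum(a[i][j] * x[i] for i in range(d)) for j in range(m)]
        for x in X
    ]


a = [[1.0, 0.0], [2.0, -1.0]]  # d = 2 inputs, m = 2 outputs
X = [[1.0, 1.0], [0.5, 2.0]]   # n = 2 points
Y_shared = affine_forward(X, a, b=0.5)
Y_per_output = affine_forward(X, a, b=[0.0, 1.0])
```

For the first point, output 0 is 0.5 + 1.0 * 1.0 + 2.0 * 1.0 = 3.5 and output 1 is 0.5 + 0.0 * 1.0 - 1.0 * 1.0 = -0.5.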
- class botorch.models.deterministic.PosteriorMeanModel(model)[source]
Bases: DeterministicModel
A deterministic model that always returns the posterior mean.
- Parameters:
model (Model) – The base model.
- forward(X)[source]
Compute the (deterministic) model output at X.
- Parameters:
X (Tensor) – A batch_shape x n x d-dim input tensor X.
- Returns:
A batch_shape x n x m-dimensional output tensor (the outcome dimension m must be explicit if m=1).
- Return type:
Tensor
- property num_outputs: int
The number of outputs of the model.
- property batch_shape: Size
The batch shape of the model.
- class botorch.models.deterministic.FixedSingleSampleModel(model, w=None, dim=None, jitter=1e-08, dtype=None, device=None)[source]
Bases: DeterministicModel
A deterministic model defined by a single sample w.
Given a base model f and a fixed sample w, the model always outputs
y = f_mean(x) + f_stddev(x) * w
We assume the outcomes are uncorrelated here.
- Parameters:
model (Model) – The base model.
w (Tensor | None) – A 1-d tensor with length model.num_outputs. If None, draw it from a standard normal distribution.
dim (int | None) – dimensionality of w. If None and w is not provided, draw w samples of size model.num_outputs.
jitter (float | None) – jitter value to be added for numerical stability, 1e-8 by default.
dtype (torch.dtype | None) – dtype for w if specified
device (torch.device | None) – device for w if specified
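The rule y = f_mean(x) + f_stddev(x) * w can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: fixed_sample_output is a hypothetical helper, the mean and variance values are made up, and the jitter term is omitted.

```python
import math
import random


def fixed_sample_output(mean, variance, w):
    """Hypothetical helper: per-output mean + stddev * w (outcomes uncorrelated)."""
    return [mu + math.sqrt(var) * wj for mu, var, wj in zip(mean, variance, w)]


rng = random.Random(0)
num_outputs = 2
# Draw w once from a standard normal and keep it fixed afterwards.
w = [rng.gauss(0.0, 1.0) for _ in range(num_outputs)]

# Made-up posterior mean and variance of a base model at one point.
y1 = fixed_sample_output([0.0, 1.0], [4.0, 0.25], w)
y2 = fixed_sample_output([0.0, 1.0], [4.0, 0.25], w)
```

Because w is drawn once and reused, repeated evaluations at the same point return the same value, which is what makes the resulting model deterministic.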
- class botorch.models.deterministic.MatheronPathModel(model, sample_shape=None, ensemble_as_batch=False, seed=None)[source]
Bases:
DeterministicModel
A deterministic model that returns a Matheron path sample.
A Matheron path is a continuous sample path of a GP, obtained by drawing random Fourier features from a GP prior and applying a pathwise update rule based on the observed data.
Example
>>> model = SingleTaskGP(train_X, train_Y)
>>> matheron_model = MatheronPathModel(model)
>>> output = matheron_model(test_X)
- Parameters:
model (Model) – The base model.
sample_shape (Size | None) – The shape of the sample paths to be drawn, if an ensemble of sample paths is desired. If this is specified, the resulting deterministic model will behave as if the sample_shape is prepended to the model’s batch_shape.
ensemble_as_batch (bool) – If True, and model is an ensemble model, the resulting model will treat the ensemble dimension as a batch dimension.
seed (int | None) – Random seed for reproducible path generation. If None, no specific seed is set.
- forward(X)[source]
Evaluate the Matheron path at X.
- Parameters:
X (Tensor) – A batch_shape x n x d-dim input tensor X.
- Returns:
A [sample_shape x] batch_shape x n x m-dimensional output tensor.
- Return type:
Tensor
- property num_outputs: int
The number of outputs of the model.
- property batch_shape: Size
The batch shape of the model.
Ensemble Model API
Ensemble Models: Simple wrappers that allow the usage of ensembles via the BoTorch Model and Posterior APIs.
- class botorch.models.ensemble.EnsembleModel(weights=None)[source]
Bases:
Model, ABC
Abstract base class for ensemble models.
Initialize the ensemble model.
- Parameters:
weights (Tensor | None) – Optional weights for the ensemble members. If None, the model weights will default to uniform in the corresponding mixture posterior.
- abstractmethod forward(X)[source]
Compute the (ensemble) model output at X.
- Parameters:
X (Tensor) – A batch_shape x n x d-dim input tensor X.
- Returns:
A batch_shape x s x n x m-dimensional output tensor where s is the size of the ensemble.
- Return type:
Tensor
- property num_outputs: int
The number of outputs of the model.
- posterior(X, output_indices=None, posterior_transform=None, **kwargs)[source]
Compute the ensemble posterior at X.
- Parameters:
X (Tensor) – A batch_shape x q x d-dim input tensor X.
output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior. If omitted, computes the posterior over all model outputs.
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
kwargs (Any)
- Returns:
An EnsemblePosterior object, representing batch_shape joint posteriors over q points and the outputs selected by output_indices.
- Return type:
EnsemblePosterior
Models
Additive GP Models
- class botorch.models.additive_gp.OrthogonalAdditiveGP(train_X, train_Y, covar_module=None, second_order=False, likelihood=None, mean_module=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]
Bases:
SingleTaskGP
A Gaussian Process with Orthogonal Additive Kernel for interpretable modeling.
This GP model uses an OrthogonalAdditiveKernel which decomposes the function into interpretable additive components: a bias term, first-order effects for each input dimension, and optionally second-order interaction terms.
The model supports posterior inference of individual additive components when infer_all_components=True is passed to the posterior method.
Initialize the OrthogonalAdditiveGP.
- Parameters:
train_X (Tensor) – Training inputs (batch_shape x n x d) in [0, 1]^d.
train_Y (Tensor) – Training outputs (batch_shape x n x 1).
covar_module (OrthogonalAdditiveKernel | None) – An OrthogonalAdditiveKernel instance. If None, creates a default OrthogonalAdditiveKernel with dim inferred from train_X. The kernel’s default base kernel uses per-dimension lengthscales, so each additive component can adapt its smoothness independently.
second_order (bool) – If True and covar_module is None, enables second-order interactions in the default kernel. Ignored if covar_module is provided.
likelihood (Likelihood | None) – Optional likelihood (defaults to GaussianLikelihood).
mean_module (Mean | None) – Optional mean module (defaults to ConstantMean).
outcome_transform (OutcomeTransform | _DefaultType | None) – Optional outcome transform.
input_transform (InputTransform | None) – Optional input transform.
- Raises:
TypeError – If covar_module is provided but is not an OrthogonalAdditiveKernel.
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None, infer_all_components=False)[source]
Posterior inference of the additive Gaussian process.
- Parameters:
X (Tensor) – The input tensor of shape (batch_shape x n x d).
output_indices (list[int] | None) – Not supported for this model.
observation_noise (bool) – Whether to add observation noise to the posterior.
posterior_transform (PosteriorTransform | None) – Optional posterior transform.
infer_all_components (bool) – If True, returns a posterior with a batch dimension corresponding to each additive component (bias, first-order effects, and optionally second-order interactions). The number of components is 1 + d (first-order only) or 1 + d + d*(d-1)/2 (with second-order interactions).
- Returns:
The posterior distribution at X.
- Return type:
- property component_indices: dict[str, Tensor]
Returns component indices from the OrthogonalAdditiveKernel.
- evaluate_first_order_on_grid(grid_1d)[source]
Evaluate first-order component posteriors on 1D grids.
Uses diagonal test inputs with the existing GPyTorch posterior infrastructure. Each first-order component is evaluated on its own independent 1D grid.
Supports models with batch-shaped training data. The grid itself should be 1D; batch dimensions come from the model’s training data.
- Parameters:
grid_1d (Tensor) – 1D tensor of m points in [0, 1].
- Returns:
A tuple of two elements:
bias: (mean, variance), each of shape (*batch_shape,), averaged over the grid (should be approximately constant).
first_order: (means, variances), each of shape (d, *batch_shape, m), posterior statistics on 1D grids.
- Return type:
tuple
Example
>>> grid = torch.linspace(0, 1, 50)
>>> (bias_mean, bias_var), (fo_mean, fo_var) = \
...     model.evaluate_first_order_on_grid(grid)
>>> # fo_mean[i] is component i's posterior mean on the 1D grid
- property num_components: int
Total number of additive components (bias + first-order [+ second-order]).
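The component count quoted for infer_all_components (1 + d, or 1 + d + d*(d-1)/2 with second-order interactions) can be reproduced with a few lines of Python. The function name num_additive_components is hypothetical and only mirrors that arithmetic; it is not part of BoTorch.

```python
def num_additive_components(d: int, second_order: bool = False) -> int:
    """Count 1 bias + d first-order [+ d*(d-1)/2 second-order] components."""
    n = 1 + d
    if second_order:
        # One interaction term per unordered pair of input dimensions.
        n += d * (d - 1) // 2
    return n


first_order_only = num_additive_components(3)
with_interactions = num_additive_components(3, second_order=True)
```

For d = 3 this gives 4 components without interactions and 7 with them, so the leading batch dimension of the component-wise posterior grows quadratically in d once second_order is enabled.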
Cost Models (for cost-aware optimization)
Cost models to be used with multi-fidelity optimization.
Cost models are useful for defining known cost functions when the cost of an evaluation is heterogeneous in fidelity. For a full worked example, see the tutorial on continuous multi-fidelity Bayesian Optimization.
- class botorch.models.cost.AffineFidelityCostModel(fidelity_weights=None, fixed_cost=0.01)[source]
Bases:
DeterministicModel
Deterministic, affine cost model operating on fidelity parameters.
For each (q-batch) element of a candidate set X, this module computes a cost of the form
cost = fixed_cost + sum_j weights[j] * X[fidelity_dims[j]]
For a full worked example, see the tutorial on continuous multi-fidelity Bayesian Optimization.
Example
>>> from botorch.models import AffineFidelityCostModel
>>> from botorch.acquisition.cost_aware import InverseCostWeightedUtility
>>> cost_model = AffineFidelityCostModel(
...     fidelity_weights={6: 1.0}, fixed_cost=5.0
... )
>>> cost_aware_utility = InverseCostWeightedUtility(cost_model=cost_model)
- Parameters:
fidelity_weights (dict[int, float] | None) – A dictionary mapping a subset of columns of X (the fidelity parameters) to its associated weight in the affine cost expression. If omitted, assumes that the last column of X is the fidelity parameter with a weight of 1.0.
fixed_cost (float) – The fixed cost of running a single candidate point (i.e. an element of a q-batch).
- forward(X)[source]
Evaluate the cost on a candidate set X.
Computes a cost of the form
cost = fixed_cost + sum_j weights[j] * X[fidelity_dims[j]]
for each element of the q-batch
- Parameters:
X (Tensor) – A batch_shape x q x d'-dim tensor of candidate points.
- Returns:
A batch_shape x q x 1-dim tensor of costs.
- Return type:
Tensor
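As a rough illustration of the cost expression cost = fixed_cost + sum_j weights[j] * X[fidelity_dims[j]], the following plain-Python sketch evaluates it for each element of a small q-batch. The helper affine_fidelity_cost is hypothetical and works on Python lists; it only mirrors the documented formula and the documented default (last column as fidelity with weight 1.0).

```python
def affine_fidelity_cost(x, fidelity_weights=None, fixed_cost=0.01):
    """Hypothetical helper: fixed_cost + sum_j weights[j] * x[fidelity_dims[j]]."""
    if fidelity_weights is None:
        # Documented default: the last column is the fidelity, with weight 1.0.
        fidelity_weights = {len(x) - 1: 1.0}
    return fixed_cost + sum(w * x[dim] for dim, w in fidelity_weights.items())


# A q-batch of two candidates; column 2 is the fidelity parameter.
q_batch = [[0.2, 0.4, 0.5], [0.9, 0.1, 1.0]]
costs = [affine_fidelity_cost(x, {2: 2.0}, fixed_cost=5.0) for x in q_batch]
default_cost = affine_fidelity_cost(q_batch[0])
```

Only the fidelity columns named in fidelity_weights influence the cost; the remaining design columns are ignored.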
- class botorch.models.cost.FixedCostModel(fixed_cost)[source]
Bases:
DeterministicModel
Deterministic, fixed cost model.
For each (q-batch) element of a candidate set X, this module computes a fixed cost per objective.
- Parameters:
fixed_cost (Tensor) – An m-dim tensor containing the fixed cost of evaluating each objective.
Contextual GP Models with Aggregate Rewards
- class botorch.models.contextual.SACGP(train_X, train_Y, train_Yvar, decomposition)[source]
Bases:
SingleTaskGP
A GP using a Structural Additive Contextual (SAC) kernel.
- Parameters:
train_X (Tensor) – (n x d) X training data.
train_Y (Tensor) – (n x 1) Y training data.
train_Yvar (Tensor | None) – (n x 1) Noise variances of each training Y. If None, we use an inferred noise likelihood.
decomposition (dict[str, list[int]]) – Keys are context names. Values are the indexes of the parameters belonging to the context. The parameter indexes are in the same order across contexts.
- classmethod construct_inputs(training_data, decomposition)[source]
Construct Model keyword arguments from a dict of SupervisedDataset.
- Parameters:
training_data (SupervisedDataset) – A SupervisedDataset containing the training data.
decomposition (dict[str, list[int]]) – Dictionary of context names and the indexes of the corresponding active context parameters.
- Return type:
dict[str, Any]
- class botorch.models.contextual.LCEAGP(train_X, train_Y, train_Yvar, decomposition, train_embedding=True, cat_feature_dict=None, embs_feature_dict=None, embs_dim_list=None, context_weight_dict=None)[source]
Bases:
SingleTaskGP
A GP using a Latent Context Embedding Additive (LCE-A) Kernel.
Note that the model does not support batch training. Input training data sets should have dim = 2.
- Parameters:
train_X (Tensor) – (n x d) X training data.
train_Y (Tensor) – (n x 1) Y training data.
train_Yvar (Tensor | None) – (n x 1) Noise variance of Y. If None, we use an inferred noise likelihood.
decomposition (dict[str, list[int]]) – Keys are context names. Values are the indexes of the parameters belonging to the context.
train_embedding (bool) – Whether to train the embedding layer or not. If False, the model will use pre-trained embeddings in embs_feature_dict.
cat_feature_dict (dict | None) – Keys are context names and values are list of categorical features i.e. {“context_name” : [cat_0, …, cat_k]}, where k is the number of categorical variables. If None, we use context names in the decomposition as the only categorical feature, i.e., k = 1.
embs_feature_dict (dict | None) – Pre-trained continuous embedding features of each context.
embs_dim_list (list[int] | None) – Embedding dimension for each categorical variable. The length equals the number of categorical features k. If None, the embedding dimension is set to 1 for each categorical variable.
context_weight_dict (dict | None) – Known population weights of each context.
- classmethod construct_inputs(training_data, decomposition, train_embedding=True, cat_feature_dict=None, embs_feature_dict=None, embs_dim_list=None, context_weight_dict=None)[source]
Construct Model keyword arguments from a dict of SupervisedDataset.
- Parameters:
training_data (SupervisedDataset) – A SupervisedDataset containing the training data.
decomposition (dict[str, list[str]]) – Dictionary of context names and the names of the corresponding active context parameters.
train_embedding (bool) – Whether to train the embedding layer or not.
cat_feature_dict (dict | None) – Keys are context names and values are list of categorical features i.e. {“context_name” : [cat_0, …, cat_k]}, where k is the number of categorical variables. If None, we use context names in the decomposition as the only categorical feature, i.e., k = 1.
embs_feature_dict (dict | None) – Pre-trained continuous embedding features of each context.
embs_dim_list (list[int] | None) – Embedding dimension for each categorical variable. The length equals the number of categorical features k. If None, the embedding dimension is set to 1 for each categorical variable.
context_weight_dict (dict | None) – Known population weights of each context.
- Return type:
dict[str, Any]
Contextual GP Models with Context Rewards
References
[Feng2020HDCPS] Q. Feng, B. Latham, H. Mao and E. Bakshy. High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian Optimization. Advances in Neural Information Processing Systems 33, NeurIPS 2020.
- class botorch.models.contextual_multioutput.LCEMGP(train_X, train_Y, task_feature, train_Yvar=None, mean_module=None, covar_module=None, likelihood=None, context_cat_feature=None, context_emb_feature=None, embs_dim_list=None, output_tasks=None, all_tasks=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]
Bases:
MultiTaskGP
The Multi-Task GP with the latent context embedding multioutput (LCE-M) kernel. See [Feng2020HDCPS] for a reference on the model and its use in Bayesian optimization.
- Parameters:
train_X (Tensor) – (n x d) X training data.
train_Y (Tensor) – (n x 1) Y training data.
task_feature (int) – Column index of train_X to get context indices.
train_Yvar (Tensor | None) – An optional (n x 1) tensor of observed variances of each training Y. If None, we infer the noise. Note that the inferred noise is common across all tasks.
mean_module (Module | None) – The mean function to be used. Defaults to ConstantMean.
covar_module (Module | None) – The module for computing the covariance matrix between the non-task features. Defaults to RBFKernel.
likelihood (Likelihood | None) – A likelihood. The default is selected based on train_Yvar. If train_Yvar is None, a standard GaussianLikelihood with inferred noise level is used. Otherwise, a FixedNoiseGaussianLikelihood is used.
context_cat_feature (Tensor | None) – (n_contexts x k) one-hot encoded context features. Rows are ordered by context indices, where k is the number of categorical variables. If None, task indices will be used and k = 1.
context_emb_feature (Tensor | None) – (n_contexts x m) pre-given continuous embedding features. Rows are ordered by context indices.
embs_dim_list (list[int] | None) – Embedding dimension for each categorical variable. The length equals k. If None, the embedding dimension is set to 1 for each categorical variable.
output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.
all_tasks (list[int] | None) – By default, multi-task GPs infer the list of all tasks from the task features in train_X. This is an experimental feature that enables creation of multi-task GPs with tasks that don’t appear in the training data. Note that when a task is not observed, the corresponding task covariance will heavily depend on random initialization and may behave unexpectedly.
outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
- task_covar_module(task_idcs)[source]
Compute the task covariance matrix for a given tensor of task / context indices.
- Parameters:
task_idcs (Tensor) – Task index tensor of shape (n x 1) or (b x n x 1).
- Returns:
Task covariance matrix of shape (b x n x n).
- Return type:
Tensor
- forward(x)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Parameters:
x (Tensor)
- Return type:
MultivariateNormal
- classmethod construct_inputs(training_data, task_feature, output_tasks=None, context_cat_feature=None, context_emb_feature=None, embs_dim_list=None, **kwargs)[source]
Construct Model keyword arguments from a dataset and other args.
- Parameters:
training_data (SupervisedDataset | MultiTaskDataset) – A SupervisedDataset or a MultiTaskDataset.
task_feature (int) – Column index of embedded task indicator features.
output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.
context_cat_feature (Tensor | None) – (n_contexts x k) one-hot encoded context features. Rows are ordered by context indices, where k is the number of categorical variables. If None, task indices will be used and k = 1.
context_emb_feature (Tensor | None) – (n_contexts x m) pre-given continuous embedding features. Rows are ordered by context indices.
embs_dim_list (list[int] | None) – Embedding dimension for each categorical variable. The length equals k. If None, the embedding dimension is set to 1 for each categorical variable.
- Return type:
dict[str, Any]
Fully Bayesian GP Models
Gaussian Process Regression models with fully Bayesian inference.
Fully Bayesian models use Bayesian inference over model hyperparameters, such
as lengthscales and noise variance, learning a posterior distribution for the
hyperparameters using the No-U-Turn Sampler (NUTS). This is followed by
sampling a small set of hyperparameters (often ~16) from the posterior
that we will use for model predictions and for computing acquisition function
values. By contrast, our “standard” models (e.g.
SingleTaskGP) learn only a single best value for each hyperparameter using
MAP. The fully Bayesian method generally results in a better and more
well-calibrated model, but is more computationally intensive. For a full
description, see [Eriksson2021saasbo].
We use a lightweight JAX/NumPyro implementation of a Matern-5/2 kernel as there are some performance issues with running NUTS on top of standard GPyTorch models. The resulting hyperparameter samples are loaded into a batched GPyTorch model after fitting.
References:
- botorch.models.fully_bayesian.matern52_kernel(X, lengthscale)[source]
Matern-5/2 kernel.
- Parameters:
X (Array)
lengthscale (Array)
- Return type:
Array
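For orientation, the standard Matern-5/2 form is k(r) = (1 + sqrt(5) r + 5/3 r^2) * exp(-sqrt(5) r), where r is the Euclidean distance after scaling each dimension by its lengthscale. The sketch below evaluates that textbook formula in plain Python for a single pair of points. It is not the JAX implementation documented here, which may differ in details such as batching or numerical safeguards, and the helper names are hypothetical.

```python
import math


def scaled_distance(x, y, lengthscale):
    """Euclidean distance after dividing each dimension by its lengthscale."""
    return math.sqrt(
        sum(((xi - yi) / ls) ** 2 for xi, yi, ls in zip(x, y, lengthscale))
    )


def matern52(x, y, lengthscale):
    """Matern-5/2 value: (1 + sqrt(5) r + 5/3 r^2) * exp(-sqrt(5) r)."""
    r = scaled_distance(x, y, lengthscale)
    s = math.sqrt(5.0) * r
    # s * s / 3 equals 5/3 * r^2.
    return (1.0 + s + s * s / 3.0) * math.exp(-s)


k_same = matern52([0.2, 0.7], [0.2, 0.7], [1.0, 1.0])
k_short = matern52([0.0, 0.0], [0.5, 0.5], [0.5, 0.5])
k_long = matern52([0.0, 0.0], [0.5, 0.5], [5.0, 5.0])
```

A larger lengthscale shrinks the scaled distance, so k_long is closer to 1 than k_short: the function is assumed to vary more slowly along dimensions with long lengthscales.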
- botorch.models.fully_bayesian.linear_kernel(X, weight_variance)[source]
Linear kernel.
Supports batched inputs: X can be (…, n, d).
- Parameters:
X (Array)
weight_variance (Array)
- Return type:
Array
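A linear kernel with a diagonal weight covariance is commonly written as K[p][q] = sum_i weight_variance[i] * X[p][i] * X[q][i]. The plain-Python sketch below assumes that form; whether the documented function applies extra steps (for example centering the inputs) is not stated here, so treat linear_kernel_matrix as a hypothetical illustration only.

```python
def linear_kernel_matrix(X, weight_variance):
    """Hypothetical helper: K[p][q] = sum_i weight_variance[i] * X[p][i] * X[q][i]."""
    return [
        [
            sum(v * a * b for v, a, b in zip(weight_variance, xp, xq))
            for xq in X
        ]
        for xp in X
    ]


X = [[1.0, 2.0], [0.5, 0.0]]
# A weight variance of 0 on the second dimension switches that dimension off.
K = linear_kernel_matrix(X, weight_variance=[1.0, 0.0])
```

Under this assumed form, setting a dimension's weight variance to zero removes its contribution to the kernel entirely.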
- botorch.models.fully_bayesian.compute_dists(X, lengthscale)[source]
Compute kernel distances.
Supports batched inputs: X can be (…, n, d) and lengthscale (…, d).
- Parameters:
X (Array)
lengthscale (Array)
- Return type:
Array
- botorch.models.fully_bayesian.reshape_and_detach(target, new_value)[source]
Detach and reshape new_value to match target.
- Parameters:
target (Tensor)
new_value (Tensor)
- Return type:
None
- class botorch.models.fully_bayesian.PyroModel(use_input_warping=False, indices_to_warp=None, eps=1e-07)[source]
Bases:
object
Base class for a Pyro model; used to assist in learning hyperparameters.
This class and its subclasses are not standard BoTorch models; instead the subclasses are used as inputs to a SaasFullyBayesianSingleTaskGP, which should then have its hyperparameters fit with fit_fully_bayesian_model_nuts. (By default, its subclass SaasPyroModel is used.) A PyroModel’s sample method should specify lightweight PyTorch functionality, which will be used for fast model fitting with NUTS. The utility of PyroModel is in enabling fast fitting with NUTS, since we would otherwise need to use GPyTorch, which is computationally infeasible in combination with Pyro.
Initialize the PyroModel.
- Parameters:
use_input_warping (bool) – A boolean indicating whether to use input warping.
indices_to_warp (list[int] | None) – An optional list of indices to warp. The default is to warp all inputs.
eps (float) – A small value that is used to ensure inputs are not 0 or 1, when using input warping.
- warp(X, c0, c1)[source]
Warp the input through a Kumaraswamy CDF.
- Parameters:
X (Tensor)
c0 (Tensor)
c1 (Tensor)
- Return type:
Tensor
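The Kumaraswamy CDF is F(x; a, b) = 1 - (1 - x^a)^b on [0, 1]. The scalar sketch below assumes that c1 plays the role of a and c0 the role of b (following the concentration1/concentration0 naming convention), and that inputs are clamped to [eps, 1 - eps] before warping. Both the parameter mapping and the clamping strategy are assumptions; the documented method may handle them differently.

```python
def kumaraswamy_warp(x, c0, c1, eps=1e-7):
    """Hypothetical scalar warp: 1 - (1 - x**c1) ** c0 on clamped inputs."""
    # Keep x strictly inside (0, 1) so fractional powers stay well-behaved.
    x = min(max(x, eps), 1.0 - eps)
    return 1.0 - (1.0 - x ** c1) ** c0


identity_like = kumaraswamy_warp(0.3, c0=1.0, c1=1.0)
low = kumaraswamy_warp(0.2, c0=2.0, c1=0.5)
high = kumaraswamy_warp(0.8, c0=2.0, c1=0.5)
```

With c0 = c1 = 1 the warp reduces to the identity (up to floating-point error), and for any positive concentrations it is monotonically increasing, so the ordering of inputs is preserved.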
- set_inputs(train_X, train_Y, train_Yvar=None, task_feature=None, task_rank=None)[source]
Set the training data.
- Parameters:
train_X (Tensor) – Training inputs (n x d)
train_Y (Tensor) – Training targets (n x 1)
train_Yvar (Tensor | None) – Observed noise variance (n x 1). Inferred if None.
task_feature (int | None) – The index of the task feature column. Used by multi-task mixins.
task_rank (int | None) – The number of learned task embeddings. Used by multi-task mixins.
- Return type:
None
- abstractmethod postprocess_mcmc_samples(mcmc_samples)[source]
Post-process the final MCMC samples.
- Parameters:
mcmc_samples (dict[str, Tensor])
- Return type:
dict[str, Tensor]
- abstractmethod get_dummy_mcmc_samples(num_mcmc_samples, **tkwargs)[source]
Return dummy MCMC samples for state dict loading.
Each subclass returns a dict of ones with the keys and shapes that load_mcmc_samples expects.
- Parameters:
num_mcmc_samples (int)
tkwargs (Any)
- Return type:
dict[str, Tensor]
- abstractmethod load_mcmc_samples(mcmc_samples)[source]
- Parameters:
mcmc_samples (dict[str, Tensor])
- Return type:
tuple[Mean, Kernel, Likelihood]
- class botorch.models.fully_bayesian.MaternPyroModel(use_input_warping=False, indices_to_warp=None, eps=1e-07)[source]
Bases:
PyroModel
Implementation of a fully Bayesian model with a dimension-scaling prior.
MaternPyroModel is not a standard BoTorch model; instead, it is used as an input to FullyBayesianSingleTaskGP.
Initialize the PyroModel.
- Parameters:
use_input_warping (bool) – A boolean indicating whether to use input warping.
indices_to_warp (list[int] | None) – An optional list of indices to warp. The default is to warp all inputs.
eps (float) – A small value that is used to ensure inputs are not 0 or 1, when using input warping.
- sample()[source]
Sample from the Matern pyro model.
This samples the mean, noise variance, (optional) outputscale, and lengthscales according to a dimension-scaled prior.
- Return type:
None
- sample_outputscale(concentration=None, rate=None)[source]
Sample the outputscale.
If the concentration or rate arguments are None, then an outputscale of 1 is used.
- Parameters:
concentration (float | None) – The concentration parameter for a GammaPrior.
rate (float | None) – The rate parameter for a GammaPrior.
- Returns:
The outputscale.
- Return type:
Array
- postprocess_mcmc_samples(mcmc_samples)[source]
Post-process the MCMC samples.
Converts JAX arrays to torch tensors.
- Parameters:
mcmc_samples (dict[str, Array])
- Return type:
dict[str, Tensor]
- class botorch.models.fully_bayesian.SaasPyroModel(use_input_warping=False, indices_to_warp=None, eps=1e-07)[source]
Bases:
MaternPyroModel
Implementation of the sparse axis-aligned subspace priors (SAAS) model.
The SAAS model uses sparsity-inducing priors to identify the most important parameters. This model is suitable for high-dimensional BO with potentially hundreds of tunable parameters. See [Eriksson2021saasbo] for more details.
SaasPyroModel is not a standard BoTorch model; instead, it is used as an input to SaasFullyBayesianSingleTaskGP. It is used as a default keyword argument, and end users are not likely to need to instantiate or modify a SaasPyroModel unless they want to customize its attributes (such as covar_module).
Initialize the PyroModel.
- Parameters:
use_input_warping (bool) – A boolean indicating whether to use input warping.
indices_to_warp (list[int] | None) – An optional list of indices to warp. The default is to warp all inputs.
eps (float) – A small value that is used to ensure inputs are not 0 or 1, when using input warping.
- sample_lengthscale(dim, alpha=0.1)[source]
Sample the lengthscale.
- Parameters:
dim (int)
alpha (float)
- Return type:
Array
- class botorch.models.fully_bayesian.LinearPyroModel(use_input_warping=False, indices_to_warp=None, eps=1e-07)[source]
Bases:
PyroModel
Implementation of a Bayesian Linear pyro model.
LinearPyroModel is not a standard BoTorch model; instead, it is used as an input to FullyBayesianLinearSingleTaskGP. It is used as a default keyword argument, and end users are not likely to need to instantiate or modify a LinearPyroModel unless they want to customize its attributes (such as covar_module).
Initialize the PyroModel.
- Parameters:
use_input_warping (bool) – A boolean indicating whether to use input warping.
indices_to_warp (list[int] | None) – An optional list of indices to warp. The default is to warp all inputs.
eps (float) – A small value that is used to ensure inputs are not 0 or 1, when using input warping.
- sample_weight_variance(alpha=0.1)[source]
Sample the weight variance.
This hierarchical prior places a half-Cauchy prior on the prior weight covariance, which is diagonal with different values for each input dimension. The prior samples a global level of sparsity (tau), which scales the HalfCauchy prior on the weight variance. Since the weight prior is centered at zero, a prior variance of 0 would correspond to the dimension being irrelevant. This choice of prior is motivated by SAAS priors.
- Parameters:
alpha (float)
- Return type:
Array
- postprocess_mcmc_samples(mcmc_samples)[source]
Post-process the MCMC samples.
This computes the true weight variance, removes tausq (global shrinkage), and converts JAX arrays to torch tensors.
- Parameters:
mcmc_samples (dict[str, Array])
- Return type:
dict[str, Tensor]
- get_dummy_mcmc_samples(num_mcmc_samples, **tkwargs)[source]
Return dummy MCMC samples for state dict loading.
- Parameters:
num_mcmc_samples (int)
tkwargs (Any)
- Return type:
dict[str, Tensor]
- load_mcmc_samples(mcmc_samples)[source]
Load the MCMC samples into their corresponding modules.
- Parameters:
mcmc_samples (dict[str, Tensor])
- Return type:
tuple[Mean, Kernel, Likelihood, InputTransform]
- class botorch.models.fully_bayesian.AbstractFullyBayesianSingleTaskGP(train_X, train_Y, train_Yvar=None, outcome_transform=None, input_transform=None, use_input_warping=False, indices_to_warp=None)[source]
Bases:
ExactGP, BatchedMultiOutputGPyTorchModel, ABC
An abstract fully Bayesian single-task GP model.
This model assumes that the inputs have been normalized to [0, 1]^d and that the output has been standardized to have zero mean and unit variance. You can either normalize and standardize the data before constructing the model or use an input_transform and outcome_transform.
You are expected to use fit_fully_bayesian_model_nuts to fit this model as it isn’t compatible with fit_gpytorch_mll.
Initialize the fully Bayesian single-task GP model.
- Parameters:
train_X (Tensor) – Training inputs (n x d)
train_Y (Tensor) – Training targets (n x 1)
train_Yvar (Tensor | None) – Observed noise variance (n x 1). Inferred if None.
outcome_transform (OutcomeTransform | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). Note that .train() will be called on the outcome transform during instantiation of the model.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
use_input_warping (bool) – A boolean indicating whether to use input warping.
indices_to_warp (list[int]) – An optional list of indices to warp. The default is to warp all inputs.
- property num_mcmc_samples: int
Number of MCMC samples in the model.
- property batch_shape: Size
Batch shape of the model, equal to the number of MCMC samples. Note that SaasFullyBayesianSingleTaskGP does not support batching over input data at this point.
- train(mode=True, reset=True)[source]
Puts the model in train mode.
- Parameters:
mode (bool) – A boolean indicating whether to put the model in training mode.
reset (bool) – A boolean indicating whether to reset the model to its initial state if mode is True. If mode is False, this argument is ignored.
self (TFullyBayesianSingleTaskGP)
- Returns:
The model itself.
- Return type:
TFullyBayesianSingleTaskGP
- load_mcmc_samples(mcmc_samples)[source]
Load the MCMC hyperparameter samples into the model.
This method will be called by fit_fully_bayesian_model_nuts when the model has been fitted in order to create a batched SingleTaskGP model.
- Parameters:
mcmc_samples (dict[str, Tensor])
- Return type:
None
- forward(X)[source]
Unlike in other classes’ forward methods, there is no if self.training block, because it ought to be unreachable: If self.train() has been called, then self.covar_module will be None, check_if_fitted() will fail, and the rest of this method will not run.
- Parameters:
X (Tensor)
- Return type:
MultivariateNormal
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None, **kwargs)[source]
Computes the posterior over model outputs at the provided points.
- Parameters:
X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.
output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.
observation_noise (bool) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q x m).
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
kwargs (Any)
- Returns:
A GaussianMixturePosterior object. Includes observation noise if specified.
- Return type:
GaussianMixturePosterior
- condition_on_observations(X, Y, **kwargs)[source]
Conditions on additional observations for a Fully Bayesian model (either identical across models or unique per-model).
- Parameters:
X (Tensor) – A batch_shape x num_samples x d-dim Tensor, where d is the dimension of the feature space and batch_shape is the number of sampled models.
Y (Tensor) – A batch_shape x num_samples x 1-dim Tensor, where batch_shape is the number of sampled models.
kwargs (Any)
- Returns:
A fully Bayesian model conditioned on the given observations. The returned model has batch_shape copies of the training data in case of identical observations (and batch_shape training datasets otherwise).
- Return type:
- classmethod construct_inputs(training_data, *, use_input_warping=False, indices_to_warp=None)[source]
Construct SingleTaskGP keyword arguments from a SupervisedDataset.
- Parameters:
training_data (SupervisedDataset) – A SupervisedDataset, with attributes train_X, train_Y, and, optionally, train_Yvar.
use_input_warping (bool) – A boolean indicating whether to use input warping.
indices_to_warp (list[int] | None) – An optional list of indices to warp. The default is to warp all inputs.
- Returns:
A dict of keyword arguments that can be used to initialize a FullyBayesianLinearSingleTaskGP, with keys train_X, train_Y, use_input_warping, indices_to_warp, and, optionally, train_Yvar.
- Return type:
dict[str, BotorchContainer | Tensor | None]
- class botorch.models.fully_bayesian.FullyBayesianSingleTaskGP(train_X, train_Y, train_Yvar=None, outcome_transform=None, input_transform=None, use_input_warping=False, indices_to_warp=None)[source]
Bases:
AbstractFullyBayesianSingleTaskGP
A fully Bayesian single-task GP model.
This model assumes that the inputs have been normalized to [0, 1]^d and that the output has been standardized to have zero mean and unit variance. You can either normalize and standardize the data before constructing the model or use an input_transform and outcome_transform. A model with a Matern-5/2 kernel and dimension-scaled priors on the hyperparameters from [Hvarfner2024vanilla] is used by default.
You are expected to use fit_fully_bayesian_model_nuts to fit this model as it isn’t compatible with fit_gpytorch_mll.
Example
>>> fully_bayesian_gp = FullyBayesianSingleTaskGP(train_X, train_Y)
>>> fit_fully_bayesian_model_nuts(fully_bayesian_gp)
>>> posterior = fully_bayesian_gp.posterior(test_X)
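The preprocessing this model assumes (inputs in the unit cube, outputs with zero mean and unit variance) can be sketched in plain Python. This is only an illustration of the arithmetic; the helper names `normalize_columns` and `standardize` are hypothetical and are not BoTorch functions.

```python
from statistics import mean, stdev


def normalize_columns(rows):
    """Min-max scale each column of a list of rows to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # Guard against constant columns, whose span would be zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def standardize(values):
    """Shift to zero mean and scale to unit (sample) standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


raw_X = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
raw_Y = [1.0, 3.0, 5.0]
X01 = normalize_columns(raw_X)
Y_std = standardize(raw_Y)
```

In practice the same effect would typically come from passing an input_transform and an outcome_transform to the model, as described above.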
Initialize the fully Bayesian single-task GP model.
- Parameters:
train_X (Tensor) – Training inputs (n x d)
train_Y (Tensor) – Training targets (n x 1)
train_Yvar (Tensor | None) – Observed noise variance (n x 1). Inferred if None.
outcome_transform (OutcomeTransform | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). Note that .train() will be called on the outcome transform during instantiation of the model.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
use_input_warping (bool) – A boolean indicating whether to use input warping.
indices_to_warp (list[int]) – An optional list of indices to warp. The default is to warp all inputs.
- property median_lengthscale: Tensor
Median lengthscales across the MCMC samples.
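As a rough illustration of what a per-dimension median across MCMC samples means, the sketch below uses made-up lengthscale draws (one row per sample, one column per input dimension). It is not the library's implementation, which operates on tensors.

```python
from statistics import median

# Hypothetical lengthscale draws: rows are MCMC samples, columns are input dims.
lengthscale_samples = [
    [0.5, 2.0],
    [0.7, 1.0],
    [0.6, 9.0],
]

# Take the median over samples separately for each input dimension.
median_lengthscale = [median(col) for col in zip(*lengthscale_samples)]
```

The median is robust to occasional extreme draws, such as the 9.0 in the second column.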
- load_state_dict(state_dict, strict=True, assign=False)[source]
Custom logic for loading the state dict.
The standard approach of calling load_state_dict currently doesn’t play well with the SaasFullyBayesianSingleTaskGP since the mean_module, covar_module and likelihood aren’t initialized until the model has been fitted. The reason for this is that we don’t know the number of MCMC samples until NUTS is called. Given the state dict, we can initialize a new model with some dummy samples and then load the state dict into this model. This currently only works for a SaasPyroModel and supporting more Pyro models likely requires moving the model construction logic into the Pyro model itself.
- Parameters:
state_dict (Mapping[str, Any])
strict (bool)
assign (bool)
- Return type:
None
- class botorch.models.fully_bayesian.SaasFullyBayesianSingleTaskGP(train_X, train_Y, train_Yvar=None, outcome_transform=None, input_transform=None, use_input_warping=False, indices_to_warp=None)[source]
Bases:
FullyBayesianSingleTaskGP
A fully Bayesian single-task GP model with the SAAS prior.
This model assumes that the inputs have been normalized to [0, 1]^d and that the output has been standardized to have zero mean and unit variance. You can either normalize and standardize the data before constructing the model or use an input_transform and outcome_transform. The SAAS model [Eriksson2021saasbo] with a Matern-5/2 kernel is used by default.
You are expected to use fit_fully_bayesian_model_nuts to fit this model as it isn’t compatible with fit_gpytorch_mll.
Example
>>> saas_gp = SaasFullyBayesianSingleTaskGP(train_X, train_Y)
>>> fit_fully_bayesian_model_nuts(saas_gp)
>>> posterior = saas_gp.posterior(test_X)
Initialize the fully Bayesian single-task GP model.
- Parameters:
train_X (Tensor) – Training inputs (n x d)
train_Y (Tensor) – Training targets (n x 1)
train_Yvar (Tensor | None) – Observed noise variance (n x 1). Inferred if None.
outcome_transform (OutcomeTransform | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). Note that .train() will be called on the outcome transform during instantiation of the model.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
use_input_warping (bool) – A boolean indicating whether to use input warping.
indices_to_warp (list[int]) – An optional list of indices to warp. The default is to warp all inputs.
- class botorch.models.fully_bayesian.FullyBayesianLinearSingleTaskGP(train_X, train_Y, train_Yvar=None, outcome_transform=None, input_transform=None, use_input_warping=False, indices_to_warp=None)[source]
Bases:
AbstractFullyBayesianSingleTaskGP
A fully Bayesian single-task GP model with a linear kernel.
This model assumes that the inputs have been normalized to [0, 1]^d and that the output has been standardized to have zero mean and unit variance. You can either normalize and standardize the data before constructing the model or use an input_transform and outcome_transform.
You are expected to use fit_fully_bayesian_model_nuts to fit this model as it isn’t compatible with fit_gpytorch_mll.
Example
>>> gp = FullyBayesianLinearSingleTaskGP(train_X, train_Y)
>>> fit_fully_bayesian_model_nuts(gp)
>>> posterior = gp.posterior(test_X)
Initialize the fully Bayesian single-task GP model.
- Parameters:
train_X (Tensor) – Training inputs (n x d)
train_Y (Tensor) – Training targets (n x 1)
train_Yvar (Tensor | None) – Observed noise variance (n x 1). Inferred if None.
outcome_transform (OutcomeTransform | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). Note that .train() will be called on the outcome transform during instantiation of the model.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
use_input_warping (bool) – A boolean indicating whether to use input warping.
indices_to_warp (list[int]) – An optional list of indices to warp. The default is to warp all inputs.
- property median_weight_variance: Tensor
Median weight variance across the MCMC samples.
- load_state_dict(state_dict, strict=True, assign=False)[source]
Custom logic for loading the state dict.
The standard approach of calling load_state_dict currently doesn’t play well with the FullyBayesianLinearSingleTaskGP since the mean_module, covar_module and likelihood aren’t initialized until the model has been fitted. The reason for this is that we don’t know the number of MCMC samples until NUTS is called. Given the state dict, we can initialize a new model with some dummy samples and then load the state dict into this model. This currently only works for a LinearPyroModel and supporting more Pyro models likely requires moving the model construction logic into the Pyro model itself.
- Parameters:
state_dict (Mapping[str, Any])
strict (bool)
assign (bool)
- Return type:
None
Fully Bayesian Multitask GP Models
Multi-task Gaussian Process Regression models with fully Bayesian inference.
- class botorch.models.fully_bayesian_multitask.MultiTaskPyroMixin[source]
Bases:
object
Mixin with universal multi-task logic for PyroModel subclasses.
Stores task-related attributes (task_feature, num_tasks, task_rank) and adjusts ard_num_dims to exclude the task column. Overrides sample_mean to return per-task means and _prepare_features to strip the task column.
Place before the PyroModel subclass in the MRO.
- set_inputs(train_X, train_Y, train_Yvar=None, task_feature=None, task_rank=None, all_tasks=None)[source]
Set training data and configure multi-task attributes.
- Parameters:
train_X (Tensor) – Training inputs (n x (d + 1)), including a task column.
train_Y (Tensor) – Training targets (n x 1).
train_Yvar (Tensor | None) – Observed noise variance (n x 1). Inferred if None.
task_feature (int | None) – The index of the task feature column.
task_rank (int | None) – The number of learned task embeddings. Defaults to the number of tasks.
all_tasks (list[int] | None) – A list of all task indices. If omitted, all tasks will be inferred from the task feature column of the training data.
- Return type:
None
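Why a mixin like MultiTaskPyroMixin has to come before the PyroModel subclass in the MRO can be seen in a toy example. The classes below (`BasePyroModel`, `TaskMixin`, `MultiTaskModel`) are hypothetical stand-ins, not BoTorch classes; they only mimic the behaviors described above (dropping the task column from ard_num_dims and returning per-task means).

```python
class BasePyroModel:
    """Hypothetical stand-in for a PyroModel subclass."""

    def set_inputs(self, train_X):
        self.train_X = train_X
        self.ard_num_dims = len(train_X[0])

    def sample_mean(self):
        return 0.0


class TaskMixin:
    """Hypothetical stand-in for a multi-task mixin."""

    def set_inputs(self, train_X, task_feature=-1):
        # super() resolves to the next class in the MRO, i.e. the base model.
        super().set_inputs(train_X)
        self.task_feature = task_feature
        self.num_tasks = len({row[task_feature] for row in train_X})
        self.ard_num_dims -= 1  # exclude the task column

    def sample_mean(self):
        # Return one mean per task instead of a single scalar.
        return [super().sample_mean()] * self.num_tasks


class MultiTaskModel(TaskMixin, BasePyroModel):  # mixin listed first
    pass


model = MultiTaskModel()
model.set_inputs([[0.1, 0.2, 0], [0.3, 0.4, 1]])
mro_names = [c.__name__ for c in MultiTaskModel.__mro__]
```

If the base order were reversed, `BasePyroModel.set_inputs` would be found first and the mixin's overrides would never run.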
- class botorch.models.fully_bayesian_multitask.LatentFeatureMultiTaskPyroMixin[source]
Bases:
MultiTaskPyroMixin
Mixin that adds ICM-style multi-task capabilities via latent features.
Extends MultiTaskPyroMixin with an ICM task covariance using learned latent task embeddings and a Matern-5/2 task kernel. Place before the PyroModel subclass in the MRO:
class MultitaskSaasPyroModel(LatentFeatureMultiTaskPyroMixin, SaasPyroModel): ...
Overrides the dispatch methods _maybe_multitask_transform, _build_mean_module, _build_multitask_covariance, and get_dummy_mcmc_samples.
- sample_task_lengthscale(concentration=6.0, rate=3.0)[source]
Sample the task kernel lengthscale.
- Parameters:
concentration (float)
rate (float)
- Return type:
Array
- get_dummy_mcmc_samples(num_mcmc_samples, **tkwargs)[source]
Return dummy MCMC samples for state dict loading.
Calls super() for base model keys, then reshapes mean to (S, num_tasks) and adds task_lengthscale and latent_features.
- Parameters:
num_mcmc_samples (int)
tkwargs (Any)
- Return type:
dict[str, Tensor]
- class botorch.models.fully_bayesian_multitask.MultitaskSaasPyroModel(use_input_warping=False, indices_to_warp=None, eps=1e-07)[source]
Bases:
LatentFeatureMultiTaskPyroMixin, SaasPyroModel
Multi-task SAAS model. Backward-compatible subclass that composes LatentFeatureMultiTaskPyroMixin with SaasPyroModel.
Initialize the PyroModel.
- Parameters:
use_input_warping (bool) – A boolean indicating whether to use input warping.
indices_to_warp (list[int] | None) – An optional list of indices to warp. The default is to warp all inputs.
eps (float) – A small value that is used to ensure inputs are not 0 or 1, when using input warping.
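The role of eps can be sketched as follows. The example assumes the warping function is a Kumaraswamy CDF, F(x) = 1 - (1 - x^a)^b; that choice and the parameter names `concentration1` and `concentration0` are assumptions made for illustration and are not stated in this section. Inputs are clamped to [eps, 1 - eps] before warping so that they never sit exactly on 0 or 1.

```python
def warp(x, concentration1=1.0, concentration0=1.0, eps=1e-7):
    """Clamp x away from 0 and 1, then apply a Kumaraswamy-CDF warp (assumed form)."""
    x = min(max(x, eps), 1.0 - eps)
    return 1.0 - (1.0 - x ** concentration1) ** concentration0


at_lower_bound = warp(0.0)   # slightly above 0 because of the clamp
at_upper_bound = warp(1.0)   # slightly below 1 because of the clamp
squared = warp(0.5, concentration1=2.0, concentration0=1.0)
```

With both concentrations equal to 1 the warp is the identity (up to the clamp), which is a convenient sanity check.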
- class botorch.models.fully_bayesian_multitask.SaasFullyBayesianMultiTaskGP(train_X, train_Y, task_feature, train_Yvar=None, output_tasks=None, rank=None, all_tasks=None, outcome_transform=None, input_transform=None, pyro_model=None, validate_task_values=True)[source]
Bases:
MultiTaskGP
A fully Bayesian multi-task GP model with the SAAS prior. This model assumes that the inputs have been normalized to [0, 1]^d and that the output has been stratified standardized to have zero mean and unit variance for each task. The SAAS model [Eriksson2021saasbo] with a Matern-5/2 kernel is used as the data kernel by default.
You are expected to use fit_fully_bayesian_model_nuts to fit this model as it isn’t compatible with fit_gpytorch_mll.
Example
>>> X1, X2 = torch.rand(10, 2), torch.rand(20, 2)
>>> i1, i2 = torch.zeros(10, 1), torch.ones(20, 1)
>>> train_X = torch.cat([
...     torch.cat([X1, i1], -1), torch.cat([X2, i2], -1),
... ])
>>> train_Y = torch.cat([f1(X1), f2(X2)]).unsqueeze(-1)
>>> train_Yvar = 0.01 * torch.ones_like(train_Y)
>>> mtsaas_gp = SaasFullyBayesianMultiTaskGP(
...     train_X, train_Y, task_feature=-1, train_Yvar=train_Yvar,
... )
>>> fit_fully_bayesian_model_nuts(mtsaas_gp)
>>> posterior = mtsaas_gp.posterior(test_X)
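"Stratified standardized" means each task's outcomes are standardized separately. A minimal plain-Python sketch of that idea follows; `stratified_standardize` is a hypothetical helper written for this illustration, not a BoTorch function.

```python
from collections import defaultdict
from statistics import mean, stdev


def stratified_standardize(task_ids, ys):
    """Standardize each outcome using the mean/std of its own task."""
    groups = defaultdict(list)
    for t, y in zip(task_ids, ys):
        groups[t].append(y)
    stats = {t: (mean(v), stdev(v)) for t, v in groups.items()}
    return [(y - stats[t][0]) / stats[t][1] for t, y in zip(task_ids, ys)]


task_ids = [0, 0, 0, 1, 1, 1]
ys = [1.0, 2.0, 3.0, 100.0, 200.0, 300.0]
ys_std = stratified_standardize(task_ids, ys)
```

Although the two tasks differ in scale by a factor of 100, both end up on the same standardized scale.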
Initialize the fully Bayesian multi-task GP model.
- Parameters:
train_X (Tensor) – Training inputs (n x (d + 1))
train_Y (Tensor) – Training targets (n x 1)
train_Yvar (Tensor | None) – Observed noise variance (n x 1). If None, we infer the noise. Note that the inferred noise is common across all tasks.
task_feature (int) – The index of the task feature (-d <= task_feature <= d).
output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.
rank (int | None) – The number of learned task embeddings to be used in the task kernel. If omitted, use a full rank (i.e. number of tasks) kernel.
all_tasks (list[int] | None) – A list of all task indices. If omitted, all tasks will be inferred from the task feature column of the training data. Used to inform the model about the total number of tasks, including any unobserved tasks.
outcome_transform (OutcomeTransform | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). Note that .train() will be called on the outcome transform during instantiation of the model.
input_transform (InputTransform | None) – An input transform that is applied to the inputs X in the model’s forward pass.
pyro_model (MultitaskSaasPyroModel | None) – Optional PyroModel that has the same signature as MultitaskSaasPyroModel. Defaults to MultitaskSaasPyroModel.
validate_task_values (bool) – If True, validate that the task values supplied in the input are expected task values. If False, unexpected task values will be mapped to the first output_task if supplied.
- train(mode=True, reset=True)[source]
Puts the model in train mode.
- Parameters:
mode (bool) – A boolean indicating whether to put the model in training mode.
reset (bool) – A boolean indicating whether to reset the model to its initial state. If mode is False, this argument is ignored.
- Returns:
The model itself.
- Return type:
TSaasFullyBayesianMultiTaskGP
- property median_lengthscale: Tensor
Median lengthscales across the MCMC samples.
- property num_mcmc_samples: int
Number of MCMC samples in the model.
- property batch_shape: Size
Batch shape of the model, equal to the number of MCMC samples. Note that SaasFullyBayesianMultiTaskGP does not support batching over input data at this point.
- fantasize(*args, **kwargs)[source]
Construct a fantasy model.
Constructs a fantasy model in the following fashion: (1) compute the model posterior at X, including observation noise. If observation_noise is a Tensor, use it directly as the observation noise to add. (2) sample from this posterior (using sampler) to generate “fake” observations. (3) condition the model on the new fake observations.
- Parameters:
X – A batch_shape x n' x d-dim Tensor, where d is the dimension of the feature space, n' is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
sampler – The sampler used for sampling from the posterior at X.
observation_noise – A model_batch_shape x 1 x m-dim tensor or a model_batch_shape x n' x m-dim tensor containing the average noise for each batch and output, where m is the number of outputs. noise must be in the outcome-transformed space if an outcome transform is used. If None and using an inferred noise likelihood, the noise will be the inferred noise level. If using a fixed noise likelihood, the mean across the observation noise in the training data is used as observation noise.
kwargs – Will be passed to model.condition_on_observations
- Returns:
The constructed fantasy model.
- Return type:
NoReturn
- load_mcmc_samples(mcmc_samples)[source]
Load the MCMC hyperparameter samples into the model.
This method will be called by fit_fully_bayesian_model_nuts when the model has been fitted in order to create a batched MultiTaskGP model.
- Parameters:
mcmc_samples (dict[str, Tensor])
- Return type:
None
- eval()[source]
Puts the model in eval mode.
Circumvents the need to call MultiTaskGP.eval(), which computes the task_covar_matrix for non-observed tasks. This is not needed for fully Bayesian models, since the non-observed tasks’ covar factors are instead sampled.
- Returns:
The model itself.
- Return type:
Self
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None, **kwargs)[source]
Computes the posterior over model outputs at the provided points.
- Returns:
A GaussianMixturePosterior object. Includes observation noise if specified.
- Parameters:
X (Tensor)
output_indices (list[int] | None)
observation_noise (bool)
posterior_transform (PosteriorTransform | None)
kwargs (Any)
- Return type:
GaussianMixturePosterior
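A Gaussian mixture posterior over MCMC samples can be summarized by its overall mean and variance. The sketch below assumes an equally weighted mixture and applies the law of total variance; it illustrates the math only and is not a description of how GaussianMixturePosterior is implemented. The function name `mixture_moments` is hypothetical.

```python
from statistics import mean, pvariance


def mixture_moments(means, variances):
    """Mean and variance of an equally weighted mixture of Gaussians."""
    mix_mean = mean(means)
    # Total variance = average within-component variance
    #                  + variance of the component means.
    mix_var = mean(variances) + pvariance(means)
    return mix_mean, mix_var


# Two hypothetical MCMC samples predicting N(1, 0.5) and N(3, 0.5) at one point.
mix_mean, mix_var = mixture_moments([1.0, 3.0], [0.5, 0.5])
```

Disagreement between the sampled models (the spread of the component means) adds to the predictive variance on top of each model's own uncertainty.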
- forward(X)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Parameters:
X (Tensor)
- Return type:
MultivariateNormal
- load_state_dict(state_dict, strict=True)[source]
Custom logic for loading the state dict.
The standard approach of calling load_state_dict currently doesn’t play well with the SaasFullyBayesianMultiTaskGP since the mean_module, covar_module and likelihood aren’t initialized until the model has been fitted. The reason for this is that we don’t know the number of MCMC samples until NUTS is called. Given the state dict, we can initialize a new model with some dummy samples and then load the state dict into this model. This currently only works for a MultitaskSaasPyroModel and supporting more Pyro models likely requires moving the model construction logic into the Pyro model itself.
TODO: If this were to inherit from SaasFullyBayesianSingleTaskGP, we could simplify this method and eliminate some others.
- Parameters:
state_dict (Mapping[str, Any])
strict (bool)
- condition_on_observations(X, Y, **kwargs)[source]
Conditions on additional observations for a Fully Bayesian model (either identical across models or unique per-model).
- Parameters:
X (Tensor) – A batch_shape x num_samples x d-dim Tensor, where d is the dimension of the feature space and batch_shape is the number of sampled models.
Y (Tensor) – A batch_shape x num_samples x 1-dim Tensor, where batch_shape is the number of sampled models.
kwargs (Any)
- Returns:
A fully Bayesian model conditioned on given observations. The returned model has batch_shape copies of the training data in case of identical observations (and batch_shape training datasets otherwise).
- Return type:
GP Regression Models
Gaussian Process Regression models based on GPyTorch models.
These models are often a good starting point and are further documented in the tutorials.
SingleTaskGP is a single-task exact GP model that uses relatively strong priors on
the Kernel hyperparameters, which work best when covariates are normalized to the unit
cube and outcomes are standardized (zero mean, unit variance). By default, this model
uses a Standardize outcome transform, which applies this standardization. However,
it does not (yet) use an input transform by default.
SingleTaskGP model works in batch mode (each batch having its own hyperparameters).
When the training observations include multiple outputs, SingleTaskGP uses
batching to model outputs independently.
SingleTaskGP supports multiple outputs. However, as a single-task model,
SingleTaskGP should be used only when the outputs are independent and all
use the same training inputs. If outputs are independent but they have different
training inputs, use the ModelListGP. When modeling correlations between outputs,
use a multi-task model like MultiTaskGP.
- class botorch.models.gp_regression.SingleTaskGP(train_X, train_Y, train_Yvar=None, likelihood=None, covar_module=None, mean_module=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]
Bases:
BatchedMultiOutputGPyTorchModel, ExactGP, FantasizeMixin
A single-task exact GP model, supporting both known and inferred noise levels.
A single-task exact GP which, by default, utilizes hyperparameter priors from [Hvarfner2024vanilla]. These priors are designed to perform well independently of the dimensionality of the problem. Moreover, they suggest a moderately low level of noise. Importantly, the model works best when covariates are normalized to the unit cube and outcomes are standardized (zero mean, unit variance). For a detailed discussion on the hyperparameter priors, see https://github.com/meta-pytorch/botorch/discussions/2451.
This model works in batch mode (each batch having its own hyperparameters). When the training observations include multiple outputs, this model will use batching to model outputs independently.
Use this model when you have independent output(s) and all outputs use the same training data. If outputs are independent and outputs have different training data, use the ModelListGP. When modeling correlations between outputs, use the MultiTaskGP.
An example of a case in which noise levels are known is online experimentation, where noise can be measured using the variability of different observations from the same arm, or provided by outside software. Another use case is simulation optimization, where the evaluation can provide variance estimates, perhaps from bootstrapping. In any case, these noise levels can be provided to SingleTaskGP as train_Yvar.
SingleTaskGP can also be used when the observations are known to be noise-free. Noise-free observations can be modeled using arbitrarily small noise values, such as train_Yvar=torch.full_like(train_Y, 1e-6).
Example
Model with inferred noise levels:
>>> import torch
>>> from botorch.models.gp_regression import SingleTaskGP
>>>
>>> train_X = torch.rand(20, 2, dtype=torch.float64)
>>> train_Y = torch.sin(train_X).sum(dim=1, keepdim=True)
>>> inferred_noise_model = SingleTaskGP(train_X, train_Y)
Model with a known observation variance of 0.2:
>>> train_Yvar = torch.full_like(train_Y, 0.2)
>>> observed_noise_model = SingleTaskGP(train_X, train_Y, train_Yvar)
With noise-free observations:
>>> train_Yvar = torch.full_like(train_Y, 1e-6)
>>> noise_free_model = SingleTaskGP(train_X, train_Y, train_Yvar)
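One way to obtain known noise levels from repeated observations of the same arm is to use the variance of the per-arm mean (sample variance divided by the number of replicates). The sketch below uses made-up data and plain Python; how the values are then packed into the train_Y and train_Yvar tensors is left out, and the variable names are illustrative only.

```python
from statistics import mean, variance

# Hypothetical replicated measurements for two arms.
arm_observations = {
    "A": [1.0, 1.2, 0.8, 1.0],
    "B": [2.0, 2.4, 1.6, 2.0],
}

# One training target per arm: the mean of its replicates.
arm_means = [mean(obs) for obs in arm_observations.values()]

# Noise variance of each mean: sample variance / number of replicates.
arm_mean_variances = [variance(obs) / len(obs) for obs in arm_observations.values()]
```

Arm B's replicates are twice as spread out as arm A's, so its estimated noise variance is four times larger.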
- Parameters:
train_X (Tensor) – A batch_shape x n x d tensor of training features.
train_Y (Tensor) – A batch_shape x n x m tensor of training observations.
train_Yvar (Tensor | None) – An optional batch_shape x n x m tensor of observed measurement noise.
likelihood (Likelihood | None) – A likelihood. If omitted, use a standard GaussianLikelihood with inferred noise level if train_Yvar is None, and a FixedNoiseGaussianLikelihood with the given noise observations if train_Yvar is not None.
covar_module (Module | None) – The module computing the covariance (Kernel) matrix. If omitted, uses an RBFKernel.
mean_module (Mean | None) – The mean function to be used. If omitted, use a ConstantMean.
outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform. Note that .train() will be called on the outcome transform during instantiation of the model.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
- train_inputs: tuple[Tensor]
- forward(x)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Parameters:
x (Tensor)
- Return type:
MultivariateNormal
GP Regression Models for Mixed Parameter Spaces
- class botorch.models.gp_regression_mixed.MixedSingleTaskGP(train_X, train_Y, cat_dims, train_Yvar=None, cont_kernel_factory=None, likelihood=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]
Bases:
SingleTaskGP
A single-task exact GP model for mixed search spaces.
This model is similar to SingleTaskGP, but supports mixed search spaces, which combine discrete and continuous features, as well as solely discrete spaces. It uses a kernel that combines a CategoricalKernel (based on Hamming distances) and a regular kernel into a kernel of the form
K((x1, c1), (x2, c2)) = K_cont_1(x1, x2) + K_cat_1(c1, c2) + K_cont_2(x1, x2) * K_cat_2(c1, c2)
where xi and ci are the continuous and categorical features of the input, respectively. The suffix _i indicates that we fit different lengthscales for the kernels in the sum and product terms.
Since this model does not provide gradients for the categorical features, optimization of the acquisition function will need to be performed in a mixed fashion, i.e., treating the categorical features properly as discrete optimization variables. We recommend using optimize_acqf_mixed.
Example
>>> train_X = torch.cat(
...     [torch.rand(20, 2), torch.randint(3, (20, 1))], dim=-1
... )
>>> train_Y = (
...     torch.sin(train_X[..., :-1]).sum(dim=1, keepdim=True)
...     + train_X[..., -1:]
... )
>>> model = MixedSingleTaskGP(train_X, train_Y, cat_dims=[-1])
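The sum-plus-product structure of the mixed kernel can be written out by hand. In this sketch the continuous kernel is an RBF and the categorical kernel is exp(-Hamming distance / lengthscale); both concrete forms, the function names, and the unit lengthscales are assumptions for illustration and may differ from what the model actually uses.

```python
import math


def rbf(x1, x2, lengthscale):
    """RBF kernel on continuous features (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)


def categorical(c1, c2, lengthscale):
    """Kernel on categorical features based on the mean Hamming distance (assumed form)."""
    hamming = sum(a != b for a, b in zip(c1, c2)) / len(c1)
    return math.exp(-hamming / lengthscale)


def mixed_kernel(x1, c1, x2, c2, ls_sum=(1.0, 1.0), ls_prod=(1.0, 1.0)):
    """K = K_cont_1 + K_cat_1 + K_cont_2 * K_cat_2, with separate lengthscales."""
    k_sum = rbf(x1, x2, ls_sum[0]) + categorical(c1, c2, ls_sum[1])
    k_prod = rbf(x1, x2, ls_prod[0]) * categorical(c1, c2, ls_prod[1])
    return k_sum + k_prod


k_same = mixed_kernel([0.1, 0.2], [1], [0.1, 0.2], [1])  # identical points
k_diff = mixed_kernel([0.1, 0.2], [1], [0.1, 0.2], [2])  # different category
```

Changing only the category lowers the kernel value, which is how the model expresses that points in different categories are less correlated.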
A single-task exact GP model supporting categorical parameters.
- Parameters:
train_X (Tensor) – A batch_shape x n x d tensor of training features.
train_Y (Tensor) – A batch_shape x n x m tensor of training observations.
cat_dims (list[int]) – A list of indices corresponding to the columns of the input X that should be considered categorical features.
train_Yvar (Tensor | None) – An optional batch_shape x n x m tensor of observed measurement noise.
cont_kernel_factory (None | Callable[[torch.Size, int, list[int]], Kernel]) – A method that accepts batch_shape, ard_num_dims, and active_dims arguments and returns an instantiated GPyTorch Kernel object to be used as the base kernel for the continuous dimensions. If omitted, this model uses an RBFKernel as the kernel for the ordinal parameters.
likelihood (Likelihood | None) – A likelihood. If omitted, use a standard GaussianLikelihood with inferred noise level.
outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass. Only input transforms that do not transform the categorical dimensions are allowed. To use it, for example, in combination with a OneHotToNumeric input transform, instantiate the transform with transform_on_train == False and pass in the already transformed input.
- classmethod construct_inputs(training_data, categorical_features, likelihood=None)[source]
Construct Model keyword arguments from a dict of SupervisedDataset.
- Parameters:
training_data (SupervisedDataset) – A SupervisedDataset containing the training data.
categorical_features (list[int]) – Column indices of categorical features.
likelihood (Likelihood | None) – Optional likelihood used to construct the model.
- Return type:
dict[str, Any]
Higher Order GP Models
References
S. Zhe, W. Xing, and R. M. Kirby. Scalable high-order gaussian process regression. Proceedings of Machine Learning Research, volume 89, Apr 2019.
- class botorch.models.higher_order_gp.FlattenedStandardize(output_shape, batch_shape=None, min_stdv=1e-08)[source]
Bases:
Standardize
Standardize outcomes in a structured multi-output setting by reshaping the batched output dimensions to be a vector. Specifically, an output dimension of [a x b x c] will be squeezed to be a vector of [a * b * c].
- Parameters:
output_shape (torch.Size) – A n x output_shape-dim tensor of training targets.
batch_shape (torch.Size | None) – The batch_shape of the training targets.
min_stdv (float) – The minimum standard deviation for which to perform standardization (if lower, only de-mean the data).
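The flatten-then-standardize idea, including the min_stdv fallback, can be sketched in plain Python. This toy version flattens a single [2 x 2 x 2] target and assumes one mean and one standard deviation shared across all flattened entries; that sharing, and the helper names `flatten` and `flattened_standardize`, are assumptions made for illustration.

```python
from statistics import mean, stdev


def flatten(nested):
    """Recursively flatten nested lists into a flat list of numbers."""
    if not isinstance(nested, list):
        return [nested]
    return [v for item in nested for v in flatten(item)]


def flattened_standardize(Y, min_stdv=1e-8):
    """Flatten Y to a vector, then standardize with a single mean/std."""
    flat = flatten(Y)
    mu, sd = mean(flat), stdev(flat)
    if sd < min_stdv:
        sd = 1.0  # spread too small: only de-mean the data
    return [(v - mu) / sd for v in flat]


Y = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]  # shape [2 x 2 x 2]
Y_flat_std = flattened_standardize(Y)
constant_std = flattened_standardize([[2.0, 2.0], [2.0, 2.0]])
```

The constant input shows the min_stdv branch: dividing by a near-zero standard deviation is avoided and the data are only centered.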
- forward(Y, Yvar=None, X=None)[source]
Standardize outcomes.
If the module is in train mode, this updates the module state (i.e. the mean/std normalizing constants). If the module is in eval mode, simply applies the normalization using the module state.
- Parameters:
Y (Tensor) – A batch_shape x n x m-dim tensor of training targets.
Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).
X (Tensor | None) – A batch_shape x n x d-dim tensor of training inputs (if applicable). This argument is not used by this transform, but it is used by its subclass, StratifiedStandardize.
- Returns:
A two-tuple with the transformed outcomes:
The transformed outcome observations.
The transformed observation noise (if applicable).
- untransform(Y, Yvar=None, X=None)[source]
Un-standardize outcomes.
- Parameters:
Y (Tensor) – A batch_shape x n x m-dim tensor of standardized targets.
Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of standardized observation noises associated with the targets (if applicable).
X (Tensor | None) – A batch_shape x n x d-dim tensor of inputs (if applicable). This argument is not used by this transform, but it is used by its subclass, StratifiedStandardize.
- Returns:
A two-tuple with the un-standardized outcomes:
The un-standardized outcome observations.
The un-standardized observation noise (if applicable).
- untransform_posterior(posterior, X=None)[source]
Un-standardize the posterior.
- Parameters:
posterior (HigherOrderGPPosterior) – A posterior in the standardized space.
X (Tensor | None) – A batch_shape x n x d-dim tensor of inputs (if applicable). This argument is not used by this transform, but it is used by its subclass, StratifiedStandardize.
- Returns:
The un-standardized posterior. If the input posterior is a GPyTorchPosterior or GaussianMixturePosterior, return the same type with analytically rescaled distribution. Otherwise, return a TransformedPosterior.
- Return type:
- class botorch.models.higher_order_gp.HigherOrderGP(train_X, train_Y, likelihood=None, covar_modules=None, num_latent_dims=None, learn_latent_pars=True, latent_init='default', outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]
Bases:
BatchedMultiOutputGPyTorchModel, ExactGP, FantasizeMixin
A model for high-dimensional output regression.
As described in [Zhe2019hogp]. “Higher-order” means that the predictions are matrices (tensors) with at least two dimensions, such as images or grids of images, or measurements taken from a region of at least two dimensions. The posterior uses Matheron’s rule [Doucet2010sampl] as described in [Maddox2021bohdo].
HigherOrderGP differs from a “vector” multi-output model in that it uses Kronecker algebra to obtain parsimonious covariance matrices for these outputs (see KroneckerMultiTaskGP for more information). For example, imagine a 10 x 20 x 30 grid of images. If we were to vectorize the resulting 6,000 data points in order to use them in a non-higher-order GP, they would have a 6,000 x 6,000 covariance matrix, with 36 million entries. The Kronecker structure allows representing this as a product of 10x10, 20x20, and 30x30 covariance matrices, with only 1,400 entries.
NOTE: This model requires the use of specialized Kronecker solves in linear operator, which are disabled by default in BoTorch. These are enabled by default in the HigherOrderGP.posterior call. However, they need to be manually enabled by the user during model fitting. Note also that we’re using fit_gpytorch_mll_torch() here instead of fit_gpytorch_mll() since the approximate computations result in a non-smooth MLL that the default L-BFGS-B optimizer invoked by fit_gpytorch_mll() does not handle well.
Example
>>> from linear_operator.settings import _fast_solves
>>> model = HigherOrderGP(train_X, train_Y)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> with _fast_solves(True):
...     fit_gpytorch_mll_torch(mll)
>>> samples = model.posterior(test_X).rsample()
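The entry counts quoted for the 10 x 20 x 30 example can be checked with a few lines of arithmetic. This is only a sanity check of the numbers in the description; the variable names are illustrative.

```python
from math import prod

output_shape = (10, 20, 30)

# Vectorizing the outputs gives one long vector of this many entries.
num_outputs = prod(output_shape)

# A dense covariance over the vectorized outputs is num_outputs x num_outputs.
dense_entries = num_outputs ** 2

# With Kronecker structure, only one small square factor per dimension is stored.
kronecker_entries = sum(n * n for n in output_shape)
```

The Kronecker representation stores 1,400 numbers instead of 36 million, a reduction of more than four orders of magnitude.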
- Parameters:
train_X (Tensor) – A batch_shape x n x d-dim tensor of training inputs.
train_Y (Tensor) – A batch_shape x n x output_shape-dim tensor of training targets.
likelihood (Likelihood | None) – Gaussian likelihood for the model.
covar_modules (list[Kernel] | None) – List of kernels for each output structure.
num_latent_dims (list[int] | None) – Sizes for the latent dimensions.
learn_latent_pars (bool) – If true, learn the latent parameters.
latent_init (str) – [default or gp] how to initialize the latent parameters.
outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform. Note that .train() will be called on the outcome transform during instantiation of the model.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
- forward(X)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Parameters:
X (Tensor)
- Return type:
MultivariateNormal
- get_fantasy_model(inputs, targets, **kwargs)[source]
Returns a new GP model that incorporates the specified inputs and targets as new training data.
Using this method is more efficient than updating with set_train_data when the number of inputs is relatively small, because any computed test-time caches will be updated in linear time rather than computed from scratch.
Note
If targets is a batch (e.g. b x m), then the GP returned from this method will be a batch mode GP. If inputs is of the same (or lesser) dimension as targets, then it is assumed that the fantasy points are the same for each target batch.
- Parameters:
inputs (torch.Tensor) – (b1 x ... x bk x m x d or f x b1 x ... x bk x m x d) Locations of fantasy observations.
targets (torch.Tensor) – (b1 x ... x bk x m or f x b1 x ... x bk x m) Labels of fantasy observations.
- Returns:
An ExactGP model with n + m training examples, where the m fantasy examples have been added and all test-time caches have been updated.
- Return type:
ExactGP
- condition_on_observations(X, Y, noise=None, **kwargs)[source]
Condition the model on new observations.
- Parameters:
X (Tensor) – A batch_shape x n' x d-dim Tensor, where d is the dimension of the feature space, n' is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
Y (Tensor) – A batch_shape' x n' x m_d-dim Tensor, where m_d is the shape of the model outputs, n' is the number of points per batch, and batch_shape' is the batch shape of the observations. batch_shape' must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
noise (Tensor | None) – If not None, a tensor of the same shape as Y representing the noise variance associated with each observation.
kwargs (Any) – Passed to condition_on_observations.
- Returns:
A BatchedMultiOutputGPyTorchModel object of the same type with n + n' training examples, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).
- Return type:
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]
Computes the posterior over model outputs at the provided points.
- Parameters:
X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.
output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.
observation_noise (bool | Tensor) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q x m).
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
- Returns:
A GPyTorchPosterior object, representing batch_shape joint distributions over q points and the outputs selected by output_indices each. Includes observation noise if specified.
- Return type:
- make_posterior_variances(joint_covariance_matrix)[source]
Computes the posterior variances given the data points X. As currently implemented, it makes another forward call with the stacked data to obtain the joint covariance across all data points.
- Parameters:
joint_covariance_matrix (LinearOperator)
- Return type:
Tensor
Latent Kronecker GP Models
References
J. A. Lin, S. Ament, M. Balandat, D. Eriksson, J. M. Hernández-Lobato, E. Bakshy. Scalable Gaussian Processes with Latent Kronecker Structure. International Conference on Machine Learning 2025.
J. A. Lin, S. Ament, M. Balandat, E. Bakshy. Scaling Gaussian Processes for Learning Curve Prediction via Latent Kronecker Structure. NeurIPS 2024 Bayesian Decision-making and Uncertainty Workshop.
J. A. Lin, J. Antorán, S. Padhy, D. Janz, J. M. Hernández-Lobato, A. Terenin. Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent. Advances in Neural Information Processing Systems 2023.
- class botorch.models.latent_kronecker_gp.LatentKroneckerGP(train_X, train_T, train_Y, likelihood=None, mean_module_X=None, mean_module_T=None, covar_module_X=None, covar_module_T=None, input_transform=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>)[source]
Bases: GPyTorchModel, ExactGP, FantasizeMixin
A multi-task GP model which uses Kronecker structure despite missing entries.
Leverages pathwise conditioning and iterative linear system solvers to efficiently draw samples from the GP posterior. See [lin2024scaling] and [lin2025scalable] for details.
For more information about pathwise conditioning, see [wilson2021pathwise] and [Maddox2021bohdo]. Details about iterative linear system solvers for GPs with pathwise conditioning can be found in [lin2023sampling].
NOTE: This model requires iterative methods for efficient posterior inference. To enable iterative methods, the use_iterative_methods helper function can be used as a context manager.
Example
>>> model = LatentKroneckerGP(train_X, train_T, train_Y)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> with model.use_iterative_methods():
...     fit_gpytorch_mll(mll)
...     samples = model.posterior(test_X, test_T).rsample()
- Parameters:
train_X (Tensor) – A batch_shape x n x d tensor of training features.
train_T (Tensor) – A batch_shape x t x 1 tensor of training time steps.
train_Y (Tensor) – A batch_shape x n x t tensor of training observations, corresponding to the Cartesian product of train_X and train_T.
likelihood (Likelihood | None) – A likelihood. If omitted, use a standard GaussianLikelihood with inferred homoskedastic noise level.
mean_module_X (Mean | None) – The mean function to be used for X. If omitted, a ZeroMean will be used.
mean_module_T (Mean | None) – The mean function to be used for T. If omitted, a ZeroMean will be used.
covar_module_X (Module | None) – The module computing the covariance matrix of X. If omitted, a MaternKernel will be used.
covar_module_T (Module | None) – The module computing the covariance matrix of T. If omitted, a MaternKernel wrapped in a ScaleKernel will be used.
input_transform (InputTransform | None) – An input transform that is applied to X in the model’s forward pass.
outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to Y. Note that .train() will be called on the outcome transform during instantiation of the model.
- property train_T: Tensor
The training T values (second element of train_inputs).
T is stored in train_inputs (alongside X) to enable GPyTorch’s multi-input prediction strategy via _get_test_prior_mean_and_covariances. This also allows using different T values at test time, e.g., evaluating the posterior at a subset of task indices.
The helper methods below (transform_inputs, _set_transformed_inputs, _revert_to_original_inputs) ensure T is preserved through BoTorch’s input transform machinery, which expects single-input models.
- transform_inputs(X, input_transform=None)[source]
Transform inputs.
Only transforms X, leaving T unchanged. The _is_T_input check is needed because MLL closures call transform_inputs on each element of train_inputs individually, including T.
- Parameters:
X (Tensor) – A tensor of inputs. May be X or T from train_inputs.
input_transform (Module | None) – A Module that performs the input transformation.
- Returns:
Transformed X, or T unchanged.
- Return type:
Tensor
- use_iterative_methods(tol=0.01, max_iter=10000, covar_root_decomposition=False, log_prob=True, solves=True)[source]
- Parameters:
tol (float)
max_iter (int)
covar_root_decomposition (bool)
log_prob (bool)
solves (bool)
- forward(*args, **kwargs)[source]
Computes the joint distribution at the given input locations.
- Parameters:
*args – Either (X,) for backward compatibility, or (X, T). If only X is provided, uses self.train_T for T.
- Returns:
The joint distribution at the specified input locations.
- Return type:
MultivariateNormal
- posterior(X, T=None, observation_noise=False, posterior_transform=None, **kwargs)[source]
Computes the posterior over model outputs at the provided points.
Leverages GPyTorch’s inference stack with our custom Kronecker-structured covariances (via the overridden _get_test_prior_mean_and_covariances). Sampling uses pathwise conditioning for efficiency.
NOTE: For efficient inference with large datasets, wrap the call in the model.use_iterative_methods() context manager, e.g.:
>>> with model.use_iterative_methods():
...     posterior = model.posterior(X, T)
- Parameters:
X (Tensor) – A (batch_shape) x q x d-dim Tensor of test features.
T (Tensor | None) – A (batch_shape) x t x 1-dim Tensor of test T values. If None, defaults to using self.train_T.
observation_noise (bool | Tensor) – If True, add observation noise. Currently not supported.
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform. Currently not supported.
kwargs (Any)
- Returns:
A LatentKroneckerGPPosterior with proper mean/variance and efficient pathwise sampling.
- Return type:
- condition_on_observations(X, Y, noise=None, **kwargs)[source]
Condition the model on new observations.
- Parameters:
X (Tensor) – A batch_shape x n' x d-dim Tensor, where d is the dimension of the feature space, n' is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
Y (Tensor) – A batch_shape' x n' x m-dim Tensor, where m is the number of model outputs, n' is the number of points per batch, and batch_shape' is the batch shape of the observations. batch_shape' must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
noise (Tensor | None) – If not None, a tensor of the same shape as Y representing the associated noise variance.
kwargs (Any) – Passed to self.get_fantasy_model.
- Returns:
A Model object of the same type, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).
- Return type:
Example
>>> train_X = torch.rand(20, 2)
>>> train_Y = torch.sin(train_X[:, :1]) + torch.cos(train_X[:, 1:])
>>> model = SingleTaskGP(train_X, train_Y)
>>> model.eval()
>>> test_X = torch.rand(10, 2)
>>> # Need to evaluate once to fill test independent caches
>>> # so that condition_on_observations works.
>>> model(test_X)
>>> new_X = torch.rand(5, 2)
>>> new_Y = torch.sin(new_X[:, :1]) + torch.cos(new_X[:, 1:])
>>> model = model.condition_on_observations(X=new_X, Y=new_Y)
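The requirement that batch_shape' be broadcastable to batch_shape follows the usual trailing-dimension rule. A minimal sketch of that rule in plain Python is given here; is_broadcastable_to is a hypothetical helper used only for illustration and is not a BoTorch function.

```python
def is_broadcastable_to(src_shape, dst_shape):
    """Return True if src_shape can be broadcast to exactly dst_shape.

    Shapes are aligned from the trailing dimension. Each source dimension
    must either match the destination dimension or be 1, and the source may
    not have more dimensions than the destination.
    """
    if len(src_shape) > len(dst_shape):
        return False
    for s, d in zip(reversed(src_shape), reversed(dst_shape)):
        if s != d and s != 1:
            return False
    return True


# Observation batch shape (3,) against a model batch shape (2, 3).
print(is_broadcastable_to((3,), (2, 3)))
# Mismatched trailing dimension.
print(is_broadcastable_to((2,), (3,)))
```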
- classmethod construct_inputs(training_data)[source]
Constructs the input tensors for LatentKroneckerGP from a SupervisedDataset.
This method processes the provided training data to extract and organize the features and targets into the required format for the LatentKroneckerGP model. It factorizes inputs from the product space into the factors X and T. The matching output Y values are assembled by mapping observed values to their corresponding positions and filling missing values with NaN.
- Parameters:
training_data (SupervisedDataset) – A SupervisedDataset containing training inputs and outputs.
- Returns:
A dictionary with keys train_X, train_T, and train_Y, where
train_X: The unique feature values (excluding the T dimension).
train_T: The unique feature values of the T dimension.
train_Y: The outputs aligned with the Cartesian product of train_X and train_T, with missing values filled as NaN.
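The factorization described for construct_inputs can be pictured with a small plain-Python sketch. It assumes the T value is the last feature column, keeps X values in order of first appearance, and sorts T; the function name factorize and these conventions are illustrative assumptions, not the actual BoTorch implementation.

```python
import math


def factorize(observations, t_index=-1):
    """Split product-space inputs into unique X and T factors.

    observations: list of (features, y) pairs, where features is a tuple
    whose entry at t_index is the T value (an assumption of this sketch).
    """
    train_X, train_T = [], []
    for features, _ in observations:
        feats = list(features)
        t = feats.pop(t_index)
        x = tuple(feats)
        if x not in train_X:
            train_X.append(x)
        if t not in train_T:
            train_T.append(t)
    train_T.sort()

    # Start from an all-NaN grid over the Cartesian product of X and T,
    # then write each observed value into its position.
    train_Y = [[math.nan] * len(train_T) for _ in train_X]
    for features, y in observations:
        feats = list(features)
        t = feats.pop(t_index)
        train_Y[train_X.index(tuple(feats))][train_T.index(t)] = y
    return {"train_X": train_X, "train_T": train_T, "train_Y": train_Y}


obs = [
    ((0.1, 0.2, 1.0), 5.0),
    ((0.1, 0.2, 2.0), 6.0),
    ((0.3, 0.4, 1.0), 7.0),  # no observation at T = 2.0 for this X
]
out = factorize(obs)
print(out["train_X"], out["train_T"], out["train_Y"])
```

The unobserved combination of the second X value with T = 2.0 stays NaN, which is the "missing entry" the model is designed to tolerate.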
Model List GP Regression Models
Model List GP Regression models.
- class botorch.models.model_list_gp_regression.ModelListGP(*gp_models)[source]
Bases: IndependentModelList, ModelListGPyTorchModel, FantasizeMixin
A multi-output GP model with independent GPs for the outputs.
This model supports different-shaped training inputs for each of its sub-models. It can be used with any number of single-output GPyTorchModels and the models can be of different types. Use this model when you have independent outputs with different training data. When modeling correlations between outputs, use MultiTaskGP.
Internally, this model is just a list of individual models, but it implements the same input/output interface as all other BoTorch models. This makes it very flexible and convenient to work with. The sequential evaluation comes at a performance cost though: if you are using a block design (i.e. the same number of training examples for each output, and a similar model structure), you should consider using a batched GP model instead, such as SingleTaskGP with batched inputs.
- Parameters:
*gp_models (GPyTorchModel) – A number of single-output GPyTorchModels. If models have input/output transforms, these are honored individually for each model.
Example
>>> model1 = SingleTaskGP(train_X1, train_Y1)
>>> model2 = SingleTaskGP(train_X2, train_Y2)
>>> model = ModelListGP(model1, model2)
- condition_on_observations(X, Y, **kwargs)[source]
Condition the model on new observations.
- Parameters:
X (list[Tensor]) – An m-list of batch_shape x n' x d-dim Tensors, where d is the dimension of the feature space, n' is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
Y (Tensor) – A batch_shape' x n' x m-dim Tensor, where m is the number of model outputs, n' is the number of points per batch, and batch_shape' is the batch shape of the observations. batch_shape' must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
kwargs (Any) – Keyword arguments passed to IndependentModelList.get_fantasy_model.
- Returns:
A ModelListGP representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs). Here the i-th model has n_i + n' training examples, where the n' training examples have been added and all test-time caches have been updated.
- Return type:
Multitask GP Models
Multi-Task GP models.
References
E. Bonilla, K. Chai and C. Williams. Multi-task Gaussian Process Prediction. Advances in Neural Information Processing Systems 20, NeurIPS 2007.
K. Swersky, J. Snoek and R. Adams. Multi-Task Bayesian Optimization. Advances in Neural Information Processing Systems 26, NeurIPS 2013.
A. Doucet. A Note on Efficient Conditional Simulation of Gaussian Distributions. http://www.stats.ox.ac.uk/~doucet/doucet_simulationconditionalgaussian.pdf, Apr 2010.
W. Maddox, M. Balandat, A. Wilson, and E. Bakshy. Bayesian Optimization with High-Dimensional Outputs. https://arxiv.org/abs/2106.12997, Jun 2021.
- class botorch.models.multitask.MultiTaskGP(train_X, train_Y, task_feature, train_Yvar=None, mean_module=None, covar_module=None, likelihood=None, task_covar_prior=<class 'botorch.utils.types.DEFAULT'>, output_tasks=None, rank=None, all_tasks=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None, validate_task_values=True)[source]
Bases: ExactGP, MultiTaskGPyTorchModel, FantasizeMixin
Multi-Task exact GP model using an ICM (intrinsic co-regionalization model) kernel. See [Bonilla2007MTGP] and [Swersky2013MTBO] for a reference on the model and its use in Bayesian optimization.
By default, the ICM kernel is constrained to have only non-negative entries by using a PositiveIndexKernel for the task covariance. The reason for this is that correlations are typically positive and can be difficult to estimate accurately, especially with limited data.
The model can be single-output or multi-output, determined by the output_tasks. This model uses dimension-scaled priors on the Kernel hyperparameters, which work best when covariates are normalized to the unit cube and outcomes are standardized (zero mean, unit variance). The standardization should be applied in a stratified fashion at the level of the tasks, rather than across all data points.
If train_Yvar is None, this model infers the noise level. If you have known observation noise, you can set train_Yvar to a tensor containing the noise variance measurements. WARNING: This currently does not support different noise levels for the different tasks.
Multi-Task GP model using an ICM kernel.
- Parameters:
train_X (Tensor) – An n x (d + 1) or b x n x (d + 1) (batch mode) tensor of training data. One of the columns should contain the task features (see task_feature argument).
train_Y (Tensor) – An n x 1 or b x n x 1 (batch mode) tensor of training observations.
task_feature (int) – The index of the task feature (-d <= task_feature <= d).
train_Yvar (Tensor | None) – An optional n or b x n (batch mode) tensor of observed measurement noise. If None, we infer the noise. Note that the inferred noise is common across all tasks.
mean_module (Module | None) – The mean function to be used. Defaults to ConstantMean.
covar_module (Module | None) – The module for computing the covariance matrix between the non-task features. Defaults to RBFKernel.
likelihood (Likelihood | None) – A likelihood. The default is selected based on train_Yvar. If train_Yvar is None, a standard GaussianLikelihood with inferred noise level is used. Otherwise, a FixedNoiseGaussianLikelihood is used.
output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.
rank (int | None) – The rank to be used for the index kernel. If omitted, use a full rank (i.e. number of tasks) kernel.
task_covar_prior (Prior | _DefaultType | None) – A Prior on the task covariance matrix. Defaults to BetaPrior(2.5, 1.5) which biases task correlations toward positive values. Pass None to use no prior.
all_tasks (list[int] | None) – By default, multi-task GPs infer the list of all tasks from the task features in train_X. This is an experimental feature that enables creation of multi-task GPs with tasks that don’t appear in the training data. Note that when a task is not observed, the corresponding task covariance will heavily depend on random initialization and may behave unexpectedly.
outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform. NOTE: Standardization should be applied in a stratified fashion, separately for each task. Note that .train() will be called on the outcome transform during instantiation of the model.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
validate_task_values (bool) – If True, validate that the task values supplied in the input are expected task values. If False, unexpected task values will be mapped to the first output_task if supplied.
Example
>>> X1, X2 = torch.rand(10, 2), torch.rand(20, 2)
>>> i1, i2 = torch.zeros(10, 1), torch.ones(20, 1)
>>> train_X = torch.cat([
...     torch.cat([X1, i1], -1), torch.cat([X2, i2], -1),
... ])
>>> train_Y = torch.cat([f1(X1), f2(X2)]).unsqueeze(-1)
>>> model = MultiTaskGP(train_X, train_Y, task_feature=-1)
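The class description recommends standardizing outcomes per task rather than across all data points. A minimal plain-Python sketch of such stratified standardization follows. It assumes the sample standard deviation and at least two distinct observations per task; standardize_per_task is a hypothetical helper, and BoTorch's own outcome transforms may differ in detail.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_per_task(task_ids, ys):
    """Standardize each outcome using the mean/std of its own task only."""
    by_task = defaultdict(list)
    for t, y in zip(task_ids, ys):
        by_task[t].append(y)
    # Per-task statistics (sample standard deviation, an assumption here).
    stats = {t: (mean(v), stdev(v)) for t, v in by_task.items()}
    return [(y - stats[t][0]) / stats[t][1] for t, y in zip(task_ids, ys)]


task_ids = [0, 0, 0, 1, 1, 1]
ys = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
z = standardize_per_task(task_ids, ys)
print(z)
```

Although the two tasks live on very different scales, both end up with zero mean and unit variance, which pooled standardization over all six points would not achieve.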
- forward(x)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Parameters:
x (Tensor)
- Return type:
MultivariateNormal
- eval()[source]
Puts the model in eval mode.
When unobserved tasks are present (i.e., all_tasks includes tasks not in the training data), this method sets the covariance factor for unobserved tasks to the mean of the observed tasks’ covariance factors. This provides a reasonable initialization for prediction on unobserved tasks.
- Return type:
Self
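The mean-filling step described for eval can be illustrated in plain Python. The sketch assumes the covariance factor is a num_tasks x rank matrix with one row per task; the function fill_unobserved and this layout are assumptions for illustration, and the actual method operates on the task kernel's parameters.

```python
def fill_unobserved(covar_factor, observed_tasks):
    """Replace rows of unobserved tasks with the mean of the observed rows."""
    observed_rows = [covar_factor[t] for t in observed_tasks]
    # Column-wise mean over the observed tasks' rows.
    mean_row = [sum(col) / len(observed_rows) for col in zip(*observed_rows)]
    return [
        list(row) if t in observed_tasks else list(mean_row)
        for t, row in enumerate(covar_factor)
    ]


# Tasks 0 and 1 were observed; task 2 still holds its random initialization.
factor = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
filled = fill_unobserved(factor, observed_tasks=[0, 1])
print(filled[2])
```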
- classmethod get_all_tasks(train_X, task_feature, output_tasks=None)[source]
- Parameters:
train_X (Tensor)
task_feature (int)
output_tasks (list[int] | None)
- Return type:
tuple[list[int], int, int]
- classmethod construct_inputs(training_data, task_feature, output_tasks=None, task_covar_prior=<class 'botorch.utils.types.DEFAULT'>, prior_config=None, rank=None, map_heterogeneous_to_full=False)[source]
Construct Model keyword arguments from a dataset and other args.
- Parameters:
training_data (SupervisedDataset | MultiTaskDataset) – A SupervisedDataset or a MultiTaskDataset.
task_feature (int) – Column index of embedded task indicator features.
output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.
task_covar_prior (Prior | _DefaultType | None) – A GPyTorch Prior object to use as prior on the cross-task covariance matrix. Defaults to DEFAULT which uses BetaPrior(2.5, 1.5) in the model. Pass None to use no prior.
prior_config (dict | None) – Configuration for inter-task covariance prior. Should only be used if task_covar_prior is not passed directly. Must contain use_LKJ_prior indicator and should contain float value eta.
rank (int | None) – The rank of the cross-task covariance matrix.
map_heterogeneous_to_full (bool) – If True and training_data is a MultiTaskDataset with heterogeneous features, zero-pad each task’s features into the union feature space and concatenate into a single train_X tensor. The zero-padded entries are intended to be overwritten by a LearnedFeatureImputation input transform. If False (default), heterogeneous features will raise UnsupportedError via training_data.X.
- Return type:
dict[str, Any]
- class botorch.models.multitask.KroneckerMultiTaskGP(train_X, train_Y, likelihood=None, data_covar_module=None, task_covar_prior=None, rank=None, outcome_transform=None, input_transform=None, **kwargs)[source]
Bases: ExactGP, GPyTorchModel, FantasizeMixin
Multi-task GP with Kronecker structure, using an ICM kernel.
This model assumes the “block design” case, i.e., it requires that all tasks are observed at all data points.
For posterior sampling, this model uses Matheron’s rule [Doucet2010sampl] to compute the posterior over all tasks as in [Maddox2021bohdo] by exploiting Kronecker structure.
When a multi-fidelity model has Kronecker structure, this means there is one covariance kernel over the fidelity features (call it K_f) and another over the rest of the input parameters (call it K_i), and the resulting covariance across inputs and fidelities is given by the Kronecker product of the two covariance matrices. This is equivalent to saying the covariance between two input and fidelity pairs is given by
K((parameter_1, fidelity_1), (parameter_2, fidelity_2)) = K_f(fidelity_1, fidelity_2) * K_i(parameter_1, parameter_2).
Then the covariance matrix of n_i parameters and n_f fidelities can be codified as a Kronecker product of an n_i x n_i matrix and an n_f x n_f matrix, which is far more parsimonious than specifying the whole (n_i * n_f) x (n_i * n_f) covariance matrix.
Example
>>> train_X = torch.rand(10, 2)
>>> train_Y = torch.cat([f_1(train_X), f_2(train_X)], dim=-1)
>>> model = KroneckerMultiTaskGP(train_X, train_Y)
- Parameters:
train_X (Tensor) – A batch_shape x n x d tensor of training features.
train_Y (Tensor) – A batch_shape x n x m tensor of training observations.
likelihood (MultitaskGaussianLikelihood | None) – A MultitaskGaussianLikelihood. If omitted, uses a MultitaskGaussianLikelihood with a LogNormalPrior(-4, 1) noise prior.
data_covar_module (Module | None) – The module computing the covariance (Kernel) matrix in data space. If omitted, uses an RBFKernel.
task_covar_prior (Prior | None) – A Prior on the task covariance matrix. Must operate on p.s.d. matrices. A common prior for this is the LKJ prior. If omitted, uses LKJCovariancePrior with eta parameter as specified in the keyword arguments (if not specified, use eta=1.5).
rank (int | None) – The rank of the ICM kernel. If omitted, use a full rank kernel.
outcome_transform (OutcomeTransform | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform. NOTE: Standardization should be applied in a stratified fashion, separately for each task. Note that .train() will be called on the outcome transform during instantiation of the model.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
kwargs (Any) – Additional arguments to override default settings of priors, including:
- eta: The eta parameter on the default LKJ task_covar_prior. A value of 1.0 is uninformative, values <1.0 favor stronger correlations (in magnitude), correlations vanish as eta -> inf.
- sd_prior: A scalar prior over nonnegative numbers, which is used for the default LKJCovariancePrior task_covar_prior.
- likelihood_rank: The rank of the task covariance matrix to fit. Defaults to 0 (which corresponds to a diagonal covariance matrix).
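The Kronecker identity from the class description, K((p_1, f_1), (p_2, f_2)) = K_f(f_1, f_2) * K_i(p_1, p_2), can be verified on small matrices in plain Python. The kernel values below are made up for illustration, the helper names are hypothetical, and the ordering of the two factors in the product is a convention chosen for this sketch.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]


# Illustrative kernel matrices: 2 fidelities and 3 parameter settings.
K_f = [[1.0, 0.5], [0.5, 1.0]]
K_i = [[2.0, 0.3, 0.1], [0.3, 2.0, 0.4], [0.1, 0.4, 2.0]]
n_f, n_i = len(K_f), len(K_i)

K_full = kron(K_f, K_i)  # (n_f * n_i) x (n_f * n_i)


def full_entry(f1, p1, f2, p2):
    """Look up the covariance of (p1, f1) and (p2, f2) in the full matrix."""
    return K_full[f1 * n_i + p1][f2 * n_i + p2]


stored_factored = n_i**2 + n_f**2   # entries kept in the two factors
stored_dense = (n_i * n_f) ** 2     # entries in the full matrix
print(full_entry(1, 2, 0, 1), K_f[1][0] * K_i[2][1])
print(stored_factored, stored_dense)
```

Every entry of the full matrix is recoverable from the two small factors, so only the factors need to be stored and decomposed.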
- forward(X)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Parameters:
X (Tensor)
- Return type:
MultitaskMultivariateNormal
- property train_full_covar
- property predictive_mean_cache
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]
Computes the posterior over model outputs at the provided points.
- Parameters:
X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.
observation_noise (bool | Tensor) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q). It is assumed to be in the outcome-transformed space if an outcome transform is used.
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
output_indices (list[int] | None)
- Returns:
A GPyTorchPosterior object, representing a batch of b joint distributions over q points. Includes observation noise if specified.
- Return type:
- train(val=True, *args, **kwargs)[source]
Put the model in train mode. Reverts to the original inputs if in train mode (val=True) or sets transformed inputs if in eval mode (val=False).
- Parameters:
val – A boolean denoting whether to put in train or eval mode. If False, model is put in eval mode.
Heterogeneous Search Space Multitask GP Models
Multi-Task GP model designed to operate on tasks from different search spaces.
References:
- class botorch.models.heterogeneous_mtgp.HeterogeneousMTGP(train_Xs, train_Ys, train_Yvars, feature_indices, full_feature_dim, rank=None, use_saas_prior=True, use_combinatorial_kernel=True, all_tasks=None, input_transform=None, outcome_transform=None, validate_task_values=True)[source]
Bases: MultiTaskGP
A multi-task GP model designed to operate on tasks from different search spaces. This model uses MultiTaskConditionalKernel.
This model was introduced in [Deshwal2024Heterogeneous].
- The model is designed to work with a MultiTaskDataset that contains datasets with different features.
- It uses a helper to embed the X coming from the sub-spaces into the full-feature space (+ task feature) before passing them down to the base MultiTaskGP.
- The same helper is used in the posterior method to embed the X from the target task into the full dimensional space before evaluating the posterior method of the base class.
- This model also overwrites the _split_inputs method. Instead of x_basic, we return the X with task feature included since this is used by the MultiTaskConditionalKernel to identify the active dimensions of / the kernels to evaluate for the given input.
Construct a heterogeneous multi-task GP model from lists of inputs corresponding to each task.
NOTE: This model assumes that the task 0 is the output / target task. It will only produce predictions for task 0.
- Parameters:
train_Xs (list[Tensor]) – A list of tensors of shape (n_i x d_i) where d_i is the dimensionality of the input features for task i. NOTE: These should not include the task feature!
train_Ys (list[Tensor]) – A list of tensors of shape (n_i x 1) containing the observations for the corresponding task.
train_Yvars (list[Tensor] | None) – An optional list of tensors of shape (n_i x 1) containing the observation variances for the corresponding task.
feature_indices (list[list[int]]) – A list of lists of integers specifying the indices mapping the features from a given task to the full tensor of features. The i-th element of the list should contain d_i integers.
full_feature_dim (int) – The total number of features across all tasks. This does not include the task feature dimension.
rank (int | None) – The rank of the cross-task covariance matrix.
use_saas_prior (bool) – Whether to use the SAAS prior for base kernels of the MultiTaskConditionalKernel.
use_combinatorial_kernel (bool) – Whether to use a combinatorial kernel over the binary embedding of task features in MultiTaskConditionalKernel.
all_tasks (list[int] | None) – By default, multi-task GPs infer the list of all tasks from the task features in train_X. This is an experimental feature that enables creation of multi-task GPs with tasks that don’t appear in the training data. Note that when a task is not observed, the corresponding task covariance will heavily depend on random initialization and may behave unexpectedly.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass. The transform should be compatible with the inputs from the full feature space with the task feature appended.
outcome_transform (OutcomeTransform | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale).
validate_task_values (bool) – If True, validate that the task values supplied in the input are expected task values. If False, unexpected task values will be mapped to the first output_task if supplied.
- classmethod get_all_tasks(train_X, task_feature, output_tasks=None)[source]
- Parameters:
train_X (Tensor)
task_feature (int)
output_tasks (list[int] | None)
- Return type:
tuple[list[int], int, int]
- map_to_full_tensor(X, task_index, imputation_values=None)[source]
Map a tensor of task-specific features to the full tensor of features, utilizing the feature indices to map each feature to its corresponding position in the full tensor. Also append the task index as the last column. The columns of the full tensor that are not used by the given task are filled with the per-dimension empirical mean computed across all tasks that contain that dimension (see _compute_imputation_values). This avoids out-of-domain padding values that would otherwise be squashed by an input transform with fixed bounds (e.g. Normalize).
- Parameters:
X (Tensor) – A tensor of shape (n x d_i) where d_i is the number of features in the original task dataset.
task_index (int) – The index of the task whose features are being mapped.
imputation_values (Tensor | None) – Optional pre-computed imputation values. If not provided, uses self.feature_imputation_values.
- Returns:
A tensor of shape (n x (self.full_feature_dim + 1)) containing the mapped features.
- Return type:
Tensor
Example
>>> # Suppose full feature dim is 3, the feature indices for task 5
>>> # are [2, 0], and the empirical mean for missing dim 1 is 7.0.
>>> X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
>>> X_full = self.map_to_full_tensor(X=X, task_index=5)
>>> # X_full = torch.tensor([[2.0, 7.0, 1.0, 5.0], [4.0, 7.0, 3.0, 5.0]])
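The scatter-and-impute step described above can be mimicked without tensors. The following is a minimal plain-Python sketch under stated assumptions, not BoTorch's implementation: the function name map_to_full and its arguments feature_indices, full_dim, and imputation are hypothetical and exist only for illustration.

```python
def map_to_full(rows, feature_indices, full_dim, task_index, imputation):
    """Scatter task-specific columns into the full feature space.

    rows: task-specific feature rows, each of length d_i.
    feature_indices: position of each task-specific column in the full space.
    imputation: one fill value per full-space dimension (hypothetical input).
    """
    full_rows = []
    for row in rows:
        full = list(imputation[:full_dim])  # start from the imputed values
        for value, position in zip(row, feature_indices):
            full[position] = value  # overwrite the dimensions this task uses
        full.append(float(task_index))  # task indicator goes in the last column
        full_rows.append(full)
    return full_rows


# Same numbers as the documented example: feature indices [2, 0] for task 5,
# and an imputed value of 7.0 for the unused dimension 1.
X_full = map_to_full(
    rows=[[1.0, 2.0], [3.0, 4.0]],
    feature_indices=[2, 0],
    full_dim=3,
    task_index=5,
    imputation=[0.0, 7.0, 0.0],
)
print(X_full)
```

Only the imputed value for dimension 1 matters here, because dimensions 0 and 2 are overwritten by the task's own features.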
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None, **kwargs)[source]
Computes the posterior for the target task at the provided points.
- Parameters:
X (Tensor) – A tensor of shape batch_shape x q x (d_0 + 1), where d_0 is the dimension of the feature space for task 0 and the last column is the task indicator (must be 0 for the target task).
output_indices (list[int] | None) – Not supported. Must be None or [0].
observation_noise (bool | Tensor) – If True, add observation noise from the respective likelihoods. If a Tensor, specifies the observation noise levels to add.
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
kwargs (Any)
- Returns:
A GPyTorchPosterior object, representing batch_shape joint distributions over q points.
- Return type:
- classmethod construct_inputs(training_data, task_feature=-1, output_tasks=None, rank=None, use_saas_prior=True, use_combinatorial_kernel=True, map_heterogeneous_to_full=False)[source]
Construct Model keyword arguments from a given MultiTaskDataset.
- Parameters:
training_data (MultiTaskDataset) – A MultiTaskDataset.
task_feature (int) – Column index of embedded task indicator features. Only supported value is -1.
output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. Only supported value is [0].
rank (int | None) – The rank of the cross-task covariance matrix.
use_saas_prior (bool) – Whether to use the SAAS prior for base kernels of the MultiTaskConditionalKernel.
use_combinatorial_kernel (bool) – Whether to use a combinatorial kernel over the binary embedding of task features in MultiTaskConditionalKernel.
map_heterogeneous_to_full (bool) – Accepted for compatibility with MultiTaskGP.construct_inputs but unused. HeterogeneousMTGP handles heterogeneous features via MultiTaskConditionalKernel.
- Return type:
dict[str, Any]
Hierarchical Search Space GP Models
- class botorch.models.hierarchical.conditional_kernel_gp.HierarchicalConditionalKernel(dim, hierarchical_dependencies, eval_hierarchical_features=True, separate_hierarchical_features=True, use_saas_prior=True, use_outputscale=True)[source]
Bases:
Kernel
A conditional kernel that exploits correlations in hierarchical search spaces.
It is best to describe its behavior by walking through an example. Let’s say the hierarchical search space tree is as follows:
ROOT
├── C1
├── C2
└── P1
    ├── (0) C3
    └── (1) P2
        ├── (0) C4
        └── (1) C5
Let’s say the input X is a vector of the form (C1, C2, C3, C4, C5, P1, P2). The entries do not necessarily need to follow this particular order, though.
As a concrete example, this kernel computes:
# Ex1:
k([C1, C2, C3, C4, C5, P1=0, P2=0], [C1', C2', C3', C4', C5', P1'=1, P2'=1])
    = k([C1, C2], [C1', C2']) + k(P1, P1').
# Ex2:
k([C1, C2, C3, C4, C5, P1=1, P2=0], [C1', C2', C3', C4', C5', P1'=1, P2'=1])
    = k([C1, C2], [C1', C2']) + k(P1, P1') + k(P2, P2').
# Ex3:
k([C1, C2, C3, C4, C5, P1=1, P2=1], [C1', C2', C3', C4', C5', P1'=1, P2'=1])
    = k([C1, C2], [C1', C2']) + k(C5, C5') + k(P1, P1') + k(P2, P2').
More generally, the kernel finds all common active features and sums the kernel values over them.
This kernel supports arbitrary tree depths.
Each parent node is allowed to have multiple child nodes.
In particular, single-child parents are allowed. In this case, the parent is a boolean flag.
Dimensions that correspond to parent nodes must be discrete or categorical. Approximate equality is used to compare them for branching logic, which requires the values to be separated by at least a tolerance of rtol=RTOL, atol=ATOL.
Internally, this kernel determines if a child node is active by checking the values of its ancestors (not just its parent). Examples:
C3 is active if and only if P1 == 0;
C4 is active if and only if P1 == 1 and P2 == 0.
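The activation rule can be illustrated in plain Python. This is a sketch under stated assumptions rather than the kernel's actual code: the helper names active_features and common_active are hypothetical, the feature order (C1, C2, C3, C4, C5, P1, P2) is mapped to indices 0 to 6, and a fixed absolute tolerance stands in for the library's RTOL/ATOL constants.

```python
import math

# Feature order (C1, C2, C3, C4, C5, P1, P2) -> indices 0..6.
DEPS = {
    5: {0: [2], 1: [6]},  # P1 == 0 activates C3; P1 == 1 activates P2.
    6: {0: [3], 1: [4]},  # P2 == 0 activates C4; P2 == 1 activates C5.
}


def active_features(x, deps, atol=1e-8):
    """Return the set of feature indices that are active for the input x."""
    children = {
        child
        for branches in deps.values()
        for kids in branches.values()
        for child in kids
    }
    active = set(range(len(x))) - children  # orphan features are always active
    frontier = list(active)
    while frontier:
        parent = frontier.pop()
        for value, kids in deps.get(parent, {}).items():
            # Approximate equality decides which branch of the parent is taken.
            if math.isclose(x[parent], value, abs_tol=atol):
                for kid in kids:
                    if kid not in active:
                        active.add(kid)
                        frontier.append(kid)
    return active


def common_active(x1, x2, deps):
    """Indices over which kernel values would be summed for the pair (x1, x2)."""
    return sorted(active_features(x1, deps) & active_features(x2, deps))


x = [0.1, 0.2, 0.3, 0.4, 0.5, 0, 0]        # P1=0, P2=0
x_prime = [0.6, 0.7, 0.8, 0.9, 1.0, 1, 1]  # P1'=1, P2'=1
print(common_active(x, x_prime, DEPS))     # indices of C1, C2, and P1
```

A child is only reached through an active parent, so the value of P2 is ignored whenever P1 == 0, which matches the ancestor-based rule stated above.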
- Parameters:
dim (int) – The dimension of the feature vector.
hierarchical_dependencies (dict[int, dict[int | float, list[int]]]) – A dictionary of the form {parent_index: {parent_value: children_indices}}.
eval_hierarchical_features (bool) – Whether to evaluate correlations over hierarchical features or not. If false, the hierarchical features are merely used as flags to determine the active features, but they do not directly contribute to the kernel values.
separate_hierarchical_features (bool) – This is relevant only if eval_hierarchical_features=True. If true, the correlations of hierarchical features will be captured by a separate additive kernel. Otherwise, they are treated together with non-hierarchical features.
use_saas_prior (bool) – Whether to use the SAAS prior. If false, use the log-normal prior instead.
use_outputscale (bool) – Whether to use the outputscale parameter. If false, the outputscale of each sub-kernel is fixed to 1.
- forward(x1, x2, diag=False, **params)[source]
Computes the covariance between \(\mathbf x_1\) and \(\mathbf x_2\). This method should be implemented by all Kernel subclasses.
- Parameters:
x1 (Tensor) – First set of data (… x N x D).
x2 (Tensor) – Second set of data (… x M x D).
diag (bool) – Should the Kernel compute the whole kernel, or just the diag? If True, it must be the case that x1 == x2. (Default: False.)
last_dim_is_batch – If True, treat the last dimension of x1 and x2 as another batch dimension. (Useful for additive structure over the dimensions). (Default: False.)
- Returns:
The kernel matrix or vector. The shape depends on the kernel’s evaluation mode:
full_covar: ... x N x M
full_covar with last_dim_is_batch=True: ... x K x N x M
diag: ... x N
diag with last_dim_is_batch=True: ... x K x N
- Return type:
Tensor
- class botorch.models.hierarchical.conditional_kernel_gp.HierarchicalConditionalKernelGP(train_X, train_Y, hierarchical_dependencies, eval_hierarchical_features=True, separate_hierarchical_features=True, train_Yvar=None, use_saas_prior=True, use_outputscale=True, input_transform=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>)[source]
Bases:
SingleTaskGP
A GP model with a conditional kernel that exploits correlations in hierarchical search spaces.
Single-Task GP model using a hierarchical conditional kernel.
- Parameters:
train_X (Tensor) – A batch_shape x n x d tensor of training features, where d is the number of features in the search space. Column indices in hierarchical_dependencies refer to columns of train_X.
train_Y (Tensor) – A batch_shape x n x m tensor of training observations.
hierarchical_dependencies (dict[int, dict[int | float, list[int]]]) – A dictionary of the form {parent_index: {parent_value: children_indices}} that defines the hierarchical structure of the search space. All indices must be valid column indices into train_X (i.e., in [0, d)).
eval_hierarchical_features (bool) – Whether to evaluate correlations over hierarchical features or not. If false, the hierarchical features are merely used as flags to determine the active features, but they do not directly contribute to the kernel values.
separate_hierarchical_features (bool) – This is relevant only if eval_hierarchical_features=True. If true, the correlations of hierarchical features will be captured by a separate additive kernel.
train_Yvar (Tensor | None) – An optional batch_shape x n x m tensor of observed measurement noise. If None, the noise is inferred.
use_saas_prior (bool) – Whether to use the SAAS prior. If false, use the log-normal prior instead.
use_outputscale (bool) – Whether to use the outputscale parameter. If false, the outputscale of each sub-kernel is fixed to 1.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference.
- classmethod construct_inputs(training_data, hierarchical_dependencies, eval_hierarchical_features=True, separate_hierarchical_features=True, use_saas_prior=True, use_outputscale=True)[source]
Construct Model keyword arguments from a SupervisedDataset.
- Parameters:
training_data (SupervisedDataset) – A SupervisedDataset, with attributes train_X, train_Y, and, optionally, train_Yvar.
hierarchical_dependencies (dict[int, dict[int | float, list[int]]])
eval_hierarchical_features (bool)
separate_hierarchical_features (bool)
use_saas_prior (bool)
use_outputscale (bool)
- Returns:
A dict of keyword arguments that can be used to initialize a Model, with keys train_X, train_Y, and, optionally, train_Yvar.
- Return type:
dict[str, Any]
- class botorch.models.hierarchical.conditional_kernel_gp.HierarchicalConditionalKernelMultiTaskGP(train_X, train_Y, task_feature, hierarchical_dependencies, train_Yvar=None, eval_hierarchical_features=True, separate_hierarchical_features=True, use_saas_prior=True, use_outputscale=True, likelihood=None, task_covar_prior=None, output_tasks=None, rank=None, all_tasks=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None, validate_task_values=True)[source]
Bases:
MultiTaskGP
Multi-Task GP with a conditional kernel that exploits correlations in hierarchical search spaces.
This model extends MultiTaskGP by using a HierarchicalConditionalKernel for the data covariance instead of the default kernel. Task correlations are still captured via a PositiveIndexKernel as in the parent class.
The model can be single-output or multi-output, determined by output_tasks. It supports dimension-scaled priors on the Kernel hyperparameters, which work best when covariates are normalized to the unit cube and outcomes are standardized (zero mean, unit variance). The standardization should be applied in a stratified fashion at the level of the tasks, rather than across all data points.
Multi-Task GP model using a hierarchical conditional kernel.
- Parameters:
train_X (Tensor) – An n x (d + 1) tensor of training data. One of the columns should contain the task features (see task_feature argument).
train_Y (Tensor) – An n x 1 tensor of training observations.
task_feature (int) – The index of the task feature (-d <= task_feature <= d).
hierarchical_dependencies (dict[int, dict[int | float, list[int]]]) – A dictionary of the form {parent_index: {parent_value: children_indices}} that defines the hierarchical structure of the search space. Parent indices should refer to the feature indices AFTER removing the task feature.
train_Yvar (Tensor | None) – An optional n tensor of observed measurement noise. If None, we infer the noise. Note that the inferred noise is common across all tasks.
eval_hierarchical_features (bool) – Whether to evaluate correlations over hierarchical features or not. If false, the hierarchical features are merely used as flags to determine the active features, but they do not directly contribute to the kernel values.
separate_hierarchical_features (bool) – This is relevant only if eval_hierarchical_features=True. If true, the correlations of hierarchical features will be captured by a separate additive kernel.
use_saas_prior (bool) – Whether to use the SAAS prior. If false, use the log-normal prior instead.
use_outputscale (bool) – Whether to use the outputscale parameter. If false, the outputscale of each sub-kernel is fixed to 1.
likelihood (Likelihood | None) – A likelihood. The default is selected based on train_Yvar. If train_Yvar is None, a HadamardGaussianLikelihood with inferred noise level is used. Otherwise, a FixedNoiseGaussianLikelihood is used.
task_covar_prior (Prior | None) – A Prior on the task covariance matrix. Must operate on p.s.d. matrices. A common prior for this is the LKJ prior.
output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.
rank (int | None) – The rank to be used for the index kernel. If omitted, use a full rank (i.e. number of tasks) kernel.
all_tasks (list[int] | None) – By default, multi-task GPs infer the list of all tasks from the task features in train_X. This is an experimental feature that enables creation of multi-task GPs with tasks that don’t appear in the training data.
outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference. We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
validate_task_values (bool) – If True, validate that the task values supplied in the input are expected task values.
- classmethod construct_inputs(training_data, task_feature, hierarchical_dependencies, eval_hierarchical_features=True, separate_hierarchical_features=True, use_saas_prior=True, use_outputscale=True, output_tasks=None, task_covar_prior=None, rank=None)[source]
Construct Model keyword arguments from a dataset and other args.
- Parameters:
training_data (SupervisedDataset | MultiTaskDataset) – A SupervisedDataset or a MultiTaskDataset.
task_feature (int) – Column index of embedded task indicator features.
hierarchical_dependencies (dict[int, dict[int | float, list[int]]]) – A dictionary of the form {parent_index: {parent_value: children_indices}} that defines the hierarchical structure of the search space.
eval_hierarchical_features (bool) – Whether to evaluate correlations over hierarchical features or not.
separate_hierarchical_features (bool) – Whether to capture the correlations of hierarchical features with a separate additive kernel.
use_saas_prior (bool) – Whether to use the SAAS prior.
use_outputscale (bool) – Whether to use the outputscale parameter.
output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.
task_covar_prior (Prior | None) – A GPyTorch Prior object to use as prior on the cross-task covariance matrix.
rank (int | None) – The rank of the cross-task covariance matrix.
- Return type:
dict[str, Any]
This file defines some helper functions for parsing hierarchical dependencies. We will use a running example to illustrate the functionality of each helper function:
ROOT
├── C0, C1
└── P2
    ├── (0) C3
    └── (1) P4
        ├── (0) C5
        └── (1) C6
C0, C1, C3, C5, and C6 are child nodes (similar to non-hierarchical parameters in Ax).
P2 and P4 are parent nodes (similar to hierarchical parameters in Ax).
Each node, except ROOT, corresponds to a dimension in the feature vector X.
Let’s say the input X is a vector of the form (C0, C1, P2, C3, P4, C5, C6). The features do not necessarily need to follow this particular order, but we will stick to this order as an example. Then, the hierarchical tree is represented by a nested dictionary as follows:
hierarchical_dependencies = {
    2: {0: [3], 1: [4]},  # P2 == 0 -> C3 is activated; P2 == 1 -> P4 is activated.
    4: {0: [5], 1: [6]},  # P4 == 0 -> C5 is activated; P4 == 1 -> C6 is activated.
}
Note that, in the above example, ROOT does not actually exist in the representation. It is an imaginary node that is the parent of all orphan nodes, e.g., C0, C1, and P2. All functions in this file support both rootless trees and rooted trees.
- botorch.models.hierarchical.utils.get_orphan_feature_indices(dim, hierarchical_dependencies)[source]
Construct the indices of the orphan nodes by parsing the hierarchical dependencies. They are precisely the children of the (imaginary) root node.
- Parameters:
dim (int) – The full dimension of the feature vector.
hierarchical_dependencies (dict[int, dict[int, list[int]]]) – A dictionary specifying the hierarchical structure.
- Returns:
A list of indices of the orphan nodes sorted in ascending order.
- Return type:
list[int]
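For the running example, the orphan nodes are the indices that never appear as a child of any parent value. The following plain-Python sketch illustrates that reading; orphan_indices is a hypothetical helper written for this illustration, not the library function.

```python
def orphan_indices(dim, hierarchical_dependencies):
    """Indices that are not listed as a child under any parent value."""
    children = {
        child
        for branches in hierarchical_dependencies.values()
        for kids in branches.values()
        for child in kids
    }
    # range(dim) is already ascending, so the result is sorted.
    return [index for index in range(dim) if index not in children]


# Running example: X = (C0, C1, P2, C3, P4, C5, C6).
deps = {
    2: {0: [3], 1: [4]},
    4: {0: [5], 1: [6]},
}
print(orphan_indices(7, deps))  # C0, C1, and P2
```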
- botorch.models.hierarchical.utils.get_index_to_path(dim, hierarchical_dependencies)[source]
Construct a dictionary that maps the index of each node (feature) to its root-to-node path.
- Parameters:
dim (int) – The full dimension of the feature vector.
hierarchical_dependencies (dict[int, dict[int, list[int]]]) – A dictionary specifying the hierarchical structure.
- Returns:
A dictionary that maps the node (or feature) index fid to its root-to-node path. The path is represented by a list of tuples of the form (index, value). If we follow the path by setting X[index] = value for all (index, value) in the path, then X[fid] is activated.
- Return type:
dict[int, list[tuple[int, int]]]
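A root-to-node path can be recovered by walking from each node up through its parents and reversing the result. The sketch below assumes the dependencies form a tree (each node has at most one parent and there are no cycles); index_to_path is a hypothetical stand-in written for illustration, not the library's implementation.

```python
def index_to_path(dim, hierarchical_dependencies):
    """Map each feature index to its list of (parent_index, parent_value) steps."""
    # Invert the dependency dictionary: child -> (parent, activating value).
    parent_of = {}
    for parent, branches in hierarchical_dependencies.items():
        for value, kids in branches.items():
            for kid in kids:
                parent_of[kid] = (parent, value)

    paths = {}
    for index in range(dim):
        path = []
        node = index
        while node in parent_of:  # climb until an orphan node is reached
            parent, value = parent_of[node]
            path.append((parent, value))
            node = parent
        paths[index] = path[::-1]  # reverse to get root-to-node order
    return paths


# Running example: X = (C0, C1, P2, C3, P4, C5, C6).
deps = {
    2: {0: [3], 1: [4]},
    4: {0: [5], 1: [6]},
}
paths = index_to_path(7, deps)
print(paths[6])  # C6 needs P2 == 1 and then P4 == 1
```

Orphan nodes get an empty path, which reflects that they are active for every input.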
- botorch.models.hierarchical.utils.get_blocks_with_paths(dim, hierarchical_dependencies, keep_hierarchical_features=True, separate_hierarchical_features=True)[source]
A helper function for parsing the hierarchical dependencies. This function does two things:
Partition the indices {0, 1, 2, ..., dim - 1} into blocks. Features in the same block are always activated together: either they are all active or all inactive.
Construct the path from the root to each block. This will be helpful in checking if a block is active or not.
The partition will be used in HierarchicalConditionalKernel, which creates a kernel for each block. The trivial partition {{0}, {1}, ..., {dim - 1}} is bad, because then the kernel would be completely additive. The goal is to construct a partition where each block is as large as possible. Two nodes end up in the same block if and only if they share the same parent node and are activated by the same parent node value.
- Parameters:
dim (int) – The full dimension of the feature vector.
hierarchical_dependencies (dict[int, dict[int, list[int]]]) – A dictionary that specifies the hierarchical dependencies.
keep_hierarchical_features (bool) – If true, the hierarchical features are kept in the partition.
separate_hierarchical_features (bool) – If true, hierarchical features are not grouped with non-hierarchical features. This flag is relevant only if keep_hierarchical_features is true.
- Returns:
A partition of the indices {0, 1, 2, ..., dim - 1}. Each block in the partition is a list of indices of features that are activated at the same time.
A list of root-to-block paths in the tree. Each path is represented by a list of tuples of the form (index, value). The block is activated by following the path, i.e., by setting the index-th feature to value for every tuple in the path.
- Return type:
A tuple of two lists
Multi-Fidelity GP Regression Models
Multi-Fidelity Gaussian Process Regression models based on GPyTorch models.
For more on Multi-Fidelity BO, see the tutorial.
A common use case of multi-fidelity regression modeling is optimizing a “high-fidelity” function that is expensive to simulate when you have access to one or more cheaper “lower-fidelity” versions that are not fully accurate but are correlated with the high-fidelity function. The multi-fidelity model models both the low- and high-fidelity functions together, including the correlation between them, which can help you predict and optimize the high-fidelity function without having to do too many expensive high-fidelity evaluations.
[Wu2019mf] J. Wu, S. Toscano-Palmerin, P. I. Frazier, and A. G. Wilson. Practical multi-fidelity Bayesian optimization for hyperparameter tuning. ArXiv 2019.
- class botorch.models.gp_regression_fidelity.SingleTaskMultiFidelityGP(train_X, train_Y, train_Yvar=None, iteration_fidelity=None, data_fidelities=None, linear_truncated=True, nu=2.5, covar_module=None, likelihood=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]
Bases:
SingleTaskGP
A single task multi-fidelity GP model.
A SingleTaskGP model using a DownsamplingKernel for the data fidelity parameter (if present) and an ExponentialDecayKernel for the iteration fidelity parameter (if present).
This kernel is described in [Wu2019mf].
Example
>>> train_X = torch.rand(20, 4)
>>> train_Y = train_X.pow(2).sum(dim=-1, keepdim=True)
>>> model = SingleTaskMultiFidelityGP(train_X, train_Y, data_fidelities=[3])
- Parameters:
train_X (Tensor) – A batch_shape x n x (d + s) tensor of training features, where s is the dimension of the fidelity parameters (either one or two).
train_Y (Tensor) – A batch_shape x n x m tensor of training observations.
train_Yvar (Tensor | None) – An optional batch_shape x n x m tensor of observed measurement noise.
iteration_fidelity (int | None) – The column index for the training iteration fidelity parameter (optional).
data_fidelities (Sequence[int] | None) – The column indices for the downsampling fidelity parameter. If a list/tuple of indices is provided, a kernel will be constructed for each index (optional).
linear_truncated (bool) – If True, use a LinearTruncatedFidelityKernel instead of the default kernel.
nu (float) – The smoothness parameter for the Matern kernel: either 1/2, 3/2, or 5/2. Only used when linear_truncated=True.
covar_module (Module | None) – The module for computing the covariance matrix between the non-fidelity features. Defaults to RBFKernel.
likelihood (Likelihood | None) – A likelihood. If omitted, use a standard GaussianLikelihood with inferred noise level.
outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
- classmethod construct_inputs(training_data, fidelity_features)[source]
Construct Model keyword arguments from a dict of SupervisedDataset.
- Parameters:
training_data (SupervisedDataset) – Dictionary of SupervisedDataset.
fidelity_features (list[int]) – Index of fidelity parameter as input columns.
- Return type:
dict[str, Any]
Pairwise GP Models
Preference Learning with Gaussian Process
[Chu2005preference] Wei Chu and Zoubin Ghahramani. Preference learning with Gaussian processes. Proceedings of the 22nd international conference on Machine learning. 2005.
[Brochu2010tutorial] Eric Brochu, Vlad M. Cora, and Nando De Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599 (2010).
- class botorch.models.pairwise_gp.PairwiseGP(datapoints, comparisons, likelihood=None, covar_module=None, input_transform=None, *, jitter=1e-06, xtol=None, consolidate_rtol=0.0, consolidate_atol=0.0001, maxfev=None)[source]
Bases:
Model, GP, FantasizeMixin
Probit GP for preference learning with Laplace approximation
A probit-likelihood GP that learns via pairwise comparison data, using a Laplace approximation of the posterior of the estimated utility values. By default it uses a scaled RBF kernel.
Implementation is based on [Chu2005preference]. Also see [Brochu2010tutorial] for additional reference.
Note that in [Chu2005preference] the likelihood of a pairwise comparison is \(\Phi\left(\frac{f(x_1) - f(x_2)}{\sqrt{2}\sigma}\right)\), where \(\Phi\) is the standard normal CDF, i.e. a scale is used in the denominator. To maintain consistency with usage of kernels elsewhere in BoTorch, we instead do not include \(\sigma\) in the code (implicitly setting it to 1) and use ScaleKernel to scale the function.
In the example below, the user/decision maker has stated that they prefer the first item over the second item and the third item over the second item, generating comparisons [0, 1] and [2, 1].
Example
>>> from botorch.models import PairwiseGP
>>> import torch
>>> datapoints = torch.Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> comparisons = torch.Tensor([[0, 1], [2, 1]])
>>> model = PairwiseGP(datapoints, comparisons)
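To make the probit likelihood concrete, the sketch below evaluates the probability that one item is preferred over another given their latent utilities. It reads the likelihood in [Chu2005preference] as the standard normal CDF applied to the scaled utility difference and uses only the standard library. The function names are hypothetical and this is not the code path used by PairwiseGP.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def preference_probability(f1, f2, sigma=1.0):
    """Probability that item 1 is preferred over item 2.

    f1, f2: latent utility values of the two items.
    sigma: noise scale; sigma=1.0 mirrors the convention described above,
        where the scale is left to the kernel's outputscale instead.
    """
    return std_normal_cdf((f1 - f2) / (math.sqrt(2.0) * sigma))


# Equal utilities give a coin flip; a higher utility is preferred more often.
print(preference_probability(0.3, 0.3))
print(preference_probability(1.0, 0.0) > 0.5)
```

Swapping the two arguments gives the complementary probability, so the two outcomes of a comparison always sum to one.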
- Parameters:
datapoints (Tensor | None) – Either None or a batch_shape x n x d tensor of training features. If either datapoints or comparisons is None, construct a prior-only model.
comparisons (Tensor | None) – Either None or a batch_shape x m x 2 tensor of training comparisons; comparisons[i] is a noisy indicator suggesting the utility value of comparisons[i, 0]-th is greater than comparisons[i, 1]-th. If either comparisons or datapoints is None, construct a prior-only model.
likelihood (PairwiseLikelihood | None) – A PairwiseLikelihood.
covar_module (ScaleKernel | None) – Covariance module.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
jitter (float) – Value added to diagonal for numerical stability in psd_safe_cholesky.
xtol (float | None) – Stopping criteria in scipy.optimize.fsolve used to find f_map in PairwiseGP._update. If None, default behavior is handled by PairwiseGP._update.
consolidate_rtol (float) – rtol passed to consolidate_duplicates.
consolidate_atol (float) – atol passed to consolidate_duplicates.
maxfev (int | None) – The maximum number of calls to the function in scipy.optimize.fsolve. If None, default behavior is handled by PairwiseGP._update.
- property datapoints: Tensor
Alias for consolidated datapoints
- property comparisons: Tensor
Alias for consolidated comparisons
- property unconsolidated_utility: Tensor
Utility of the unconsolidated datapoints
- property num_outputs: int
The number of outputs of the model.
- property batch_shape: Size
The batch shape of the model.
This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.
- classmethod construct_inputs(training_data)[source]
Construct Model keyword arguments from a RankingDataset.
- Parameters:
training_data (SupervisedDataset) – A RankingDataset, with attributes train_X, train_Y, and, optionally, train_Yvar.
- Returns:
A dict of keyword arguments that can be used to initialize a PairwiseGP, including datapoints and comparisons.
- Return type:
dict[str, Tensor]
- set_train_data(datapoints=None, comparisons=None, strict=False, update_model=True)[source]
Set datapoints and comparisons and update model properties if needed
- Parameters:
datapoints (Tensor | None) – Either None or a batch_shape x n x d dimension tensor X. If there are input transformations, assume the datapoints are not transformed. If either datapoints or comparisons is None, construct a prior-only model.
comparisons (Tensor | None) – Either None or a tensor of size batch_shape x m x 2. (i, j) means f_i is preferred over f_j. If either comparisons or datapoints is None, construct a prior-only model.
strict (bool) – strict argument as in gpytorch.models.exact_gp for compatibility when using fit_gpytorch_mll with input_transform.
update_model (bool) – True if we want to refit the model (see _update) after re-setting the data.
- Return type:
None
- load_state_dict(state_dict, strict=False)[source]
Load kernel hyperparameters and recompute Laplace approximation.
_load_from_state_dict filters out data-dependent buffers (utility, covariance Cholesky, likelihood Hessian, etc.), loading only kernel hyperparameters (lengthscale, outputscale). After loading, we recompute the Laplace approximation so that the MAP utility and derived tensors are consistent with both the loaded hyperparameters and the current training data. Without this recomputation, the model would use stale Laplace tensors computed during __init__ with default hyperparameters, producing an inconsistent (non-PSD) posterior, e.g., during cross-validation where the model is constructed on fold data and then loaded with hyperparameters from the full model.
- Parameters:
state_dict (dict[str, Tensor]) – The state dict.
strict (bool) – Boolean specifying whether or not given and instance-bound state_dicts should have identical keys. Only implemented for strict=False since buffers will be filtered out when calling _load_from_state_dict.
- Returns:
A named tuple _IncompatibleKeys, containing the missing_keys and unexpected_keys.
- Return type:
_IncompatibleKeys
- forward(datapoints)[source]
Calculate a posterior or prior prediction.
During training mode, forward implemented solely for gradient-based hyperparam opt. Essentially what it does is to re-calculate the utility f using its analytical form at f_map so that we are able to obtain gradients of the hyperparameters.
- Parameters:
datapoints (Tensor) – A batch_shape x n x d Tensor, should be the same as self.datapoints during training.
- Returns:
A MultivariateNormal object, being one of the following:
Posterior centered at MAP points for training data (training mode)
Prior predictions (prior mode)
Predictive posterior (eval mode)
- Return type:
MultivariateNormal
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]
Computes the posterior over model outputs at the provided points.
- Parameters:
X (Tensor) – A batch_shape x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.
output_indices (list[int] | None) – As defined in parent Model class, not used for this model.
observation_noise (bool) – Ignored (since noise is not identifiable from scale in probit models).
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
- Returns:
A Posterior object, representing joint distributions over q points.
- Return type:
- condition_on_observations(X, Y)[source]
Condition the model on new observations.
Note that unlike other BoTorch models, PairwiseGP requires Y to be pairwise comparisons.
- Parameters:
X (Tensor) – A batch_shape x n x d dimension tensor X.
Y (Tensor) – A tensor of size batch_shape x m x 2. (i, j) means f_i is preferred over f_j.
kwargs – Not used.
- Returns:
A (deepcopied) Model object of the same type, representing the original model conditioned on the new observations (X, Y).
- Return type:
- class botorch.models.pairwise_gp.PairwiseLaplaceMarginalLogLikelihood(likelihood, model)[source]
Bases:
MarginalLogLikelihood
Laplace-approximated marginal log likelihood/evidence for PairwiseGP
See (12) from [Chu2005preference].
- Parameters:
likelihood – Used as in args to GPyTorch MarginalLogLikelihood
model (GP) – Used as in args to GPyTorch MarginalLogLikelihood
- forward(post, comp)[source]
Calculate approximated log evidence, i.e., log(P(D|theta))
Note that post will be based on the consolidated/deduped datapoints for numerical stability, but comp will still be the unconsolidated comparisons so that it’s still compatible with fit_gpytorch_*.
- Parameters:
post (Posterior) – training posterior distribution from self.model (after consolidation)
comp (Tensor) – Comparisons pairs (before consolidation)
- Returns:
The approximated evidence, i.e., the marginal log likelihood
- Return type:
Tensor
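The distinction between consolidated datapoints and unconsolidated comparisons can be illustrated with a small plain-Python sketch: near-duplicate points are merged and comparison indices are remapped to the merged points. This is a hypothetical illustration of the idea under a simple absolute tolerance, not the behavior or signature of consolidate_duplicates.

```python
def consolidate(points, comparisons, atol=1e-4):
    """Merge near-duplicate points and remap comparison indices.

    points: list of feature rows.
    comparisons: list of [winner_index, loser_index] pairs into points.
    atol: absolute tolerance for treating two rows as the same point.
    """
    unique = []  # consolidated points, in order of first appearance
    remap = []   # remap[i] = index of points[i] within unique
    for point in points:
        for j, kept in enumerate(unique):
            if all(abs(a - b) <= atol for a, b in zip(point, kept)):
                remap.append(j)
                break
        else:
            remap.append(len(unique))
            unique.append(point)
    new_comparisons = [[remap[i], remap[j]] for i, j in comparisons]
    return unique, new_comparisons


points = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.00001]]
comparisons = [[0, 1], [2, 1]]
unique, remapped = consolidate(points, comparisons)
print(unique)    # the third point is merged into the first
print(remapped)  # both comparisons now refer to the merged point
```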
Relevance Pursuit Models
Relevance Pursuit model structure and optimization routines for the sparse optimization of Gaussian process hyper-parameters, see [Ament2024pursuit] for details.
References
- class botorch.models.relevance_pursuit.RelevancePursuitMixin(dim, support)[source]
Bases:
ABC
Mixin class to convert between the sparse and dense representations of the relevance pursuit modules’ sparse parameters, as well as to compute the generalized support acquisition and support deletion criteria.
Constructor for the RelevancePursuitMixin class.
For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.
- Parameters:
dim (int) – The total number of features.
support (list[int] | None) – The indices of the features in the support, subset of range(dim).
- dim: int
- abstract property sparse_parameter: Parameter
The sparse parameter, required to have a single indexing dimension.
- abstractmethod set_sparse_parameter(value)[source]
Sets the sparse parameter.
NOTE: We can’t use the property setter @sparse_parameter.setter because of the special way PyTorch treats Parameter types, including custom setters that bypass the @property setters before the latter are called.
- Parameters:
value (Parameter)
- Return type:
None
- property is_sparse: bool
- property support: list[int]
The indices of the active parameters.
- property is_active: Tensor
A Boolean Tensor of length dim, indicating which of the dim indices of self.sparse_parameter are in the support, i.e. active.
- property inactive_indices: Tensor
An integral Tensor of length dim - len(support), indicating which of the indices of self.sparse_parameter are not in the support, i.e. inactive.
- to_sparse()[source]
Converts the sparse parameter to its sparse (< dim) representation.
- Returns:
The current object in its sparse representation.
- Return type:
Self
- to_dense()[source]
Converts the sparse parameter to its dense, length-dim representation.
- Returns:
The current object in its dense representation.
- Return type:
Self
- expand_support(indices)[source]
Expands the support by a number of indices.
- Parameters:
indices (list[int]) – A list of indices of self.sparse_parameter to add to the support.
- Returns:
The current object, updated with the expanded support.
- Return type:
Self
- contract_support(indices)[source]
Contracts the support by a number of indices.
- Parameters:
indices (list[int]) – A list of indices of self.sparse_parameter to remove from the support.
- Returns:
The current object, updated with the contracted support.
- Return type:
Self
- full_support()[source]
Initializes the RelevancePursuitMixin with a full, size-dim support.
- Returns:
The current object with full support in the dense representation.
- Return type:
Self
- remove_support()[source]
Initializes the RelevancePursuitMixin with an empty, size-zero support.
- Returns:
The current object with empty support, representation unchanged.
- Return type:
Self
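The relationship between the sparse and dense representations and the support can be sketched with a small stand-alone class. ToySparseVector is a hypothetical illustration written with plain Python lists; it is not RelevancePursuitMixin and ignores PyTorch Parameter handling entirely.

```python
class ToySparseVector:
    """Hypothetical stand-in: stores only the values of the active indices."""

    def __init__(self, dim, support=None):
        self.dim = dim
        self.support = list(support) if support is not None else []
        self.values = [0.0] * len(self.support)  # sparse representation

    def to_dense(self):
        # Length-dim list with zeros at all inactive indices.
        dense = [0.0] * self.dim
        for idx, val in zip(self.support, self.values):
            dense[idx] = val
        return dense

    def inactive_indices(self):
        active = set(self.support)
        return [i for i in range(self.dim) if i not in active]

    def expand_support(self, indices):
        for i in indices:
            if i in self.support:
                raise ValueError(f"index {i} is already in the support")
            self.support.append(i)
            self.values.append(0.0)  # new elements start at zero
        return self

    def contract_support(self, indices):
        for i in indices:
            pos = self.support.index(i)
            del self.support[pos]
            del self.values[pos]
        return self


vec = ToySparseVector(dim=5, support=[1, 3])
vec.values = [0.7, 0.2]
dense_before = vec.to_dense()
vec.expand_support([4]).contract_support([1])
dense_after = vec.to_dense()
inactive = vec.inactive_indices()
```

Returning self from the support-modifying methods mirrors the Self return type documented above and allows chaining.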
- support_expansion(mll, n=1, modifier=None)[source]
Computes the indices of the elements that maximize the gradient of the sparse parameter and that are not already in the support, and subsequently expands the support to include the elements if their gradient is positive.
- Parameters:
mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize. NOTE: Virtually all of the rest of the code is not specific to the marginal likelihood optimization, so we could generalize this to work with any objective.
n (int) – The maximum number of elements to select. NOTE: The actual number of elements that are added could be fewer if there are fewer than n elements with a positive gradient.
modifier (Callable[[Tensor], Tensor] | None) – A function that modifies the gradient of the inactive elements before computing the support expansion criterion. This can be used to select the maximum gradient magnitude for real-valued elements whose gradients are not non-negative, using modifier = torch.abs.
- Returns:
True if the support was expanded, False otherwise.
- Return type:
bool
- expansion_objective(mll)[source]
Computes an objective value for all the inactive parameters, i.e. self.sparse_parameter[~self.is_active] since we can’t add already active parameters to the support. This value will be used to select the parameters.
- Parameters:
mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize.
- Returns:
The expansion objective value for all the inactive parameters.
- Return type:
Tensor
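The expansion criterion described for support_expansion can be sketched as a selection over a list of gradients. The function select_expansion and its arguments are hypothetical; the actual method obtains gradients from the marginal likelihood, whereas this sketch takes them as given numbers.

```python
def select_expansion(gradients, inactive, n=1, modifier=None):
    """Pick up to n inactive indices with the largest positive score (sketch)."""
    scores = {i: gradients[i] for i in inactive}
    if modifier is not None:
        scores = {i: modifier(g) for i, g in scores.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Fewer than n indices are returned if fewer than n scores are positive.
    return [i for i in ranked[:n] if scores[i] > 0]


grads = [0.3, -0.1, 0.9, 0.0, -0.8]
inactive = [0, 1, 3, 4]  # index 2 is already in the support
picked = select_expansion(grads, inactive, n=2)
picked_abs = select_expansion(grads, inactive, n=2, modifier=abs)
```

Without a modifier only index 0 has a positive gradient, so a single index is selected even though n=2; with modifier=abs the large negative gradient at index 4 ranks first.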
- support_contraction(mll, n=1, modifier=None)[source]
Computes the indices of the elements with the smallest magnitude, and subsequently contracts the support by excluding the elements.
- Parameters:
mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize. NOTE: Virtually all of the rest of the code is not specific to the marginal likelihood optimization, so we could generalize this to work with any objective.
n (int) – The number of elements to select for removal.
modifier (Callable[[Tensor], Tensor] | None) – A function that modifies the parameter values before computing the support contraction criterion.
- Returns:
True if the support was contracted, False otherwise.
- Return type:
bool
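The contraction criterion, removing the elements with the smallest magnitude, can be sketched in the same style. select_contraction is a hypothetical helper operating on plain lists, with abs assumed as the default modifier for illustration only.

```python
def select_contraction(values, support, n=1, modifier=abs):
    """Return the n support indices whose modified values are smallest (sketch)."""
    ranked = sorted(support, key=lambda i: modifier(values[i]))
    return ranked[:n]


values = [0.0, 0.5, 0.0, -0.05, 2.0]
support = [1, 3, 4]
to_remove = select_contraction(values, support, n=2)
```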
- optimize_mll(mll, model_trace=None, reset_parameters=True, reset_dense_parameters=False, closure=None, optimizer=None, closure_kwargs=None, optimizer_kwargs=None)[source]
Optimizes the marginal likelihood.
- Parameters:
mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize.
model_trace (list[Model] | None) – If not None, a list to which a deepcopy of the model state is appended. NOTE This operation is in place.
reset_parameters (bool) – If True, initializes the sparse parameter to the all-zeros vector before every marginal likelihood optimization step. If False, the optimization is warm-started with the previous iteration’s parameters.
reset_dense_parameters (bool) – If True, re-initializes the dense parameters, e.g. other GP hyper-parameters that are not part of the Relevance Pursuit module, to the initial values provided by their associated constraints.
closure (Callable[[], tuple[Tensor, Sequence[Tensor | None]]] | None) – A closure to use to compute the loss and the gradients, see docstring of fit_gpytorch_mll for details.
optimizer (Callable | None) – The numerical optimizer, see docstring of fit_gpytorch_mll.
closure_kwargs (dict[str, Any] | None) – Additional arguments to pass to the closure function.
optimizer_kwargs (dict[str, Any] | None) – A dictionary of keyword arguments for the optimizer.
- Returns:
The marginal likelihood after optimization.
- botorch.models.relevance_pursuit.forward_relevance_pursuit(sparse_module, mll, sparsity_levels=None, reset_parameters=True, reset_dense_parameters=False, record_model_trace=True, initial_support=None, closure=None, optimizer=None, closure_kwargs=None, optimizer_kwargs=None)[source]
Forward Relevance Pursuit.
NOTE: For the robust SparseOutlierNoise model of [Ament2024pursuit], the forward algorithm is generally faster than the backward algorithm, particularly when the maximum sparsity level is small, but it leads to less robust results when the number of outliers is large.
For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.
Example
>>> base_noise = HomoskedasticNoise(
...     noise_constraint=NonTransformedInterval(
...         1e-5, 1e-1, initial_value=1e-3
...     )
... )
>>> likelihood = SparseOutlierGaussianLikelihood(
...     base_noise=base_noise,
...     dim=X.shape[0],
... )
>>> model = SingleTaskGP(train_X=X, train_Y=Y, likelihood=likelihood)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> # NOTE: ``likelihood.noise_covar`` is the ``RelevancePursuitMixin``
>>> sparse_module = likelihood.noise_covar
>>> sparse_module, model_trace = forward_relevance_pursuit(sparse_module, mll)
- Parameters:
sparse_module (RelevancePursuitMixin) – The relevance pursuit module.
mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize.
sparsity_levels (list[int] | None) – The sparsity levels to expand the support to.
reset_parameters (bool) – If true, initializes the sparse parameter to all zeros after each iteration.
reset_dense_parameters (bool) – If true, re-initializes the dense parameters, e.g. other GP hyper-parameters that are not part of the Relevance Pursuit module, to the initial values provided by their associated constraints.
record_model_trace (bool) – If true, records the model state after every iteration.
initial_support (list[int] | None) – The support with which to initialize the sparse module. By default, the support is initialized to the empty set.
closure (Callable[[], tuple[Tensor, Sequence[Tensor | None]]] | None) – A closure to use to compute the loss and the gradients, see docstring of fit_gpytorch_mll for details.
optimizer (Callable | None) – The numerical optimizer, see docstring of fit_gpytorch_mll.
closure_kwargs (dict[str, Any] | None) – Additional arguments to pass to the closure function.
optimizer_kwargs (dict[str, Any] | None) – A dictionary of keyword arguments to pass to the optimizer. By default, initializes the “options” sub-dictionary with maxiter and ftol, gtol values, unless specified.
- Returns:
The relevance pursuit module after forward relevance pursuit optimization, and a list of models with different supports that were optimized.
- Return type:
tuple[RelevancePursuitMixin, list[Model] | None]
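The overall shape of the forward loop (expand the support to each sparsity level, re-optimize with the support fixed, record a trace) can be sketched on a toy problem. Everything here is an assumption made for illustration: the model is a constant mean with per-point outlier shifts, the "optimization" is a closed-form least-squares fit, and the residual magnitude stands in for the gradient-based expansion criterion. It is not the BoTorch algorithm itself.

```python
def fit_fixed_support(y, support):
    """Toy inner fit: points in the support are explained away as outliers."""
    inliers = [v for i, v in enumerate(y) if i not in support]
    mu = sum(inliers) / len(inliers)
    loss = sum((v - mu) ** 2 for v in inliers)
    return mu, loss


def toy_forward_pursuit(y, sparsity_levels):
    support, trace = [], []
    for level in sparsity_levels:
        while len(support) < level:
            mu, _ = fit_fixed_support(y, support)
            inactive = [i for i in range(len(y)) if i not in support]
            # Greedy expansion: add the inactive point with the largest residual.
            support.append(max(inactive, key=lambda i: abs(y[i] - mu)))
        mu, loss = fit_fixed_support(y, support)
        trace.append((sorted(support), mu, loss))  # analogous to a model trace
    return support, trace


y = [1.0, 1.2, 0.8, 10.0, 1.0, -7.0]  # indices 3 and 5 are outliers
support, trace = toy_forward_pursuit(y, sparsity_levels=[0, 1, 2])
losses = [entry[2] for entry in trace]
```

The trace holds one entry per sparsity level, which is the kind of sequence that a subsequent model comparison step could choose from.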
- botorch.models.relevance_pursuit.backward_relevance_pursuit(sparse_module, mll, sparsity_levels=None, reset_parameters=True, reset_dense_parameters=False, record_model_trace=True, initial_support=None, closure=None, optimizer=None, closure_kwargs=None, optimizer_kwargs=None)[source]
Backward Relevance Pursuit.
NOTE: For the robust SparseOutlierNoise model of [Ament2024pursuit], the backward algorithm generally leads to more robust results than the forward algorithm, especially when the number of outliers is large, but is more expensive unless the support is contracted by more than one in each iteration.
For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.
Example
>>> base_noise = HomoskedasticNoise(
...     noise_constraint=NonTransformedInterval(
...         1e-5, 1e-1, initial_value=1e-3
...     )
... )
>>> likelihood = SparseOutlierGaussianLikelihood(
...     base_noise=base_noise,
...     dim=X.shape[0],
... )
>>> model = SingleTaskGP(train_X=X, train_Y=Y, likelihood=likelihood)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> # NOTE: ``likelihood.noise_covar`` is the ``RelevancePursuitMixin``
>>> sparse_module = likelihood.noise_covar
>>> sparse_module, model_trace = backward_relevance_pursuit(sparse_module, mll)
- Parameters:
sparse_module (RelevancePursuitMixin) – The relevance pursuit module.
mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize.
sparsity_levels (list[int] | None) – The sparsity levels to contract the support to.
reset_parameters (bool) – If true, initializes the sparse parameter to all zeros after each iteration.
reset_dense_parameters (bool) – If true, re-initializes the dense parameters, e.g. other GP hyper-parameters that are not part of the Relevance Pursuit module, to the initial values provided by their associated constraints.
record_model_trace (bool) – If true, records the model state after every iteration.
initial_support (list[int] | None) – The support with which to initialize the sparse module. By default, the support is initialized to the full set.
closure (Callable[[], tuple[Tensor, Sequence[Tensor | None]]] | None) – A closure to use to compute the loss and the gradients, see docstring of fit_gpytorch_mll for details.
optimizer (Callable | None) – The numerical optimizer, see docstring of fit_gpytorch_mll.
closure_kwargs (dict[str, Any] | None) – Additional arguments to pass to the closure function.
optimizer_kwargs (dict[str, Any] | None) – A dictionary of keyword arguments to pass to the optimizer. By default, initializes the “options” sub-dictionary with maxiter and ftol, gtol values, unless specified.
- Returns:
The relevance pursuit module after backward relevance pursuit optimization, and a list of models with different supports that were optimized.
- Return type:
tuple[RelevancePursuitMixin, list[Model] | None]
- botorch.models.relevance_pursuit.get_posterior_over_support(rp_class, model_trace, log_support_prior=None, prior_mean_of_support=None)[source]
Computes the posterior distribution over a list of models. Assumes we are storing both likelihood and GP model in the model_trace.
Example
>>> likelihood = SparseOutlierGaussianLikelihood(
...     base_noise=base_noise,
...     dim=X.shape[0],
... )
>>> model = SingleTaskGP(train_X=X, train_Y=Y, likelihood=likelihood)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> # NOTE: ``likelihood.noise_covar`` is the ``RelevancePursuitMixin``
>>> sparse_module = likelihood.noise_covar
>>> sparse_module, model_trace = backward_relevance_pursuit(sparse_module, mll)
>>> # NOTE: SparseOutlierNoise is the type of ``sparse_module``
>>> support_size, bmc_probabilities = get_posterior_over_support(
...     SparseOutlierNoise, model_trace, prior_mean_of_support=2.0
... )
- Parameters:
rp_class (type[RelevancePursuitMixin]) – The relevance pursuit class to use for computing the support size. This is used to get the RelevancePursuitMixin from the Model via the static method _from_model. We could generalize this and let the user pass this getter instead.
model_trace (list[Model]) – A list of models with different support sizes, usually generated with relevance_pursuit.
log_support_prior (Callable[[Tensor], Tensor] | None) – Callable that computes the log prior probability of a support size. If None, uses a default exponential prior with a mean specified by prior_mean_of_support.
prior_mean_of_support (float | None) – A mean value for the default exponential prior distribution over the support size. Ignored if log_support_prior is passed.
- Returns:
A tensor of posterior marginal likelihoods, one for each model in the trace.
- Return type:
tuple[Tensor, Tensor]
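The idea of combining each traced model's marginal likelihood with an exponential prior on its support size can be sketched in pure Python. This assumes the posterior is proportional to exp(log marginal likelihood + log prior) and is normalized over the trace; the function name, the inputs, and the exact prior normalization are illustrative assumptions rather than the library's computation.

```python
import math


def posterior_over_support(log_mlls, support_sizes, prior_mean=2.0):
    """Normalized model probabilities under an exponential support prior (sketch)."""
    # Log-density of an exponential distribution with the given mean.
    log_prior = [-k / prior_mean - math.log(prior_mean) for k in support_sizes]
    log_post = [m + p for m, p in zip(log_mlls, log_prior)]
    shift = max(log_post)  # subtract the max for numerical stability
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


log_mlls = [-20.0, -12.0, -11.8, -11.7]  # made-up values for four traced models
support_sizes = [0, 1, 2, 3]
probs = posterior_over_support(log_mlls, support_sizes, prior_mean=2.0)
best = probs.index(max(probs))
```

Although the largest support has the highest marginal likelihood in this made-up trace, the prior penalty shifts the most probable model to a support of size 1.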
Sparse Axis-Aligned Subspaces (SAAS) GP Models
References
S. Daulton, D. Eriksson, M. Balandat, and E. Bakshy. BONSAI: Bayesian Optimization with Natural Simplicity and Interpretability. ArXiv, 2026.
- class botorch.models.map_saas.SaasPriorHelper(tau=None)[source]
Bases:
object
Helper class for specifying parameter and setting closures.
Instantiates a new helper object.
- Parameters:
tau (Tensor | float | None) – Value of the global shrinkage parameter. If None, the tau will be a free parameter and inferred from the data. Tau can be a tensor for batched models, like EnsembleMapSaasSingleTaskGP, where each batch has a different sparsity prior. If tau is a tensor, it must have shape batch_shape.
- tau(m)[source]
The global shrinkage parameter tau.
- Parameters:
m (Kernel) – A kernel object equipped with a lengthscale.
- Returns:
The global shrinkage parameter of the SAAS prior.
- Return type:
Tensor
- inv_lengthscale_prior_param_or_closure(m)[source]
Closure to compute the scaled inverse lengthscale parameter (tau / l^2) to which the SAAS prior is applied.
- Parameters:
m (Kernel) – A kernel object equipped with a lengthscale.
- Returns:
The scaled inverse lengthscale parameter.
- Return type:
Tensor
- inv_lengthscale_prior_setting_closure(m, value)[source]
Closure to set the inverse lengthscale prior parameter.
- Parameters:
m (Kernel) – A kernel object equipped with a lengthscale.
value (Tensor) – The value of the scaled inverse lengthscale parameter, (tau / l^2), used to recover and set the lengthscale of the kernel.
- Return type:
None
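The two closures above are inverses of each other: one maps lengthscales to tau / l^2, and the other recovers the lengthscale from such a value. A minimal sketch with plain floats follows; the function names are hypothetical and no kernel object is involved.

```python
import math


def inv_lengthscale_param(lengthscales, tau):
    """Scaled inverse squared lengthscales, tau / l^2 (sketch)."""
    return [tau / (l ** 2) for l in lengthscales]


def lengthscales_from_param(values, tau):
    """Invert tau / l^2 = value, giving l = sqrt(tau / value)."""
    return [math.sqrt(tau / v) for v in values]


tau = 0.1
lengthscales = [0.5, 2.0, 10.0]
params = inv_lengthscale_param(lengthscales, tau)
recovered = lengthscales_from_param(params, tau)
```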
- botorch.models.map_saas.add_saas_prior(base_kernel, tau=None, log_scale=True)[source]
Add a SAAS prior to a given base_kernel.
The SAAS prior is given by tau / lengthscale^2 ~ HC(1.0). If tau is None, we place an additional HC(0.1) prior on tau similar to the original SAAS prior that relies on inference with NUTS.
Example
>>> matern_kernel = MaternKernel(...)
>>> add_saas_prior(matern_kernel, tau=None)  # Add a SAAS prior
- Parameters:
base_kernel (Kernel) – Base kernel that has a lengthscale and uses ARD. Note that this function modifies the kernel object in place.
tau (Tensor | float | None) – Value of the global shrinkage. If None, infer the global shrinkage parameter. Can be a tensor for batched models (e.g., ensembles) where each batch has a different sparsity prior.
log_scale (bool) – Set to True if the lengthscale and tau should be optimized on a log-scale without any domain rescaling. That is, we will learn raw_lengthscale := log(lengthscale) and this hyperparameter needs to satisfy the corresponding bound constraints. Setting this to True will generally improve the numerical stability, but requires an optimizer that can handle bound constraints, e.g., L-BFGS-B.
- Returns:
Base kernel with SAAS priors added.
- Return type:
Kernel
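To see why a half-Cauchy prior on tau / lengthscale^2 encourages sparsity, the log-density can be evaluated for short and long lengthscales. The sketch below uses the standard half-Cauchy density 2 / (pi * s * (1 + (x / s)^2)) with scale s = 1 and fixed tau; the helper names are hypothetical and this is not how add_saas_prior registers its prior.

```python
import math


def half_cauchy_log_prob(x, scale=1.0):
    """Log-density of a half-Cauchy distribution with the given scale, x >= 0."""
    if x < 0:
        return float("-inf")
    return math.log(2.0) - math.log(math.pi * scale) - math.log1p((x / scale) ** 2)


def saas_log_prior(lengthscales, tau):
    """Sum of HC(1.0) log-densities evaluated at tau / l^2 (sketch)."""
    return sum(half_cauchy_log_prob(tau / l ** 2, 1.0) for l in lengthscales)


short = saas_log_prior([0.5, 0.5], tau=0.1)   # both dimensions "switched on"
long_ = saas_log_prior([10.0, 10.0], tau=0.1)  # both dimensions nearly irrelevant
```

Long lengthscales (values of tau / l^2 near zero) receive the higher prior density, so a dimension is only given a short lengthscale when the data support it.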
- botorch.models.map_saas.get_map_saas_model(train_X, train_Y, train_Yvar=None, input_transform=None, outcome_transform=None, tau=None)[source]
Helper method for creating an unfitted MAP SAAS model.
- Parameters:
train_X (Tensor) – Tensor of shape n x d with training inputs.
train_Y (Tensor) – Tensor of shape n x 1 with training targets.
train_Yvar (Tensor | None) – Optional tensor of shape n x 1 with observed noise, inferred if None.
input_transform (InputTransform | None) – An optional input transform.
outcome_transform (OutcomeTransform | None) – An optional outcome transform.
tau (Tensor | float | None) – Fixed value of the global shrinkage tau. If None, the model places a HC(0.1) prior on tau and infers it. Can be a tensor for batched models where each batch has a different sparsity prior.
- Returns:
A SingleTaskGP with a Matern kernel and a SAAS prior.
- Return type:
- botorch.models.map_saas.get_mean_module_with_normal_prior(batch_shape=None)[source]
Return constant mean with a N(0, 1) prior constrained to [-10, 10].
This prior assumes the outputs (targets) have been standardized to have zero mean and unit variance.
- Parameters:
batch_shape (Size | None) – Optional batch shape for the constant-mean module.
- Returns:
ConstantMean module.
- Return type:
ConstantMean
- botorch.models.map_saas.get_gaussian_likelihood_with_gamma_prior(batch_shape=None)[source]
Return Gaussian likelihood with a Gamma(0.9, 10) prior.
This prior prefers small noise, but also has heavy tails.
- Parameters:
batch_shape (Size | None) – Batch shape for the likelihood.
- Returns:
GaussianLikelihood with Gamma(0.9, 10) prior constrained to [1e-4, 0.1].
- botorch.models.map_saas.get_additive_map_saas_covar_module(ard_num_dims, num_taus=4, active_dims=None, batch_shape=None, dtype=None, device=None)[source]
Return an additive map SAAS covar module.
The constructed kernel is an additive kernel with num_taus terms. Each term is a scaled Matern kernel with a SAAS prior and a tau sampled from a HalfCauchy(0, 1) distribution.
- Parameters:
ard_num_dims (int) – The number of input dimensions.
num_taus (int) – The number of taus to use (4 if omitted).
active_dims (tuple[int, ...] | None) – Active dims for the covar module. The kernel will be evaluated only using these columns of the input tensor.
batch_shape (Size | None) – Batch shape for the covar module.
dtype (dtype | None)
device (device | None)
- Returns:
An additive MAP SAAS covar module.
- class botorch.models.map_saas.AdditiveMapSaasSingleTaskGP(train_X, train_Y, train_Yvar=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None, num_taus=4)[source]
Bases:
SingleTaskGP
An additive MAP SAAS single-task GP.
This is a maximum-a-posteriori (MAP) version of sparse axis-aligned subspace BO (SAASBO), see SaasFullyBayesianSingleTaskGP for more details. SAASBO is a high-dimensional Bayesian optimization approach that uses approximate fully Bayesian inference via NUTS to learn the model hyperparameters. This works very well, but is very computationally expensive, which limits the use of SAASBO to a small (~100) number of trials. Two of the main benefits of SAASBO are:
A sparse prior on the inverse lengthscales that avoids overfitting.
The ability to sample several (~16) sets of hyperparameters from the posterior that we can average over when computing the acquisition function (ensembling).
The goal of this Additive MAP SAAS model is to retain the main benefits of the SAAS model while significantly speeding up the time to fit the model. We achieve this by creating an additive kernel where each kernel in the sum is a Matern-5/2 kernel with a SAAS prior and a separate outputscale. The sparsity level for each kernel is sampled from an HC(0.1) distribution leading to a mix of sparsity levels (as is often the case for the fully Bayesian SAAS model). We learn all the hyperparameters using MAP inference which is significantly faster than using NUTS.
While we often find that the original SAAS model with NUTS performs better, the additive MAP SAAS model can be several orders of magnitude faster to fit, which makes it applicable to problems with potentially thousands of trials.
Instantiates an AdditiveMapSaasSingleTaskGP.
- Parameters:
train_X (Tensor) – A batch_shape x n x d tensor of training features.
train_Y (Tensor) – A batch_shape x n x m tensor of training observations.
train_Yvar (Tensor | None) – A batch_shape x n x m tensor of observed noise.
outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform.
input_transform (InputTransform | None) – An optional input transform.
num_taus (int) – The number of taus to use (4 if omitted).
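Drawing one sparsity level per additive term from a half-Cauchy distribution, as described for the additive MAP SAAS model, can be sketched with inverse-CDF sampling. The half-Cauchy CDF is (2 / pi) * arctan(x / scale), so its inverse is scale * tan(pi * u / 2). The helper names, the seed handling, and the use of Python's random module are assumptions for illustration; the library samples with its own tensor-based machinery.

```python
import math
import random


def half_cauchy_icdf(u, scale):
    """Inverse CDF of a half-Cauchy distribution with the given scale."""
    return scale * math.tan(math.pi * u / 2.0)


def sample_taus(num_taus=4, scale=0.1, seed=0):
    """Draw num_taus global shrinkage values from HC(scale) (sketch)."""
    rng = random.Random(seed)
    return [half_cauchy_icdf(rng.random(), scale) for _ in range(num_taus)]


taus = sample_taus(num_taus=4, scale=0.1)
median = half_cauchy_icdf(0.5, 0.1)  # the median of HC(scale) equals scale
```

Because the distribution is heavy-tailed, a handful of draws typically spans very different magnitudes, which is what produces the mix of sparsity levels.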
- class botorch.models.map_saas.EnsembleMapSaasSingleTaskGP(train_X, train_Y, train_Yvar=None, num_taus=4, taus=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]
Bases:
SingleTaskGP
Instantiates an EnsembleMapSaasSingleTaskGP [Daulton2026bonsai], which is a batched ensemble of SingleTaskGPs with the Matern-5/2 kernel and a SAAS prior. The model is intended to be trained with ExactMarginalLogLikelihood and fit_gpytorch_mll. Under the hood, the model is equivalent to a multi-output BatchedMultiOutputGPyTorchModel, but it produces a GaussianMixturePosterior, which leads to ensembling of the model outputs.
- Parameters:
train_X (Tensor) – An n x d tensor of training features.
train_Y (Tensor) – An n x 1 tensor of training observations.
train_Yvar (Tensor | None) – An optional n x 1 tensor of observed measurement noise.
num_taus (int) – The number of taus to use (4 if omitted). Each tau is a sparsity parameter for the corresponding kernel in the ensemble.
taus (Tensor | None) – An optional tensor of shape num_taus containing the taus to use. If omitted, the taus are sampled from a HalfCauchy(0.1) distribution.
outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform. Note that .train() will be called on the outcome transform during instantiation of the model.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None, **kwargs)[source]
Computes the posterior over model outputs at the provided points.
- Parameters:
X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.
output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.
observation_noise (bool) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q x m).
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
kwargs (Any)
- Returns:
A GaussianMixturePosterior object. Includes observation noise if specified.
- Return type:
- classmethod construct_inputs(training_data, *, num_taus=4)[source]
Construct Model keyword arguments from a SupervisedDataset.
- Parameters:
training_data (SupervisedDataset) – A SupervisedDataset containing the training data.
num_taus (int) – Number of taus to use in the ensemble (4 if omitted).
- Return type:
dict[str, BotorchContainer | Tensor]
- load_state_dict(state_dict, strict=True)[source]
Load the model state.
- Parameters:
state_dict (Mapping[str, Any]) – A dict containing the state of the model.
strict (bool) – A boolean indicating whether to strictly enforce that the keys in state_dict match the keys of the model's own state dict.
keep_transforms – A boolean indicating whether to keep the input and outcome transforms. Doing so is useful when loading a model that was trained on a full set of data, and is later loaded with a subset of the data.
assign – When set to False, the properties of the tensors in the current module are preserved whereas setting it to True preserves properties of the Tensors in the state dict. The only exception is the requires_grad field of Parameter for which the value from the module is preserved. Default: False.
- Return type:
None
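The ensembling performed by a Gaussian mixture posterior, as produced by EnsembleMapSaasSingleTaskGP, can be illustrated by computing the mean and variance of an equally weighted mixture of Gaussians at a single test point. Equal weights and the function name are assumptions of this sketch; it uses the law of total variance and does not call GaussianMixturePosterior.

```python
def mixture_moments(means, variances):
    """Mean and variance of an equally weighted Gaussian mixture (sketch)."""
    k = len(means)
    mix_mean = sum(means) / k
    # E[y^2] averaged over components, minus the squared mixture mean.
    second_moment = sum(v + m ** 2 for m, v in zip(means, variances)) / k
    return mix_mean, second_moment - mix_mean ** 2


# Two ensemble members that disagree about the mean at one test point.
mix_mean, mix_var = mixture_moments(means=[1.0, 3.0], variances=[0.5, 0.5])
```

The mixture variance exceeds each member's variance because disagreement between the members' means adds to the predictive uncertainty.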
Variational GP Models
This file contains a readily usable implementation of the robust Gaussian process model of [Ament2024pursuit], leveraging the Relevance Pursuit algorithm.
In particular, this file contains a RobustRelevancePursuitMixin class,
and a concrete implementation of a SingleTaskGP model,
RobustRelevancePursuitSingleTaskGP, which has the same API as a standard
SingleTaskGP model, but automatically instantiates the robust likelihood
SparseOutlierGaussianLikelihood and dispatches the relevance pursuit
algorithm during model fitting via fit_gpytorch_mll.
Even though a standard SingleTaskGP model is expressive enough to implement
the robust model by changing the likelihood, its optimization is more complex.
So the main reason for the RobustRelevancePursuitMixin class is to hide
this complexity by using multiple dispatch of fit_gpytorch_mll, which needs
to do two distinct operations in the context of the robust model:
It needs to toggle the relevance pursuit discrete optimization algorithm that changes the support, and as a sub-task,
it needs to still carry out the numerical optimization of the hyper-parameters given a fixed support, but still with a SparseOutlierGaussianLikelihood.
Since the types of the marginal likelihood (MarginalLogLikelihood) and the likelihood (SparseOutlierGaussianLikelihood) are the same in both calls, the only way we can leverage the multiple dispatch mechanism is the model type.
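The dispatch-on-model-type idea can be sketched with functools.singledispatch from the standard library. The classes ToyModel and ToyRobustModel and the function toy_fit are hypothetical stand-ins; fit_gpytorch_mll uses its own dispatcher over several argument types, which this single-argument sketch does not reproduce.

```python
from functools import singledispatch


class ToyModel:
    def __init__(self, support=()):
        self.support = list(support)


class ToyRobustModel(ToyModel):
    def to_standard_model(self):
        # Same state, but a type that dispatches to plain numerical fitting.
        return ToyModel(self.support)


@singledispatch
def toy_fit(model):
    raise NotImplementedError(type(model).__name__)


@toy_fit.register
def _(model: ToyModel):
    return f"numerical optimization with support {model.support}"


@toy_fit.register
def _(model: ToyRobustModel):
    # Outer discrete loop over supports; the inner fit runs on the standard type.
    log = []
    for support in ([], [2], [2, 5]):
        model.support = support
        log.append(toy_fit(model.to_standard_model()))
    return log


result = toy_fit(ToyRobustModel())
```

Converting to the standard type inside the outer loop is what prevents the robust fit from dispatching to itself recursively.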
- class botorch.models.robust_relevance_pursuit_model.RobustRelevancePursuitMixin(base_likelihood, dim, prior_mean_of_support=None, convex_parameterization=True, cache_model_trace=False)[source]
Bases:
ABC
A Mixin class for robust relevance pursuit models, which wraps a base likelihood with a SparseOutlierGaussianLikelihood to detect outliers, and calls the relevance pursuit algorithm during model fitting via fit_gpytorch_mll.
This is distinct from the RelevancePursuitMixin class, which is a Mixin class to equip a specific module (the likelihood, in the case of the robust model) with the relevance pursuit algorithms.
Initializes a robust relevance pursuit model, which wraps a base likelihood with a SparseOutlierGaussianLikelihood to detect outliers, and calls the relevance pursuit algorithm during model fitting via fit_gpytorch_mll.
For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.
- Parameters:
base_likelihood (GaussianLikelihood | FixedNoiseGaussianLikelihood) – The base likelihood that will be wrapped by a SparseOutlierGaussianLikelihood to detect outliers.
dim (int) – The number of training data points, i.e. the maximum dimensionality of the support set of the likelihood.
prior_mean_of_support (float | None) – The mean value for the default exponential prior distribution over the support size.
convex_parameterization (bool) – If True, use a convex parameterization of the sparse noise model. See SparseOutlierGaussianLikelihood for details.
cache_model_trace (bool) – If True, cache the model trace during relevance pursuit.
- abstractmethod to_standard_model()[source]
Converts this RobustRelevancePursuitMixin to an equivalent standard model with the same robust likelihood and hyper-parameters. This leaves the model structure and predictions unchanged, but leads fit_gpytorch_mll’s dispatch to numerically optimize the hyper-parameters of the model with a fixed support set, as opposed to dispatching to the discrete optimization via the relevance pursuit algorithm.
- Returns:
A standard model.
- Return type:
- load_standard_model(standard_model)[source]
Loads the state dict of a model into the RobustRelevancePursuitMixin.
- Parameters:
standard_model (Model) – A standard model with the same parameter structure and likelihood as the RobustRelevancePursuitMixin model.
- Returns:
The RobustRelevancePursuitMixin with the standard model’s state dict.
- Return type:
Self
- custom_fit(mll, *, numbers_of_outliers=None, fractions_of_outliers=None, timeout_sec=None, relevance_pursuit_optimizer=<function backward_relevance_pursuit>, reset_parameters=True, reset_dense_parameters=False, closure=None, optimizer=None, closure_kwargs=None, optimizer_kwargs=None)[source]
Fits a RobustRelevancePursuitGP model using the given marginal likelihood.
For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.
- Parameters:
mll (MarginalLogLikelihood) – The marginal likelihood to fit.
numbers_of_outliers (list[int] | None) – An optional list of numbers of outliers to consider during relevance pursuit. By default, the algorithm falls back to a default list of fractions of outliers, see below.
fractions_of_outliers (list[float] | None) – An optional list of fractions of outliers to consider if numbers_of_outliers is None. By default, the algorithm uses [0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0].
relevance_pursuit_optimizer (Callable) – The relevance pursuit optimizer to use.
reset_parameters (bool) – If True, reset sparse parameters after each iteration.
reset_dense_parameters (bool) – If True, reset dense parameters after each iteration.
closure (Callable[[], tuple[Tensor, Sequence[Tensor | None]]] | None) – A closure to compute loss and gradients.
optimizer (Callable | None) – The numerical optimizer.
closure_kwargs (dict[str, Any] | None) – Additional arguments to pass to the closure.
optimizer_kwargs (Mapping[str, Any] | None) – Additional arguments to pass to fit_gpytorch_mll.
timeout_sec (float | None)
- Returns:
The fitted marginal likelihood.
- Return type:
MarginalLogLikelihood
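How a list of outlier fractions might translate into concrete numbers of outliers for a given training set size can be sketched as follows. The rounding and deduplication rules here are assumptions chosen for illustration, and numbers_from_fractions is a hypothetical helper; custom_fit may convert fractions differently.

```python
DEFAULT_FRACTIONS = [0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0]


def numbers_from_fractions(n, fractions=DEFAULT_FRACTIONS):
    """Map outlier fractions to sorted, unique outlier counts in [0, n] (sketch)."""
    counts = {min(n, max(0, round(f * n))) for f in fractions}
    return sorted(counts)


levels = numbers_from_fractions(20)  # 20 training points
```

For small training sets several fractions collapse onto the same count, which is why the sketch deduplicates before sorting.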
- class botorch.models.robust_relevance_pursuit_model.RobustRelevancePursuitSingleTaskGP(train_X, train_Y, train_Yvar=None, likelihood=None, covar_module=None, mean_module=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None, convex_parameterization=True, prior_mean_of_support=None, cache_model_trace=False)[source]
Bases:
SingleTaskGP, RobustRelevancePursuitMixin
A robust single-task GP model that toggles the relevance pursuit algorithm during model fitting via fit_gpytorch_mll.
For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.
- Parameters:
train_X (Tensor) – A batch_shape x n x d tensor of training features.
train_Y (Tensor) – A batch_shape x n x m tensor of training observations.
train_Yvar (Tensor | None) – An optional batch_shape x n x m tensor of observed measurement noise.
likelihood (Likelihood | None) – A base likelihood that will be wrapped by a SparseOutlierGaussianLikelihood to detect outliers. If omitted, use a standard GaussianLikelihood with inferred noise level if train_Yvar is None, and a FixedNoiseGaussianLikelihood with the given noise observations if train_Yvar is not None.
covar_module (Module | None) – The module computing the covariance (Kernel) matrix. If omitted, uses an RBFKernel.
mean_module (Mean | None) – The mean function to be used. If omitted, use a ConstantMean.
outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.
convex_parameterization (bool) – If True, use a convex parameterization of the sparse noise model. See SparseOutlierGaussianLikelihood for details.
prior_mean_of_support (float | None) – The mean value for the default exponential prior distribution over the support size.
cache_model_trace (bool) – If True, cache the model trace during relevance pursuit.
Example
>>> m = RobustRelevancePursuitSingleTaskGP(train_X=X, train_Y=Y)
>>> mll = ExactMarginalLogLikelihood(model=m, likelihood=m.likelihood)
>>> mll = fit_gpytorch_mll(mll)
References
[burt2020svgp] David R. Burt, Carl Edward Rasmussen, and Mark van der Wilk, Convergence of Sparse Variational Inference in Gaussian Process Regression, Journal of Machine Learning Research, 2020, http://jmlr.org/papers/v21/19-1015.html.
[hensman2013svgp] James Hensman, Nicolo Fusi, and Neil D. Lawrence, Gaussian Processes for Big Data, Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, 2013, https://arxiv.org/abs/1309.6835.
[moss2023ipa] Henry B. Moss, Sebastian W. Ober, and Victor Picheny, Inducing Point Allocation for Sparse Gaussian Processes in High-Throughput Bayesian Optimization, Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, 2023, https://arxiv.org/pdf/2301.10123.pdf.
- class botorch.models.approximate_gp.ApproximateGPyTorchModel(model=None, likelihood=None, num_outputs=1, *args, **kwargs)[source]
Bases:
GPyTorchModel
BoTorch wrapper class for various (variational) approximate GP models in GPyTorch.
This can include either stochastic variational GPs (SVGPs) or variational implementations of weight-space approximate GPs.
- Parameters:
model (ApproximateGP | None) – Instance of gpytorch.approximate GP models. If omitted, constructs a _SingleTaskVariationalGP.
likelihood (Likelihood | None) – Instance of a GPyTorch likelihood. If omitted, uses either a GaussianLikelihood (if num_outputs=1) or a MultitaskGaussianLikelihood (if num_outputs>1).
num_outputs (int) – Number of outputs expected for the GP model.
args – Optional positional arguments passed to the _SingleTaskVariationalGP constructor if no model is provided.
kwargs – Optional keyword arguments passed to the _SingleTaskVariationalGP constructor if no model is provided.
- property num_outputs
The number of outputs of the model.
- train(mode=True)[source]
Put the model in train mode.
- Parameters:
mode (bool) – A boolean denoting whether to put the model in train or eval mode. If False, the model is put in eval mode.
- Return type:
Self
- posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]
Computes the posterior over model outputs at the provided points.
- Parameters:
X – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.
observation_noise (bool) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q). It is assumed to be in the outcome-transformed space if an outcome transform is used.
posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.
output_indices (list[int] | None)
- Returns:
A GPyTorchPosterior object, representing a batch of b joint distributions over q points. Includes observation noise if specified.
- Return type:
GPyTorchPosterior
- forward(X)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Return type:
MultivariateNormal
- class botorch.models.approximate_gp.SingleTaskVariationalGP(train_X, train_Y=None, likelihood=None, num_outputs=1, learn_inducing_points=True, covar_module=None, mean_module=None, variational_distribution=None, variational_strategy=<class 'gpytorch.variational.variational_strategy.VariationalStrategy'>, inducing_points=None, inducing_point_allocator=None, outcome_transform=None, input_transform=None)[source]
Bases:
ApproximateGPyTorchModel
A single-task variational GP model following [hensman2013svgp].
By default, the inducing points are initialized through the GreedyVarianceReduction of [burt2020svgp], which is known to be effective for building globally accurate models. However, custom inducing point allocators designed for specific downstream tasks can also be provided (see [moss2023ipa] for details), e.g. GreedyImprovementReduction when the goal is to build a model suitable for standard BO.
A single-task variational GP using relatively strong priors on the Kernel hyperparameters, which work best when covariates are normalized to the unit cube and outcomes are standardized (zero mean, unit variance).
This model works in batch mode (each batch having its own hyperparameters). When the training observations include multiple outputs, this model will use batching to model outputs independently. However, batches of multi-output models are not supported at this time; if you need to use those, please use a ModelListGP.
Use this model if you have a lot of data or if your responses are non-Gaussian.
To train this model, you should use gpytorch.mlls.VariationalELBO and not the exact marginal log likelihood.
Example
>>> import torch
>>> from botorch.models import SingleTaskVariationalGP
>>> from gpytorch.mlls import VariationalELBO
>>>
>>> train_X = torch.rand(20, 2)
>>> model = SingleTaskVariationalGP(train_X)
>>> mll = VariationalELBO(
...     model.likelihood, model.model, num_data=train_X.shape[-2]
... )
- Parameters:
train_X (Tensor) – Training inputs (due to the ability of the SVGP to sub-sample this does not have to be all of the training inputs).
train_Y (Tensor | None) – Training targets (optional).
likelihood (Likelihood | None) – Instance of a GPyTorch likelihood. If omitted, uses either a GaussianLikelihood (if num_outputs=1) or a MultitaskGaussianLikelihood (if num_outputs>1).
num_outputs (int) – Number of output responses per input (default: 1).
learn_inducing_points (bool) – If True, the inducing point locations are learned jointly with the other model parameters.
covar_module (Kernel | None) – Kernel function. If omitted, uses an RBFKernel.
mean_module (Mean | None) – Mean of GP model. If omitted, uses a ConstantMean.
variational_distribution (_VariationalDistribution | None) – Type of variational distribution to use (default: CholeskyVariationalDistribution). The properties of the variational distribution will encourage scalability or ease of optimization.
variational_strategy (type[_VariationalStrategy]) – Type of variational strategy to use (default: VariationalStrategy). The default setting uses “whitening” of the variational distribution to make training easier.
inducing_points (Tensor | int | None) – The number or specific locations of the inducing points.
inducing_point_allocator (InducingPointAllocator | None) – The InducingPointAllocator used to initialize the inducing point locations. If omitted, uses GreedyVarianceReduction.
outcome_transform (OutcomeTransform | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference. NOTE: If this model is trained in minibatches, an outcome transform with learnable parameters (such as Standardize) would update its parameters for each minibatch, which is undesirable. If you do intend to train in minibatches, we recommend you not use an outcome transform and instead pre-transform your whole data set before fitting the model.
input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass. NOTE: If this model is trained in minibatches, an input transform with learnable parameters (such as Normalize) would update its parameters for each minibatch, which is undesirable. If you do intend to train in minibatches, we recommend you not use an input transform and instead pre-transform your whole data set before fitting the model.
- property batch_shape: Size
The batch shape of the model.
This is a batch shape from an I/O perspective. For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.
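The shape rule above can be illustrated without any tensor library. The following is a minimal sketch, assuming the usual NumPy/PyTorch broadcasting semantics (shapes aligned from the right, size-1 dimensions stretched); the helper names broadcast_shapes and posterior_output_shape are hypothetical and are not part of BoTorch.

```python
from itertools import zip_longest


def broadcast_shapes(a, b):
    """Broadcast two shapes, aligning dimensions from the right."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        out.append(max(x, y))
    return tuple(reversed(out))


def posterior_output_shape(test_batch_shape, model_batch_shape, q, m):
    """Shape broadcast(test_batch_shape, model.batch_shape) x q x m."""
    return broadcast_shapes(test_batch_shape, model_batch_shape) + (q, m)


# A 5 x 1 x q x d input against a model with batch shape (3,), q=4, m=2.
shape = posterior_output_shape((5, 1), (3,), q=4, m=2)
print(shape)  # (5, 3, 4, 2)
```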
- init_inducing_points(inputs)[source]
Reinitialize the inducing point locations in-place with the current kernel applied to inputs through the model’s inducing point allocation strategy. The variational distribution and variational strategy caches are reset.
- Parameters:
inputs (Tensor) – (*batch_shape, n, d)-dim input data tensor.
- Returns:
(*batch_shape, m, d)-dim tensor of selected inducing point locations.
- Return type:
Tensor
Model Components
Kernels
- class botorch.models.kernels.categorical.CategoricalKernel(ard_num_dims=None, batch_shape=None, active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, **kwargs)[source]
Bases:
Kernel
A Kernel for categorical features.
Computes exp(-dist(x1, x2) / lengthscale), where dist(x1, x2) is zero if x1 == x2 and one if x1 != x2. If the last dimension is not a batch dimension, then the mean is considered.
Note: This kernel is NOT differentiable w.r.t. the inputs.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
ard_num_dims (int | None)
batch_shape (torch.Size | None)
active_dims (tuple[int, ...] | None)
lengthscale_prior (Prior | None)
lengthscale_constraint (Interval | None)
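As an illustration of the formula in the CategoricalKernel description, here is a minimal pure-Python sketch. It assumes a single shared lengthscale and takes the mean of the per-feature 0/1 mismatches; the function name categorical_kernel is hypothetical, and the library's handling of per-dimension (ARD) lengthscales is not reproduced here.

```python
import math


def categorical_kernel(x1, x2, lengthscale=1.0):
    """exp(-dist / lengthscale), with dist the mean 0/1 mismatch over features."""
    mismatches = [0.0 if a == b else 1.0 for a, b in zip(x1, x2)]
    dist = sum(mismatches) / len(mismatches)
    return math.exp(-dist / lengthscale)


same = categorical_kernel([0, 1, 2], [0, 1, 2])        # identical categories
one_off = categorical_kernel([0, 1, 2], [0, 2, 2])     # one of three differs
print(same, one_off)
```

Because the distance only depends on equality of category labels, small changes in the inputs do not change the kernel value, which is why the kernel is not differentiable with respect to the inputs.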
- class botorch.models.kernels.downsampling.DownsamplingKernel(power_prior=None, offset_prior=None, power_constraint=None, offset_constraint=None, **kwargs)[source]
Bases:
Kernel
GPyTorch Downsampling Kernel.
Computes a covariance matrix based on the downsampling kernel between inputs x_1 and x_2 (we expect d = 1):
K(x_1, x_2) = c + (1 - x_1)^(1 + delta) * (1 - x_2)^(1 + delta),
where c is an offset parameter, and delta is a power parameter.
- Parameters:
power_constraint (Interval | None) – Constraint to place on power parameter. Default is Positive.
power_prior (Prior | None) – Prior over the power parameter.
offset_constraint (Interval | None) – Constraint to place on offset parameter. Default is Positive.
active_dims – List of data dimensions to operate on. len(active_dims) should equal num_dimensions.
offset_prior (Prior | None)
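The downsampling kernel formula can be evaluated directly for scalar inputs. This is a minimal sketch of the stated formula only, not the library implementation; the function name downsampling_kernel and the argument names offset and power (standing in for c and delta) are chosen for illustration.

```python
def downsampling_kernel(x1, x2, offset, power):
    """K(x_1, x_2) = c + (1 - x_1)^(1 + delta) * (1 - x_2)^(1 + delta)."""
    return offset + (1.0 - x1) ** (1.0 + power) * (1.0 - x2) ** (1.0 + power)


# At x = 1 the product term vanishes, leaving only the offset.
k_full = downsampling_kernel(1.0, 0.3, offset=0.1, power=1.0)
k_half = downsampling_kernel(0.5, 0.5, offset=0.0, power=1.0)
print(k_full, k_half)
```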
- class botorch.models.kernels.exponential_decay.ExponentialDecayKernel(power_prior=None, offset_prior=None, power_constraint=None, offset_constraint=None, **kwargs)[source]
Bases:
Kernel
GPyTorch Exponential Decay Kernel.
Computes a covariance matrix based on the exponential decay kernel between inputs x_1 and x_2 (we expect d = 1):
K(x_1, x_2) = w + beta^alpha / (x_1 + x_2 + beta)^alpha,
where w is an offset parameter, beta is a lengthscale parameter, and alpha is a power parameter.
- Parameters:
lengthscale_constraint – Constraint to place on lengthscale parameter. Default is Positive.
lengthscale_prior – Prior over the lengthscale parameter.
power_constraint (Interval | None) – Constraint to place on power parameter. Default is Positive.
power_prior (Prior | None) – Prior over the power parameter.
offset_constraint (Interval | None) – Constraint to place on offset parameter. Default is Positive.
active_dims – List of data dimensions to operate on. len(active_dims) should equal num_dimensions.
offset_prior (Prior | None)
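For scalar inputs, the exponential decay formula can be checked by hand. The sketch below implements only the stated expression; the function name exponential_decay_kernel and the argument names offset, lengthscale, and power (standing in for w, beta, and alpha) are illustrative and are not taken from the library.

```python
def exponential_decay_kernel(x1, x2, offset, lengthscale, power):
    """K(x_1, x_2) = w + beta^alpha / (x_1 + x_2 + beta)^alpha."""
    return offset + lengthscale**power / (x1 + x2 + lengthscale) ** power


# At x_1 = x_2 = 0 the ratio equals 1, so K = w + 1.
k_origin = exponential_decay_kernel(0.0, 0.0, offset=0.1, lengthscale=2.0, power=3.0)
# When x_1 + x_2 equals beta, the ratio is (1/2)^alpha.
k_decay = exponential_decay_kernel(1.0, 1.0, offset=0.0, lengthscale=2.0, power=1.0)
print(k_origin, k_decay)
```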
- class botorch.models.kernels.infinite_width_bnn.InfiniteWidthBNNKernel(depth=3, batch_shape=None, active_dims=None, acos_eps=1e-07, device=None)[source]
Bases:
Kernel
Infinite-width BNN kernel.
Defines the GP kernel which is equivalent to performing exact Bayesian inference on a fully-connected deep neural network with ReLU activations and i.i.d. priors in the infinite-width limit. See [Cho2009kernel] and [Lee2018deep] for details.
[Cho2009kernel] Y. Cho and L. Saul. Kernel methods for deep learning. Advances in Neural Information Processing Systems 22. 2009.
[Lee2018deep] J. Lee, Y. Bahri, R. Novak, S. Schoenholz, J. Pennington, and J. Dickstein. Deep Neural Networks as Gaussian Processes. International Conference on Learning Representations. 2018.
- Parameters:
depth (int) – Depth of neural network.
batch_shape (torch.Size | None) – This will set a separate weight/bias var for each batch. It should be \(B_1 \times \ldots \times B_k\) if the input \(\mathbf{x}\) is a \(B_1 \times \ldots \times B_k \times N \times D\) tensor.
active_dims – Compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions.
acos_eps – A small positive value to restrict acos inputs to \([-1 + \epsilon, 1 - \epsilon]\).
device – Device for parameters.
- class botorch.models.kernels.linear_truncated_fidelity.LinearTruncatedFidelityKernel(fidelity_dims, dimension=None, power_prior=None, power_constraint=None, nu=2.5, lengthscale_prior_unbiased=None, lengthscale_prior_biased=None, lengthscale_constraint_unbiased=None, lengthscale_constraint_biased=None, covar_module_unbiased=None, covar_module_biased=None, **kwargs)[source]
Bases:
Kernel
GPyTorch Linear Truncated Fidelity Kernel.
Computes a covariance matrix based on the linear truncated kernel between inputs x_1 and x_2 for up to two fidelity parameters:
K(x_1, x_2) = k_0 + c_1(x_1, x_2)k_1 + c_2(x_1,x_2)k_2 + c_3(x_1,x_2)k_3
where
- k_i(i=0,1,2,3) are Matern kernels calculated between non-fidelity parameters of x_1 and x_2 with different priors.
- c_1=(1 - x_1[f_1])(1 - x_2[f_1])(1 + x_1[f_1] x_2[f_1])^p is the kernel of the bias term, which can be decomposed into a deterministic part and a polynomial kernel. Here f_1 is the first fidelity dimension and p is the order of the polynomial kernel.
- c_3 is the same as c_1 but is calculated for the second fidelity dimension f_2.
- c_2 is the interaction term with four deterministic terms and the polynomial kernel between x_1[..., [f_1, f_2]] and x_2[..., [f_1, f_2]].
Example
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> # fidelity_dims is required; here the last column is assumed to be the fidelity
>>> covar_module = LinearTruncatedFidelityKernel(fidelity_dims=[4], dimension=5)
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = LinearTruncatedFidelityKernel(
...     fidelity_dims=[4], dimension=5, batch_shape=torch.Size([2])
... )
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)
- Parameters:
fidelity_dims (list[int]) – A list containing either one or two indices specifying the fidelity parameters of the input.
dimension (int | None) – The dimension of x. Unused if active_dims is specified.
power_prior (Prior | None) – Prior for the power parameter of the polynomial kernel. Default is None.
power_constraint (Interval | None) – Constraint on the power parameter of the polynomial kernel. Default is Positive.
nu (float) – The smoothness parameter for the Matern kernel: either 1/2, 3/2, or 5/2. Unused if both covar_module_unbiased and covar_module_biased are specified.
lengthscale_prior_unbiased (Prior | None) – Prior on the lengthscale parameter of Matern kernel k_0. Default is Gamma(1.1, 1/20).
lengthscale_constraint_unbiased (Interval | None) – Constraint on the lengthscale parameter of the Matern kernel k_0. Default is Positive.
lengthscale_prior_biased (Prior | None) – Prior on the lengthscale parameter of Matern kernels k_i(i>0). Default is Gamma(5, 1/20).
lengthscale_constraint_biased (Interval | None) – Constraint on the lengthscale parameter of the Matern kernels k_i(i>0). Default is Positive.
covar_module_unbiased (Kernel | None) – Specify a custom kernel for k_0. If omitted, use a MaternKernel.
covar_module_biased (Kernel | None) – Specify a custom kernel for the biased parts k_i(i>0). If omitted, use a MaternKernel.
batch_shape – If specified, use a separate lengthscale for each batch of input data. If x1 is a batch_shape x n x d tensor, this should be batch_shape.
active_dims – Compute the covariance of a subset of input dimensions. The numbers correspond to the indices of the dimensions.
kwargs (Any)
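The bias coefficient c_1 from the kernel description can be computed for scalar fidelity values as follows. This is a sketch of the stated formula only, assuming fidelity values in [0, 1]; the function name bias_coefficient is hypothetical and the Matern kernels k_i are not modeled here.

```python
def bias_coefficient(s1, s2, power):
    """c_1 = (1 - s1)(1 - s2)(1 + s1 * s2)^p for fidelity values s1, s2."""
    return (1.0 - s1) * (1.0 - s2) * (1.0 + s1 * s2) ** power


# At the highest fidelity (s = 1) the bias term vanishes.
c_top = bias_coefficient(1.0, 0.4, power=2.0)
# At the lowest fidelity (s = 0) the coefficient is 1.
c_low = bias_coefficient(0.0, 0.0, power=2.0)
c_mid = bias_coefficient(0.5, 0.5, power=1.0)
print(c_top, c_low, c_mid)
```

Under this formula, only k_0 contributes when either input is observed at full fidelity, which matches the reading of k_0 as the unbiased kernel.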
- class botorch.models.kernels.contextual_lcea.LCEAKernel(decomposition, batch_shape, train_embedding=True, cat_feature_dict=None, embs_feature_dict=None, embs_dim_list=None, context_weight_dict=None, device=None)[source]
Bases:
Kernel
The Latent Context Embedding Additive (LCE-A) Kernel.
This kernel is similar to the SACKernel, and is used when context breakdowns are unobservable. It assumes the same additive structure and a spatial kernel shared across contexts. Rather than assuming independence, LCEAKernel models the correlation in the latent functions for each context through learning context embeddings.
- Parameters:
decomposition (dict[str, list[int]]) – Keys index context names. Values are the indexes of parameters belonging to the context.
batch_shape (Size) – Batch shape as usual for gpytorch kernels. Model does not support batch training. When batch_shape is non-empty, it is used for loading hyper-parameter values generated from MCMC sampling.
train_embedding (bool) – A boolean indicator of whether to learn context embeddings.
cat_feature_dict (dict | None) – Keys are context names and values are lists of categorical features, i.e. {“context_name” : [cat_0, …, cat_k]}, where k equals the number of categorical variables. If None, uses context names in the decomposition as the only categorical feature, i.e., k = 1.
embs_feature_dict (dict | None) – Pre-trained continuous embedding features of each context.
embs_dim_list (list[int] | None) – Embedding dimension for each categorical variable. The length equals the number of categorical features k. If None, the embedding dimension is set to 1 for each categorical variable.
context_weight_dict (dict | None) – Known population weights of each context.
device (device | None)
- class botorch.models.kernels.contextual_sac.SACKernel(decomposition, batch_shape, device=None)[source]
Bases:
Kernel
The structural additive contextual (SAC) kernel.
The kernel is used for contextual BO without observing context breakdowns. There are d parameters and M contexts. In total, the dimension of parameter space is d*M and input x can be written as x=[x_11, …, x_1d, x_21, …, x_2d, …, x_M1, …, x_Md].
The kernel uses the parameter decomposition and assumes an additive structure across contexts. Each context component is assumed to be independent.
\[\begin{equation*} k(\mathbf{x}, \mathbf{x}') = k_1(\mathbf{x}_{(1)}, \mathbf{x}'_{(1)}) + \cdots + k_M(\mathbf{x}_{(M)}, \mathbf{x}'_{(M)}) \end{equation*}\]
where M is the number of partitions of parameter space. Each partition contains the same number of parameters d. Each kernel k_i acts only on the d parameters of the i-th partition, i.e. \(\mathbf{x}_{(i)}\). Each kernel k_i is a scaled RBF kernel with the same lengthscales but different outputscales.
- Parameters:
decomposition (dict[str, list[int]]) – Keys are context names. Values are the indexes of parameters belonging to the context. The parameter indexes are in the same order across contexts.
batch_shape (Size) – Batch shape as usual for gpytorch kernels.
device (device | None) – The torch device.
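The additive structure of the SAC kernel can be sketched in plain Python. The example assumes a single scalar lengthscale shared by all contexts and one outputscale per context, as the description suggests; the function names rbf and sac_kernel, and the dictionary layout of outputscales, are illustrative rather than the library's API.

```python
import math


def rbf(x, y, lengthscale):
    """Squared-exponential kernel on two equally long lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / lengthscale**2)


def sac_kernel(x, y, decomposition, outputscales, lengthscale):
    """Sum of scaled RBF kernels, one per context, each on its own indices."""
    total = 0.0
    for context, indices in decomposition.items():
        x_part = [x[i] for i in indices]
        y_part = [y[i] for i in indices]
        total += outputscales[context] * rbf(x_part, y_part, lengthscale)
    return total


# Two contexts (M = 2) with d = 2 parameters each.
decomposition = {"ctx_a": [0, 1], "ctx_b": [2, 3]}
outputscales = {"ctx_a": 1.5, "ctx_b": 0.5}
x = [0.1, 0.2, 0.3, 0.4]
y = [0.1, 0.2, 0.9, 0.4]  # differs from x only inside ctx_b
print(sac_kernel(x, x, decomposition, outputscales, lengthscale=1.0))
print(sac_kernel(x, y, decomposition, outputscales, lengthscale=1.0))
```

Changing a parameter that belongs to one context only affects that context's summand, which is the independence assumption stated above.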
- class botorch.models.kernels.orthogonal_additive_kernel.OrthogonalAdditiveKernel(dim, base_kernel=None, per_dim_lengthscales=True, quad_deg=32, second_order=False, batch_shape=None, dtype=None, device=None, coeff_constraint=Positive(), offset_prior=None, coeffs_1_prior=None, coeffs_2_prior=None)[source]
Bases:
Kernel
Orthogonal Additive Kernels (OAKs) were introduced in [Lu2022additive], though only for the case of Gaussian base kernels with a Gaussian input data distribution.
The implementation here generalizes OAKs to arbitrary base kernels by using a Gauss-Legendre quadrature approximation to the required one-dimensional integrals involving the base kernels.
[Lu2022additive] X. Lu, A. Boukouvalas, and J. Hensman. Additive Gaussian processes revisited. Proceedings of the 39th International Conference on Machine Learning. Jul 2022.
- Parameters:
dim (int) – Input dimensionality of the kernel.
base_kernel (Kernel | None) – The kernel to orthogonalize and evaluate in forward. If None, creates an RBF kernel whose batch_shape is determined by per_dim_lengthscales. If provided, a compatibility check is performed against the expected batch_shape.
per_dim_lengthscales (bool) – If True (default), the default base kernel gets batch_shape=(*batch_shape, dim), giving each additive component its own independent lengthscale. If False, the default base kernel gets batch_shape=batch_shape, sharing a single lengthscale across all components.
quad_deg (int) – Number of integration nodes for orthogonalization.
second_order (bool) – Toggles second order interactions. If true, both the time and space complexity of evaluating the kernel are quadratic in dim.
batch_shape (Size | None) – Optional batch shape for the kernel and its parameters.
dtype (dtype | None) – Initialization dtype for required Tensors.
device (device | None) – Initialization device for required Tensors.
coeff_constraint (Interval) – Constraint on the coefficients of the additive kernel.
offset_prior (Prior | None) – Prior on the offset coefficient. Should be a prior with non-negative support.
coeffs_1_prior (Prior | None) – Prior on the parameter main effects. Should be a prior with non-negative support.
coeffs_2_prior (Prior | None) – Prior on the parameter interactions. Should be a prior with non-negative support.
- class botorch.models.kernels.positive_index.PositiveIndexKernel(num_tasks, rank=1, task_prior=None, normalize_covar_matrix=False, var_constraint=None, target_task_index=0, unit_scale_for_target=True, **kwargs)[source]
Bases:
IndexKernel
A kernel for discrete indices with strictly positive correlations. This is enforced by a positivity constraint on the decomposed covariance matrix.
Similar to IndexKernel but ensures all off-diagonal correlations are positive by using a Cholesky-like parameterization with positive elements.
\[k(i, j) = \frac{(LL^T)_{i,j}}{(LL^T)_{t,t}}\]
where L is a lower triangular matrix with positive elements and t is the target_task_index.
- Parameters:
num_tasks (int) – Total number of indices.
rank (int) – Rank of the covariance matrix parameterization.
task_prior (Prior, optional) – Prior for the covariance matrix.
normalize_covar_matrix (bool) – Whether to normalize the covariance matrix.
target_task_index (int) – Index of the task whose diagonal element should be normalized to 1. Defaults to 0 (first task).
unit_scale_for_target (bool) – Whether to ensure the target task has unit outputscale.
**kwargs – Additional arguments passed to IndexKernel.
var_constraint (Interval | None)
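The normalization in the formula for k(i, j) can be worked through on a small example. The sketch below assumes L is given as nested lists with positive entries on and below the diagonal; the function name positive_index_covariance is hypothetical, and details such as the rank parameter or additional variance terms in the library are not modeled.

```python
def positive_index_covariance(L, target_task_index=0):
    """Return (L L^T) divided by its [t, t] entry, as nested lists."""
    n = len(L)
    gram = [
        [sum(L[i][k] * L[j][k] for k in range(len(L[i]))) for j in range(n)]
        for i in range(n)
    ]
    scale = gram[target_task_index][target_task_index]
    return [[value / scale for value in row] for row in gram]


# Lower triangular factor with positive elements; L L^T = [[4, 2], [2, 2]].
L = [[2.0, 0.0], [1.0, 1.0]]
K = positive_index_covariance(L, target_task_index=0)
print(K)  # [[1.0, 0.5], [0.5, 0.5]]
```

Dividing by the target task's diagonal entry gives that task unit variance, while the positive entries of L keep every off-diagonal covariance positive.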
Kernels for multi-task GPs with heterogeneous search spaces.
- class botorch.models.kernels.heterogeneous_multitask.MultiTaskConditionalKernel(feature_indices, task_feature_index=-1, use_saas_prior=True, use_combinatorial_kernel=True)[source]
Bases:
Kernel
A base kernel for multi-task GPs with heterogeneous search spaces for tasks.
This kernel was introduced in [Deshwal2024Heterogeneous].
This kernel conditionally combines multiple sub-kernels to calculate covariances.
The kernel operates on full_feature_dim + 1 dimensional inputs, with the + 1 dimension representing the task feature.
- Given a list of indices representing the active feature dimensions for each task, the feature space is split into several non-overlapping subsets and a base kernel gets constructed for each of these subset dimensions.
- The task feature is embedded into a binary tensor, which, together with a DeltaKernel, determines which of the sub-kernels are added together for the given inputs.
- There is an additional Combinatorial kernel that operates over the binary embedding of task features.
Initialize the kernel.
- Parameters:
feature_indices (list[list[int]]) – A list of lists of integers specifying the indices that select the features of a given task from the full tensor of features. The i-th element of the list should contain d_i integers. These are the active indices for the given task.
task_feature_index (int) – Index of the task feature in the input tensor.
use_saas_prior (bool) – If True, use SAAS prior for the Matern kernels.
use_combinatorial_kernel (bool) – If True, use combinatorial kernel over the binary embedding of task features.
Likelihoods
Pairwise likelihood for pairwise preference model (e.g., PairwiseGP).
- class botorch.models.likelihoods.pairwise.PairwiseLikelihood(max_plate_nesting=1)[source]
Bases:
Likelihood, ABC
Pairwise likelihood base class for pairwise preference GP (e.g., PairwiseGP).
Initialized like a gpytorch.likelihoods.Likelihood.
- Parameters:
max_plate_nesting (int) – Defaults to 1.
- forward(utility, D)[source]
Given the difference in (estimated) utility util_diff = f(v) - f(u), return a Bernoulli distribution object representing the likelihood of the user preferring v over u.
Note that this is not used by the PairwiseGP model.
- Parameters:
utility (Tensor)
D (Tensor)
- Return type:
Bernoulli
- abstractmethod p(utility, D)[source]
Given the difference in (estimated) utility util_diff = f(v) - f(u), return the probability of the user preferring v over u.
- Parameters:
utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point.
D (Tensor) – D is a (batch_size x) m x n matrix with all elements being zero in the last dimension except at two positions, D[…, i] = 1 and D[…, j] = -1, representing that item i is preferred over item j.
log – If true, return the log probability.
- Return type:
Tensor
- log_p(utility, D)[source]
Return the log of p.
- Parameters:
utility (Tensor)
D (Tensor)
- Return type:
Tensor
- negative_log_gradient_sum(utility, D)[source]
Calculate the sum of negative log gradient with respect to each item’s latent utility values. Useful for models using Laplace approximation.
- Parameters:
utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point.
D (Tensor) – D is a (batch_size x) m x n matrix with all elements being zero in the last dimension except at two positions, D[…, i] = 1 and D[…, j] = -1, representing that item i is preferred over item j.
- Returns:
A (batch_size x) n Tensor representing the sum of negative log gradient values of the likelihood over all comparisons (i.e., the m dimension) with respect to each item.
- Return type:
Tensor
- negative_log_hessian_sum(utility, D)[source]
Calculate the sum of negative log Hessian with respect to each item’s latent utility values. Useful for models using Laplace approximation.
- Parameters:
utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point.
D (Tensor) – D is a (batch_size x) m x n matrix with all elements being zero in the last dimension except at two positions, D[…, i] = 1 and D[…, j] = -1, representing that item i is preferred over item j.
- Returns:
A (batch_size x) n x n Tensor representing the sum of negative log Hessian values of the likelihood over all comparisons (i.e., the m dimension) with respect to each item.
- Return type:
Tensor
- class botorch.models.likelihoods.pairwise.PairwiseProbitLikelihood(max_plate_nesting=1)[source]
Bases:
PairwiseLikelihood
Pairwise likelihood using the probit function.
Given two items v and u with utilities f(v) and f(u), we prefer v over u with probability std_normal_cdf((f(v) - f(u))/sqrt(2)). Note that this formulation implicitly assumes the noise term is fixed at 1.
Initialized like a gpytorch.likelihoods.Likelihood.
- Parameters:
max_plate_nesting (int) – Defaults to 1.
- p(utility, D, log=False)[source]
Given the difference in (estimated) utility util_diff = f(v) - f(u), return the probability of the user preferring v over u.
- Parameters:
utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point.
D (Tensor) – D is a (batch_size x) m x n matrix with all elements being zero in the last dimension except at two positions, D[…, i] = 1 and D[…, j] = -1, representing that item i is preferred over item j.
log (bool) – If true, return the log probability.
- Return type:
Tensor
- negative_log_gradient_sum(utility, D)[source]
Calculate the sum of negative log gradient with respect to each item’s latent utility values. Useful for models using Laplace approximation.
- Parameters:
utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point.
D (Tensor) – D is a (batch_size x) m x n matrix with all elements being zero in the last dimension except at two positions, D[…, i] = 1 and D[…, j] = -1, representing that item i is preferred over item j.
- Returns:
A (batch_size x) n Tensor representing the sum of negative log gradient values of the likelihood over all comparisons (i.e., the m dimension) with respect to each item.
- Return type:
Tensor
- negative_log_hessian_sum(utility, D)[source]
Calculate the sum of negative log Hessian with respect to each item’s latent utility values. Useful for models using Laplace approximation.
- Parameters:
utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point.
D (Tensor) – D is a (batch_size x) m x n matrix with all elements being zero in the last dimension except at two positions, D[…, i] = 1 and D[…, j] = -1, representing that item i is preferred over item j.
- Returns:
A (batch_size x) n x n Tensor representing the sum of negative log Hessian values of the likelihood over all comparisons (i.e., the m dimension) with respect to each item.
- Return type:
Tensor
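The probit preference probability described for PairwiseProbitLikelihood can be reproduced for a single comparison with the standard library. This is a minimal sketch assuming one row of D (a list with +1 at the preferred item and -1 at the other); the function names std_normal_cdf and probit_preference_prob are illustrative, not library functions.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_preference_prob(utility, comparison):
    """P(preferred item beats the other) = Phi((f(v) - f(u)) / sqrt(2))."""
    # The dot product with a row of D picks out f(v) - f(u).
    util_diff = sum(d * f for d, f in zip(comparison, utility))
    return std_normal_cdf(util_diff / math.sqrt(2.0))


utility = [0.2, 1.0, -0.5]
p_1_over_0 = probit_preference_prob(utility, [-1, 1, 0])  # item 1 vs item 0
p_0_over_1 = probit_preference_prob(utility, [1, -1, 0])  # item 0 vs item 1
print(p_1_over_0, p_0_over_1)
```

The two probabilities sum to one, and equal utilities give a probability of exactly one half.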
- class botorch.models.likelihoods.pairwise.PairwiseLogitLikelihood(max_plate_nesting=1)[source]
Bases:
PairwiseLikelihood
Pairwise likelihood using the logistic (i.e., sigmoid) function.
Given two items v and u with utilities f(v) and f(u), we prefer v over u with probability sigmoid(f(v) - f(u)). Note that this formulation implicitly assumes the beta term in the logistic function is fixed at 1.
Initialized like a gpytorch.likelihoods.Likelihood.
- Parameters:
max_plate_nesting (int) – Defaults to 1.
- log_p(utility, D)[source]
Return the log of p.
- Parameters:
utility (Tensor)
D (Tensor)
- Return type:
Tensor
- p(utility, D)[source]
Given the difference in (estimated) utility util_diff = f(v) - f(u), return the probability of the user preferring v over u.
- Parameters:
utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point.
D (Tensor) – D is a (batch_size x) m x n matrix with all elements being zero in the last dimension except at two positions, D[…, i] = 1 and D[…, j] = -1, representing that item i is preferred over item j.
log – If true, return the log probability.
- Return type:
Tensor
- negative_log_gradient_sum(utility, D)[source]
Calculate the sum of negative log gradient with respect to each item’s latent utility values. Useful for models using Laplace approximation.
- Parameters:
utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point.
D (Tensor) – D is a (batch_size x) m x n matrix with all elements being zero in the last dimension except at two positions, D[…, i] = 1 and D[…, j] = -1, representing that item i is preferred over item j.
- Returns:
A (batch_size x) n Tensor representing the sum of negative log gradient values of the likelihood over all comparisons (i.e., the m dimension) with respect to each item.
- Return type:
Tensor
- negative_log_hessian_sum(utility, D)[source]
Calculate the sum of negative log Hessian with respect to each item’s latent utility values. Useful for models using Laplace approximation.
- Parameters:
utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point.
D (Tensor) – D is a (batch_size x) m x n matrix with all elements being zero in the last dimension except at two positions, D[…, i] = 1 and D[…, j] = -1, representing that item i is preferred over item j.
- Returns:
A (batch_size x) n x n Tensor representing the sum of negative log Hessian values of the likelihood over all comparisons (i.e., the m dimension) with respect to each item.
- Return type:
Tensor
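For the logistic likelihood, both sums follow directly from differentiating -log sigmoid(z) with z = D[k] · utility. The sketch below works this out in plain Python for the unbatched case. It is derived from the sigmoid formula alone and is not taken from BoTorch's code; the function names `neg_log_grad_sum` and `neg_log_hess_sum` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neg_log_grad_sum(utility, D):
    """Gradient of -sum_k log sigmoid(D[k] . utility) w.r.t. utility (length n)."""
    n = len(utility)
    grad = [0.0] * n
    for row in D:
        z = sum(d * f for d, f in zip(row, utility))
        w = 1.0 - sigmoid(z)  # d/dz of -log sigmoid(z) equals -(1 - sigmoid(z))
        for i in range(n):
            grad[i] -= w * row[i]
    return grad


def neg_log_hess_sum(utility, D):
    """Hessian of -sum_k log sigmoid(D[k] . utility) w.r.t. utility (n x n)."""
    n = len(utility)
    hess = [[0.0] * n for _ in range(n)]
    for row in D:
        z = sum(d * f for d, f in zip(row, utility))
        s = sigmoid(z)
        w = s * (1.0 - s)  # second derivative of -log sigmoid(z)
        for i in range(n):
            for j in range(n):
                hess[i][j] += w * row[i] * row[j]
    return hess


utility = [0.5, -0.2, 1.0]
D = [
    [1.0, -1.0, 0.0],  # item 0 preferred over item 1
    [0.0, 1.0, -1.0],  # item 1 preferred over item 2
]
grad = neg_log_grad_sum(utility, D)
hess = neg_log_hess_sum(utility, D)
```

Because every row of D sums to zero, the gradient entries also sum to zero, and the Hessian is symmetric with zero entries for item pairs that were never compared.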
- class botorch.models.likelihoods.sparse_outlier_noise.SparseOutlierGaussianLikelihood(base_noise, dim, outlier_indices=None, rho_prior=None, rho_constraint=None, batch_shape=None, convex_parameterization=True, loo=True)[source]
Bases:
_GaussianLikelihoodBase
A likelihood that models the noise of a GP with SparseOutlierNoise, a noise model in the Relevance Pursuit family of models, permitting additional “robust” variance for a small set of outlier data points. Notably, the indices of the outlier data points are inferred during the optimization of the associated log marginal likelihood via the Relevance Pursuit algorithm.
For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.
NOTE: Letting base_noise also use non-transformed constraints leads to more stable optimization, but is orthogonal implementation-wise. If the base noise is a HomoskedasticNoise, one can pass the non-transformed constraint as the
noise_constraint.
Example
>>> base_noise = HomoskedasticNoise(
>>>     noise_constraint=NonTransformedInterval(
>>>         1e-5, 1e-1, initial_value=1e-3
>>>     )
>>> )
>>> likelihood = SparseOutlierGaussianLikelihood(
>>>     base_noise=base_noise,
>>>     dim=X.shape[0],
>>> )
>>> model = SingleTaskGP(train_X=X, train_Y=Y, likelihood=likelihood)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> # NOTE: ``likelihood.noise_covar`` is the ``RelevancePursuitMixin``
>>> sparse_module = likelihood.noise_covar
>>> backward_relevance_pursuit(sparse_module, mll)
- Parameters:
base_noise (Noise | FixedGaussianNoise) – The base noise model.
dim (int) – The number of training observations, which determines the maximum number of data-point-specific noise variances of the noise model.
outlier_indices (list[int] | None) – The indices of the outliers.
rho_prior (Prior | None) – Prior for
self.noise_covar’s rho parameter.
rho_constraint (NonTransformedInterval | None) – Constraint for
self.noise_covar’s rho parameter. Needs to be a
NonTransformedInterval because exact sparsity cannot be represented using smooth transforms like a softplus or sigmoid.
batch_shape (Size | None) – The batch shape of the learned noise parameter (default: []).
convex_parameterization (bool) – Whether to use the convex parameterization of rho, which generally improves optimization results and is thus recommended.
loo (bool) – Whether to use leave-one-out (LOO) update equations that can compute the optimal values of each individual rho, keeping all else equal.
- class botorch.models.likelihoods.sparse_outlier_noise.SparseOutlierNoise(base_noise, dim, outlier_indices=None, rho_prior=None, rho_constraint=None, batch_shape=None, convex_parameterization=True, loo=True)[source]
Bases:
Noise, RelevancePursuitMixin
A noise model in the Relevance Pursuit family of models, permitting additional “robust” variance for a small set of outlier data points. See also
SparseOutlierGaussianLikelihood, which leverages this noise model.
For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.
Example
>>> base_noise = HomoskedasticNoise(
>>>     noise_constraint=NonTransformedInterval(
>>>         1e-5, 1e-1, initial_value=1e-3
>>>     )
>>> )
>>> likelihood = SparseOutlierGaussianLikelihood(
>>>     base_noise=base_noise,
>>>     dim=X.shape[0],
>>> )
>>> model = SingleTaskGP(train_X=X, train_Y=Y, likelihood=likelihood)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> # NOTE: ``likelihood.noise_covar`` is the ``SparseOutlierNoise``
>>> sparse_module = likelihood.noise_covar
>>> backward_relevance_pursuit(sparse_module, mll)
- Parameters:
base_noise (Noise | FixedGaussianNoise) – The base noise model.
dim (int) – The number of training observations, which determines the maximum number of data-point-specific noise variances of the noise model.
outlier_indices (list[int] | None) – The indices of the outliers.
rho_prior (Prior | None) – Prior for the rho parameter.
rho_constraint (NonTransformedInterval | None) – Constraint for the rho parameter. Needs to be a NonTransformedInterval because exact sparsity cannot be represented using smooth transforms like a softplus or sigmoid.
batch_shape (Size | None) – The batch shape of the learned noise parameter (default: []).
convex_parameterization (bool) – Whether to use the convex parameterization of rho, which generally improves optimization results and is thus recommended.
loo (bool) – Whether to use leave-one-out (LOO) update equations that can compute the optimal values of each individual rho, keeping all else equal.
- property sparse_parameter: Parameter
The sparse parameter, required to have a single indexing dimension.
- set_sparse_parameter(value)[source]
Sets the sparse parameter.
NOTE: We can’t use the property setter @sparse_parameter.setter because of the special way PyTorch treats Parameter types, including custom setters.
- Parameters:
value (Parameter)
- Return type:
None
- property convex_parameterization: bool
- property rho: Tensor
Dense representation of the data-point-specific variances, corresponding to the latent
self.raw_rho values, which might be represented sparsely or in the convex parameterization. The last dimension is equal to the number of training points self.dim.
NOTE:
rho differs from self.sparse_parameter in that the latter returns the parameter in its sparse representation when self.is_sparse is true, and in its latent convex parameterization when self.convex_parameterization is true, while rho always returns the data-point-specific variances, embedded in a dense tensor. The dense representation is used to propagate gradients to the sparse rhos in the support.
- Returns:
A
batch_shape x self.dim-dim Tensor of robustness variances.
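To make the distinction between the sparse and the dense representation concrete, here is a minimal plain-Python sketch. It only illustrates the idea of embedding per-point variances into a dense vector of length dim; the helper `to_dense`, the example indices, and the additive combination with a homoskedastic base noise are assumptions for illustration, not BoTorch's implementation (in particular, the convex parameterization is not modeled).

```python
def to_dense(support, values, dim):
    """Embed sparse per-point variances into a dense vector of length `dim`."""
    if len(support) != len(values):
        raise ValueError("support and values must have the same length")
    dense = [0.0] * dim
    for idx, val in zip(support, values):
        dense[idx] = val
    return dense


# Assumption: points 1 and 4 out of 6 are currently treated as outliers.
rho = to_dense(support=[1, 4], values=[0.3, 2.0], dim=6)

# Assumption: the extra "robust" variance is added on top of a homoskedastic
# base noise level, giving one noise variance per training point.
base_noise = 1e-3
noise_diag = [base_noise + r for r in rho]
```

Points outside the support keep only the base noise, while the flagged points receive additional variance that lets the model discount them.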
- forward(X=None, shape=None, diag_K=None, **kwargs)[source]
Computes the covariance matrix of the sparse outlier noise model.
- Parameters:
X (Tensor | list[Tensor] | None) – The training inputs, used to determine if the model is applied to the training data, in which case the outlier variances are applied, or not. NOTE: By default, BoTorch passes the transformed training inputs to the likelihood during both training and inference.
shape (Size | None) – The shape of the covariance matrix, which is used to broadcast the rho values to the correct shape.
diag_K (Tensor | None) – The diagonal of the covariance matrix, which is used to scale the rho values in the convex parameterization.
kwargs (Any) – Any additional parameters of the base noise model, same as for GPyTorch’s noise model. Note that this implementation does not support non-kwarg
params arguments, which are used in GPyTorch’s noise models.
- Returns:
A
batch_shape x self.dim-dim Tensor of robustness variances.
- Return type:
LinearOperator | Tensor
- expansion_objective(mll)[source]
Computes an objective value for all the inactive parameters, i.e. self.sparse_parameter[~self.is_active] since we can’t add already active parameters to the support. This value will be used to select the parameters.
- Parameters:
mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize.
- Returns:
The expansion objective value for all the inactive parameters.
- Return type:
Tensor
Transforms
Outcome Transforms
Outcome transformations for automatically transforming and un-transforming
model outputs. Outcome transformations are typically part of a Model and
applied (i) within the model constructor to transform the train observations
to the model space, and (ii) in the Model.posterior call to untransform
the model posterior back to the original space.
References
D. Eriksson, M. Poloczek. Scalable Constrained Bayesian Optimization. International Conference on Artificial Intelligence and Statistics. PMLR, 2021, http://proceedings.mlr.press/v130/eriksson21a.html
- class botorch.models.transforms.outcome.OutcomeTransform(*args, **kwargs)[source]
Bases:
Module, ABC
Abstract base class for outcome transforms.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
args (Any)
kwargs (Any)
- abstractmethod forward(Y, Yvar=None, X=None)[source]
Transform the outcomes in a model’s training targets
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of training targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of training inputs (if applicable).
- Returns:
The transformed outcome observations.
The transformed observation noise (if applicable).
- Return type:
A two-tuple with the transformed outcomes
- subset_output(idcs)[source]
Subset the transform along the output dimension.
This functionality is used to properly treat outcome transformations in the
subset_model functionality.
- Parameters:
idcs (list[int]) – The output indices to subset the transform to.
- Returns:
The current outcome transform, subset to the specified output indices.
- Return type:
- untransform(Y, Yvar=None, X=None)[source]
Un-transform previously transformed outcomes
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of transformed training targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of transformed observation noises associated with the training targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of training inputs (if applicable).
- Returns:
The un-transformed outcome observations.
The un-transformed observation noise (if applicable).
- Return type:
A two-tuple with the un-transformed outcomes
- class botorch.models.transforms.outcome.ChainedOutcomeTransform(**transforms)[source]
Bases:
OutcomeTransform, ModuleDict
An outcome transform representing the chaining of individual transforms.
Chaining of outcome transforms.
- Parameters:
transforms (OutcomeTransform) – The transforms to chain. Internally, the names of the kwargs are used as the keys for accessing the individual transforms on the module.
- forward(Y, Yvar=None, X=None)[source]
Transform the outcomes in a model’s training targets
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of training targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of training inputs (if applicable).
- Returns:
The transformed outcome observations.
The transformed observation noise (if applicable).
- Return type:
A two-tuple with the transformed outcomes
- subset_output(idcs)[source]
Subset the transform along the output dimension.
- Parameters:
idcs (list[int]) – The output indices to subset the transform to.
- Returns:
The current outcome transform, subset to the specified output indices.
- Return type:
- untransform(Y, Yvar=None, X=None)[source]
Un-transform previously transformed outcomes
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of transformed training targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of transformed observation noises associated with the training targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of training inputs (if applicable).
- Returns:
The un-transformed outcome observations.
The un-transformed observation noise (if applicable).
- Return type:
A two-tuple with the un-transformed outcomes
- class botorch.models.transforms.outcome.Standardize(m, outputs=None, batch_shape=(), min_stdv=1e-08)[source]
Bases:
OutcomeTransform
Standardize outcomes (zero mean, unit variance).
This module is stateful: If in train mode, calling forward updates the module state (i.e. the mean/std normalizing constants). If in eval mode, calling forward simply applies the standardization using the current module state.
- Parameters:
m (int) – The output dimension.
outputs (list[int] | None) – Which of the outputs to standardize. If omitted, all outputs will be standardized.
batch_shape (torch.Size) – The batch_shape of the training targets.
min_stdv (float) – The minimum standard deviation for which to perform standardization (if lower, only de-mean the data).
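The stateful behavior and the role of min_stdv can be sketched in plain Python for a single output without batch dimensions. This is an illustrative approximation, not the library's code: the class name `StandardizeSketch` is hypothetical, the sample standard deviation is assumed, and noise variances are assumed to scale with the squared standard deviation.

```python
import statistics


class StandardizeSketch:
    """Stateful standardization of one output (plain lists, no batching)."""

    def __init__(self, min_stdv=1e-8):
        self.min_stdv = min_stdv
        self.training = True
        self.mean = 0.0
        self.stdv = 1.0

    def forward(self, Y, Yvar=None):
        if self.training:
            # Train mode: refresh the normalizing constants from the data.
            self.mean = statistics.fmean(Y)
            stdv = statistics.stdev(Y) if len(Y) > 1 else 0.0
            # Below the threshold, fall back to de-meaning only.
            self.stdv = stdv if stdv >= self.min_stdv else 1.0
        Y_tf = [(y - self.mean) / self.stdv for y in Y]
        Yvar_tf = None if Yvar is None else [v / self.stdv**2 for v in Yvar]
        return Y_tf, Yvar_tf

    def untransform(self, Y, Yvar=None):
        Y_utf = [self.mean + self.stdv * y for y in Y]
        Yvar_utf = None if Yvar is None else [v * self.stdv**2 for v in Yvar]
        return Y_utf, Yvar_utf


tf = StandardizeSketch()
Y_tf, Yvar_tf = tf.forward([1.0, 2.0, 3.0, 4.0], Yvar=[0.1] * 4)

# In eval mode the stored mean/std are reused instead of being recomputed.
tf.training = False
Y_new, _ = tf.forward([2.5])
Y_back, Yvar_back = tf.untransform(Y_tf, Yvar_tf)
```

Switching `training` off before transforming new data mirrors the train/eval distinction described above: the constants learned from the training targets stay fixed.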
- forward(Y, Yvar=None, X=None)[source]
Standardize outcomes.
If the module is in train mode, this updates the module state (i.e. the mean/std normalizing constants). If the module is in eval mode, simply applies the normalization using the module state.
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of training targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of training inputs (if applicable). This argument is not used by this transform, but it is used by its subclass, StratifiedStandardize.
- Returns:
The transformed outcome observations.
The transformed observation noise (if applicable).
- Return type:
A two-tuple with the transformed outcomes
- subset_output(idcs)[source]
Subset the transform along the output dimension.
- Parameters:
idcs (list[int]) – The output indices to subset the transform to.
- Returns:
The current outcome transform, subset to the specified output indices.
- Return type:
- untransform(Y, Yvar=None, X=None)[source]
Un-standardize outcomes.
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of standardized targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of standardized observation noises associated with the targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of inputs (if applicable). This argument is not used by this transform, but it is used by its subclass, StratifiedStandardize.
- Returns:
The un-standardized outcome observations.
The un-standardized observation noise (if applicable).
- Return type:
A two-tuple with the un-standardized outcomes
- untransform_posterior(posterior, X=None)[source]
Un-standardize the posterior.
- Parameters:
posterior (Posterior) – A posterior in the standardized space.
X (Tensor | None) – A
batch_shape x n x d-dim tensor of inputs (if applicable). This argument is not used by this transform, but it is used by its subclass, StratifiedStandardize.
- Returns:
The un-standardized posterior. If the input posterior is a
GPyTorchPosterior or GaussianMixturePosterior, return the same type with analytically rescaled distribution. Otherwise, return a TransformedPosterior.
- Return type:
- class botorch.models.transforms.outcome.StratifiedStandardize(stratification_idx, all_task_values, batch_shape=(), min_stdv=1e-08, dtype=torch.float64)[source]
Bases:
Standardize
Standardize outcomes (zero mean, unit variance) along stratification dimension.
This module is stateful: If in train mode, calling forward updates the module state (i.e. the mean/std normalizing constants). If in eval mode, calling forward simply applies the standardization using the current module state.
Standardize outcomes (zero mean, unit variance) along stratification dim.
Note: This currently only supports single output models (including multi-task models that have a single output).
- Parameters:
stratification_idx (int) – The index of the stratification dimension in the input tensor X.
all_task_values (Tensor) –
t-dim tensor of all possible task values that could appear in the dataset.
batch_shape (torch.Size) – The batch_shape of the training targets.
min_stdv (float) – The minimum standard deviation for which to perform standardization (if lower, only de-mean the data).
dtype (torch.dtype) – The data type for internal computations.
- forward(Y, Yvar=None, X=None)[source]
Standardize outcomes.
If the module is in train mode, this updates the module state (i.e. the mean/std normalizing constants). If the module is in eval mode, simply applies the normalization using the module state.
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of training targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of input parameters.
- Returns:
The transformed outcome observations.
The transformed observation noise (if applicable).
- Return type:
A two-tuple with the transformed outcomes
- subset_output(idcs)[source]
Subset the transform along the output dimension.
- Parameters:
idcs (list[int]) – The output indices to subset the transform to.
- Returns:
The current outcome transform, subset to the specified output indices.
- Return type:
- untransform(Y, Yvar=None, X=None)[source]
Un-standardize outcomes.
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of standardized targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of standardized observation noises associated with the targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of input parameters.
- Returns:
The un-standardized outcome observations.
The un-standardized observation noise (if applicable).
- Return type:
A two-tuple with the un-standardized outcomes
- untransform_posterior(posterior, X=None)[source]
Un-standardize the posterior.
- Parameters:
posterior (Posterior) – A posterior in the standardized space.
X (Tensor | None) – A
batch_shape x n x d-dim tensor of training inputs (if applicable).
- Returns:
The un-standardized posterior. If the input posterior is a
GPyTorchPosterior or GaussianMixturePosterior, return the same type with analytically rescaled distribution. Otherwise, return a TransformedPosterior.
- Return type:
- class botorch.models.transforms.outcome.Log(outputs=None)[source]
Bases:
OutcomeTransform
Log-transform outcomes.
Useful if the targets are modeled using a (multivariate) log-Normal distribution. This means that we can use a standard GP model on the log-transformed outcomes and un-transform the model posterior of that GP.
When observation noise is provided, the variance is transformed using the delta method approximation: Var[log(Y)] ≈ Var[Y] / Y^2. This assumes that the observation noise is Gaussian in the log-transformed space, which corresponds to log-normal observation noise in the original space.
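The delta-method approximation can be written out in a few lines of plain Python. This is a sketch of the stated formula only, with a hypothetical helper name `log_forward` and lists standing in for tensors; it is not the library's implementation.

```python
import math


def log_forward(Y, Yvar=None):
    """Log-transform outcomes; map noise variances with the delta method."""
    if any(y <= 0 for y in Y):
        raise ValueError("log transform requires strictly positive outcomes")
    Y_tf = [math.log(y) for y in Y]
    # Var[log(Y)] ~= Var[Y] / Y^2 (first-order Taylor expansion of log at Y).
    Yvar_tf = None if Yvar is None else [v / y**2 for v, y in zip(Yvar, Y)]
    return Y_tf, Yvar_tf


# The same absolute noise variance shrinks more in log space for larger outcomes.
Y_tf, Yvar_tf = log_forward([2.0, 10.0], Yvar=[0.04, 0.04])
```

Note that the approximation is first-order: it is most accurate when the noise standard deviation is small relative to the outcome value.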
- Parameters:
outputs (list[int] | None) – Which of the outputs to log-transform. If omitted, all outputs will be log-transformed.
- subset_output(idcs)[source]
Subset the transform along the output dimension.
- Parameters:
idcs (list[int]) – The output indices to subset the transform to.
- Returns:
The current outcome transform, subset to the specified output indices.
- Return type:
- forward(Y, Yvar=None, X=None)[source]
Log-transform outcomes.
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of training targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of training inputs (if applicable). This argument is not used by this transform.
- Returns:
The transformed outcome observations.
The transformed observation noise (if applicable).
- Return type:
A two-tuple with the transformed outcomes
- untransform(Y, Yvar=None, X=None)[source]
Un-transform log-transformed outcomes
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of log-transformed targets.
Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of log-transformed observation noises associated with the training targets (if applicable).
X (Tensor | None) – A
batch_shape x n x d-dim tensor of inputs (if applicable). This argument is not used by this transform.
- Returns:
The exponentiated outcome observations.
The exponentiated observation noise (if applicable).
- Return type:
A two-tuple with the un-transformed outcomes
- untransform_posterior(posterior, X=None)[source]
Un-transform the log-transformed posterior.
- Parameters:
posterior (Posterior) – A posterior in the log-transformed space.
X (Tensor | None) – A
batch_shape x n x d-dim tensor of inputs (if applicable). This argument is not used by this transform.
- Returns:
The un-transformed posterior.
- Return type:
- class botorch.models.transforms.outcome.Power(power, outputs=None)[source]
Bases:
OutcomeTransform
Power-transform outcomes.
Useful if the targets are modeled using a (multivariate) power transform of a Normal distribution. This means that we can use a standard GP model on the power-transformed outcomes and un-transform the model posterior of that GP.
- Parameters:
outputs (list[int] | None) – Which of the outputs to power-transform. If omitted, all outputs will be power-transformed.
power (float)
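As a rough illustration of a power transform and its inverse, the following plain-Python sketch raises outcomes to `power` and undoes this with the exponent `1 / power`. The helper names are hypothetical, only non-negative outcomes are considered, and the treatment of observation noise is deliberately left out; this is an assumption-laden sketch rather than the library's behavior.

```python
def power_forward(Y, power):
    """Raise each (non-negative) outcome to the given power."""
    if any(y < 0 for y in Y):
        raise ValueError("this sketch only handles non-negative outcomes")
    return [y**power for y in Y]


def power_untransform(Y_tf, power):
    """Invert the power transform by applying the exponent 1 / power."""
    return [y ** (1.0 / power) for y in Y_tf]


Y = [4.0, 9.0, 0.25]
Y_tf = power_forward(Y, power=0.5)  # square-root transform
Y_back = power_untransform(Y_tf, power=0.5)
```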
- subset_output(idcs)[source]
Subset the transform along the output dimension.
- Parameters:
idcs (list[int]) – The output indices to subset the transform to.
- Returns:
The current outcome transform, subset to the specified output indices.
- Return type:
- forward(Y, Yvar=None, X=None)[source]
Power-transform outcomes.
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of training targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of training inputs (if applicable). This argument is not used by this transform.
- Returns:
The transformed outcome observations.
The transformed observation noise (if applicable).
- Return type:
A two-tuple with the transformed outcomes
- untransform(Y, Yvar=None, X=None)[source]
Un-transform power-transformed outcomes
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of power-transformed targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of power-transformed observation noises associated with the training targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of inputs (if applicable). This argument is not used by this transform.
- Returns:
The un-power transformed outcome observations.
The un-power transformed observation noise (if applicable).
- Return type:
A two-tuple with the un-transformed outcomes
- untransform_posterior(posterior, X=None)[source]
Un-transform the power-transformed posterior.
- Parameters:
posterior (Posterior) – A posterior in the power-transformed space.
X (Tensor | None) – A
batch_shape x n x d-dim tensor of inputs (if applicable). This argument is not used by this transform.
- Returns:
The un-transformed posterior.
- Return type:
- class botorch.models.transforms.outcome.Bilog(outputs=None)[source]
Bases:
OutcomeTransform
Bilog-transform outcomes.
The Bilog transform [eriksson2021scalable] is useful for modeling outcome constraints as it magnifies values near zero and flattens extreme values.
- Parameters:
outputs (list[int] | None) – Which of the outputs to Bilog-transform. If omitted, all outputs will be transformed.
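The Bilog transform from [eriksson2021scalable] is commonly written as sign(y) * log(1 + |y|). A scalar plain-Python sketch of this formula and its inverse is shown below; the function names `bilog` and `inv_bilog` are hypothetical, and the sketch ignores observation noise.

```python
import math


def bilog(y):
    """sign(y) * log(1 + |y|), computed stably near zero via log1p."""
    return math.copysign(math.log1p(abs(y)), y)


def inv_bilog(z):
    """Inverse of bilog: sign(z) * (exp(|z|) - 1)."""
    return math.copysign(math.expm1(abs(z)), z)


# Near zero the map is close to the identity; extreme values are compressed.
near_zero = bilog(0.01)
extreme = bilog(1000.0)
round_trip = inv_bilog(bilog(-3.5))
```

For a constraint of the form c(x) <= 0, this keeps the sign of the outcome intact (so feasibility is unchanged) while preventing large violations from dominating the model fit.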
- subset_output(idcs)[source]
Subset the transform along the output dimension.
- Parameters:
idcs (list[int]) – The output indices to subset the transform to.
- Returns:
The current outcome transform, subset to the specified output indices.
- Return type:
- forward(Y, Yvar=None, X=None)[source]
Bilog-transform outcomes.
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of training targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of training inputs (if applicable). This argument is not used by this transform.
- Returns:
The transformed outcome observations.
The transformed observation noise (if applicable).
- Return type:
A two-tuple with the transformed outcomes
- untransform(Y, Yvar=None, X=None)[source]
Un-transform bilog-transformed outcomes
- Parameters:
Y (Tensor) – A
batch_shape x n x m-dim tensor of bilog-transformed targets.Yvar (Tensor | None) – A
batch_shape x n x m-dim tensor of bilog-transformed observation noises associated with the training targets (if applicable).X (Tensor | None) – A
batch_shape x n x d-dim tensor of inputs (if applicable). This argument is not used by this transform.
- Returns:
The un-transformed outcome observations.
The un-transformed observation noise (if applicable).
- Return type:
A two-tuple with the un-transformed outcomes
- untransform_posterior(posterior, X=None)[source]
Un-transform the bilog-transformed posterior.
- Parameters:
posterior (Posterior) – A posterior in the bilog-transformed space.
X (Tensor | None) – A
batch_shape x n x d-dim tensor of inputs (if applicable). This argument is not used by this transform.
- Returns:
The un-transformed posterior.
- Return type:
Input Transforms
Input Transformations.
These classes implement a variety of transformations for input parameters including: learned input warping functions, rounding functions, and log transformations. The input transformation is typically part of a Model and applied within the model.forward() method.
- class botorch.models.transforms.input.InputTransform(*args, **kwargs)[source]
Bases:
Module, ABC
Abstract base class for input transforms.
- Properties:
- is_one_to_many: A boolean denoting whether the transform produces
multiple values for each input.
- transform_on_train: A boolean indicating whether to apply the
transform in train() mode.
- transform_on_eval: A boolean indicating whether to apply the
transform in eval() mode.
- transform_on_fantasize: A boolean indicating whether to apply
the transform when called from within a
fantasize call.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
args (Any)
kwargs (Any)
- is_one_to_many: bool = False
- transform_on_eval: bool
- transform_on_train: bool
- transform_on_fantasize: bool
- forward(X)[source]
Transform the inputs to a model.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of inputs.- Returns:
A
batch_shape x n' x d-dim tensor of transformed inputs.- Return type:
Tensor
- abstractmethod transform(X)[source]
Transform the inputs to a model.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of inputs.- Returns:
A
batch_shape x n x d-dim tensor of transformed inputs.- Return type:
Tensor
- untransform(X)[source]
Un-transform the inputs to a model.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of transformed inputs.- Returns:
A
batch_shape x n x d-dim tensor of un-transformed inputs.- Return type:
Tensor
- equals(other)[source]
Check if another input transform is equivalent.
Note: A custom equals method is defined instead of an __eq__ method because defining __eq__ sets the __hash__ method to None, and PyTorch currently relies on hashing modules. See https://github.com/pytorch/pytorch/issues/7733.
- Parameters:
other (InputTransform) – Another input transform.
- Returns:
A boolean indicating if the other transform is equivalent.
- Return type:
bool
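The Python behavior behind this note can be reproduced without PyTorch. In the sketch below, `WithEq` and `WithEquals` are hypothetical stand-in classes (not BoTorch classes): defining `__eq__` without `__hash__` makes instances unhashable, while a separate `equals` method leaves the default identity-based hash intact.

```python
class WithEq:
    """Defines __eq__ only, so Python implicitly sets __hash__ to None."""

    def __init__(self, scale):
        self.scale = scale

    def __eq__(self, other):
        return isinstance(other, WithEq) and self.scale == other.scale


class WithEquals:
    """Keeps the default identity hash and compares via a separate method."""

    def __init__(self, scale):
        self.scale = scale

    def equals(self, other):
        return isinstance(other, WithEquals) and self.scale == other.scale


a, b = WithEquals(2.0), WithEquals(2.0)
registry = {a, b}  # works: both objects are hashable by identity

try:
    hash(WithEq(2.0))
    eq_class_hashable = True
except TypeError:
    eq_class_hashable = False
```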
- preprocess_transform(X)[source]
Apply transforms for preprocessing inputs.
The main use cases for this method are 1) to preprocess training data before calling
set_train_data and 2) preprocess X_baseline for noisy acquisition functions so that X_baseline is “preprocessed” with the same transformations as the cached training inputs.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of inputs.- Returns:
A
batch_shape x n x d-dim tensor of (transformed) inputs.- Return type:
Tensor
- class botorch.models.transforms.input.BatchBroadcastedInputTransform(transforms, broadcast_index=-3)[source]
Bases:
InputTransform, ModuleDict
An input transform representing a list of transforms to be broadcasted.
A transform list that is broadcasted across a batch dimension specified by
broadcast_index. This allows using a batched Gaussian process model when the input transforms are different for different batch dimensions.
- Parameters:
transforms (list[InputTransform]) – The transforms to broadcast across the first batch dimension. The transform at position i in the list will be applied to
X[i] for a given input tensor X in the forward pass.
broadcast_index (int) – The tensor index at which the transforms are broadcasted.
Example
>>> tf1 = Normalize(d=2)
>>> tf2 = InputStandardize(d=2)
>>> tf = BatchBroadcastedInputTransform(transforms=[tf1, tf2])
- transform(X)[source]
Transform the inputs to a model.
Individual transforms are applied in sequence and results are returned as a batched tensor.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of inputs.- Returns:
A
batch_shape x n x d-dim tensor of transformed inputs.- Return type:
Tensor
- untransform(X)[source]
Un-transform the inputs to a model.
Un-transforms of the individual transforms are applied in reverse sequence.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of transformed inputs.- Returns:
A
batch_shape x n x d-dim tensor of un-transformed inputs.- Return type:
Tensor
- equals(other)[source]
Check if another input transform is equivalent.
- Parameters:
other (InputTransform) – Another input transform.
- Returns:
A boolean indicating if the other transform is equivalent.
- Return type:
bool
- preprocess_transform(X)[source]
Apply transforms for preprocessing inputs.
The main use cases for this method are 1) to preprocess training data before calling
set_train_data and 2) preprocess X_baseline for noisy acquisition functions so that X_baseline is “preprocessed” with the same transformations as the cached training inputs.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of inputs.- Returns:
A
batch_shape x n x d-dim tensor of (transformed) inputs.- Return type:
Tensor
- class botorch.models.transforms.input.ChainedInputTransform(**transforms)[source]
Bases:
InputTransform, ModuleDict
An input transform representing the chaining of individual transforms.
Chaining of input transforms.
- Parameters:
transforms (InputTransform) – The transforms to chain. Internally, the names of the kwargs are used as the keys for accessing the individual transforms on the module.
Example
>>> tf1 = Normalize(d=2)
>>> tf2 = Normalize(d=2)
>>> tf = ChainedInputTransform(tf1=tf1, tf2=tf2)
>>> list(tf.keys())
['tf1', 'tf2']
>>> tf["tf1"]
Normalize()
- transform(X)[source]
Transform the inputs to a model.
Individual transforms are applied in sequence.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of inputs.- Returns:
A
batch_shape x n x d-dim tensor of transformed inputs.- Return type:
Tensor
- untransform(X)[source]
Un-transform the inputs to a model.
Un-transforms of the individual transforms are applied in reverse sequence.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of transformed inputs.- Returns:
A
batch_shape x n x d-dim tensor of un-transformed inputs.- Return type:
Tensor
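The ordering described for transform and untransform (forward in sequence, inverse in reverse sequence) can be sketched in plain Python. The classes `Shift`, `Scale`, and `ChainSketch` are hypothetical stand-ins operating on lists of floats, not BoTorch transforms.

```python
class Shift:
    def __init__(self, offset):
        self.offset = offset

    def transform(self, X):
        return [x + self.offset for x in X]

    def untransform(self, X):
        return [x - self.offset for x in X]


class Scale:
    def __init__(self, factor):
        self.factor = factor

    def transform(self, X):
        return [x * self.factor for x in X]

    def untransform(self, X):
        return [x / self.factor for x in X]


class ChainSketch:
    """Applies named transforms in keyword order; inverts them in reverse."""

    def __init__(self, **transforms):
        self.transforms = dict(transforms)  # keyword order is preserved

    def transform(self, X):
        for tf in self.transforms.values():
            X = tf.transform(X)
        return X

    def untransform(self, X):
        for tf in reversed(list(self.transforms.values())):
            X = tf.untransform(X)
        return X


chain = ChainSketch(shift=Shift(1.0), scale=Scale(2.0))
X_tf = chain.transform([1.0, 2.0])  # shift first, then scale
X_back = chain.untransform(X_tf)    # undo scale first, then undo shift
```

Reversing the order on the way back is what makes the round trip exact: undoing the shift before the scale would return different values.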
- equals(other)[source]
Check if another input transform is equivalent.
- Parameters:
other (InputTransform) – Another input transform.
- Returns:
A boolean indicating if the other transform is equivalent.
- Return type:
bool
- preprocess_transform(X)[source]
Apply transforms for preprocessing inputs.
The main use cases for this method are 1) to preprocess training data before calling
set_train_data and 2) preprocess X_baseline for noisy acquisition functions so that X_baseline is “preprocessed” with the same transformations as the cached training inputs.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of inputs.- Returns:
A
batch_shape x n x d-dim tensor of (transformed) inputs.- Return type:
Tensor
- class botorch.models.transforms.input.ReversibleInputTransform(*args, **kwargs)[source]
Bases:
InputTransform, ABC
An abstract class for a reversible input transform.
- Properties:
- reverse: A boolean indicating if the functionality of transform
and untransform methods should be swapped.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
args (Any)
kwargs (Any)
- reverse: bool
- transform(X)[source]
Transform the inputs.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of inputs.- Returns:
A
batch_shape x n x d-dim tensor of transformed inputs.- Return type:
Tensor
- untransform(X)[source]
Un-transform the inputs.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of inputs.- Returns:
A
batch_shape x n x d-dim tensor of un-transformed inputs.- Return type:
Tensor
- equals(other)[source]
Check if another input transform is equivalent.
- Parameters:
other (InputTransform) – Another input transform.
- Returns:
A boolean indicating if the other transform is equivalent.
- Return type:
bool
- class botorch.models.transforms.input.AffineInputTransform(d, coefficient, offset, indices=None, batch_shape=(), transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, reverse=False)[source]
Bases:
ReversibleInputTransformApply affine transformation to input:
output = (input - offset) / coefficient- Parameters:
d (int) – The dimension of the input space.
coefficient (Tensor) – Tensor of linear coefficients; its shape must be broadcastable with
(batch_shape x n x d)-dim input tensors.
offset (Tensor) – Tensor of offset coefficients; its shape must be broadcastable with
(batch_shape x n x d)-dim input tensors.
indices (list[int] | Tensor | None) – The indices of the inputs to transform. If omitted, take all dimensions of the inputs into account. Either a list of ints or a Tensor of type
torch.long.batch_shape (torch.Size) – The batch shape of the inputs (assuming input tensors of shape
batch_shape x n x d). If provided, perform individual transformation per batch, otherwise uses a single transformation.transform_on_train (bool) – A boolean indicating whether to apply the transform in train() mode. Default: True.
transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.
transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a
fantasizecall. Default: True.reverse (bool) – A boolean indicating whether the forward pass should untransform the inputs.
- property coefficient: Tensor
The tensor of linear coefficients.
- property offset: Tensor
The tensor of offset coefficients.
- property learn_coefficients: bool
- equals(other)[source]
Check if another input transform is equivalent.
- Parameters:
other (InputTransform) – Another input transform.
- Returns:
A boolean indicating if the other transform is equivalent.
- Return type:
bool
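As a quick check of the affine formula output = (input - offset) / coefficient, here is a minimal plain-Python sketch operating on a single d-dim point. The function name affine_transform is hypothetical; the reverse branch assumes that "untransform" means the algebraic inverse, input * coefficient + offset.

```python
def affine_transform(x, offset, coefficient, reverse=False):
    """Apply (x - offset) / coefficient elementwise, or its inverse."""
    if reverse:
        # inverse mapping: recover the original input
        return [xi * c + o for xi, o, c in zip(x, offset, coefficient)]
    return [(xi - o) / c for xi, o, c in zip(x, offset, coefficient)]


offset = [1.0, 2.0]
coefficient = [2.0, 4.0]
forward = affine_transform([3.0, 6.0], offset, coefficient)
restored = affine_transform(forward, offset, coefficient, reverse=True)
```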
- class botorch.models.transforms.input.Normalize(d, indices=None, bounds=None, batch_shape=(), transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, reverse=False, min_range=1e-08, learn_bounds=None, almost_zero=1e-12, center=0.5)[source]
Bases:
AffineInputTransform
Normalize the inputs to have unit range and be centered at 0.5 (by default).
If no explicit bounds are provided, this module is stateful: in train mode, calling forward updates the module state (i.e. the normalizing bounds); in eval mode, calling forward simply applies the normalization using the current module state.
Normalize the inputs to the unit cube.
- Parameters:
d (int) – The dimension of the input space.
indices (list[int] | Tensor | None) – The indices of the inputs to normalize. If omitted, take all dimensions of the inputs into account.
bounds (Tensor | None) – If provided, use these bounds to normalize the inputs. If omitted, learn the bounds in train mode.
batch_shape (torch.Size) – The batch shape of the inputs (assuming input tensors of shape
batch_shape x n x d). If provided, perform individual normalization per batch, otherwise uses a single normalization.transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.
transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.
transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a
fantasizecall. Default: True.reverse (bool) – A boolean indicating whether the forward pass should untransform the inputs.
min_range (float) – If the range of an input dimension is smaller than
min_range, that input dimension will not be normalized. This is equivalent to using bounds of[0, 1]for this dimension, and helps avoid division by zero errors and related numerical issues. See the example below. NOTE: This only applies iflearn_bounds=True.learn_bounds (bool | None) – Whether to learn the bounds in train mode. Defaults to False if bounds are provided, otherwise defaults to True.
almost_zero (float) – Threshold for determining if a range is effectively zero when learning bounds. Default: 1e-12.
center (float) – The center of the range for each parameter. Default: 0.5.
Example
>>> t = Normalize(d=2)
>>> t(torch.tensor([[3., 2.], [3., 6.]]))
... tensor([[3., 0.],
...         [3., 1.]])
>>> t.eval()
... Normalize()
>>> t(torch.tensor([[3.5, 2.8]]))
... tensor([[3.5, 0.2]])
>>> t.bounds
... tensor([[0., 2.],
...         [1., 6.]])
>>> t.coefficient
... tensor([[1., 4.]])
- property ranges
- property mins
- property bounds: Tensor
The bounds used for normalizing the inputs.
- property learn_bounds: bool
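The min_range behavior described for Normalize can be sketched in plain Python: a dimension whose observed range is below min_range keeps its raw values, which is the same as using bounds of [0, 1] for it. The helper names are hypothetical, and the sketch assumes the default center of 0.5 and a single (non-batched) n x d input; it is not BoTorch's implementation.

```python
def learn_affine_params(rows, min_range=1e-8):
    """Learn per-dimension offset and coefficient from training rows."""
    offsets, coefficients = [], []
    for column in zip(*rows):
        lo, hi = min(column), max(column)
        if hi - lo < min_range:
            # degenerate dimension: behave as if the bounds were [0, 1]
            offsets.append(0.0)
            coefficients.append(1.0)
        else:
            offsets.append(lo)
            coefficients.append(hi - lo)
    return offsets, coefficients


def normalize(rows, offsets, coefficients):
    """Apply (x - offset) / coefficient to every row."""
    return [
        [(x - o) / c for x, o, c in zip(row, offsets, coefficients)]
        for row in rows
    ]


train = [[3.0, 2.0], [3.0, 6.0]]
offsets, coefficients = learn_affine_params(train)
train_normalized = normalize(train, offsets, coefficients)
# "eval mode": reuse the learned parameters on a new point
eval_row = normalize([[3.5, 2.8]], offsets, coefficients)[0]
```

Here the first dimension is constant in the training data, so it passes through unchanged, while the second is mapped from [2, 6] to [0, 1].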
- class botorch.models.transforms.input.InputStandardize(d, indices=None, batch_shape=(), transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, reverse=False, min_std=1e-08)[source]
Bases:
AffineInputTransformStandardize inputs (zero mean, unit variance).
In train mode, calling
forwardupdates the module state (i.e. the mean/std normalizing constants). If in eval mode, callingforwardsimply applies the standardization using the current module state.Standardize inputs (zero mean, unit variance).
- Parameters:
d (int) – The dimension of the input space.
indices (list[int] | Tensor | None) – The indices of the inputs to standardize. If omitted, take all dimensions of the inputs into account.
batch_shape (torch.Size) – The batch shape of the inputs (assuming input tensors of shape
batch_shape x n x d). If provided, perform individual normalization per batch, otherwise uses a single normalization.transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True
transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True
reverse (bool) – A boolean indicating whether the forward pass should untransform the inputs.
min_std (float) – If the standard deviation of an input dimension is smaller than
min_std, that input dimension will not be standardized. This is equivalent to using a standard deviation of 1.0 and a mean of 0.0 for this dimension, and helps avoid division by zero errors and related numerical issues.transform_on_fantasize (bool)
- property stds
- property means
- class botorch.models.transforms.input.Round(integer_indices=None, categorical_features=None, transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, approximate=False, tau=0.001)[source]
Bases:
InputTransformA discretization transformation for discrete inputs.
If
approximate=False(the default), uses PyTorch’sround.If
approximate=True, a differentiable approximate rounding function is used, with a temperature parameter of tau. This method is a piecewise approximation of a rounding function where each piece is a hyperbolic tangent function.
For integers, this will typically be used in conjunction with normalization as follows:
In eval() mode (i.e. after training), the inputs passed in would typically be normalized to the unit cube (e.g. during candidate optimization). Then:
1. These are unnormalized back to the raw input space.
2. The integers are rounded.
3. All values are normalized to the unit cube.
In train() mode, the inputs can either (a) be normalized to the unit cube or (b) provided using their raw values. In the case of (a) transform_on_train should be set to True, so that the normalized inputs are unnormalized before rounding. In the case of (b) transform_on_train should be set to False, so that the raw inputs are rounded and then normalized to the unit cube.
By default, the straight through estimators are used for the gradients as proposed in [Daulton2022bopr]. This transformation supports differentiable approximate rounding (currently only for integers). The rounding function is approximated with a piece-wise function where each piece is a hyperbolic tangent function.
For categorical parameters, the input must be one-hot encoded.
Example
>>> bounds = torch.tensor([[0, 5], [0, 1], [0, 1]]).t()
>>> d = bounds.shape[-1]
>>> integer_indices = [0]
>>> categorical_features = {1: 2}
>>> unnormalize_tf = Normalize(
...     d=d,
...     bounds=bounds,
...     transform_on_eval=True,
...     transform_on_train=True,
...     reverse=True,
... )
>>> round_tf = Round(integer_indices, categorical_features)
>>> normalize_tf = Normalize(d=d, bounds=bounds)
>>> tf = ChainedInputTransform(
...     tf1=unnormalize_tf, tf2=round_tf, tf3=normalize_tf
... )
Initialize transform.
- Parameters:
integer_indices (list[int] | LongTensor | None) – The indices of the integer inputs.
categorical_features (dict[int, int] | None) – A dictionary mapping the starting index of each categorical feature to its cardinality. This assumes that categoricals are one-hot encoded.
transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.
transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.
transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a
fantasizecall. Default: True.approximate (bool) – A boolean indicating whether approximate or exact rounding should be used. Default: False.
tau (float) – The temperature parameter for approximate rounding.
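To make the role of tau concrete, the following plain-Python sketch shows one way to build a piecewise hyperbolic-tangent approximation of rounding. The function name and the exact form are assumptions made for illustration; the expression BoTorch uses internally may differ.

```python
import math


def approximate_round(x, tau=1e-3):
    """Smooth stand-in for round(x): one tanh piece per unit interval."""
    base = math.floor(x)
    # distance from the midpoint of [base, base + 1], scaled by temperature
    scaled = (x - base - 0.5) / tau
    # tanh maps to (-1, 1); rescale to (0, 1) and add to the lower integer
    return base + (math.tanh(scaled) + 1.0) / 2.0


near_three = approximate_round(2.7)
near_two = approximate_round(2.2)
midpoint = approximate_round(2.5)
```

A smaller tau makes each piece steeper, so the function is closer to exact rounding but has gradients concentrated in a narrower band around the midpoints.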
- transform(X)[source]
Discretize the inputs.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of inputs.- Returns:
A
batch_shape x n x d-dim tensor of discretized inputs.- Return type:
Tensor
- equals(other)[source]
Check if another input transform is equivalent.
- Parameters:
other (InputTransform) – Another input transform.
- Returns:
A boolean indicating if the other transform is equivalent.
- Return type:
bool
- class botorch.models.transforms.input.Log10(indices, transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, reverse=False)[source]
Bases:
ReversibleInputTransformA base-10 log transformation.
Initialize transform.
- Parameters:
indices (list[int]) – The indices of the inputs to log transform.
transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.
transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.
transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a
fantasizecall. Default: True.reverse (bool) – A boolean indicating whether the forward pass should untransform the inputs.
- class botorch.models.transforms.input.Warp(d, indices, transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, reverse=False, eps=1e-07, concentration1_prior=None, concentration0_prior=None, batch_shape=None, bounds=None)[source]
Bases:
ReversibleInputTransform,ModuleA transform that uses learned input warping functions.
Each specified input dimension is warped using the CDF of a Kumaraswamy distribution. Typically, MAP estimates of the parameters of the Kumaraswamy distribution, for each input dimension, are learned jointly with the GP hyperparameters.
TODO: implement support using independent warping functions for each output in batched multi-output and multi-task models.
For now, ModelListGPs should be used to learn independent warping functions for each output.
Initialize transform.
- Parameters:
indices (list[int]) – The indices of the inputs to warp.
transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.
transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.
transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a
fantasizecall. Default: True.reverse (bool) – A boolean indicating whether the forward pass should untransform the inputs.
eps (float) – A small value used to clip values to be in the interval (0, 1).
concentration1_prior (Prior | None) – A prior distribution on the concentration1 parameter of the Kumaraswamy distribution.
concentration0_prior (Prior | None) – A prior distribution on the concentration0 parameter of the Kumaraswamy distribution.
batch_shape (torch.Size | None) – An optional batch shape, for learning independent warping parameters for each batch of inputs. This should match the input batch shape of the model (i.e.,
train_X.shape[:-2]). NOTE: This is only supported for single-output models.bounds (Tensor | None) – A
2 x d-dim tensor of lower and upper bounds for the inputs.d (int)
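For intuition about the warping function, the Kumaraswamy CDF has the closed form F(x) = 1 - (1 - x^a)^b on (0, 1). The plain-Python sketch below evaluates it for a single input; it assumes that concentration1 plays the role of a and concentration0 the role of b, and it clamps the input to [eps, 1 - eps] as a simple stand-in for the clipping that eps controls. Neither detail is confirmed by the text above.

```python
def kumaraswamy_cdf(x, concentration1, concentration0, eps=1e-7):
    """Kumaraswamy CDF, 1 - (1 - x**a)**b, for x in the unit interval."""
    # keep x strictly inside (0, 1) so the powers stay well behaved
    x = min(max(x, eps), 1.0 - eps)
    return 1.0 - (1.0 - x ** concentration1) ** concentration0


identity_like = kumaraswamy_cdf(0.3, 1.0, 1.0)   # a = b = 1 leaves x unchanged
squashed = kumaraswamy_cdf(0.5, 2.0, 1.0)        # 1 - (1 - 0.25) = 0.25
```

With both concentrations equal to 1 the warp is the identity, which is why that setting is a natural starting point when the parameters are learned.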
- class botorch.models.transforms.input.AppendFeatures(feature_set=None, f=None, indices=None, fkwargs=None, skip_expand=False, transform_on_train=False, transform_on_eval=True, transform_on_fantasize=False)[source]
Bases:
InputTransformA transform that appends the input with a given set of features either provided beforehand or generated on the fly via a callable.
As an example, the predefined set of features can be used with RiskMeasureMCObjective to optimize risk measures as described in [Cakmak2020risk]. A tutorial notebook implementing the rhoKG acquisition function introduced in [Cakmak2020risk] can be found at https://botorch.org/docs/tutorials/risk_averse_bo_with_environmental_variables.
The steps for using this to obtain samples of a risk measure are as follows:
1. Train a model on (x, w) inputs and the corresponding observations;
2. Pass in an instance of AppendFeatures with the feature_set denoting the samples of W as the input_transform to the trained model;
3. Call posterior(...).rsample(...) on the model with x inputs only to get the joint posterior samples over (x, w) pairs, where the w values come from the feature_set;
4. Pass these posterior samples through the RiskMeasureMCObjective of choice to get the samples of the risk measure.
Note: The samples of the risk measure obtained this way are in general biased since the
feature_set does not fully represent the distribution of the environmental variable.
Possible examples for using a callable include statistical models that are built on PyTorch, built-in mathematical operations such as torch.sum, or custom scripted functions. This input transform thereby allows for advanced feature engineering and transfer learning models within the optimization loop.
Example
>>> # We consider 1D ``x`` and 1D ``w``, with ``W`` having a
>>> # uniform distribution over [0, 1]
>>> model = SingleTaskGP(
...     train_X=torch.rand(10, 2),
...     train_Y=torch.randn(10, 1),
...     input_transform=AppendFeatures(feature_set=torch.rand(10, 1))
... )
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> fit_gpytorch_mll(mll)
>>> test_x = torch.rand(3, 1)
>>> # ``posterior_samples`` is a ``10 x 30 x 1``-dim tensor
>>> posterior_samples = model.posterior(test_x).rsample(torch.Size([10]))
>>> risk_measure = VaR(alpha=0.8, n_w=10)
>>> # ``risk_measure_samples`` is a ``10 x 3``-dim tensor of samples of the
>>> # risk measure VaR
>>> risk_measure_samples = risk_measure(posterior_samples)
Append
feature_setto each input or generate a set of features to append on the fly via a callable.- Parameters:
feature_set (Tensor | None) – An
n_f x d_f-dim tensor denoting the features to be appended to the inputs. Default: None.f (Callable[[Tensor], Tensor] | None) – A callable mapping a
batch_shape x q x d-dim input tensorXto abatch_shape x q x n_f x d_f-dimensional output tensor. Default: None.indices (list[int] | None) – List of indices denoting the indices of the features to be passed into f. Per default all features are passed to
f. Default: None.fkwargs (dict[str, Any] | None) – Dictionary of keyword arguments passed to the callable
f. Default: None.skip_expand (bool) – A boolean indicating whether to expand the input tensor before appending features. This is intended for use with an
InputPerturbation. IfTrue, the input tensor will be expected to be of shapebatch_shape x (q * n_f) x d. Not implemented in combination with a callable.transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: False.
transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.
transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a
fantasizecall. Default: False.
- is_one_to_many: bool = True
- transform(X)[source]
Transform the inputs by appending
feature_setto each input or by generating a set of features to be appended on the fly via a callable.For each
1 x d-dim element in the input tensor, this will produce ann_f x (d + d_f)-dim tensor withfeature_setappended as the lastd_fdimensions. For a genericbatch_shape x q x d-dimX, this translates to abatch_shape x (q * n_f) x (d + d_f)-dim output, where the values corresponding toX[..., i, :]are found inoutput[..., i * n_f: (i + 1) * n_f, :].Note: Adding the
feature_seton theq-batchdimension is necessary to avoid introducing additional bias by evaluating the inputs on independent GP sample paths.- Parameters:
X (Tensor) – A
batch_shape x q x d-dim tensor of inputs. Ifself.skip_expandisTrue, thenXshould be of shapebatch_shape x (q * n_f) x d, typically obtained by passing abatch_shape x q x dshape input through anInputPerturbationwithn_fperturbation values.- Returns:
A
batch_shape x (q * n_f) x (d + d_f)-dim tensor of appended inputs.- Return type:
Tensor
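The output layout of AppendFeatures.transform (each input expanded into n_f consecutive rows, with feature_set appended as the last d_f columns) can be checked with a small plain-Python sketch on nested lists. The function name append_features is hypothetical and only the non-batched case with a fixed feature_set is covered.

```python
def append_features(X, feature_set):
    """Expand a q x d input into a (q * n_f) x (d + d_f) output."""
    out = []
    for x in X:                 # one block of n_f rows per input point
        for w in feature_set:   # the features fill the last d_f columns
            out.append(list(x) + list(w))
    return out


X = [[0.1], [0.2], [0.3]]            # q = 3, d = 1
feature_set = [[0.5], [0.9]]         # n_f = 2, d_f = 1
out = append_features(X, feature_set)
n_f = len(feature_set)
block_for_second_input = out[1 * n_f:(1 + 1) * n_f]
```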
- class botorch.models.transforms.input.InteractionFeatures(indices=None)[source]
Bases:
AppendFeaturesA transform that appends the first-order interaction terms $x_i * x_j, i < j$, for all or a subset of the input variables.
Initializes the InteractionFeatures transform.
- Parameters:
indices (list[int] | None) – Indices of the subset of dimensions to compute interaction features on.
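The first-order interaction terms x_i * x_j for i < j can be enumerated with itertools.combinations, as in the sketch below for a single input point. The function name is hypothetical, and the ordering of the terms shown here is an assumption rather than something the text above specifies.

```python
from itertools import combinations


def interaction_terms(x, indices=None):
    """Return the products x[i] * x[j] for all i < j among `indices`."""
    idx = list(range(len(x))) if indices is None else list(indices)
    return [x[i] * x[j] for i, j in combinations(idx, 2)]


all_pairs = interaction_terms([2.0, 3.0, 5.0])
subset_pairs = interaction_terms([2.0, 3.0, 5.0], indices=[0, 2])
```

For k selected dimensions this yields k * (k - 1) / 2 additional features.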
- class botorch.models.transforms.input.FilterFeatures(feature_indices, transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True)[source]
Bases:
InputTransformA transform that filters the input with a given set of features indices.
As an example, this can be used in a multiobjective optimization with
ModelListGPin which the specific models only share subsets of features (feature selection). A reason could be that it is known that specific features do not have any impact on a specific objective but they need to be included in the model for another one.Filter features from a model.
- Parameters:
feature_indices (Tensor) – A one-dim tensor denoting the indices of the features to be kept and fed to the model.
transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.
transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.
transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a
fantasizecall. Default: True.
- transform(X)[source]
Transform the inputs by keeping only the features whose indices are specified in feature_indices and filtering out the others.
- Parameters:
X (Tensor) – A batch_shape x q x d-dim tensor of inputs.
- Returns:
A batch_shape x q x e-dim tensor of filtered inputs, where e is the length of feature_indices.
- Return type:
Tensor
- equals(other)[source]
Check if another input transform is equivalent.
- Parameters:
other (InputTransform) – Another input transform.
- Returns:
A boolean indicating if the other transform is equivalent.
- Return type:
bool
- class botorch.models.transforms.input.InputPerturbation(perturbation_set, bounds=None, indices=None, multiplicative=False, transform_on_train=False, transform_on_eval=True, transform_on_fantasize=False)[source]
Bases:
InputTransformA transform that adds the set of perturbations to the given input.
Similar to
AppendFeatures, this can be used withRiskMeasureMCObjectiveto optimize risk measures. SeeAppendFeaturesfor additional discussion on optimizing risk measures.A tutorial notebook using this with
qNoisyExpectedImprovementcan be found at https://botorch.org/docs/tutorials/risk_averse_bo_with_input_perturbations.Add
perturbation_setto each input.- Parameters:
perturbation_set (Tensor | Callable[[Tensor], Tensor]) – An
n_p x d-dim tensor denoting the perturbations to be added to the inputs. Alternatively, this can be a callable that returnsbatch x n_p x d-dim tensor of perturbations for input of shapebatch x d. This is useful for heteroscedastic perturbations.bounds (Tensor | None) – A
2 x d-dim tensor of lower and upper bounds for each column of the input. If given, the perturbed inputs will be clamped to these bounds.indices (list[int] | None) – A list of indices specifying a subset of inputs on which to apply the transform. Note that
len(indices)should be equal to the second dimension ofperturbation_setandbounds. The dimensionality of the inputX.shape[-1]can be larger if we only transform a subset.multiplicative (bool) – A boolean indicating whether the input perturbations are additive or multiplicative. If True, inputs will be multiplied with the perturbations.
transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: False.
transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.
transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a
fantasizecall. Default: False.
- is_one_to_many: bool = True
- transform(X)[source]
Transform the inputs by adding
perturbation_setto each input.For each
1 x d-dim element in the input tensor, this will produce an n_p x d-dim tensor with the perturbation_set added to the input. For a generic batch_shape x q x d-dim X, this translates to a batch_shape x (q * n_p) x d-dim output, where the values corresponding to X[..., i, :] are found in output[..., i * n_p: (i + 1) * n_p, :].
Note: Adding the
perturbation_seton theq-batchdimension is necessary to avoid introducing additional bias by evaluating the inputs on independent GP sample paths.- Parameters:
X (Tensor) – A
batch_shape x q x d-dim tensor of inputs.- Returns:
A
batch_shape x (q * n_p) x d-dim tensor of perturbed inputs.- Return type:
Tensor
- property batch_shape
Returns a shape tuple such that
subset_transformpre-allocates a (b x n_p x n x d) - dim tensor, wherebis the batch shape of the inputXof the transform andn_pis the number of perturbations. NOTE: this function is dependent on calling_expanded_perturbations(X)becausen_pis inaccessible otherwise ifperturbation_setis a function.
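The expansion performed by InputPerturbation, including the optional clamping to bounds and the multiplicative mode, can be sketched in plain Python for a non-batched q x d input and a fixed perturbation_set. The function name perturb is hypothetical, and the indices option and callable perturbation sets are not covered.

```python
def perturb(X, perturbation_set, bounds=None, multiplicative=False):
    """Expand a q x d input into a (q * n_p) x d set of perturbed inputs."""
    out = []
    for x in X:
        for p in perturbation_set:
            if multiplicative:
                row = [xi * pi for xi, pi in zip(x, p)]
            else:
                row = [xi + pi for xi, pi in zip(x, p)]
            if bounds is not None:
                lower, upper = bounds
                # clamp every coordinate into [lower, upper]
                row = [min(max(v, lo), hi)
                       for v, lo, hi in zip(row, lower, upper)]
            out.append(row)
    return out


X = [[0.5], [0.9]]                      # q = 2, d = 1
perturbation_set = [[-0.2], [0.2]]      # n_p = 2
perturbed = perturb(X, perturbation_set, bounds=([0.0], [1.0]))
```

The last row shows the clamp at work: 0.9 + 0.2 exceeds the upper bound and is mapped back to 1.0.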
- class botorch.models.transforms.input.NumericToCategoricalEncoding(dim, categorical_features, encoders, transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True)[source]
Bases:
InputTransformTransform categorical parameters from an integer/numeric representation to a vector based representation like one-hot encoding or a descriptor encoding.
The vector encoding is inserted at the position of the categorical feature in the input tensor. This is demonstrated in the example below in which a categorical feature of cardinality 3 at position 1 in the original representation is one-hot encoded.
Example
>>> import torch
>>> from torch.nn.functional import one_hot
>>> from functools import partial
>>> from botorch.models.transforms.input import NumericToCategoricalEncoding
>>> tf = NumericToCategoricalEncoding(
...     dim=3,
...     categorical_features={1: 3},
...     encoders={1: partial(one_hot, num_classes=3)},
... )
>>> X = torch.tensor([[0.5, 2, 1.2], [1.1, 0, 0.8]])
>>> tf.transform(X)
tensor([[0.5000, 0.0000, 0.0000, 1.0000, 1.2000],
        [1.1000, 1.0000, 0.0000, 0.0000, 0.8000]])
Initialize.
- Parameters:
dim (int) – The dimension of the numerically encoded input.
categorical_features (dict[int, int]) – A dictionary mapping the index of each categorical feature to its cardinality which has to be greater than 1. This assumes that categoricals are integer encoded.
encoders (dict[int, Callable[[Tensor], Tensor]]) – A dictionary mapping the index of each categorical feature to a callable that encodes the categorical feature into a vector representation.
transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.
transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.
transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a fantasize call. Default: True.
- transform(X)[source]
Transform the categorical inputs into a vector representation.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of inputs.- Returns:
A
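batch_shape x n x d'-dim tensor in which the integer-encoded categoricals are transformed to a vector representation. With one-hot encoders, d' = d - len(categorical_features) + sum(categorical_features.values()), since each integer column is replaced by its encoding.
- Return type: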
Tensor
- equals(other)[source]
Check if another input transform is equivalent.
- Parameters:
other (InputTransform) – Another input transform.
- Returns:
A boolean indicating if the other transform is equivalent.
- Return type:
bool
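The in-place insertion of the encoding performed by NumericToCategoricalEncoding can be reproduced for the one-hot case with a short plain-Python sketch. The function name encode_row is hypothetical; it handles a single row and hard-codes one-hot encoding instead of accepting arbitrary encoder callables.

```python
def encode_row(row, categorical_features):
    """Replace each integer-coded categorical by its one-hot vector, in place."""
    out = []
    for i, value in enumerate(row):
        if i in categorical_features:
            one_hot = [0.0] * categorical_features[i]
            one_hot[int(value)] = 1.0
            out.extend(one_hot)   # inserted at the categorical's position
        else:
            out.append(value)
    return out


categorical_features = {1: 3}   # column 1 has cardinality 3
first = encode_row([0.5, 2, 1.2], categorical_features)
second = encode_row([1.1, 0, 0.8], categorical_features)
```

Each categorical column of cardinality c is replaced by c columns, so a 3-dim row with one cardinality-3 categorical becomes 5-dim.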
- class botorch.models.transforms.input.OneHotToNumeric(dim, categorical_features=None, transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True)[source]
Bases:
InputTransformTransform categorical parameters from a one-hot to a numeric representation.
Initialize.
- Parameters:
dim (int) – The dimension of the one-hot-encoded input.
categorical_features (dict[int, int] | None) – A dictionary mapping the starting index of each categorical feature to its cardinality. This assumes that categoricals are one-hot encoded.
transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.
transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.
transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a fantasize call. Default: True.
- transform(X)[source]
Transform the categorical inputs into integer representation.
- Parameters:
X (Tensor) – A
batch_shape x n x d-dim tensor of inputs.- Returns:
A
batch_shape x n x d'-dim tensor where the one-hot encoded categoricals are transformed to integer representation.
- Return type:
Tensor
- untransform(X)[source]
Transform the categoricals from integer representation to one-hot.
- Parameters:
X (Tensor) – A
batch_shape x n x d'-dim tensor of transformed inputs, where the categoricals are represented as integers.- Returns:
A
batch_shape x n x d-dim tensor of inputs, where the categoricals have been transformed to one-hot representation.- Return type:
Tensor
- equals(other)[source]
Check if another input transform is equivalent.
- Parameters:
other (InputTransform) – Another input transform.
- Returns:
A boolean indicating if the other transform is equivalent.
- Return type:
bool
- class botorch.models.transforms.input.LearnedFeatureImputation(feature_indices, d, task_feature_index=-1, target_task=None, bounds=None, transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, dtype=torch.float64, device=None)[source]
Bases:
InputTransform,ModuleAn input transform that learns imputation values for missing features in heterogeneous multi-task settings.
In multi-task problems where different tasks observe different subsets of features, this transform fills in the unobserved feature columns with learned parameter values. This enables using a standard
MultiTaskGPwith composable input transforms instead of specialized model classes.The input tensor
Xis expected to have shapebatch_shape x n x (d+1), where the column attask_feature_indexcontains the task identifier. For each task, feature columns not listed infeature_indices[task_value]are replaced with the corresponding learned imputation values. Task values need not be contiguous or 0-indexed.Initialize LearnedFeatureImputation.
- Parameters:
feature_indices (dict[int, list[int]]) – A mapping from integer task values (as they appear in the task column of
X) to lists of observed X-column indices for that task. Indices refer directly to columns of the input tensorXand must not include the task column. Whentask_feature_index=-1(the common case), thedfeature columns are0, 1, ..., d-1. Task values need not be contiguous or 0-indexed.d (int) – The total number of feature columns (excluding the task column).
task_feature_index (int) – The column index in
Xthat contains the task identifier. Must be-1(last column). Defaults to-1.target_task (int | None) – The task identifier to use when
Xhasdcolumns (no task column). Required for d-dim inputs since the task cannot be inferred from shape alone — two tasks may share the same number of active dimensions. Must be a key infeature_indices. IfNone, only(d+1)-dim inputs are supported.bounds (Tensor | None) – A
2 x dtensor of[lower, upper]bounds for each feature. If provided, imputation values are constrained to lie within these bounds via a GPyTorchIntervalconstraint. Defaults toNone(unconstrained). This transform is designed to operate on normalized inputs; if bounds differ from[0, 1]^d, a warning is emitted suggesting to chainNormalizebefore this transform.transform_on_train (bool) – If
True, apply the transform in train mode.transform_on_eval (bool) – If
True, apply the transform in eval mode.transform_on_fantasize (bool) – If
True, apply the transform insidefantasizecalls.dtype (torch.dtype) – The dtype for the imputation parameters.
device (torch.device | None) – The device for the imputation parameters.
- property imputation_values: Tensor
The imputation values reshaped to
(num_tasks, d+1), mapped through the Interval constraint when bounds are present.
- transform(X)[source]
Impute missing features with learned values.
- Parameters:
X (Tensor) – A
batch_shape x n x (d+1)-dim tensor of inputs where the last column contains integer task identifiers, or abatch_shape x n x d-dim tensor whentarget_taskwas configured at init (the task column is appended automatically).- Returns:
A
batch_shape x n x (d+1)-dim tensor with missing features replaced by learned imputation values.- Return type:
Tensor
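The imputation rule of LearnedFeatureImputation (feature columns not listed in feature_indices[task] are replaced, observed ones are kept) can be sketched in plain Python. The function name impute and the dictionary imputation_values are hypothetical: the values are fixed numbers here, whereas the transform learns them, and only the (d+1)-column case with the task in the last column is shown.

```python
def impute(X, feature_indices, imputation_values):
    """Fill unobserved feature columns per task; the task column is last."""
    out = []
    for row in X:
        features, task = row[:-1], int(row[-1])
        observed = set(feature_indices[task])
        filled = [
            value if j in observed else imputation_values[task][j]
            for j, value in enumerate(features)
        ]
        out.append(filled + [row[-1]])
    return out


# task values need not be contiguous or 0-indexed
feature_indices = {0: [0, 1], 7: [1]}
imputation_values = {0: [0.5, 0.5], 7: [0.25, 0.5]}   # stand-ins for learned values
X = [[0.1, 0.2, 0], [0.9, 0.8, 7]]
imputed = impute(X, feature_indices, imputation_values)
```

Task 0 observes both features, so its row is unchanged; task 7 observes only column 1, so column 0 is overwritten with that task's imputation value.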
Transform Factory Methods
- botorch.models.transforms.factory.get_rounding_input_transform(one_hot_bounds, integer_indices=None, categorical_features=None, initialization=False, return_numeric=False, approximate=False)[source]
Get a rounding input transform.
The rounding function will take inputs from the unit cube, unnormalize the integers to the raw search space, round the inputs, and normalize them back to the unit cube.
Categoricals are assumed to be one-hot encoded. Integers are currently assumed to be contiguous ranges (e.g. [1,2,3] and not [1,5,7]).
TODO: support non-contiguous sets of integers by modifying the rounding function.
- Parameters:
one_hot_bounds (Tensor) – The raw search space bounds where categoricals are encoded in one-hot representation and the integer parameters are not normalized.
integer_indices (list[int] | None) – The indices of the integer parameters.
categorical_features (dict[int, int] | None) – A dictionary mapping indices to cardinalities for the categorical features.
initialization (bool) – A boolean indicating whether this exact rounding function is for initialization. For initialization, the bounds are expanded such that the end point of a range is selected with the same probability that an interior point is selected, after rounding.
return_numeric (bool) – A boolean indicating whether to return numeric or one-hot encoded categoricals. Returning a numeric representation is helpful if the downstream code (e.g. kernel) expects a numeric representation of the categoricals.
approximate (bool) – A boolean indicating whether to use an approximate rounding function.
- Returns:
The rounding function, as a ChainedInputTransform.
- Return type:
ChainedInputTransform
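The unnormalize, round, normalize cycle described above can be sketched for a single integer parameter in plain Python. This is only an illustration of the idea; the helper name round_integer_dim and the clipping step are assumptions, not BoTorch code.

```python
def round_integer_dim(x_unit: float, lower: int, upper: int) -> float:
    """Round one unit-cube coordinate to an integer of [lower, upper] (hypothetical helper)."""
    raw = lower + x_unit * (upper - lower)        # unnormalize to the raw search space
    rounded = min(max(round(raw), lower), upper)  # round, then clip to the bounds
    return (rounded - lower) / (upper - lower)    # normalize back to the unit cube


# An integer parameter with the contiguous range [1, 2, 3].
values = [round_integer_dim(x, 1, 3) for x in (0.0, 0.2, 0.4, 0.8, 1.0)]
```

With plain rounding the end points 1 and 3 each attract only half as much of the unit interval as the interior point 2, which is the imbalance the initialization option is described as correcting.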
Transform Utilities
- botorch.models.transforms.utils.lognorm_to_norm(mu, Cov)[source]
Compute mean and covariance of a MVN from those of the associated log-MVN
If Y is log-normal with mean mu_ln and covariance Cov_ln, then X = log(Y) ~ N(mu_n, Cov_n) with
Cov_n_{ij} = log(1 + Cov_ln_{ij} / (mu_ln_{i} * mu_ln_{j}))
mu_n_{i} = log(mu_ln_{i}) - 0.5 * log(1 + Cov_ln_{ii} / mu_ln_{i}**2)
- Parameters:
mu (Tensor) – A batch_shape x n mean vector of the log-Normal distribution.
Cov (Tensor) – A batch_shape x n x n covariance matrix of the log-Normal distribution.
- Returns:
A two-tuple containing
- The batch_shape x n mean vector of the Normal distribution.
- The batch_shape x n x n covariance matrix of the Normal distribution.
- botorch.models.transforms.utils.norm_to_lognorm(mu, Cov)[source]
Compute mean and covariance of a log-MVN from its MVN sufficient statistics
If X ~ N(mu, Cov) and Y = exp(X), then Y is log-normal with
mu_ln_{i} = exp(mu_{i} + 0.5 * Cov_{ii})
Cov_ln_{ij} = exp(mu_{i} + mu_{j} + 0.5 * (Cov_{ii} + Cov_{jj})) * (exp(Cov_{ij}) - 1)
- Parameters:
mu (Tensor) – A batch_shape x n mean vector of the Normal distribution.
Cov (Tensor) – A batch_shape x n x n covariance matrix of the Normal distribution.
- Returns:
A two-tuple containing
- The batch_shape x n mean vector of the log-Normal distribution.
- The batch_shape x n x n covariance matrix of the log-Normal distribution.
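As a check on the two sets of moment formulas, the following sketch implements both directions on nested Python lists and verifies that they invert each other. It is a plain-Python illustration without batching, not the tensor implementation; the `_sketch` function names are hypothetical.

```python
import math


def norm_to_lognorm_sketch(mu, cov):
    """Moments of Y = exp(X) for X ~ N(mu, cov), given as nested lists."""
    n = len(mu)
    mu_ln = [math.exp(mu[i] + 0.5 * cov[i][i]) for i in range(n)]
    # exp(mu_i + mu_j + 0.5 * (cov_ii + cov_jj)) equals mu_ln_i * mu_ln_j.
    cov_ln = [
        [mu_ln[i] * mu_ln[j] * (math.exp(cov[i][j]) - 1.0) for j in range(n)]
        for i in range(n)
    ]
    return mu_ln, cov_ln


def lognorm_to_norm_sketch(mu_ln, cov_ln):
    """Moments of X = log(Y) from the mean and covariance of log-normal Y."""
    n = len(mu_ln)
    cov = [
        [math.log(1.0 + cov_ln[i][j] / (mu_ln[i] * mu_ln[j])) for j in range(n)]
        for i in range(n)
    ]
    # log(1 + cov_ln_ii / mu_ln_i**2) is exactly cov_ii.
    mu = [math.log(mu_ln[i]) - 0.5 * cov[i][i] for i in range(n)]
    return mu, cov


mu = [0.1, -0.3]
cov = [[0.5, 0.2], [0.2, 0.4]]
mu_back, cov_back = lognorm_to_norm_sketch(*norm_to_lognorm_sketch(mu, cov))
```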
- botorch.models.transforms.utils.norm_to_lognorm_mean(mu, var)[source]
Compute mean of a log-MVN from its MVN marginals
- Parameters:
mu (Tensor) – A batch_shape x n mean vector of the Normal distribution.
var (Tensor) – A batch_shape x n variance vector of the Normal distribution.
- Returns:
The batch_shape x n mean vector of the log-Normal distribution.
- Return type:
Tensor
- botorch.models.transforms.utils.norm_to_lognorm_variance(mu, var)[source]
Compute variance of a log-MVN from its MVN marginals
- Parameters:
mu (Tensor) – A batch_shape x n mean vector of the Normal distribution.
var (Tensor) – A batch_shape x n variance vector of the Normal distribution.
- Returns:
The batch_shape x n variance vector of the log-Normal distribution.
- Return type:
Tensor
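The marginal versions follow from the diagonal of the norm_to_lognorm formulas: the mean is exp(mu + 0.5 * var) and the variance is exp(2 * mu + var) * (exp(var) - 1). A minimal sketch on plain Python lists, with hypothetical function names:

```python
import math


def lognorm_mean_sketch(mu, var):
    """Elementwise mean of exp(X) for X ~ N(mu, var)."""
    return [math.exp(m + 0.5 * v) for m, v in zip(mu, var)]


def lognorm_variance_sketch(mu, var):
    """Elementwise variance of exp(X) for X ~ N(mu, var)."""
    return [math.exp(2.0 * m + v) * (math.exp(v) - 1.0) for m, v in zip(mu, var)]


means = lognorm_mean_sketch([0.0, 0.0], [0.0, 2.0])
variances = lognorm_variance_sketch([0.0, 0.0], [0.0, 2.0])
```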
- botorch.models.transforms.utils.expand_and_copy_tensor(X, batch_shape)[source]
Expand and copy X according to batch_shape.
- Parameters:
X (Tensor) – An input_batch_shape x n x d-dim tensor of inputs.
batch_shape (Size) – The new batch shape.
- Returns:
A new_batch_shape x n x d-dim tensor of inputs, where new_batch_shape is input_batch_shape broadcast against batch_shape.
- Return type:
Tensor
- botorch.models.transforms.utils.subset_transform(transform)[source]
Decorator of an input transform function to separate out indexing logic.
- botorch.models.transforms.utils.interaction_features(X)[source]
Computes the interaction features between the inputs.
- Parameters:
X (Tensor) – A batch_shape x q x d-dim tensor of inputs.
indices – The input dimensions to generate interaction features for.
- Returns:
An n x q x 1 x (d * (d-1) / 2)-dim tensor of interaction features.
- Return type:
Tensor
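The d * (d-1) / 2 trailing dimension corresponds to one feature per unordered pair of input dimensions. The sketch below assumes each interaction feature is the product of the two inputs in the pair; the entry above does not spell this out, so treat that as an assumption, and the function name as hypothetical.

```python
from itertools import combinations


def pairwise_interactions(x):
    """Products x[i] * x[j] for all i < j of a single d-dim input (assumed definition)."""
    return [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]


features = pairwise_interactions([2.0, 3.0, 5.0])  # d = 3 gives 3 features
```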
- botorch.models.transforms.utils.nanstd(X, dim, keepdim=False)[source]
Computes the standard deviation of the input, ignoring NaNs.
- Parameters:
X (Tensor) – A batch_shape x n x d-dim tensor of inputs.
dim (int) – The dimension along which to compute the standard deviation.
keepdim (bool) – If True, the dimension along which the standard deviation is computed is kept.
- Return type:
Tensor
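A one-dimensional sketch of a NaN-ignoring standard deviation in plain Python. It assumes the sample (n - 1) normalization that torch.std uses by default; whether nanstd applies the same correction is not stated above, so this is an assumption.

```python
import math


def nanstd_1d(values):
    """Sample standard deviation of the non-NaN entries (assumes n - 1 normalization)."""
    kept = [v for v in values if not math.isnan(v)]
    n = len(kept)
    mean = sum(kept) / n
    return math.sqrt(sum((v - mean) ** 2 for v in kept) / (n - 1))


result = nanstd_1d([1.0, float("nan"), 3.0])
```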
- botorch.models.transforms.utils.kumaraswamy_warp(X, c0, c1, eps=1e-08)[source]
Warp inputs through a Kumaraswamy CDF.
This assumes that X is contained within the unit cube. It first normalizes inputs to [eps, 1-eps]^d (to ensure that no values are 0 or 1) and then passes those inputs through a Kumaraswamy CDF.
- Parameters:
X (Tensor) – A batch_shape x n x d-dim tensor of inputs.
c0 (Tensor) – A d-dim tensor of the concentration0 parameter for the Kumaraswamy distribution.
c1 (Tensor) – A d-dim tensor of the concentration1 parameter for the Kumaraswamy distribution.
eps (float) – A small value that is used to ensure inputs are not 0 or 1.
- Returns:
A batch_shape x n x d-dim tensor of warped inputs.
- Return type:
Tensor
- botorch.models.transforms.utils.inv_kumaraswamy_warp(X, c0, c1, eps=1e-08)[source]
Map warped inputs through an inverse Kumaraswamy CDF.
This takes warped inputs (X) and transforms those via an inverse Kumaraswamy CDF. This then unnormalizes the inputs using bounds of [eps, 1-eps]^d and ensures that the values are within [0, 1]^d.
- Parameters:
X (Tensor) – A batch_shape x n x d-dim tensor of inputs.
c0 (Tensor) – A d-dim tensor of the concentration0 parameter for the Kumaraswamy distribution.
c1 (Tensor) – A d-dim tensor of the concentration1 parameter for the Kumaraswamy distribution.
eps (float) – A small value that is used to ensure inputs are not 0 or 1.
- Returns:
A batch_shape x n x d-dim tensor of untransformed inputs.
- Return type:
Tensor
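A scalar sketch of the warp and its inverse, following the steps described above. It assumes the usual Kumaraswamy CDF F(x) = 1 - (1 - x^a)^b with a = concentration1 and b = concentration0 (the torch.distributions.Kumaraswamy convention), and an affine map of [0, 1] onto [eps, 1 - eps]; the `_scalar` function names are hypothetical.

```python
def kumaraswamy_warp_scalar(x, c0, c1, eps=1e-8):
    """Warp x in [0, 1] through a Kumaraswamy CDF (scalar sketch)."""
    x_norm = eps + x * (1.0 - 2.0 * eps)     # map [0, 1] onto [eps, 1 - eps]
    return 1.0 - (1.0 - x_norm ** c1) ** c0  # assumed CDF parameterization


def inv_kumaraswamy_warp_scalar(y, c0, c1, eps=1e-8):
    """Invert the warp and clamp the result to [0, 1]."""
    x_norm = (1.0 - (1.0 - y) ** (1.0 / c0)) ** (1.0 / c1)
    x = (x_norm - eps) / (1.0 - 2.0 * eps)   # undo the [eps, 1 - eps] scaling
    return min(max(x, 0.0), 1.0)


inputs = [0.1, 0.5, 0.9]
round_trip = [
    inv_kumaraswamy_warp_scalar(kumaraswamy_warp_scalar(x, 2.0, 0.5), 2.0, 0.5)
    for x in inputs
]
```

With c0 = c1 = 1 the CDF is the identity, so the warp leaves inputs essentially unchanged.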
Utilities
GPyTorch Module Constructors
Pre-packaged kernels for Bayesian optimization, including a Scale/Matern kernel that is well-suited to low-dimensional high-noise problems, and a dimension-agnostic RBF kernel without outputscale.
References:
- botorch.models.utils.gpytorch_modules.get_matern_kernel_with_gamma_prior(ard_num_dims, batch_shape=None)[source]
Constructs the Scale-Matern kernel that is used by default by several models. This uses a Gamma(3.0, 6.0) prior for the lengthscale and a Gamma(2.0, 0.15) prior for the output scale.
- Parameters:
ard_num_dims (int)
batch_shape (Size | None)
- Return type:
ScaleKernel
- botorch.models.utils.gpytorch_modules.get_gaussian_likelihood_with_gamma_prior(batch_shape=None)[source]
Constructs the GaussianLikelihood that is used by default by several models. This uses a Gamma(1.1, 0.05) prior and constrains the noise level to be greater than MIN_INFERRED_NOISE_LEVEL (=1e-4).
- Parameters:
batch_shape (Size | None)
- Return type:
GaussianLikelihood
- botorch.models.utils.gpytorch_modules.get_gaussian_likelihood_with_lognormal_prior(batch_shape=None)[source]
Return Gaussian likelihood with a LogNormal(-4.0, 1.0) prior. This prior is based on [Hvarfner2024vanilla].
- Parameters:
batch_shape (Size | None) – Batch shape for the likelihood.
- Returns:
A GaussianLikelihood with a LogNormal(-4.0, 1.0) prior whose noise level is constrained to be greater than MIN_INFERRED_NOISE_LEVEL (=1e-4).
- Return type:
GaussianLikelihood
- botorch.models.utils.gpytorch_modules.get_covar_module_with_dim_scaled_prior(ard_num_dims, batch_shape=None, use_rbf_kernel=True, active_dims=None)[source]
Returns an RBF or Matern kernel with priors from [Hvarfner2024vanilla].
- Parameters:
ard_num_dims (int) – Number of feature dimensions for ARD.
batch_shape (Size | None) – Batch shape for the covariance module.
use_rbf_kernel (bool) – Whether to use an RBF kernel. If False, uses a Matern kernel.
active_dims (Sequence[int] | None) – The set of input dimensions to compute the covariances on. By default, the covariance is computed using the full input tensor. Set this if you’d like to ignore certain dimensions.
- Returns:
A Kernel constructed according to the given arguments. The prior is constrained to have lengthscales larger than 0.025 for numerical stability.
- Return type:
MaternKernel | RBFKernel
Inducing Point Allocators
Functionality for allocating the inducing points of sparse Gaussian process models.
References
Laming Chen and Guoxin Zhang and Hanning Zhou, Fast greedy MAP inference for determinantal point process to improve recommendation diversity, Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018, https://arxiv.org/abs/1709.05135.
- class botorch.models.utils.inducing_point_allocators.InducingPointAllocator[source]
Bases: ABC
This class provides functionality to initialize the inducing point locations of an inducing point-based model, e.g. a SingleTaskVariationalGP.
- allocate_inducing_points(inputs, covar_module, num_inducing, input_batch_shape)[source]
Initialize the num_inducing inducing point locations according to a specific initialization strategy.
- Parameters:
inputs (Tensor) – A (*batch_shape, n, d)-dim input data tensor.
covar_module (Module) – GPyTorch Module returning a LinearOperator kernel matrix.
num_inducing (int) – The maximum number (m) of inducing points (m <= n).
input_batch_shape (Size) – The non-task-related batch shape.
- Returns:
A (*batch_shape, m, d)-dim tensor of inducing point locations.
- Return type:
Tensor
- class botorch.models.utils.inducing_point_allocators.QualityFunction[source]
Bases: ABC
A function that scores inputs with respect to a specific criterion.
- class botorch.models.utils.inducing_point_allocators.UnitQualityFunction[source]
Bases: QualityFunction
A function returning ones for each element. Using this quality function for inducing point allocation corresponds to allocating inducing points with the sole aim of minimizing predictive variance, i.e. the approach of [burt2020svgp].
- class botorch.models.utils.inducing_point_allocators.ExpectedImprovementQualityFunction(model, maximize)[source]
Bases: QualityFunction
A function measuring the quality of input points as their expected improvement with respect to a conservative baseline. Expectations are according to the model from the previous BO step. See [moss2023ipa] for details and justification.
- Parameters:
model (Model) – The model fitted during the previous BO step. For now, this must be a single task model (i.e. num_outputs=1).
maximize (bool) – Set True if we are performing function maximization, else set False.
- class botorch.models.utils.inducing_point_allocators.GreedyVarianceReduction[source]
Bases: InducingPointAllocator
The inducing point allocator proposed by [burt2020svgp], which greedily chooses inducing point locations with maximal (conditional) predictive variance.
- class botorch.models.utils.inducing_point_allocators.GreedyImprovementReduction(model, maximize)[source]
Bases: InducingPointAllocator
An inducing point allocator that greedily chooses inducing points that have large predictive variance and lie in promising regions of the search space (according to the model from the previous BO step), see [moss2023ipa].
- Parameters:
model (Model) – The model fitted during the previous BO step.
maximize (bool) – Set True if we are performing function maximization, else set False.
- botorch.models.utils.inducing_point_allocators._pivoted_cholesky_init(train_inputs, kernel_matrix, max_length, quality_scores, epsilon=1e-06)[source]
A pivoted Cholesky initialization method for the inducing points, originally proposed in [burt2020svgp] with the algorithm itself coming from [chen2018dpp]. Code is a PyTorch version from [chen2018dpp], based on https://github.com/laming-chen/fast-map-dpp/blob/master/dpp.py but with a small modification to allow the underlying DPP to be defined through its diversity-quality decomposition, as discussed by [moss2023ipa]. This method returns a greedy approximation of the MAP estimate of the specified DPP, i.e. it returns a set of points that are highly diverse (according to the provided kernel_matrix) and have high quality (according to the provided quality_scores).
- Parameters:
train_inputs (Tensor) – training inputs (of shape n x d)
kernel_matrix (Tensor | LinearOperator) – kernel matrix on the training inputs
max_length (int) – number of inducing points to initialize
quality_scores (Tensor) – scores representing the quality of each candidate input (of shape [n])
epsilon (float) – numerical jitter for stability.
- Returns:
max_length x d tensor of the training inputs corresponding to the top max_length pivots of the training kernel matrix
- Return type:
Tensor
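The greedy selection can be sketched in plain Python on nested lists. This is a simplified illustration of the greedy DPP MAP scheme of [chen2018dpp] with a quality-diversity kernel L_ij = q_i * K_ij * q_j; it returns the selected indices rather than the inputs, and it is not the BoTorch implementation. The function name is hypothetical.

```python
import math


def greedy_pivoted_cholesky(kernel, quality, max_length, epsilon=1e-6):
    """Greedily pick up to max_length indices that are diverse and of high quality."""
    n = len(kernel)
    # Quality-diversity decomposition of the DPP kernel.
    L = [[quality[i] * kernel[i][j] * quality[j] for j in range(n)] for i in range(n)]
    d2 = [L[i][i] for i in range(n)]  # remaining conditional variances
    rows = []                         # rows of the partial Cholesky factor
    j = max(range(n), key=d2.__getitem__)
    selected = [j]
    while len(selected) < max_length:
        pivot = math.sqrt(d2[j])
        # New factor row: L[j, :] minus the part explained by earlier rows.
        e = [(L[j][i] - sum(r[j] * r[i] for r in rows)) / pivot for i in range(n)]
        rows.append(e)
        d2 = [d2[i] - e[i] ** 2 for i in range(n)]
        for s in selected:
            d2[s] = -math.inf         # never pick the same index twice
        j = max(range(n), key=d2.__getitem__)
        if d2[j] < epsilon:           # nothing informative is left
            break
        selected.append(j)
    return selected


# Points 0 and 1 are nearly identical; point 2 is unrelated to both.
K = [[1.0, 0.99, 0.0], [0.99, 1.0, 0.0], [0.0, 0.0, 1.0]]
diverse = greedy_pivoted_cholesky(K, [1.0, 1.0, 1.0], max_length=2)
quality_led = greedy_pivoted_cholesky(K, [1.0, 3.0, 1.0], max_length=2)
```

With unit quality scores the near-duplicate of the first pivot is skipped; raising the quality of point 1 makes it the first pivot instead.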
Priors
- class botorch.models.utils.priors.BetaPrior(concentration1, concentration0, validate_args=False, transform=None)[source]
Bases: Prior, Beta
Beta Prior parameterized by concentration1 (alpha) and concentration0 (beta).
pdf(x) = x^(alpha - 1) * (1 - x)^(beta - 1) / B(alpha, beta)
where alpha > 0 and beta > 0 are the concentration parameters. Supported on [0, 1], useful as a prior on correlation parameters.
Initialize BetaPrior.
- Parameters:
concentration1 (float) – Alpha (first concentration) parameter.
concentration0 (float) – Beta (second concentration) parameter.
validate_args (bool) – Whether to validate input arguments.
transform (Optional[Callable[[Tensor], Tensor]]) – Optional transform to apply before computing log_prob.
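For reference, the density above can be evaluated in log space with only the standard library. This scalar sketch is independent of the BetaPrior class and exists only to make the formula concrete; the function name is hypothetical.

```python
import math


def beta_log_pdf(x, alpha, beta):
    """log of x^(alpha - 1) * (1 - x)^(beta - 1) / B(alpha, beta) for 0 < x < 1."""
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x) - log_norm


density = math.exp(beta_log_pdf(0.5, 2.0, 2.0))  # B(2, 2) = 1/6, so pdf = 0.25 * 6
```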
- expand(batch_shape)[source]
Returns a new distribution instance (or populates an existing instance provided by a derived class) with batch dimensions expanded to batch_shape. This method calls expand on the distribution’s parameters. As such, this does not allocate new memory for the expanded distribution instance. Additionally, this does not repeat any args checking or parameter broadcasting in __init__.py, when an instance is first created.
- Parameters:
batch_shape (torch.Size) – the desired expanded size.
_instance – new instance provided by subclasses that need to override .expand.
- Returns:
New distribution instance with batch dimensions expanded to batch_shape.
- Return type:
Other Utilities
Assorted helper methods and objects for working with BoTorch models.
- botorch.models.utils.assorted.multioutput_to_batch_mode_transform(train_X, train_Y, num_outputs, train_Yvar=None)[source]
Transforms training inputs for a multi-output model.
Used for multi-output models that internally are represented by a batched single output model, where each output is modeled as an independent batch.
- Parameters:
train_X (Tensor) – An n x d or input_batch_shape x n x d (batch mode) tensor of training features.
train_Y (Tensor) – An n x m or target_batch_shape x n x m (batch mode) tensor of training observations.
num_outputs (int) – number of outputs
train_Yvar (Tensor | None) – An n x m or target_batch_shape x n x m tensor of observed measurement noise.
- Returns:
3-element tuple containing
- An input_batch_shape x m x n x d tensor of training features.
- A target_batch_shape x m x n tensor of training observations.
- A target_batch_shape x m x n tensor of observed measurement noise.
- Return type:
tuple[Tensor, Tensor, Tensor | None]
- botorch.models.utils.assorted.add_output_dim(X, original_batch_shape)[source]
Insert the output dimension at the correct location.
The trailing batch dimensions of X must match the original batch dimensions of the training inputs, but can also include extra batch dimensions.
- Parameters:
X (Tensor) – A (new_batch_shape) x (original_batch_shape) x n x d tensor of features.
original_batch_shape (Size) – the batch shape of the model’s training inputs.
- Returns:
2-element tuple containing
- A (new_batch_shape) x (original_batch_shape) x m x n x d tensor of features.
- The index corresponding to the output dimension.
- Return type:
tuple[Tensor, int]
- botorch.models.utils.assorted.check_no_nans(Z)[source]
Check that tensor does not contain NaN values.
Raises an InputDataError if Z contains NaN values.
- Parameters:
Z (Tensor) – The input tensor.
- Return type:
None
- botorch.models.utils.assorted.check_min_max_scaling(X, strict=False, atol=0.01, raise_on_fail=False, ignore_dims=None)[source]
Check that tensor is normalized to the unit cube.
- Parameters:
X (Tensor) – A batch_shape x n x d input tensor. Typically the training inputs of a model.
strict (bool) – If True, require X to be scaled to the unit cube (rather than just to be contained within the unit cube).
atol (float) – The tolerance for the boundary check. Only used if strict=True.
raise_on_fail (bool) – If True, raise an exception instead of a warning.
ignore_dims (list[int] | None) – Subset of dimensions where the min-max scaling check is omitted.
- Return type:
None
- botorch.models.utils.assorted.check_standardization(Y, atol_mean=0.01, atol_std=0.01, raise_on_fail=False)[source]
Check that tensor is standardized (zero mean, unit variance).
- Parameters:
Y (Tensor) – The input tensor of shape batch_shape x n x m. Typically the train targets of a model. Standardization is checked across the n-dimension.
atol_mean (float) – The tolerance for the mean check.
atol_std (float) – The tolerance for the std check.
raise_on_fail (bool) – If True, raise an exception instead of a warning.
- Return type:
None
- botorch.models.utils.assorted.validate_input_scaling(train_X, train_Y, train_Yvar=None, raise_on_fail=False, ignore_X_dims=None, check_nans_only=False)[source]
Helper function to validate input data to models.
- Parameters:
train_X (Tensor) – An n x d or batch_shape x n x d (batch mode) tensor of training features.
train_Y (Tensor) – An n x m or batch_shape x n x m (batch mode) tensor of training observations.
train_Yvar (Tensor | None) – An n x m or batch_shape x n x m (batch mode) tensor of observed measurement noise.
raise_on_fail (bool) – If True, raise an error instead of emitting a warning (only for normalization/standardization checks, an error is always raised if NaN values are present).
ignore_X_dims (list[int] | None) – For this subset of dimensions from {1, ..., d}, ignore the min-max scaling check.
check_nans_only (bool) – If True, only check for NaN values. Skips min-max scaling and standardization checks. This is used when the model is provided with a custom covariance module, to avoid potentially irrelevant warnings.
- Return type:
None
This function is typically called inside the constructor of standard BoTorch models. It validates the following:
(i) none of the inputs contain NaN values,
(ii) the training data (train_X) is normalized to the unit cube for all dimensions except those in ignore_X_dims,
(iii) the training targets (train_Y) are standardized (zero mean, unit var).
No checks (other than the NaN check) are performed for observed variances (train_Yvar) at this point.
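The three checks can be sketched for a single-output, non-batch data set in plain Python. The function name, the use of ValueError in place of BoTorch's InputDataError, the warning messages, and the sample (n - 1) standard deviation are all assumptions made to keep the example self-contained.

```python
import math
import warnings


def validate_scaling_sketch(train_X, train_Y, atol=1e-2):
    """train_X: n rows of d features; train_Y: n scalar targets."""
    x_values = [v for row in train_X for v in row]
    # (i) NaNs are always an error.
    if any(math.isnan(v) for v in x_values + list(train_Y)):
        raise ValueError("Input data contains NaN values.")
    # (ii) Features should lie in the unit cube.
    if min(x_values) < 0.0 or max(x_values) > 1.0:
        warnings.warn("train_X is not contained in the unit cube.")
    # (iii) Targets should have zero mean and unit standard deviation.
    n = len(train_Y)
    mean = sum(train_Y) / n
    std = math.sqrt(sum((y - mean) ** 2 for y in train_Y) / (n - 1))
    if abs(mean) > atol or abs(std - 1.0) > atol:
        warnings.warn("train_Y is not standardized.")


with warnings.catch_warnings(record=True) as clean:
    warnings.simplefilter("always")
    validate_scaling_sketch([[0.0], [0.5], [1.0]], [-1.0, 0.0, 1.0])

with warnings.catch_warnings(record=True) as unscaled:
    warnings.simplefilter("always")
    validate_scaling_sketch([[0.0], [2.0]], [5.0, 7.0])
```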
- botorch.models.utils.assorted.mod_batch_shape(module, names, b)[source]
Recursive helper to modify gpytorch modules’ batch shape attribute.
Modifies the module in-place.
- Parameters:
module (Module) – The module to be modified.
names (list[str]) – The list of names to access the attribute. If the full name of the module is "module.sub_module.leaf_module", this will be ["sub_module", "leaf_module"].
b (int) – The new size of the last element of the module’s batch_shape attribute.
- Return type:
None
- botorch.models.utils.assorted.gpt_posterior_settings()[source]
Context manager for settings used for computing model posteriors.
- botorch.models.utils.assorted.detect_duplicates(X, rtol=0, atol=1e-08)[source]
Returns an iterator over index pairs (duplicate index, original index) for all duplicate entries of X. Only 2-d tensors are supported.
- Parameters:
X (Tensor) – the datapoints tensor with potential duplicated entries
rtol (float) – relative tolerance
atol (float) – absolute tolerance
- Return type:
Iterator[tuple[int, int]]
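A plain-Python sketch of the (duplicate index, original index) iterator on a list of rows. It assumes the closeness test |a - b| <= atol + rtol * |b| used by torch.isclose and treats the first occurrence of a row as the original; the actual implementation may order or match rows differently. The function name is hypothetical.

```python
def detect_duplicates_sketch(X, rtol=0.0, atol=1e-8):
    """Yield (duplicate index, original index) for rows of X that repeat an earlier row."""
    originals = []  # indices of rows seen for the first time
    for i, row in enumerate(X):
        for j in originals:
            if all(abs(a - b) <= atol + rtol * abs(b) for a, b in zip(row, X[j])):
                yield i, j
                break
        else:
            originals.append(i)


X = [[0.0, 1.0], [2.0, 3.0], [0.0, 1.0], [2.0, 3.0000000001], [4.0, 5.0]]
pairs = list(detect_duplicates_sketch(X))
```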
- botorch.models.utils.assorted.consolidate_duplicates(X, Y, rtol=0.0, atol=1e-08)[source]
Drop duplicated Xs and update the indices tensor Y accordingly. Only 2-d tensors are supported, since block design is not guaranteed in batch mode.
- Parameters:
X (Tensor) – the datapoints tensor
Y (Tensor) – the index tensor to be updated (e.g., pairwise comparisons)
rtol (float) – relative tolerance
atol (float) – absolute tolerance
- Returns:
consolidated_X: the consolidated X
consolidated_Y: the consolidated Y (e.g., pairwise comparisons indices)
new_indices: new index of each original item in X, a tensor of size X.shape[-2]
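The consolidation step can be sketched on nested lists: duplicate rows are dropped, and every index stored in Y is rewritten to point at the surviving row. The function name is hypothetical and only an absolute tolerance is used, so this is an illustration rather than the actual routine.

```python
def consolidate_duplicates_sketch(X, Y, atol=1e-8):
    """Return (consolidated_X, consolidated_Y, new_indices) for rows X and index pairs Y."""
    kept = []         # indices into X of the rows that survive
    new_indices = []  # new index of each original row of X
    for i, row in enumerate(X):
        for pos, j in enumerate(kept):
            if all(abs(a - b) <= atol for a, b in zip(row, X[j])):
                new_indices.append(pos)  # duplicate of an already kept row
                break
        else:
            new_indices.append(len(kept))
            kept.append(i)
    consolidated_X = [X[j] for j in kept]
    consolidated_Y = [[new_indices[k] for k in pair] for pair in Y]
    return consolidated_X, consolidated_Y, new_indices


X = [[0.0], [1.0], [0.0], [2.0]]
Y = [[0, 1], [2, 3], [1, 2]]  # e.g. pairwise comparisons between rows of X
cX, cY, idx = consolidate_duplicates_sketch(X, Y)
```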
- class botorch.models.utils.assorted.fantasize(state=True)[source]
Bases: _Flag
A flag denoting whether we are currently in a fantasize context.
- Parameters:
state (bool)
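One way such a flag can work is as a class-level boolean that is switched for the duration of a with block. The classes below are hypothetical stand-ins written for illustration; they are not the _Flag base class itself, and the on() accessor is an assumed name.

```python
class FlagSketch:
    """Class-level boolean flag that doubles as a context manager (hypothetical)."""

    _state = False

    def __init__(self, state=True):
        self.state = state
        self.prev = False

    @classmethod
    def on(cls):
        return cls._state

    def __enter__(self):
        self.prev = type(self)._state   # remember the enclosing state
        type(self)._state = self.state

    def __exit__(self, *args):
        type(self)._state = self.prev   # restore it on exit


class fantasize_sketch(FlagSketch):
    _state = False


states = [fantasize_sketch.on()]
with fantasize_sketch():
    states.append(fantasize_sketch.on())
    with fantasize_sketch(False):
        states.append(fantasize_sketch.on())
    states.append(fantasize_sketch.on())
states.append(fantasize_sketch.on())
```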
- botorch.models.utils.assorted.get_task_value_remapping(all_task_values, dtype)[source]
Construct a mapping of task values to contiguous int-valued floats.
This function creates a mapping tensor that remaps task indices. All tasks in all_task_values are mapped to contiguous integers starting from 0. Task values not in all_task_values are mapped to NaN.
- Parameters:
all_task_values (Tensor) – A sorted long-valued tensor of all possible task values in the full task space.
dtype (dtype) – The dtype of the model inputs (e.g. X), which the new task values should be mapped to (e.g. float, double).
- Returns:
A tensor of shape all_task_values.max() + 1 that maps task values to new task values. The indexing operation mapper[task_value] will produce a tensor of new task values, of the same shape as the original. All task values in all_task_values are mapped to contiguous integers [0, 1, …, n-1] where n is the number of tasks. Task values not in all_task_values are mapped to NaN. Returns None when all_task_values equals [0, 1, …, n-1].
- Return type:
Tensor | None
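The mapping described above can be sketched with a plain list in place of a tensor: position task_value holds the new contiguous value, and every other position holds NaN. The function name is hypothetical and the dtype argument is omitted.

```python
import math


def task_value_remapping_sketch(all_task_values):
    """Map sorted task values to 0..n-1; return None if they already are 0..n-1."""
    n = len(all_task_values)
    if list(all_task_values) == list(range(n)):
        return None
    mapper = [math.nan] * (max(all_task_values) + 1)
    for new_value, task_value in enumerate(all_task_values):
        mapper[task_value] = float(new_value)
    return mapper


mapper = task_value_remapping_sketch([1, 3])
remapped = [mapper[t] for t in [3, 1, 1]]  # indexing remaps observed task values
```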
- botorch.models.utils.assorted.extract_targets_and_noise_single_output(model)[source]
Extract targets and noise variance for single-output models (m=1).
- Parameters:
model – A GPyTorch model.
- Returns:
A tuple of (Y, Yvar) where Y and Yvar have shape batch_shape x n x 1.
- Return type:
tuple[Tensor, Tensor | None]
- botorch.models.utils.assorted.restore_targets_and_noise_single_output(model, Y, Yvar, strict)[source]
Restore targets and noise variance for single-output models (m=1).
- Parameters:
model – A GPyTorch model.
Y (Tensor) – Targets tensor in shape batch_shape x n x 1.
Yvar (Tensor | None) – Optional noise variance tensor in shape batch_shape x n x 1.
strict (bool) – Whether to strictly enforce shape constraints.
- Return type:
None
- botorch.models.utils.assorted.get_data_for_optimization_help(model, path='optimization_help_data.json')[source]
Save model and training data as JSON for filing Optimization Help issues.
This function packages all the information needed to diagnose optimization issues into a single JSON file that can be uploaded to a GitHub issue.
See the following tutorial for an example of how to use this file to get help with optimization: https://github.com/meta-pytorch/botorch/blob/main/tutorials/optimization_issue_diagnostics/optimization_issue_diagnostics.ipynb
- Parameters:
model (GPyTorchModel) – A BoTorch model with training data.
path (str) – File path where the JSON data will be saved. Defaults to “optimization_help_data.json”.
- Return type:
None