botorch.utils
Constraints
Helpers for handling input or outcome constraints.
- botorch.utils.constraints.get_outcome_constraint_transforms(outcome_constraints)[source]
Create outcome constraint callables from outcome constraint tensors.
- Parameters:
outcome_constraints (tuple[Tensor, Tensor] | None) – A tuple of (A, b). For k outcome constraints and m outputs at f(x), A is k x m and b is k x 1 such that A f(x) <= b.
- Returns:
A list of callables, each mapping a Tensor of size b x q x m to a tensor of size b x q, where m is the number of outputs of the model. Negative values imply feasibility. The callables support broadcasting (e.g. for calling on a tensor of shape mc_samples x b x q x m).
- Return type:
list[Callable[[Tensor], Tensor]] | None
Example
>>> # constrain ``f(x)[0] <= 0``
>>> A = torch.tensor([[1., 0.]])
>>> b = torch.tensor([[0.]])
>>> outcome_constraints = get_outcome_constraint_transforms((A, b))
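To make the sign convention concrete, here is a dependency-free sketch of what each returned callable plausibly computes for a single output vector: row i of A applied to the outputs, minus b[i]. The helper name `make_constraint_callables` is hypothetical, plain Python lists stand in for tensors, and batching and broadcasting are omitted.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def make_constraint_callables(
    A: Sequence[Vector], b: Sequence[float]
) -> List[Callable[[Vector], float]]:
    """Hypothetical helper: one callable per row, evaluating A[i] . y - b[i]."""

    def make_row(row: Vector, rhs: float) -> Callable[[Vector], float]:
        # A negative result means the constraint A[i] . y <= b[i] holds with slack.
        return lambda y: sum(a * v for a, v in zip(row, y)) - rhs

    return [make_row(row, rhs) for row, rhs in zip(A, b)]


# Constrain f(x)[0] <= 0, mirroring the example above (b flattened to a list).
constraints = make_constraint_callables(A=[[1.0, 0.0]], b=[0.0])
feasible = constraints[0]([-0.5, 3.0])     # first output is below the bound
infeasible = constraints[0]([0.25, -1.0])  # first output exceeds the bound
```

A negative return value marks a feasible point, which is the convention the objective helpers in this module rely on.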
- botorch.utils.constraints.get_monotonicity_constraints(d, descending=False, dtype=None, device=None)[source]
Returns a system of linear inequalities (A, b) that generically encodes order constraints on the elements of a d-dimensional space, i.e. A @ x < b implies x[i] < x[i + 1] for a d-dimensional vector x.
Idea: Could encode A as a sparse matrix, if it is supported well.
- Parameters:
d (int) – Dimensionality of the constraint space, i.e. number of monotonic parameters.
descending (bool) – If True, forces the elements of a vector to be monotonically decreasing; otherwise, forces them to be monotonically increasing.
dtype (dtype | None) – The dtype of the returned Tensors.
device (device | None) – The device of the returned Tensors.
- Returns:
A tuple of Tensors (A, b) representing the monotonicity constraint as a system of linear inequalities A @ x < b. A is (d - 1) x d-dimensional and b is (d - 1) x 1-dimensional.
- Return type:
tuple[Tensor, Tensor]
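One way to build such a system is sketched below with nested lists instead of tensors. Row i holds +1 at position i and -1 at position i + 1 with a zero right-hand side, so that A @ x < b reads x[i] - x[i + 1] < 0. The function names are hypothetical and the layout is an assumption consistent with the shapes stated above, not a copy of the library code.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def monotonicity_constraints(d: int, descending: bool = False) -> Tuple[Matrix, Matrix]:
    """Build (A, b) with A of shape (d - 1) x d and b of shape (d - 1) x 1."""
    sign = -1.0 if descending else 1.0
    A = [[0.0] * d for _ in range(d - 1)]
    for i in range(d - 1):
        # Row i encodes x[i] - x[i + 1] < 0 (ascending) or its negation.
        A[i][i] = sign
        A[i][i + 1] = -sign
    b = [[0.0] for _ in range(d - 1)]
    return A, b


def satisfies(A: Matrix, b: Matrix, x: Sequence[float]) -> bool:
    """Check the strict inequalities A @ x < b row by row."""
    return all(
        sum(a * v for a, v in zip(row, x)) < rhs[0] for row, rhs in zip(A, b)
    )


A, b = monotonicity_constraints(3)
A_desc, b_desc = monotonicity_constraints(3, descending=True)
```

With d = 3, `satisfies(A, b, [1, 2, 3])` holds while `satisfies(A, b, [1, 3, 2])` does not.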
- class botorch.utils.constraints.NonTransformedInterval(lower_bound, upper_bound, initial_value=None)[source]
Bases: Interval
Modification of the GPyTorch interval class that does not apply transformations.
This is generally useful, and it is a requirement for the sparse parameters of the Relevance Pursuit model [Ament2024pursuit], since it is not possible to achieve exact zeros with the sigmoid transformations that are applied by default in the GPyTorch Interval class. The variant implemented here does not apply transformations to the parameters, instead passing the bounds constraint to the scipy L-BFGS optimizer. This allows for the expression of exact zeros for sparse optimization algorithms.
NOTE: On a high level, the cleanest solution for this would be to separate out the 1) definition and book-keeping of parameter constraints on the one hand, and 2) the re-parameterization of the variables with some monotonic transformation, since the two steps are orthogonal, but this would require refactoring GPyTorch.
Constructor of the NonTransformedInterval class.
- Parameters:
lower_bound (float | Tensor) – The lower bound of the interval.
upper_bound (float | Tensor) – The upper bound of the interval.
initial_value (float | Tensor | None) – The initial value of the parameter.
- transform(tensor)[source]
Transforms a tensor to satisfy the specified bounds.
If upper_bound is finite, we assume that self.transform saturates at 1 as tensor -> infinity. Similarly, if lower_bound is finite, we assume that self.transform saturates at 0 as tensor -> -infinity.
Example transforms for one of the bounds being finite include torch.exp and torch.nn.functional.softplus. An example transform for the case where both are finite is torch.nn.functional.sigmoid.
- Parameters:
tensor (Tensor)
- Return type:
Tensor
- class botorch.utils.constraints.LogTransformedInterval(lower_bound, upper_bound, initial_value=None)[source]
Bases: Interval
Modification of the GPyTorch interval class.
The Interval class in GPyTorch will map the parameter to the range [0, 1] before applying the inverse transform. LogTransformedInterval skips this step to avoid numerical issues, and applies the log transform directly to the parameter values. GPyTorch automatically recognizes that the bound constraints have not been applied yet, and passes the bounds to the optimizer instead, which then optimizes log(parameter) under the constraints log(lower) <= log(parameter) <= log(upper).
Constructor of the LogTransformedInterval class.
- Parameters:
lower_bound (float) – The lower bound of the interval.
upper_bound (float) – The upper bound of the interval.
initial_value (float | None) – The initial value of the parameter.
Containers
Representations for different kinds of data.
- class botorch.utils.containers.BotorchContainer[source]
Bases: ABC
Abstract base class for BoTorch’s data containers.
A BotorchContainer represents a tensor, which should be the sole object returned by its __call__ method. Said tensor is expected to consist of one or more “events” (e.g. data points or feature vectors), whose shape is given by the required event_shape field.
- event_shape: Size
- abstract property shape: Size
- abstract property device: device
- abstract property dtype: dtype
- class botorch.utils.containers.DenseContainer(*, values, event_shape)[source]
Bases: BotorchContainer
Basic representation of data stored as a dense Tensor.
- Parameters:
values (Tensor)
event_shape (Size)
- values: Tensor
- event_shape: Size
- property shape: Size
- property device: device
- property dtype: dtype
- class botorch.utils.containers.SliceContainer(*, values, indices, event_shape)[source]
Bases: BotorchContainer
Represent data points formed by concatenating (n-1)-dimensional slices taken from the leading dimension of an n-dimensional source tensor.
- Parameters:
values (Tensor)
indices (LongTensor)
event_shape (Size)
- values: Tensor
- indices: LongTensor
- event_shape: Size
- property shape: Size
- property device: device
- property dtype: dtype
Context Managers
Utilities for optimization.
- class botorch.utils.context_managers.TensorCheckpoint(values, device, dtype)[source]
Bases: NamedTuple
Create new instance of TensorCheckpoint(values, device, dtype)
- Parameters:
values (Tensor)
device (device | None)
dtype (dtype | None)
- values: Tensor
Alias for field number 0
- device: device | None
Alias for field number 1
- dtype: dtype | None
Alias for field number 2
- botorch.utils.context_managers.delattr_ctx(instance, *attrs, enforce_hasattr=False)[source]
Contextmanager for temporarily deleting attributes.
- Parameters:
instance (object)
attrs (str)
enforce_hasattr (bool)
- Return type:
Generator[None, None, None]
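The behaviour described here can be sketched with the standard library alone. The context manager below is a hypothetical stand-in (named `temporarily_delete` to avoid suggesting it is the library code): it removes the named attributes on entry, restores them on exit, and, as an assumption about what enforce_hasattr means, raises if a requested attribute is missing.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Iterator


@contextmanager
def temporarily_delete(
    instance: object, *attrs: str, enforce_hasattr: bool = False
) -> Iterator[None]:
    """Delete the named attributes on entry and restore them on exit."""
    cache = {}
    try:
        for name in attrs:
            if hasattr(instance, name):
                cache[name] = getattr(instance, name)
                delattr(instance, name)
            elif enforce_hasattr:
                raise ValueError(f"Attribute {name!r} is missing from {instance!r}")
        yield
    finally:
        # Restore whatever was removed, even if the body raised.
        for name, value in cache.items():
            setattr(instance, name, value)


config = SimpleNamespace(lr=0.1, debug=True)
with temporarily_delete(config, "debug"):
    hidden = not hasattr(config, "debug")  # attribute is gone inside the block
restored = config.debug                    # and available again afterwards
```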
- botorch.utils.context_managers.parameter_rollback_ctx(parameters, checkpoint=None, **tkwargs)[source]
Contextmanager that exits by rolling back a module’s state_dict.
- Parameters:
module – Module instance.
name_filter – Optional Boolean function used to filter items by name.
checkpoint (dict[str, TensorCheckpoint] | None) – Optional cache of values and tensor metadata specifying the rollback state for the module (or some subset thereof).
**tkwargs (Any) – Keyword arguments passed to torch.Tensor.to when copying data from each tensor in module.state_dict() to the internally created checkpoint. Only adhered to when the checkpoint argument is None.
parameters (dict[str, Tensor])
- Yields:
A dictionary of TensorCheckpoints for the module’s state_dict. Any in-place changes to the checkpoint will be observed at rollback time. If the checkpoint is cleared, no rollback will occur.
- Return type:
Generator[dict[str, TensorCheckpoint], None, None]
- botorch.utils.context_managers.module_rollback_ctx(module, name_filter=None, checkpoint=None, **tkwargs)[source]
Contextmanager that exits by rolling back a module’s state_dict.
- Parameters:
module (Module) – Module instance.
name_filter (Callable[[str], bool] | None) – Optional Boolean function used to filter items by name.
checkpoint (dict[str, TensorCheckpoint] | None) – Optional cache of values and tensor metadata specifying the rollback state for the module (or some subset thereof).
**tkwargs (Any) – Keyword arguments passed to torch.Tensor.to when copying data from each tensor in module.state_dict() to the internally created checkpoint. Only adhered to when the checkpoint argument is None.
- Yields:
A dictionary of TensorCheckpoints for the module’s state_dict. Any in-place changes to the checkpoint will be observed at rollback time. If the checkpoint is cleared, no rollback will occur.
- Return type:
Generator[dict[str, TensorCheckpoint], None, None]
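The rollback semantics (restore on exit, honour in-place edits to the checkpoint, skip the rollback when the checkpoint is cleared) can be illustrated without torch. In this sketch a dictionary of float lists stands in for a state_dict, and `rollback_ctx` is a hypothetical name; device and dtype handling via tkwargs is left out.

```python
import copy
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional

State = Dict[str, List[float]]


@contextmanager
def rollback_ctx(state: State, checkpoint: Optional[State] = None) -> Iterator[State]:
    """Yield a checkpoint of `state`; on exit, copy the checkpoint back."""
    if checkpoint is None:
        checkpoint = copy.deepcopy(state)
    try:
        yield checkpoint
    finally:
        # Entries still present in the checkpoint are restored. An emptied
        # checkpoint therefore means nothing is rolled back.
        for name, values in checkpoint.items():
            state[name] = copy.deepcopy(values)


params = {"lengthscale": [1.0], "noise": [0.1]}

with rollback_ctx(params) as ckpt:
    params["lengthscale"][0] = 5.0  # trial update, discarded on exit
rolled_back = params["lengthscale"][0]

with rollback_ctx(params) as ckpt:
    params["noise"][0] = 0.5
    ckpt.clear()                    # accept the update by skipping the rollback
kept = params["noise"][0]
```

Clearing the checkpoint is the natural way to commit a trial update that turned out to be an improvement.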
Datasets
Representations for different kinds of datasets.
- class botorch.utils.datasets.SupervisedDataset(X, Y, *, feature_names, outcome_names, Yvar=None, validate_init=True, group_indices=None)[source]
Bases: object
Base class for datasets consisting of labelled pairs (X, Y) and an optional Yvar that stipulates observation variances so that Y[i] ~ N(f(X[i]), Yvar[i]).
Example:
    import torch
    from botorch.utils.containers import DenseContainer
    from botorch.utils.datasets import SupervisedDataset

    X = torch.rand(16, 2)
    Y = torch.rand(16, 1)
    feature_names = ["learning_rate", "embedding_dim"]
    outcome_names = ["neg training loss"]
    # Built directly from tensors.
    A = SupervisedDataset(
        X=X,
        Y=Y,
        feature_names=feature_names,
        outcome_names=outcome_names,
    )
    # Built from containers; DenseContainer takes keyword-only arguments.
    B = SupervisedDataset(
        X=DenseContainer(values=X, event_shape=X.shape[-1:]),
        Y=DenseContainer(values=Y, event_shape=Y.shape[-1:]),
        feature_names=feature_names,
        outcome_names=outcome_names,
    )
    assert A == B
Constructs a SupervisedDataset.
- Parameters:
X (BotorchContainer | Tensor) – A Tensor or BotorchContainer representing the input features.
Y (BotorchContainer | Tensor) – A Tensor or BotorchContainer representing the outcomes.
feature_names (list[str]) – A list of names of the features in X.
outcome_names (list[str]) – A list of names of the outcomes in Y.
Yvar (BotorchContainer | Tensor | None) – An optional Tensor or BotorchContainer representing the observation noise.
validate_init (bool) – If True, validates the input shapes.
group_indices (Tensor | None) – A Tensor representing which rows of X and Y are grouped together. This is used to support applications in which multiple observations should be considered as a group, e.g., learning-curve-based modeling. If provided, its shape must be compatible with X and Y.
- property X: Tensor
- property Y: Tensor
- property Yvar: Tensor | None
- clone(deepcopy=False, mask=None)[source]
Return a copy of the dataset.
- Parameters:
deepcopy (bool) – If True, perform a deep copy. Otherwise, use the same tensors/lists.
mask (Tensor | None) – An n-dim boolean mask indicating which rows to keep. This is used along the -2 dimension.
- Returns:
The new dataset.
- Return type:
SupervisedDataset
- class botorch.utils.datasets.RankingDataset(X, Y, feature_names, outcome_names, validate_init=True)[source]
Bases: SupervisedDataset
A SupervisedDataset whose labelled pairs (x, y) consist of m-ary combinations x ∈ Z^{m} of elements from a ground set Z = (z_1, ...) and ranking vectors y ∈ {0, ..., m - 1}^{m} with properties:
- Ranks start at zero, i.e. min(y) = 0.
- Sorted ranks are contiguous unless one or more ties are present.
- k ranks are skipped after a k-way tie.
Example:
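    import torch
    from botorch.utils.containers import DenseContainer, SliceContainer
    from botorch.utils.datasets import RankingDataset

    # Each of the 8 events concatenates 3 rows drawn from the 16 x 2 source tensor.
    X = SliceContainer(
        values=torch.rand(16, 2),
        indices=torch.stack([torch.randperm(16)[:3] for _ in range(8)]),
        event_shape=torch.Size([3 * 2]),
    )
    # DenseContainer takes keyword-only arguments.
    Y = DenseContainer(
        values=torch.stack([torch.randperm(3) for _ in range(8)]),
        event_shape=torch.Size([3]),
    )
    feature_names = ["item_0", "item_1"]
    outcome_names = ["ranking outcome"]
    dataset = RankingDataset(
        X=X,
        Y=Y,
        feature_names=feature_names,
        outcome_names=outcome_names,
    )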
Construct a RankingDataset.
- Parameters:
X (SliceContainer) – A SliceContainer representing the input features being ranked.
Y (BotorchContainer | Tensor) – A Tensor or BotorchContainer representing the rankings.
feature_names (list[str]) – A list of names of the features in X.
outcome_names (list[str]) – A list of names of the outcomes in Y.
validate_init (bool) – If True, validates the input shapes.
- class botorch.utils.datasets.MultiTaskDataset(datasets, target_outcome_name, task_feature_index=None)[source]
Bases: SupervisedDataset
This is a multi-task dataset that is constructed from the datasets of individual tasks. It offers functionality to combine parts of individual datasets to construct the inputs necessary for the MultiTaskGP models.
The datasets of individual tasks are allowed to represent different sets of features. When there are heterogeneous feature sets, calling MultiTaskDataset.X will result in an error.
Construct a MultiTaskDataset.
- Parameters:
datasets (list[SupervisedDataset]) – A list of the datasets of individual tasks. Each dataset is expected to contain data for only one outcome.
target_outcome_name (str) – Name of the target outcome to be modeled.
task_feature_index (int | None) – If the task feature is included in the Xs of the individual datasets, this should be used to specify its index. If omitted, the task feature will be appended while concatenating Xs. If given, we sanity-check that the names of the task features match between all datasets.
- classmethod from_joint_dataset(dataset, task_feature_index, target_task_value, outcome_names_per_task=None)[source]
Construct a MultiTaskDataset from a joint dataset that includes the data for all tasks with the task feature index.
This will break down the joint dataset into individual datasets by the value of the task feature. Each resulting dataset will have its outcome name set based on outcome_names_per_task, with the missing values defaulting to task_<task_feature> (except for the target task, which will retain the original outcome name from the dataset).
- Parameters:
dataset (SupervisedDataset) – The joint dataset.
task_feature_index (int) – The column index of the task feature in dataset.X.
target_task_value (int) – The value of the task feature for the target task in the dataset. The data for the target task is filtered according to dataset.X[task_feature_index] == target_task_value.
outcome_names_per_task (dict[int, str] | None) – Optional dictionary mapping task feature values to the outcome names for each task. If not provided, the auxiliary tasks will be named task_<task_feature> and the target task will retain the outcome name from the dataset.
- Returns:
A MultiTaskDataset instance.
- Return type:
MultiTaskDataset
- property X: Tensor
Appends task features, if needed, and concatenates the Xs of datasets to produce the train_X expected by MultiTaskGP and subclasses.
If appending the task features, 0 is reserved for the target task and the remaining tasks are populated with 1, 2, …, len(datasets) - 1.
- property Y: Tensor
Concatenates Ys of the datasets.
- property Yvar: Tensor | None
Concatenates Yvars of the datasets if they exist.
- get_dataset_without_task_feature(outcome_name)[source]
A helper for extracting the child datasets with their task features removed.
If the task feature index is None, the dataset will be returned as is.
- Parameters:
outcome_name (str) – The outcome name for the dataset to extract.
- Returns:
The dataset without the task feature.
- Return type:
SupervisedDataset
- get_heterogeneous_feature_mapping()[source]
Compute canonical feature ordering for heterogeneous datasets.
Target features come first (preserving order), then source-only features are appended. The task column (at task_feature_index) is excluded from the mapping.
- Returns:
A 3-tuple of:
- Ordered datasets (target first, then sources).
- Feature indices mapping each dataset’s non-task features to the canonical ordering.
- Full feature dimensionality (number of unique non-task features).
- Raises:
NotImplementedError – If task_feature_index is not -1.
- clone(deepcopy=False, mask=None)[source]
Return a copy of the dataset.
- Parameters:
deepcopy (bool) – If True, perform a deep copy. Otherwise, use the same tensors/lists/datasets.
mask (Tensor | None) – An n-dim boolean mask indicating which rows to keep from the target dataset. This is used along the -2 dimension.
- Returns:
The new dataset.
- Return type:
MultiTaskDataset
- class botorch.utils.datasets.ContextualDataset(datasets, parameter_decomposition, metric_decomposition=None)[source]
Bases: SupervisedDataset
This is a contextual dataset that is constructed from either a single dataset containing the overall outcome or a list of datasets that each correspond to a context breakdown.
Construct a ContextualDataset.
- Parameters:
datasets (list[SupervisedDataset]) – A list of the datasets of individual tasks. Each dataset is expected to contain data for only one outcome.
parameter_decomposition (dict[str, list[str]]) – Dict from context name to list of feature names corresponding to that context.
metric_decomposition (dict[str, list[str]] | None) – Context breakdown metrics. Keys are context names. Values are the lists of metric names belonging to the context: {'context1': ['m1_c1'], 'context2': ['m1_c2']}.
- property feature_names: list[str]
- property outcome_names: list[str]
- property X: Tensor
- property Y: Tensor
Concatenates the Ys from the child datasets to create the Y expected by the LCEM model if there are multiple datasets, or returns the Y expected by the LCEA model if there is only one dataset.
- property Yvar: Tensor | None
Concatenates the Yvars from the child datasets to create the Yvar expected by the LCEM model if there are multiple datasets, or returns the Yvar expected by the LCEA model if there is only one dataset.
- clone(deepcopy=False, mask=None)[source]
Return a copy of the dataset.
- Parameters:
deepcopy (bool) – If True, perform a deep copy. Otherwise, use the same tensors/lists/datasets.
mask (Tensor | None) – An n-dim boolean mask indicating which rows to keep. This is used along the -2 dimension. n here corresponds to the number of rows in an individual dataset.
- Returns:
The new dataset.
- Return type:
ContextualDataset
Dispatcher
- botorch.utils.dispatcher.type_bypassing_encoder(arg)[source]
- Parameters:
arg (Any)
- Return type:
type
- class botorch.utils.dispatcher.Dispatcher(name, doc=None, encoder=<class 'type'>)[source]
Bases: Dispatcher
Clearing house for multiple dispatch functionality. This class extends <multipledispatch.Dispatcher> by: (i) generalizing the argument encoding convention during method lookup, (ii) implementing __getitem__ as a dedicated method lookup function.
- Parameters:
name (str) – A string identifier for the Dispatcher instance.
doc (str | None) – A docstring for the multiply dispatched method(s).
encoder (Callable[Any, type]) – A callable that individually transforms the arguments passed at runtime in order to construct the key used for method lookup as tuple(map(encoder, args)). Defaults to type.
- dispatch(*types)[source]
Method lookup strategy. Checks for an exact match before traversing the set of registered methods according to the current ordering.
- Parameters:
types (type) – A tuple of types that gets compared with the signatures of registered methods to determine compatibility.
- Returns:
The first method encountered with a matching signature.
- Return type:
Callable
- encode_args(args)[source]
Converts arguments into a tuple of types used during method lookup.
- Parameters:
args (Any)
- Return type:
tuple[type]
- help(*args, **kwargs)[source]
Prints the retrieved method’s docstring.
- Parameters:
args (Any)
kwargs (Any)
- Return type:
None
- property encoder: Callable[Any, type]
- name
- funcs
- doc
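The role of the encoder is easiest to see in a toy version. The class below is a simplified, hypothetical illustration and not the multipledispatch implementation: it builds the lookup key as tuple(map(encoder, args)), tries an exact match, then falls back to the first registered signature that is compatible. Using registration order as the fallback ordering is an assumption made for brevity, as is the body of the `type_bypassing` encoder.

```python
from typing import Any, Callable, Dict, Tuple


class MiniDispatcher:
    """Toy dispatcher keyed on tuple(map(encoder, args))."""

    def __init__(self, name: str, encoder: Callable[[Any], type] = type) -> None:
        self.name = name
        self.encoder = encoder
        self.funcs: Dict[Tuple[type, ...], Callable] = {}

    def register(self, *types: type) -> Callable[[Callable], Callable]:
        def decorator(func: Callable) -> Callable:
            self.funcs[types] = func
            return func

        return decorator

    def dispatch(self, *types: type) -> Callable:
        # Exact match first, then the first registered compatible signature.
        if types in self.funcs:
            return self.funcs[types]
        for signature, func in self.funcs.items():
            if len(signature) == len(types) and all(
                issubclass(t, s) for t, s in zip(types, signature)
            ):
                return func
        raise NotImplementedError(f"{self.name}: no method for {types}")

    def __call__(self, *args: Any) -> Any:
        return self.dispatch(*map(self.encoder, args))(*args)


def type_bypassing(arg: Any) -> type:
    # Pass classes through unchanged so callers can dispatch on a class itself.
    return arg if isinstance(arg, type) else type(arg)


describe = MiniDispatcher("describe", encoder=type_bypassing)


@describe.register(int)
def _describe_int(x: Any) -> str:
    return "integer"


@describe.register(object)
def _describe_any(x: Any) -> str:
    return "something else"
```

With this encoder, `describe(3)` and `describe(int)` resolve to the same method, which is the kind of generalized encoding the class description refers to.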
Evaluation
- botorch.utils.evaluation.compute_in_sample_model_fit_metric(model, criterion)[source]
Compute an in-sample model fit metric.
- Parameters:
model (ExactGP) – A fitted ExactGP.
criterion (str) – Evaluation criterion. One of “MLL”, “AIC”, “BIC”. AIC penalizes the MLL based on the number of parameters. BIC uses a slightly different penalty based on the number of parameters and data points.
- Returns:
The in-sample evaluation metric.
- Return type:
float
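For orientation, the textbook definitions of the two penalized criteria are sketched below. This is an illustration of the standard formulas (AIC = 2k - 2 log L, BIC = k log n - 2 log L); the function name is hypothetical, and whether the library uses exactly these signs and scalings is an assumption.

```python
import math


def information_criterion(
    log_likelihood: float, num_params: int, num_points: int, criterion: str
) -> float:
    """Textbook MLL / AIC / BIC computed from a (marginal) log likelihood."""
    if criterion == "MLL":
        return log_likelihood
    if criterion == "AIC":
        # Penalty depends only on the number of parameters.
        return 2 * num_params - 2 * log_likelihood
    if criterion == "BIC":
        # Penalty grows with both parameter count and data set size.
        return num_params * math.log(num_points) - 2 * log_likelihood
    raise ValueError(f"Unknown criterion: {criterion}")


aic = information_criterion(-12.5, num_params=3, num_points=20, criterion="AIC")
bic = information_criterion(-12.5, num_params=3, num_points=20, criterion="BIC")
```

Under these definitions, lower AIC and BIC values indicate a better trade-off between fit and complexity, whereas a higher MLL is better.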
Low-Rank Cholesky Update Utils
- botorch.utils.low_rank.extract_batch_covar(mt_mvn)[source]
Extract a batched independent covariance matrix from an MTMVN.
- Parameters:
mt_mvn (MultitaskMultivariateNormal) – A multi-task multivariate normal with a block diagonal covariance matrix.
- Returns:
A lazy covariance matrix consisting of a batch of the blocks of the diagonal of the MultitaskMultivariateNormal.
- Return type:
LinearOperator
- botorch.utils.low_rank.sample_cached_cholesky(posterior, baseline_L, q, base_samples, sample_shape, max_tries=6)[source]
Get posterior samples at the q new points from the joint multi-output posterior.
- Parameters:
posterior (GPyTorchPosterior) – The joint posterior is over (X_baseline, X).
baseline_L (Tensor) – The baseline lower triangular cholesky factor.
q (int) – The number of new points in X.
base_samples (Tensor) – The base samples.
sample_shape (Size) – The sample shape.
max_tries (int) – The number of tries for computing the Cholesky decomposition with increasing jitter.
- Returns:
A sample_shape x batch_shape x q x m-dim tensor of posterior samples at the new points.
- Return type:
Tensor
Multi-Task Distribution Utils
Helpers for multitask modeling.
Objective
Helpers for handling objectives.
- botorch.utils.objective.get_objective_weights_transform(weights)[source]
Create a linear objective callable from a set of weights.
Create a callable mapping a Tensor of size b x q x m and an (optional) Tensor of size b x q x d to a Tensor of size b x q, where m is the number of outputs of the model using scalarization via the objective weights. This callable supports broadcasting (e.g. for calling on a tensor of shape mc_samples x b x q x m). For m = 1, the objective weight is used to determine the optimization direction.
- Parameters:
weights (Tensor | None) – a 1-dimensional Tensor containing a weight for each task. If not provided, the identity mapping is used.
- Returns:
Transform function using the objective weights.
- Return type:
Callable[[Tensor, Tensor | None], Tensor]
Example
>>> weights = torch.tensor([0.75, 0.25])
>>> transform = get_objective_weights_transform(weights)
- botorch.utils.objective.apply_constraints_nonnegative_soft(obj, constraints, samples, eta)[source]
Applies constraints to a non-negative objective.
This function uses a sigmoid approximation to an indicator function for each constraint.
- Parameters:
obj (Tensor) – A n_samples x b x q (x m')-dim Tensor of objective values.
constraints (list[Callable[[Tensor], Tensor]]) – A list of callables, each mapping a Tensor of size b x q x m to a Tensor of size b x q, where negative values imply feasibility. This callable must support broadcasting. Only relevant for multi-output models (m > 1).
samples (Tensor) – A n_samples x b x q x m Tensor of samples drawn from the posterior.
eta (Tensor | float) – The temperature parameter for the sigmoid function. Can be either a float or a 1-dim tensor. In case of a float the same eta is used for every constraint in constraints. In case of a tensor the length of the tensor must match the number of provided constraints. The i-th constraint is then estimated with the i-th eta value.
- Returns:
A n_samples x b x q (x m')-dim tensor of feasibility-weighted objectives.
- Return type:
Tensor
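The sigmoid approximation can be sketched for scalar values as follows. Each constraint value c contributes a factor sigmoid(-c / eta), which tends to 1 for c < 0 and to 0 for c > 0 as eta shrinks, and the objective is multiplied by the product of these factors. The function names are hypothetical and the sketch ignores tensor shapes and per-constraint eta values.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_feasibility_weight(constraint_values: Sequence[float], eta: float) -> float:
    """Product of sigmoid(-c / eta): near 1 if all c < 0, near 0 otherwise."""
    weight = 1.0
    for c in constraint_values:
        weight *= sigmoid(-c / eta)
    return weight


def apply_soft_constraints(
    obj: float, constraint_values: Sequence[float], eta: float = 1e-3
) -> float:
    """Weight a non-negative objective by the smoothed feasibility indicator."""
    return obj * soft_feasibility_weight(constraint_values, eta)


clearly_feasible = apply_soft_constraints(2.0, [-1.0, -0.5])
clearly_infeasible = apply_soft_constraints(2.0, [-1.0, 0.5])
on_boundary = apply_soft_constraints(2.0, [0.0])
```

The objective must be non-negative for this to act as a penalty: multiplying a negative objective by a weight near zero would increase it rather than decrease it.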
- botorch.utils.objective.compute_feasibility_indicator(constraints, samples, marginalize_dim=None)[source]
Computes the feasibility of a list of constraints given posterior samples.
- Parameters:
constraints (list[Callable[[Tensor], Tensor]] | None) – A list of callables, each mapping a batch_shape x q x m-dim Tensor to a batch_shape x q-dim Tensor, where negative values imply feasibility.
samples (Tensor) – A batch_shape x q x m-dim Tensor of posterior samples.
marginalize_dim (int | None) – A batch dimension that should be marginalized. For example, this is useful when using a batched fully Bayesian model.
- Returns:
A batch_shape x q-dim tensor of Boolean feasibility values.
- Return type:
Tensor
- botorch.utils.objective.compute_smoothed_feasibility_indicator(constraints, samples, eta, log=False, fat=False)[source]
Computes the smoothed feasibility indicator of a list of constraints.
Given posterior samples, this uses a sigmoid to smoothly approximate the feasibility indicator of each individual constraint to ensure differentiability and high gradient signal. The fat and log options improve the numerical behavior of the smooth approximation.
NOTE: Negative constraint values are associated with feasibility.
- Parameters:
constraints (list[Callable[[Tensor], Tensor]]) – A list of callables, each mapping a Tensor of size b x q x m to a Tensor of size b x q. The fat keyword defines how the callable is further processed. By default a sigmoid or fatmoid transformation is applied where negative values imply feasibility. The applied transformation maps the feasibility indicator of the constraint from the interval [-inf, inf] to the interval [0, 1]. If None is provided for fat, no transformation is applied and it is expected that the constraint callable delivers values in the interval [0, 1] without further processing that can be interpreted as probabilities of feasibility directly. This is especially useful for using classifiers as constraints. The callable must support broadcasting. Only relevant for multi-output models (m > 1).
samples (Tensor) – A n_samples x b x q x m Tensor of samples drawn from the posterior.
eta (Tensor | float) – The temperature parameter for the sigmoid/fatmoid function. Can be either a float or a 1-dim tensor. In case of a float the same eta is used for every constraint in constraints. In case of a tensor the length of the tensor must match the number of provided constraints. The i-th constraint is then estimated with the i-th eta value. In case no fatmoid/sigmoid is applied, eta is ignored.
log (bool) – Toggles the computation of the log-feasibility indicator.
fat (list[bool | None] | bool) – Toggles the computation of the fat-tailed feasibility indicator. Can be either a list or a boolean. In case of a boolean, the same feasibility indicator is used for all constraints. If a list is provided, the length of the list must match the number of provided constraints. The i-th constraint is then associated with the i-th fat value. If the i-th fat value is None, no fatmoid/sigmoid transformation is applied to the i-th constraint and it is assumed that the constraint by itself delivers values in the interval [0, 1]. This is especially useful for using classifiers as constraints. If a boolean is provided and its value is True, a fatmoid transformation is applied; if its value is False, a sigmoid transformation is applied.
- Returns:
A n_samples x b x q-dim tensor of feasibility indicator values.
- Return type:
Tensor
- botorch.utils.objective.apply_constraints(obj, constraints, samples, infeasible_cost, eta=0.001)[source]
Apply constraints using an infeasible_cost M for negative objectives.
This allows feasibility-weighting an objective for the case where the objective can be negative by using the following strategy: (1) Add M to make obj non-negative; (2) Apply constraints using the sigmoid approximation; (3) Shift by -M.
- Parameters:
obj (Tensor) – A n_samples x b x q (x m')-dim Tensor of objective values.
constraints (list[Callable[[Tensor], Tensor]]) – A list of callables, each mapping a Tensor of size b x q x m to a Tensor of size b x q, where negative values imply feasibility. This callable must support broadcasting. Only relevant for multi-output models (m > 1).
samples (Tensor) – A n_samples x b x q x m Tensor of samples drawn from the posterior.
infeasible_cost (float) – The infeasible value.
eta (Tensor | float) – The temperature parameter of the sigmoid function. Can be either a float or a 1-dim tensor. In case of a float the same eta is used for every constraint in constraints. In case of a tensor the length of the tensor must match the number of provided constraints. The i-th constraint is then estimated with the i-th eta value.
- Returns:
A n_samples x b x q (x m')-dim tensor of feasibility-weighted objectives.
- Return type:
Tensor
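The three-step strategy (add M, weight, subtract M) is sketched below for scalar values. The names `feasibility_weight` and `apply_constraints_sketch` are hypothetical, and the sketch assumes a single shared eta and already-evaluated constraint values rather than callables on posterior samples.

```python
import math
from typing import Sequence


def feasibility_weight(constraint_values: Sequence[float], eta: float) -> float:
    """Product of sigmoid(-c / eta) over all constraint values."""
    weight = 1.0
    for c in constraint_values:
        z = -c / eta
        # Stable logistic: never calls exp on a large positive argument.
        if z >= 0:
            weight *= 1.0 / (1.0 + math.exp(-z))
        else:
            weight *= math.exp(z) / (1.0 + math.exp(z))
    return weight


def apply_constraints_sketch(
    obj: float,
    constraint_values: Sequence[float],
    infeasible_cost: float,
    eta: float = 1e-3,
) -> float:
    shifted = obj + infeasible_cost  # (1) make the objective non-negative
    weighted = shifted * feasibility_weight(constraint_values, eta)  # (2) weight
    return weighted - infeasible_cost  # (3) shift back by -M


feasible_value = apply_constraints_sketch(-3.0, [-1.0], infeasible_cost=10.0)
infeasible_value = apply_constraints_sketch(-3.0, [1.0], infeasible_cost=10.0)
```

A feasible point keeps its objective value (here -3.0), while a clearly infeasible point is pushed to -M (here -10.0), so M should be at least as large as the magnitude of the most negative objective value.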
Rounding
Discretization (rounding) functions for acquisition optimization.
References
- botorch.utils.rounding.approximate_round(X, tau=0.001)[source]
Differentiable approximate rounding function.
This method is a piecewise approximation of a rounding function where each piece is a hyperbolic tangent function.
- Parameters:
X (Tensor) – The tensor to round to the nearest integer (element-wise).
tau (float) – A temperature hyperparameter.
- Returns:
The approximately rounded input tensor.
- Return type:
Tensor
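A scalar sketch of a tanh-based piecewise approximation is given below: within each unit interval, a steep tanh centred at the midpoint moves the value from the lower to the upper integer. The function name is hypothetical and the exact formula is an assumption that matches the description above, with tau controlling how sharp the step is.

```python
import math


def approximate_round_scalar(x: float, tau: float = 1e-3) -> float:
    """Smooth stand-in for round(x): one tanh step per unit interval."""
    offset = math.floor(x)
    # Signed distance from the midpoint of [offset, offset + 1], scaled by tau.
    scaled_remainder = (x - offset - 0.5) / tau
    return offset + (math.tanh(scaled_remainder) + 1.0) / 2.0


low = approximate_round_scalar(2.2)   # well below the midpoint
high = approximate_round_scalar(2.8)  # well above the midpoint
mid = approximate_round_scalar(2.5)   # exactly at the midpoint
```

Smaller values of tau give a closer match to hard rounding but gradients that vanish almost everywhere except near the midpoints.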
- class botorch.utils.rounding.IdentitySTEFunction(*args, **kwargs)[source]
Bases: Function
Base class for functions using straight through gradient estimators.
This class approximates the gradient with the identity function.
- class botorch.utils.rounding.RoundSTE(*args, **kwargs)[source]
Bases: IdentitySTEFunction
Round the input tensor and use a straight-through gradient estimator.
[Daulton2022bopr] proposes using this in acquisition optimization.
- class botorch.utils.rounding.OneHotArgmaxSTE(*args, **kwargs)[source]
Bases: IdentitySTEFunction
Discretize a continuous relaxation of a one-hot encoded categorical.
This returns a one-hot encoded categorical and uses a straight-through gradient estimator via an identity function.
[Daulton2022bopr] proposes using this in acquisition optimization.
- static forward(ctx, X)[source]
Discretize the input tensor.
This applies an argmax along the last dimension of the input tensor and one-hot encodes the result.
- Parameters:
X (Tensor) – The tensor to be rounded.
- Returns:
A tensor where each element is rounded to the nearest integer.
- Return type:
Tensor
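The forward discretization can be illustrated on nested lists: for each row, find the index of the largest entry and emit a one-hot vector. The helper name is hypothetical, and only the forward pass is shown; the straight-through part (passing gradients through unchanged in the backward pass) needs an autograd framework and is not reproduced here.

```python
from typing import List, Sequence


def one_hot_argmax(row: Sequence[float]) -> List[float]:
    """Return a one-hot vector marking the first largest entry of `row`."""
    best = max(range(len(row)), key=lambda i: row[i])
    return [1.0 if i == best else 0.0 for i in range(len(row))]


# Two relaxed categorical parameters, each over three categories.
relaxed = [[0.2, 0.7, 0.1], [0.5, 0.1, 0.4]]
discrete = [one_hot_argmax(row) for row in relaxed]
```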
Sampling
Utilities for MC and qMC sampling.
References
T. A. Trikalinos and G. van Valkenhoef. Efficient sampling from uniform density n-polytopes. Technical report, Brown University, 2014.
- botorch.utils.sampling.manual_seed(seed=None)[source]
Contextmanager for manually setting the torch.random seed.
- Parameters:
seed (int | None) – The seed to set the random number generator to.
- Returns:
Generator
- Return type:
Generator[None, None, None]
Example
>>> with manual_seed(1234):
>>>     X = torch.rand(3)
- botorch.utils.sampling.draw_sobol_samples(bounds, n, q, batch_shape=None, seed=None)[source]
Draw qMC samples from the box defined by bounds.
- Parameters:
bounds (Tensor) – A 2 x d dimensional tensor specifying box constraints on a d-dimensional space, where bounds[0, :] and bounds[1, :] correspond to lower and upper bounds, respectively.
n (int) – The number of (q-batch) samples. As a best practice, use powers of 2.
q (int) – The size of each q-batch.
batch_shape (Iterable[int] | Size | None) – The batch shape of the samples. If given, returns samples of shape n x batch_shape x q x d, where each batch is an n x q x d-dim tensor of qMC samples.
seed (int | None) – The seed used for initializing Owen scrambling. If None (default), use a random seed.
- Returns:
An n x batch_shape x q x d-dim tensor of qMC samples from the box defined by bounds.
- Return type:
Tensor
Example
>>> bounds = torch.stack([torch.zeros(3), torch.ones(3)])
>>> samples = draw_sobol_samples(bounds, 16, 2)
- botorch.utils.sampling.draw_sobol_normal_samples(d, n, device=None, dtype=None, seed=None)[source]
Draw qMC samples from a multi-variate standard normal N(0, I_d).
A primary use-case for this functionality is to compute a QMC average of f(X) over X where each element of X is drawn N(0, 1).
- Parameters:
d (int) – The dimension of the normal distribution.
n (int) – The number of samples to return. As a best practice, use powers of 2.
device (device | None) – The torch device.
dtype (dtype | None) – The torch dtype.
seed (int | None) – The seed used for initializing Owen scrambling. If None (default), use a random seed.
- Returns:
A tensor of qMC standard normal samples with dimension n x d with device and dtype specified by the input.
- Return type:
Tensor
Example
>>> samples = draw_sobol_normal_samples(2, 16)
- botorch.utils.sampling.sample_hypersphere(d, n=1, qmc=False, seed=None, device=None, dtype=None)[source]
Sample uniformly from a unit d-sphere.
- Parameters:
d (int) – The dimension of the hypersphere.
n (int) – The number of samples to return.
qmc (bool) – If True, use QMC Sobol sampling (instead of i.i.d. uniform).
seed (int | None) – If provided, use as a seed for the RNG.
device (device | None) – The torch device.
dtype (dtype | None) – The torch dtype.
- Returns:
An n x d tensor of uniform samples from the d-hypersphere.
- Return type:
Tensor
Example
>>> sample_hypersphere(d=5, n=10)
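A standard way to obtain uniform samples on a sphere is to draw standard normal vectors and rescale them to unit length. The sketch below illustrates that idea in plain Python. It assumes that "unit d-sphere" means the surface of the unit ball in R^d; the helper name `sample_unit_sphere` is hypothetical, and this is not BoTorch's implementation.

```python
import math
import random


def sample_unit_sphere(d, n=1, seed=None):
    """Draw n points uniformly from the surface of the unit sphere in R^d.

    Hypothetical helper for illustration only (not part of BoTorch).
    """
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        # A standard normal vector is rotationally symmetric, so its
        # direction is uniformly distributed on the sphere.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in z))
        if norm == 0.0:
            # Vanishingly unlikely; redraw instead of dividing by zero.
            continue
        samples.append([v / norm for v in z])
    return samples


points = sample_unit_sphere(d=5, n=10, seed=0)
```

Each returned point has Euclidean norm 1 up to floating-point error.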
- botorch.utils.sampling.sample_simplex(d, n=1, qmc=False, seed=None, device=None, dtype=None)[source]
Sample uniformly from a d-simplex.
- Parameters:
d (int) – The dimension of the simplex.
n (int) – The number of samples to return.
qmc (bool) – If True, use QMC Sobol sampling (instead of i.i.d. uniform).
seed (int | None) – If provided, use as a seed for the RNG.
device (device | None) – The torch device.
dtype (dtype | None) – The torch dtype.
- Returns:
An n x d tensor of uniform samples from the d-simplex.
- Return type:
Tensor
Example
>>> sample_simplex(d=3, n=10)
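One well-known way to sample uniformly from a simplex is to normalize i.i.d. exponential draws, which yields a flat Dirichlet distribution. The sketch below shows this in plain Python. It assumes the simplex is the set of non-negative d-vectors summing to one; the helper name `sample_unit_simplex` is hypothetical and the code is not taken from BoTorch.

```python
import random


def sample_unit_simplex(d, n=1, seed=None):
    """Draw n points uniformly from {x >= 0, sum(x) = 1} in R^d.

    Hypothetical helper for illustration only (not part of BoTorch).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Normalized i.i.d. Exp(1) draws follow a Dirichlet(1, ..., 1)
        # distribution, i.e. the uniform distribution on the simplex.
        e = [rng.expovariate(1.0) for _ in range(d)]
        total = sum(e)
        samples.append([v / total for v in e])
    return samples


simplex_points = sample_unit_simplex(d=3, n=10, seed=0)
```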
- botorch.utils.sampling.sample_polytope(A, b, x0, n=10000, n0=100, n_thinning=1, seed=None)[source]
Hit-and-run sampler for uniformly sampling points from a polytope described via the inequality constraints A*x<=b.
- Parameters:
A (Tensor) – An m x d-dim Tensor describing inequality constraints so that all samples satisfy Ax <= b.
b (Tensor) – An m-dim Tensor describing the inequality constraints so that all samples satisfy Ax <= b.
x0 (Tensor) – A d-dim Tensor representing a starting point of the chain satisfying the constraints.
n (int) – The number of resulting samples kept in the output.
n0 (int) – The number of burn-in samples. The chain will produce n+n0 samples but the first n0 samples are not saved.
n_thinning (int) – The amount of thinning. This function will return every n_thinning-th sample from the chain (after burn-in).
seed (int | None) – The seed for the sampler. If omitted, use a random seed.
- Returns:
(n, d) dim Tensor containing the resulting samples.
- Return type:
Tensor
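To make the roles of the starting point, burn-in, and thinning concrete, here is a minimal plain-Python sketch of a hit-and-run chain on a polytope A x <= b. It is an illustration under assumptions, not BoTorch's code: the function name `hit_and_run` is hypothetical, constraints are given as nested lists, and the chain runs n0 + n * n_thinning steps so that exactly n points are kept.

```python
import math
import random


def hit_and_run(A, b, x0, n=100, n0=10, n_thinning=1, seed=None):
    """Hypothetical hit-and-run sketch for {x : A x <= b} (bounded polytope)."""
    rng = random.Random(seed)
    x = list(x0)
    kept = []
    for step in range(n0 + n * n_thinning):
        # Random direction; a Gaussian vector has a uniformly random direction.
        r = [rng.gauss(0.0, 1.0) for _ in x]
        # Find the interval of t with A (x + t r) <= b.
        t_lo, t_hi = -math.inf, math.inf
        for a_i, b_i in zip(A, b):
            ar = sum(a * v for a, v in zip(a_i, r))
            slack = b_i - sum(a * v for a, v in zip(a_i, x))
            if ar > 1e-12:
                t_hi = min(t_hi, slack / ar)
            elif ar < -1e-12:
                t_lo = max(t_lo, slack / ar)
        if not (math.isfinite(t_lo) and math.isfinite(t_hi)):
            raise ValueError("Polytope is unbounded along the sampled direction.")
        # Move to a uniformly random point on the feasible chord.
        t = rng.uniform(t_lo, t_hi)
        x = [xi + t * ri for xi, ri in zip(x, r)]
        # Discard the first n0 steps, then keep every n_thinning-th point.
        if step >= n0 and (step - n0 + 1) % n_thinning == 0:
            kept.append(list(x))
    return kept


# Unit square 0 <= x1, x2 <= 1 written as A x <= b.
A_square = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b_square = [1.0, 0.0, 1.0, 0.0]
chain = hit_and_run(A_square, b_square, [0.5, 0.5], n=50, n0=20, n_thinning=2, seed=0)
```

The starting point must already satisfy the constraints, which is why an interior point is needed before sampling begins.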
- botorch.utils.sampling.batched_multinomial(weights, num_samples, replacement=False, generator=None, out=None)[source]
Sample from multinomial with an arbitrary number of batch dimensions.
- Parameters:
weights (Tensor) – A batch_shape x num_categories tensor of weights. For each batch index i, j, ..., this function samples from a multinomial with input weights[i, j, ..., :]. Note that the weights need not sum to one, but must be non-negative, finite and have a non-zero sum.
num_samples (int) – The number of samples to draw for each batch index. Must be smaller than num_categories if replacement=False.
replacement (bool) – If True, samples are drawn with replacement.
generator (Generator | None) – A pseudorandom number generator for sampling.
out (Tensor | None) – The output tensor (optional). If provided, must be of size batch_shape x num_samples.
- Returns:
A batch_shape x num_samples tensor of samples.
- Return type:
LongTensor
This is a thin wrapper around torch.multinomial that allows weight (input) tensors with an arbitrary number of batch dimensions (torch.multinomial only allows a single batch dimension). The calling signature is the same as for torch.multinomial.
Example
>>> weights = torch.rand(2, 3, 10)
>>> samples = batched_multinomial(weights, 4)  # shape is 2 x 3 x 4
- botorch.utils.sampling.find_interior_point(A, b, A_eq=None, b_eq=None)[source]
Find an interior point of a polytope via linear programming.
- Parameters:
A (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – An n_ineq x d-dim numpy array containing the coefficients of the constraint inequalities.
b (ndarray[tuple[Any, ...], dtype[_ScalarT]]) – An n_ineq x 1-dim numpy array containing the right hand sides of the constraint inequalities.
A_eq (ndarray[tuple[Any, ...], dtype[_ScalarT]] | None) – An n_eq x d-dim numpy array containing the coefficients of the constraint equalities.
b_eq (ndarray[tuple[Any, ...], dtype[_ScalarT]] | None) – An n_eq x 1-dim numpy array containing the right hand sides of the constraint equalities.
- Returns:
A d-dim numpy array containing an interior point of the polytope. This function will raise a ValueError if there is no such point.
- Return type:
ndarray[tuple[Any, …], dtype[_ScalarT]]
This method solves the following Linear Program:
min -s subject to A @ x <= b - 2 * s, s >= 0, A_eq @ x = b_eq
If the polytope is unbounded, the slack variable s is additionally constrained to s <= 1.
- class botorch.utils.sampling.PolytopeSampler(inequality_constraints=None, equality_constraints=None, bounds=None, interior_point=None)[source]
Bases:
ABC
Base class for samplers that sample points from a polytope.
- Parameters:
inequality_constraints (tuple[Tensor, Tensor] | None) – Tensors (A, b) describing inequality constraints A @ x <= b, where A is an n_ineq_con x d-dim Tensor and b is an n_ineq_con x 1-dim Tensor, with n_ineq_con the number of inequalities and d the dimension of the sample space.
equality_constraints (tuple[Tensor, Tensor] | None) – Tensors (C, d) describing the equality constraints C @ x = d, where C is an n_eq_con x d-dim Tensor and d is an n_eq_con x 1-dim Tensor, with n_eq_con the number of equalities.
bounds (Tensor | None) – A 2 x d-dim tensor of box bounds, where inf (-inf) means that the respective dimension is unbounded above (below).
interior_point (Tensor | None) – A d x 1-dim Tensor representing a point in the (relative) interior of the polytope. If omitted, determined automatically by solving a Linear Program.
- feasible(x)[source]
Check whether a point is contained in the polytope.
- Parameters:
x (Tensor) – A d x 1-dim Tensor.
- Returns:
True if x is contained inside the polytope (incl. its boundary), False otherwise.
- Return type:
bool
- class botorch.utils.sampling.HitAndRunPolytopeSampler(inequality_constraints=None, equality_constraints=None, bounds=None, interior_point=None, n_burnin=200, n_thinning=20, seed=None)[source]
Bases:
PolytopeSampler
A sampler for sampling from a polytope using a hit-and-run algorithm.
- Parameters:
inequality_constraints (tuple[Tensor, Tensor] | None) – Tensors (A, b) describing inequality constraints A @ x <= b, where A is an n_ineq_con x d-dim Tensor and b is an n_ineq_con x 1-dim Tensor, with n_ineq_con the number of inequalities and d the dimension of the sample space.
equality_constraints (tuple[Tensor, Tensor] | None) – Tensors (C, d) describing the equality constraints C @ x = d, where C is an n_eq_con x d-dim Tensor and d is an n_eq_con x 1-dim Tensor, with n_eq_con the number of equalities.
bounds (Tensor | None) – A 2 x d-dim tensor of box bounds, where inf (-inf) means that the respective dimension is unbounded from above (below). If omitted, no bounds (in addition to the above constraints) are applied.
interior_point (Tensor | None) – A d x 1-dim Tensor representing a point in the (relative) interior of the polytope. If omitted, determined automatically by solving a Linear Program.
n_burnin (int) – The number of burn-in samples. The sampler will discard n_burnin samples before returning the first sample.
n_thinning (int) – The amount of thinning. The sampler will return every n_thinning-th sample (after burn-in). This may need to be increased for sets of constraints that are difficult to satisfy (i.e. when the volume of the constraint polytope is small relative to that of its bounding box).
seed (int | None) – The random seed.
- class botorch.utils.sampling.DelaunayPolytopeSampler(inequality_constraints=None, equality_constraints=None, bounds=None, interior_point=None)[source]
Bases:
PolytopeSampler
A polytope sampler using Delaunay triangulation.
This sampler first enumerates the vertices of the constraint polytope and then uses a Delaunay triangulation to tessellate its convex hull.
The sampling happens in two stages:
1. First, we sample from the set of hypertriangles generated by the Delaunay triangulation (i.e. which hypertriangle to draw the sample from) with probabilities proportional to the triangle volumes.
2. Then, we sample uniformly from the chosen hypertriangle by sampling uniformly from the unit simplex of the appropriate dimension, and then computing the convex combination of the vertices of the hypertriangle according to that draw from the simplex.
The best reference (not exactly the same, but functionally equivalent) is [Trikalinos2014polytope]. A simple R implementation is available at https://github.com/gertvv/tesselample.
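The two sampling stages can be sketched in two dimensions, where hypertriangles are ordinary triangles and volumes are areas. The code below assumes the triangulation is already given as a list of vertex triples; the function names are hypothetical, and this is not the BoTorch implementation (which also performs vertex enumeration and the triangulation itself).

```python
import random


def triangle_area(tri):
    """Area of a 2D triangle given as three (x, y) vertices."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0


def sample_from_triangulation(triangles, n, seed=None):
    """Hypothetical two-stage sampler over a given 2D triangulation."""
    rng = random.Random(seed)
    areas = [triangle_area(t) for t in triangles]
    samples = []
    for _ in range(n):
        # Stage 1: pick a triangle with probability proportional to its area.
        tri = rng.choices(triangles, weights=areas, k=1)[0]
        # Stage 2: draw uniform weights from the unit simplex (normalized
        # exponentials) and form the convex combination of the vertices.
        e = [rng.expovariate(1.0) for _ in range(3)]
        total = sum(e)
        w = [v / total for v in e]
        x = sum(wi * vertex[0] for wi, vertex in zip(w, tri))
        y = sum(wi * vertex[1] for wi, vertex in zip(w, tri))
        samples.append((x, y))
    return samples


# The unit square split into two triangles along its diagonal.
square_triangles = [
    ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)),
    ((0.0, 0.0), (1.0, 1.0), (0.0, 1.0)),
]
square_points = sample_from_triangulation(square_triangles, 200, seed=0)
```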
Initialize DelaunayPolytopeSampler.
- Parameters:
inequality_constraints (tuple[Tensor, Tensor] | None) – Tensors (A, b) describing inequality constraints A @ x <= b, where A is an n_ineq_con x d-dim Tensor and b is an n_ineq_con x 1-dim Tensor, with n_ineq_con the number of inequalities and d the dimension of the sample space.
equality_constraints (tuple[Tensor, Tensor] | None) – Tensors (C, d) describing the equality constraints C @ x = d, where C is an n_eq_con x d-dim Tensor and d is an n_eq_con x 1-dim Tensor, with n_eq_con the number of equalities.
bounds (Tensor | None) – A 2 x d-dim tensor of box bounds, where inf (-inf) means that the respective dimension is unbounded from above (below).
interior_point (Tensor | None) – A d x 1-dim Tensor representing a point in the (relative) interior of the polytope. If omitted, determined automatically by solving a Linear Program.
Warning: The vertex enumeration performed in this algorithm can become extremely costly if there are a large number of inequalities. Similarly, the triangulation can get very expensive in high dimensions. Only use this algorithm for moderate dimensions / moderately complex constraint sets. An alternative is the HitAndRunPolytopeSampler.
- botorch.utils.sampling.normalize_sparse_linear_constraints(bounds, constraints)[source]
Normalize sparse linear constraints to the unit cube.
- Parameters:
bounds (Tensor) – A 2 x d-dim tensor containing the box bounds.
constraints (list[tuple[Tensor, Tensor, float]]) – A list of tuples (indices, coefficients, rhs), with indices and coefficients one-dimensional tensors and rhs a scalar, where each tuple encodes an inequality constraint of the form \sum_i (X[indices[i]] * coefficients[i]) >= rhs or \sum_i (X[indices[i]] * coefficients[i]) = rhs.
- Return type:
list[tuple[Tensor, Tensor, float]]
- botorch.utils.sampling.normalize_dense_linear_constraints(bounds, constraints)[source]
Normalize dense linear constraints to the unit cube.
- Parameters:
bounds (Tensor) – A 2 x d-dim tensor containing the box bounds.
constraints (tuple[Tensor, Tensor]) – A tensor tuple (A, b) describing constraints A @ x (<)= b, where A is an n_con x d-dim Tensor and b is an n_con x 1-dim Tensor, with n_con the number of constraints and d the dimension of the sample space.
- Returns:
A tensor tuple (A_nlz, b_nlz) of normalized constraints.
- Return type:
tuple[Tensor, Tensor]
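Normalizing to the unit cube amounts to substituting x = lower + (upper - lower) * z with z in [0, 1]^d. Under that substitution, A x <= b becomes (A * (upper - lower)) z <= b - A lower. The sketch below works this out with plain lists; the function name `normalize_dense` is hypothetical and the code is a derivation aid, not BoTorch's implementation.

```python
def normalize_dense(lower, upper, A, b):
    """Rewrite A x <= b in terms of z, where x = lower + (upper - lower) * z.

    Hypothetical helper for illustration only (not part of BoTorch).
    """
    # Scale column j of A by the width of dimension j.
    A_nlz = [
        [a * (u - l) for a, l, u in zip(row, lower, upper)]
        for row in A
    ]
    # Shift the right hand side by A @ lower.
    b_nlz = [
        b_i - sum(a * l for a, l in zip(row, lower))
        for row, b_i in zip(A, b)
    ]
    return A_nlz, b_nlz


# x1 in [0, 2], x2 in [10, 20], constraint x1 + x2 <= 15.
A_nlz, b_nlz = normalize_dense(
    lower=[0.0, 10.0], upper=[2.0, 20.0], A=[[1.0, 1.0]], b=[15.0]
)
# In normalized coordinates the constraint reads 2 * z1 + 10 * z2 <= 5.
```

For instance, x = (1, 12) maps to z = (0.5, 0.2), and both forms leave the same slack of 2.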
- botorch.utils.sampling.get_polytope_samples(n, bounds, inequality_constraints=None, equality_constraints=None, seed=None, n_burnin=10000, n_thinning=32)[source]
Sample from polytope defined by box bounds and (in)equality constraints.
This uses a hit-and-run Markov chain sampler.
NOTE: Much of the functionality of this method has been moved into HitAndRunPolytopeSampler. If you want to repeatedly draw samples, you should use HitAndRunPolytopeSampler directly in order to avoid repeatedly running a burn-in of the chain. To do so, you need to convert the sparse constraint format that get_polytope_samples expects to the dense constraint format that HitAndRunPolytopeSampler expects. This can be done via the sparse_to_dense_constraints method (but remember to adjust the constraint from the Ax >= b format expected here to the Ax <= b format expected by PolytopeSampler by multiplying both A and b by -1).
NOTE: This method does not support the kind of “inter-point constraints” that are supported by optimize_acqf(). To achieve this behavior, you need to define the problem on the joint space over q points and impose the constraints on that joint space, see: https://github.com/meta-pytorch/botorch/issues/2468#issuecomment-2287706461
- Parameters:
n (int) – The number of samples.
bounds (Tensor) – A 2 x d-dim tensor containing the box bounds.
inequality_constraints (list[tuple[Tensor, Tensor, float]] | None) – A list of tuples (indices, coefficients, rhs), with indices and coefficients one-dimensional tensors and rhs a scalar, where each tuple encodes an inequality constraint of the form \sum_i (X[indices[i]] * coefficients[i]) >= rhs.
equality_constraints (list[tuple[Tensor, Tensor, float]] | None) – A list of tuples (indices, coefficients, rhs), with indices and coefficients one-dimensional tensors and rhs a scalar, where each tuple encodes an equality constraint of the form \sum_i (X[indices[i]] * coefficients[i]) = rhs.
seed (int | None) – The random seed.
n_burnin (int) – The number of burn-in samples for the Markov chain sampler.
n_thinning (int) – The amount of thinning. This function will return every n_thinning-th sample from the chain (after burn-in).
- Returns:
An n x d-dim tensor of samples.
- Return type:
Tensor
- botorch.utils.sampling.sparse_to_dense_constraints(d, constraints)[source]
Convert parameter constraints from a sparse format into a dense format.
This method converts sparse triples of the form (indices, coefficients, rhs) to constraints of the form Ax >= b or Ax = b.
- Parameters:
d (int) – The input dimension.
constraints (list[tuple[Tensor, Tensor, float]]) – A list of tuples (indices, coefficients, rhs), with indices and coefficients one-dimensional tensors and rhs a scalar, where each tuple encodes an (in)equality constraint of the form \sum_i (X[indices[i]] * coefficients[i]) >= rhs or \sum_i (X[indices[i]] * coefficients[i]) = rhs.
- Returns:
A two-element tuple containing
- A: An n_constraints x d-dim tensor of coefficients.
- b: An n_constraints x 1-dim tensor of right hand sides.
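The sparse-to-dense conversion, together with the sign flip needed to go from the Ax >= b convention to the Ax <= b convention, can be sketched with plain lists. Both helper names below are hypothetical, and lists stand in for tensors; this is an illustration of the data layout rather than BoTorch's code.

```python
def sparse_to_dense(d, constraints):
    """Expand (indices, coefficients, rhs) triples into dense rows.

    Hypothetical helper for illustration only (not part of BoTorch).
    """
    A, b = [], []
    for indices, coefficients, rhs in constraints:
        row = [0.0] * d
        for i, c in zip(indices, coefficients):
            row[i] = c
        A.append(row)
        b.append([rhs])  # keep b as an n_constraints x 1 column
    return A, b


def flip_inequality(A, b):
    """Turn A x >= b into the equivalent (-A) x <= (-b)."""
    return [[-a for a in row] for row in A], [[-v for v in col] for col in b]


# One constraint on a 3-dimensional input: x[0] - 2 * x[2] >= 0.5.
A_ge, b_ge = sparse_to_dense(3, [([0, 2], [1.0, -2.0], 0.5)])
A_le, b_le = flip_inequality(A_ge, b_ge)
```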
- botorch.utils.sampling.optimize_posterior_samples(paths, bounds, raw_samples=1024, num_restarts=20, sample_transform=None, return_transformed=False)[source]
Cheaply maximizes posterior samples by random querying followed by gradient-based optimization using SciPy’s L-BFGS-B routine.
- Parameters:
paths (GenericDeterministicModel) – Random Fourier Feature-based sample paths from the GP
bounds (Tensor) – The bounds on the search space.
raw_samples (int) – The number of samples with which to query the samples initially.
num_restarts (int) – The number of points selected for gradient-based optimization.
sample_transform (Callable[[Tensor], Tensor] | None) – A callable transform of the sample outputs (e.g. MCAcquisitionObjective or ScalarizedPosteriorTransform.evaluate) used to negate the objective or otherwise transform the output.
return_transformed (bool) – A boolean indicating whether to return the transformed or non-transformed samples.
- Returns:
A two-element tuple containing
- X_opt: A num_optima x [batch_size] x d-dim tensor of optimal inputs x*.
- f_opt: A num_optima x [batch_size] x m-dim, optionally num_optima x [batch_size] x 1-dim, tensor of optimal outputs f*.
- botorch.utils.sampling.boltzmann_sample(function_values, num_samples, eta, replacement=False, temp_decrease=0.5)[source]
Perform Boltzmann sampling from a set of function values, weighted by the exponentiated difference between function values and their standardized mean.
- Parameters:
function_values (Tensor) – A batch_shape x N tensor of function values.
num_samples (int) – The number of samples (restarts) to draw.
eta (float) – The Boltzmann temperature, controls the sharpness of the weighting. If the temperature is too high, causing NaN values, the eta parameter is successively decreased by ‘temp_decrease’.
replacement (bool) – If True, samples are drawn with replacement, allowing duplicates.
temp_decrease (float) – The rate at which temperature decreases in case of inf weights.
- Returns:
A batch_shape x num_samples tensor of indices of sampled positions.
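The following plain-Python sketch shows one way such a scheme can work for a single (unbatched) list of values. It assumes the weights are exp(eta * z), with z the standardized function values, and that eta is multiplied by temp_decrease whenever the exponentials overflow. The function names are hypothetical and the details may differ from BoTorch's implementation.

```python
import math
import random


def boltzmann_weights(values, eta, temp_decrease=0.5):
    """Return (weights, eta_used) with weights = exp(eta_used * z)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    z = [(v - mean) / std for v in values]
    while True:
        try:
            return [math.exp(eta * zi) for zi in z], eta
        except OverflowError:
            # Weights would be infinite: soften the weighting and retry.
            eta *= temp_decrease


def boltzmann_sample_sketch(values, num_samples, eta, replacement=False,
                            temp_decrease=0.5, seed=None):
    """Hypothetical sampler returning indices into `values`."""
    rng = random.Random(seed)
    weights, _ = boltzmann_weights(values, eta, temp_decrease)
    indices = list(range(len(values)))
    if replacement:
        return rng.choices(indices, weights=weights, k=num_samples)
    chosen = []
    for _ in range(num_samples):
        # Draw one index, then remove it so it cannot be drawn again.
        pick = rng.choices(range(len(indices)), weights=weights, k=1)[0]
        chosen.append(indices.pop(pick))
        weights.pop(pick)
    return chosen


vals = [0.0, 1.0, 2.0, 3.0]
w, eta_used = boltzmann_weights(vals, eta=1.0)
picked = boltzmann_sample_sketch(vals, num_samples=2, eta=1.0, seed=0)
_, eta_reduced = boltzmann_weights(vals, eta=1e6)
```

Larger values receive larger weights, and a very large eta is reduced until all weights are finite.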
- botorch.utils.sampling.sample_truncated_normal_perturbations(X, n_discrete_points, sigma, bounds, qmc=True)[source]
Sample points around X.
Sample perturbed points around X such that the added perturbations are sampled from N(0, sigma^2 I) and truncated to be within [0,1]^d.
- Parameters:
X (Tensor) – An n x d-dim tensor of starting points.
n_discrete_points (int) – The number of points to sample.
sigma (float) – The standard deviation of the additive gaussian noise for perturbing the points.
bounds (Tensor) – A 2 x d-dim tensor containing the bounds.
qmc (bool) – A boolean indicating whether to use qmc.
- Returns:
An n_discrete_points x d-dim tensor containing the sampled points.
- Return type:
Tensor
- botorch.utils.sampling.sample_perturbed_subset_dims(X, bounds, n_discrete_points, sigma=0.1, qmc=True, prob_perturb=None)[source]
Sample around X by perturbing a subset of the dimensions.
By default, dimensions are perturbed with probability equal to min(20 / d, 1). As shown in [Regis], perturbing a small number of dimensions can be beneficial. The perturbations are sampled from N(0, sigma^2 I) and truncated to be within [0,1]^d.
- Parameters:
X (Tensor) – An n x d-dim tensor of starting points. X must be normalized to be within [0, 1]^d.
bounds (Tensor) – The bounds to sample perturbed values from.
n_discrete_points (int) – The number of points to sample.
sigma (float) – The standard deviation of the additive gaussian noise for perturbing the points.
qmc (bool) – A boolean indicating whether to use qmc.
prob_perturb (float | None) – The probability of perturbing each dimension. If omitted, defaults to min(20 / d, 1).
- Returns:
An n_discrete_points x d-dim tensor containing the sampled points.
- Return type:
Tensor
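As an illustration of the subset-perturbation idea, the sketch below perturbs a single point in [0, 1]^d. Each dimension is selected with probability min(20 / d, 1), and selected coordinates are redrawn from a normal distribution centered at the current value, truncated to [0, 1] by simple rejection. The function name is hypothetical, and the rejection step is a stand-in chosen for brevity, not necessarily how BoTorch truncates.

```python
import random


def perturb_subset_dims(x, sigma=0.1, prob_perturb=None, seed=None):
    """Hypothetical single-point sketch; x must lie in [0, 1]^d."""
    rng = random.Random(seed)
    d = len(x)
    p = min(20 / d, 1.0) if prob_perturb is None else prob_perturb
    out = list(x)
    for j in range(d):
        if rng.random() < p:
            # Rejection-sample N(x[j], sigma^2) truncated to [0, 1].
            while True:
                candidate = rng.gauss(x[j], sigma)
                if 0.0 <= candidate <= 1.0:
                    out[j] = candidate
                    break
    return out


start = [0.5] * 50  # d = 50, so each dimension is perturbed with probability 0.4
perturbed = perturb_subset_dims(start, sigma=0.1, seed=0)
num_changed = sum(a != b for a, b in zip(start, perturbed))
```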
Testing
- botorch.utils.testing.skip_if_import_error(func)[source]
- Parameters:
func (Callable)
- Return type:
Callable
- botorch.utils.testing.sample_random_feasible(f, dtype, device)[source]
Sample random feasible point for the given test function.
- Parameters:
f (BaseTestProblem) – The test function instance.
dtype (dtype) – The dtype of the random point.
device (device) – The device of the random point.
- Returns:
A random feasible point of shape
1 x f.dim.- Return type:
Tensor
- class botorch.utils.testing.BotorchTestCase(methodName='runTest')[source]
Bases:
TestCase
Basic test case for Botorch.
This
- sets the default device to be torch.device("cpu")
- ensures that no warnings are suppressed by default.
Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.
- device = device(type='cpu')
- setUp(suppress_input_warnings=True)[source]
Set up the test case.
- Parameters:
suppress_input_warnings (bool) – If True, suppress common input warnings (see below).
- Return type:
None
- assertAllClose(input, other, rtol=1e-05, atol=1e-08, equal_nan=False)[source]
Assert that two tensors are close.
Calls torch.testing.assert_close, using the signature and default behavior of torch.allclose.
The formula asserted is abs(input - other) <= atol + rtol * abs(other).
- Parameters:
input (Any) – First tensor or tensor-or-scalar-like to compare
other (Any) – Second tensor or tensor-or-scalar-like to compare
rtol (float) – Relative tolerance
atol (float) – Absolute tolerance
equal_nan (bool) – If True, consider NaN values as equal
- Return type:
None
- Example output:
AssertionError: Scalars are not close!
Absolute difference: 1.0000034868717194 (up to 0.0001 allowed) Relative difference: 0.8348668001940709 (up to 1e-05 allowed)
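The asserted formula can be checked by hand with a few lines of plain Python. The sketch below applies it elementwise to two flat lists; the helper name `all_close` is hypothetical, and `actual` and `expected` play the roles of `input` and `other`.

```python
def all_close(actual, expected, rtol=1e-05, atol=1e-08):
    """Elementwise check of abs(actual - expected) <= atol + rtol * abs(expected).

    Hypothetical helper for illustration only (not part of BoTorch).
    """
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


close = all_close([1.0, 2.0], [1.0 + 1e-9, 2.0])
not_close = all_close([1.0], [2.0])
```

Note that the tolerance scales with abs(other), so the check is not symmetric in its two arguments.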
- class botorch.utils.testing.BaseTestProblemTestCaseMixIn[source]
Bases:
object
Mixin for testing BaseTestProblem (functions) implementations.
- test_forward_and_evaluate_true()[source]
Run every BaseTestProblem in self.functions on random inputs. Runs both forward and evaluate_true.
- abstract property functions: Sequence[BaseTestProblem]
The functions that should be tested.
Typically defined as a class attribute on the test case subclassing this class.
- class botorch.utils.testing.SyntheticTestFunctionTestCaseMixin[source]
Bases:
object
Mixin for testing synthetic BaseTestProblem aka test functions.
- test_optimal_value()[source]
Test that a function’s optimal_value is correctly computed, and defined if it should be.
- test_optimizer()[source]
Test that optimizers are correctly computed and the optimizer value is better than the function value at some random point.
- abstract property functions: Sequence[BaseTestProblem]
The functions that should be tested.
Typically defined as a class attribute on the test case subclassing this class.
- class botorch.utils.testing.MultiObjectiveTestProblemTestCaseMixin[source]
Bases:
object
Mixin for testing multi-objective test problems.
This class provides test cases for attributes, maximum hypervolume, and reference points of multi-objective test problems.
- test_ref_point()[source]
Test the reference point (ref_point) attribute for each function (for each dtype).
- abstract property functions: Sequence[BaseTestProblem]
The functions that should be tested.
Typically defined as a class attribute on the test case subclassing this class.
- class botorch.utils.testing.ConstrainedTestProblemTestCaseMixin[source]
Bases:
object
Mixin for testing constrained test problems.
This class provides test cases for attributes and methods of constrained test problems, including testing the number of constraints and the evaluation of constraint slack.
- test_evaluate_slack()[source]
Test the evaluate_slack method for each function.
This test verifies that:
1. The evaluate_slack_true and evaluate_slack methods return tensors of the expected shape.
2. The relationship between evaluate_slack and evaluate_slack_true is consistent with the constraint_noise_std attribute of the function.
- test_worst_feasible_value()[source]
Test that a function’s worst_feasible_value is correctly computed, and defined if it should be.
- abstract property functions: Sequence[BaseTestProblem]
The functions that should be tested.
Typically defined as a class attribute on the test case subclassing this class.
- class botorch.utils.testing.TestCorruptedProblemsMixin(methodName='runTest')[source]
Bases:
BotorchTestCase
Mixin for testing corrupted test problems.
This class provides setup and utility functions for testing corrupted test problems using a specified outlier generator and a Rosenbrock problem.
Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.
- class botorch.utils.testing.MockPosterior(mean=None, variance=None, samples=None, base_shape=None, batch_range=None)[source]
Bases:
Posterior
This class is used to simulate a posterior with specified mean, variance, and samples.
Everything is deterministic in this class.
Initialize the MockPosterior with specified attributes.
- Parameters:
mean (torch.Tensor | None) – The mean of the posterior.
variance (torch.Tensor | None) – The variance of the posterior.
samples (torch.Tensor | None) – Samples to return from rsample, unless base_samples is provided.
base_shape (torch.Size | None) – If given, this is returned as base_sample_shape, and also used as the base of the _extended_shape.
batch_range (tuple[int, int] | None) – If given, this is returned as batch_range. Defaults to (0, -2).
- property device: device
Return the device of the posterior.
- property dtype: dtype
Return the data type of the posterior.
- property batch_shape: Size
Return the batch shape of the posterior.
- property base_sample_shape: Size
Return the base sample shape of the posterior.
- property batch_range: tuple[int, int]
Return the batch range of the posterior.
- property mean
Return the mean of the posterior.
- property variance
Return the variance of the posterior.
- rsample(sample_shape=None)[source]
Return mock samples by extending the shape of the initially specified samples.
- Parameters:
sample_shape (Size | None) – The shape of the samples to generate.
- Returns:
A tensor of samples with the specified shape.
- Return type:
Tensor
- rsample_from_base_samples(sample_shape, base_samples)[source]
Sample from the posterior (with gradients) using base samples.
This is intended to be used with a sampler that produces the corresponding base samples, and enables acquisition optimization via Sample Average Approximation.
- Parameters:
sample_shape (Size) – A torch.Size object specifying the sample shape. To draw n samples, set to torch.Size([n]). To draw b batches of n samples each, set to torch.Size([b, n]).
base_samples (Tensor) – The base samples, obtained from the appropriate sampler. This is a tensor of shape sample_shape x base_sample_shape.
- Returns:
Samples from the posterior, a tensor of shape self._extended_shape(sample_shape=sample_shape).
- Return type:
Tensor
- botorch.utils.testing.get_sampler_mock(posterior, sample_shape, **kwargs)[source]
Get a StochasticSampler with the specified sample_shape.
- Parameters:
posterior (MockPosterior) – Used only for dispatching so that get_sampler works with a MockPosterior.
sample_shape (Size) – The shape of the samples to generate.
kwargs (Any) – Passed to StochasticSampler.
- Returns:
A StochasticSampler for the mock posterior.
- Return type:
- class botorch.utils.testing.MockModel(posterior)[source]
Bases:
Model, FantasizeMixin
Mock Model that implements dummy methods and feeds through specified outputs.
Its posterior is a MockPosterior.
Initialize the MockModel with a specified posterior.
- Parameters:
posterior (MockPosterior) – The mock posterior to use for the model.
- posterior(X, output_indices=None, posterior_transform=None, observation_noise=False)[source]
Return the posterior of the model.
- Parameters:
X (Tensor) – Ignored; present for compatibility with super class.
output_indices (list[int] | None) – Ignored; present for compatibility with super class.
posterior_transform (PosteriorTransform | None) – Optional.
observation_noise (bool | Tensor) – Ignored; present for compatibility with super class.
- Returns:
The posterior of the model, possibly transformed.
- Return type:
- property num_outputs: int
Return the number of outputs of the model.
- property batch_shape: Size
Return the batch shape of the model.
- load_state_dict(state_dict=None, strict=False)[source]
Dummy method, has no effect.
- Parameters:
state_dict (OrderedDict | None) – The state dictionary to load.
strict (bool) – Whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict function.
- Return type:
None
- class botorch.utils.testing.MockAcquisitionFunction[source]
Bases:
object
Mock acquisition function object that implements dummy methods.
Initialize the MockAcquisitionFunction. This function does not really do anything, but it takes an input of shape (b,q,d) and returns a tensor of shape (b,).
- botorch.utils.testing.get_random_data(batch_shape, m, d=1, n=10, **tkwargs)[source]
Generate random data for testing purposes.
- Parameters:
batch_shape (Size) – The batch shape of the data.
m (int) – The number of outputs.
d (int) – The dimension of the input.
n (int) – The number of data points.
tkwargs – device and dtype tensor constructor kwargs.
- Returns:
A tuple (train_X, train_Y) with randomly generated training data.
- Return type:
tuple[Tensor, Tensor]
- botorch.utils.testing.get_test_posterior(batch_shape, q=1, m=1, interleaved=True, lazy=False, independent=False, **tkwargs)[source]
Generate a Posterior for testing purposes.
- Parameters:
batch_shape (Size) – The batch shape of the data.
q (int) – The number of candidates
m (int) – The number of outputs.
interleaved (bool) – A boolean indicating the format of the MultitaskMultivariateNormal
lazy (bool) – A boolean indicating if the posterior should be lazy
independent (bool) – A boolean indicating whether the outputs are independent
tkwargs – device and dtype tensor constructor kwargs.
- Return type:
- botorch.utils.testing.get_max_violation_of_bounds(samples, bounds)[source]
The maximum value by which samples lie outside bounds.
A negative value indicates that all samples lie within bounds.
- Parameters:
samples (Tensor) – An n x q x d-dimension tensor, as might be returned from sample_q_batches_from_polytope.
bounds (Tensor) – A 2 x d tensor of lower and upper bounds for each column.
- Return type:
float
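The sign convention (negative means every sample is inside the box) follows from taking the largest of `lower - x` and `x - upper` over all coordinates. The plain-Python sketch below shows this for a flat list of points rather than an n x q x d tensor; the function name is hypothetical and this is not BoTorch's code.

```python
import math


def max_violation_of_bounds(samples, lower, upper):
    """Largest amount by which any coordinate lies outside [lower, upper].

    Hypothetical helper for illustration only (not part of BoTorch).
    """
    worst = -math.inf
    for x in samples:
        for xi, lo, hi in zip(x, lower, upper):
            # Positive if xi is below lo or above hi, negative otherwise.
            worst = max(worst, lo - xi, xi - hi)
    return worst


inside = max_violation_of_bounds([[0.2, 0.5], [0.9, 0.1]], [0.0, 0.0], [1.0, 1.0])
outside = max_violation_of_bounds([[1.5, 0.5]], [0.0, 0.0], [1.0, 1.0])
```

Here `inside` is negative (about -0.1, the distance of the closest coordinate to its bound) and `outside` is 0.5.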
- botorch.utils.testing.get_max_violation_of_constraints(samples, constraints, equality)[source]
Amount by which the given constraints are not obeyed.
- Parameters:
samples (Tensor) – An n x q x d-dimension tensor, as might be returned from sample_q_batches_from_polytope.
constraints (list[tuple[Tensor, Tensor, float]] | None) – A list of tuples (indices, coefficients, rhs), with each tuple encoding a constraint of the form \sum_i (X[indices[i]] * coefficients[i]) = rhs, or >= rhs if equality is False.
equality (bool) – Whether these are equality constraints (not inequality).
- Return type:
float
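A plain-Python sketch of such a check is given below. It assumes that an equality violation is measured as the absolute residual and an inequality violation as rhs minus the left-hand side (positive when the >= constraint fails); these conventions and the function name are assumptions for illustration, not confirmed details of BoTorch's helper.

```python
import math


def max_violation_of_constraints(samples, constraints, equality):
    """Largest constraint violation over all points and constraints.

    Hypothetical helper for illustration only (not part of BoTorch).
    """
    worst = -math.inf
    for x in samples:
        for indices, coefficients, rhs in constraints or []:
            lhs = sum(x[i] * c for i, c in zip(indices, coefficients))
            # Equality: distance from rhs. Inequality (>=): shortfall below rhs.
            violation = abs(lhs - rhs) if equality else rhs - lhs
            worst = max(worst, violation)
    return worst


sum_to_one = [([0, 1], [1.0, 1.0], 1.0)]  # x[0] + x[1] (>)= 1
ineq_violated = max_violation_of_constraints([[0.25, 0.5]], sum_to_one, equality=False)
ineq_satisfied = max_violation_of_constraints([[0.75, 0.5]], sum_to_one, equality=False)
eq_residual = max_violation_of_constraints([[0.75, 0.5]], sum_to_one, equality=True)
```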
Test Helpers
Dummy classes and other helpers that are used in multiple test files should be defined here to avoid relative imports.
- botorch.utils.test_helpers.get_model(train_X, train_Y, standardize_model=False, use_model_list=False, *, train_Yvar=None)[source]
- Parameters:
train_X (Tensor)
train_Y (Tensor)
standardize_model (bool)
use_model_list (bool)
train_Yvar (Tensor | None)
- Return type:
- botorch.utils.test_helpers.get_fully_bayesian_model(train_X, train_Y, num_models, standardize_model=False, infer_noise=True, **tkwargs)[source]
- Parameters:
train_X (Tensor)
train_Y (Tensor)
num_models (int)
standardize_model (bool)
infer_noise (bool)
tkwargs (Any)
- Return type:
- botorch.utils.test_helpers.get_fully_bayesian_model_list(train_X, train_Y, num_models, standardize_model, infer_noise, **tkwargs)[source]
- Parameters:
train_X (Tensor)
train_Y (Tensor)
num_models (int)
standardize_model (bool)
infer_noise (bool)
tkwargs (Any)
- Return type:
- botorch.utils.test_helpers.get_sample_moments(samples, sample_shape)[source]
Computes the mean and covariance of a set of samples.
- Parameters:
samples (Tensor) – A tensor of shape sample_shape x batch_shape x q.
sample_shape (Size) – The sample_shape input used while generating the samples using the pathwise sampling API.
- Return type:
tuple[Tensor, Tensor]
- botorch.utils.test_helpers.standardize_moments(transform, loc, covariance_matrix)[source]
Standardizes the loc and covariance_matrix using the mean and standard deviations from a Standardize transform.
- Parameters:
transform (Standardize)
loc (Tensor)
covariance_matrix (Tensor)
- Return type:
tuple[Tensor, Tensor]
- botorch.utils.test_helpers.gen_multi_task_dataset(yvar=None, task_values=None, skip_task_features_in_datasets=False, **tkwargs)[source]
Constructs a multi-task dataset with two tasks, each with 10 data points.
- Parameters:
yvar (float | None) – The noise level to use for train_Yvar. If None, uses train_Yvar=None.
task_values (list[int] | None) – The values of the task features. If None, uses [0, 1].
skip_task_features_in_datasets (bool) – If True, the task features are not included in Xs of the datasets used to construct the datasets. This is useful for testing MultiTaskDataset.
- Return type:
tuple[MultiTaskDataset, tuple[Tensor, Tensor, Tensor | None]]
- botorch.utils.test_helpers.get_pvar_expected(posterior, model, X, m)[source]
Computes the expected variance of a posterior after adding the predictive noise from the likelihood.
- Parameters:
posterior (TorchPosterior) – The posterior to compute the variance of. Must be a TorchPosterior object.
model (Model) – The model that generated the posterior. If m > 1, this must be a BatchedMultiOutputGPyTorchModel.
X (Tensor) – The test inputs.
m (int) – The number of outputs.
- Returns:
The expected variance of the posterior after adding the observation noise from the likelihood.
- Return type:
Tensor
- class botorch.utils.test_helpers.DummyNonScalarizingPosteriorTransform(*args, **kwargs)[source]
Bases:
PosteriorTransform
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
args (Any)
kwargs (Any)
- scalarize = False
- evaluate(Y, X=None)[source]
Evaluate the transform on a set of outcomes.
- Parameters:
Y (Tensor) – A batch_shape x q x m-dim tensor of outcomes.
X (Tensor | None) – A batch_shape x q x d-dim tensor of inputs. Relevant only if the transform depends on the inputs explicitly.
- Returns:
A batch_shape x q' [x m']-dim tensor of transformed outcomes.
- Return type:
Tensor
- class botorch.utils.test_helpers.SimpleGPyTorchModel(train_X, train_Y, outcome_transform=None, input_transform=None)[source]
Bases:
GPyTorchModel, ExactGP, FantasizeMixin
- Parameters:
train_X – A tensor of inputs, passed to self.transform_inputs.
train_Y – Passed to outcome_transform.
outcome_transform – Transform applied to train_Y.
input_transform – A Module that performs the input transformation, passed to self.transform_inputs.
- last_fantasize_flag: bool = False
- forward(x)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
Torch
- class botorch.utils.torch.BufferDict(buffers=None)[source]
Bases:
Module
Holds buffers in a dictionary.
BufferDict can be indexed like a regular Python dictionary, but buffers it contains are properly registered, and will be visible by all Module methods.
BufferDict is an ordered dictionary that respects the order of insertion, and, in update(), the order of the merged OrderedDict or another BufferDict (the argument to update()).
Note that update() with other unordered mapping types (e.g., Python’s plain dict) does not preserve the order of the merged mapping.
- Parameters:
buffers (iterable, optional) – a mapping (dictionary) of (string : Tensor) or an iterable of key-value pairs of type (string, Tensor)
Example:
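import torch
from torch import nn
from botorch.utils.torch import BufferDict

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Named buffer_dict because nn.Module.buffers is already a method.
        self.buffer_dict = BufferDict({
            'left': torch.randn(5, 10),
            'right': torch.randn(5, 10)
        })

    def forward(self, x, choice):
        # Select a registered buffer by key and multiply it with the input.
        x = self.buffer_dict[choice].mm(x)
        return x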
- Parameters:
buffers – A mapping (dictionary) from string to Tensor, or an iterable of key-value pairs of type (string, Tensor).
- pop(key)[source]
Remove key from the BufferDict and return its buffer.
- Parameters:
key (string) – key to pop from the BufferDict
- update(buffers)[source]
Update the BufferDict with the key-value pairs from a mapping or an iterable, overwriting existing keys.
Note
If buffers is an OrderedDict, a BufferDict, or an iterable of key-value pairs, the order of new elements in it is preserved.
- Parameters:
buffers (iterable) – a mapping (dictionary) from string to Tensor, or an iterable of key-value pairs of type (string, Tensor)
Transformations
Some basic data transformation helpers.
- botorch.utils.transforms.standardize(Y)[source]
Standardizes (zero mean, unit variance) a tensor by dim=-2.
If the tensor is single-dimensional, simply standardizes the tensor. If for some batch index all elements are equal (or if there is only a single data point), this function will return 0 for that batch index.
- Parameters:
Y (Tensor) – A batch_shape x n x m-dim tensor.
- Returns:
The standardized Y.
- Return type:
Tensor
Example
>>> Y = torch.rand(4, 3)
>>> Y_standardized = standardize(Y)
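The constant-column rule described above can be illustrated without tensors. The following is a minimal pure-Python sketch for a single column, not BoTorch's implementation; the helper name standardize_column and the min_std threshold are assumptions made for illustration.

```python
import statistics


def standardize_column(values, min_std=1e-9):
    """Standardize one column of outcomes to zero mean and unit variance."""
    mean = statistics.fmean(values)
    # The sample standard deviation is undefined for a single data point.
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    # Assumed fallback: use a unit scale so constant columns map to zeros, not NaN.
    scale = std if std >= min_std else 1.0
    return [(v - mean) / scale for v in values]


print(standardize_column([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
print(standardize_column([5.0, 5.0, 5.0]))  # [0.0, 0.0, 0.0]
print(standardize_column([7.0]))            # [0.0]
```

Falling back to a unit scale keeps the output finite, which matches the documented behavior of returning 0 when all elements are equal or only one data point is present.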
- botorch.utils.transforms.normalize(X, bounds, update_constant_bounds=True)[source]
Min-max normalize X w.r.t. the provided bounds.
- Parameters:
X (Tensor) – ... x d tensor of data
bounds (Tensor) – 2 x d tensor of lower and upper bounds for each of the X’s d columns.
update_constant_bounds (bool) – If True, update the constant bounds in order to avoid division by zero issues. When the upper and lower bounds are identical for a dimension, that dimension will not be scaled. Such dimensions will only be shifted as new_X[..., i] = X[..., i] - bounds[0, i].
- Returns:
A ... x d-dim tensor of normalized data, given by (X - bounds[0]) / (bounds[1] - bounds[0]). If all elements of X are contained within bounds, the normalized values will be contained within [0, 1]^d.
- Return type:
Tensor
Example
>>> X = torch.rand(4, 3)
>>> bounds = torch.stack([torch.zeros(3), 0.5 * torch.ones(3)])
>>> X_normalized = normalize(X, bounds)
- botorch.utils.transforms.unnormalize(X, bounds, update_constant_bounds=True)[source]
Un-normalizes X w.r.t. the provided bounds.
- Parameters:
X (Tensor) – ... x d tensor of data
bounds (Tensor) – 2 x d tensor of lower and upper bounds for each of the X’s d columns.
update_constant_bounds (bool) – If True, update the constant bounds in order to avoid division by zero issues. When the upper and lower bounds are identical for a dimension, that dimension will not be scaled. Such dimensions will only be shifted as new_X[..., i] = X[..., i] + bounds[0, i]. This is the inverse of the behavior of normalize when update_constant_bounds=True.
- Returns:
A ... x d-dim tensor of unnormalized data, given by X * (bounds[1] - bounds[0]) + bounds[0]. If all elements of X are contained in [0, 1]^d, the un-normalized values will be contained within bounds.
- Return type:
Tensor
Example
>>> X_normalized = torch.rand(4, 3)
>>> bounds = torch.stack([torch.zeros(3), 0.5 * torch.ones(3)])
>>> X = unnormalize(X_normalized, bounds)
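The two formulas, together with the shift-only rule for constant bounds, can be checked by hand on a single row. This is a plain-Python sketch of the documented arithmetic rather than the library code; normalize_row and unnormalize_row are hypothetical helper names.

```python
def normalize_row(x, lower, upper):
    """Min-max normalize one row; dimensions with equal bounds are only shifted."""
    out = []
    for xi, lo, hi in zip(x, lower, upper):
        width = hi - lo
        out.append((xi - lo) / width if width != 0 else xi - lo)
    return out


def unnormalize_row(z, lower, upper):
    """Invert normalize_row; dimensions with equal bounds are only shifted back."""
    out = []
    for zi, lo, hi in zip(z, lower, upper):
        width = hi - lo
        out.append(zi * width + lo if width != 0 else zi + lo)
    return out


lower, upper = [0.0, 2.0, 1.0], [0.5, 6.0, 1.0]  # third dimension is constant
x = [0.25, 3.0, 1.0]
z = normalize_row(x, lower, upper)
print(z)                                 # [0.5, 0.25, 0.0]
print(unnormalize_row(z, lower, upper))  # [0.25, 3.0, 1.0]
```

The round trip returns the original row, including the constant third dimension, which is the inverse relationship the unnormalize entry describes.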
- botorch.utils.transforms.normalize_indices(indices, d)[source]
Normalize a list of indices to ensure that they are non-negative.
- Parameters:
indices (list[int] | None) – A list of indices (may contain negative indices for indexing “from the back”).
d (int) – The dimension of the tensor to index.
- Returns:
A normalized list of indices such that each index is between 0 and d-1, or None if indices is None.
- Return type:
list[int] | None
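As an illustration of the mapping described above, here is a small sketch in plain Python. The name normalize_indices_sketch is hypothetical, and raising ValueError for out-of-range indices is an assumption of this sketch rather than documented behavior.

```python
def normalize_indices_sketch(indices, d):
    """Map possibly negative indices into the range 0..d-1."""
    if indices is None:
        return None
    normalized = []
    for i in indices:
        # Negative indices count from the back, as in Python sequence indexing.
        j = i + d if i < 0 else i
        if not 0 <= j < d:
            # Assumed error handling for indices that fall outside the tensor.
            raise ValueError(f"Index {i} is out of bounds for dimension {d}.")
        normalized.append(j)
    return normalized


print(normalize_indices_sketch([-1, 0, -3], d=3))  # [2, 0, 0]
print(normalize_indices_sketch(None, d=3))         # None
```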
- botorch.utils.transforms.is_fully_bayesian(model)[source]
Check if at least one model is a fully Bayesian model.
- Parameters:
model (Model) – A BoTorch model (may be a ModelList or ModelListGP)
- Returns:
True if at least one model is a fully Bayesian model.
- Return type:
bool
- botorch.utils.transforms.is_ensemble(model)[source]
Check if at least one model is an ensemble model.
- Parameters:
model (Model) – A BoTorch model (may be a ModelList or ModelListGP)
- Returns:
True if at least one model is an ensemble model.
- Return type:
bool
- botorch.utils.transforms.t_batch_mode_transform(expected_q=None, assert_output_shape=True)[source]
Factory for decorators enabling consistent t-batch behavior.
This method creates decorators for instance methods to transform an input tensor X to t-batch mode (i.e. with at least 3 dimensions). This assumes the tensor has a q-batch dimension. The decorator also checks the q-batch size if expected_q is provided, and the output shape if assert_output_shape is True.
- Parameters:
expected_q (int | None) – The expected q-batch size of X. If specified, this will raise an AssertionError if X’s q-batch size does not equal expected_q.
assert_output_shape (bool) – If True, this will raise an AssertionError if the output shape does not match either the t-batch shape of X, or the acqf.model.batch_shape for acquisition functions using batched models.
- Returns:
The decorated instance method.
- Return type:
Callable[[Callable[[AcquisitionFunction, Any], Any]], Callable[[AcquisitionFunction, Any], Any]]
Example
>>> class ExampleClass:
>>>     @t_batch_mode_transform(expected_q=1)
>>>     def single_q_method(self, X):
>>>         ...
>>>
>>>     @t_batch_mode_transform()
>>>     def arbitrary_q_method(self, X):
>>>         ...
- botorch.utils.transforms.average_over_ensemble_models(method)[source]
Decorator for averaging acquisition values over ensemble models.
For example, if the model is an ensemble, i.e. is_ensemble(model) == True as for a SAAS model, the acquisition value is averaged over the samples in the ensemble.
NOTE: If the class has a _log attribute, the acquisition value is averaged using logmeanexp instead of mean, so that the log of the averaged acquisition value is computed in a numerically stable way.
- Parameters:
method (Callable[[AcquisitionFunction, Any], Any]) – The method to be decorated, usually forward.
- Returns:
The decorated method.
- Return type:
Callable[[AcquisitionFunction, Any], Any]
Example
>>> # Without decorator, forward returns a
>>> # ``batch_shape x ensemble_shape`` tensor
>>> class SimpleAcquisition:
...     def forward(self, X):
...         samples, obj = self._get_samples_and_objectives(X)
...         # shape is ``sample_shape x batch_shape x ensemble_shape x q``
...         sample_acqvals = self._sample_forward(obj)
...         # return shape is ``batch_shape x ensemble_shape``
...         return sample_acqvals.mean(dim=0).max(dim=-1).values
...
>>> # With decorator, forward returns a ``batch_shape``-dim tensor
>>> class EnsembleAcquisition:
...     @average_over_ensemble_models
...     def forward(self, X):
...         ...  # same as above
...         # return shape through decorator is ``batch_shape``
...         return sample_acqvals.mean(dim=0).max(dim=-1).values
- botorch.utils.transforms.concatenate_pending_points(method)[source]
Decorator concatenating X_pending into an acquisition function’s argument.
This decorator works on the forward method of acquisition functions taking a tensor X as the argument. If the acquisition function has an X_pending attribute (that is not None), this is concatenated into the input X, appropriately expanding the pending points to match the batch shape of X.
Example
>>> class ExampleAcquisitionFunction:
>>>     @concatenate_pending_points
>>>     @t_batch_mode_transform()
>>>     def forward(self, X):
>>>         ...
- Parameters:
method (Callable[[Any, Tensor], Any])
- Return type:
Callable[[Any, Tensor], Any]
- botorch.utils.transforms.match_batch_shape(X, Y)[source]
Matches the batch dimension of a tensor to that of another tensor.
- Parameters:
X (Tensor) – A batch_shape_X x q x d tensor, whose batch dimensions that correspond to batch dimensions of Y are to be matched to those (if compatible).
Y (Tensor) – A batch_shape_Y x q' x d tensor.
- Returns:
A batch_shape_Y x q x d tensor containing the data of X expanded to the batch dimensions of Y (if compatible). For instance, if X is b'' x b' x q x d and Y is b x q x d, then the returned tensor is b'' x b x q x d.
- Return type:
Tensor
Example
>>> X = torch.rand(2, 1, 5, 3)
>>> Y = torch.rand(2, 6, 4, 3)
>>> X_matched = match_batch_shape(X, Y)
>>> X_matched.shape
torch.Size([2, 6, 5, 3])
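The shape rule in the Returns description can be written out on plain tuples. This sketch only computes the target shape and does not check that the batch dimensions are actually compatible for expansion; matched_shape is a hypothetical helper, not part of the library.

```python
def matched_shape(x_shape, y_shape):
    """Shape of X after its batch dimensions are matched to those of Y."""
    # Leading batch dimensions of X that Y does not have are kept unchanged.
    extra = tuple(x_shape[: -len(y_shape)])
    # Remaining batch dimensions come from Y; the trailing q x d come from X.
    return extra + tuple(y_shape[:-2]) + tuple(x_shape[-2:])


print(matched_shape((2, 1, 5, 3), (2, 6, 4, 3)))  # (2, 6, 5, 3)
print(matched_shape((7, 1, 5, 3), (6, 4, 3)))     # (7, 6, 5, 3)
```

The second call mirrors the b'' x b' x q x d case from the description, with b'' = 7, b' = 1 and b = 6.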
Feasible Volume
- botorch.utils.feasible_volume.get_feasible_samples(samples, inequality_constraints=None)[source]
Checks which of the samples satisfy all of the inequality constraints.
- Parameters:
samples (Tensor) – A sample_size x d tensor of feature samples, where d is the feature dimension.
inequality_constraints (list[tuple[Tensor, Tensor, float]] | None) – A list of tuples (indices, coefficients, rhs), with each tuple encoding an inequality constraint of the form \sum_i (X[indices[i]] * coefficients[i]) >= rhs.
- Returns:
2-element tuple containing
- Samples satisfying the linear constraints.
- Estimated proportion of samples satisfying the linear constraints.
- Return type:
tuple[Tensor, float]
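To make the constraint encoding concrete, the following sketch filters a list of samples against (indices, coefficients, rhs) tuples in plain Python. It mirrors the documented form of the constraints but is not the library's tensor implementation; feasible_samples is a hypothetical name.

```python
def feasible_samples(samples, inequality_constraints):
    """Return the samples satisfying every constraint and their proportion."""
    feasible = []
    for x in samples:
        satisfied = all(
            sum(x[i] * c for i, c in zip(indices, coefficients)) >= rhs
            for indices, coefficients, rhs in inequality_constraints
        )
        if satisfied:
            feasible.append(x)
    return feasible, len(feasible) / len(samples)


samples = [[0.2, 0.9], [0.8, 0.1], [0.6, 0.7], [0.1, 0.3]]
# One constraint: 1.0 * x[0] + 1.0 * x[1] >= 1.0
constraints = [([0, 1], [1.0, 1.0], 1.0)]
kept, proportion = feasible_samples(samples, constraints)
print(kept)        # [[0.2, 0.9], [0.6, 0.7]]
print(proportion)  # 0.5
```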
- botorch.utils.feasible_volume.get_outcome_feasibility_probability(model, X, outcome_constraints, threshold=0.1, nsample_outcome=1000, seed=None)[source]
Monte Carlo estimate of the feasible volume with respect to the outcome constraints.
- Parameters:
model (Model) – The model used for sampling the posterior.
X (Tensor) – A tensor of dimension batch-shape x 1 x d, where d is feature dimension.
outcome_constraints (list[Callable[[Tensor], Tensor]]) – A list of callables, each mapping a Tensor of dimension sample_shape x batch-shape x q x m to a Tensor of dimension sample_shape x batch-shape x q, where negative values imply feasibility.
threshold (float) – A lower limit for the probability of posterior samples feasibility.
nsample_outcome (int) – The number of samples from the model posterior.
seed (int | None) – The seed for the posterior sampler. If omitted, use a random seed.
- Returns:
Estimated proportion of features for which posterior samples satisfy given outcome constraints with probability above or equal to the given threshold.
- Return type:
float
- botorch.utils.feasible_volume.estimate_feasible_volume(bounds, model, outcome_constraints, inequality_constraints=None, nsample_feature=1000, nsample_outcome=1000, threshold=0.1, verbose=False, seed=None, device=None, dtype=None)[source]
Monte Carlo estimate of the feasible volume with respect to feature constraints and outcome constraints.
- Parameters:
bounds (Tensor) – A 2 x d tensor of lower and upper bounds for each column of X.
model (Model) – The model used for sampling the outcomes.
outcome_constraints (list[Callable[[Tensor], Tensor]]) – A list of callables, each mapping a Tensor of dimension sample_shape x batch-shape x q x m to a Tensor of dimension sample_shape x batch-shape x q, where negative values imply feasibility.
inequality_constraints (list[tuple[Tensor, Tensor, float]] | None) – A list of tuples (indices, coefficients, rhs), with each tuple encoding an inequality constraint of the form \sum_i (X[indices[i]] * coefficients[i]) >= rhs.
nsample_feature (int) – The number of feature samples satisfying the bounds.
nsample_outcome (int) – The number of outcome samples from the model posterior.
threshold (float) – A lower limit for the probability of outcome feasibility.
seed (int | None) – The seed for both feature and outcome samplers. If omitted, use a random seed.
verbose (bool) – An indicator for whether to log the results.
device (device | None)
dtype (dtype | None)
- Returns:
2-element tuple containing
- Estimated proportion of volume in feature space that is feasible wrt the bounds and the inequality constraints (linear).
- Estimated proportion of feasible features for which posterior samples (outcome) satisfy the outcome constraints with probability above the given threshold.
JAX Utilities
Utilities for converting between PyTorch tensors and JAX arrays.
Types and Type Hints
- class botorch.utils.types.DEFAULT
Bases:
object
Constants
- botorch.utils.constants.get_constants(values, device=None, dtype=None)[source]
Returns scalar-valued Tensors containing each of the given constants. Used to expedite tensor operations involving scalar arithmetic. Note that the returned Tensors should not be modified in-place.
- Parameters:
values (Number | Iterator[Number])
device (device | None)
dtype (dtype | None)
- Return type:
Tensor | tuple[Tensor, …]
Safe Math
Special implementations of mathematical functions that solve numerical issues of naive implementations.
- botorch.utils.safe_math.add(a, b, **kwargs)[source]
- Parameters:
a (Tensor)
b (Tensor)
- Return type:
Tensor
- botorch.utils.safe_math.log1mexp(x)[source]
Numerically accurate evaluation of log(1 - exp(x)) for x < 0. See [Maechler2012accurate] for details.
- Parameters:
x (Tensor)
- Return type:
Tensor
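The need for a special implementation shows up in scalar arithmetic: for x very close to 0, exp(x) rounds to 1 and the naive formula evaluates log(0). Below is a scalar sketch of the two-branch scheme from the cited reference, using only Python's math module; it illustrates the idea and is not the tensor implementation.

```python
import math


def log1mexp(x):
    """Evaluate log(1 - exp(x)) for x < 0 without catastrophic cancellation."""
    if x >= 0:
        raise ValueError("log1mexp requires x < 0")
    if x > -math.log(2.0):
        # Near zero, expm1 keeps the small difference 1 - exp(x) accurate.
        return math.log(-math.expm1(x))
    # Far from zero, exp(x) is tiny and log1p keeps it from being lost.
    return math.log1p(-math.exp(x))


print(log1mexp(-1.0))    # agrees with math.log(1 - math.exp(-1.0))
print(log1mexp(-1e-20))  # finite, whereas the naive formula evaluates log(0)
```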
- botorch.utils.safe_math.log1pexp(x)[source]
Numerically accurate evaluation of log(1 + exp(x)). See [Maechler2012accurate] for details.
- Parameters:
x (Tensor)
- Return type:
Tensor
- botorch.utils.safe_math.logexpit(X)[source]
Computes the logarithm of the expit (a.k.a. sigmoid) function.
- Parameters:
X (Tensor)
- Return type:
Tensor
- botorch.utils.safe_math.logplusexp(a, b)[source]
Computes log(exp(a) + exp(b)) similar to logsumexp.
- Parameters:
a (Tensor)
b (Tensor)
- Return type:
Tensor
- botorch.utils.safe_math.logdiffexp(log_a, log_b)[source]
Computes log(b - a) accurately given log(a) and log(b). Assumes log_b > log_a, i.e. b > a > 0.
- Parameters:
log_a (Tensor) – The logarithm of a, assumed to be less than log_b.
log_b (Tensor) – The logarithm of b, assumed to be larger than log_a.
- Returns:
A Tensor of values corresponding to log(b - a).
- Return type:
Tensor
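One way to see why this can be done accurately is the identity log(b - a) = log_b + log(1 - exp(log_a - log_b)), which never exponentiates log_a or log_b on their own. The scalar sketch below uses that identity with Python's math module. It is an illustration under the stated assumption log_b > log_a, not the library code, and the inner helper _log1mexp is defined here only to keep the block self-contained.

```python
import math


def _log1mexp(x):
    """log(1 - exp(x)) for x < 0, using the usual two-branch evaluation."""
    if x > -math.log(2.0):
        return math.log(-math.expm1(x))
    return math.log1p(-math.exp(x))


def logdiffexp(log_a, log_b):
    """Compute log(b - a) from log(a) and log(b), assuming log_b > log_a."""
    return log_b + _log1mexp(log_a - log_b)


print(logdiffexp(math.log(2.0), math.log(5.0)))  # close to math.log(3.0)
# exp(-1000) and exp(-999) both underflow to 0.0, yet the result is finite.
print(logdiffexp(-1000.0, -999.0))
```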
- botorch.utils.safe_math.logsumexp(x, dim, keepdim=False)[source]
Version of logsumexp that has a well-behaved backward pass when x contains infinities.
In particular, the gradient of the standard torch version becomes NaN 1) for any element that is positive infinity, and 2) for any slice that only contains negative infinities.
This version returns a gradient of 1 for any positive infinities in case 1, and for all elements of the slice in case 2, in agreement with the asymptotic behavior of the function.
- Parameters:
x (Tensor) – The Tensor to which to apply logsumexp.
dim (int | tuple[int, ...]) – An integer or a tuple of integers, representing the dimensions to reduce.
keepdim (bool) – Whether to keep the reduced dimensions. Defaults to False.
- Returns:
A Tensor representing the log of the summed exponentials of x.
- Return type:
Tensor
- botorch.utils.safe_math.logmeanexp(X, dim, keepdim=False)[source]
Computes log(mean(exp(X), dim=dim, keepdim=keepdim)).
- Parameters:
X (Tensor) – Values of which to compute the logmeanexp.
dim (int | tuple[int, ...]) – The dimension(s) over which to compute the mean.
keepdim (bool) – If True, keeps the reduced dimensions.
- Returns:
A Tensor of values corresponding to log(mean(exp(X), dim=dim)).
- Return type:
Tensor
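Since mean(exp(X)) = sum(exp(X)) / n, the quantity equals logsumexp(X) - log(n), and subtracting the maximum before exponentiating avoids overflow. The sketch below applies that identity to a flat list of floats; it is a scalar illustration, not the tensor implementation.

```python
import math


def logsumexp(xs):
    """log(sum(exp(x))) computed by shifting by the maximum first."""
    m = max(xs)
    if math.isinf(m):
        # All entries are -inf, or some entry is +inf; the maximum is the answer.
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logmeanexp(xs):
    """log(mean(exp(x))) via logsumexp(xs) - log(n)."""
    return logsumexp(xs) - math.log(len(xs))


print(logmeanexp([math.log(1.0), math.log(3.0)]))  # close to math.log(2.0)
print(logmeanexp([1000.0, 1000.0]))                # close to 1000.0; math.exp(1000.0) alone overflows
```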
- botorch.utils.safe_math.log_softplus(x, tau=1.0)[source]
Computes the logarithm of the softplus function with high numerical accuracy.
- Parameters:
x (Tensor) – Input tensor, should have single or double precision floats.
tau (float | Tensor) – Decreasing tau increases the tightness of the approximation to ReLU. Non-negative and defaults to 1.0.
- Returns:
Tensor corresponding to log(softplus(x)).
- Return type:
Tensor
- botorch.utils.safe_math.smooth_amax(X, dim=-1, keepdim=False, tau=1.0)[source]
Computes a smooth approximation to max(X, dim=dim), i.e. the maximum value of X over dimension dim, using the logarithm of the l_(1/tau) norm of exp(X). Note that when X = log(U) is the logarithm of an acquisition utility U,
logsumexp(log(U) / tau) * tau = log(sum(U^(1/tau))^tau) = log(norm(U, ord=1/tau)).
- Parameters:
X (Tensor) – A Tensor from which to compute the smoothed amax.
dim (int | tuple[int, ...]) – The dimensions to reduce over.
keepdim (bool) – If True, keeps the reduced dimensions.
tau (float | Tensor) – Temperature parameter controlling the smooth approximation to max operator, becomes tighter as tau goes to 0. Needs to be positive.
- Returns:
A Tensor of smooth approximations to max(X, dim=dim).
- Return type:
Tensor
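The effect of tau can be seen on a small list of numbers. The sketch below evaluates tau * logsumexp(X / tau) for a flat list in plain Python. It is a scalar illustration of the formula above rather than the tensor implementation. For n values the result lies between max(X) and max(X) + tau * log(n).

```python
import math


def smooth_amax(xs, tau=1.0):
    """Smooth maximum tau * logsumexp(xs / tau) of a flat list of floats."""
    scaled = [x / tau for x in xs]
    m = max(scaled)
    # Shift by the maximum so that the exponentials cannot overflow.
    return tau * (m + math.log(sum(math.exp(s - m) for s in scaled)))


xs = [1.0, 2.0, 3.0]
for tau in (1.0, 0.1, 0.01):
    # The gap to max(xs) = 3.0 shrinks as tau decreases.
    print(tau, smooth_amax(xs, tau=tau))
```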
- botorch.utils.safe_math.smooth_amin(X, dim=-1, keepdim=False, tau=1.0)[source]
A smooth approximation to min(X, dim=dim), similar to smooth_amax.
- Parameters:
X (Tensor)
dim (int | tuple[int, ...])
keepdim (bool)
tau (float | Tensor)
- Return type:
Tensor
- botorch.utils.safe_math.check_dtype_float32_or_float64(X)[source]
- Parameters:
X (Tensor)
- Return type:
None
- botorch.utils.safe_math.log_fatplus(x, tau=1.0)[source]
Computes the logarithm of the fat-tailed softplus.
NOTE: Separated out in case the complexity of the log implementation increases in the future.
- Parameters:
x (Tensor)
tau (float | Tensor)
- Return type:
Tensor
- botorch.utils.safe_math.fatplus(x, tau=1.0)[source]
Computes a fat-tailed approximation to ReLU(x) = max(x, 0) by linearly combining a regular softplus function and the density function of a Cauchy distribution. The coefficient alpha of the Cauchy density is chosen to guarantee monotonicity and convexity.
- Parameters:
x (Tensor) – A Tensor on whose values to compute the smoothed function.
tau (float | Tensor) – Temperature parameter controlling the smoothness of the approximation.
- Returns:
A Tensor of values of the fat-tailed softplus.
- Return type:
Tensor
- botorch.utils.safe_math.fatmax(x, dim, keepdim=False, tau=1.0, alpha=2.0)[source]
Computes a smooth approximation to amax(X, dim=dim) with a fat tail.
- Parameters:
x (Tensor) – A Tensor from which to compute the smoothed maximum.
dim (int | tuple[int, ...]) – The dimensions to reduce over.
keepdim (bool) – If True, keeps the reduced dimensions.
tau (float | Tensor) – Temperature parameter controlling the smooth approximation to max operator, becomes tighter as tau goes to 0. Needs to be positive.
alpha (float) – The exponent of the asymptotic power decay of the approximation. The default value is 2. Higher alpha parameters make the function behave more similarly to the standard logsumexp approximation to the max, so it is recommended to keep this value low or moderate, e.g. < 10.
- Returns:
A Tensor of smooth approximations to amax(X, dim=dim) with a fat tail.
- Return type:
Tensor
- botorch.utils.safe_math.fatmin(x, dim, keepdim=False, tau=1.0, alpha=2.0)[source]
Computes a smooth approximation to amin(X, dim=dim) with a fat tail.
- Parameters:
x (Tensor) – A Tensor from which to compute the smoothed minimum.
dim (int | tuple[int, ...]) – The dimensions to reduce over.
keepdim (bool) – If True, keeps the reduced dimensions.
tau (float | Tensor) – Temperature parameter controlling the smooth approximation to min operator, becomes tighter as tau goes to 0. Needs to be positive.
alpha (float) – The exponent of the asymptotic power decay of the approximation. The default value is 2. Higher alpha parameters make the function behave more similarly to the standard logsumexp approximation to the max, so it is recommended to keep this value low or moderate, e.g. < 10.
- Returns:
A Tensor of smooth approximations to amin(X, dim=dim) with a fat tail.
- Return type:
Tensor
- botorch.utils.safe_math.fatmaximum(a, b, tau=1.0, alpha=2.0)[source]
Computes a smooth approximation to torch.maximum(a, b) with a fat tail.
- Parameters:
a (Tensor) – The first Tensor from which to compute the smoothed component-wise maximum.
b (Tensor) – The second Tensor from which to compute the smoothed component-wise maximum.
tau (float | Tensor) – Temperature parameter controlling the smoothness of the approximation. A smaller tau corresponds to a tighter approximation that leads to a sharper objective landscape that might be more difficult to optimize.
alpha (float) – The exponent of the asymptotic power decay of the approximation. The default value is 2. Higher alpha parameters make the function behave more similarly to the standard logsumexp approximation to the max, so it is recommended to keep this value low or moderate, e.g. < 10.
- Returns:
A smooth approximation of torch.maximum(a, b).
- Return type:
Tensor
- botorch.utils.safe_math.fatminimum(a, b, tau=1.0, alpha=2.0)[source]
Computes a smooth approximation to torch.minimum(a, b) with a fat tail.
- Parameters:
a (Tensor) – The first Tensor from which to compute the smoothed component-wise minimum.
b (Tensor) – The second Tensor from which to compute the smoothed component-wise minimum.
tau (float | Tensor) – Temperature parameter controlling the smoothness of the approximation. A smaller tau corresponds to a tighter approximation that leads to a sharper objective landscape that might be more difficult to optimize.
alpha (float) – The exponent of the asymptotic power decay of the approximation. The default value is 2. Higher alpha parameters make the function behave more similarly to the standard logsumexp approximation to the max, so it is recommended to keep this value low or moderate, e.g. < 10.
- Returns:
A smooth approximation of torch.minimum(a, b).
- Return type:
Tensor
- botorch.utils.safe_math.log_fatmoid(X, tau=1.0)[source]
Computes the logarithm of the fatmoid. Separated out in case the implementation of the logarithm becomes more complex in the future to ensure numerical stability.
- Parameters:
X (Tensor)
tau (float | Tensor)
- Return type:
Tensor
- botorch.utils.safe_math.fatmoid(X, tau=1.0)[source]
Computes a twice continuously differentiable approximation to the Heaviside step function with a fat tail, i.e. O(1 / x^2) as x goes to -inf.
- Parameters:
X (Tensor) – A Tensor from which to compute the smoothed step function.
tau (float | Tensor) – Temperature parameter controlling the smoothness of the approximation.
- Returns:
A tensor of fat-tailed approximations to the Heaviside step function.
- Return type:
Tensor
- botorch.utils.safe_math.cauchy(x)[source]
Computes a Lorentzian, i.e. an un-normalized Cauchy density function.
- Parameters:
x (Tensor)
- Return type:
Tensor
- botorch.utils.safe_math.sigmoid(X, log=False, fat=False)[source]
A sigmoid function with an optional fat tail and evaluation in log space for better numerical behavior. Notably, the fat-tailed sigmoid can be used to remedy numerical underflow problems in the value and gradient of the canonical sigmoid.
- Parameters:
X (Tensor) – The Tensor on which to evaluate the sigmoid.
log (bool) – Toggles the evaluation of the log sigmoid.
fat (bool) – Toggles the evaluation of the fat-tailed sigmoid.
- Returns:
A Tensor of (log-)sigmoid values.
- Return type:
Tensor
Multi-Objective Utilities
Abstract Box Decompositions
Box decomposition algorithms.
References
- class botorch.utils.multi_objective.box_decompositions.box_decomposition.BoxDecomposition(ref_point, sort, Y=None)[source]
Bases:
Module, ABC
An abstract class for box decompositions.
Note: Internally, we store the negative reference point (minimization).
Initialize BoxDecomposition.
- Parameters:
ref_point (Tensor) – A m-dim tensor containing the reference point.
sort (bool) – A boolean indicating whether to sort the Pareto frontier.
Y (Tensor | None) – A (batch_shape) x n x m-dim tensor of outcomes.
- property pareto_Y: Tensor
This returns the non-dominated set.
- Returns:
A n_pareto x m-dim tensor of outcomes.
- property ref_point: Tensor
Get the reference point.
- Returns:
A m-dim tensor of outcomes.
- property Y: Tensor
Get the raw outcomes.
- Returns:
A n x m-dim tensor of outcomes.
- abstractmethod get_hypercell_bounds()[source]
Get the bounds of each hypercell in the decomposition.
- Returns:
A 2 x num_cells x num_outcomes-dim tensor containing the lower and upper vertices bounding each hypercell.
- Return type:
Tensor
- class botorch.utils.multi_objective.box_decompositions.box_decomposition.FastPartitioning(ref_point, Y=None)[source]
Bases:
BoxDecomposition, ABC
A class for partitioning the (non-)dominated space into hyper-cells.
Note: this assumes maximization. Internally, it multiplies outcomes by -1 and performs the decomposition under minimization.
This class is abstract to support two applications of Alg 1 from [Lacour17]: 1) partitioning the space that is dominated by the Pareto frontier and 2) partitioning the space that is not dominated by the Pareto frontier.
- Parameters:
ref_point (Tensor) – A m-dim tensor containing the reference point.
Y (Tensor | None) – A (batch_shape) x n x m-dim tensor
Box Decomposition List
Box decomposition container.
- class botorch.utils.multi_objective.box_decompositions.box_decomposition_list.BoxDecompositionList(*box_decompositions)[source]
Bases:
Module
A list of box decompositions.
Initialize the box decomposition list.
- Parameters:
*box_decompositions (BoxDecomposition) – A variable number of box decompositions
Example
>>> bd1 = FastNondominatedPartitioning(ref_point, Y=Y1)
>>> bd2 = FastNondominatedPartitioning(ref_point, Y=Y2)
>>> bd = BoxDecompositionList(bd1, bd2)
- property pareto_Y: list[Tensor]
This returns the non-dominated set.
Note: Internally, we store the negative pareto set (minimization).
- Returns:
A list where the ith element is the n_pareto_i x m-dim tensor of pareto optimal outcomes for each box_decomposition i.
- property ref_point: Tensor
Get the reference point.
Note: Internally, we store the negative reference point (minimization).
- Returns:
A n_box_decompositions x m-dim tensor of outcomes.
- get_hypercell_bounds()[source]
Get the bounds of each hypercell in the decomposition.
- Returns:
A 2 x n_box_decompositions x num_cells x num_outcomes-dim tensor containing the lower and upper vertices bounding each hypercell.
- Return type:
Tensor
Box Decomposition Utilities
Utilities for box decomposition algorithms.
- botorch.utils.multi_objective.box_decompositions.utils.compute_local_upper_bounds(U, Z, z)[source]
Compute local upper bounds.
Note: this assumes minimization.
This uses the incremental algorithm (Alg. 1) from [Lacour17].
- Parameters:
U (Tensor) – A n x m-dim tensor containing the local upper bounds.
Z (Tensor) – A n x m x m-dim tensor containing the defining points.
z (Tensor) – A m-dim tensor containing the new point.
- Returns:
2-element tuple containing
- A new n' x m-dim tensor of local upper bounds.
- A n' x m x m-dim tensor containing the defining points.
- botorch.utils.multi_objective.box_decompositions.utils.get_partition_bounds(Z, U, ref_point)[source]
Get the cell bounds given the local upper bounds and the defining points.
This implements Equation 2 in [Lacour17].
- Parameters:
Z (Tensor) – A n x m x m-dim tensor containing the defining points. The first dimension corresponds to u_idx, the second dimension corresponds to j, and Z[u_idx, j] is the set of defining points Z^j(u) where u = U[u_idx].
U (Tensor) – A n x m-dim tensor containing the local upper bounds.
ref_point (Tensor) – A m-dim tensor containing the reference point.
- Returns:
A 2 x num_cells x m-dim tensor containing the lower and upper vertices bounding each hypercell.
- Return type:
Tensor
- botorch.utils.multi_objective.box_decompositions.utils.update_local_upper_bounds_incremental(new_pareto_Y, U, Z)[source]
Update the current local upper bounds with the new Pareto points.
This assumes minimization.
- Parameters:
new_pareto_Y (Tensor) – A n x m-dim tensor containing the new Pareto points.
U (Tensor) – A n' x m-dim tensor containing the local upper bounds.
Z (Tensor) – A n x m x m-dim tensor containing the defining points.
- Returns:
2-element tuple containing
- A new n' x m-dim tensor of local upper bounds.
- A n' x m x m-dim tensor containing the defining points.
- botorch.utils.multi_objective.box_decompositions.utils.compute_non_dominated_hypercell_bounds_2d(pareto_Y_sorted, ref_point)[source]
Compute an axis-aligned partitioning of the non-dominated space for 2 objectives.
- Parameters:
pareto_Y_sorted (Tensor) – A (batch_shape) x n_pareto x 2-dim tensor of pareto outcomes that are sorted by the 0th dimension in increasing order. All points must be better than the reference point.
ref_point (Tensor) – A (batch_shape) x 2-dim reference point.
- Returns:
A 2 x (batch_shape) x n_pareto + 1 x m-dim tensor of cell bounds.
- Return type:
Tensor
- botorch.utils.multi_objective.box_decompositions.utils.compute_dominated_hypercell_bounds_2d(pareto_Y_sorted, ref_point)[source]
Compute an axis-aligned partitioning of the dominated space for 2-objectives.
- Parameters:
pareto_Y_sorted (Tensor) – A (batch_shape) x n_pareto x 2-dim tensor of pareto outcomes that are sorted by the 0th dimension in increasing order.
ref_point (Tensor) – A 2-dim reference point.
- Returns:
A 2 x (batch_shape) x n_pareto x m-dim tensor of cell bounds.
- Return type:
Tensor
Dominated Partitionings
Algorithms for partitioning the dominated space into hyperrectangles.
- class botorch.utils.multi_objective.box_decompositions.dominated.DominatedPartitioning(ref_point, Y=None)[source]
Bases:
FastPartitioning
Partition dominated space into axis-aligned hyperrectangles.
This uses the Algorithm 1 from [Lacour17].
Example
>>> bd = DominatedPartitioning(ref_point, Y)
- Parameters:
ref_point (Tensor) – A m-dim tensor containing the reference point.
Y (Tensor | None) – A (batch_shape) x n x m-dim tensor
Hypervolume
Hypervolume Utilities.
References
[Fonseca2006] C. M. Fonseca, L. Paquete, and M. Lopez-Ibanez. An improved dimension-sweep algorithm for the hypervolume indicator. In IEEE Congress on Evolutionary Computation, pages 1157-1163, Vancouver, Canada, July 2006.
[Ishibuchi2011] H. Ishibuchi, N. Akedo, and Y. Nojima. A many-objective test problem for visually examining diversity maintenance behavior in a decision space. Proc. 13th Annual Conf. Genetic Evol. Comput., 2011.
- botorch.utils.multi_objective.hypervolume.infer_reference_point(pareto_Y, max_ref_point=None, scale=0.1, scale_max_ref_point=False)[source]
Get reference point for hypervolume computations.
This sets the reference point to be ref_point = nadir - scale * range when there is no pareto_Y that is better than max_ref_point. If there’s pareto_Y better than max_ref_point, the reference point will be set to max_ref_point - scale * range if scale_max_ref_point is true and to max_ref_point otherwise.
[Ishibuchi2011] find 0.1 to be a robust multiplier for scaling the nadir point.
Note: this assumes maximization of all objectives.
- Parameters:
pareto_Y (Tensor) – An n x m-dim tensor of Pareto-optimal points.
max_ref_point (Tensor | None) – An m-dim tensor indicating the maximum reference point. Some elements can be NaN (except when pareto_Y is empty); the corresponding dimensions are treated as if no max_ref_point was provided and are set to nadir - scale * range.
scale (float) – A multiplier used to scale back the reference point based on the range of each objective.
scale_max_ref_point (bool) – A boolean indicating whether to apply scaling to the max_ref_point based on the range of each objective.
- Returns:
An m-dim tensor containing the reference point.
- Return type:
Tensor
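The simplest case of the rule above (no max_ref_point given) can be sketched in pure Python. This is an illustrative sketch only: the function name `infer_ref_point_simple` is hypothetical, and it assumes that the nadir is the component-wise minimum of the Pareto points and that range is the component-wise maximum minus the nadir.

```python
def infer_ref_point_simple(pareto_Y, scale=0.1):
    """Compute nadir - scale * range for a list of Pareto points (maximization)."""
    m = len(pareto_Y[0])
    # Assumed definitions: component-wise best and worst values on the front.
    ideal = [max(y[j] for y in pareto_Y) for j in range(m)]
    nadir = [min(y[j] for y in pareto_Y) for j in range(m)]
    return [nadir[j] - scale * (ideal[j] - nadir[j]) for j in range(m)]


ref = infer_ref_point_simple([(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)])
```

Here the nadir is (1, 1) and the range is (2, 2), so the reference point sits slightly below the nadir, at roughly (0.8, 0.8).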
- class botorch.utils.multi_objective.hypervolume.Hypervolume(ref_point)[source]
Bases: object
Hypervolume computation dimension sweep algorithm from [Fonseca2006].
Adapted from Simon Wessing’s implementation of the algorithm (Variant 3, Version 1.2) in [Fonseca2006] in PyMOO: https://github.com/msu-coinlab/pymoo/blob/master/pymoo/vendor/hv.py
Maximization is assumed.
TODO: write this in C++ for faster looping.
Initialize hypervolume object.
- Parameters:
ref_point (Tensor) – An m-dim Tensor containing the reference point.
- property ref_point: Tensor
Get reference point (for maximization).
- Returns:
An m-dim tensor containing the reference point.
- botorch.utils.multi_objective.hypervolume.sort_by_dimension(nodes, i)[source]
Sorts the list of nodes in-place by the specified objective.
- Parameters:
nodes (list[Node]) – A list of Nodes
i (int) – The index of the objective to sort by
- Return type:
None
- class botorch.utils.multi_objective.hypervolume.Node(m, dtype, device, data=None)[source]
Bases: object
Node in the MultiList data structure.
Initialize MultiList.
- Parameters:
m (int) – The number of objectives
dtype (torch.dtype) – The dtype
device (torch.device) – The device
data (Tensor | None) – The tensor data to be stored in this Node.
- class botorch.utils.multi_objective.hypervolume.MultiList(m, dtype, device)[source]
Bases: object
A special data structure used in hypervolume computation.
It consists of several doubly linked lists that share common nodes. Every node has multiple predecessors and successors, one in every list.
Initialize m doubly linked lists.
- Parameters:
m (int) – number of doubly linked lists
dtype (torch.dtype) – the dtype
device (torch.device) – the device
- append(node, index)[source]
Appends a node to the end of the list at the given index.
- Parameters:
node (Node) – the new node
index (int) – the index where the node should be appended.
- Return type:
None
- extend(nodes, index)[source]
Extends the list at the given index with the nodes.
- Parameters:
nodes (list[Node]) – list of nodes to append at the given index.
index (int) – the index where the nodes should be appended.
- Return type:
None
- reinsert(node, index, bounds)[source]
Re-inserts the node at its original position.
Re-inserts the node at its original position in all lists in [0, ‘index’] before it was removed. This method assumes that the next and previous nodes of the node that is reinserted are in the list.
- Parameters:
node (Node) – The node
index (int) – The upper bound on the range of indices
bounds (Tensor) – A 2 x m-dim tensor of bounds on the objectives.
- Return type:
None
- class botorch.utils.multi_objective.hypervolume.SubsetIndexCachingMixin[source]
Bases: object
A Mixin class that adds q-subset index computations and caching.
Initializes the class with q_out = -1 and an empty q_subset_indices dict.
- compute_q_subset_indices(q_out, device)[source]
Returns and caches a dict of indices equal to subsets of {1, ..., q_out}.
This means that consecutive calls to self.compute_q_subset_indices with the same q_out do not recompute the indices for all (2^q_out - 1) subsets.
NOTE: This will use more memory than regenerating the indices for each i and then deleting them, but it will be faster for repeated evaluations (e.g. during optimization).
- Parameters:
q_out (int) – The batch size of the objectives. This is typically equal to the q-batch size of X. However, if using a set-valued objective (e.g., MVaR) that produces s objective values for each point on the q-batch of X, we need to properly account for each objective while calculating the hypervolume contributions by using q_out = q * s.
device (torch.device)
- Returns:
A dict that maps “q choose i” to all size-i subsets of {1, ..., q_out}.
- Return type:
BufferDict[str, Tensor]
- botorch.utils.multi_objective.hypervolume.compute_subset_indices(q, device=None)[source]
Compute all (2^q - 1) distinct subsets of {1, …, q}.
- Parameters:
q (int) – An integer defining the set {1, …, q} whose subsets to compute.
device (torch.device | None)
- Returns:
A dict that maps “q choose i” to all size-i subsets of {1, …, q}.
- Return type:
BufferDict[str, Tensor]
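The structure of the returned mapping can be sketched with itertools. This is a pure-Python illustration, not the library code: the function name `subset_indices` and the key format "q_choose_i" are assumptions, lists of tuples stand in for tensors, and the 1-based indexing follows the notation used above.

```python
from itertools import combinations


def subset_indices(q):
    """Map a key per subset size i to all size-i subsets of {1, ..., q}."""
    return {
        f"q_choose_{i}": list(combinations(range(1, q + 1), i))
        for i in range(1, q + 1)
    }


subsets = subset_indices(3)
# Total number of non-empty subsets should be 2^q - 1.
num_subsets = sum(len(v) for v in subsets.values())
```

Caching such a mapping per value of q trades memory for speed, since the number of subsets grows exponentially in q.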
- class botorch.utils.multi_objective.hypervolume.NoisyExpectedHypervolumeMixin(model, ref_point, X_baseline, sampler=None, objective=None, constraints=None, X_pending=None, prune_baseline=False, alpha=0.0, cache_pending=True, max_iep=0, incremental_nehvi=True, cache_root=None, marginalize_dim=None)[source]
Bases: CachedCholeskyMCSamplerMixin
Initialize a mixin that contains functions for the batched Pareto-frontier partitioning used by the noisy hypervolume-improvement-based acquisition functions, i.e. qNEHVI and qLogNEHVI.
- Parameters:
model (Model) – A fitted model.
ref_point (list[float] | Tensor) – A list or tensor with m elements representing the reference point (in the outcome space) with respect to which the hypervolume is computed. This is a reference point for the objective values (i.e. after applying objective to the samples).
X_baseline (Tensor) – An r x d-dim Tensor of r design points that have already been observed. These points are considered as potential approximate Pareto-optimal design points.
sampler (MCSampler | None) – The sampler used to draw base samples. If not given, a sampler is generated using get_sampler. NOTE: A box decomposition of the Pareto front is created for each MC sample, an operation that scales as O(n^m) and thus becomes particularly costly for m > 2.
objective (MCMultiOutputObjective | None) – The MCMultiOutputObjective under which the samples are evaluated. Defaults to IdentityMCMultiOutputObjective().
constraints (list[Callable[[Tensor], Tensor]] | None) – A list of callables, each mapping a Tensor of dimension sample_shape x batch-shape x q x m to a Tensor of dimension sample_shape x batch-shape x q, where negative values imply feasibility. The acquisition function will compute expected feasible hypervolume.
X_pending (Tensor | None) – A batch_shape x m x d-dim Tensor of m design points that have been submitted for function evaluation, but have not yet been evaluated.
prune_baseline (bool) – If True, remove points in X_baseline that are highly unlikely to be Pareto optimal and better than the reference point. This can significantly improve computation time and is generally recommended. To customize pruning parameters, instead manually call prune_inferior_points_multi_objective on X_baseline before instantiating the acquisition function.
alpha (float) – The hyperparameter controlling the approximate non-dominated partitioning. The default value of 0.0 means an exact partitioning is used. As the number of objectives m increases, consider increasing this parameter in order to limit computational complexity.
cache_pending (bool) – A boolean indicating whether to use cached box decompositions (CBD) for handling pending points. This is generally recommended.
max_iep (int) – The maximum number of pending points before the box decompositions will be recomputed.
incremental_nehvi (bool) – A boolean indicating whether to compute the incremental NEHVI from the i-th point, where i=1, ..., q, under sequential greedy optimization, or the full qNEHVI over q points.
cache_root (bool | None) – A boolean indicating whether to cache the root decomposition over X_baseline and use low-rank updates.
marginalize_dim (int | None) – A batch dimension that should be marginalized. For example, this is useful when using a batched fully Bayesian model.
- property X_baseline: Tensor
Return X_baseline augmented with pending points cached using CBD.
- botorch.utils.multi_objective.hypervolume.get_hypervolume_maximizing_subset(n, Y, ref_point)[source]
Find an approximately hypervolume-maximizing subset of size n.
This greedily selects points from Y to maximize the hypervolume of the subset sequentially. This has bounded error since hypervolume is submodular.
- Parameters:
n (int) – The size of the subset to return.
Y (Tensor) – An n' x m-dim tensor of outcomes.
ref_point (Tensor) – An m-dim tensor containing the reference point.
- Returns:
- A two-element tuple containing
An n x m-dim tensor of outcomes.
An n-dim tensor of indices of the outcomes in the original set.
- Return type:
tuple[Tensor, Tensor]
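The greedy selection described above can be sketched for two objectives in pure Python. This is a minimal sketch under stated assumptions, not the BoTorch implementation: `hypervolume_2d` and `greedy_hv_subset` are hypothetical helpers, maximization is assumed, and ties are broken in favor of the earliest candidate.

```python
def hypervolume_2d(points, ref_point):
    """Area dominated by `points` and bounded below by `ref_point` (maximization)."""
    # Only points strictly better than the reference point contribute.
    pts = [p for p in points if p[0] > ref_point[0] and p[1] > ref_point[1]]
    # Sweep from the largest first objective downward, adding each new slab.
    pts.sort(key=lambda p: p[0], reverse=True)
    area, best_y1 = 0.0, ref_point[1]
    for y0, y1 in pts:
        if y1 > best_y1:
            area += (y0 - ref_point[0]) * (y1 - best_y1)
            best_y1 = y1
    return area


def greedy_hv_subset(n, Y, ref_point):
    """Pick up to n points one at a time, each maximizing the subset hypervolume."""
    chosen = []
    remaining = list(range(len(Y)))
    for _ in range(min(n, len(Y))):
        best = max(
            remaining,
            key=lambda i: hypervolume_2d([Y[j] for j in chosen] + [Y[i]], ref_point),
        )
        chosen.append(best)
        remaining.remove(best)
    return [Y[i] for i in chosen], chosen


Y = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]
subset, idx = greedy_hv_subset(2, Y, (0.0, 0.0))
```

In this example the first pick is (2, 2), which has the largest individual hypervolume, and the second pick adds the point with the largest marginal gain.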
Non-dominated Partitionings
Algorithms for partitioning the non-dominated space into rectangles.
References
I. Couckuyt, D. Deschrijver and T. Dhaene, “Towards Efficient Multiobjective Optimization: Multiobjective statistical criterions,” 2012 IEEE Congress on Evolutionary Computation, Brisbane, QLD, 2012, pp. 1-8.
S. Watanabe. “Approximation of Box Decomposition Algorithm for Fast Hypervolume-Based Multi-Objective Optimization,” arXiv preprint arXiv:2512.05825. 2025.
- class botorch.utils.multi_objective.box_decompositions.non_dominated.NondominatedPartitioning(ref_point, Y=None, alpha=0.0)[source]
Bases: BoxDecomposition
A class for partitioning the non-dominated space into hyper-cells.
Note: this assumes maximization. Internally, it multiplies outcomes by -1 and performs the decomposition under minimization. TODO: use maximization internally as well.
Note: it is only feasible to use this algorithm to compute an exact decomposition of the non-dominated space for m<5 objectives (alpha=0.0).
The alpha parameter can be increased to obtain an approximate partitioning faster. alpha is a fraction of the total hypervolume encapsulating the entire Pareto set. When a hypercell’s volume divided by the total hypervolume is less than alpha, we discard the hypercell. See Figure 2 in [Watanabe2025] for a visual representation.
This PyTorch implementation of the binary partitioning algorithm ([Couckuyt2012]) is adapted from the numpy/tensorflow implementation at: https://github.com/GPflow/GPflowOpt/blob/master/gpflowopt/pareto.py.
TODO: replace this with a more efficient decomposition. E.g. https://link.springer.com/content/pdf/10.1007/s10898-019-00798-7.pdf
Initialize NondominatedPartitioning.
- Parameters:
ref_point (Tensor) – An m-dim tensor containing the reference point.
Y (Tensor | None) – A (batch_shape) x n x m-dim tensor.
alpha (float) – A threshold fraction of total volume used in an approximate decomposition.
Example
>>> bd = NondominatedPartitioning(ref_point, Y=Y1)
- class botorch.utils.multi_objective.box_decompositions.non_dominated.FastNondominatedPartitioning(ref_point, Y=None)[source]
Bases: FastPartitioning
A class for partitioning the non-dominated space into hyper-cells.
Note: this assumes maximization. Internally, it multiplies by -1 and performs the decomposition under minimization.
This class is far more efficient than NondominatedPartitioning for exact box partitionings.
This class uses the two-step approach similar to that in [Yang2019], where:
- first, Alg 1 from [Lacour17] is used to find the local lower bounds for the maximization problem
- second, the local lower bounds are used as the Pareto frontier for the minimization problem, and [Lacour17] is applied again to partition the space dominated by that Pareto frontier.
Initialize FastNondominatedPartitioning.
- Parameters:
ref_point (Tensor) – An m-dim tensor containing the reference point.
Y (Tensor | None) – A (batch_shape) x n x m-dim tensor.
Example
>>> bd = FastNondominatedPartitioning(ref_point, Y=Y1)
Optimize
- class botorch.utils.multi_objective.optimize.DiscreteParameterRepair(discrete_choices)[source]
Bases: Repair
Pymoo Repair operator that rounds discrete parameters to valid values.
This repair operator is applied after each generation to ensure that discrete parameters are snapped to their nearest allowed values.
Initialize the repair operator.
- Parameters:
discrete_choices (dict[int, list[float]]) – A mapping from dimension index to allowed discrete values. Only dimensions in this mapping will be rounded.
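The snapping behavior described for this operator can be illustrated on a single candidate in pure Python. This sketch is an assumption about the general idea rather than the operator's actual code: `snap_to_choices` is a hypothetical helper, and a plain list stands in for one row of the population.

```python
def snap_to_choices(x, discrete_choices):
    """Round the listed dimensions of x to their nearest allowed value."""
    repaired = list(x)
    for dim, choices in discrete_choices.items():
        # Dimensions not present in the mapping are left untouched.
        repaired[dim] = min(choices, key=lambda c: abs(c - x[dim]))
    return repaired


repaired = snap_to_choices(
    [0.37, 2.6, 0.9],
    {1: [0.0, 2.0, 4.0], 2: [0.0, 1.0]},
)
```

Dimension 0 stays continuous, while dimensions 1 and 2 move to 2.0 and 1.0, their closest allowed values.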
- class botorch.utils.multi_objective.optimize.BotorchPymooProblem(n_var, n_obj, xl, xu, acqf, dtype, device, ref_point=None, objective=None, constraints=None, inequality_constraints=None)[source]
Bases: Problem
PyMOO problem for optimizing the model posterior mean using NSGA-II.
This is instantiated and used within optimize_with_nsgaii to define the optimization problem to interface with pymoo.
This assumes maximization of all objectives.
- Parameters:
n_var (int) – The number of tunable parameters (d).
n_obj (int) – The number of objectives.
xl (np.ndarray) – A d-dim np.ndarray of lower bounds for each tunable parameter.
xu (np.ndarray) – A d-dim np.ndarray of upper bounds for each tunable parameter.
acqf (MultiOutputAcquisitionFunction) – The MultiOutputAcquisitionFunction to optimize.
dtype (torch.dtype) – The torch dtype.
device (torch.device) – The torch device.
ref_point (Tensor | None) – A list or tensor with m elements representing the reference point (in the outcome space), which is treated as a lower bound on the objectives, after applying objective to the samples.
objective (MCMultiOutputObjective | None) – The MCMultiOutputObjective under which the samples are evaluated. Defaults to IdentityMultiOutputObjective(). This can be used to determine which outputs of the MultiOutputAcquisitionFunction should be used as objectives/constraints in NSGA-II.
constraints (list[Callable[[Tensor], Tensor]] | None) – A list of callables, each mapping a Tensor of dimension sample_shape x batch-shape x q x m to a Tensor of dimension sample_shape x batch-shape x q, where negative values imply feasibility.
inequality_constraints (list[tuple[Tensor, Tensor, float]] | None) – A list of tuples (indices, coefficients, rhs), representing inequality constraints of the form sum_i (X[indices[i]] * coefficients[i]) >= rhs. These are parameter-space constraints (as opposed to outcome-space constraints).
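To make the (indices, coefficients, rhs) format concrete, the following pure-Python sketch checks a single point against a list of such constraints. It is illustrative only: `satisfies_inequalities` is a hypothetical helper, plain lists stand in for tensors, and the tolerance `tol` is an added assumption to absorb floating-point error.

```python
def satisfies_inequalities(x, inequality_constraints, tol=1e-9):
    """Check sum_i(x[indices[i]] * coefficients[i]) >= rhs for every constraint."""
    for indices, coefficients, rhs in inequality_constraints:
        lhs = sum(x[i] * c for i, c in zip(indices, coefficients))
        if lhs < rhs - tol:
            return False
    return True


# x[0] + x[1] >= 1, and -x[2] >= -0.5 (equivalently x[2] <= 0.5).
constraints = [([0, 1], [1.0, 1.0], 1.0), ([2], [-1.0], -0.5)]
feasible = satisfies_inequalities([0.6, 0.7, 0.2], constraints)
infeasible = satisfies_inequalities([0.6, 0.7, 0.9], constraints)
```

An upper bound is expressed by negating both the coefficient and the right-hand side, as in the second constraint.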
- botorch.utils.multi_objective.optimize.optimize_with_nsgaii(acq_function, bounds, num_objectives, q=None, ref_point=None, objective=None, constraints=None, inequality_constraints=None, population_size=250, max_gen=None, seed=None, fixed_features=None, max_attempts=2, discrete_choices=None, post_processing_func=None)[source]
Optimize the posterior mean via NSGA-II, returning the Pareto set and front.
This assumes maximization of all objectives.
- Parameters:
acq_function (MultiOutputAcquisitionFunction) – The MultiOutputAcquisitionFunction to optimize.
bounds (Tensor) – A 2 x d tensor of lower and upper bounds for each column of X.
q (int | None) – The number of candidates. If None, return the full population.
num_objectives (int) – The number of objectives.
ref_point (list[float] | Tensor | None) – A list or tensor with m elements representing the reference point (in the outcome space), which is treated as a lower bound on the objectives, after applying objective to the samples.
objective (MCMultiOutputObjective | None) – The MCMultiOutputObjective under which the samples are evaluated. Defaults to IdentityMultiOutputObjective(). This can be used to determine which outputs of the MultiOutputAcquisitionFunction should be used as objectives/constraints in NSGA-II.
constraints (list[Callable[[Tensor], Tensor]] | None) – A list of callables, each mapping a Tensor of dimension sample_shape x batch-shape x q x m to a Tensor of dimension sample_shape x batch-shape x q, where negative values imply feasibility.
inequality_constraints (list[tuple[Tensor, Tensor, float]] | None) – A list of tuples (indices, coefficients, rhs), representing inequality constraints of the form sum_i (X[indices[i]] * coefficients[i]) >= rhs. These are parameter-space constraints (as opposed to outcome-space constraints).
population_size (int) – The population size for NSGA-II.
max_gen (int | None) – The number of iterations for NSGA-II. If None, this uses the default termination condition in pymoo for NSGA-II.
seed (int | None) – The random seed for NSGA-II.
fixed_features (dict[int, float] | None) – A map {feature_index: value} for features that should be fixed to a particular value during generation. All indices should be non-negative.
max_attempts (int) – The total number of times to run the optimization if it fails (usually due to NSGA-II failing to find a feasible point).
discrete_choices (dict[int, list[float]] | None) – A mapping from dimension index to allowed discrete values. When provided, a repair operator is used during NSGA-II optimization to ensure discrete dimensions are snapped to their nearest allowed values after each generation. This provides better handling of mixed continuous/discrete search spaces compared to post-hoc rounding. Dimensions in fixed_features are automatically excluded.
post_processing_func (Callable[[Tensor], Tensor] | None) – A function that post-processes optimization results, e.g., to round discrete dimensions to valid values. The function should take an n x d tensor and return a tensor of the same shape with post-processed values. When provided, the objective values Y are re-evaluated after post-processing to ensure accuracy.
Note: Constraint feasibility is not re-checked after post-processing. NSGA-II enforces constraints on the original (pre-processed) X, but post-processing (e.g., rounding) could make previously feasible solutions infeasible. This mirrors the behavior of other optimizers like optimize_acqf. For parameter-space constraints, use Ax-level validation (e.g., validate_candidates) as a safety net.
- Returns:
A two-element tuple containing the pareto set X and pareto frontier Y.
- Return type:
tuple[Tensor, Tensor]
Pareto
- botorch.utils.multi_objective.pareto.is_non_dominated(Y, maximize=True, deduplicate=True)[source]
Computes the non-dominated front.
Note: this assumes maximization.
For small n, this method uses a highly parallel methodology that compares all pairs of points in Y. However, this is memory intensive and slow for large n. For large n (or if Y is larger than 5MB), this method will dispatch to a loop-based approach that is faster and has a lower memory footprint.
- Parameters:
Y (Tensor) – A (batch_shape) x n x m-dim tensor of outcomes. If any element of Y is NaN, the corresponding point will be treated as a dominated point (returning False).
maximize (bool) – If True, assume maximization (default).
deduplicate (bool) – A boolean indicating whether to only return unique points on the pareto frontier.
- Returns:
A (batch_shape) x n-dim boolean tensor indicating whether each point is non-dominated.
- Return type:
Tensor
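The all-pairs comparison mentioned above can be sketched in pure Python for a small, unbatched set of points. This is a simplified illustration with hypothetical helper names (`dominates`, `non_dominated_mask`); it assumes maximization and deliberately omits the NaN handling and deduplication described for the real function.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(ai >= bi for ai, bi in zip(a, b)) and any(
        ai > bi for ai, bi in zip(a, b)
    )


def non_dominated_mask(Y):
    """Boolean mask marking the points of Y that no other point dominates."""
    return [not any(dominates(other, y) for other in Y) for y in Y]


Y = [(1.0, 3.0), (2.0, 2.0), (1.0, 1.0), (3.0, 1.0)]
mask = non_dominated_mask(Y)
```

The comparison is quadratic in the number of points, which is why a loop-based approach with a smaller memory footprint becomes preferable for large n.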
Scalarization
Helper utilities for constructing scalarizations.
References
- botorch.utils.multi_objective.scalarization.get_chebyshev_scalarization(weights, Y, alpha=0.05)[source]
Construct an augmented Chebyshev scalarization.
- The augmented Chebyshev scalarization is given by
g(y) = max_i(w_i * y_i) + alpha * sum_i(w_i * y_i)
where the goal is to minimize g(y) in the setting where all objectives y_i are to be minimized. Since the default in BoTorch is to maximize all objectives, this method constructs a Chebyshev scalarization where the inputs are first multiplied by -1, so that all objectives are to be minimized. Then, it computes g(y) (which should be minimized), and returns -g(y), which should be maximized.
Minimizing an objective is supported by passing a negative weight for that objective. To make all w * y’s have the same sign such that they are comparable when computing max(w * y), outcomes of minimization objectives are shifted from [0,1] to [-1,0].
See [Knowles2005] for details.
This scalarization can be used with qExpectedImprovement to implement q-ParEGO as proposed in [Daulton2020qehvi].
- Parameters:
weights (Tensor) – An m-dim tensor of weights. Positive for maximization and negative for minimization.
Y (Tensor) – An n x m-dim tensor of observed outcomes, which are used for scaling the outcomes to [0,1] or [-1,0]. If n=0, then outcomes are left unnormalized.
alpha (float) – Parameter governing the influence of the weighted sum term. The default value comes from [Knowles2005].
- Returns:
Transform function using the objective weights.
- Return type:
Callable[[Tensor, Tensor | None], Tensor]
Example
>>> weights = torch.tensor([0.75, -0.25])
>>> transform = get_chebyshev_scalarization(weights, Y)
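The formula and the sign convention described above can be traced in pure Python for the unnormalized case with positive weights (all objectives maximized). This is a sketch under those assumptions only: the helper names are hypothetical, and the scaling of outcomes to [0,1] or [-1,0] is omitted.

```python
def aug_chebyshev(weights, y, alpha=0.05):
    """g(y) = max_i(w_i * y_i) + alpha * sum_i(w_i * y_i), to be minimized."""
    products = [w * yi for w, yi in zip(weights, y)]
    return max(products) + alpha * sum(products)


def scalarize_for_maximization(weights, y, alpha=0.05):
    """Negate the outcomes, evaluate g, and return -g so that larger is better."""
    return -aug_chebyshev(weights, [-yi for yi in y], alpha)


weights = [0.75, 0.25]
balanced = scalarize_for_maximization(weights, [0.8, 0.8])
lopsided = scalarize_for_maximization(weights, [1.0, 0.2])
```

With these weights the balanced outcome scores higher than the lopsided one, because the max term penalizes the worst weighted objective.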
Probability Utilities
Multivariate Gaussian Probabilities via Bivariate Conditioning
Bivariate conditioning algorithm for approximating Gaussian probabilities, see [Genz2016numerical] and [Trinh2015bivariate].
G. Trinh and A. Genz. Bivariate conditioning approximations for multivariate normal probabilities. Statistics and Computing, 2015.
A. Genz and G. Trinh. Numerical Computation of Multivariate Normal Probabilities using Bivariate Conditioning. Monte Carlo and Quasi-Monte Carlo Methods, 2016.
G. J. Gibson, C. A. Glasbey, and D. A. Elston. Monte Carlo evaluation of multivariate normal integrals and sensitivity to variate ordering. Advances in Numerical Methods and Applications. 1994.
- class botorch.utils.probability.mvnxpb.mvnxpbState[source]
Bases: TypedDict
- step: int
- perm: LongTensor
- bounds: Tensor
- piv_chol: PivotedCholesky
- plug_ins: Tensor
- log_prob: Tensor
- log_prob_extra: Tensor | None
- class botorch.utils.probability.mvnxpb.MVNXPB(covariance_matrix, bounds)[source]
Bases: object
An algorithm for approximating Gaussian probabilities P(X \in bounds), where X ~ N(0, covariance_matrix).
Initializes an MVNXPB instance.
- Parameters:
covariance_matrix (Tensor) – Covariance matrices of shape batch_shape x [n, n].
bounds (Tensor) – Tensor of lower and upper bounds, batch_shape x [n, 2]. These bounds are standardized internally and clipped to STANDARDIZED_RANGE.
- log_prob_extra: Tensor | None
- classmethod build(step, perm, bounds, piv_chol, plug_ins, log_prob, log_prob_extra=None)[source]
Creates an MVNXPB instance from raw arguments. Unlike MVNXPB.__init__, this method does not preprocess or copy terms.
- Parameters:
step (int) – Integer used to track the solver’s progress.
bounds (Tensor) – Tensor of lower and upper bounds, batch_shape x [n, 2].
piv_chol (PivotedCholesky) – A PivotedCholesky instance for the system.
plug_ins (Tensor) – Tensor of plug-in estimators used to update lower and upper bounds on random variables that have yet to be integrated out.
log_prob (Tensor) – Tensor of log probabilities.
log_prob_extra (Tensor | None) – Tensor of conditional log probabilities for the next random variable. Used when integrating over an odd number of random variables.
perm (Tensor)
- Return type:
- solve(num_steps=None, eps=1e-10)[source]
Runs the MVNXPB solver instance for a fixed number of steps.
Calculates a bivariate conditional approximation to P(X in bounds), where X ~ N(0, Σ). For details, see [Genz2016numerical] or [Trinh2015bivariate].
- Parameters:
num_steps (int | None)
eps (float)
- Return type:
Tensor
- select_pivot()[source]
GGE variable prioritization strategy from [Gibson1994monte].
Returns the index of the random variable least likely to satisfy its bounds when conditioning on the previously integrated random variables X[:t - 1] attaining the values of plug-in estimators y[:t - 1]. Equivalently, argmin_{i = t, ..., n} P(X[i] \in bounds[i] | X[:t-1] = y[:t -1]), where t denotes the current step.
- Return type:
LongTensor | None
- pivot_(pivot)[source]
Swap random variables at pivot and step positions.
- Parameters:
pivot (LongTensor)
- Return type:
None
- augment(covariance_matrix, bounds, cross_covariance_matrix, disable_pivoting=False, jitter=None, max_tries=None)[source]
Augment an n-dimensional MVNXPB instance to include m additional random variables.
- Parameters:
covariance_matrix (Tensor)
bounds (Tensor)
cross_covariance_matrix (Tensor)
disable_pivoting (bool)
jitter (float | None)
max_tries (int | None)
- Return type:
Truncated Multivariate Normal Distribution
- class botorch.utils.probability.truncated_multivariate_normal.TruncatedMultivariateNormal(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, bounds=None, solver=None, sampler=None, validate_args=None)[source]
Bases: MultivariateNormal
Initializes an instance of a TruncatedMultivariateNormal distribution.
Let x ~ N(0, K) be an n-dimensional Gaussian random vector. This class represents the distribution of the truncated multivariate normal random vector x | a <= x <= b.
- Parameters:
loc (Tensor) – A mean vector for the distribution, batch_shape x event_shape.
covariance_matrix (Tensor | None) – Covariance matrix distribution parameter.
precision_matrix (Tensor | None) – Inverse covariance matrix distribution parameter.
scale_tril (Tensor | None) – Lower triangular, square-root covariance matrix distribution parameter.
bounds (Tensor) – A batch_shape x event_shape x 2 tensor of strictly increasing bounds for x so that bounds[..., 0] < bounds[..., 1] everywhere.
solver (MVNXPB | None) – A pre-solved MVNXPB instance used to approximate the log partition.
sampler (LinearEllipticalSliceSampler | None) – A LinearEllipticalSliceSampler instance used for sample generation.
validate_args (bool | None) – Optional argument to super().__init__.
- log_prob(value)[source]
Approximates the true log probability.
- Parameters:
value (Tensor)
- Return type:
Tensor
- rsample(sample_shape=())[source]
Draw samples from the Truncated Multivariate Normal.
- Parameters:
sample_shape (Size) – The shape of the samples.
- Returns:
The (sample_shape x batch_shape x event_shape) tensor of samples.
- Return type:
Tensor
- property log_partition: Tensor
- property sampler: LinearEllipticalSliceSampler
- expand(batch_shape, _instance=None)[source]
Returns a new distribution instance (or populates an existing instance provided by a derived class) with batch dimensions expanded to batch_shape. This method calls expand on the distribution’s parameters. As such, this does not allocate new memory for the expanded distribution instance. Additionally, this does not repeat any args checking or parameter broadcasting in __init__.py, when an instance is first created.
- Parameters:
batch_shape (torch.Size) – the desired expanded size.
_instance (TruncatedMultivariateNormal) – new instance provided by subclasses that need to override .expand.
- Returns:
New distribution instance with batch dimensions expanded to batch_shape.
- Return type:
Unified Skew Normal Distribution
- class botorch.utils.probability.unified_skew_normal.UnifiedSkewNormal(trunc, gauss, cross_covariance_matrix, validate_args=None)[source]
Bases: Distribution
Unified Skew Normal distribution of Y | a < X < b for jointly Gaussian random vectors X ∈ R^m and Y ∈ R^n.
Batch shapes trunc.batch_shape and gauss.batch_shape must be broadcastable. Care should be taken when choosing trunc.batch_shape. When trunc is of lower batch dimensionality than gauss, the user should consider expanding trunc to hasten UnifiedSkewNormal.log_prob. In these cases, it is suggested that the user invoke trunc.solver before calling trunc.expand to avoid paying for multiple, identical solves.
- Parameters:
trunc (TruncatedMultivariateNormal) – Distribution of Z = (X | a < X < b) ∈ R^m.
gauss (MultivariateNormal) – Distribution of Y ∈ R^n.
cross_covariance_matrix (Tensor | LinearOperator) – Cross-covariance Cov(X, Y) ∈ R^{m x n}.
validate_args (bool | None) – Optional argument to super().__init__.
- arg_constraints = {}
- log_prob(value)[source]
Computes the log probability ln p(Y = value | a < X < b).
- Parameters:
value (Tensor)
- Return type:
Tensor
- rsample(sample_shape=())[source]
Draw samples from the Unified Skew Normal.
- Parameters:
sample_shape (Size) – The shape of the samples.
- Returns:
The (sample_shape x batch_shape x event_shape) tensor of samples.
- Return type:
Tensor
- expand(batch_shape, _instance=None)[source]
Returns a new distribution instance (or populates an existing instance provided by a derived class) with batch dimensions expanded to batch_shape. This method calls expand on the distribution’s parameters. As such, this does not allocate new memory for the expanded distribution instance. Additionally, this does not repeat any args checking or parameter broadcasting in __init__.py, when an instance is first created.
- Parameters:
batch_shape (torch.Size) – the desired expanded size.
_instance (UnifiedSkewNormal) – new instance provided by subclasses that need to override .expand.
- Returns:
New distribution instance with batch dimensions expanded to batch_shape.
- Return type:
- property covariance_matrix: Tensor
- property scale_tril: Tensor
Bivariate Normal Probabilities and Statistics
Methods for computing bivariate normal probabilities and statistics.
A. Genz. Numerical computation of rectangular bivariate and trivariate normal and t probabilities. Statistics and Computing, 2004.
B. Muthen. Moments of the censored and truncated bivariate normal distribution. British Journal of Mathematical and Statistical Psychology, 1990.
- botorch.utils.probability.bvn.bvn(r, xl, yl, xu, yu)[source]
A function for computing bivariate normal probabilities.
Calculates P(xl < x < xu, yl < y < yu) where x and y are bivariate normal with unit variance and correlation coefficient r. See Section 2.4 of [Genz2004bvnt].
This method uses a sign flip trick to improve numerical performance. Many of bvnu's internal branches rely on evaluations of Phi(-bound). For a < b < 0, the term Phi(-a) - Phi(-b) goes to zero faster than Phi(b) - Phi(a) because finfo(dtype).epsneg is typically much larger than finfo(dtype).tiny. In these cases, flipping the sign can prevent situations where bvnu(...) - bvnu(...) would otherwise be zero due to round-off error.
- Parameters:
r (Tensor) – Tensor of correlation coefficients.
xl (Tensor) – Tensor of lower bounds for x, same shape as r.
yl (Tensor) – Tensor of lower bounds for y, same shape as r.
xu (Tensor) – Tensor of upper bounds for x, same shape as r.
yu (Tensor) – Tensor of upper bounds for y, same shape as r.
- Returns:
Tensor of probabilities P(xl < x < xu, yl < y < yu).
- Return type:
Tensor
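One standard way to obtain a rectangle probability from an upper-orthant function such as P(x > h, y > k) is inclusion-exclusion over the four corners. The pure-Python sketch below illustrates that identity for the special case r = 0 only, where the orthant probability factorizes; the helper names are hypothetical and this is not a description of how bvn is implemented internally.

```python
import math


def upper_tail(h):
    """P(x > h) for a standard normal x."""
    return 0.5 * math.erfc(h / math.sqrt(2.0))


def orthant_independent(h, k):
    """P(x > h, y > k) for independent standard normals (the r = 0 case)."""
    return upper_tail(h) * upper_tail(k)


def rectangle_prob(xl, yl, xu, yu, orthant=orthant_independent):
    """P(xl < x < xu, yl < y < yu) by inclusion-exclusion over four orthants."""
    return orthant(xl, yl) - orthant(xu, yl) - orthant(xl, yu) + orthant(xu, yu)


p = rectangle_prob(-1.0, -2.0, 1.0, 2.0)
# For r = 0 the result must match the product of the two marginal probabilities.
expected = (upper_tail(-1.0) - upper_tail(1.0)) * (upper_tail(-2.0) - upper_tail(2.0))
```

Differences of nearly equal orthant probabilities are exactly where round-off can erase the result, which is the situation the sign flip described above is meant to avoid.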
- botorch.utils.probability.bvn.bvnu(r, h, k)[source]
Solves for P(x > h, y > k) where x and y are standard bivariate normal random variables with correlation coefficient r. In [Genz2004bvnt], this is equation (1):
L(h, k, r) = P(x < -h, y < -k) = 1/(a 2pi) int_{h}^{infty} int_{k}^{infty} f(x, y, r) dy dx,
where f(x, y, r) = e^{-1/(2a^2) (x^2 - 2rxy + y^2)} and a = (1 - r^2)^{1/2}.
[Genz2004bvnt] report that the following integration scheme incurs a maximum of 5e-16 error when run in double precision: if |r| >= 0.925, use a 20-point quadrature rule on a 5th order Taylor expansion; else, numerically integrate in polar coordinates using no more than 20 quadrature points.
- Parameters:
r (Tensor) – Tensor of correlation coefficients.
h (Tensor) – Tensor of negative upper bounds for x, same shape as r.
k (Tensor) – Tensor of negative upper bounds for y, same shape as r.
- Returns:
A tensor of probabilities P(x > h, y > k).
- Return type:
Tensor
- botorch.utils.probability.bvn.bvnmom(r, xl, yl, xu, yu, p=None)[source]
Computes the expected values of truncated, bivariate normal random variables.
Let x and y be a pair of standard bivariate normal random variables having correlation r. This function computes E([x,y] | [xl,yl] < [x,y] < [xu,yu]).
Following [Muthen1990moments] equations (4) and (5), we have E(x | [xl, yl] < [x, y] < [xu, yu]) = Z^{-1} phi(xl) P(yl < y < yu | x=xl) - phi(xu) P(yl < y < yu | x=xu), where Z = P([xl, yl] < [x, y] < [xu, yu]) and phi is the standard normal PDF.
- Parameters:
r (Tensor) – Tensor of correlation coefficients.
xl (Tensor) – Tensor of lower bounds for x, same shape as r.
xu (Tensor) – Tensor of upper bounds for x, same shape as r.
yl (Tensor) – Tensor of lower bounds for y, same shape as r.
yu (Tensor) – Tensor of upper bounds for y, same shape as r.
p (Tensor | None) – Tensor of probabilities P(xl < x < xu, yl < y < yu), same shape as r.
- Returns:
E(x | [xl, yl] < [x, y] < [xu, yu]) and E(y | [xl, yl] < [x, y] < [xu, yu]).
- Return type:
tuple[Tensor, Tensor]
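As a sanity check on the expression for E(x | ...), consider the uncorrelated case r = 0. Then P(yl < y < yu | x) does not depend on x and cancels against the matching factor in Z, leaving the univariate truncated-normal mean (phi(xl) - phi(xu)) / (Phi(xu) - Phi(xl)). The sketch below covers only this special case; the function names are assumptions, and it is not BoTorch's tensorized implementation.

```python
import math


def std_normal_pdf(x: float) -> float:
    """Standard normal PDF phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def truncated_mean_uncorrelated(xl: float, xu: float) -> float:
    """E(x | xl < x < xu) for a standard normal x, i.e. the r = 0 special case."""
    z = std_normal_cdf(xu) - std_normal_cdf(xl)
    return (std_normal_pdf(xl) - std_normal_pdf(xu)) / z


# Symmetric bounds give a zero mean; a half-line starting at 0 gives sqrt(2 / pi).
print(truncated_mean_uncorrelated(-1.0, 1.0))
print(truncated_mean_uncorrelated(0.0, 40.0))
```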
Elliptic Slice Sampler with Linear Constraints
Linear Elliptical Slice Sampler.
References
[Gessner2020] A. Gessner, O. Kanjilal, and P. Hennig. Integrals over Gaussians under linear domain constraints. AISTATS 2020.
[Wu2024] K. Wu and J. Gardner. A Fast, Robust Elliptical Slice Sampling Implementation for Linearly Truncated Multivariate Normal Distributions. arXiv:2407.10449. 2024.
This implementation is based (with multiple changes / optimizations) on the following implementations of the algorithm in [Gessner2020]:
- https://github.com/alpiges/LinConGauss
- https://github.com/wjmaddox/pytorch_ess
In addition, the active intervals (from which the angle is sampled) are computed using the improved algorithm described in [Wu2024]: https://github.com/kayween/linear-ess
The implementation here differentiates itself from the original implementations with:
1) Support for fixed feature equality constraints.
2) Support for non-standard Normal distributions.
3) Numerical stability improvements, especially relevant for high-dimensional cases.
4) Support for multiple Markov chains running in parallel.
- class botorch.utils.probability.lin_ess.LinearEllipticalSliceSampler(inequality_constraints=None, bounds=None, interior_point=None, fixed_indices=None, mean=None, covariance_matrix=None, covariance_root=None, check_feasibility=False, burnin=0, thinning=0, num_chains=1)[source]
Bases: PolytopeSampler
Linear Elliptical Slice Sampler.
Ideas:
- Optimize computations if possible, potentially with torch.compile.
- Extend fixed features constraint to general linear equality constraints.
Initialize LinearEllipticalSliceSampler.
- Parameters:
inequality_constraints (tuple[Tensor, Tensor] | None) – Tensors (A, b) describing inequality constraints A @ x <= b, where A is an n_ineq_con x d-dim Tensor and b is an n_ineq_con x 1-dim Tensor, with n_ineq_con the number of inequalities and d the dimension of the sample space. If omitted, must provide bounds instead.
bounds (Tensor | None) – A 2 x d-dim tensor of box bounds. If omitted, must provide inequality_constraints instead.
interior_point (Tensor | None) – A d x 1-dim Tensor representing a point in the (relative) interior of the polytope. If omitted, an interior point is determined automatically by solving a Linear Program. Note: It is crucial that the point lie in the interior of the feasible set (rather than on the boundary), otherwise the sampler will produce invalid samples.
fixed_indices (list[int] | Tensor | None) – Integer list or d-dim Tensor representing the indices of dimensions that are constrained to be fixed to the values specified in the interior_point, which is required to be passed in conjunction with fixed_indices.
mean (Tensor | None) – The d x 1-dim mean of the MVN distribution (if omitted, use zero).
covariance_matrix (Tensor | LinearOperator | None) – The d x d-dim covariance matrix of the MVN distribution (if omitted, use the identity).
covariance_root (Tensor | LinearOperator | None) – A d x d-dim root of the covariance matrix such that covariance_root @ covariance_root.T = covariance_matrix. NOTE: This matrix is assumed to be lower triangular. covariance_root can only be passed in conjunction with fixed_indices if covariance_root is a DiagLinearOperator. Otherwise the factorization would need to be recomputed, as we need to solve in standardize.
check_feasibility (bool) – If True, raise an error if the sampling results in an infeasible sample. This creates some overhead and so is switched off by default.
burnin (int) – Number of samples to generate upon initialization to warm up the sampler.
thinning (int) – Number of samples to skip before returning a sample in draw.
num_chains (int) – Number of Markov chains to run in parallel.
This sampler samples from a multivariate Normal N(mean, covariance_matrix) subject to linear domain constraints A x <= b (intersected with box bounds, if provided).
- property lifetime_samples: int
The total number of samples generated by the sampler during its lifetime.
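The core idea of elliptical slice sampling under linear constraints can be sketched in pure Python for a standard normal N(0, I). Each step draws an auxiliary vector nu, forms the ellipse x cos(theta) + nu sin(theta), and picks an angle uniformly among those that keep the point feasible. This sketch makes several simplifying assumptions: feasible angles are found by brute force on a grid rather than through the analytic active intervals of [Gessner2020] and [Wu2024], b is a flat list instead of an n_ineq_con x 1 tensor, and non-standard means and covariances, fixed indices, burn-in, thinning and parallel chains are all omitted. The names is_feasible, ess_step and sample_chain are hypothetical.

```python
import math
import random


def is_feasible(x, A, b):
    """Check A @ x <= b row by row."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
        for row, b_i in zip(A, b)
    )


def ess_step(x, A, b, rng, n_grid=360):
    """One step for N(0, I) truncated to {x : A @ x <= b}, using gridded angles."""
    nu = [rng.gauss(0.0, 1.0) for _ in x]  # auxiliary draw defining the ellipse
    candidates = []
    for i in range(n_grid):
        theta = 2.0 * math.pi * i / n_grid
        c, s = math.cos(theta), math.sin(theta)
        point = [c * x_j + s * nu_j for x_j, nu_j in zip(x, nu)]
        if is_feasible(point, A, b):
            candidates.append(point)
    # theta = 0 reproduces x itself, so candidates is non-empty whenever x is feasible.
    return rng.choice(candidates)


def sample_chain(x0, A, b, n, seed=0):
    """Run a single chain of n steps starting from the feasible point x0."""
    rng = random.Random(seed)
    x, samples = list(x0), []
    for _ in range(n):
        x = ess_step(x, A, b, rng)
        samples.append(x)
    return samples


# Positive quadrant in 2D, written as -x_j <= 0 for each coordinate.
A = [[-1.0, 0.0], [0.0, -1.0]]
b = [0.0, 0.0]
samples = sample_chain([1.0, 1.0], A, b, n=200)
```

Because every proposal lies on the ellipse and is checked against the constraints, no sample is ever rejected, which is what makes this family of samplers attractive for small feasible regions.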
- botorch.utils.probability.lin_ess.get_index_tensors(fixed_indices, d)[source]
Converts fixed_indices to a d-dim integral Tensor that is True at indices that are contained in fixed_indices and False otherwise.
- Parameters:
fixed_indices (list[int] | Tensor) – A list or Tensor of integer indices to fix.
d (int) – The dimensionality of the Tensors to be indexed.
- Returns:
A Tuple of integral Tensors partitioning [1, d] into indices that are fixed (first tensor) and non-fixed (second tensor).
- Return type:
tuple[Tensor, Tensor]
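The partition into fixed and non-fixed indices can be illustrated with plain lists. This is a sketch under the assumption that positions are zero-based (0, ..., d - 1), as in Python indexing; the name partition_indices is hypothetical and lists stand in for Tensors.

```python
def partition_indices(fixed_indices, d):
    """Split range(d) into the given fixed indices and the remaining ones."""
    fixed = list(fixed_indices)
    fixed_set = set(fixed)
    not_fixed = [i for i in range(d) if i not in fixed_set]
    return fixed, not_fixed


fixed, not_fixed = partition_indices([0, 3], d=5)
print(fixed, not_fixed)  # [0, 3] [1, 2, 4]
```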
Linear Algebra Helpers
- botorch.utils.probability.linalg.block_matrix_concat(blocks)[source]
- Parameters:
blocks (Sequence[Sequence[Tensor]])
- Return type:
Tensor
- botorch.utils.probability.linalg.augment_cholesky(Laa, Kbb, Kba=None, Lba=None, jitter=None)[source]
Computes the Cholesky factor of a block matrix K = [[Kaa, Kab], [Kba, Kbb]] based on a precomputed Cholesky factor Kaa = Laa Laa^T.
- Parameters:
Laa (Tensor) – Cholesky factor of K’s upper left block.
Kbb (Tensor) – Lower-right block of K.
Kba (Tensor | None) – Lower-left block of K.
Lba (Tensor | None) – Precomputed solve Kba Laa^{-T}.
jitter (float | None) – Optional nugget to be added to the diagonal of Kbb.
- Return type:
Tensor
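The block update behind this function can be sketched with nested lists: solve Lba Laa^T = Kba for Lba, factor the Schur complement Kbb - Lba Lba^T into Lbb, and assemble [[Laa, 0], [Lba, Lbb]]. The helper names below are assumptions, jitter and the precomputed Lba argument are left out, and the pure-Python loops are only meant for small matrices.

```python
import math


def cholesky(K):
    """Lower-triangular L with L @ L^T = K for a small SPD matrix (list of lists)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def augment_cholesky_sketch(Laa, Kbb, Kba):
    """Extend the factor Laa of Kaa to a factor of [[Kaa, Kba^T], [Kba, Kbb]]."""
    na, nb = len(Laa), len(Kbb)
    # Solve Lba @ Laa^T = Kba by forward substitution, one row of Kba at a time.
    Lba = []
    for row in Kba:
        out = [0.0] * na
        for j in range(na):
            s = row[j] - sum(out[p] * Laa[j][p] for p in range(j))
            out[j] = s / Laa[j][j]
        Lba.append(out)
    # Schur complement Kbb - Lba @ Lba^T, then its own Cholesky factor.
    schur = [
        [Kbb[i][j] - sum(Lba[i][p] * Lba[j][p] for p in range(na)) for j in range(nb)]
        for i in range(nb)
    ]
    Lbb = cholesky(schur)
    top = [list(Laa[i]) + [0.0] * nb for i in range(na)]
    bottom = [Lba[i] + Lbb[i] for i in range(nb)]
    return top + bottom


K = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
Laa = cholesky([row[:2] for row in K[:2]])
L = augment_cholesky_sketch(Laa, Kbb=[[K[2][2]]], Kba=[K[2][:2]])
```

The result agrees with factoring K from scratch, but only the new rows require work, which is the point of reusing Laa.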
- class botorch.utils.probability.linalg.PivotedCholesky(step: 'int', tril: 'Tensor', perm: 'LongTensor', diag: 'Tensor | None' = None, validate_init: 'InitVar[bool]' = True)[source]
Bases: object
- Parameters:
step (int)
tril (Tensor)
perm (LongTensor)
diag (Tensor | None)
validate_init (dataclasses.InitVar[bool])
- step: int
- tril: Tensor
- perm: LongTensor
- diag: Tensor | None
- validate_init: dataclasses.InitVar[bool] = True
- update_(eps=1e-10)[source]
Performs a single matrix decomposition step.
- Parameters:
eps (float)
- Return type:
None
- concat(other, dim=0)[source]
- Parameters:
other (PivotedCholesky)
dim (int)
- Return type:
Probability Helpers
- botorch.utils.probability.utils.case_dispatcher(out, cases=(), default=None)[source]
Basic implementation of a tensorized switching case statement.
- Parameters:
out (Tensor) – Tensor to which case outcomes are written.
cases (Iterable[tuple[Callable[[], BoolTensor], Callable[[BoolTensor], Tensor]]]) – Iterable of function pairs (pred, func), where mask=pred() specifies whether func is applicable for each entry in out. Note that cases are resolved first-come, first-served.
default (Callable[[BoolTensor], Tensor]) – Optional func to which all unclaimed entries of out are dispatched.
- Return type:
Tensor
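The first-come, first-served semantics can be mimicked with Python lists standing in for tensors. In this sketch, each func receives the mask of entries it has claimed and returns one value per True entry, in order; the name case_dispatcher_sketch and the list-based masks are assumptions made for illustration.

```python
def case_dispatcher_sketch(out, cases=(), default=None):
    """Fill `out` in place; earlier cases take precedence over later ones."""
    unclaimed = [True] * len(out)
    for pred, func in cases:
        # An entry is handled by this case only if no earlier case claimed it.
        mask = [p and u for p, u in zip(pred(), unclaimed)]
        if not any(mask):
            continue
        values = iter(func(mask))  # one value per True entry of mask, in order
        for idx, hit in enumerate(mask):
            if hit:
                out[idx] = next(values)
                unclaimed[idx] = False
    if default is not None and any(unclaimed):
        values = iter(default(unclaimed))
        for idx, hit in enumerate(unclaimed):
            if hit:
                out[idx] = next(values)
    return out


x = [-2.0, 0.5, 3.0, 0.25]
result = case_dispatcher_sketch(
    out=[None] * len(x),
    cases=[
        # Claims every entry below 1, including the negative one.
        (lambda: [v < 1.0 for v in x], lambda m: [10.0 * v for v, hit in zip(x, m) if hit]),
        # Never fires: its only match was already claimed by the first case.
        (lambda: [v < 0.0 for v in x], lambda m: [-1.0 for hit in m if hit]),
    ],
    default=lambda m: [0.0 for hit in m if hit],
)
print(result)  # [-20.0, 5.0, 0.0, 2.5]
```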
- botorch.utils.probability.utils.gen_positional_indices(shape, dim, device=None)[source]
- Parameters:
shape (Size)
dim (int)
device (device | None)
- Return type:
Iterator[LongTensor]
- botorch.utils.probability.utils.build_positional_indices(shape, dim, device=None)[source]
- Parameters:
shape (Size)
dim (int)
device (device | None)
- Return type:
LongTensor
- botorch.utils.probability.utils.leggauss(deg, **tkwargs)[source]
- Parameters:
deg (int)
tkwargs (Any)
- Return type:
tuple[Tensor, Tensor]
- botorch.utils.probability.utils.ndtr(x)[source]
Standard normal CDF.
- Parameters:
x (Tensor)
- Return type:
Tensor
- botorch.utils.probability.utils.phi(x)[source]
Standard normal PDF.
- Parameters:
x (Tensor)
- Return type:
Tensor
- botorch.utils.probability.utils.log_phi(x)[source]
Logarithm of the standard normal PDF.
- Parameters:
x (Tensor)
- Return type:
Tensor
- botorch.utils.probability.utils.log_ndtr(x)[source]
Implementation of log_ndtr that remedies problems of torch.special’s version for large negative x, where the torch implementation yields Inf or NaN gradients.
- Parameters:
x (Tensor) – An input tensor with dtype torch.float32 or torch.float64.
- Returns:
A tensor of values of the same type and shape as x containing log(ndtr(x)).
- Return type:
Tensor
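To illustrate why log(ndtr(x)) needs care for large negative x, the scalar sketch below contrasts a naive evaluation, whose argument underflows to zero, with a tail approximation based on the asymptotic expansion Phi(x) ~ phi(x) / |x| * (1 - 1/x^2 + 3/x^4). This is not BoTorch's implementation, and it only shows the value-level underflow rather than the gradient problem noted above. The switch point at -10 and the three-term expansion are illustrative choices that leave an error of roughly 1e-5 near the switch.

```python
import math


def log_ndtr_naive(x: float) -> float:
    """log(Phi(x)) computed directly; fails once Phi(x) underflows to zero."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))


def log_ndtr_sketch(x: float) -> float:
    """log(Phi(x)) with an asymptotic expansion for the far lower tail."""
    if x > -10.0:
        return log_ndtr_naive(x)
    inv_sq = 1.0 / (x * x)
    log_pdf = -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
    # log(phi(x) / |x|) plus the log of the truncated correction series.
    return log_pdf - math.log(-x) + math.log1p(-inv_sq + 3.0 * inv_sq * inv_sq)


print(log_ndtr_sketch(-40.0))  # finite, close to -804.6
```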
- botorch.utils.probability.utils.log_erfc(x)[source]
Computes the logarithm of the complementary error function in a numerically stable manner. The GitHub issue https://github.com/pytorch/pytorch/issues/31945 tracks progress toward moving this feature into PyTorch in C++.
- Parameters:
x (Tensor) – An input tensor with dtype torch.float32 or torch.float64.
- Returns:
A tensor of values of the same type and shape as x containing log(erfc(x)).
- Return type:
Tensor
- botorch.utils.probability.utils.log_erfcx(x)[source]
Computes the logarithm of the complementary scaled error function in a numerically stable manner. The GitHub issue tracks progress toward moving this feature into PyTorch in C++: https://github.com/pytorch/pytorch/issues/31945.
- Parameters:
x (Tensor) – An input tensor with dtype torch.float32 or torch.float64.
- Returns:
A tensor of values of the same type and shape as x containing log(erfcx(x)).
- Return type:
Tensor
- botorch.utils.probability.utils.standard_normal_log_hazard(x)[source]
Computes the logarithm of the hazard function of the standard normal distribution, i.e. log(phi(x) / Phi(-x)).
- Parameters:
x (Tensor) – A tensor of any shape, with either float32 or float64 dtypes.
- Returns:
A Tensor of the same shape as x, containing the values of the logarithm of the hazard function evaluated at x.
- Return type:
Tensor
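A scalar sketch of log(phi(x) / Phi(-x)) shows the behaviour at both ends: for moderate x the definition can be evaluated directly, while for large x the ratio approaches x, which follows from Phi(-x) ~ phi(x) / x * (1 - 1/x^2 + 3/x^4). The function name, the switch point at 20 and the truncation of the series are assumptions; BoTorch's own implementation may differ.

```python
import math


def log_hazard_sketch(x: float) -> float:
    """log(phi(x) / Phi(-x)) for a scalar x."""
    if x < 20.0:
        log_pdf = -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
        log_sf = math.log(0.5 * math.erfc(x / math.sqrt(2.0)))  # log Phi(-x)
        return log_pdf - log_sf
    # Far upper tail: phi(x) / Phi(-x) ~ x / (1 - 1/x^2 + 3/x^4).
    inv_sq = 1.0 / (x * x)
    return math.log(x) - math.log1p(-inv_sq + 3.0 * inv_sq * inv_sq)


print(log_hazard_sketch(0.0))   # log(sqrt(2 / pi))
print(log_hazard_sketch(1e6))   # close to log(1e6)
```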
- botorch.utils.probability.utils.log_prob_normal_in(a, b)[source]
Computes the log probability that a standard normal random variable takes a value in [a, b], i.e. log(Phi(b) - Phi(a)), where Phi is the standard normal CDF. Returns accurate values and permits numerically stable backward passes for inputs in [-1e100, 1e100] for double precision and [-1e20, 1e20] for single precision. In contrast, a naive approach is not numerically accurate beyond [-10, 10].
- Parameters:
a (Tensor) – Tensor of lower integration bounds of the Gaussian probability measure.
b (Tensor) – Tensor of upper integration bounds of the Gaussian probability measure.
- Returns:
Tensor of the log probabilities.
- Return type:
Tensor
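A scalar sketch of log(Phi(b) - Phi(a)) that always differences the smaller tail probabilities is given below. It assumes a <= b and uses math.erfc, so it only works while erfc does not underflow (bounds of magnitude up to roughly 37), a far narrower range than the one documented above. The function name is hypothetical.

```python
import math


def log_prob_normal_in_sketch(a: float, b: float) -> float:
    """log P(a <= z <= b) for a standard normal z, assuming a <= b."""
    s = math.sqrt(2.0)
    if a > 0.0:
        # Both bounds in the upper tail: Phi(b) - Phi(a) = Phi(-a) - Phi(-b).
        prob = 0.5 * (math.erfc(a / s) - math.erfc(b / s))
    else:
        # Otherwise difference the lower-tail form directly.
        prob = 0.5 * (math.erfc(-b / s) - math.erfc(-a / s))
    return math.log(prob)


print(log_prob_normal_in_sketch(-1.96, 1.96))  # close to log(0.95)
print(log_prob_normal_in_sketch(9.0, 10.0))    # finite, roughly -43.6
```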
- botorch.utils.probability.utils.swap_along_dim_(values, i, j, dim, buffer=None)[source]
Swaps Tensor slices in-place along dimension dim.
When passed as Tensors, i (and j) should be dim-dimensional tensors with the same shape as values.shape[:dim]. The exception to this rule occurs when dim=0, in which case i (and j) should be (at most) one-dimensional when passed as a Tensor.
- Parameters:
values (Tensor) – Tensor whose values are to be swapped.
i (int | LongTensor) – Indices for slices along dimension dim.
j (int | LongTensor) – Indices for slices along dimension dim.
dim (int) – The dimension of values along which to swap slices.
buffer (Tensor | None) – Optional buffer used internally to store copied values.
- Returns:
The original values tensor.
- Return type:
Tensor
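The indexing convention for dim=1 can be illustrated with a 2-D nested list, where i and j each hold one index per row, matching the shape values.shape[:1]. This is a sketch of that single case only; the name swap_along_dim1_ is hypothetical, and lists stand in for Tensors.

```python
def swap_along_dim1_(values, i, j):
    """In-place swap of entries i[r] and j[r] within each row r of `values`."""
    for row, a, b in zip(values, i, j):
        row[a], row[b] = row[b], row[a]
    return values


values = [[10, 11, 12], [20, 21, 22]]
result = swap_along_dim1_(values, i=[0, 1], j=[2, 1])
print(result)  # [[12, 11, 10], [20, 21, 22]]
```

As in the documented function, the object returned is the original container, modified in place.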
- botorch.utils.probability.utils.compute_log_prob_feas_from_bounds(con_lower_inds, con_upper_inds, con_both_inds, con_lower, con_upper, con_both, means, sigmas)[source]
Compute logarithm of the feasibility probability for each batch of mean/sigma.
- Parameters:
means (Tensor) – A (b) x m-dim Tensor of means.
sigmas (Tensor) – A (b) x m-dim Tensor of standard deviations.
con_lower_inds (Tensor) – 1d Tensor of indices con_lower applies to in the second dimension of means and sigmas.
con_upper_inds (Tensor) – 1d Tensor of indices con_upper applies to in the second dimension of means and sigmas.
con_both_inds (Tensor) – 1d Tensor of indices con_both applies to in the second dimension of means and sigmas.
con_lower (Tensor) – 1d Tensor of lower bounds on the constraints equal in dimension to con_lower_inds.
con_upper (Tensor) – 1d Tensor of upper bounds on the constraints equal in dimension to con_upper_inds.
con_both (Tensor) – 2d Tensor of “both” bounds on the constraints equal in length to con_both_inds.
- Returns:
A (b)-dim tensor of log feasibility probabilities.
- Return type:
Tensor
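For a single (mean, sigma) row, the computation can be sketched as a sum of per-constraint log probabilities. The sketch assumes the constrained outcomes are independent Gaussians, so that their log probabilities add, and it replaces the index and bound tensors with dictionaries mapping an outcome index to its bound(s). The function names are hypothetical and the arithmetic is naive rather than tail-stable.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def log_prob_feas_sketch(means, sigmas, lower=None, upper=None, both=None):
    """Sum of log feasibility probabilities over all constrained outcomes.

    lower / upper: dict {index: bound}; both: dict {index: (low, high)}.
    """
    total = 0.0
    for idx, lo in (lower or {}).items():
        # P(y >= lo) = Phi((mean - lo) / sigma)
        total += math.log(std_normal_cdf((means[idx] - lo) / sigmas[idx]))
    for idx, hi in (upper or {}).items():
        # P(y <= hi) = Phi((hi - mean) / sigma)
        total += math.log(std_normal_cdf((hi - means[idx]) / sigmas[idx]))
    for idx, (lo, hi) in (both or {}).items():
        z_lo = (lo - means[idx]) / sigmas[idx]
        z_hi = (hi - means[idx]) / sigmas[idx]
        total += math.log(std_normal_cdf(z_hi) - std_normal_cdf(z_lo))
    return total


log_p = log_prob_feas_sketch(
    means=[0.0, 0.0, 0.0],
    sigmas=[1.0, 1.0, 1.0],
    lower={0: 0.0},
    upper={1: 0.0},
    both={2: (-1.96, 1.96)},
)
print(math.exp(log_p))  # about 0.5 * 0.5 * 0.95
```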
- botorch.utils.probability.utils.percentile_of_score(data, score, dim=-1)[source]
Compute the percentile rank of score relative to data. For example, if this function returns 70 then 70% of the values in data are below score.
This implementation is based on scipy.stats.percentileofscore, with kind='rank' and nan_policy='propagate', which is the default.
- Parameters:
data (Tensor) – A ... x n x output_shape-dim Tensor of data.
score (Tensor) – A ... x 1 x output_shape-dim Tensor of scores.
dim (int)
- Returns:
A ... x output_shape-dim Tensor of percentile ranks.
- Return type:
Tensor
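For a single 1-D sample, the rank-based percentile can be sketched as follows. The formula mirrors the kind='rank' rule of scipy.stats.percentileofscore: count the values strictly below and the values at or below the score, add one if the score occurs in the data, and scale by 50 / n. The function name is hypothetical, plain lists stand in for Tensors, and NaNs in either input propagate to the result.

```python
import math


def percentile_of_score_sketch(data, score):
    """Percentile rank of `score` within `data`, on a 0-100 scale."""
    if math.isnan(score) or any(math.isnan(v) for v in data):
        return math.nan
    left = sum(v < score for v in data)    # values strictly below the score
    right = sum(v <= score for v in data)  # values at or below the score
    plus1 = 1 if left < right else 0       # score itself occurs in the data
    return (left + right + plus1) * 50.0 / len(data)


print(percentile_of_score_sketch([1, 2, 3, 4], 3))     # 75.0
print(percentile_of_score_sketch([1, 2, 3, 3, 4], 3))  # 70.0
```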