Welcome to Perturb-OT’s documentation!

perturbot.match.cot_numpy(X1, X2, w1=None, w2=None, v1=None, v2=None, niter=10, algo='emd', reg=0, algo2='emd', reg2=0, verbose=True, log=False, random_init=False, C_lin=None)

Returns the CO-Optimal Transport (COOT) coupling between two datasets X1 and X2 (see [1]); the Sinkhorn solver is reimplemented with OTT.

The function solves the following optimization problem:

\[COOT = \min_{Ts,Tv} \sum_{i,j,k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts_{i,j}*Tv_{k,l}\]

Where:

  • X1 – the source dataset

  • X2 – the target dataset

  • w1, w2 – weights (histograms) on the samples (rows) of X1 and X2, respectively

  • v1, v2 – weights (histograms) on the features (columns) of X1 and X2, respectively

Parameters:
  • X1 (numpy array, shape (n, d)) – Source dataset

  • X2 (numpy array, shape (n', d')) – Target dataset

  • w1 (numpy array, shape (n,)) – Weight (histogram) on the samples of X1. If None, a uniform distribution is used.

  • w2 (numpy array, shape (n',)) – Weight (histogram) on the samples of X2. If None, a uniform distribution is used.

  • v1 (numpy array, shape (d,)) – Weight (histogram) on the features of X1. If None, a uniform distribution is used.

  • v2 (numpy array, shape (d',)) – Weight (histogram) on the features of X2. If None, a uniform distribution is used.

  • niter (integer) – Maximum number of iterations of the BCD (block coordinate descent) used to solve COOT.

  • algo (string) – Algorithm for solving the OT problem on samples at each iteration. One of ['emd', 'sinkhorn']. 'emd' returns a sparse solution; 'sinkhorn' returns an entropically regularized solution.

  • algo2 (string) – Algorithm for solving the OT problem on features at each iteration. One of ['emd', 'sinkhorn']. 'emd' returns a sparse solution; 'sinkhorn' returns an entropically regularized solution.

  • reg (float) – Regularization parameter for the sample coupling matrix. Ignored if algo='emd'.

  • reg2 (float) – Regularization parameter for the feature coupling matrix. Ignored if algo2='emd'.

  • eps (float) – Threshold for the convergence

  • random_init (bool) – Whether to use random initialization for the coupling matrices. If False, identity couplings are used.

  • log (bool, optional) – record log if True

  • C_lin (numpy array, shape (n, n')) – Prior on the sample correspondences. Added to the cost for the samples transport

Returns:

  • Ts (numpy array, shape (n,n’)) – Optimal Transport coupling between the samples

  • Tv (numpy array, shape (d,d’)) – Optimal Transport coupling between the features

  • cost (float) – Optimization value after convergence

  • log (dict) – convergence information and coupling matrices

References

[1] Redko, I., Vayer, T., Flamary, R., & Courty, N. (2020). CO-Optimal Transport. Advances in Neural Information Processing Systems 33 (NeurIPS 2020).

Examples

import numpy as np
from perturbot.match import cot_numpy

n_samples = 300
Xs = np.random.rand(n_samples,2)
Xt = np.random.rand(n_samples,1)
cot_numpy(Xs, Xt)
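
Continuing the example, a minimal sketch of capturing the outputs; it assumes the documented return order (Ts, Tv, cost), with the log dictionary appended as a fourth value when log=True:

Ts, Tv, cost = cot_numpy(Xs, Xt)
# Ts has shape (n, n') and Tv has shape (d, d'), per the Returns section above
Ts, Tv, cost, log = cot_numpy(Xs, Xt, log=True)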
perturbot.match.cotl_numpy(X_dict: Dict[Number, ndarray], Y_dict: Dict[Number, ndarray], w1: Dict[Number, ndarray] = None, w2: Dict[Number, ndarray] = None, v1: ndarray | None = None, v2: ndarray | None = None, niter: int = 100, algo: str = 'emd', reg: float = 0.1, algo2: str = 'emd', reg2: float = 10.0, verbose: bool = True, log: bool = False, random_init: bool = False, C_lin: bool = None)

Returns COOT between two datasets X, Y given labels.

The function solves the following optimization problem:

\[COOTL = \min_{Ts^1,\dots,Ts^{L},Tv} \sum_{t=1}^L \sum_{i \in \{i|l_{x_i}=t\},\, j \in \{j|l_{y_j}=t\},\, k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts^t_{i,j}*Tv_{k,l}\]
Parameters:
  • X_dict (dictionary of numpy arrays, shape (n, d)) – Source dataset, as a dictionary mapping each label to its samples

  • Y_dict (dictionary of numpy arrays, shape (n', d')) – Target dataset, as a dictionary mapping each label to its samples; the labels (dictionary keys) must match those of X_dict

  • w1 (dictionary of numpy arrays, shape (n,)) – Weights (histograms) on the samples of X_dict. If None, uniform distributions are used.

  • w2 (dictionary of numpy arrays, shape (n',)) – Weights (histograms) on the samples of Y_dict. If None, uniform distributions are used.

  • v1 (numpy array, shape (d,)) – Weight (histogram) on the features of X_dict. If None, a uniform distribution is used.

  • v2 (numpy array, shape (d',)) – Weight (histogram) on the features of Y_dict. If None, a uniform distribution is used.

  • niter (integer) – Maximum number of iterations of the BCD (block coordinate descent) used to solve COOT.

  • algo (string) – Algorithm for solving the OT problem on samples at each iteration. One of ['emd', 'sinkhorn']. 'emd' returns a sparse solution; 'sinkhorn' returns an entropically regularized solution.

  • algo2 (string) – Algorithm for solving the OT problem on features at each iteration. One of ['emd', 'sinkhorn']. 'emd' returns a sparse solution; 'sinkhorn' returns an entropically regularized solution.

  • reg (float) – Regularization parameter for the sample coupling matrix. Ignored if algo='emd'.

  • reg2 (float) – Regularization parameter for the feature coupling matrix. Ignored if algo2='emd'.

  • eps (float) – Threshold for the convergence

  • random_init (bool) – Whether to use random initialization for the coupling matrices. If False, identity couplings are used.

  • log (bool, optional) – record log if True

  • C_lin (numpy array, shape (n, n')) – Prior on the sample correspondences. Added to the cost for the samples transport

Returns:

  • Ts (numpy array, shape (n,n’)) – Optimal Transport coupling between the samples

  • Tv (numpy array, shape (d,d’)) – Optimal Transport coupling between the features

  • cost (float) – Optimization value after convergence

  • log (dict) – convergence information and coupling matrices

Example

import numpy as np
from perturbot.match import cotl_numpy

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,1) for k in labels}
cotl_numpy(Xs_dict, Xt_dict)
perturbot.match.get_coupling_cot(data: Tuple[Dict[Number, array], Dict[Number, array]]) Tuple[int | Dict[Number, array], int | Dict]

Returns the sample coupling between two datasets X and Y (supplied per label), disregarding the label information.

The function solves the following optimization problem:

\[COOT = \min_{Ts,Tv} \sum_{i,j,k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts_{i,j}*Tv_{k,l}\]
Parameters:
  • data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.

Returns:

  • T_dict – Optimal Transport coupling between the samples per label

  • log – Running log

Example

import numpy as np
from perturbot.match import get_coupling_cot

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,1) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,2) for k in labels}
get_coupling_cot((Xs_dict, Xt_dict))
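
Continuing the example, a short sketch of unpacking the result; it assumes the documented return order (T_dict, log), with one coupling per label:

T_dict, log = get_coupling_cot((Xs_dict, Xt_dict))
# T_dict holds the sample coupling per label; log is the running log of the solver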
perturbot.match.get_coupling_cot_sinkhorn(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005, eps2: float | None = None) Tuple[int | Dict[Number, array], int | Dict]

Returns the sample coupling between two datasets X and Y (supplied per label), disregarding the label information.

The function solves the following optimization problem:

\[ECOOT = \min_{Ts,Tv} \sum_{i,j,k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts_{i,j}*Tv_{k,l} - \epsilon H(Ts)\]
Parameters:
  • data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.

  • eps – Regularization parameter, relative to the max cost.

Returns:

  • T_dict – Optimal Transport coupling between the samples per label

  • log – Running log

Example

import numpy as np
from perturbot.match import get_coupling_cot_sinkhorn

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,1) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,2) for k in labels}
get_coupling_cot_sinkhorn((Xs_dict, Xt_dict), 0.05)
perturbot.match.get_coupling_cotl(data: Tuple[Dict[Number, array], Dict[Number, array]]) Tuple[int | Dict[Number, array], int | Dict]

Returns the sample coupling between two datasets X and Y given the labels.

The function solves the following optimization problem:

\[COOTL = \min_{Ts^1,\dots,Ts^{L},Tv} \sum_{t=1}^L \sum_{i \in \{i|l_{x_i}=t\},\, j \in \{j|l_{y_j}=t\},\, k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts^t_{i,j}*Tv_{k,l}\]
Parameters:
  • data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.

Returns:

  • T_dict – Optimal Transport coupling between the samples per label

  • log – Running log

Example

import numpy as np
from perturbot.match import get_coupling_cotl

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,1) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,1) for k in labels}
get_coupling_cotl((Xs_dict, Xt_dict))
perturbot.match.get_coupling_cotl_sinkhorn(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005, eps2: float = None) Tuple[Dict[Number, array], Dict]

Returns sample coupling between two datasets X, Y given the labels.

The function solves the following optimization problem:

\[COOTL = \min_{Ts^1,\dots,Ts^{L},Tv} \sum_{t=1}^L \sum_{i \in \{i|l_{x_i}=t\},\, j \in \{j|l_{y_j}=t\},\, k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts^t_{i,j}*Tv_{k,l} - \epsilon_1 H(Ts) - \epsilon_2 H(Tv)\]
Parameters:
  • data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.

  • eps – Regularization parameter, relative to the max cost.

Returns:

  • T_dict – Optimal Transport coupling between the samples per label

  • log – Running log

Example

import numpy as np
from perturbot.match import get_coupling_cotl_sinkhorn

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,1) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,1) for k in labels}
get_coupling_cotl_sinkhorn((Xs_dict, Xt_dict), 0.05)
perturbot.match.get_coupling_egw_all_ott(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005) Tuple[Dict[Number, array], Dict]

Returns the entropic GW coupling between two datasets X and Y in an all-to-all manner, disregarding the labels.

The function solves the following optimization problem:

\[GW = \min_{T\in C_{p,q}} \sum_{i,j,k,l} |(x_i-x_k)^2 - (y_j-y_l)^2|^{2}*T_{i,j}T_{k,l} - \epsilon H(T)\]
Parameters:
  • data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.

  • eps – Regularization parameter, relative to the max cost.

Returns:

  • T_dict – Optimal Transport coupling between the samples per label

  • log – Running log

Example

import numpy as np
from perturbot.match import get_coupling_egw_all_ott

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,1) for k in labels}
get_coupling_egw_all_ott((Xs_dict, Xt_dict), 0.05)
perturbot.match.get_coupling_egw_labels_ott(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005) Tuple[Dict[Number, array], Dict]

Returns GW coupling between two datasets X, Y given the labels.

The function solves the following optimization problem:

\[\begin{split}EGWL = \min_{T\in C_{p,q}^\ell} \sum_{t=1}^L \sum_{i,k \in \{i|l_{x_i}=t\},\, j,l \in \{j|l_{y_j}=t\}} |(x_i-x_k)^2 - (y_j-y_l)^2|^{2}*T_{i,j}T_{k,l} - \epsilon H(T)\\ C_{p,q}^\ell = \{T \mid T \in C_{p,q},\ T_{ij} > 0 \implies l_{x_i} = l_{y_j}\}\end{split}\]
Parameters:
  • data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.

  • eps – Regularization parameter, relative to the max cost.

Returns:

  • T_dict – Optimal Transport coupling between the samples per label

  • log – Running log

Example

import numpy as np
from perturbot.match import get_coupling_egw_labels_ott

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,1) for k in labels}
get_coupling_egw_labels_ott((Xs_dict, Xt_dict), 0.05)
perturbot.match.get_coupling_egw_ott(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005) Tuple[Dict[Number, array], Dict]

Returns GW coupling between two datasets X, Y per label.

The function solves the following optimization problem:

\[\begin{split}GW^l = \min_{T^l} \sum_{i,j,k,l} |(x_i-x_k)^2 - (y_j-y_l)^2|^{2}*T^l_{i,j}T^l_{k,l} - \epsilon H(T^l)\\\end{split}\]
Parameters:
  • data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.

  • eps – Regularization parameter, relative to the max cost.

Returns:

  • T_dict – Optimal Transport coupling between the samples per label

  • log – Running log

Example

import numpy as np
from perturbot.match import get_coupling_egw_ott

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,1) for k in labels}
get_coupling_egw_ott((Xs_dict, Xt_dict), 0.05)
perturbot.match.get_coupling_eot_ott(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005) Tuple[Dict[Number, array], Dict]

Returns the OT coupling between two datasets X and Y (supplied per label), disregarding the label information.

The function solves the following optimization problem:

\[\begin{split}EOT = \min_{T\in C_{p,q}} \sum_{i,j} (x_i-y_j)^2 T_{i,j} - \epsilon H(T)\\\end{split}\]
Parameters:
  • data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.

  • eps – Regularization parameter, relative to the max cost.

Returns:

  • T_dict – Optimal Transport coupling between the samples per label

  • log – Running log

Example

import numpy as np
from perturbot.match import get_coupling_eot_ott

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,1) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,1) for k in labels}
get_coupling_eot_ott((Xs_dict, Xt_dict), 0.05)
perturbot.match.get_coupling_fot(data: Tuple[Dict[Number, ndarray], Dict[Number, ndarray]], Ts: Dict[Number, ndarray] | ndarray, eps=0.005)

Returns the coupling between the features of two datasets X and Y, given the sample coupling.

The function solves the following optimization problem:

\[FOT = \min_{Tv} \sum_{i,j,k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts_{i,j}*Tv_{k,l} - \epsilon H(Tv)\]
Parameters:
  • data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.

  • Ts – Sample-to-sample transport: either a per-label transport (a dictionary matched to the labels of the source and target datasets) or a single global coupling matrix in which samples are concatenated in the order of the labels in data[0].keys().

  • eps – Regularization parameter, relative to the max cost.

Returns:

  • Tv – Feature-to-feature coupling.

  • log – Running log

Example

import numpy as np
from perturbot.match import get_coupling_egw_labels_ott, get_coupling_fot

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,1) for k in labels}
Ts, log = get_coupling_egw_labels_ott((Xs_dict, Xt_dict), 0.05)
Tv, feature_matching_log = get_coupling_fot((Xs_dict, Xt_dict), Ts, 0.05)
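
As a short follow-up, a sanity check on the feature coupling, under the assumption that Tv is a (d, d') matrix pairing source features with target features (matching the Tv shape documented for cot_numpy above):

assert Tv.shape == (Xs_dict[0].shape[1], Xt_dict[0].shape[1])  # (2, 1) in this example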
perturbot.match.get_coupling_gw_labels(data: Tuple[Dict[Number, array], Dict[Number, array]]) Tuple[Dict[Number, array], Dict]

Returns GW coupling between two datasets X, Y given the labels.

The function solves the following optimization problem:

\[\begin{split}GWL = \min_{T\in C_{p,q}^\ell} \sum_{t=1}^L \sum_{i,k \in \{i|l_{x_i}=t\},\, j,l \in \{j|l_{y_j}=t\}} |(x_i-x_k)^2 - (y_j-y_l)^2|^{2}*T_{i,j}T_{k,l} \\ C_{p,q}^\ell = \{T \mid T \in C_{p,q},\ T_{ij} > 0 \implies l_{x_i} = l_{y_j}\}\end{split}\]
Parameters:

  • data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.

Returns:

  • T_dict – Optimal Transport coupling between the samples per label

  • log – Running log

Example

import numpy as np
from perturbot.match import get_coupling_gw_labels

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,1) for k in labels}
get_coupling_gw_labels((Xs_dict, Xt_dict))
perturbot.match.get_coupling_leot_ott(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005) Tuple[Dict[Number, array], Dict]

Returns OT coupling between two datasets X, Y per label.

The function solves the following optimization problem:

\[\begin{split}EOT^l = \min_{T^l} \sum_{i,j} (x_i-y_j)^2 T^l_{i,j} - \epsilon H(T^l)\\\end{split}\]
Parameters:
  • data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.

  • eps – Regularization parameter, relative to the max cost.

Returns:

  • T_dict – Optimal Transport coupling between the samples per label

  • log – Running log

Example

import numpy as np
from perturbot.match import get_coupling_leot_ott

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,1) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,1) for k in labels}
get_coupling_leot_ott((Xs_dict, Xt_dict), 0.05)
perturbot.predict.train_mlp(train_data: Tuple[Dict[Number, array], Dict[Number, array]], T_dict: Dict[Number, array]) Tuple[Module, Dict]

Trains an MLP that predicts Y from X, given the per-label sample matching T_dict.

Parameters:
  • train_data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.

  • T_dict – Optimal Transport coupling between the samples per label

Returns:

  • model – Trained predictor

  • log – Training log
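
Example

A minimal end-to-end sketch combining matching and prediction; it assumes get_coupling_cotl returns (T_dict, log) as documented above and that train_mlp accepts the same (source dataset, target dataset) tuple used by the matching functions:

import numpy as np
from perturbot.match import get_coupling_cotl
from perturbot.predict import train_mlp

n_samples = 300
labels = [0,1,2,3]
Xs_dict = {k: np.random.rand(n_samples,2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples,1) for k in labels}
T_dict, log = get_coupling_cotl((Xs_dict, Xt_dict))      # per-label sample couplings
model, train_log = train_mlp((Xs_dict, Xt_dict), T_dict)  # trained predictor and training log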
