Welcome to Perturb-OT’s documentation!¶
- perturbot.match.cot_numpy(X1, X2, w1=None, w2=None, v1=None, v2=None, niter=10, algo='emd', reg=0, algo2='emd', reg2=0, verbose=True, log=False, random_init=False, C_lin=None)¶
Returns the CO-Optimal Transport (COOT) between two datasets X1 and X2 (see [1]), with the Sinkhorn solver reimplemented using OTT.
The function solves the following optimization problem:
\[COOT = \min_{Ts,Tv} \sum_{i,j,k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts_{i,j}*Tv_{k,l}\]
where:
- X1 : the source dataset
- X2 : the target dataset
- w1, w2 : weights (histograms) on the samples (rows) of X1 and X2, respectively
- v1, v2 : weights (histograms) on the features (columns) of X1 and X2, respectively
- Parameters:
X1 (numpy array, shape (n, d)) – Source dataset
X2 (numpy array, shape (n', d')) – Target dataset
w1 (numpy array, shape (n,)) – Weight (histogram) on the samples of X1. If None, the uniform distribution is used.
w2 (numpy array, shape (n',)) – Weight (histogram) on the samples of X2. If None, the uniform distribution is used.
v1 (numpy array, shape (d,)) – Weight (histogram) on the features of X1. If None, the uniform distribution is used.
v2 (numpy array, shape (d',)) – Weight (histogram) on the features of X2. If None, the uniform distribution is used.
niter (integer) – Maximum number of iterations of the BCD used to solve COOT.
algo (string) – Algorithm for the OT problem on samples at each iteration, one of [‘emd’, ‘sinkhorn’]. ‘emd’ returns a sparse solution; ‘sinkhorn’ returns a regularized solution.
algo2 (string) – Algorithm for the OT problem on features at each iteration, one of [‘emd’, ‘sinkhorn’]. ‘emd’ returns a sparse solution; ‘sinkhorn’ returns a regularized solution.
reg (float) – Regularization parameter for the sample coupling matrix. Ignored if algo=’emd’.
reg2 (float) – Regularization parameter for the feature coupling matrix. Ignored if algo2=’emd’.
eps (float) – Convergence threshold.
random_init (bool) – Whether to use random initialization for the coupling matrices. If False, identity couplings are used.
log (bool, optional) – Record the log if True.
C_lin (numpy array, shape (n, n')) – Prior on the sample correspondences. Added to the cost for the sample transport.
- Returns:
Ts (numpy array, shape (n,n’)) – Optimal Transport coupling between the samples
Tv (numpy array, shape (d,d’)) – Optimal Transport coupling between the features
cost (float) – Optimization value after convergence
log (dict) – Convergence information and coupling matrices
References
[1] Redko Ievgen, Vayer Titouan, Flamary Rémi, and Courty Nicolas, “CO-Optimal Transport”
Examples
import numpy as np
from perturbot.match import cot_numpy

n_samples = 300
Xs = np.random.rand(n_samples, 2)
Xt = np.random.rand(n_samples, 1)
cot_numpy(Xs, Xt)
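Per the Returns section above, the call yields the sample coupling, the feature coupling, and the final cost, with an additional log dict when log=True; a minimal sketch of unpacking these, assuming that return order:

import numpy as np
from perturbot.match import cot_numpy

Xs = np.random.rand(300, 2)
Xt = np.random.rand(300, 1)

# Sample coupling (300 x 300), feature coupling (2 x 1), and final cost
Ts, Tv, cost = cot_numpy(Xs, Xt)

# With log=True, a dict of convergence information is also returned
Ts, Tv, cost, log = cot_numpy(Xs, Xt, log=True)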
- perturbot.match.cotl_numpy(X_dict: Dict[Number, ndarray], Y_dict: Dict[Number, ndarray], w1: Dict[Number, ndarray] = None, w2: Dict[Number, ndarray] = None, v1: ndarray | None = None, v2: ndarray | None = None, niter: int = 100, algo: str = 'emd', reg: float = 0.1, algo2: str = 'emd', reg2: float = 10.0, verbose: bool = True, log: bool = False, random_init: bool = False, C_lin: bool = None)¶
Returns COOT between two datasets X, Y given labels.
The function solves the following optimization problem:
\[COOTL = \min_{Ts^1,\dots,Ts^{L},Tv} \sum_{t=1}^{L} \sum_{\substack{i\,:\,l_{x_i}=t \\ j\,:\,l_{y_j}=t}} \sum_{k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts_{i,j}^{t}*Tv_{k,l}\]
- Parameters:
X1 (numpy array, shape (n, d)) – Source dataset
X2 (numpy array, shape (n', d')) – Target dataset
y1 (numpy array, shape (n,)) – Labels of the samples of X1.
y2 (numpy array, shape (n',)) – Labels of the samples of X2.
w1 (numpy array, shape (n,)) – Weight (histogram) on the samples of X1. If None, the uniform distribution is used.
w2 (dictionary of numpy arrays, shape (n',)) – Weight (histogram) on the samples of X2. If None, the uniform distribution is used.
v1 (numpy array, shape (d,)) – Weight (histogram) on the features of X1. If None, the uniform distribution is used.
v2 (numpy array, shape (d',)) – Weight (histogram) on the features of X2. If None, the uniform distribution is used.
niter (integer) – Maximum number of iterations of the BCD used to solve COOT.
algo (string) – Algorithm for the OT problem on samples at each iteration, one of [‘emd’, ‘sinkhorn’]. ‘emd’ returns a sparse solution; ‘sinkhorn’ returns a regularized solution.
algo2 (string) – Algorithm for the OT problem on features at each iteration, one of [‘emd’, ‘sinkhorn’]. ‘emd’ returns a sparse solution; ‘sinkhorn’ returns a regularized solution.
reg (float) – Regularization parameter for the sample coupling matrix. Ignored if algo=’emd’.
reg2 (float) – Regularization parameter for the feature coupling matrix. Ignored if algo2=’emd’.
eps (float) – Convergence threshold.
random_init (bool) – Whether to use random initialization for the coupling matrices. If False, identity couplings are used.
log (bool, optional) – Record the log if True.
C_lin (numpy array, shape (n, n')) – Prior on the sample correspondences. Added to the cost for the sample transport.
- Returns:
Ts (numpy array, shape (n,n’)) – Optimal Transport coupling between the samples
Tv (numpy array, shape (d,d’)) – Optimal Transport coupling between the features
cost (float) – Optimization value after convergence
log (dict) – Convergence information and coupling matrices
References
[1] Redko Ievgen, Vayer Titouan, Flamary Rémi, and Courty Nicolas, “CO-Optimal Transport”
Example
import numpy as np
from perturbot.match import cotl_numpy

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 1) for k in labels}
cotl_numpy(Xs_dict, Xt_dict)
- perturbot.match.get_coupling_cot(data: Tuple[Dict[Number, array], Dict[Number, array]]) Tuple[int | Dict[Number, array], int | Dict] ¶
Returns the sample coupling between two datasets X, Y, disregarding the label information (the labels are used only as dictionary keys).
The function solves the following optimization problem:
\[COOT = \min_{Ts,Tv} \sum_{i,j,k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts_{i,j}*Tv_{k,l}\]
- Parameters:
data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.
eps – Regularization parameter, relative to the max cost.
- Returns:
T_dict – Optimal Transport coupling between the samples per label
log – Running log
Example
import numpy as np
from perturbot.match import get_coupling_cot

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 1) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 2) for k in labels}
get_coupling_cot((Xs_dict, Xt_dict))
- perturbot.match.get_coupling_cot_sinkhorn(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005, eps2: float | None = None) Tuple[int | Dict[Number, array], int | Dict] ¶
Returns the entropically regularized sample coupling between two datasets X, Y, disregarding the label information (the labels are used only as dictionary keys).
The function solves the following optimization problem:
\[ECOOT = \min_{Ts,Tv} \sum_{i,j,k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts_{i,j}*Tv_{k,l} - \epsilon H(Ts)\]
- Parameters:
data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.
eps – Regularization parameter, relative to the max cost.
- Returns:
T_dict – Optimal Transport coupling between the samples per label
log – Running log
Example
import numpy as np
from perturbot.match import get_coupling_cot_sinkhorn

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 1) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 2) for k in labels}
get_coupling_cot_sinkhorn((Xs_dict, Xt_dict), 0.05)
- perturbot.match.get_coupling_cotl(data: Tuple[Dict[Number, array], Dict[Number, array]]) Tuple[int | Dict[Number, array], int | Dict] ¶
Returns the sample coupling between two datasets X, Y given the labels.
The function solves the following optimization problem:
\[COOTL = \min_{Ts^1,\dots,Ts^{L},Tv} \sum_{t=1}^{L} \sum_{\substack{i\,:\,l_{x_i}=t \\ j\,:\,l_{y_j}=t}} \sum_{k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts_{i,j}^{t}*Tv_{k,l}\]
- Parameters:
data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.
eps – Regularization parameter, relative to the max cost.
- Returns:
T_dict – Optimal Transport coupling between the samples per label
log – Running log
Example
import numpy as np
from perturbot.match import get_coupling_cotl

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 1) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 1) for k in labels}
get_coupling_cotl((Xs_dict, Xt_dict))
- perturbot.match.get_coupling_cotl_sinkhorn(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005, eps2: float = None) Tuple[Dict[Number, array], Dict] ¶
Returns sample coupling between two datasets X, Y given the labels.
The function solves the following optimization problem:
\[COOTL = \min_{Ts^1,\dots,Ts^{L},Tv} \sum_{t=1}^{L} \sum_{\substack{i\,:\,l_{x_i}=t \\ j\,:\,l_{y_j}=t}} \sum_{k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts_{i,j}^{t}*Tv_{k,l} - \epsilon_1 H(Ts) - \epsilon_2 H(Tv)\]
- Parameters:
data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.
eps – Regularization parameter, relative to the max cost.
- Returns:
T_dict – Optimal Transport coupling between the samples per label
log – Running log
Example
import numpy as np
from perturbot.match import get_coupling_cotl_sinkhorn

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 1) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 1) for k in labels}
get_coupling_cotl_sinkhorn((Xs_dict, Xt_dict), 0.05)
- perturbot.match.get_coupling_egw_all_ott(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005) Tuple[Dict[Number, array], Dict] ¶
Returns the GW coupling between two datasets X, Y in an all-to-all manner, disregarding labels.
The function solves the following optimization problem:
\[GW = \min_{T\in C_{p,q}} \sum_{i,j,k,l} |(x_i-x_k)^2 - (y_j-y_l)^2|^{2}*T_{i,j}T_{k,l} - \epsilon H(T)\]
- Parameters:
data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.
eps – Regularization parameter, relative to the max cost.
- Returns:
T_dict – Optimal Transport coupling between the samples per label
log – Running log
Example
import numpy as np
from perturbot.match import get_coupling_egw_all_ott

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 1) for k in labels}
get_coupling_egw_all_ott((Xs_dict, Xt_dict), 0.05)
- perturbot.match.get_coupling_egw_labels_ott(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005) Tuple[Dict[Number, array], Dict] ¶
Returns GW coupling between two datasets X, Y given the labels.
The function solves the following optimization problem:
\[\begin{split}EGWL = \min_{T\in C_{p,q}^\ell} \sum_{i,j,k,l} |(x_i-x_k)^2 - (y_j-y_l)^2|^{2}*T_{i,j}T_{k,l} - \epsilon H(T)\\ C_{p,q}^\ell = \{T \mid T \in C_{p,q},\ T_{ij} > 0 \implies l_{x_i} = l_{y_j}\}\end{split}\]
- Parameters:
data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.
eps – Regularization parameter, relative to the max cost.
- Returns:
T_dict – Optimal Transport coupling between the samples per label
log – Running log
Example
import numpy as np
from perturbot.match import get_coupling_egw_labels_ott

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 1) for k in labels}
get_coupling_egw_labels_ott((Xs_dict, Xt_dict), 0.05)
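Per the Returns section, T_dict maps each label to the coupling between that label's samples; continuing the example above, a minimal sketch of unpacking the result and inspecting one entry (the stated shape is an assumption based on the per-label input sizes):

# Per-label couplings plus a running log
T_dict, log = get_coupling_egw_labels_ott((Xs_dict, Xt_dict), 0.05)

# Coupling for label 0; expected shape (n_samples, n_samples)
print(T_dict[0].shape)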
- perturbot.match.get_coupling_egw_ott(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005) Tuple[Dict[Number, array], Dict] ¶
Returns GW coupling between two datasets X, Y per label.
The function solves the following optimization problem:
\[GW^t = \min_{T^t} \sum_{i,j,k,l} |(x_i-x_k)^2 - (y_j-y_l)^2|^{2}*T^t_{i,j}T^t_{k,l} - \epsilon H(T^t)\]
- Parameters:
data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.
eps – Regularization parameter, relative to the max cost.
- Returns:
T_dict – Optimal Transport coupling between the samples per label
log – Running log
Example
import numpy as np
from perturbot.match import get_coupling_egw_ott

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 1) for k in labels}
get_coupling_egw_ott((Xs_dict, Xt_dict), 0.05)
- perturbot.match.get_coupling_eot_ott(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005) Tuple[Dict[Number, array], Dict] ¶
Returns the entropic OT coupling between two datasets X, Y, disregarding the label information (the labels are used only as dictionary keys).
The function solves the following optimization problem:
\[EOT = \min_{T\in C_{p,q}} \sum_{i,j} (x_i-y_j)^2 T_{i,j} - \epsilon H(T)\]
- Parameters:
data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.
eps – Regularization parameter, relative to the max cost.
- Returns:
T_dict – Optimal Transport coupling between the samples per label
log – Running log
Example
import numpy as np
from perturbot.match import get_coupling_eot_ott

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 1) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 1) for k in labels}
get_coupling_eot_ott((Xs_dict, Xt_dict), 0.05)
- perturbot.match.get_coupling_fot(data: Tuple[Dict[Number, ndarray], Dict[Number, ndarray]], Ts: Dict[Number, ndarray] | ndarray, eps=0.005)¶
Returns the feature-to-feature coupling given two datasets X, Y and a sample coupling.
The function solves the following optimization problem:
\[FOT = \min_{Tv} \sum_{i,j,k,l} |X1_{i,k}-X2_{j,l}|^{2}*Ts_{i,j}*Tv_{k,l} - \epsilon H(Tv)\]
- Parameters:
data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.
Ts – Sample-to-sample transport: either a per-label coupling dictionary matching the source and target dataset dictionaries, or a single global coupling matrix in which the samples are concatenated in the order of the labels in data[0].keys().
eps – Regularization parameter, relative to the max cost.
- Returns:
Tv – Feature-to-feature coupling.
log – Running log
Example
import numpy as np
from perturbot.match import get_coupling_egw_labels_ott, get_coupling_fot

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 1) for k in labels}
Ts, log = get_coupling_egw_labels_ott((Xs_dict, Xt_dict), 0.05)
Tv, feature_matching_log = get_coupling_fot((Xs_dict, Xt_dict), Ts, 0.05)
- perturbot.match.get_coupling_gw_labels(data: Tuple[Dict[Number, array], Dict[Number, array]]) Tuple[Dict[Number, array], Dict] ¶
Returns GW coupling between two datasets X, Y given the labels.
The function solves the following optimization problem:
\[\begin{split}GWL = \min_{T\in C_{p,q}^\ell} \sum_{i,j,k,l} |(x_i-x_k)^2 - (y_j-y_l)^2|^{2}*T_{i,j}T_{k,l} \\ C_{p,q}^\ell = \{T \mid T \in C_{p,q},\ T_{ij} > 0 \implies l_{x_i} = l_{y_j}\}\end{split}\]
- Parameters:
data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.
- Returns:
T_dict – Optimal Transport coupling between the samples per label
log – Running log
Example
import numpy as np
from perturbot.match import get_coupling_gw_labels

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 1) for k in labels}
get_coupling_gw_labels((Xs_dict, Xt_dict))
- perturbot.match.get_coupling_leot_ott(data: Tuple[Dict[Number, array], Dict[Number, array]], eps: float = 0.005) Tuple[Dict[Number, array], Dict] ¶
Returns OT coupling between two datasets X, Y per label.
The function solves the following optimization problem:
\[EOT^l = \min_{T^l} \sum_{i,j} (x_i-y_j)^2 T^l_{i,j} - \epsilon H(T^l)\]
- Parameters:
data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.
eps – Regularization parameter, relative to the max cost.
- Returns:
T_dict – Optimal Transport coupling between the samples per label
log – Running log
Example
import numpy as np
from perturbot.match import get_coupling_leot_ott

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 1) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 1) for k in labels}
get_coupling_leot_ott((Xs_dict, Xt_dict), 0.05)
- perturbot.predict.train_mlp(train_data: Tuple[Dict[Number, array], Dict[Number, array]], T_dict: Dict[Number, array]) Tuple[Module, Dict] ¶
Trains an MLP that predicts Y from X given the per-label sample matching T_dict.
- Parameters:
train_data – (source dataset, target dataset) where source and target datasets are the dictionaries mapping label to np.ndarray with matched labels.
T_dict – Optimal Transport coupling between the samples per label
- Returns:
model – Trained predictor
log – Training log
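Example
A minimal usage sketch chaining one of the matching functions above with train_mlp; it assumes the per-label coupling dictionary returned by get_coupling_cotl_sinkhorn can be passed directly as T_dict:

import numpy as np
from perturbot.match import get_coupling_cotl_sinkhorn
from perturbot.predict import train_mlp

n_samples = 300
labels = [0, 1, 2, 3]
Xs_dict = {k: np.random.rand(n_samples, 2) for k in labels}
Xt_dict = {k: np.random.rand(n_samples, 1) for k in labels}

# Per-label sample coupling between the two modalities
T_dict, log = get_coupling_cotl_sinkhorn((Xs_dict, Xt_dict), 0.05)

# Train an MLP that predicts the target modality from the source modality
model, train_log = train_mlp((Xs_dict, Xt_dict), T_dict)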