Fine-tuning Borzoi to create a Decima model

import glob
import anndata
import scanpy as sc
import pandas as pd
import bioframe as bf
import os
inputdir = "./data"
outdir = "./example"
ad_file_path = os.path.join(inputdir, "data.h5ad")
h5_file_path = os.path.join(outdir, "data.h5")

1. Load input anndata file

The input anndata file needs to be of shape (pseudobulks × genes), i.e. one row per pseudobulk and one column per gene.

ad = sc.read(ad_file_path)
ad
AnnData object with n_obs × n_vars = 50 × 921
    obs: 'cell_type', 'tissue', 'disease', 'study'
    var: 'chrom', 'start', 'end', 'strand', 'gene_start', 'gene_end', 'gene_length', 'gene_mask_start', 'gene_mask_end', 'dataset'
    uns: 'log1p'

.obs should be a dataframe with a unique index per pseudobulk. You can also include other columns with metadata about the pseudobulks, e.g. cell type, tissue, disease, study, number of cells, total counts.

Note that the original Decima model does NOT separate pseudobulks by sample, i.e. different samples from the same cell type, tissue, disease and study were merged. We also recommend filtering out pseudobulks with few cells or low read count.
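Such a filter can be sketched with plain pandas. The `n_cells` and `total_counts` columns and the thresholds below are hypothetical; adapt them to whatever QC metrics your `.obs` actually carries:

```python
import pandas as pd

# Toy .obs table with hypothetical per-pseudobulk QC columns
obs = pd.DataFrame(
    {
        "cell_type": ["ct_0", "ct_0", "ct_1"],
        "n_cells": [250, 4, 80],
        "total_counts": [1.2e6, 9e3, 4e5],
    },
    index=["pseudobulk_0", "pseudobulk_1", "pseudobulk_2"],
)

# Keep pseudobulks backed by enough cells and enough reads
keep = (obs["n_cells"] >= 10) & (obs["total_counts"] >= 1e5)
# With a real AnnData object this would be: ad = ad[keep].copy()
list(obs.index[keep])  # ['pseudobulk_0', 'pseudobulk_2']
```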

ad.obs.head()
cell_type tissue disease study
pseudobulk_0 ct_0 t_0 d_0 st_0
pseudobulk_1 ct_0 t_0 d_1 st_0
pseudobulk_2 ct_0 t_0 d_2 st_1
pseudobulk_3 ct_0 t_0 d_0 st_1
pseudobulk_4 ct_0 t_0 d_1 st_2

.var should be a dataframe with a unique index per gene. The index can be the gene name or Ensembl ID, as long as it is unique. Other essential columns are: chrom, start, end and strand (the gene coordinates).

You can also include other columns with metadata about the genes, e.g. Ensembl ID, type of gene.
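A quick sanity check for these requirements might look like this (`validate_var` is a hypothetical helper, not part of Decima):

```python
import pandas as pd

def validate_var(var_df: pd.DataFrame) -> None:
    """Hypothetical check that .var has the essential columns and a unique index."""
    required = {"chrom", "start", "end", "strand"}
    missing = required - set(var_df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    if not var_df.index.is_unique:
        raise ValueError("the gene index must be unique")

var = pd.DataFrame(
    {"chrom": ["chr1"], "start": [26354840], "end": [26879128], "strand": ["+"]},
    index=["gene_0"],
)
validate_var(var)  # passes silently
```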

ad.var.head()
chrom start end strand gene_start gene_end gene_length gene_mask_start gene_mask_end dataset
gene_0 chr1 26354840 26879128 + 26518680 27042968 524288 163840 524288 train
gene_1 chr19 41111417 41635705 - 40947577 41471865 524288 163840 524288 train
gene_2 chr1 79774026 80298314 - 79610186 80134474 524288 163840 524288 train
gene_4 chr16 3741368 4265656 - 3577528 4101816 524288 163840 524288 train
gene_5 chr10 22659481 23183769 + 22823321 23347609 524288 163840 524288 train

.X should contain the total counts per gene and pseudobulk. These should be non-negative integers. (The example dataset shown here already contains log-transformed values, which is why scanpy prints a warning in the next step.)

ad.X[:5, :5]
array([[0.       , 7.2926292, 7.2926292, 7.2926292, 7.2926292],
       [7.3133874, 7.3133874, 0.       , 7.3133874, 7.3133874],
       [7.299993 , 7.299993 , 7.299993 , 7.299993 , 0.       ],
       [7.299993 , 0.       , 7.299993 , 7.299993 , 0.       ],
       [7.3376517, 7.3376517, 0.       , 7.3376517, 7.3376517]],
      dtype=float32)
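Whether a matrix still holds raw counts can be tested with a small helper (hypothetical, not part of Decima):

```python
import numpy as np

def looks_like_raw_counts(X) -> bool:
    """Hypothetical check: non-negative whole numbers suggest raw counts."""
    X = np.asarray(X)  # densify sparse matrices with X.toarray() first
    return bool((X >= 0).all() and np.allclose(X, np.round(X)))

looks_like_raw_counts(np.array([[0, 7, 12], [3, 0, 5]]))     # True
looks_like_raw_counts(np.array([[0.0, 7.29], [7.31, 0.0]]))  # False: transformed
```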

2. Normalize and log transform data

We first transform the counts to log(CPM + 1) values, where CPM stands for Counts Per Million.

sc.pp.normalize_total(ad, target_sum=1e6)
sc.pp.log1p(ad)
WARNING: adata.X seems to be already log-transformed.
ad.X[:5, :5]
array([[0.       , 7.295568 , 7.295568 , 7.295568 , 7.295568 ],
       [7.316388 , 7.316388 , 0.       , 7.316388 , 7.316388 ],
       [7.3014727, 7.3014727, 7.3014727, 7.3014727, 0.       ],
       [7.3014727, 0.       , 7.3014727, 7.3014727, 0.       ],
       [7.3407264, 7.3407264, 0.       , 7.3407264, 7.3407264]],
      dtype=float32)
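The two scanpy calls above are equivalent to the following NumPy arithmetic, shown here as a sketch on a toy count matrix:

```python
import numpy as np

counts = np.array([[10.0, 90.0],
                   [40.0, 60.0]])

# sc.pp.normalize_total(ad, target_sum=1e6): rescale each row to one million
cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6

# sc.pp.log1p(ad): natural log of (value + 1)
log_cpm = np.log1p(cpm)
```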

3. Create intervals surrounding genes

Decima is trained on 524,288 bp sequences surrounding the genes. Therefore, we take the given gene coordinates and extend them to create intervals of this length.

from decima.data.preprocess import var_to_intervals
ad.var.head()
chrom start end strand gene_start gene_end gene_length gene_mask_start gene_mask_end dataset
gene_0 chr1 26354840 26879128 + 26518680 27042968 524288 163840 524288 train
gene_1 chr19 41111417 41635705 - 40947577 41471865 524288 163840 524288 train
gene_2 chr1 79774026 80298314 - 79610186 80134474 524288 163840 524288 train
gene_4 chr16 3741368 4265656 - 3577528 4101816 524288 163840 524288 train
gene_5 chr10 22659481 23183769 + 22823321 23347609 524288 163840 524288 train

First, we copy the start and end columns to gene_start and gene_end. We also create a new column gene_length.

ad.var["gene_start"] = ad.var.start.tolist()
ad.var["gene_end"] = ad.var.end.tolist()
ad.var["gene_length"] = ad.var["gene_end"] - ad.var["gene_start"]
ad.var.head()
chrom start end strand gene_start gene_end gene_length gene_mask_start gene_mask_end dataset
gene_0 chr1 26354840 26879128 + 26354840 26879128 524288 163840 524288 train
gene_1 chr19 41111417 41635705 - 41111417 41635705 524288 163840 524288 train
gene_2 chr1 79774026 80298314 - 79774026 80298314 524288 163840 524288 train
gene_4 chr16 3741368 4265656 - 3741368 4265656 524288 163840 524288 train
gene_5 chr10 22659481 23183769 + 22659481 23183769 524288 163840 524288 train

Now, we extend the gene coordinates to create enclosing intervals:

ad = var_to_intervals(ad, chr_end_pad=10000, genome="hg38")
# Replace genome name if necessary
The interval size is 524288 bases. Of these, 163840 will be upstream of the gene start and 360448 will be downstream of the gene start.
0 intervals extended beyond the chromosome start and have been shifted
1 intervals extended beyond the chromosome end and have been shifted
1 intervals did not extend far enough upstream of the TSS and have been dropped
ad.var.head()
chrom start end strand gene_start gene_end gene_length gene_mask_start gene_mask_end dataset
gene_0 chr1 26191000 26715288 + 26354840 26879128 524288 163840 524288 train
gene_1 chr19 41275257 41799545 - 41111417 41635705 524288 163840 524288 train
gene_2 chr1 79937866 80462154 - 79774026 80298314 524288 163840 524288 train
gene_4 chr16 3905208 4429496 - 3741368 4265656 524288 163840 524288 train
gene_5 chr10 22495641 23019929 + 22659481 23183769 524288 163840 524288 train

The start and end columns now contain the start and end coordinates of the 524,288 bp intervals.
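The arithmetic behind these intervals can be sketched as follows. This is a simplified re-implementation for illustration only; `var_to_intervals` additionally shifts intervals that run past chromosome boundaries and drops genes that cannot be placed:

```python
SEQ_LEN = 524_288   # total interval length
UPSTREAM = 163_840  # bases placed upstream of the gene start

def gene_interval(gene_start: int, gene_end: int, strand: str):
    """Simplified sketch of interval placement around one gene."""
    if strand == "+":
        start = gene_start - UPSTREAM
    else:
        # On the minus strand, "upstream" lies at higher coordinates
        start = gene_end + UPSTREAM - SEQ_LEN
    return start, start + SEQ_LEN

gene_interval(26354840, 26879128, "+")  # gene_0 -> (26191000, 26715288)
gene_interval(41111417, 41635705, "-")  # gene_1 -> (41275257, 41799545)
```

These reproduce the start/end values shown in the table above.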

4. Split genes into training, validation and test sets

We load the coordinates of the genomic regions used to train Borzoi:

splits_file = "https://raw.githubusercontent.com/calico/borzoi/main/data/sequences_human.bed.gz"
# replace human with mouse for mm10 splits
splits = pd.read_table(splits_file, header=None, names=["chrom", "start", "end", "fold"])
splits.head()
chrom start end fold
0 chr4 82524421 82721029 fold0
1 chr13 18604798 18801406 fold0
2 chr2 189923408 190120016 fold0
3 chr10 59875743 60072351 fold0
4 chr1 117109467 117306075 fold0

Now, we overlap our gene intervals with these regions:

overlaps = bf.overlap(ad.var.reset_index(names="gene"), splits, how="left")
overlaps = overlaps[["gene", "fold_"]].drop_duplicates().astype(str)
overlaps.head()
gene fold_
0 gene_0 fold5
15 gene_1 fold0
30 gene_2 fold0
44 gene_4 fold2
59 gene_5 fold2

Based on the overlap, we divide our gene intervals into training, validation and test sets.

test_genes = overlaps.gene[overlaps.fold_ == "fold3"].tolist()
val_genes = overlaps.gene[overlaps.fold_ == "fold4"].tolist()
train_genes = set(overlaps.gene).difference(set(test_genes).union(val_genes))
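The same set arithmetic can be seen on a toy overlap table (the genes and fold assignments here are made up for illustration):

```python
import pandas as pd

overlaps = pd.DataFrame(
    {"gene": ["g0", "g1", "g2", "g3"],
     "fold_": ["fold3", "fold4", "fold0", "fold3"]}
)

# Genes overlapping fold3 regions become the test set, fold4 the validation set
test_genes = overlaps.gene[overlaps.fold_ == "fold3"].tolist()
val_genes = overlaps.gene[overlaps.fold_ == "fold4"].tolist()
# Everything else is used for training
train_genes = set(overlaps.gene).difference(set(test_genes).union(val_genes))
```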

Finally, we add this information back to ad.var.

ad.var["dataset"] = "test"
ad.var.loc[ad.var.index.isin(val_genes), "dataset"] = "val"
ad.var.loc[ad.var.index.isin(train_genes), "dataset"] = "train"
/tmp/slurmjob.14477843/ipykernel_3516462/3109841685.py:1: ImplicitModificationWarning: Trying to modify attribute `.var` of view, initializing view as actual.
ad.var.head()
chrom start end strand gene_start gene_end gene_length gene_mask_start gene_mask_end dataset
gene_0 chr1 26191000 26715288 + 26354840 26879128 524288 163840 524288 train
gene_1 chr19 41275257 41799545 - 41111417 41635705 524288 163840 524288 train
gene_2 chr1 79937866 80462154 - 79774026 80298314 524288 163840 524288 train
gene_4 chr16 3905208 4429496 - 3741368 4265656 524288 163840 524288 train
gene_5 chr10 22495641 23019929 + 22659481 23183769 524288 163840 524288 train
ad.var.dataset.value_counts()
dataset
train    766
test      83
val       71
Name: count, dtype: int64

We have now divided the 920 genes in our dataset into separate sets to be used for training, validation and testing.

5. Save processed anndata

We will save the processed anndata file containing these intervals and data splits.

ad.write_h5ad(ad_file_path)

6. Create an hdf5 file

To train Decima, we need to extract the genomic sequences for all the intervals and convert them to one-hot encoded format. We save these one-hot encoded inputs to an hdf5 file.

from decima.data.write_hdf5 import write_hdf5
! mkdir -p example
write_hdf5(file=h5_file_path, ad=ad, pad=5000, genome="hg38")
# Change genome name if necessary
Writing metadata
Writing task indices
Writing genes array of shape: (920, 2)
Writing labels array of shape: (920, 50, 1)
Making gene masks
Writing mask array of shape: (920, 534288)
Encoding sequences
Writing sequence array of shape: (920, 534288)
Done!
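The arrays above are wider than the 524,288 bp intervals because of the pad argument: each sequence is stored with pad extra bases on both sides, presumably so that the random sequence shifts applied during training (max-seq-shift, set to 5000 below) stay within the stored sequence. The arithmetic:

```python
SEQ_LEN = 524_288  # interval length
PAD = 5_000        # the pad argument passed to write_hdf5

# Width of the stored mask and sequence arrays
stored_width = SEQ_LEN + 2 * PAD  # 534288, matching the shapes printed above
```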

7. Set training parameters

# Learning rate default=0.001
lr = 5e-5
# Total weight parameter for the loss function
total_weight = 1e-4
# Gradient accumulation steps
grad = 5
# batch-size. default=4
bs = 4
# max-seq-shift. default=5000
shift = 5000
# Number of epochs. Default 1
epochs = 15

# logger
logger = "wandb"  # Change to csv to save logs locally

# Number of workers default=16
workers = 16

8. Generate training commands

cmds = []

for model in range(4):
    name = f"finetune_test_{model}"
    device = model

    cmd = (
        f"decima finetune --name {name} "
        + f"--model {model} --device {device} "
        + f"--matrix-file {ad_file_path} --h5-file {h5_file_path} "
        + f"--outdir {outdir} --learning-rate {lr} "
        + f"--loss-total-weight {total_weight} --gradient-accumulation {grad} "
        + f"--batch-size {bs} --max-seq-shift {shift} "
        + f"--epochs {epochs} --logger {logger} --num-workers {workers}"
    )
    cmds.append(cmd)
for cmd in cmds:
    print(cmd)
decima finetune --name finetune_test_0 --model 0 --device 0 --matrix-file ./data/data.h5ad --h5-file ./example/data.h5 --outdir ./example --learning-rate 5e-05 --loss-total-weight 0.0001 --gradient-accumulation 5 --batch-size 4 --max-seq-shift 5000 --epochs 15 --logger wandb --num-workers 16
decima finetune --name finetune_test_1 --model 1 --device 1 --matrix-file ./data/data.h5ad --h5-file ./example/data.h5 --outdir ./example --learning-rate 5e-05 --loss-total-weight 0.0001 --gradient-accumulation 5 --batch-size 4 --max-seq-shift 5000 --epochs 15 --logger wandb --num-workers 16
decima finetune --name finetune_test_2 --model 2 --device 2 --matrix-file ./data/data.h5ad --h5-file ./example/data.h5 --outdir ./example --learning-rate 5e-05 --loss-total-weight 0.0001 --gradient-accumulation 5 --batch-size 4 --max-seq-shift 5000 --epochs 15 --logger wandb --num-workers 16
decima finetune --name finetune_test_3 --model 3 --device 3 --matrix-file ./data/data.h5ad --h5-file ./example/data.h5 --outdir ./example --learning-rate 5e-05 --loss-total-weight 0.0001 --gradient-accumulation 5 --batch-size 4 --max-seq-shift 5000 --epochs 15 --logger wandb --num-workers 16

Here, we train the model for a single epoch (with batch size 1) so that the tutorial runs quickly. For real fine-tuning, train for more epochs, e.g. using the commands generated above.

! CUDA_VISIBLE_DEVICES=0 decima finetune \
--name finetune_test_0 \
--model 0 \
--device 0 \
--matrix-file {ad_file_path} \
--h5-file {h5_file_path} \
--outdir {outdir} \
--learning-rate {lr} \
--loss-total-weight {total_weight} \
--gradient-accumulation {grad} \
--batch-size 1 \
--max-seq-shift {shift} \
--epochs 1 \
--logger {logger} \
--num-workers {workers}
decima - INFO - Data paths: matrix_file=./data/data.h5ad, h5_file=./example/data.h5
decima - INFO - Reading anndata
decima - INFO - Making dataset objects
decima - INFO - train_params: {'batch_size': 1, 'num_workers': 16, 'devices': 0, 'logger': 'wandb', 'save_dir': './example', 'max_epochs': 1, 'lr': 5e-05, 'total_weight': 0.0001, 'accumulate_grad_batches': 5, 'loss': 'poisson_multinomial', 'clip': 0.0, 'save_top_k': 1, 'pin_memory': True}
decima - INFO - model_params: {'n_tasks': 50, 'init_borzoi': True, 'replicate': '0'}
decima - INFO - Initializing model
decima - INFO - Initializing weights from Borzoi model using wandb for replicate: 0
wandb: Currently logged in as: mhcelik (mhcw) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact 'human_state_dict_fold0:latest', 709.30MB. 1 files...
wandb:   1 of 1 files downloaded.  
Done. 00:00:01.7 (406.1MB/s)
decima - INFO - Connecting to wandb.
wandb: Currently logged in as: mhcelik (mhcw) to https://genentech.wandb.io. Use `wandb login --relogin` to force relogin
wandb:  Waiting for wandb.init()...
wandb:  setting up run g20ya0al (0.2s)
wandb: Tracking run with wandb version 0.22.2
wandb: Run data is saved locally in finetune_test_0/wandb/run-20251121_143055-g20ya0al
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run finetune_test_0
wandb: ⭐️ View project at https://genentech.wandb.io/grelu/decima
wandb: 🚀 View run at https://genentech.wandb.io/grelu/decima/runs/g20ya0al
decima - INFO - Training
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/torch/utils/data/dataloader.py:627: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
SLURM auto-requeueing enabled. Setting signal handlers.
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                           | 0/71 [00:00<?, ?it/s]
Multinomial: 17.704072952270508, Poisson: -0.08451984077692032
...
Validation DataLoader 0: 100%|██████████████████| 71/71 [00:11<00:00,  6.41it/s]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│     val_gene_pearson      │    0.0249176025390625     │
│         val_loss          │    20.776832580566406     │
│          val_mse          │     28.61081886291504     │
│     val_task_pearson      │   0.019344473257660866    │
└───────────────────────────┴───────────────────────────┘
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name            | Type                           | Params | Mode 
---------------------------------------------------------------------------
0 | model           | DecimaModel                    | 171 M  | train
1 | loss            | TaskWisePoissonMultinomialLoss | 0      | train
2 | val_metrics     | MetricCollection               | 0      | train
3 | test_metrics    | MetricCollection               | 0      | train
4 | warning_counter | WarningCounter                 | 0      | train
5 | transform       | Identity                       | 0      | train
---------------------------------------------------------------------------
171 M     Trainable params
0         Non-trainable params
171 M     Total params
685.503   Total estimated model params size (MB)
401       Modules in train mode
0         Modules in eval mode
SLURM auto-requeueing enabled. Setting signal handlers.

Sanity Checking: |                                        | 0/? [00:00<?, ?it/s]
Sanity Checking DataLoader 0:   0%|                       | 0/2 [00:00<?, ?it/s]
Multinomial: 17.704072952270508, Poisson: -0.08451984077692032
Sanity Checking DataLoader 0:  50%|███████▌       | 1/2 [00:00<00:00,  3.99it/s]
Multinomial: 17.50640296936035, Poisson: -0.081619992852211

Sanity Checking DataLoader 0: 100%|███████████████| 2/2 [00:00<00:00,  5.90it/s]
Training: |                                               | 0/? [00:00<?, ?it/s]
Epoch 0:   0%|                                          | 0/766 [00:00<?, ?it/s]
Multinomial: 19.381006240844727, Poisson: -0.09250339865684509
Epoch 0:   0%| | 1/766 [00:02<34:37,  0.37it/s, v_num=a0al, train_loss_step=19.3
...
Epoch 0:   2%| | 16/766 [00:06<05:07,  2.44it/s, v_num=a0al, train_loss_step=19.
Multinomial: 18.880659103393555, Poisson: -0.08932992070913315
Epoch 0:   2%| | 17/766 [00:06<04:53,  2.55it/s, v_num=a0al, train_loss_step=19.
Epoch 0:   2%| | 17/766 [00:06<04:59,  2.50it/s, v_num=a0al, train_loss_step=18.
Multinomial: 19.011474609375, Poisson: -0.08990071713924408
Epoch 0:   2%| | 18/766 [00:06<04:46,  2.61it/s, v_num=a0al, train_loss_step=18.
Epoch 0:   2%| | 18/766 [00:07<04:52,  2.56it/s, v_num=a0al, train_loss_step=18.
Multinomial: 20.70027732849121, Poisson: -0.09867019951343536

Epoch 0:   2%| | 19/766 [00:07<04:40,  2.66it/s, v_num=a0al, train_loss_step=18.
Epoch 0:   2%| | 19/766 [00:07<04:45,  2.61it/s, v_num=a0al, train_loss_step=20.
Multinomial: 17.821165084838867, Poisson: -0.08399660140275955
Epoch 0:   3%| | 20/766 [00:07<04:40,  2.66it/s, v_num=a0al, train_loss_step=20.
Epoch 0:   3%| | 20/766 [00:07<04:40,  2.66it/s, v_num=a0al, train_loss_step=17.
Multinomial: 16.62529945373535, Poisson: -0.07828851789236069

Epoch 0:   3%| | 21/766 [00:07<04:30,  2.75it/s, v_num=a0al, train_loss_step=17.
Epoch 0:   3%| | 21/766 [00:07<04:35,  2.71it/s, v_num=a0al, train_loss_step=16.
Multinomial: 21.265472412109375, Poisson: -0.10090325772762299
Epoch 0:   3%| | 22/766 [00:07<04:25,  2.80it/s, v_num=a0al, train_loss_step=16.
Epoch 0:   3%| | 22/766 [00:07<04:30,  2.75it/s, v_num=a0al, train_loss_step=21.
Multinomial: 20.13052749633789, Poisson: -0.09563688188791275

Epoch 0:   3%| | 23/766 [00:08<04:21,  2.84it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   3%| | 23/766 [00:08<04:26,  2.79it/s, v_num=a0al, train_loss_step=20.
Multinomial: 20.649946212768555, Poisson: -0.09828009456396103

Epoch 0:   3%| | 24/766 [00:08<04:17,  2.88it/s, v_num=a0al, train_loss_step=20.
Epoch 0:   3%| | 24/766 [00:08<04:22,  2.83it/s, v_num=a0al, train_loss_step=20.
Multinomial: 25.186647415161133, Poisson: -0.12111172825098038
Epoch 0:   3%| | 25/766 [00:08<04:18,  2.87it/s, v_num=a0al, train_loss_step=20.
Epoch 0:   3%| | 25/766 [00:08<04:18,  2.87it/s, v_num=a0al, train_loss_step=25.
Multinomial: 18.360719680786133, Poisson: -0.08687090128660202
Epoch 0:   3%| | 26/766 [00:08<04:11,  2.94it/s, v_num=a0al, train_loss_step=25.
Epoch 0:   3%| | 26/766 [00:08<04:15,  2.90it/s, v_num=a0al, train_loss_step=18.
Multinomial: 20.07268524169922, Poisson: -0.0955045148730278
Epoch 0:   4%| | 27/766 [00:09<04:08,  2.98it/s, v_num=a0al, train_loss_step=18.
Epoch 0:   4%| | 27/766 [00:09<04:11,  2.93it/s, v_num=a0al, train_loss_step=20.
Multinomial: 23.431581497192383, Poisson: -0.11249936372041702

Epoch 0:   4%| | 28/766 [00:09<04:05,  3.01it/s, v_num=a0al, train_loss_step=20.
Epoch 0:   4%| | 28/766 [00:09<04:08,  2.97it/s, v_num=a0al, train_loss_step=23.
Multinomial: 21.752777099609375, Poisson: -0.10413458943367004

Epoch 0:   4%| | 29/766 [00:09<04:02,  3.04it/s, v_num=a0al, train_loss_step=23.
Epoch 0:   4%| | 29/766 [00:09<04:06,  3.00it/s, v_num=a0al, train_loss_step=21.
Multinomial: 18.950761795043945, Poisson: -0.08966774493455887
Epoch 0:   4%| | 30/766 [00:09<04:03,  3.02it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   4%| | 30/766 [00:09<04:03,  3.02it/s, v_num=a0al, train_loss_step=18.
Multinomial: 24.61734962463379, Poisson: -0.11915773898363113
Epoch 0:   4%| | 31/766 [00:10<03:57,  3.09it/s, v_num=a0al, train_loss_step=18.
Epoch 0:   4%| | 31/766 [00:10<04:01,  3.05it/s, v_num=a0al, train_loss_step=24.
Multinomial: 19.473047256469727, Poisson: -0.09275110810995102
Epoch 0:   4%| | 32/766 [00:10<03:55,  3.11it/s, v_num=a0al, train_loss_step=24.
Epoch 0:   4%| | 32/766 [00:10<03:58,  3.07it/s, v_num=a0al, train_loss_step=19.
Multinomial: 21.206684112548828, Poisson: -0.10135509073734283

Epoch 0:   4%| | 33/766 [00:10<03:53,  3.14it/s, v_num=a0al, train_loss_step=19.
Epoch 0:   4%| | 33/766 [00:10<03:56,  3.10it/s, v_num=a0al, train_loss_step=21.
Multinomial: 19.45479965209961, Poisson: -0.09279609471559525

Epoch 0:   4%| | 34/766 [00:10<03:51,  3.16it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   4%| | 34/766 [00:10<03:54,  3.12it/s, v_num=a0al, train_loss_step=19.
Multinomial: 21.72089385986328, Poisson: -0.10419032722711563
Epoch 0:   5%| | 35/766 [00:11<03:52,  3.14it/s, v_num=a0al, train_loss_step=19.
Epoch 0:   5%| | 35/766 [00:11<03:52,  3.14it/s, v_num=a0al, train_loss_step=21.
Multinomial: 22.369564056396484, Poisson: -0.10727277398109436
Epoch 0:   5%| | 36/766 [00:11<03:47,  3.20it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   5%| | 36/766 [00:11<03:50,  3.17it/s, v_num=a0al, train_loss_step=22.
Multinomial: 21.176250457763672, Poisson: -0.1012745052576065

Epoch 0:   5%| | 37/766 [00:11<03:46,  3.22it/s, v_num=a0al, train_loss_step=22.
Epoch 0:   5%| | 37/766 [00:11<03:48,  3.19it/s, v_num=a0al, train_loss_step=21.
Multinomial: 18.37683868408203, Poisson: -0.08704456686973572

Epoch 0:   5%| | 38/766 [00:11<03:44,  3.24it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   5%| | 38/766 [00:11<03:47,  3.21it/s, v_num=a0al, train_loss_step=18.
Multinomial: 22.340030670166016, Poisson: -0.10708311200141907

Epoch 0:   5%| | 39/766 [00:11<03:42,  3.26it/s, v_num=a0al, train_loss_step=18.
Epoch 0:   5%| | 39/766 [00:12<03:45,  3.22it/s, v_num=a0al, train_loss_step=22.
Multinomial: 21.2115478515625, Poisson: -0.1010795533657074
Epoch 0:   5%| | 40/766 [00:12<03:43,  3.24it/s, v_num=a0al, train_loss_step=22.
Epoch 0:   5%| | 40/766 [00:12<03:44,  3.24it/s, v_num=a0al, train_loss_step=21.
Multinomial: 21.17957878112793, Poisson: -0.10130106657743454
Epoch 0:   5%| | 41/766 [00:12<03:40,  3.29it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   5%| | 41/766 [00:12<03:42,  3.26it/s, v_num=a0al, train_loss_step=21.
Multinomial: 17.70407485961914, Poisson: -0.08396982401609421
Epoch 0:   5%| | 42/766 [00:12<03:38,  3.31it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   5%| | 42/766 [00:12<03:41,  3.28it/s, v_num=a0al, train_loss_step=17.
Multinomial: 19.499862670898438, Poisson: -0.09266598522663116

Epoch 0:   6%| | 43/766 [00:12<03:37,  3.33it/s, v_num=a0al, train_loss_step=17.
Epoch 0:   6%| | 43/766 [00:13<03:39,  3.29it/s, v_num=a0al, train_loss_step=19.
Multinomial: 20.606935501098633, Poisson: -0.09828896075487137

Epoch 0:   6%| | 44/766 [00:13<03:36,  3.34it/s, v_num=a0al, train_loss_step=19.
Epoch 0:   6%| | 44/766 [00:13<03:38,  3.31it/s, v_num=a0al, train_loss_step=20.
Multinomial: 22.871383666992188, Poisson: -0.11045264452695847
Epoch 0:   6%| | 45/766 [00:13<03:37,  3.32it/s, v_num=a0al, train_loss_step=20.
Epoch 0:   6%| | 45/766 [00:13<03:37,  3.32it/s, v_num=a0al, train_loss_step=22.
Multinomial: 24.033437728881836, Poisson: -0.11557681858539581
Epoch 0:   6%| | 46/766 [00:13<03:33,  3.37it/s, v_num=a0al, train_loss_step=22.
Epoch 0:   6%| | 46/766 [00:13<03:35,  3.34it/s, v_num=a0al, train_loss_step=23.
Multinomial: 18.879009246826172, Poisson: -0.09021121263504028
Epoch 0:   6%| | 47/766 [00:13<03:32,  3.38it/s, v_num=a0al, train_loss_step=23.
Epoch 0:   6%| | 47/766 [00:14<03:34,  3.35it/s, v_num=a0al, train_loss_step=18.
Multinomial: 18.95680809020996, Poisson: -0.08978604525327682

Epoch 0:   6%| | 48/766 [00:14<03:31,  3.39it/s, v_num=a0al, train_loss_step=18.
Epoch 0:   6%| | 48/766 [00:14<03:33,  3.36it/s, v_num=a0al, train_loss_step=18.
Multinomial: 17.858497619628906, Poisson: -0.08382056653499603

Epoch 0:   6%| | 49/766 [00:14<03:30,  3.41it/s, v_num=a0al, train_loss_step=18.
Epoch 0:   6%| | 49/766 [00:14<03:32,  3.38it/s, v_num=a0al, train_loss_step=17.
Multinomial: 21.18074607849121, Poisson: -0.10173416137695312
Epoch 0:   7%| | 50/766 [00:14<03:31,  3.39it/s, v_num=a0al, train_loss_step=17.
Epoch 0:   7%| | 50/766 [00:14<03:31,  3.39it/s, v_num=a0al, train_loss_step=21.
Multinomial: 19.48756980895996, Poisson: -0.09269016981124878
Epoch 0:   7%| | 51/766 [00:14<03:28,  3.43it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   7%| | 51/766 [00:15<03:30,  3.40it/s, v_num=a0al, train_loss_step=19.
Multinomial: 20.046504974365234, Poisson: -0.09527648985385895
Epoch 0:   7%| | 52/766 [00:15<03:27,  3.44it/s, v_num=a0al, train_loss_step=19.
Epoch 0:   7%| | 52/766 [00:15<03:29,  3.41it/s, v_num=a0al, train_loss_step=20.
Multinomial: 21.808837890625, Poisson: -0.10404733568429947

Epoch 0:   7%| | 53/766 [00:15<03:26,  3.45it/s, v_num=a0al, train_loss_step=20.
Epoch 0:   7%| | 53/766 [00:15<03:28,  3.42it/s, v_num=a0al, train_loss_step=21.
Multinomial: 18.97328758239746, Poisson: -0.08934041857719421

Epoch 0:   7%| | 54/766 [00:15<03:25,  3.46it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   7%| | 54/766 [00:15<03:27,  3.43it/s, v_num=a0al, train_loss_step=18.
Multinomial: 21.240169525146484, Poisson: -0.10145271569490433
Epoch 0:   7%| | 55/766 [00:15<03:26,  3.45it/s, v_num=a0al, train_loss_step=18.
Epoch 0:   7%| | 55/766 [00:15<03:26,  3.44it/s, v_num=a0al, train_loss_step=21.
Multinomial: 21.870256423950195, Poisson: -0.1042548418045044
Epoch 0:   7%| | 56/766 [00:16<03:23,  3.48it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   7%| | 56/766 [00:16<03:25,  3.45it/s, v_num=a0al, train_loss_step=21.
Multinomial: 21.184789657592773, Poisson: -0.10091016441583633
Epoch 0:   7%| | 57/766 [00:16<03:22,  3.49it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   7%| | 57/766 [00:16<03:24,  3.46it/s, v_num=a0al, train_loss_step=21.
Multinomial: 21.16849136352539, Poisson: -0.10139136761426926

Epoch 0:   8%| | 58/766 [00:16<03:22,  3.50it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   8%| | 58/766 [00:16<03:23,  3.47it/s, v_num=a0al, train_loss_step=21.
Multinomial: 24.697975158691406, Poisson: -0.11847725510597229

Epoch 0:   8%| | 59/766 [00:16<03:21,  3.51it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   8%| | 59/766 [00:16<03:22,  3.48it/s, v_num=a0al, train_loss_step=24.
Multinomial: 21.23836898803711, Poisson: -0.10157377272844315
Epoch 0:   8%| | 60/766 [00:17<03:22,  3.49it/s, v_num=a0al, train_loss_step=24.
Epoch 0:   8%| | 60/766 [00:17<03:22,  3.49it/s, v_num=a0al, train_loss_step=21.
Multinomial: 21.151546478271484, Poisson: -0.10148127377033234
Epoch 0:   8%| | 61/766 [00:17<03:19,  3.53it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   8%| | 61/766 [00:17<03:21,  3.50it/s, v_num=a0al, train_loss_step=21.
Multinomial: 22.36435890197754, Poisson: -0.10736225545406342
Epoch 0:   8%| | 62/766 [00:17<03:19,  3.54it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   8%| | 62/766 [00:17<03:20,  3.51it/s, v_num=a0al, train_loss_step=22.
Multinomial: 18.459867477416992, Poisson: -0.08722960948944092

Epoch 0:   8%| | 63/766 [00:17<03:18,  3.55it/s, v_num=a0al, train_loss_step=22.
Epoch 0:   8%| | 63/766 [00:17<03:19,  3.52it/s, v_num=a0al, train_loss_step=18.
Multinomial: 20.105688095092773, Poisson: -0.09594902396202087

Epoch 0:   8%| | 64/766 [00:18<03:17,  3.55it/s, v_num=a0al, train_loss_step=18.
Epoch 0:   8%| | 64/766 [00:18<03:18,  3.53it/s, v_num=a0al, train_loss_step=20.
Multinomial: 21.239574432373047, Poisson: -0.10175144672393799
Epoch 0:   8%| | 65/766 [00:18<03:18,  3.54it/s, v_num=a0al, train_loss_step=20.
Epoch 0:   8%| | 65/766 [00:18<03:18,  3.53it/s, v_num=a0al, train_loss_step=21.
Multinomial: 18.30000877380371, Poisson: -0.08693327754735947
Epoch 0:   9%| | 66/766 [00:18<03:16,  3.57it/s, v_num=a0al, train_loss_step=21.
Epoch 0:   9%| | 66/766 [00:18<03:17,  3.54it/s, v_num=a0al, train_loss_step=18.
Multinomial: 19.48712921142578, Poisson: -0.09302585572004318
Epoch 0:   9%| | 67/766 [00:18<03:15,  3.58it/s, v_num=a0al, train_loss_step=18.
Epoch 0:   9%| | 67/766 [00:18<03:16,  3.55it/s, v_num=a0al, train_loss_step=19.
Multinomial: 23.451393127441406, Poisson: -0.11258357018232346

Epoch 0:   9%| | 68/766 [00:18<03:14,  3.58it/s, v_num=a0al, train_loss_step=19.
Epoch 0:   9%| | 68/766 [00:19<03:16,  3.56it/s, v_num=a0al, train_loss_step=23.
Multinomial: 20.058046340942383, Poisson: -0.09530574083328247

Epoch 0:   9%| | 69/766 [00:19<03:14,  3.59it/s, v_num=a0al, train_loss_step=23.
Epoch 0:   9%| | 69/766 [00:19<03:15,  3.57it/s, v_num=a0al, train_loss_step=20.
Multinomial: 18.952863693237305, Poisson: -0.08996559679508209
Epoch 0:   9%| | 70/766 [00:19<03:14,  3.57it/s, v_num=a0al, train_loss_step=20.
Epoch 0:   9%| | 70/766 [00:19<03:14,  3.57it/s, v_num=a0al, train_loss_step=18.
Multinomial: 17.13503646850586, Poisson: -0.08106084913015366
Epoch 0:   9%| | 71/766 [00:19<03:12,  3.60it/s, v_num=a0al, train_loss_step=18.
Epoch 0:   9%| | 71/766 [00:19<03:14,  3.58it/s, v_num=a0al, train_loss_step=17.
Multinomial: 20.00311279296875, Poisson: -0.09557122737169266
Epoch 0:   9%| | 72/766 [00:19<03:12,  3.61it/s, v_num=a0al, train_loss_step=17.
Epoch 0:   9%| | 72/766 [00:20<03:13,  3.59it/s, v_num=a0al, train_loss_step=19.
Multinomial: 19.45271110534668, Poisson: -0.09286917746067047

Epoch 0:  10%| | 73/766 [00:20<03:11,  3.62it/s, v_num=a0al, train_loss_step=19.
Epoch 0:  10%| | 73/766 [00:20<03:12,  3.59it/s, v_num=a0al, train_loss_step=19.
Multinomial: 19.410478591918945, Poisson: -0.09247767925262451

Epoch 0:  10%| | 74/766 [00:20<03:10,  3.62it/s, v_num=a0al, train_loss_step=19.
Epoch 0:  10%| | 74/766 [00:20<03:12,  3.60it/s, v_num=a0al, train_loss_step=19.
Multinomial: 16.642332077026367, Poisson: -0.0786345899105072
Epoch 0:  10%| | 75/766 [00:20<03:11,  3.61it/s, v_num=a0al, train_loss_step=19.
Epoch 0:  10%| | 75/766 [00:20<03:11,  3.61it/s, v_num=a0al, train_loss_step=16.
Multinomial: 22.897323608398438, Poisson: -0.11011520773172379
Epoch 0:  10%| | 76/766 [00:20<03:09,  3.64it/s, v_num=a0al, train_loss_step=16.
Epoch 0:  10%| | 76/766 [00:21<03:11,  3.61it/s, v_num=a0al, train_loss_step=22.
Multinomial: 17.767396926879883, Poisson: -0.08402471244335175

Epoch 0:  10%| | 77/766 [00:21<03:09,  3.64it/s, v_num=a0al, train_loss_step=22.
Epoch 0:  10%| | 77/766 [00:21<03:10,  3.62it/s, v_num=a0al, train_loss_step=17.
Multinomial: 20.062463760375977, Poisson: -0.0955692008137703

Epoch 0:  10%| | 78/766 [00:21<03:08,  3.65it/s, v_num=a0al, train_loss_step=17.
Epoch 0:  10%| | 78/766 [00:21<03:09,  3.62it/s, v_num=a0al, train_loss_step=20.
Multinomial: 18.32487678527832, Poisson: -0.0871487483382225

Epoch 0:  10%| | 79/766 [00:21<03:08,  3.65it/s, v_num=a0al, train_loss_step=20.
Epoch 0:  10%| | 79/766 [00:21<03:09,  3.63it/s, v_num=a0al, train_loss_step=18.
Multinomial: 21.206655502319336, Poisson: -0.10164565593004227
Epoch 0:  10%| | 80/766 [00:22<03:08,  3.64it/s, v_num=a0al, train_loss_step=18.
Epoch 0:  10%| | 80/766 [00:22<03:08,  3.64it/s, v_num=a0al, train_loss_step=21.
Multinomial: 22.280014038085938, Poisson: -0.10702759772539139
Epoch 0:  11%| | 81/766 [00:22<03:07,  3.66it/s, v_num=a0al, train_loss_step=21.
Epoch 0:  11%| | 81/766 [00:22<03:08,  3.64it/s, v_num=a0al, train_loss_step=22.
Multinomial: 21.242645263671875, Poisson: -0.10192742943763733

Epoch 0:  11%| | 82/766 [00:22<03:06,  3.67it/s, v_num=a0al, train_loss_step=22.
Epoch 0:  11%| | 82/766 [00:22<03:07,  3.65it/s, v_num=a0al, train_loss_step=21.
Multinomial: 21.142255783081055, Poisson: -0.10121983289718628

Epoch 0:  11%| | 83/766 [00:22<03:05,  3.67it/s, v_num=a0al, train_loss_step=21.
Epoch 0:  11%| | 83/766 [00:22<03:07,  3.65it/s, v_num=a0al, train_loss_step=21.
Multinomial: 22.358478546142578, Poisson: -0.1070261299610138
Epoch 0:  11%| | 84/766 [00:22<03:05,  3.68it/s, v_num=a0al, train_loss_step=21.
Epoch 0:  11%| | 84/766 [00:22<03:06,  3.66it/s, v_num=a0al, train_loss_step=22.
Multinomial: 21.18360137939453, Poisson: -0.10107354819774628
Epoch 0:  11%| | 85/766 [00:23<03:05,  3.66it/s, v_num=a0al, train_loss_step=22.
Epoch 0:  11%| | 85/766 [00:23<03:05,  3.66it/s, v_num=a0al, train_loss_step=21.
Multinomial: 20.60392951965332, Poisson: -0.09856819361448288
Epoch 0:  11%| | 86/766 [00:23<03:04,  3.69it/s, v_num=a0al, train_loss_step=21.
Epoch 0:  11%| | 86/766 [00:23<03:05,  3.67it/s, v_num=a0al, train_loss_step=20.
Multinomial: 19.474123001098633, Poisson: -0.09277226030826569

Epoch 0:  11%| | 87/766 [00:23<03:03,  3.69it/s, v_num=a0al, train_loss_step=20.
Epoch 0:  11%| | 87/766 [00:23<03:04,  3.67it/s, v_num=a0al, train_loss_step=19.
Multinomial: 21.81633949279785, Poisson: -0.10398300737142563

Epoch 0:  11%| | 88/766 [00:23<03:03,  3.70it/s, v_num=a0al, train_loss_step=19.
Epoch 0:  11%| | 88/766 [00:23<03:04,  3.68it/s, v_num=a0al, train_loss_step=21.
Multinomial: 22.952714920043945, Poisson: -0.1099216490983963
Epoch 0:  12%| | 89/766 [00:24<03:02,  3.70it/s, v_num=a0al, train_loss_step=21.
Epoch 0:  12%| | 89/766 [00:24<03:03,  3.68it/s, v_num=a0al, train_loss_step=22.
Multinomial: 20.675338745117188, Poisson: -0.09857542812824249
Epoch 0:  12%| | 90/766 [00:24<03:03,  3.69it/s, v_num=a0al, train_loss_step=22.
Epoch 0:  12%| | 90/766 [00:24<03:03,  3.68it/s, v_num=a0al, train_loss_step=20.
Multinomial: 20.54332733154297, Poisson: -0.09850569814443588
Epoch 0:  12%| | 91/766 [00:24<03:01,  3.71it/s, v_num=a0al, train_loss_step=20.
Epoch 0:  12%| | 91/766 [00:24<03:02,  3.69it/s, v_num=a0al, train_loss_step=20.
Multinomial: 17.736204147338867, Poisson: -0.083879254758358

Epoch 0:  12%| | 92/766 [00:24<03:01,  3.71it/s, v_num=a0al, train_loss_step=20.
Epoch 0:  12%| | 92/766 [00:24<03:02,  3.69it/s, v_num=a0al, train_loss_step=17.
Multinomial: 18.93655014038086, Poisson: -0.09000393003225327

Epoch 0:  12%| | 93/766 [00:25<03:00,  3.72it/s, v_num=a0al, train_loss_step=17.
Epoch 0:  12%| | 93/766 [00:25<03:01,  3.70it/s, v_num=a0al, train_loss_step=18.
Multinomial: 23.51058006286621, Poisson: -0.11284295469522476
Epoch 0:  12%| | 94/766 [00:25<03:00,  3.72it/s, v_num=a0al, train_loss_step=18.
Epoch 0:  12%| | 94/766 [00:25<03:01,  3.70it/s, v_num=a0al, train_loss_step=23.
Multinomial: 22.92452621459961, Poisson: -0.10993973165750504
Epoch 0:  12%| | 95/766 [00:25<03:01,  3.71it/s, v_num=a0al, train_loss_step=23.
Epoch 0:  12%| | 95/766 [00:25<03:01,  3.71it/s, v_num=a0al, train_loss_step=22.
Multinomial: 18.880817413330078, Poisson: -0.08968962728977203
Epoch 0:  13%|▏| 96/766 [00:25<02:59,  3.73it/s, v_num=a0al, train_loss_step=22.
Epoch 0:  13%|▏| 96/766 [00:25<03:00,  3.71it/s, v_num=a0al, train_loss_step=18.
Multinomial: 18.968830108642578, Poisson: -0.08979591727256775

Epoch 0:  13%|▏| 97/766 [00:25<02:59,  3.73it/s, v_num=a0al, train_loss_step=18.
Epoch 0:  13%|▏| 97/766 [00:26<03:00,  3.71it/s, v_num=a0al, train_loss_step=18.
Multinomial: 17.777538299560547, Poisson: -0.08393401652574539

Epoch 0:  13%|▏| 98/766 [00:26<02:58,  3.74it/s, v_num=a0al, train_loss_step=18.
Epoch 0:  13%|▏| 98/766 [00:26<02:59,  3.72it/s, v_num=a0al, train_loss_step=17.
Multinomial: 22.880767822265625, Poisson: -0.10982605814933777
Epoch 0:  13%|▏| 99/766 [00:26<02:58,  3.74it/s, v_num=a0al, train_loss_step=17.
Epoch 0:  13%|▏| 99/766 [00:26<02:59,  3.72it/s, v_num=a0al, train_loss_step=22.
Multinomial: 19.429824829101562, Poisson: -0.09266551584005356
Epoch 0:  13%|▏| 100/766 [00:26<02:58,  3.73it/s, v_num=a0al, train_loss_step=22
Epoch 0:  13%|▏| 100/766 [00:26<02:58,  3.73it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.114593505859375, Poisson: -0.10141497850418091
Epoch 0:  13%|▏| 101/766 [00:26<02:57,  3.75it/s, v_num=a0al, train_loss_step=19
Epoch 0:  13%|▏| 101/766 [00:27<02:58,  3.73it/s, v_num=a0al, train_loss_step=21
Multinomial: 24.00738525390625, Poisson: -0.11572451889514923

Epoch 0:  13%|▏| 102/766 [00:27<02:56,  3.75it/s, v_num=a0al, train_loss_step=21
Epoch 0:  13%|▏| 102/766 [00:27<02:57,  3.73it/s, v_num=a0al, train_loss_step=23
Multinomial: 17.775775909423828, Poisson: -0.0842253789305687

Epoch 0:  13%|▏| 103/766 [00:27<02:56,  3.76it/s, v_num=a0al, train_loss_step=23
Epoch 0:  13%|▏| 103/766 [00:27<02:57,  3.74it/s, v_num=a0al, train_loss_step=17
Multinomial: 22.294315338134766, Poisson: -0.10698876529932022
Epoch 0:  14%|▏| 104/766 [00:27<02:56,  3.76it/s, v_num=a0al, train_loss_step=17
Epoch 0:  14%|▏| 104/766 [00:27<02:56,  3.74it/s, v_num=a0al, train_loss_step=22
Multinomial: 22.36329460144043, Poisson: -0.10711447149515152
Epoch 0:  14%|▏| 105/766 [00:28<02:56,  3.74it/s, v_num=a0al, train_loss_step=22
Epoch 0:  14%|▏| 105/766 [00:28<02:56,  3.74it/s, v_num=a0al, train_loss_step=22
Multinomial: 20.062660217285156, Poisson: -0.09556890279054642

Epoch 0:  14%|▏| 106/766 [00:28<02:55,  3.77it/s, v_num=a0al, train_loss_step=22
Epoch 0:  14%|▏| 106/766 [00:28<02:56,  3.75it/s, v_num=a0al, train_loss_step=20
Multinomial: 22.25997543334961, Poisson: -0.10673705488443375

Epoch 0:  14%|▏| 107/766 [00:28<02:54,  3.77it/s, v_num=a0al, train_loss_step=20
Epoch 0:  14%|▏| 107/766 [00:28<02:55,  3.75it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.74195098876953, Poisson: -0.10431475937366486

Epoch 0:  14%|▏| 108/766 [00:28<02:54,  3.77it/s, v_num=a0al, train_loss_step=22
Epoch 0:  14%|▏| 108/766 [00:28<02:55,  3.75it/s, v_num=a0al, train_loss_step=21
Multinomial: 19.445833206176758, Poisson: -0.09263034164905548
Epoch 0:  14%|▏| 109/766 [00:28<02:54,  3.78it/s, v_num=a0al, train_loss_step=21
Epoch 0:  14%|▏| 109/766 [00:29<02:54,  3.76it/s, v_num=a0al, train_loss_step=19
Multinomial: 22.96492576599121, Poisson: -0.10998839139938354
Epoch 0:  14%|▏| 110/766 [00:29<02:54,  3.76it/s, v_num=a0al, train_loss_step=19
Epoch 0:  14%|▏| 110/766 [00:29<02:54,  3.76it/s, v_num=a0al, train_loss_step=22
Multinomial: 20.060997009277344, Poisson: -0.09550387412309647

Epoch 0:  14%|▏| 111/766 [00:29<02:53,  3.78it/s, v_num=a0al, train_loss_step=22
Epoch 0:  14%|▏| 111/766 [00:29<02:54,  3.76it/s, v_num=a0al, train_loss_step=20
Multinomial: 19.398094177246094, Poisson: -0.09251043945550919

Epoch 0:  15%|▏| 112/766 [00:29<02:52,  3.78it/s, v_num=a0al, train_loss_step=20
Epoch 0:  15%|▏| 112/766 [00:29<02:53,  3.77it/s, v_num=a0al, train_loss_step=19
Multinomial: 17.765329360961914, Poisson: -0.08439560234546661
Epoch 0:  15%|▏| 113/766 [00:29<02:52,  3.79it/s, v_num=a0al, train_loss_step=19
Epoch 0:  15%|▏| 113/766 [00:29<02:53,  3.77it/s, v_num=a0al, train_loss_step=17
Multinomial: 22.94915008544922, Poisson: -0.11012542247772217
Epoch 0:  15%|▏| 114/766 [00:30<02:52,  3.79it/s, v_num=a0al, train_loss_step=17
Epoch 0:  15%|▏| 114/766 [00:30<02:52,  3.77it/s, v_num=a0al, train_loss_step=22
Multinomial: 16.100540161132812, Poisson: -0.07545100152492523
Epoch 0:  15%|▏| 115/766 [00:30<02:52,  3.78it/s, v_num=a0al, train_loss_step=22
Epoch 0:  15%|▏| 115/766 [00:30<02:52,  3.78it/s, v_num=a0al, train_loss_step=16
Multinomial: 22.88507843017578, Poisson: -0.11016397178173065

Epoch 0:  15%|▏| 116/766 [00:30<02:51,  3.80it/s, v_num=a0al, train_loss_step=16
Epoch 0:  15%|▏| 116/766 [00:30<02:52,  3.78it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.70903968811035, Poisson: -0.10451330244541168

Epoch 0:  15%|▏| 117/766 [00:30<02:50,  3.80it/s, v_num=a0al, train_loss_step=22
Epoch 0:  15%|▏| 117/766 [00:30<02:51,  3.78it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.082307815551758, Poisson: -0.0955430418252945

Epoch 0:  15%|▏| 118/766 [00:31<02:50,  3.80it/s, v_num=a0al, train_loss_step=21
Epoch 0:  15%|▏| 118/766 [00:31<02:51,  3.78it/s, v_num=a0al, train_loss_step=20
Multinomial: 25.241302490234375, Poisson: -0.12178383767604828
Epoch 0:  16%|▏| 119/766 [00:31<02:50,  3.80it/s, v_num=a0al, train_loss_step=20
Epoch 0:  16%|▏| 119/766 [00:31<02:50,  3.79it/s, v_num=a0al, train_loss_step=25
Multinomial: 20.645946502685547, Poisson: -0.09858675301074982
Epoch 0:  16%|▏| 120/766 [00:31<02:50,  3.79it/s, v_num=a0al, train_loss_step=25
Epoch 0:  16%|▏| 120/766 [00:31<02:50,  3.79it/s, v_num=a0al, train_loss_step=20
Multinomial: 18.908796310424805, Poisson: -0.08985879272222519

Epoch 0:  16%|▏| 121/766 [00:31<02:49,  3.81it/s, v_num=a0al, train_loss_step=20
Epoch 0:  16%|▏| 121/766 [00:31<02:50,  3.79it/s, v_num=a0al, train_loss_step=18
Multinomial: 22.289188385009766, Poisson: -0.10742945224046707

Epoch 0:  16%|▏| 122/766 [00:32<02:49,  3.81it/s, v_num=a0al, train_loss_step=18
Epoch 0:  16%|▏| 122/766 [00:32<02:49,  3.79it/s, v_num=a0al, train_loss_step=22
Multinomial: 20.056671142578125, Poisson: -0.09564211219549179

Epoch 0:  16%|▏| 123/766 [00:32<02:48,  3.81it/s, v_num=a0al, train_loss_step=22
Epoch 0:  16%|▏| 123/766 [00:32<02:49,  3.80it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.75560760498047, Poisson: -0.10460510104894638
Epoch 0:  16%|▏| 124/766 [00:32<02:48,  3.82it/s, v_num=a0al, train_loss_step=20
Epoch 0:  16%|▏| 124/766 [00:32<02:48,  3.80it/s, v_num=a0al, train_loss_step=21
Multinomial: 19.484085083007812, Poisson: -0.09247615933418274
Epoch 0:  16%|▏| 125/766 [00:32<02:48,  3.80it/s, v_num=a0al, train_loss_step=21
Epoch 0:  16%|▏| 125/766 [00:32<02:48,  3.80it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.635929107666016, Poisson: -0.09875985234975815

Epoch 0:  16%|▏| 126/766 [00:32<02:47,  3.82it/s, v_num=a0al, train_loss_step=19
Epoch 0:  16%|▏| 126/766 [00:33<02:48,  3.80it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.100229263305664, Poisson: -0.10157999396324158

Epoch 0:  17%|▏| 127/766 [00:33<02:47,  3.82it/s, v_num=a0al, train_loss_step=20
Epoch 0:  17%|▏| 127/766 [00:33<02:47,  3.81it/s, v_num=a0al, train_loss_step=21
Multinomial: 22.291488647460938, Poisson: -0.10710974782705307

Epoch 0:  17%|▏| 128/766 [00:33<02:46,  3.82it/s, v_num=a0al, train_loss_step=21
Epoch 0:  17%|▏| 128/766 [00:33<02:47,  3.81it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.73076057434082, Poisson: -0.1042914167046547
Epoch 0:  17%|▏| 129/766 [00:33<02:46,  3.83it/s, v_num=a0al, train_loss_step=22
Epoch 0:  17%|▏| 129/766 [00:33<02:47,  3.81it/s, v_num=a0al, train_loss_step=21
Multinomial: 19.011499404907227, Poisson: -0.08988802134990692
Epoch 0:  17%|▏| 130/766 [00:34<02:46,  3.81it/s, v_num=a0al, train_loss_step=21
Epoch 0:  17%|▏| 130/766 [00:34<02:46,  3.81it/s, v_num=a0al, train_loss_step=18
Multinomial: 21.779977798461914, Poisson: -0.10456342250108719

Epoch 0:  17%|▏| 131/766 [00:34<02:45,  3.83it/s, v_num=a0al, train_loss_step=18
Epoch 0:  17%|▏| 131/766 [00:34<02:46,  3.82it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.7364444732666, Poisson: -0.10426057130098343

Epoch 0:  17%|▏| 132/766 [00:34<02:45,  3.83it/s, v_num=a0al, train_loss_step=21
Epoch 0:  17%|▏| 132/766 [00:34<02:46,  3.82it/s, v_num=a0al, train_loss_step=21
Multinomial: 18.27402114868164, Poisson: -0.08728134632110596

Epoch 0:  17%|▏| 133/766 [00:34<02:45,  3.84it/s, v_num=a0al, train_loss_step=21
Epoch 0:  17%|▏| 133/766 [00:34<02:45,  3.82it/s, v_num=a0al, train_loss_step=18
Multinomial: 19.547163009643555, Poisson: -0.09258746355772018
Epoch 0:  17%|▏| 134/766 [00:34<02:44,  3.84it/s, v_num=a0al, train_loss_step=18
Epoch 0:  17%|▏| 134/766 [00:35<02:45,  3.82it/s, v_num=a0al, train_loss_step=19
Multinomial: 23.47853660583496, Poisson: -0.1130625307559967
Epoch 0:  18%|▏| 135/766 [00:35<02:44,  3.83it/s, v_num=a0al, train_loss_step=19
Epoch 0:  18%|▏| 135/766 [00:35<02:44,  3.82it/s, v_num=a0al, train_loss_step=23
Multinomial: 23.483755111694336, Poisson: -0.11301064491271973

Epoch 0:  18%|▏| 136/766 [00:35<02:44,  3.84it/s, v_num=a0al, train_loss_step=23
Epoch 0:  18%|▏| 136/766 [00:35<02:44,  3.83it/s, v_num=a0al, train_loss_step=23
Multinomial: 22.89188003540039, Poisson: -0.11027445644140244

Epoch 0:  18%|▏| 137/766 [00:35<02:43,  3.84it/s, v_num=a0al, train_loss_step=23
Epoch 0:  18%|▏| 137/766 [00:35<02:44,  3.83it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.216276168823242, Poisson: -0.10149399191141129

Epoch 0:  18%|▏| 138/766 [00:35<02:43,  3.85it/s, v_num=a0al, train_loss_step=22
Epoch 0:  18%|▏| 138/766 [00:36<02:43,  3.83it/s, v_num=a0al, train_loss_step=21
Multinomial: 19.98031234741211, Poisson: -0.09576455503702164
Epoch 0:  18%|▏| 139/766 [00:36<02:42,  3.85it/s, v_num=a0al, train_loss_step=21
Epoch 0:  18%|▏| 139/766 [00:36<02:43,  3.83it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.223608016967773, Poisson: -0.1015629693865776
Epoch 0:  18%|▏| 140/766 [00:36<02:43,  3.84it/s, v_num=a0al, train_loss_step=19
Epoch 0:  18%|▏| 140/766 [00:36<02:43,  3.83it/s, v_num=a0al, train_loss_step=21

Epoch 0:  39%|▍| 301/766 [01:15<01:56,  4.00it/s, v_num=a0al, train_loss_step=17
Epoch 0:  39%|▍| 301/766 [01:15<01:56,  3.99it/s, v_num=a0al, train_loss_step=21
Multinomial: 19.533884048461914, Poisson: -0.09327836334705353

Epoch 0:  39%|▍| 302/766 [01:15<01:55,  4.00it/s, v_num=a0al, train_loss_step=21
Epoch 0:  39%|▍| 302/766 [01:15<01:56,  3.99it/s, v_num=a0al, train_loss_step=19
Multinomial: 19.492950439453125, Poisson: -0.0929587110877037

Epoch 0:  40%|▍| 303/766 [01:15<01:55,  4.00it/s, v_num=a0al, train_loss_step=19
Epoch 0:  40%|▍| 303/766 [01:15<01:55,  3.99it/s, v_num=a0al, train_loss_step=19
Multinomial: 25.189559936523438, Poisson: -0.12199914455413818
Epoch 0:  40%|▍| 304/766 [01:15<01:55,  4.00it/s, v_num=a0al, train_loss_step=19
Epoch 0:  40%|▍| 304/766 [01:16<01:55,  3.99it/s, v_num=a0al, train_loss_step=25
Multinomial: 22.94580841064453, Poisson: -0.11033787578344345
Epoch 0:  40%|▍| 305/766 [01:16<01:55,  4.00it/s, v_num=a0al, train_loss_step=25
Epoch 0:  40%|▍| 305/766 [01:16<01:55,  3.99it/s, v_num=a0al, train_loss_step=22
Multinomial: 19.95433235168457, Poisson: -0.09568342566490173

Epoch 0:  40%|▍| 306/766 [01:16<01:54,  4.00it/s, v_num=a0al, train_loss_step=22
Epoch 0:  40%|▍| 306/766 [01:16<01:55,  4.00it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.165157318115234, Poisson: -0.10151126980781555

Epoch 0:  40%|▍| 307/766 [01:16<01:54,  4.00it/s, v_num=a0al, train_loss_step=19
Epoch 0:  40%|▍| 307/766 [01:16<01:54,  4.00it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.755311965942383, Poisson: -0.10471049696207047

Epoch 0:  40%|▍| 308/766 [01:16<01:54,  4.00it/s, v_num=a0al, train_loss_step=21
Epoch 0:  40%|▍| 308/766 [01:17<01:54,  4.00it/s, v_num=a0al, train_loss_step=21
Multinomial: 23.50486946105957, Poisson: -0.11317897588014603
Epoch 0:  40%|▍| 309/766 [01:17<01:54,  4.00it/s, v_num=a0al, train_loss_step=21
Epoch 0:  40%|▍| 309/766 [01:17<01:54,  4.00it/s, v_num=a0al, train_loss_step=23
Multinomial: 20.089189529418945, Poisson: -0.09579245746135712
Epoch 0:  40%|▍| 310/766 [01:17<01:54,  4.00it/s, v_num=a0al, train_loss_step=23
Epoch 0:  40%|▍| 310/766 [01:17<01:54,  4.00it/s, v_num=a0al, train_loss_step=20
Multinomial: 19.973127365112305, Poisson: -0.09630625694990158

Epoch 0:  41%|▍| 311/766 [01:17<01:53,  4.00it/s, v_num=a0al, train_loss_step=20
Epoch 0:  41%|▍| 311/766 [01:17<01:53,  4.00it/s, v_num=a0al, train_loss_step=19
Multinomial: 19.46533203125, Poisson: -0.09327004104852676

Epoch 0:  41%|▍| 312/766 [01:17<01:53,  4.01it/s, v_num=a0al, train_loss_step=19
Epoch 0:  41%|▍| 312/766 [01:18<01:53,  4.00it/s, v_num=a0al, train_loss_step=19
Multinomial: 18.34235954284668, Poisson: -0.08725226670503616

Epoch 0:  41%|▍| 313/766 [01:18<01:53,  4.01it/s, v_num=a0al, train_loss_step=19
Epoch 0:  41%|▍| 313/766 [01:18<01:53,  4.00it/s, v_num=a0al, train_loss_step=18
Multinomial: 21.199413299560547, Poisson: -0.10223302245140076
Epoch 0:  41%|▍| 314/766 [01:18<01:52,  4.01it/s, v_num=a0al, train_loss_step=18
Epoch 0:  41%|▍| 314/766 [01:18<01:53,  4.00it/s, v_num=a0al, train_loss_step=21
Multinomial: 19.443342208862305, Poisson: -0.0930081233382225
Epoch 0:  41%|▍| 315/766 [01:18<01:52,  4.00it/s, v_num=a0al, train_loss_step=21
Epoch 0:  41%|▍| 315/766 [01:18<01:52,  4.00it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.190698623657227, Poisson: -0.10157360881567001

Epoch 0:  41%|▍| 316/766 [01:18<01:52,  4.01it/s, v_num=a0al, train_loss_step=19
Epoch 0:  41%|▍| 316/766 [01:19<01:52,  4.00it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.79853057861328, Poisson: -0.1045677438378334

Epoch 0:  41%|▍| 317/766 [01:19<01:52,  4.01it/s, v_num=a0al, train_loss_step=21
Epoch 0:  41%|▍| 317/766 [01:19<01:52,  4.00it/s, v_num=a0al, train_loss_step=21
Multinomial: 17.742656707763672, Poisson: -0.08421915024518967
Epoch 0:  42%|▍| 318/766 [01:19<01:51,  4.01it/s, v_num=a0al, train_loss_step=21
Epoch 0:  42%|▍| 318/766 [01:19<01:51,  4.00it/s, v_num=a0al, train_loss_step=17
Multinomial: 18.330219268798828, Poisson: -0.08722923696041107
Epoch 0:  42%|▍| 319/766 [01:19<01:51,  4.01it/s, v_num=a0al, train_loss_step=17
Epoch 0:  42%|▍| 319/766 [01:19<01:51,  4.00it/s, v_num=a0al, train_loss_step=18
Multinomial: 23.47544288635254, Poisson: -0.11358413100242615
Epoch 0:  42%|▍| 320/766 [01:19<01:51,  4.00it/s, v_num=a0al, train_loss_step=18
Epoch 0:  42%|▍| 320/766 [01:19<01:51,  4.00it/s, v_num=a0al, train_loss_step=23
Multinomial: 19.467269897460938, Poisson: -0.09299182146787643

Epoch 0:  42%|▍| 321/766 [01:20<01:51,  4.01it/s, v_num=a0al, train_loss_step=23
Epoch 0:  42%|▍| 321/766 [01:20<01:51,  4.00it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.6672306060791, Poisson: -0.09876550734043121

Epoch 0:  42%|▍| 322/766 [01:20<01:50,  4.01it/s, v_num=a0al, train_loss_step=19
Epoch 0:  42%|▍| 322/766 [01:20<01:50,  4.00it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.132488250732422, Poisson: -0.1016790047287941
Epoch 0:  42%|▍| 323/766 [01:20<01:50,  4.01it/s, v_num=a0al, train_loss_step=20
Epoch 0:  42%|▍| 323/766 [01:20<01:50,  4.00it/s, v_num=a0al, train_loss_step=21
Multinomial: 18.278751373291016, Poisson: -0.08713781833648682

Epoch 0:  42%|▍| 324/766 [01:20<01:50,  4.01it/s, v_num=a0al, train_loss_step=21
Epoch 0:  42%|▍| 324/766 [01:20<01:50,  4.00it/s, v_num=a0al, train_loss_step=18
Multinomial: 21.759872436523438, Poisson: -0.1045098751783371
Epoch 0:  42%|▍| 325/766 [01:21<01:50,  4.00it/s, v_num=a0al, train_loss_step=18
Epoch 0:  42%|▍| 325/766 [01:21<01:50,  4.00it/s, v_num=a0al, train_loss_step=21
Multinomial: 24.067596435546875, Poisson: -0.11585790663957596

Epoch 0:  43%|▍| 326/766 [01:21<01:49,  4.01it/s, v_num=a0al, train_loss_step=21
Epoch 0:  43%|▍| 326/766 [01:21<01:49,  4.00it/s, v_num=a0al, train_loss_step=24
Multinomial: 18.932355880737305, Poisson: -0.08997628837823868

Epoch 0:  43%|▍| 327/766 [01:21<01:49,  4.01it/s, v_num=a0al, train_loss_step=24
Epoch 0:  43%|▍| 327/766 [01:21<01:49,  4.00it/s, v_num=a0al, train_loss_step=18
Multinomial: 18.26723861694336, Poisson: -0.08742334693670273
Epoch 0:  43%|▍| 328/766 [01:21<01:49,  4.01it/s, v_num=a0al, train_loss_step=18
Epoch 0:  43%|▍| 328/766 [01:21<01:49,  4.01it/s, v_num=a0al, train_loss_step=18
Multinomial: 21.192481994628906, Poisson: -0.10159247368574142

Epoch 0:  43%|▍| 329/766 [01:22<01:48,  4.01it/s, v_num=a0al, train_loss_step=18
Epoch 0:  43%|▍| 329/766 [01:22<01:49,  4.01it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.774616241455078, Poisson: -0.1045759841799736
Epoch 0:  43%|▍| 330/766 [01:22<01:48,  4.01it/s, v_num=a0al, train_loss_step=21
Epoch 0:  43%|▍| 330/766 [01:22<01:48,  4.01it/s, v_num=a0al, train_loss_step=21
Multinomial: 22.918109893798828, Poisson: -0.11037542670965195

Epoch 0:  43%|▍| 331/766 [01:22<01:48,  4.01it/s, v_num=a0al, train_loss_step=21
Epoch 0:  43%|▍| 331/766 [01:22<01:48,  4.01it/s, v_num=a0al, train_loss_step=22
Multinomial: 17.097972869873047, Poisson: -0.08164189010858536

Epoch 0:  43%|▍| 332/766 [01:22<01:48,  4.01it/s, v_num=a0al, train_loss_step=22
Epoch 0:  43%|▍| 332/766 [01:22<01:48,  4.01it/s, v_num=a0al, train_loss_step=17
Multinomial: 19.439931869506836, Poisson: -0.09298811107873917
Epoch 0:  43%|▍| 333/766 [01:22<01:47,  4.01it/s, v_num=a0al, train_loss_step=17
Epoch 0:  43%|▍| 333/766 [01:23<01:48,  4.01it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.610124588012695, Poisson: -0.09887861460447311

Epoch 0:  44%|▍| 334/766 [01:23<01:47,  4.01it/s, v_num=a0al, train_loss_step=19
Epoch 0:  44%|▍| 334/766 [01:23<01:47,  4.01it/s, v_num=a0al, train_loss_step=20
Multinomial: 19.422632217407227, Poisson: -0.09310529381036758
Epoch 0:  44%|▍| 335/766 [01:23<01:47,  4.01it/s, v_num=a0al, train_loss_step=20
Epoch 0:  44%|▍| 335/766 [01:23<01:47,  4.01it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.64097785949707, Poisson: -0.09903261810541153

Epoch 0:  44%|▍| 336/766 [01:23<01:47,  4.01it/s, v_num=a0al, train_loss_step=19
Epoch 0:  44%|▍| 336/766 [01:23<01:47,  4.01it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.74676513671875, Poisson: -0.10450173169374466
Epoch 0:  44%|▍| 337/766 [01:23<01:46,  4.01it/s, v_num=a0al, train_loss_step=20
Epoch 0:  44%|▍| 337/766 [01:24<01:47,  4.01it/s, v_num=a0al, train_loss_step=21
Multinomial: 22.961679458618164, Poisson: -0.11031122505664825
Epoch 0:  44%|▍| 338/766 [01:24<01:46,  4.02it/s, v_num=a0al, train_loss_step=21
Epoch 0:  44%|▍| 338/766 [01:24<01:46,  4.01it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.743120193481445, Poisson: -0.10468468070030212

Epoch 0:  44%|▍| 339/766 [01:24<01:46,  4.02it/s, v_num=a0al, train_loss_step=22
Epoch 0:  44%|▍| 339/766 [01:24<01:46,  4.01it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.76246452331543, Poisson: -0.10444938391447067
Epoch 0:  44%|▍| 340/766 [01:24<01:46,  4.01it/s, v_num=a0al, train_loss_step=21
Epoch 0:  44%|▍| 340/766 [01:24<01:46,  4.01it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.034584045410156, Poisson: -0.0958065316081047

Epoch 0:  45%|▍| 341/766 [01:24<01:45,  4.02it/s, v_num=a0al, train_loss_step=21
Epoch 0:  45%|▍| 341/766 [01:25<01:45,  4.01it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.187660217285156, Poisson: -0.1017942950129509
Epoch 0:  45%|▍| 342/766 [01:25<01:45,  4.02it/s, v_num=a0al, train_loss_step=19
Epoch 0:  45%|▍| 342/766 [01:25<01:45,  4.01it/s, v_num=a0al, train_loss_step=21
Multinomial: 19.44845962524414, Poisson: -0.0929780825972557
Epoch 0:  45%|▍| 343/766 [01:25<01:45,  4.02it/s, v_num=a0al, train_loss_step=21
Epoch 0:  45%|▍| 343/766 [01:25<01:45,  4.01it/s, v_num=a0al, train_loss_step=19
Multinomial: 19.508325576782227, Poisson: -0.09298436343669891

Epoch 0:  45%|▍| 344/766 [01:25<01:45,  4.02it/s, v_num=a0al, train_loss_step=19
Epoch 0:  45%|▍| 344/766 [01:25<01:45,  4.01it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.59770393371582, Poisson: -0.09871768206357956
Epoch 0:  45%|▍| 345/766 [01:26<01:44,  4.01it/s, v_num=a0al, train_loss_step=19
Epoch 0:  45%|▍| 345/766 [01:26<01:44,  4.01it/s, v_num=a0al, train_loss_step=20
Multinomial: 18.810609817504883, Poisson: -0.09023444354534149

Epoch 0:  45%|▍| 346/766 [01:26<01:44,  4.02it/s, v_num=a0al, train_loss_step=20
Epoch 0:  45%|▍| 346/766 [01:26<01:44,  4.01it/s, v_num=a0al, train_loss_step=18
Multinomial: 19.55392837524414, Poisson: -0.09318080544471741
Epoch 0:  45%|▍| 347/766 [01:26<01:44,  4.02it/s, v_num=a0al, train_loss_step=18
Epoch 0:  45%|▍| 347/766 [01:26<01:44,  4.01it/s, v_num=a0al, train_loss_step=19
Multinomial: 22.338863372802734, Poisson: -0.10776587575674057
Epoch 0:  45%|▍| 348/766 [01:26<01:44,  4.02it/s, v_num=a0al, train_loss_step=19
Epoch 0:  45%|▍| 348/766 [01:26<01:44,  4.01it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.75655746459961, Poisson: -0.10445886850357056

Epoch 0:  46%|▍| 349/766 [01:26<01:43,  4.02it/s, v_num=a0al, train_loss_step=22
Epoch 0:  46%|▍| 349/766 [01:26<01:43,  4.01it/s, v_num=a0al, train_loss_step=21
Multinomial: 16.552867889404297, Poisson: -0.07851012051105499
Epoch 0:  46%|▍| 350/766 [01:27<01:43,  4.01it/s, v_num=a0al, train_loss_step=21
Epoch 0:  46%|▍| 350/766 [01:27<01:43,  4.01it/s, v_num=a0al, train_loss_step=16
Multinomial: 22.313129425048828, Poisson: -0.10756148397922516

Epoch 0:  46%|▍| 351/766 [01:27<01:43,  4.02it/s, v_num=a0al, train_loss_step=16
Epoch 0:  46%|▍| 351/766 [01:27<01:43,  4.01it/s, v_num=a0al, train_loss_step=22
Multinomial: 19.432830810546875, Poisson: -0.09281440079212189
Epoch 0:  46%|▍| 352/766 [01:27<01:42,  4.02it/s, v_num=a0al, train_loss_step=22
Epoch 0:  46%|▍| 352/766 [01:27<01:43,  4.01it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.061098098754883, Poisson: -0.0959169864654541

Epoch 0:  46%|▍| 353/766 [01:27<01:42,  4.02it/s, v_num=a0al, train_loss_step=19
Epoch 0:  46%|▍| 353/766 [01:27<01:42,  4.01it/s, v_num=a0al, train_loss_step=20
Multinomial: 22.313295364379883, Poisson: -0.10738380998373032

Epoch 0:  46%|▍| 354/766 [01:28<01:42,  4.02it/s, v_num=a0al, train_loss_step=20
Epoch 0:  46%|▍| 354/766 [01:28<01:42,  4.01it/s, v_num=a0al, train_loss_step=22
Multinomial: 22.32628631591797, Poisson: -0.10728715360164642
Epoch 0:  46%|▍| 355/766 [01:28<01:42,  4.02it/s, v_num=a0al, train_loss_step=22
Epoch 0:  46%|▍| 355/766 [01:28<01:42,  4.01it/s, v_num=a0al, train_loss_step=22
Multinomial: 20.63421630859375, Poisson: -0.09855731576681137
Epoch 0:  46%|▍| 356/766 [01:28<01:41,  4.02it/s, v_num=a0al, train_loss_step=22
Epoch 0:  46%|▍| 356/766 [01:28<01:42,  4.02it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.052242279052734, Poisson: -0.09578649699687958
Epoch 0:  47%|▍| 357/766 [01:28<01:41,  4.02it/s, v_num=a0al, train_loss_step=20
Epoch 0:  47%|▍| 357/766 [01:28<01:41,  4.02it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.189252853393555, Poisson: -0.10169314593076706

Epoch 0:  47%|▍| 358/766 [01:29<01:41,  4.02it/s, v_num=a0al, train_loss_step=20
Epoch 0:  47%|▍| 358/766 [01:29<01:41,  4.02it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.797149658203125, Poisson: -0.10486872494220734

Epoch 0:  47%|▍| 359/766 [01:29<01:41,  4.02it/s, v_num=a0al, train_loss_step=21
Epoch 0:  47%|▍| 359/766 [01:29<01:41,  4.02it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.569082260131836, Poisson: -0.09853038191795349
Epoch 0:  47%|▍| 360/766 [01:29<01:41,  4.02it/s, v_num=a0al, train_loss_step=21
Epoch 0:  47%|▍| 360/766 [01:29<01:41,  4.02it/s, v_num=a0al, train_loss_step=20
Multinomial: 19.498085021972656, Poisson: -0.0929093137383461
Epoch 0:  47%|▍| 361/766 [01:29<01:40,  4.02it/s, v_num=a0al, train_loss_step=20
Epoch 0:  47%|▍| 361/766 [01:29<01:40,  4.02it/s, v_num=a0al, train_loss_step=19
Multinomial: 18.900672912597656, Poisson: -0.0900539830327034
Epoch 0:  47%|▍| 362/766 [01:29<01:40,  4.02it/s, v_num=a0al, train_loss_step=19
Epoch 0:  47%|▍| 362/766 [01:30<01:40,  4.02it/s, v_num=a0al, train_loss_step=18
Multinomial: 20.629474639892578, Poisson: -0.09880076348781586

Epoch 0:  47%|▍| 363/766 [01:30<01:40,  4.02it/s, v_num=a0al, train_loss_step=18
Epoch 0:  47%|▍| 363/766 [01:30<01:40,  4.02it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.05840492248535, Poisson: -0.09586732089519501

Epoch 0:  48%|▍| 364/766 [01:30<01:39,  4.02it/s, v_num=a0al, train_loss_step=20
Epoch 0:  48%|▍| 364/766 [01:30<01:40,  4.02it/s, v_num=a0al, train_loss_step=20
Multinomial: 22.877946853637695, Poisson: -0.11037742346525192
Epoch 0:  48%|▍| 365/766 [01:30<01:39,  4.02it/s, v_num=a0al, train_loss_step=20
Epoch 0:  48%|▍| 365/766 [01:30<01:39,  4.02it/s, v_num=a0al, train_loss_step=22
Multinomial: 19.996482849121094, Poisson: -0.0957464724779129
Epoch 0:  48%|▍| 366/766 [01:30<01:39,  4.02it/s, v_num=a0al, train_loss_step=22
Epoch 0:  48%|▍| 366/766 [01:31<01:39,  4.02it/s, v_num=a0al, train_loss_step=19
Multinomial: 24.613861083984375, Poisson: -0.11876354366540909
Epoch 0:  48%|▍| 367/766 [01:31<01:39,  4.02it/s, v_num=a0al, train_loss_step=19
Epoch 0:  48%|▍| 367/766 [01:31<01:39,  4.02it/s, v_num=a0al, train_loss_step=24
Multinomial: 22.33321189880371, Poisson: -0.10739743709564209

Epoch 0:  48%|▍| 368/766 [01:31<01:38,  4.03it/s, v_num=a0al, train_loss_step=24
Epoch 0:  48%|▍| 368/766 [01:31<01:39,  4.02it/s, v_num=a0al, train_loss_step=22
Multinomial: 22.935331344604492, Poisson: -0.11003967374563217

Epoch 0:  48%|▍| 369/766 [01:31<01:38,  4.03it/s, v_num=a0al, train_loss_step=22
Epoch 0:  48%|▍| 369/766 [01:31<01:38,  4.02it/s, v_num=a0al, train_loss_step=22
Multinomial: 20.00196075439453, Poisson: -0.09576454013586044
Epoch 0:  48%|▍| 370/766 [01:32<01:38,  4.02it/s, v_num=a0al, train_loss_step=22
Epoch 0:  48%|▍| 370/766 [01:32<01:38,  4.02it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.221132278442383, Poisson: -0.10173946619033813
Epoch 0:  48%|▍| 371/766 [01:32<01:38,  4.03it/s, v_num=a0al, train_loss_step=19
Epoch 0:  48%|▍| 371/766 [01:32<01:38,  4.02it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.060192108154297, Poisson: -0.09579496085643768

Epoch 0:  49%|▍| 372/766 [01:32<01:37,  4.03it/s, v_num=a0al, train_loss_step=21
Epoch 0:  49%|▍| 372/766 [01:32<01:37,  4.02it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.589139938354492, Poisson: -0.09876739978790283

Epoch 0:  49%|▍| 373/766 [01:32<01:37,  4.03it/s, v_num=a0al, train_loss_step=20
Epoch 0:  49%|▍| 373/766 [01:32<01:37,  4.02it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.740642547607422, Poisson: -0.10431542992591858

Epoch 0:  49%|▍| 374/766 [01:32<01:37,  4.03it/s, v_num=a0al, train_loss_step=20
Epoch 0:  49%|▍| 374/766 [01:33<01:37,  4.02it/s, v_num=a0al, train_loss_step=21
Multinomial: 18.341379165649414, Poisson: -0.08717776834964752
Epoch 0:  49%|▍| 375/766 [01:33<01:37,  4.02it/s, v_num=a0al, train_loss_step=21
Epoch 0:  49%|▍| 375/766 [01:33<01:37,  4.02it/s, v_num=a0al, train_loss_step=18
Multinomial: 21.19436264038086, Poisson: -0.10196632891893387
Epoch 0:  49%|▍| 376/766 [01:33<01:36,  4.03it/s, v_num=a0al, train_loss_step=18
Epoch 0:  49%|▍| 376/766 [01:33<01:36,  4.02it/s, v_num=a0al, train_loss_step=21
Multinomial: 23.506832122802734, Poisson: -0.11307775229215622
Epoch 0:  49%|▍| 377/766 [01:33<01:36,  4.03it/s, v_num=a0al, train_loss_step=21
Epoch 0:  49%|▍| 377/766 [01:33<01:36,  4.02it/s, v_num=a0al, train_loss_step=23
Multinomial: 20.549787521362305, Poisson: -0.09868539124727249

Epoch 0:  49%|▍| 378/766 [01:33<01:36,  4.03it/s, v_num=a0al, train_loss_step=23
Epoch 0:  49%|▍| 378/766 [01:33<01:36,  4.02it/s, v_num=a0al, train_loss_step=20
Multinomial: 19.48164176940918, Poisson: -0.09314829111099243

Epoch 0:  49%|▍| 379/766 [01:34<01:36,  4.03it/s, v_num=a0al, train_loss_step=20
Epoch 0:  49%|▍| 379/766 [01:34<01:36,  4.02it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.741195678710938, Poisson: -0.1045418530702591
Epoch 0:  50%|▍| 380/766 [01:34<01:35,  4.02it/s, v_num=a0al, train_loss_step=19
Epoch 0:  50%|▍| 380/766 [01:34<01:35,  4.02it/s, v_num=a0al, train_loss_step=21
Multinomial: 19.425317764282227, Poisson: -0.09279096126556396
Epoch 0:  50%|▍| 381/766 [01:34<01:35,  4.03it/s, v_num=a0al, train_loss_step=21
Epoch 0:  50%|▍| 381/766 [01:34<01:35,  4.02it/s, v_num=a0al, train_loss_step=19
Multinomial: 22.347368240356445, Poisson: -0.10702072829008102
Epoch 0:  50%|▍| 382/766 [01:34<01:35,  4.03it/s, v_num=a0al, train_loss_step=19
Epoch 0:  50%|▍| 382/766 [01:34<01:35,  4.02it/s, v_num=a0al, train_loss_step=22
Multinomial: 23.530473709106445, Poisson: -0.11321685463190079

Epoch 0:  50%|▌| 383/766 [01:35<01:35,  4.03it/s, v_num=a0al, train_loss_step=22
Epoch 0:  50%|▌| 383/766 [01:35<01:35,  4.02it/s, v_num=a0al, train_loss_step=23
Multinomial: 19.423980712890625, Poisson: -0.0931314155459404

Epoch 0:  50%|▌| 384/766 [01:35<01:34,  4.03it/s, v_num=a0al, train_loss_step=23
Epoch 0:  50%|▌| 384/766 [01:35<01:34,  4.02it/s, v_num=a0al, train_loss_step=19
Multinomial: 22.317577362060547, Poisson: -0.10730933398008347
Epoch 0:  50%|▌| 385/766 [01:35<01:34,  4.02it/s, v_num=a0al, train_loss_step=19
Epoch 0:  50%|▌| 385/766 [01:35<01:34,  4.02it/s, v_num=a0al, train_loss_step=22
Multinomial: 16.54821014404297, Poisson: -0.07840119302272797
Epoch 0:  50%|▌| 386/766 [01:35<01:34,  4.03it/s, v_num=a0al, train_loss_step=22
Epoch 0:  50%|▌| 386/766 [01:35<01:34,  4.02it/s, v_num=a0al, train_loss_step=16
Multinomial: 19.455156326293945, Poisson: -0.09276847541332245
Epoch 0:  51%|▌| 387/766 [01:36<01:34,  4.03it/s, v_num=a0al, train_loss_step=16
Epoch 0:  51%|▌| 387/766 [01:36<01:34,  4.02it/s, v_num=a0al, train_loss_step=19
Multinomial: 23.45780372619629, Poisson: -0.1133873239159584

Epoch 0:  51%|▌| 388/766 [01:36<01:33,  4.03it/s, v_num=a0al, train_loss_step=19
Epoch 0:  51%|▌| 388/766 [01:36<01:33,  4.02it/s, v_num=a0al, train_loss_step=23
Multinomial: 17.704776763916016, Poisson: -0.08420784771442413

Epoch 0:  51%|▌| 389/766 [01:36<01:33,  4.03it/s, v_num=a0al, train_loss_step=23
Epoch 0:  51%|▌| 389/766 [01:36<01:33,  4.03it/s, v_num=a0al, train_loss_step=17
Multinomial: 21.756298065185547, Poisson: -0.10439086705446243
Epoch 0:  51%|▌| 390/766 [01:36<01:33,  4.03it/s, v_num=a0al, train_loss_step=17
Epoch 0:  51%|▌| 390/766 [01:36<01:33,  4.03it/s, v_num=a0al, train_loss_step=21
Multinomial: 14.827723503112793, Poisson: -0.06981196999549866
Epoch 0:  51%|▌| 391/766 [01:36<01:33,  4.03it/s, v_num=a0al, train_loss_step=21
Epoch 0:  51%|▌| 391/766 [01:37<01:33,  4.03it/s, v_num=a0al, train_loss_step=14
Multinomial: 21.138330459594727, Poisson: -0.10137390345335007
Epoch 0:  51%|▌| 392/766 [01:37<01:32,  4.03it/s, v_num=a0al, train_loss_step=14
Epoch 0:  51%|▌| 392/766 [01:37<01:32,  4.03it/s, v_num=a0al, train_loss_step=21
Multinomial: 22.355375289916992, Poisson: -0.1073036715388298

Epoch 0:  51%|▌| 393/766 [01:37<01:32,  4.03it/s, v_num=a0al, train_loss_step=21
Epoch 0:  51%|▌| 393/766 [01:37<01:32,  4.03it/s, v_num=a0al, train_loss_step=22
Multinomial: 20.599882125854492, Poisson: -0.09854143857955933

Epoch 0:  51%|▌| 394/766 [01:37<01:32,  4.03it/s, v_num=a0al, train_loss_step=22
Epoch 0:  51%|▌| 394/766 [01:37<01:32,  4.03it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.186933517456055, Poisson: -0.10148259997367859
Epoch 0:  52%|▌| 395/766 [01:38<01:32,  4.03it/s, v_num=a0al, train_loss_step=20
Epoch 0:  52%|▌| 395/766 [01:38<01:32,  4.03it/s, v_num=a0al, train_loss_step=21
Multinomial: 18.297380447387695, Poisson: -0.08710375428199768
Epoch 0:  52%|▌| 396/766 [01:38<01:31,  4.03it/s, v_num=a0al, train_loss_step=21
Epoch 0:  52%|▌| 396/766 [01:38<01:31,  4.03it/s, v_num=a0al, train_loss_step=18
Multinomial: 22.292890548706055, Poisson: -0.10710746794939041
Epoch 0:  52%|▌| 397/766 [01:38<01:31,  4.03it/s, v_num=a0al, train_loss_step=18
Epoch 0:  52%|▌| 397/766 [01:38<01:31,  4.03it/s, v_num=a0al, train_loss_step=22
Multinomial: 18.311077117919922, Poisson: -0.08705893158912659

Epoch 0:  52%|▌| 398/766 [01:38<01:31,  4.03it/s, v_num=a0al, train_loss_step=22
Epoch 0:  52%|▌| 398/766 [01:38<01:31,  4.03it/s, v_num=a0al, train_loss_step=18
Multinomial: 19.50176429748535, Poisson: -0.09276818484067917

Epoch 0:  52%|▌| 399/766 [01:38<01:30,  4.03it/s, v_num=a0al, train_loss_step=18
Epoch 0:  52%|▌| 399/766 [01:39<01:31,  4.03it/s, v_num=a0al, train_loss_step=19
Multinomial: 18.905942916870117, Poisson: -0.08994127064943314
Epoch 0:  52%|▌| 400/766 [01:39<01:30,  4.03it/s, v_num=a0al, train_loss_step=19
Epoch 0:  52%|▌| 400/766 [01:39<01:30,  4.03it/s, v_num=a0al, train_loss_step=18
Multinomial: 25.189510345458984, Poisson: -0.12159111350774765
Epoch 0:  52%|▌| 401/766 [01:39<01:30,  4.03it/s, v_num=a0al, train_loss_step=18
Epoch 0:  52%|▌| 401/766 [01:39<01:30,  4.03it/s, v_num=a0al, train_loss_step=25
Multinomial: 22.315311431884766, Poisson: -0.10756457597017288

Epoch 0:  52%|▌| 402/766 [01:39<01:30,  4.03it/s, v_num=a0al, train_loss_step=25
Epoch 0:  52%|▌| 402/766 [01:39<01:30,  4.03it/s, v_num=a0al, train_loss_step=22
Multinomial: 19.505334854125977, Poisson: -0.09276691824197769

Epoch 0:  53%|▌| 403/766 [01:39<01:29,  4.03it/s, v_num=a0al, train_loss_step=22
Epoch 0:  53%|▌| 403/766 [01:40<01:30,  4.03it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.212017059326172, Poisson: -0.10146214812994003

Epoch 0:  53%|▌| 404/766 [01:40<01:29,  4.03it/s, v_num=a0al, train_loss_step=19
Epoch 0:  53%|▌| 404/766 [01:40<01:29,  4.03it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.59143829345703, Poisson: -0.09869526326656342
Epoch 0:  53%|▌| 405/766 [01:40<01:29,  4.03it/s, v_num=a0al, train_loss_step=21
Epoch 0:  53%|▌| 405/766 [01:40<01:29,  4.03it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.556276321411133, Poisson: -0.09864883124828339
Epoch 0:  53%|▌| 406/766 [01:40<01:29,  4.04it/s, v_num=a0al, train_loss_step=20
Epoch 0:  53%|▌| 406/766 [01:40<01:29,  4.03it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.144487380981445, Poisson: -0.10163812339305878

Epoch 0:  53%|▌| 407/766 [01:40<01:28,  4.04it/s, v_num=a0al, train_loss_step=20
Epoch 0:  53%|▌| 407/766 [01:40<01:29,  4.03it/s, v_num=a0al, train_loss_step=21
Multinomial: 22.869873046875, Poisson: -0.11037392169237137

Epoch 0:  53%|▌| 408/766 [01:41<01:28,  4.04it/s, v_num=a0al, train_loss_step=21
Epoch 0:  53%|▌| 408/766 [01:41<01:28,  4.03it/s, v_num=a0al, train_loss_step=22
Multinomial: 18.9237060546875, Poisson: -0.08994999527931213
Epoch 0:  53%|▌| 409/766 [01:41<01:28,  4.04it/s, v_num=a0al, train_loss_step=22
Epoch 0:  53%|▌| 409/766 [01:41<01:28,  4.03it/s, v_num=a0al, train_loss_step=18
Multinomial: 20.603927612304688, Poisson: -0.09861285984516144
Epoch 0:  54%|▌| 410/766 [01:41<01:28,  4.03it/s, v_num=a0al, train_loss_step=18
Epoch 0:  54%|▌| 410/766 [01:41<01:28,  4.03it/s, v_num=a0al, train_loss_step=20
Multinomial: 19.39595603942871, Poisson: -0.0930929183959961

Epoch 0:  54%|▌| 411/766 [01:41<01:27,  4.04it/s, v_num=a0al, train_loss_step=20
Epoch 0:  54%|▌| 411/766 [01:41<01:28,  4.03it/s, v_num=a0al, train_loss_step=19
Multinomial: 18.3190860748291, Poisson: -0.08724711090326309

Epoch 0:  54%|▌| 412/766 [01:42<01:27,  4.04it/s, v_num=a0al, train_loss_step=19
Epoch 0:  54%|▌| 412/766 [01:42<01:27,  4.03it/s, v_num=a0al, train_loss_step=18
Multinomial: 20.606151580810547, Poisson: -0.09849604219198227

Epoch 0:  54%|▌| 413/766 [01:42<01:27,  4.04it/s, v_num=a0al, train_loss_step=18
Epoch 0:  54%|▌| 413/766 [01:42<01:27,  4.03it/s, v_num=a0al, train_loss_step=20
Multinomial: 19.42496109008789, Poisson: -0.09279781579971313
Epoch 0:  54%|▌| 414/766 [01:42<01:27,  4.04it/s, v_num=a0al, train_loss_step=20
Epoch 0:  54%|▌| 414/766 [01:42<01:27,  4.03it/s, v_num=a0al, train_loss_step=19
Multinomial: 22.367816925048828, Poisson: -0.1075163185596466
Epoch 0:  54%|▌| 415/766 [01:42<01:27,  4.03it/s, v_num=a0al, train_loss_step=19
Epoch 0:  54%|▌| 415/766 [01:42<01:27,  4.03it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.783443450927734, Poisson: -0.10439547151327133

Epoch 0:  54%|▌| 416/766 [01:43<01:26,  4.04it/s, v_num=a0al, train_loss_step=22
Epoch 0:  54%|▌| 416/766 [01:43<01:26,  4.03it/s, v_num=a0al, train_loss_step=21
Multinomial: 22.364540100097656, Poisson: -0.1071932464838028

Epoch 0:  54%|▌| 417/766 [01:43<01:26,  4.04it/s, v_num=a0al, train_loss_step=21
Epoch 0:  54%|▌| 417/766 [01:43<01:26,  4.03it/s, v_num=a0al, train_loss_step=22
Multinomial: 18.86842155456543, Poisson: -0.09013441950082779
Epoch 0:  55%|▌| 418/766 [01:43<01:26,  4.04it/s, v_num=a0al, train_loss_step=22
Epoch 0:  55%|▌| 418/766 [01:43<01:26,  4.03it/s, v_num=a0al, train_loss_step=18
Multinomial: 24.042606353759766, Poisson: -0.11599228531122208
Epoch 0:  55%|▌| 419/766 [01:43<01:25,  4.04it/s, v_num=a0al, train_loss_step=18
Epoch 0:  55%|▌| 419/766 [01:43<01:26,  4.03it/s, v_num=a0al, train_loss_step=23
Multinomial: 17.2088680267334, Poisson: -0.08152053505182266
Epoch 0:  55%|▌| 420/766 [01:44<01:25,  4.03it/s, v_num=a0al, train_loss_step=23
Epoch 0:  55%|▌| 420/766 [01:44<01:25,  4.03it/s, v_num=a0al, train_loss_step=17
Multinomial: 20.612220764160156, Poisson: -0.09851660579442978

Epoch 0:  55%|▌| 421/766 [01:44<01:25,  4.04it/s, v_num=a0al, train_loss_step=17
Epoch 0:  55%|▌| 421/766 [01:44<01:25,  4.03it/s, v_num=a0al, train_loss_step=20
Multinomial: 22.886459350585938, Poisson: -0.11061673611402512

Epoch 0:  55%|▌| 422/766 [01:44<01:25,  4.04it/s, v_num=a0al, train_loss_step=20
Epoch 0:  55%|▌| 422/766 [01:44<01:25,  4.03it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.761281967163086, Poisson: -0.10456900298595428
Epoch 0:  55%|▌| 423/766 [01:44<01:24,  4.04it/s, v_num=a0al, train_loss_step=22
Epoch 0:  55%|▌| 423/766 [01:44<01:25,  4.03it/s, v_num=a0al, train_loss_step=21
Multinomial: 22.932226181030273, Poisson: -0.11030212044715881

Epoch 0:  55%|▌| 424/766 [01:44<01:24,  4.04it/s, v_num=a0al, train_loss_step=21
Epoch 0:  55%|▌| 424/766 [01:45<01:24,  4.03it/s, v_num=a0al, train_loss_step=22
Multinomial: 18.940080642700195, Poisson: -0.09015578031539917
Epoch 0:  55%|▌| 425/766 [01:45<01:24,  4.04it/s, v_num=a0al, train_loss_step=22
Epoch 0:  55%|▌| 425/766 [01:45<01:24,  4.03it/s, v_num=a0al, train_loss_step=18
Multinomial: 20.045801162719727, Poisson: -0.09577429294586182
...
[training output truncated: steps 426-584 proceed at ~4.0 it/s; per-step multinomial loss fluctuates between ~16.5 and ~25.2, and the Poisson term between ~-0.08 and ~-0.12]
...
Epoch 0:  76%|▊| 585/766 [02:23<00:44,  4.06it/s, v_num=a0al, train_loss_step=20
Multinomial: 22.294565200805664, Poisson: -0.10777721554040909

Epoch 0:  77%|▊| 586/766 [02:24<00:44,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  77%|▊| 586/766 [02:24<00:44,  4.06it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.7929630279541, Poisson: -0.10491427034139633

Epoch 0:  77%|▊| 587/766 [02:24<00:44,  4.07it/s, v_num=a0al, train_loss_step=22
Epoch 0:  77%|▊| 587/766 [02:24<00:44,  4.06it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.575651168823242, Poisson: -0.09907319396734238
Epoch 0:  77%|▊| 588/766 [02:24<00:43,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  77%|▊| 588/766 [02:24<00:43,  4.06it/s, v_num=a0al, train_loss_step=20
Multinomial: 19.472824096679688, Poisson: -0.09318910539150238

Epoch 0:  77%|▊| 589/766 [02:24<00:43,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  77%|▊| 589/766 [02:24<00:43,  4.06it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.570619583129883, Poisson: -0.09897415339946747
Epoch 0:  77%|▊| 590/766 [02:25<00:43,  4.06it/s, v_num=a0al, train_loss_step=19
Epoch 0:  77%|▊| 590/766 [02:25<00:43,  4.06it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.23089027404785, Poisson: -0.10216553509235382

Epoch 0:  77%|▊| 591/766 [02:25<00:43,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  77%|▊| 591/766 [02:25<00:43,  4.06it/s, v_num=a0al, train_loss_step=21
Multinomial: 19.470623016357422, Poisson: -0.09319072216749191

Epoch 0:  77%|▊| 592/766 [02:25<00:42,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  77%|▊| 592/766 [02:25<00:42,  4.06it/s, v_num=a0al, train_loss_step=19
Multinomial: 22.3203067779541, Poisson: -0.10772398114204407
Epoch 0:  77%|▊| 593/766 [02:25<00:42,  4.07it/s, v_num=a0al, train_loss_step=19
Epoch 0:  77%|▊| 593/766 [02:25<00:42,  4.06it/s, v_num=a0al, train_loss_step=22
Multinomial: 23.481698989868164, Poisson: -0.11348134279251099

Epoch 0:  78%|▊| 594/766 [02:25<00:42,  4.07it/s, v_num=a0al, train_loss_step=22
Epoch 0:  78%|▊| 594/766 [02:26<00:42,  4.06it/s, v_num=a0al, train_loss_step=23
Multinomial: 18.397077560424805, Poisson: -0.0875048041343689
Epoch 0:  78%|▊| 595/766 [02:26<00:42,  4.07it/s, v_num=a0al, train_loss_step=23
Epoch 0:  78%|▊| 595/766 [02:26<00:42,  4.06it/s, v_num=a0al, train_loss_step=18
Multinomial: 18.281776428222656, Poisson: -0.08735102415084839

Epoch 0:  78%|▊| 596/766 [02:26<00:41,  4.07it/s, v_num=a0al, train_loss_step=18
Epoch 0:  78%|▊| 596/766 [02:26<00:41,  4.07it/s, v_num=a0al, train_loss_step=18
Multinomial: 18.900833129882812, Poisson: -0.09037599712610245
Epoch 0:  78%|▊| 597/766 [02:26<00:41,  4.07it/s, v_num=a0al, train_loss_step=18
Epoch 0:  78%|▊| 597/766 [02:26<00:41,  4.07it/s, v_num=a0al, train_loss_step=18
Multinomial: 20.041349411010742, Poisson: -0.09603632241487503
Epoch 0:  78%|▊| 598/766 [02:26<00:41,  4.07it/s, v_num=a0al, train_loss_step=18
Epoch 0:  78%|▊| 598/766 [02:27<00:41,  4.07it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.15169906616211, Poisson: -0.10192982852458954

Epoch 0:  78%|▊| 599/766 [02:27<00:41,  4.07it/s, v_num=a0al, train_loss_step=19
Epoch 0:  78%|▊| 599/766 [02:27<00:41,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 19.998565673828125, Poisson: -0.09607094526290894
Epoch 0:  78%|▊| 600/766 [02:27<00:40,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  78%|▊| 600/766 [02:27<00:40,  4.07it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.630062103271484, Poisson: -0.09916112571954727

Epoch 0:  78%|▊| 601/766 [02:27<00:40,  4.07it/s, v_num=a0al, train_loss_step=19
Epoch 0:  78%|▊| 601/766 [02:27<00:40,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 24.03969955444336, Poisson: -0.11638204008340836
Epoch 0:  79%|▊| 602/766 [02:27<00:40,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  79%|▊| 602/766 [02:28<00:40,  4.07it/s, v_num=a0al, train_loss_step=23
Multinomial: 20.574071884155273, Poisson: -0.09899208694696426
Epoch 0:  79%|▊| 603/766 [02:28<00:40,  4.07it/s, v_num=a0al, train_loss_step=23
Epoch 0:  79%|▊| 603/766 [02:28<00:40,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.062055587768555, Poisson: -0.09617772698402405

Epoch 0:  79%|▊| 604/766 [02:28<00:39,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  79%|▊| 604/766 [02:28<00:39,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 22.900306701660156, Poisson: -0.11064215004444122
Epoch 0:  79%|▊| 605/766 [02:28<00:39,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  79%|▊| 605/766 [02:28<00:39,  4.07it/s, v_num=a0al, train_loss_step=22
Multinomial: 19.45872688293457, Poisson: -0.0931672751903534

Epoch 0:  79%|▊| 606/766 [02:28<00:39,  4.07it/s, v_num=a0al, train_loss_step=22
Epoch 0:  79%|▊| 606/766 [02:29<00:39,  4.07it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.61547088623047, Poisson: -0.09892795234918594
Epoch 0:  79%|▊| 607/766 [02:29<00:39,  4.07it/s, v_num=a0al, train_loss_step=19
Epoch 0:  79%|▊| 607/766 [02:29<00:39,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.748783111572266, Poisson: -0.1048136055469513

Epoch 0:  79%|▊| 608/766 [02:29<00:38,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  79%|▊| 608/766 [02:29<00:38,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.059423446655273, Poisson: -0.0961209386587143

Epoch 0:  80%|▊| 609/766 [02:29<00:38,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  80%|▊| 609/766 [02:29<00:38,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.010229110717773, Poisson: -0.09607797861099243
Epoch 0:  80%|▊| 610/766 [02:29<00:38,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  80%|▊| 610/766 [02:29<00:38,  4.07it/s, v_num=a0al, train_loss_step=19
Multinomial: 23.505292892456055, Poisson: -0.11350621283054352

Epoch 0:  80%|▊| 611/766 [02:30<00:38,  4.07it/s, v_num=a0al, train_loss_step=19
Epoch 0:  80%|▊| 611/766 [02:30<00:38,  4.07it/s, v_num=a0al, train_loss_step=23
Multinomial: 20.647974014282227, Poisson: -0.09891486167907715
Epoch 0:  80%|▊| 612/766 [02:30<00:37,  4.07it/s, v_num=a0al, train_loss_step=23
Epoch 0:  80%|▊| 612/766 [02:30<00:37,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 22.9503173828125, Poisson: -0.11074764281511307

Epoch 0:  80%|▊| 613/766 [02:30<00:37,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  80%|▊| 613/766 [02:30<00:37,  4.07it/s, v_num=a0al, train_loss_step=22
Multinomial: 19.505077362060547, Poisson: -0.09333376586437225

Epoch 0:  80%|▊| 614/766 [02:30<00:37,  4.07it/s, v_num=a0al, train_loss_step=22
Epoch 0:  80%|▊| 614/766 [02:30<00:37,  4.07it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.72488021850586, Poisson: -0.10470889508724213
Epoch 0:  80%|▊| 615/766 [02:31<00:37,  4.07it/s, v_num=a0al, train_loss_step=19
Epoch 0:  80%|▊| 615/766 [02:31<00:37,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 25.20842170715332, Poisson: -0.12234365195035934

Epoch 0:  80%|▊| 616/766 [02:31<00:36,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  80%|▊| 616/766 [02:31<00:36,  4.07it/s, v_num=a0al, train_loss_step=25
Multinomial: 19.443464279174805, Poisson: -0.09326735883951187
Epoch 0:  81%|▊| 617/766 [02:31<00:36,  4.07it/s, v_num=a0al, train_loss_step=25
Epoch 0:  81%|▊| 617/766 [02:31<00:36,  4.07it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.06236457824707, Poisson: -0.09638020396232605
Epoch 0:  81%|▊| 618/766 [02:31<00:36,  4.07it/s, v_num=a0al, train_loss_step=19
Epoch 0:  81%|▊| 618/766 [02:31<00:36,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.628524780273438, Poisson: -0.0990341380238533

Epoch 0:  81%|▊| 619/766 [02:32<00:36,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  81%|▊| 619/766 [02:32<00:36,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.54349136352539, Poisson: -0.09893681108951569
Epoch 0:  81%|▊| 620/766 [02:32<00:35,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  81%|▊| 620/766 [02:32<00:35,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.07408332824707, Poisson: -0.09616923332214355

Epoch 0:  81%|▊| 621/766 [02:32<00:35,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  81%|▊| 621/766 [02:32<00:35,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 23.465482711791992, Poisson: -0.11332377791404724
Epoch 0:  81%|▊| 622/766 [02:32<00:35,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  81%|▊| 622/766 [02:32<00:35,  4.07it/s, v_num=a0al, train_loss_step=23
Multinomial: 22.919734954833984, Poisson: -0.11065004765987396
Epoch 0:  81%|▊| 623/766 [02:33<00:35,  4.07it/s, v_num=a0al, train_loss_step=23
Epoch 0:  81%|▊| 623/766 [02:33<00:35,  4.07it/s, v_num=a0al, train_loss_step=22
Multinomial: 20.591352462768555, Poisson: -0.09888836741447449

Epoch 0:  81%|▊| 624/766 [02:33<00:34,  4.07it/s, v_num=a0al, train_loss_step=22
Epoch 0:  81%|▊| 624/766 [02:33<00:34,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.751014709472656, Poisson: -0.10491835325956345
Epoch 0:  82%|▊| 625/766 [02:33<00:34,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  82%|▊| 625/766 [02:33<00:34,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.052703857421875, Poisson: -0.09618022292852402

Epoch 0:  82%|▊| 626/766 [02:33<00:34,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  82%|▊| 626/766 [02:33<00:34,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 24.039308547973633, Poisson: -0.11643020063638687
Epoch 0:  82%|▊| 627/766 [02:33<00:34,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  82%|▊| 627/766 [02:34<00:34,  4.07it/s, v_num=a0al, train_loss_step=23
Multinomial: 19.47958755493164, Poisson: -0.09322299808263779
Epoch 0:  82%|▊| 628/766 [02:34<00:33,  4.07it/s, v_num=a0al, train_loss_step=23
Epoch 0:  82%|▊| 628/766 [02:34<00:33,  4.07it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.07942771911621, Poisson: -0.09612985700368881

Epoch 0:  82%|▊| 629/766 [02:34<00:33,  4.07it/s, v_num=a0al, train_loss_step=19
Epoch 0:  82%|▊| 629/766 [02:34<00:33,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.62293815612793, Poisson: -0.09899833798408508
Epoch 0:  82%|▊| 630/766 [02:34<00:33,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  82%|▊| 630/766 [02:34<00:33,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.180599212646484, Poisson: -0.10211139917373657

Epoch 0:  82%|▊| 631/766 [02:34<00:33,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  82%|▊| 631/766 [02:35<00:33,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 19.43415069580078, Poisson: -0.09314945340156555
Epoch 0:  83%|▊| 632/766 [02:35<00:32,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  83%|▊| 632/766 [02:35<00:32,  4.07it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.04065704345703, Poisson: -0.0961098000407219
Epoch 0:  83%|▊| 633/766 [02:35<00:32,  4.07it/s, v_num=a0al, train_loss_step=19
Epoch 0:  83%|▊| 633/766 [02:35<00:32,  4.07it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.605737686157227, Poisson: -0.09911376237869263

Epoch 0:  83%|▊| 634/766 [02:35<00:32,  4.07it/s, v_num=a0al, train_loss_step=19
Epoch 0:  83%|▊| 634/766 [02:35<00:32,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.07790756225586, Poisson: -0.09614162147045135
Epoch 0:  83%|▊| 635/766 [02:36<00:32,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  83%|▊| 635/766 [02:36<00:32,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.729204177856445, Poisson: -0.10484756529331207

Epoch 0:  83%|▊| 636/766 [02:36<00:31,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  83%|▊| 636/766 [02:36<00:31,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 18.351215362548828, Poisson: -0.08738947659730911
Epoch 0:  83%|▊| 637/766 [02:36<00:31,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  83%|▊| 637/766 [02:36<00:31,  4.07it/s, v_num=a0al, train_loss_step=18
Multinomial: 21.743696212768555, Poisson: -0.1048312559723854
Epoch 0:  83%|▊| 638/766 [02:36<00:31,  4.07it/s, v_num=a0al, train_loss_step=18
Epoch 0:  83%|▊| 638/766 [02:36<00:31,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 22.89417266845703, Poisson: -0.11054882407188416

Epoch 0:  83%|▊| 639/766 [02:36<00:31,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  83%|▊| 639/766 [02:37<00:31,  4.07it/s, v_num=a0al, train_loss_step=22
Multinomial: 19.442575454711914, Poisson: -0.09320535510778427
Epoch 0:  84%|▊| 640/766 [02:37<00:30,  4.07it/s, v_num=a0al, train_loss_step=22
Epoch 0:  84%|▊| 640/766 [02:37<00:30,  4.07it/s, v_num=a0al, train_loss_step=19
Multinomial: 18.3289852142334, Poisson: -0.08738420903682709

Epoch 0:  84%|▊| 641/766 [02:37<00:30,  4.07it/s, v_num=a0al, train_loss_step=19
Epoch 0:  84%|▊| 641/766 [02:37<00:30,  4.07it/s, v_num=a0al, train_loss_step=18
Multinomial: 19.37784194946289, Poisson: -0.09329848736524582
Epoch 0:  84%|▊| 642/766 [02:37<00:30,  4.07it/s, v_num=a0al, train_loss_step=18
Epoch 0:  84%|▊| 642/766 [02:37<00:30,  4.07it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.74251365661621, Poisson: -0.10472816228866577
Epoch 0:  84%|▊| 643/766 [02:37<00:30,  4.07it/s, v_num=a0al, train_loss_step=19
Epoch 0:  84%|▊| 643/766 [02:37<00:30,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.053268432617188, Poisson: -0.09602104127407074

Epoch 0:  84%|▊| 644/766 [02:38<00:29,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  84%|▊| 644/766 [02:38<00:29,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.585439682006836, Poisson: -0.0989774540066719
Epoch 0:  84%|▊| 645/766 [02:38<00:29,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  84%|▊| 645/766 [02:38<00:29,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 22.32609748840332, Poisson: -0.10777314007282257

Epoch 0:  84%|▊| 646/766 [02:38<00:29,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  84%|▊| 646/766 [02:38<00:29,  4.07it/s, v_num=a0al, train_loss_step=22
Multinomial: 22.32042694091797, Poisson: -0.10790190100669861
Epoch 0:  84%|▊| 647/766 [02:38<00:29,  4.07it/s, v_num=a0al, train_loss_step=22
Epoch 0:  84%|▊| 647/766 [02:38<00:29,  4.07it/s, v_num=a0al, train_loss_step=22
Multinomial: 18.8642578125, Poisson: -0.09032086282968521

Epoch 0:  85%|▊| 648/766 [02:39<00:28,  4.07it/s, v_num=a0al, train_loss_step=22
Epoch 0:  85%|▊| 648/766 [02:39<00:28,  4.07it/s, v_num=a0al, train_loss_step=18
Multinomial: 18.9396915435791, Poisson: -0.09038145840167999

Epoch 0:  85%|▊| 649/766 [02:39<00:28,  4.07it/s, v_num=a0al, train_loss_step=18
Epoch 0:  85%|▊| 649/766 [02:39<00:28,  4.07it/s, v_num=a0al, train_loss_step=18
Multinomial: 24.03581428527832, Poisson: -0.11642525345087051
Epoch 0:  85%|▊| 650/766 [02:39<00:28,  4.07it/s, v_num=a0al, train_loss_step=18
Epoch 0:  85%|▊| 650/766 [02:39<00:28,  4.07it/s, v_num=a0al, train_loss_step=23
Multinomial: 24.04706573486328, Poisson: -0.11630402505397797

Epoch 0:  85%|▊| 651/766 [02:39<00:28,  4.07it/s, v_num=a0al, train_loss_step=23
Epoch 0:  85%|▊| 651/766 [02:39<00:28,  4.07it/s, v_num=a0al, train_loss_step=23
Multinomial: 17.720582962036133, Poisson: -0.08459357917308807
Epoch 0:  85%|▊| 652/766 [02:40<00:27,  4.07it/s, v_num=a0al, train_loss_step=23
Epoch 0:  85%|▊| 652/766 [02:40<00:28,  4.07it/s, v_num=a0al, train_loss_step=17
Multinomial: 17.181501388549805, Poisson: -0.0816231518983841

Epoch 0:  85%|▊| 653/766 [02:40<00:27,  4.07it/s, v_num=a0al, train_loss_step=17
Epoch 0:  85%|▊| 653/766 [02:40<00:27,  4.07it/s, v_num=a0al, train_loss_step=17
Multinomial: 23.45564842224121, Poisson: -0.11357060074806213

Epoch 0:  85%|▊| 654/766 [02:40<00:27,  4.08it/s, v_num=a0al, train_loss_step=17
Epoch 0:  85%|▊| 654/766 [02:40<00:27,  4.07it/s, v_num=a0al, train_loss_step=23
Multinomial: 21.209218978881836, Poisson: -0.10190658271312714
Epoch 0:  86%|▊| 655/766 [02:40<00:27,  4.07it/s, v_num=a0al, train_loss_step=23
Epoch 0:  86%|▊| 655/766 [02:40<00:27,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.60980224609375, Poisson: -0.09892372041940689

Epoch 0:  86%|▊| 656/766 [02:40<00:26,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  86%|▊| 656/766 [02:41<00:27,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.1964054107666, Poisson: -0.10195201635360718
Epoch 0:  86%|▊| 657/766 [02:41<00:26,  4.08it/s, v_num=a0al, train_loss_step=20
Epoch 0:  86%|▊| 657/766 [02:41<00:26,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.74530792236328, Poisson: -0.10479382425546646

Epoch 0:  86%|▊| 658/766 [02:41<00:26,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  86%|▊| 658/766 [02:41<00:26,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.213241577148438, Poisson: -0.10197531431913376

Epoch 0:  86%|▊| 659/766 [02:41<00:26,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  86%|▊| 659/766 [02:41<00:26,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.750818252563477, Poisson: -0.10478349030017853
Epoch 0:  86%|▊| 660/766 [02:42<00:26,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  86%|▊| 660/766 [02:42<00:26,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 22.8967227935791, Poisson: -0.11057905852794647
Epoch 0:  86%|▊| 661/766 [02:42<00:25,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  86%|▊| 661/766 [02:42<00:25,  4.07it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.184062957763672, Poisson: -0.10184145718812943
Epoch 0:  86%|▊| 662/766 [02:42<00:25,  4.08it/s, v_num=a0al, train_loss_step=22
Epoch 0:  86%|▊| 662/766 [02:42<00:25,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 17.182111740112305, Poisson: -0.08155201375484467

Epoch 0:  87%|▊| 663/766 [02:42<00:25,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  87%|▊| 663/766 [02:42<00:25,  4.07it/s, v_num=a0al, train_loss_step=17
Multinomial: 22.304018020629883, Poisson: -0.10763479024171829

Epoch 0:  87%|▊| 664/766 [02:42<00:25,  4.08it/s, v_num=a0al, train_loss_step=17
Epoch 0:  87%|▊| 664/766 [02:43<00:25,  4.07it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.771034240722656, Poisson: -0.1047348603606224
Epoch 0:  87%|▊| 665/766 [02:43<00:24,  4.07it/s, v_num=a0al, train_loss_step=22
Epoch 0:  87%|▊| 665/766 [02:43<00:24,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 18.2939395904541, Poisson: -0.0873831957578659

Epoch 0:  87%|▊| 666/766 [02:43<00:24,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  87%|▊| 666/766 [02:43<00:24,  4.07it/s, v_num=a0al, train_loss_step=18
Multinomial: 20.618026733398438, Poisson: -0.09891299903392792
Epoch 0:  87%|▊| 667/766 [02:43<00:24,  4.08it/s, v_num=a0al, train_loss_step=18
Epoch 0:  87%|▊| 667/766 [02:43<00:24,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 20.608810424804688, Poisson: -0.09906212240457535

Epoch 0:  87%|▊| 668/766 [02:43<00:24,  4.08it/s, v_num=a0al, train_loss_step=20
Epoch 0:  87%|▊| 668/766 [02:44<00:24,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 22.322099685668945, Poisson: -0.10757517069578171

Epoch 0:  87%|▊| 669/766 [02:44<00:23,  4.08it/s, v_num=a0al, train_loss_step=20
Epoch 0:  87%|▊| 669/766 [02:44<00:23,  4.07it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.787355422973633, Poisson: -0.1049061119556427
Epoch 0:  87%|▊| 670/766 [02:44<00:23,  4.07it/s, v_num=a0al, train_loss_step=22
Epoch 0:  87%|▊| 670/766 [02:44<00:23,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.766332626342773, Poisson: -0.10484164953231812

Epoch 0:  88%|▉| 671/766 [02:44<00:23,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  88%|▉| 671/766 [02:44<00:23,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.71084976196289, Poisson: -0.10478072613477707
Epoch 0:  88%|▉| 672/766 [02:44<00:23,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  88%|▉| 672/766 [02:44<00:23,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 24.059356689453125, Poisson: -0.11647970974445343

Epoch 0:  88%|▉| 673/766 [02:45<00:22,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  88%|▉| 673/766 [02:45<00:22,  4.07it/s, v_num=a0al, train_loss_step=23
Multinomial: 21.204574584960938, Poisson: -0.10201311111450195

Epoch 0:  88%|▉| 674/766 [02:45<00:22,  4.08it/s, v_num=a0al, train_loss_step=23
Epoch 0:  88%|▉| 674/766 [02:45<00:22,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 24.584774017333984, Poisson: -0.1193760558962822
Epoch 0:  88%|▉| 675/766 [02:45<00:22,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  88%|▉| 675/766 [02:45<00:22,  4.07it/s, v_num=a0al, train_loss_step=24
Multinomial: 20.663442611694336, Poisson: -0.09896506369113922

Epoch 0:  88%|▉| 676/766 [02:45<00:22,  4.08it/s, v_num=a0al, train_loss_step=24
Epoch 0:  88%|▉| 676/766 [02:45<00:22,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.19459342956543, Poisson: -0.1019076555967331
Epoch 0:  88%|▉| 677/766 [02:46<00:21,  4.08it/s, v_num=a0al, train_loss_step=20
Epoch 0:  88%|▉| 677/766 [02:46<00:21,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 17.178815841674805, Poisson: -0.08161477744579315

Epoch 0:  89%|▉| 678/766 [02:46<00:21,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  89%|▉| 678/766 [02:46<00:21,  4.07it/s, v_num=a0al, train_loss_step=17
Multinomial: 21.783733367919922, Poisson: -0.10473403334617615

Epoch 0:  89%|▉| 679/766 [02:46<00:21,  4.08it/s, v_num=a0al, train_loss_step=17
Epoch 0:  89%|▉| 679/766 [02:46<00:21,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 23.467281341552734, Poisson: -0.11346116662025452
Epoch 0:  89%|▉| 680/766 [02:46<00:21,  4.07it/s, v_num=a0al, train_loss_step=21
Epoch 0:  89%|▉| 680/766 [02:46<00:21,  4.07it/s, v_num=a0al, train_loss_step=23
Multinomial: 20.04427146911621, Poisson: -0.09604737907648087

Epoch 0:  89%|▉| 681/766 [02:47<00:20,  4.08it/s, v_num=a0al, train_loss_step=23
Epoch 0:  89%|▉| 681/766 [02:47<00:20,  4.07it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.142990112304688, Poisson: -0.101886086165905
Epoch 0:  89%|▉| 682/766 [02:47<00:20,  4.08it/s, v_num=a0al, train_loss_step=19
Epoch 0:  89%|▉| 682/766 [02:47<00:20,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.709651947021484, Poisson: -0.1048181876540184

Epoch 0:  89%|▉| 683/766 [02:47<00:20,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  89%|▉| 683/766 [02:47<00:20,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.052165985107422, Poisson: -0.09610036015510559

Epoch 0:  89%|▉| 684/766 [02:47<00:20,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  89%|▉| 684/766 [02:47<00:20,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 15.995346069335938, Poisson: -0.07582859694957733
Epoch 0:  89%|▉| 685/766 [02:48<00:19,  4.07it/s, v_num=a0al, train_loss_step=20
Epoch 0:  89%|▉| 685/766 [02:48<00:19,  4.07it/s, v_num=a0al, train_loss_step=15
Multinomial: 18.30655288696289, Poisson: -0.08745232969522476

Epoch 0:  90%|▉| 686/766 [02:48<00:19,  4.08it/s, v_num=a0al, train_loss_step=15
Epoch 0:  90%|▉| 686/766 [02:48<00:19,  4.07it/s, v_num=a0al, train_loss_step=18
Multinomial: 20.636791229248047, Poisson: -0.09912166744470596
Epoch 0:  90%|▉| 687/766 [02:48<00:19,  4.08it/s, v_num=a0al, train_loss_step=18
Epoch 0:  90%|▉| 687/766 [02:48<00:19,  4.07it/s, v_num=a0al, train_loss_step=20
Multinomial: 21.205366134643555, Poisson: -0.10186842828989029

Epoch 0:  90%|▉| 688/766 [02:48<00:19,  4.08it/s, v_num=a0al, train_loss_step=20
Epoch 0:  90%|▉| 688/766 [02:48<00:19,  4.07it/s, v_num=a0al, train_loss_step=21
Multinomial: 22.31546401977539, Poisson: -0.10762669146060944

Epoch 0:  90%|▉| 689/766 [02:48<00:18,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  90%|▉| 689/766 [02:49<00:18,  4.07it/s, v_num=a0al, train_loss_step=22
Multinomial: 22.30607795715332, Poisson: -0.10776594281196594
Epoch 0:  90%|▉| 690/766 [02:49<00:18,  4.08it/s, v_num=a0al, train_loss_step=22
Epoch 0:  90%|▉| 690/766 [02:49<00:18,  4.07it/s, v_num=a0al, train_loss_step=22
Multinomial: 17.707111358642578, Poisson: -0.08455497026443481
Epoch 0:  90%|▉| 691/766 [02:49<00:18,  4.08it/s, v_num=a0al, train_loss_step=22
Epoch 0:  90%|▉| 691/766 [02:49<00:18,  4.08it/s, v_num=a0al, train_loss_step=17
Multinomial: 20.056251525878906, Poisson: -0.09613558650016785
Epoch 0:  90%|▉| 692/766 [02:49<00:18,  4.08it/s, v_num=a0al, train_loss_step=17
Epoch 0:  90%|▉| 692/766 [02:49<00:18,  4.08it/s, v_num=a0al, train_loss_step=20
Multinomial: 17.71844482421875, Poisson: -0.0844632163643837

Epoch 0:  90%|▉| 693/766 [02:49<00:17,  4.08it/s, v_num=a0al, train_loss_step=20
Epoch 0:  90%|▉| 693/766 [02:50<00:17,  4.08it/s, v_num=a0al, train_loss_step=17
Multinomial: 20.011394500732422, Poisson: -0.09618280827999115

Epoch 0:  91%|▉| 694/766 [02:50<00:17,  4.08it/s, v_num=a0al, train_loss_step=17
Epoch 0:  91%|▉| 694/766 [02:50<00:17,  4.08it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.220170974731445, Poisson: -0.1019996628165245
Epoch 0:  91%|▉| 695/766 [02:50<00:17,  4.08it/s, v_num=a0al, train_loss_step=19
Epoch 0:  91%|▉| 695/766 [02:50<00:17,  4.08it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.046058654785156, Poisson: -0.09611000120639801
Epoch 0:  91%|▉| 696/766 [02:50<00:17,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  91%|▉| 696/766 [02:50<00:17,  4.08it/s, v_num=a0al, train_loss_step=19
Multinomial: 17.776384353637695, Poisson: -0.08445380628108978

Epoch 0:  91%|▉| 697/766 [02:50<00:16,  4.08it/s, v_num=a0al, train_loss_step=19
Epoch 0:  91%|▉| 697/766 [02:51<00:16,  4.08it/s, v_num=a0al, train_loss_step=17
Multinomial: 20.02248191833496, Poisson: -0.0961388349533081

Epoch 0:  91%|▉| 698/766 [02:51<00:16,  4.08it/s, v_num=a0al, train_loss_step=17
Epoch 0:  91%|▉| 698/766 [02:51<00:16,  4.08it/s, v_num=a0al, train_loss_step=19
Multinomial: 21.788034439086914, Poisson: -0.10468914359807968

Epoch 0:  91%|▉| 699/766 [02:51<00:16,  4.08it/s, v_num=a0al, train_loss_step=19
Epoch 0:  91%|▉| 699/766 [02:51<00:16,  4.08it/s, v_num=a0al, train_loss_step=21
Multinomial: 20.033443450927734, Poisson: -0.09614822268486023
Epoch 0:  91%|▉| 700/766 [02:51<00:16,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  91%|▉| 700/766 [02:51<00:16,  4.08it/s, v_num=a0al, train_loss_step=19
Multinomial: 19.43999671936035, Poisson: -0.0931275263428688
Epoch 0:  92%|▉| 701/766 [02:51<00:15,  4.08it/s, v_num=a0al, train_loss_step=19
Epoch 0:  92%|▉| 701/766 [02:51<00:15,  4.08it/s, v_num=a0al, train_loss_step=19
Multinomial: 20.02468490600586, Poisson: -0.09612933546304703

Epoch 0:  92%|▉| 702/766 [02:52<00:15,  4.08it/s, v_num=a0al, train_loss_step=19
Epoch 0:  92%|▉| 702/766 [02:52<00:15,  4.08it/s, v_num=a0al, train_loss_step=19
Multinomial: 17.720537185668945, Poisson: -0.08454470336437225

Epoch 0:  92%|▉| 703/766 [02:52<00:15,  4.08it/s, v_num=a0al, train_loss_step=19
Epoch 0:  92%|▉| 703/766 [02:52<00:15,  4.08it/s, v_num=a0al, train_loss_step=17
Multinomial: 22.856117248535156, Poisson: -0.11041641235351562

Epoch 0:  92%|▉| 704/766 [02:52<00:15,  4.08it/s, v_num=a0al, train_loss_step=17
Epoch 0:  92%|▉| 704/766 [02:52<00:15,  4.08it/s, v_num=a0al, train_loss_step=22
Multinomial: 21.749130249023438, Poisson: -0.10481631755828857
Epoch 0:  92%|▉| 705/766 [02:52<00:14,  4.08it/s, v_num=a0al, train_loss_step=22
Epoch 0:  92%|▉| 705/766 [02:52<00:14,  4.08it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.150577545166016, Poisson: -0.10191968083381653
Epoch 0:  92%|▉| 706/766 [02:53<00:14,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  92%|▉| 706/766 [02:53<00:14,  4.08it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.73512077331543, Poisson: -0.10484417527914047

Epoch 0:  92%|▉| 707/766 [02:53<00:14,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  92%|▉| 707/766 [02:53<00:14,  4.08it/s, v_num=a0al, train_loss_step=21
Multinomial: 17.735193252563477, Poisson: -0.08444802463054657

Epoch 0:  92%|▉| 708/766 [02:53<00:14,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  92%|▉| 708/766 [02:53<00:14,  4.08it/s, v_num=a0al, train_loss_step=17
Multinomial: 21.171815872192383, Poisson: -0.10175672918558121

Epoch 0:  93%|▉| 709/766 [02:53<00:13,  4.08it/s, v_num=a0al, train_loss_step=17
Epoch 0:  93%|▉| 709/766 [02:53<00:13,  4.08it/s, v_num=a0al, train_loss_step=21
Multinomial: 21.140235900878906, Poisson: -0.10202343016862869
Epoch 0:  93%|▉| 710/766 [02:54<00:13,  4.08it/s, v_num=a0al, train_loss_step=21
Epoch 0:  93%|▉| 710/766 [02:54<00:13,  4.08it/s, v_num=a0al, train_loss_step=21
Multinomial: 18.266754150390625, Poisson: -0.08740904927253723
...
Epoch 0: 100%|█| 766/766 [03:07<00:00,  4.08it/s, v_num=a0al, train_loss_step=19
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                           | 0/71 [00:00<?, ?it/s]
Multinomial: 17.715787887573242, Poisson: -0.08436138182878494
...
Validation DataLoader 0: 100%|██████████████████| 71/71 [00:06<00:00, 11.45it/s]
Epoch 0: 100%|█| 766/766 [03:14<00:00,  3.93it/s, v_num=a0al, train_loss_step=19
`Trainer.fit` stopped: `max_epochs=1` reached.

Epoch 0: 100%|█| 766/766 [03:20<00:00,  3.83it/s, v_num=a0al, train_loss_step=19
wandb: 
wandb: 🚀 View run finetune_test_0 at: 

# Uncomment if necessary
# import wandb
# wandb.login(host="https://genentech.wandb.io", anonymous="never", relogin=True)

8. Make and evaluate predictions using trained models

Using the training command above, we trained a single model. In practice, you would train multiple replicates; for this example, we simply pass the same checkpoint twice to stand in for two replicates. We can now use these checkpoints to predict gene expression:

checkpoint = glob.glob(os.path.join(outdir, "lightning_logs/*/checkpoints/*.ckpt"))[0]
print(checkpoint)
./example/lightning_logs/g20ya0al/checkpoints/epoch=0-step=154.ckpt
# comma-separated list of model checkpoints
checkpoint_list = ",".join([checkpoint, checkpoint])
checkpoint_list
'./example/lightning_logs/g20ya0al/checkpoints/epoch=0-step=154.ckpt,./example/lightning_logs/g20ya0al/checkpoints/epoch=0-step=154.ckpt'
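When you have trained several replicates in separate runs, each run gets its own subdirectory under `lightning_logs`. A small helper along these lines (hypothetical, not part of the Decima CLI) can gather one checkpoint per run into the comma-separated string that `--model` expects:

```python
import glob
import os


def collect_checkpoints(outdir):
    """Gather every checkpoint under lightning_logs and join them into the
    comma-separated string expected by `decima predict-genes --model`.

    Note: this is an illustrative helper, not a Decima function. It assumes
    the default Lightning layout of lightning_logs/<run>/checkpoints/*.ckpt.
    """
    pattern = os.path.join(outdir, "lightning_logs/*/checkpoints/*.ckpt")
    ckpts = sorted(glob.glob(pattern))  # sort for a deterministic order
    if not ckpts:
        raise FileNotFoundError(f"No checkpoints found under {outdir}")
    return ",".join(ckpts)
```

With two real training runs, this would produce a list of two distinct checkpoints instead of duplicating one.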
! CUDA_VISIBLE_DEVICES=0 decima predict-genes \
--output example/test_preds.h5ad \
--model {checkpoint_list} \
--metadata {ad_file_path} \
--device 0 \
--batch-size 8 \
--num-workers 32 \
--max_seq_shift 0 \
--genome hg38 \
--save-replicates
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'repr' attribute with value False was provided to the `Field()` function, which has no effect in the context it was used. 'repr' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
  warnings.warn(
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py:2249: UnsupportedFieldAttributeWarning: The 'frozen' attribute with value True was provided to the `Field()` function, which has no effect in the context it was used. 'frozen' is field-specific metadata, and can only be attached to a model field using `Annotated` metadata or by assignment. This may have happened because an `Annotated` type alias using the `type` statement was used, or if the `Field()` function was attached to a single member of a union type.
  warnings.warn(
decima - INFO - Using device: 0 and genome: hg38 for prediction.
decima - INFO - Loading model ['./example/lightning_logs/g20ya0al/checkpoints/epoch=0-step=154.ckpt', './example/lightning_logs/g20ya0al/checkpoints/epoch=0-step=154.ckpt']...
decima - INFO - Making predictions
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/torch/__init__.py:1617: UserWarning: Please use the new API settings to control TF32 behavior, such as torch.backends.cudnn.conv.fp32_precision = 'tf32' or torch.backends.cuda.matmul.fp32_precision = 'ieee'. Old settings, e.g, torch.backends.cuda.matmul.allow_tf32 = True, torch.backends.cudnn.allow_tf32 = True, allowTF32CuDNN() and allowTF32CuBLAS() will be deprecated after Pytorch 2.9. Please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:80.)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/torch/utils/data/dataloader.py:627: UserWarning: This DataLoader will create 32 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
SLURM auto-requeueing enabled. Setting signal handlers.
Predicting: |                                             | 0/? [00:00<?, ?it/s]
Predicting DataLoader 0:   0%|                          | 0/115 [00:00<?, ?it/s]
Predicting DataLoader 0:   1%|▏                 | 1/115 [00:03<06:53,  0.28it/s]
...
Predicting DataLoader 0:  89%|██████████████▏ | 102/115 [02:55<00:22,  0.58it/s]
Predicting DataLoader 0:  90%|██████████████▎ | 103/115 [02:57<00:20,  0.58it/s]
Predicting DataLoader 0:  90%|██████████████▍ | 104/115 [02:59<00:18,  0.58it/s]
Predicting DataLoader 0:  91%|██████████████▌ | 105/115 [03:00<00:17,  0.58it/s]
Predicting DataLoader 0:  92%|██████████████▋ | 106/115 [03:02<00:15,  0.58it/s]
Predicting DataLoader 0:  93%|██████████████▉ | 107/115 [03:04<00:13,  0.58it/s]
Predicting DataLoader 0:  94%|███████████████ | 108/115 [03:06<00:12,  0.58it/s]
Predicting DataLoader 0:  95%|███████████████▏| 109/115 [03:07<00:10,  0.58it/s]
Predicting DataLoader 0:  96%|███████████████▎| 110/115 [03:09<00:08,  0.58it/s]
Predicting DataLoader 0:  97%|███████████████▍| 111/115 [03:11<00:06,  0.58it/s]
Predicting DataLoader 0:  97%|███████████████▌| 112/115 [03:12<00:05,  0.58it/s]
Predicting DataLoader 0:  98%|███████████████▋| 113/115 [03:14<00:03,  0.58it/s]
Predicting DataLoader 0:  99%|███████████████▊| 114/115 [03:16<00:01,  0.58it/s]
Predicting DataLoader 0: 100%|████████████████| 115/115 [03:18<00:00,  0.58it/s]
Predicting DataLoader 0: 100%|████████████████| 115/115 [03:18<00:00,  0.58it/s]
/home/celikm5/miniforge3/envs/decima2/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: The ``compute`` method of metric WarningCounter was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.
decima - INFO - Creating anndata
decima - INFO - Evaluating performance
Performance on genes in the train dataset.
Mean Pearson Correlation per gene: Mean: 0.01.
Mean Pearson Correlation per gene using size factor (baseline): 0.03.
Mean Pearson Correlation per pseudobulk:  0.00

Performance on genes in the val dataset.
Mean Pearson Correlation per gene: Mean: -0.01.
Mean Pearson Correlation per gene using size factor (baseline): 0.06.
Mean Pearson Correlation per pseudobulk: -0.01

Performance on genes in the test dataset.
Mean Pearson Correlation per gene: Mean: -0.02.
Mean Pearson Correlation per gene using size factor (baseline): -0.00.
Mean Pearson Correlation per pseudobulk: -0.02

We can open the output h5ad file to see the individual predictions and metrics.

ad_out = anndata.read_h5ad("example/test_preds.h5ad")
ad_out
AnnData object with n_obs × n_vars = 50 × 920
    obs: 'cell_type', 'tissue', 'disease', 'study', 'size_factor', 'train_pearson', 'val_pearson', 'test_pearson'
    var: 'chrom', 'start', 'end', 'strand', 'gene_start', 'gene_end', 'gene_length', 'gene_mask_start', 'gene_mask_end', 'dataset', 'pearson', 'size_factor_pearson'
    layers: 'preds', 'preds_finetune_test_0'

.layers['preds_finetune_test_0'] contains the predictions made by the individual fine-tuned model, whereas .layers['preds'] contains the predictions averaged across all models (here identical, since we fine-tuned a single model). You will see that performance metrics have been added to both .obs and .var.

ad_out.obs.head()
cell_type tissue disease study size_factor train_pearson val_pearson test_pearson
pseudobulk_0 ct_0 t_0 d_0 st_0 4946.397461 0.010020 0.171944 0.122095
pseudobulk_1 ct_0 t_0 d_1 st_0 4858.091797 -0.024151 0.061900 -0.169406
pseudobulk_2 ct_0 t_0 d_2 st_1 4921.185547 0.007005 -0.079252 -0.094602
pseudobulk_3 ct_0 t_0 d_0 st_1 4928.486816 0.016869 -0.023038 0.007967
pseudobulk_4 ct_0 t_0 d_1 st_2 4756.819336 0.050297 0.160398 -0.101163
ad_out.var.head()
chrom start end strand gene_start gene_end gene_length gene_mask_start gene_mask_end dataset pearson size_factor_pearson
gene_0 chr1 26191000 26715288 + 26354840 26879128 524288 163840 524288 train 0.177304 -0.062494
gene_1 chr19 41275257 41799545 - 41111417 41635705 524288 163840 524288 train 0.049450 -0.037428
gene_2 chr1 79937866 80462154 - 79774026 80298314 524288 163840 524288 train -0.095439 0.240203
gene_4 chr16 3905208 4429496 - 3741368 4265656 524288 163840 524288 train -0.092946 -0.042283
gene_5 chr10 22495641 23019929 + 22659481 23183769 524288 163840 524288 train -0.310151 -0.069181
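Since .var carries both the per-gene metrics and the train/val/test split, you can summarize model performance against the size-factor baseline per split with a simple groupby. A sketch, with synthetic values standing in for ad_out.var:

```python
import pandas as pd

# Synthetic stand-in for ad_out.var, keeping only the columns used here
var = pd.DataFrame({
    "dataset": ["train", "train", "val", "test", "test"],
    "pearson": [0.18, 0.05, -0.10, 0.12, -0.09],
    "size_factor_pearson": [-0.06, -0.04, 0.24, -0.04, -0.07],
})

# Mean per-gene correlation for the model vs the size-factor baseline, per split
summary = var.groupby("dataset")[["pearson", "size_factor_pearson"]].mean()
print(summary)
```

On the real object, var = ad_out.var gives the numbers reported in the evaluation log above.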