Genome Minimizer 2

VAE-powered pipeline for generating minimal E. coli genomes. Models are trained on a binary gene presence/absence matrix of ~10,000 E. coli strains across ~55,000 genes.
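
To make the input format concrete, here is a toy presence/absence matrix in plain Python. The orientation (one row per strain, one column per gene) and the gene names are assumptions for illustration only; the real matrix has roughly 10,000 rows and 55,039 columns.

```python
# Toy presence/absence matrix: 1 = gene present in that strain, 0 = absent.
# Assumed orientation: one row per strain, one column per gene.
genes = ["geneA", "geneB", "geneC", "geneD"]  # hypothetical gene names
matrix = [
    [1, 1, 0, 1],  # strain 0
    [1, 0, 0, 1],  # strain 1
    [1, 1, 1, 0],  # strain 2
]

# Genome size per strain = number of genes present (row sum).
genome_sizes = [sum(row) for row in matrix]

# Gene frequency = fraction of strains carrying each gene (column mean).
n_strains = len(matrix)
gene_frequency = {
    gene: sum(row[j] for row in matrix) / n_strains
    for j, gene in enumerate(genes)
}

print(genome_sizes)
print(gene_frequency)
```

Each row is what a single training example would look like under this assumption: a fixed-length 0/1 vector with one entry per gene.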

Model Variants

Each preset is stored on its own branch:

| Branch | Architecture | Loss Functions | Description |
|--------|--------------|----------------|-------------|
| v0 | 55,039 → 1024 → 64 | Recon + KL (linear) | Baseline VAE |
| v1 | 55,039 → 512 → 32 | Recon + KL (linear) + Abundance + L1 | + gene frequency control |
| v2 | 55,039 → 512 → 32 | Recon + KL (cosine) + Abundance + L1 | Improved convergence |
| v3 | 55,039 → 512 → 32 | Recon + KL (cosine) + Weighted Abundance + L1 | Best minimal genomes |
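
The "linear" and "cosine" labels on the KL term presumably refer to how the KL weight is annealed during training. This card does not give the exact schedule, so the sketch below only shows one common formulation: a warm-up from 0 to a maximum weight. The function names, `warmup_steps`, and `beta_max` are hypothetical and not taken from the repository.

```python
import math

def kl_weight_linear(step, warmup_steps, beta_max=1.0):
    """Ramp the KL weight linearly from 0 to beta_max over warmup_steps."""
    progress = min(step / warmup_steps, 1.0)
    return beta_max * progress

def kl_weight_cosine(step, warmup_steps, beta_max=1.0):
    """Ramp the KL weight from 0 to beta_max along a half-cosine curve."""
    progress = min(step / warmup_steps, 1.0)
    return beta_max * 0.5 * (1.0 - math.cos(math.pi * progress))

# Compare the two ramps at a few points of a 1000-step warm-up.
for step in (0, 250, 500, 750, 1000):
    linear = kl_weight_linear(step, 1000)
    cosine = kl_weight_cosine(step, 1000)
    print(step, round(linear, 3), round(cosine, 3))
```

Both schedules reach `beta_max` at the end of the warm-up and stay there; they differ only in the shape of the ramp, with the cosine version changing more slowly near the start and the end.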

Quick Start

from huggingface_hub import hf_hub_download
import torch
from src.genome_minimizer_2.training.model import VAE

# Download v3 (best for minimal genomes)
path = hf_hub_download("McClain/genome-minimizer-2", "final.pt", revision="v3")
checkpoint = torch.load(path, map_location="cpu")

model = VAE(input_dim=55039, hidden_dim=512, latent_dim=32)
model.load_state_dict(checkpoint["model_state_dict"])
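
The snippet above hard-codes the v3 layer sizes, but v0 uses a different hidden and latent dimension. As a small convenience sketch, the lookup below copies the sizes from the Model Variants table so the constructor arguments can follow the chosen branch. `PRESETS` and `vae_kwargs` are hypothetical helper names, not part of the repository, and reading each architecture as input → hidden → latent is an assumption based on the v3 example.

```python
# Layer sizes per branch, copied from the Model Variants table.
# PRESETS and vae_kwargs are hypothetical helpers, not part of the repo.
PRESETS = {
    "v0": {"input_dim": 55039, "hidden_dim": 1024, "latent_dim": 64},
    "v1": {"input_dim": 55039, "hidden_dim": 512, "latent_dim": 32},
    "v2": {"input_dim": 55039, "hidden_dim": 512, "latent_dim": 32},
    "v3": {"input_dim": 55039, "hidden_dim": 512, "latent_dim": 32},
}

def vae_kwargs(branch):
    """Return the VAE constructor arguments for a given branch name."""
    if branch not in PRESETS:
        raise ValueError(
            f"unknown branch {branch!r}; expected one of {sorted(PRESETS)}"
        )
    return dict(PRESETS[branch])  # copy so callers cannot mutate PRESETS

print(vae_kwargs("v0"))
```

With this in place, `VAE(**vae_kwargs("v0"))` combined with `revision="v0"` in `hf_hub_download` would keep the model shape and the checkpoint in sync, assuming every branch ships a `final.pt` with the same checkpoint layout as v3.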
