Feature matrix

Exhaustive list of everything gapsmith implements, one row per feature, with pointers to both the upstream R source and the Rust module.

Status legend:

  • ✅ - implemented and tested against real gapseq where feasible
  • 🆕 - Rust-only feature (no upstream equivalent)
  • ⚠️ - shipped but intentionally deviates from upstream (see porting-notes.md)
  • ❌ - deferred / intentionally not ported

1. Subcommands

| Subcommand | R source | Rust module | Status |
| --- | --- | --- | --- |
| gapsmith test | - | gapsmith-cli/src/commands/test.rs | ✅ |
| gapsmith find | src/gapseq_find.sh + src/*.R | gapsmith-find/ + gapsmith-cli/src/commands/find.rs | ✅ byte-identical on PWY-6587 & amino |
| gapsmith find-transport | src/transporter.sh + src/analyse_alignments_transport.R | gapsmith-transport/ + gapsmith-cli/src/commands/find_transport.rs | ✅ TC-set + row-count identical |
| gapsmith draft | src/generate_GSdraft.R + src/prepare_candidate_reaction_tables.R | gapsmith-draft/ + gapsmith-cli/src/commands/draft.rs | ✅ SBML validates (0 libSBML errors) |
| gapsmith medium | src/predict_medium.R | gapsmith-medium/ + gapsmith-cli/src/commands/medium.rs | ✅ byte-identical on ecoli |
| gapsmith fill | src/gf.suite.R + src/gapfill4.R | gapsmith-fill/ + gapsmith-cli/src/commands/fill.rs | ✅ 4-phase suite + KO loop |
| gapsmith adapt | src/adapt.R + src/gf.adapt.R | gapsmith-cli/src/commands/adapt.rs | ⚠️ EC / KEGG / name resolution deferred |
| gapsmith pan | src/pan-draft.R | gapsmith-cli/src/commands/pan.rs | ✅ union + binary table |
| gapsmith doall | src/doall.sh | gapsmith-cli/src/commands/doall.rs | ✅ end-to-end on ecore in 2m47s |
| gapsmith update-sequences | src/update_sequences.sh | gapsmith-cli/src/commands/update_sequences.rs | ✅ Zenodo sync + md5 diff |
| gapsmith convert | - | gapsmith-cli/src/commands/convert.rs | 🆕 CBOR ↔ JSON round-trip |
| gapsmith example-model | - | gapsmith-cli/src/commands/example_model.rs | 🆕 toy model fixture |
| gapsmith db inspect | - | gapsmith-cli/src/commands/db.rs | 🆕 reference-data row-count dump |
| gapsmith export-sbml | cobrar::writeSBMLmod | gapsmith-cli/src/commands/export_sbml.rs | 🆕 CBOR → SBML |
| gapsmith align | - | gapsmith-cli/src/commands/align.rs | 🆕 debug-wrap for a single aligner |
| gapsmith batch-align | - | gapsmith-cli/src/commands/batch_align.rs | 🆕 cluster N genomes + single alignment |
| gapsmith doall-batch | - | gapsmith-cli/src/commands/doall_batch.rs | 🆕 rayon + SLURM-shard parallel doall across N genomes |
| gapsmith community per-mag | - | gapsmith-cli/src/commands/community.rs | 🆕 per-MAG FBA under a shared (union) medium |
| gapsmith community cfba | - | gapsmith-cli/src/commands/community.rs | 🆕 compose N drafts, weighted-sum biomass objective |
| gapsmith fba | - | gapsmith-cli/src/commands/fba.rs | 🆕 FBA / pFBA standalone |

2. Core algorithms

2.1 Alignment layer (gapsmith-align)

| Feature | R source | Rust module |
| --- | --- | --- |
| BLASTp wrapper | gapseq_find.sh blastp block | blast.rs::BlastpAligner |
| tBLASTn wrapper | same, for -n nucl | blast.rs::TblastnAligner |
| DIAMOND wrapper | gapseq_find.sh diamond block | diamond.rs::DiamondAligner |
| mmseqs2 wrapper (full pipeline) | gapseq_find.sh mmseqs block | mmseqs2.rs::Mmseqs2Aligner |
| Precomputed TSV input | - | precomputed.rs::PrecomputedTsvAligner 🆕 |
| Batch-cluster (N genomes → 1 alignment) | - | batch.rs::BatchClusterAligner 🆕 |
| gspa-run manifest reader (cluster-aware hit expansion) | - | gspa.rs::{GspaManifest, GspaRunAligner} 🆕 |
| 2-decimal scientific e-value format | BLAST -outfmt 6 native | tsv.rs |
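
The e-value row is easiest to picture with an example. Rust's `{:.2e}` prints `1.50e-5`, whereas C-style `%.2e` output pads the exponent to two digits and always carries a sign (`1.50e-05`). The following is a minimal sketch of how that gap could be closed. The function name `format_evalue` is hypothetical and this is not necessarily what `tsv.rs` does. BLAST may also switch to fixed-point notation for larger e-values, which this sketch does not attempt.

```rust
/// Format an e-value in C-style `%.2e` notation, e.g. `1.50e-05`.
/// Hypothetical helper for illustration; not taken from gapsmith.
pub fn format_evalue(value: f64) -> String {
    // Rust yields e.g. "1.50e-5": two decimals, unpadded exponent,
    // and no '+' sign on non-negative exponents.
    let raw = format!("{:.2e}", value);
    let (mantissa, exponent) = match raw.split_once('e') {
        Some(parts) => parts,
        None => return raw, // "NaN" and "inf" carry no exponent
    };
    let exp: i32 = exponent
        .parse()
        .expect("LowerExp output has an integer exponent");
    let sign = if exp < 0 { '-' } else { '+' };
    // Pad the exponent magnitude to at least two digits.
    format!("{mantissa}e{sign}{:02}", exp.abs())
}
```

Rewriting the exponent as a string after formatting keeps the mantissa rounding exactly as Rust's formatter produces it.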

2.2 find pipeline (gapsmith-find)

| Feature | R source | Rust module |
| --- | --- | --- |
| Pathway table loader (meta / kegg / seed / custom) | gapseq_find.sh:520-532 | gapsmith-db::PathwayTable |
| metacyc + custom merge (custom-wins-on-id) | same | same |
| Keyword-shorthand resolution (amino, carbo, ...) | gapseq_find.sh:40-60 | pathways.rs::MatchMode::Hierarchy |
| Reference FASTA resolver (user/ → rxn/ → rev/EC → unrev/EC → md5) | prepare_batch_alignments.R:150-234 | seqfile.rs |
| Complex-subunit detection | complex_detection.R | complex.rs (R-parity on 9 cases) |
| Hit classification with exception table | analyse_alignments.R:108-189 | classify.rs |
| Pathway completeness scoring (f64 precision) | filter_pathways.R:10-34 | pathways.rs::score |
| dbhit lookup (EC + altEC + MetaCyc id + enzyme name) | getDBhit.R:60-130 | dbhit.rs |
| noSuperpathways=true default | gapseq_find.sh:20 | find::FindOptions |
| Word-boundary-less header filter (matches shell grep -Fivf) | gapseq_find.sh | seqfile.rs ⚠️ intentional |
| Output writers (Reactions.tbl, Pathways.tbl) | same | output.rs |
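
To illustrate the header-filter row: `grep -F -i -v -f patterns.txt` drops every line that contains any pattern as a plain, case-insensitive substring, so a pattern can match in the middle of a word. The sketch below mimics that behaviour. The function name, signature and example patterns are assumptions for illustration, not the `seqfile.rs` API. Empty patterns are skipped here, whereas an empty pattern line in grep matches every line.

```rust
/// Keep only the headers that contain none of `patterns`, compared
/// case-insensitively as plain substrings (no regex, no word
/// boundaries). Hypothetical helper, not gapsmith's actual code.
pub fn filter_headers<'a>(headers: &[&'a str], patterns: &[&str]) -> Vec<&'a str> {
    // Lower-case each pattern once. Empty patterns are ignored,
    // which is a simplification made by this sketch.
    let needles: Vec<String> = patterns
        .iter()
        .filter(|p| !p.is_empty())
        .map(|p| p.to_lowercase())
        .collect();

    headers
        .iter()
        .copied()
        .filter(|header| {
            let hay = header.to_lowercase();
            // Drop the header as soon as any pattern occurs in it.
            !needles.iter().any(|n| hay.contains(n.as_str()))
        })
        .collect()
}
```

With the pattern `kinase`, both `>A Hexokinase` and `>C KINASE domain` are removed. A word-boundary-aware filter would keep the first of these.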

2.3 find-transport pipeline (gapsmith-transport)

| Feature | R source | Rust module |
| --- | --- | --- |
| subex.tbl substrate filter | transporter.sh:140-280 | filter.rs |
| TC-id parsing + type canonicalisation | analyse_alignments_transport.R:1-188 | tc.rs |
| Substrate resolution (tcdb_all + FASTA header fallback) | same | runner.rs |
| Alt-transporter reaction assignment (gated by --nouse-alternatives) | analyse_alignments_transport.R:110-130 | runner.rs |
| Substrate-case preservation (gapseq emits sub=Potassium) | shell behaviour | data.rs |

2.4 Draft model builder (gapsmith-draft)

| Feature | R source | Rust module |
| --- | --- | --- |
| Candidate selection (bitscore ≥ cutoff OR pathway support) | prepare_candidate_reaction_tables.R + generate_GSdraft.R:55-100 | candidate.rs |
| Stoichiometric hash dedup | generate_rxn_stoich_hash.R | stoich_hash.rs |
| Best-status-across-rows (OR is_complex, max complex_status, highest-rank pathway_status) | implicit in R's data.table merges | candidate.rs::build_candidates ⚠️ explicit |
| Biomass JSON parser (single + pipe-separated multi-link) | parse_BMjson.R:1-107 | biomass.rs + gapsmith-db::BiomassComponent::links |
| Biomass cofactor mass-rescaling | generate_GSdraft.R:281-292 | biomass.rs ⚠️ menaquinone-8 auto-removal deferred |
| GPR composition (and / or tree, "subunit undefined" edge cases) | get_gene_logic_string.R | gpr.rs |
| Diffusion + exchange expansion | add_missing_exRxns.R:1-156 | exchanges.rs |
| Conditional transporter additions (butyrate, IPA, PPA, phloretate) | generate_GSdraft.R | runner.rs::add_conditional_transporters |
| SBML ID sanitiser (-/./:/space → _) | - | builder.rs 🆕 |
| Cytosolic met-id format (cpd00001_c0 not cpd00001[c0]) | - | builder.rs ⚠️ for SBML SId compliance |
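
The last two rows describe simple string rewrites, sketched below under stated assumptions. The function names are hypothetical and the real `builder.rs` may handle more cases. In particular, an SBML SId must also start with a letter or underscore, which this sketch does not enforce.

```rust
/// Replace the characters named in the table (`-`, `.`, `:`, space)
/// with `_`. Hypothetical helper, not gapsmith's actual code.
pub fn sanitize_sbml_id(raw: &str) -> String {
    raw.chars()
        .map(|c| match c {
            '-' | '.' | ':' | ' ' => '_',
            other => other,
        })
        .collect()
}

/// Rewrite a bracketed metabolite id such as `cpd00001[c0]` to the
/// underscore form `cpd00001_c0`. Ids without a trailing `[...]`
/// pass through unchanged, so the function is idempotent.
pub fn met_id_to_sid(met_id: &str) -> String {
    match met_id.strip_suffix(']').and_then(|s| s.rsplit_once('[')) {
        Some((base, comp)) if !base.is_empty() && !comp.is_empty() => {
            format!("{base}_{comp}")
        }
        _ => met_id.to_string(),
    }
}
```

Applying `met_id_to_sid` to an id that already carries the `_c0` suffix returns it untouched.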

2.5 FBA / pFBA solver (gapsmith-fill)

| Feature | R source | Rust module |
| --- | --- | --- |
| Split-flux LP encoding (vp, vn ≥ 0) | implicit in cobrar::pfbaHeuristic | lp.rs::SplitFluxLp |
| FBA | cobrar::fba | fba.rs::fba |
| pFBA (single call) | cobrar::pfbaHeuristic | pfba.rs::pfba |
| pFBA-heuristic tolerance ladder (15 iters, 1e-6 → 1e-9, pFBA-coef relaxation) | gapfill4.R:95-137 | pfba.rs::pfba_heuristic |
| HiGHS solver | - (R uses glpk/cplex) | good_lp 1.15 + highs-sys |
| CBC fallback | - | pfba.rs::pfba_cbc (feature-gated cbc) 🆕 |
| Row-expression builder (O(nnz)) | implicit | fba.rs::build_row_exprs 🆕 performance |
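
For readers unfamiliar with the split-flux encoding: each reaction flux v with bounds [lb, ub] is written as v = vp - vn with vp, vn ≥ 0, so that the pFBA term |v| becomes the linear expression vp + vn. The sketch below shows the standard textbook way to derive bounds for the two halves. The type and function names are hypothetical and are not the `lp.rs::SplitFluxLp` API.

```rust
/// Bounds for the two non-negative halves of a split flux v = vp - vn.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub struct SplitBounds {
    pub vp: (f64, f64), // (lower, upper) for the forward part
    pub vn: (f64, f64), // (lower, upper) for the reverse part
}

/// Derive bounds for vp and vn from the original bounds lb <= v <= ub.
pub fn split_bounds(lb: f64, ub: f64) -> SplitBounds {
    SplitBounds {
        // The forward part covers the non-negative range of v.
        vp: (lb.max(0.0), ub.max(0.0)),
        // The reverse part covers the negated negative range of v.
        vn: ((-ub).max(0.0), (-lb).max(0.0)),
    }
}

/// Recover the net flux from its two halves.
pub fn net_flux(vp: f64, vn: f64) -> f64 {
    vp - vn
}

/// Linear pFBA cost: total flux as the sum of all forward and
/// reverse parts (equals the sum of |v| when at most one half of
/// each pair is non-zero).
pub fn pfba_cost(vp: &[f64], vn: &[f64]) -> f64 {
    vp.iter().sum::<f64>() + vn.iter().sum::<f64>()
}
```

A reversible reaction with bounds [-10, 1000] yields vp in [0, 1000] and vn in [0, 10]. A reaction forced to run in reverse, [-1000, -5], pins vp to zero and gives vn the range [5, 1000].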

2.6 Gap-filling (gapsmith-fill)

| Feature | R source | Rust module |
| --- | --- | --- |
| gapfill4 single-iteration driver | gapfill4.R:1-303 | gapfill.rs::gapfill4 |
| Candidate pool (draft + all approved SEED, stoich-hash deduped) | construct_full_model.R + gapfill4.R:12-56 | pool.rs::build_full_model |
| rxnWeights derivation from bitscores | prepare_candidate_reaction_tables.R:222-228 | pool.rs::rxn_weight |
| KO essentiality loop (serial, core-first, highest-weight-first) | gapfill4.R:247-280 | gapfill.rs::gapfill4 |
| Medium application (close all EX, open per-medium, add missing EX) | constrain.model.R | medium.rs::apply_medium |
| Environment overrides (env_highH2.tsv) | adjust_model_env.R | medium.rs::apply_environment_file |
| Step 1 (user medium + biomass target) | gf.suite.R:244-258 | suite.rs::run_suite |
| Step 2 (per-biomass-component on MM_glu + carbon sources) | gf.suite.R:285-372 | suite.rs::step2 |
| Step 2b (aerobic / anaerobic variant) | gf.suite.R:377-464 | suite.rs::run_suite |
| Step 3 (energy-source screen with ESP1-5) | gf.suite.R:480-581 | suite.rs::step3 |
| Step 4 (fermentation-product screen) | gf.suite.R:585-683 | suite.rs::step4 |
| Target-met sink as objective | add_met_sink in add_missing_exRxns.R:56-72 | suite.rs::add_target_sink_obj |
| Futile-cycle detector (parallel pairwise LP probe) | recent upstream cccbb6f0 | futile.rs::detect_futile_cycles (opt-in --prune-futile) |
| Community model composition (block-diagonal, shared _e0) | - | community.rs::compose_models 🆕 |
| Weighted-sum community biomass + optional balanced-growth | - | community.rs::add_community_biomass 🆕 |
| Union-medium + per-MAG weights (community per-mag mode) | - | community.rs::{union_medium, per_mag_weights, weighted_growth} 🆕 |
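
The three-step medium application named in the table (close all EX, open per-medium, add missing EX) can be sketched as follows. Everything here is an assumption made for illustration: the `Exchange` type, the function name, the map-based model, and the reading of "close" as setting the uptake (lower) bound to zero. The real `medium.rs::apply_medium` operates on gapsmith's own model type.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for an exchange reaction (hypothetical type).
/// A negative lower bound means uptake is allowed.
#[derive(Debug, Clone, PartialEq)]
pub struct Exchange {
    pub lb: f64,
    pub ub: f64,
}

/// Apply a medium to a set of exchanges keyed by compound id.
/// `medium` maps compound id to the maximum uptake flux.
pub fn apply_medium_sketch(
    exchanges: &mut BTreeMap<String, Exchange>,
    medium: &BTreeMap<String, f64>,
    default_ub: f64,
) {
    // Step 1: close uptake on every existing exchange.
    for ex in exchanges.values_mut() {
        ex.lb = 0.0;
    }
    // Steps 2 and 3: open uptake for each medium compound, creating
    // the exchange first when the model does not have one yet.
    for (cpd, max_uptake) in medium {
        let ex = exchanges
            .entry(cpd.clone())
            .or_insert(Exchange { lb: 0.0, ub: default_ub });
        ex.lb = -max_uptake.abs();
    }
}
```

Closing everything first means compounds absent from the medium end up with no uptake, regardless of the bounds the draft model carried before.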

2.7 Medium inference (gapsmith-medium)

| Feature | R source | Rust module |
| --- | --- | --- |
| Rules-table loader | predict_medium.R:46 | rules.rs::load_rules |
| Boolean-expression evaluator (\| & ! < > == <= >=) | eval(parse(text=)) | boolexpr.rs::eval |
| Counting-rule support (a + b + c < 3) | same (R int arithmetic) | boolexpr.rs::parse_sum |
| Cross-rule dedup + mean flux | predict_medium.R:84-86 | predict.rs::predict_medium |
| Saccharides / Organic acids category dedup | predict_medium.R:88-92 | predict.rs::predict_medium |
| Manual flux overrides | predict_medium.R:94-114 | predict.rs::parse_manual_flux |
| Proton balancer | predict_medium.R:121-132 | predict.rs::predict_medium |
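
A counting rule such as `a + b + c < 3` treats each term as 1 when true and 0 otherwise, then compares the sum with an integer. The sketch below evaluates that one rule shape. It is a deliberately small illustration with a hypothetical function name and input map. It does not cover the full boolean grammar (`|`, `&`, `!`) handled by `boolexpr.rs`.

```rust
use std::collections::HashMap;

/// Evaluate a counting rule of the form `a + b + c <op> N`.
/// Each name is looked up in `present` and counts as 1 if true.
/// Returns `None` for malformed rules or unknown names.
/// Hypothetical helper, not gapsmith's actual evaluator.
pub fn eval_counting_rule(rule: &str, present: &HashMap<&str, bool>) -> Option<bool> {
    // Two-character operators come first so "<=" is not read as "<".
    const OPS: [&str; 5] = ["<=", ">=", "==", "<", ">"];
    let (op, idx) = OPS
        .iter()
        .find_map(|op| rule.find(op).map(|i| (*op, i)))?;

    let lhs = &rule[..idx];
    let rhs: i64 = rule[idx + op.len()..].trim().parse().ok()?;

    // Sum the terms on the left-hand side.
    let mut count: i64 = 0;
    for name in lhs.split('+') {
        if *present.get(name.trim())? {
            count += 1;
        }
    }

    Some(match op {
        "<=" => count <= rhs,
        ">=" => count >= rhs,
        "==" => count == rhs,
        "<" => count < rhs,
        ">" => count > rhs,
        _ => return None,
    })
}
```

With `a` and `b` true and `c` false, the rule `a + b + c < 3` evaluates to true because the count is 2.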

2.8 Serialisation (gapsmith-sbml, gapsmith-io)

| Feature | R source | Rust module |
| --- | --- | --- |
| CBOR round-trip | - | gapsmith-io::{read,write}_model_cbor 🆕 |
| JSON round-trip | - | gapsmith-io::{read,write}_model_json 🆕 |
| SBML L3V1 + FBC2 + groups writer | cobrar::writeSBMLmod | gapsmith-sbml::write_sbml |
| SBML SId idempotent on mets with compartment suffix | - | writer.rs::species_id 🆕 bugfix |
| Streaming via quick-xml | - | writer.rs 🆕 no libSBML dep |
| SBML consistency validation | libSBML native | tools/validate_sbml.py (libSBML + COBRApy) |

2.9 Reference-data loaders (gapsmith-db)

| Feature | R source | Rust module |
| --- | --- | --- |
| seed_reactions_corrected.tsv | data.table::fread | seed.rs::load_seed_reactions |
| seed_metabolites_edited.tsv | same | seed.rs::load_seed_metabolites |
| MNXref cross-refs (mnxref_*.tsv) | same | mnxref.rs |
| meta_pwy.tbl / kegg_pwy.tbl / seed_pwy.tbl / custom_pwy.tbl | same | pathway.rs |
| subex.tbl | same | subex.rs |
| tcdb.tsv | same | tcdb.rs |
| exception.tbl | same | exception.rs |
| medium_prediction_rules.tsv | same | gapsmith-medium::rules |
| complex_subunit_dict.tsv | same | complex.rs |
| Biomass JSON (Gram+, Gram-, archaea, user custom) | parse_BMjson.R | biomass.rs |
| SEED stoichiometry parser (-1:cpd00001:0:0:"H2O";...) | parse_BMjson.R:21-29 | stoich_parse.rs |
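
The stoichiometry string in the last row is a `;`-separated list of terms, each with five `:`-separated fields. A minimal parsing sketch follows. The struct, its field names and the reading of the third and fourth fields as compartment and index numbers are assumptions for illustration, not the `stoich_parse.rs` API. Compound names that contain `;` or escaped quotes are not handled.

```rust
/// One term of a SEED stoichiometry string (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct StoichTerm {
    pub coefficient: f64,
    pub compound: String,
    pub compartment: u32,
    pub index: u32,
    pub name: String,
}

/// Parse a string such as `-1:cpd00001:0:0:"H2O";1:cpd00067:1:0:"H+"`.
pub fn parse_stoichiometry(s: &str) -> Result<Vec<StoichTerm>, String> {
    s.split(';')
        .map(str::trim)
        .filter(|term| !term.is_empty())
        .map(parse_term)
        .collect()
}

fn parse_term(term: &str) -> Result<StoichTerm, String> {
    // splitn(5) keeps any ':' inside the quoted name in the last field.
    let fields: Vec<&str> = term.splitn(5, ':').collect();
    if fields.len() != 5 {
        return Err(format!("expected 5 ':'-separated fields in {term:?}"));
    }
    let int = |i: usize| {
        fields[i]
            .parse::<u32>()
            .map_err(|e| format!("{term:?}: {e}"))
    };
    Ok(StoichTerm {
        coefficient: fields[0]
            .parse::<f64>()
            .map_err(|e| format!("{term:?}: {e}"))?,
        compound: fields[1].to_string(),
        compartment: int(2)?,
        index: int(3)?,
        // Strip the surrounding double quotes from the name.
        name: fields[4].trim_matches('"').to_string(),
    })
}
```

Collecting into `Result<Vec<_>, String>` stops at the first malformed term, so a partially parsed reaction is never returned.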

3. New Rust-only features

| Feature | Rust module | Motivation |
| --- | --- | --- |
| Precomputed alignment input (--aligner precomputed -P <tsv>) | gapsmith-align::PrecomputedTsvAligner | Skip per-genome BLAST when the user pre-runs diamond / mmseqs2 at batch scale |
| BatchClusterAligner (gapsmith batch-align) | gapsmith-align::BatchClusterAligner | Amortise alignment cost over N genomes via one mmseqs2 cluster + single alignment |
| gspa-run manifest reader (--gspa-run <dir>) | gapsmith-align::GspaRunAligner | Consume precomputed cluster-rep hits from the upstream gspa pipeline; fans rep hits onto per-genome members |
| gapsmith doall-batch | gapsmith-cli::commands::doall_batch | Rayon + SLURM-array-friendly driver for reconstructing 100 → 1 M genomes in one batch |
| gapsmith community per-mag | gapsmith-fill::community + CLI | Shared-medium per-MAG FBA for metagenomes with 50+ MAGs |
| gapsmith community cfba | gapsmith-fill::community + CLI | Full community LP (block-diagonal compose, weighted-sum biomass, optional balanced-growth) |
| In-process LP (HiGHS via good_lp) | gapsmith-fill | Replaces R cobrar's shelled-out glpk / cplex; faster warm-starts |
| Optional CBC fallback backend | gapsmith-fill::pfba_cbc (--features cbc) | When HiGHS exhausts the tolerance ladder on pathological LPs |
| CBOR native format | gapsmith-io | Fast, compact, stdlib-free; replaces R's RDS |
| gapsmith fba subcommand | gapsmith-cli | Standalone FBA / pFBA without shelling into R |
| gapsmith convert subcommand | gapsmith-cli | CBOR ↔ JSON round-trip for inspection |
| gapsmith db inspect subcommand | gapsmith-cli | Smoke-test the reference data directory |
| gapsmith export-sbml subcommand | gapsmith-cli | Write an arbitrary CBOR model as SBML |

4. Known gaps (deferred)

| Gap | Upstream location | Workaround / plan |
| --- | --- | --- |
| EC / TC conflict resolution (IRanges overlap math) | prepare_candidate_reaction_tables.R::resolve_common_{EC,TC}_conflicts | Affects <1 % of multi-EC annotations. Plan: port when a user case needs it. |
| MIRIAM cross-ref annotations (KEGG / BiGG / MetaNetX / HMDB / ChEBI) | addReactAttr.R + addMetAttr.R | SBML emits ModelSEED id only; round-trip in COBRApy still works. |
| HMM-based taxonomy / gram prediction | predict_domain.R, predict_gramstaining.R | CLI requires explicit `--taxonomy Bacteria |
| Gene-name MD5 fallback in seqfile resolver | uniprot.sh:179 | Common-case MD5 fallback is ported; gene-name branch rarely fires. |
| Menaquinone-8 auto-removal (gated on MENAQUINONESYN-PWY / PWY-5852 / PWY-5837) | generate_GSdraft.R:281-292 | Bio1 includes cpd15500 regardless; affects anaerobic predictions marginally. |
| gram_by_network.R (predict gram by metabolic-network similarity) | same | Requires explicit `-b pos |
| adapt EC / KEGG / enzyme-name resolution | adapt.R::ids2seed strategies 3–7 | Direct SEED + pathway id resolution works; user can pre-resolve via gapsmith find. |
| pan weight medianing (custom_median) | pan-draft_functions.R | Pan-model emits without merged rxnWeights metadata; gapsmith fill on a pan-draft needs the source Reactions.tbl. |
| CPLEX solver support | - | Plan explicitly calls for HiGHS + optional CBC; no CPLEX path. |
| MetaCyc DB updaters (meta2pwy.py, meta2genes.py, meta2rea.py) | upstream Python helpers | Run once per year by maintainers; kept in Python. |

5. Testing surface

| Test suite | Tests | What it asserts |
| --- | --- | --- |
| gapsmith-core unit | 18 | Type invariants, serde round-trips, stoichiometric matrix construction. |
| gapsmith-io unit | 5 | CBOR / JSON round-trip, data-dir auto-detect. |
| gapsmith-db unit | 18 | Every reference-data parser on realistic inputs. |
| gapsmith-sbml unit + integration | 2 + 1 | SBML writer emits every FBC2 / groups element; libSBML validates cleanly. |
| gapsmith-align unit + smoke + parity | 18 + 4 + 3 | Aligner trait, precomputed TSV, gspa-run manifest + fan-out, BLAST / diamond / mmseqs2 shell parity. |
| gapsmith-find unit + smoke + parity | 36 + 1 + 2 | Pathway scoring, complex detection (R-parity on 9 cases), find -p PWY-6587 and -p amino byte-identical against real gapseq. |
| gapsmith-transport unit + parity | 7 + 1 | TC parsing, substrate resolution, end-to-end row+TC-id parity against real gapseq. |
| gapsmith-draft unit + smoke | 10 + 1 | Biomass rescaling, GPR composition, stoich dedup, conditional transporters. |
| gapsmith-fill unit + textbook + smoke | 20 + 5 + 1 | FBA / pFBA / pFBA-heuristic on toys; community compose + cFBA on 2-organism toy; gapfill4 end-to-end on ecoli draft. |
| gapsmith-medium unit | 14 | Boolean-expression evaluator (incl. counting rules), rule loader, cross-rule dedup, proton balance. |
| gapsmith-cli integration | 6 + 1 | CBOR↔JSON round-trip end-to-end via the binary; SLURM-shard parser. |
| Total | ~170 | |

Run the full suite:

cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings

6. LOC breakdown

| Crate | src/ LOC | tests/ LOC |
| --- | --- | --- |
| gapsmith-core | ~1 000 | - |
| gapsmith-io | ~330 | - |
| gapsmith-db | ~1 600 | - |
| gapsmith-sbml | ~870 | ~220 |
| gapsmith-align | ~1 250 | ~560 |
| gapsmith-find | ~2 700 | ~380 |
| gapsmith-transport | ~1 040 | ~115 |
| gapsmith-draft | ~1 670 | ~80 |
| gapsmith-fill | ~2 150 | ~400 |
| gapsmith-medium | ~550 | - |
| gapsmith-cli | ~2 400 | ~60 |
| Total | ~17 000 | ~2 100 |