User guide
This page introduces how DRAKKAR is organized, what kinds of inputs it expects, and where to find the detailed workflow and operations documentation.
Quickstart
Run the complete pipeline with a sample info table:
$ drakkar complete -f input_info.tsv -o drakkar_output
Run the complete pipeline using a directory of reads:
$ drakkar complete -i /path/to/reads -o drakkar_output
Core concepts
Modules: DRAKKAR can be run end-to-end with drakkar complete or as independent modules such as preprocessing, cataloging, profiling, annotating, expressing, dereplicating, inspecting, database, status, logging, config, and transfer.
Output directory: all outputs are written under -o/--output and organized into predictable module-specific folders.
Profiles: use -p/--profile to select a Snakemake profile. The default is slurm.
Environments: use -e/--env_path to select a shared Conda environment directory.
Run logs: every workflow run writes a metadata file, drakkar_YYYYMMDD-HHMMSS.yaml, and captures Snakemake stdout/stderr in log/drakkar_<run_id>.snakemake.log.
Locked runs: output-writing workflows support --overwrite, which deletes a locked output directory and reruns the workflow after a broken Snakemake session.
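Run metadata file names carry a sortable timestamp, so finding the most recent run comes down to parsing that timestamp. The sketch below works on a listing of file names, because this page does not say which directory the YAML files are written to. run_timestamp and latest_run are hypothetical helpers, not part of DRAKKAR.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Hypothetical helpers (not part of DRAKKAR). The pattern mirrors the
# documented run metadata name drakkar_YYYYMMDD-HHMMSS.yaml.
RUN_FILE = re.compile(r"^drakkar_(\d{8}-\d{6})\.yaml$")


def run_timestamp(name: str) -> Optional[datetime]:
    """Return the timestamp encoded in a run metadata file name, or None."""
    match = RUN_FILE.match(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d-%H%M%S")
    except ValueError:  # digits present but not a real date/time
        return None


def latest_run(names: Iterable[str]) -> Optional[str]:
    """Return the newest run metadata file name in a listing, or None."""
    runs = []
    for name in names:
        stamp = run_timestamp(name)
        if stamp is not None:  # skip unrelated files such as notes.txt
            runs.append((stamp, name))
    return max(runs)[1] if runs else None
```

For example, latest_run(p.name for p in Path(output_dir).iterdir()) would return the newest name, assuming the YAML files sit directly in output_dir.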
Input formats
You can provide inputs as read directories or as a sample info table.
Directory input
Provide a directory with paired-end reads. DRAKKAR expects matching read-pair
names such as *_1.fq.gz and *_2.fq.gz.
$ drakkar preprocessing -i /path/to/reads -o drakkar_output
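To catch samples with a missing mate before starting a run, a short script can apply the documented *_1.fq.gz / *_2.fq.gz naming convention to a directory listing. This is a minimal sketch. pair_read_names and pair_reads are hypothetical helpers, and DRAKKAR's own pairing logic may accept suffixes other than the two shown here.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Hypothetical helpers illustrating the documented naming convention:
# <sample>_1.fq.gz pairs with <sample>_2.fq.gz.
R1_SUFFIX = "_1.fq.gz"
R2_SUFFIX = "_2.fq.gz"


def pair_read_names(names: Iterable[str]) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Split file names into complete read pairs and samples missing a mate."""
    r1: Dict[str, str] = {}
    r2: Dict[str, str] = {}
    for name in names:
        if name.endswith(R1_SUFFIX):
            r1[name[: -len(R1_SUFFIX)]] = name
        elif name.endswith(R2_SUFFIX):
            r2[name[: -len(R2_SUFFIX)]] = name
    # Samples with both mates form pairs; samples with only one mate are reported.
    pairs = {sample: (r1[sample], r2[sample]) for sample in sorted(r1.keys() & r2.keys())}
    unpaired = sorted(r1.keys() ^ r2.keys())
    return pairs, unpaired


def pair_reads(read_dir: str) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Apply pair_read_names to the regular files in a local read directory."""
    return pair_read_names(p.name for p in Path(read_dir).iterdir() if p.is_file())
```

Calling pair_reads("/path/to/reads") lists the samples that follow the convention and any file left without a mate.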
Sample info table (TSV)
A tab-separated table can include any of these columns. Only the columns needed for the chosen workflow are required.
sample: sample name.
rawreads1: path or URL to R1 reads (raw, before preprocessing).
rawreads2: path or URL to R2 reads (raw, before preprocessing).
accession: ENA/SRA paired-end run accession such as ERR4303216 or SRR12345678. Use this instead of rawreads1 and rawreads2 when you want DRAKKAR to download the read pair automatically.
preprocessedreads1: explicit path to quality-filtered R1 reads for use in cataloging. Takes priority over all other read columns. See Cataloging read resolution below.
preprocessedreads2: explicit path to quality-filtered R2 reads for use in cataloging. Must be provided together with preprocessedreads1.
reference_name: host reference label for host-removal workflows.
reference_path: local path or URL to a host FASTA, or to a tarball containing the FASTA plus Bowtie2 index files.
assembly: labels defining assembly groups. The legacy column name coassembly is still accepted.
coverage: labels defining coverage-sharing groups for multicoverage cataloging.
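The column rules above can be checked before a run. check_row below is a hypothetical pre-flight check, not a DRAKKAR function. Its accession pattern assumes the common ERR/SRR/DRR-plus-digits run format. Requiring rawreads1 and rawreads2 together, and flagging a row that has both an accession and raw paths, are assumptions drawn from the column descriptions rather than documented DRAKKAR behavior.

```python
import re
from typing import Dict, List, Optional

# Assumption: run accessions look like ERR4303216 or SRR12345678, i.e. an
# ERR/SRR/DRR prefix followed by at least six digits.
RUN_ACCESSION = re.compile(r"^[EDS]RR\d{6,}$")


def check_row(row: Dict[str, Optional[str]]) -> List[str]:
    """Hypothetical pre-flight check for one sample-table row.

    Returns human-readable problems; an empty list means no issue was found.
    Columns a workflow does not need may simply be absent.
    """

    def value(column: str) -> str:
        # csv.DictReader yields None for short rows, so normalize to "".
        return (row.get(column) or "").strip()

    problems: List[str] = []
    if not value("sample"):
        problems.append("missing sample name")
    # Documented: preprocessedreads2 must be provided together with preprocessedreads1.
    if bool(value("preprocessedreads1")) != bool(value("preprocessedreads2")):
        problems.append("preprocessedreads1 and preprocessedreads2 must be given together")
    # Assumption: raw paired-end reads are likewise given as a pair.
    has_raw = [bool(value("rawreads1")), bool(value("rawreads2"))]
    if any(has_raw) and not all(has_raw):
        problems.append("rawreads1 and rawreads2 must be given together")
    accession = value("accession")
    if accession and not RUN_ACCESSION.match(accession):
        problems.append(f"unexpected accession format: {accession}")
    # Assumption: accession replaces rawreads1/rawreads2, so both on one row is ambiguous.
    if accession and any(has_raw):
        problems.append("use accession or rawreads1/rawreads2, not both")
    return problems
```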
Example:
sample\trawreads1\trawreads2\taccession\treference_name\treference_path\tassembly\tcoverage
sample1\tpath/sample1_1.fq.gz\tpath/sample1_2.fq.gz\t\tref1\tpath/ref1.fna\tassembly1,all\tcoverage1
sample2\t\t\tERR4303216\tref1\tpath/ref1.fna\tassembly2,all\tcoverage2
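If you generate the table from a script, Python's csv module can write it with tab delimiters. Reading it back shows how comma-separated assembly labels place one sample in several groups: sample1 and sample2 both belong to all. This sketch assumes the assembly column holds comma-separated labels, as the example suggests. write_table and assembly_groups are hypothetical names.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO

COLUMNS = ["sample", "rawreads1", "rawreads2", "accession",
           "reference_name", "reference_path", "assembly", "coverage"]

# The two example rows from the table above.
ROWS = [
    {"sample": "sample1", "rawreads1": "path/sample1_1.fq.gz",
     "rawreads2": "path/sample1_2.fq.gz", "accession": "",
     "reference_name": "ref1", "reference_path": "path/ref1.fna",
     "assembly": "assembly1,all", "coverage": "coverage1"},
    {"sample": "sample2", "rawreads1": "", "rawreads2": "",
     "accession": "ERR4303216", "reference_name": "ref1",
     "reference_path": "path/ref1.fna", "assembly": "assembly2,all",
     "coverage": "coverage2"},
]


def write_table(rows: Iterable[Dict[str, str]], handle: TextIO) -> None:
    """Write rows as a tab-separated table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def assembly_groups(handle: TextIO) -> Dict[str, List[str]]:
    """Map each assembly label to its samples (assumes comma-separated labels)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Fall back to the legacy column name, which the guide says is still accepted.
        labels = row.get("assembly") or row.get("coassembly") or ""
        for label in filter(None, (part.strip() for part in labels.split(","))):
            groups[label].append(row["sample"])
    return dict(groups)


buffer = io.StringIO()
write_table(ROWS, buffer)
buffer.seek(0)
print(assembly_groups(buffer))
# {'assembly1': ['sample1'], 'all': ['sample1', 'sample2'], 'assembly2': ['sample2']}
```

To produce an actual input file, pass a handle from open("input_info.tsv", "w", newline="") to write_table.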
Input notes
Read files can be local paths or remote URLs (http/https/ftp/sftp).
Sample tables can also use an accession column with ENA/SRA paired-end run accessions; DRAKKAR resolves and downloads the matching R1 and R2 FASTQ files automatically.
-r/--reference, -x/--reference-index, and reference_path values can be local files or remote URLs.
Reference inputs may be FASTA files, compressed FASTA files, or tarballs containing a FASTA plus Bowtie2 index files.
Genome lists passed through options such as -B/--bins_file can also use remote URLs; DRAKKAR caches them locally before execution.
Directory-style inputs such as -i/--input and -b/--bins_dir must be local filesystem paths.
Before Snakemake starts, DRAKKAR checks downloaded and local input files for existence and non-zero size.
Remote downloads are retried up to five times with exponential backoff. sftp:// URLs require curl with SFTP support.
The preferred sample-table column name is assembly. The legacy column name coassembly is still accepted.
Assembly labels can be any identifiers you choose; they do not need to match sample names.
-m individual adds per-sample assemblies alongside grouped assemblies.
-b/--binners selects the binners used in cataloging. Pass a comma-separated list drawn from metabat, maxbin, semibin, and comebin; the default is all four.
--multicoverage maps samples sharing the same coverage label to each other's individual assemblies.
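The download and validation steps described above follow a common pattern: classify each location as local or remote, retry remote fetches with exponentially growing delays, and reject local files that are missing or empty. The sketch below illustrates only that pattern, not DRAKKAR's implementation. is_remote, check_local_file, and with_backoff are hypothetical, the 1, 2, 4, 8, 16 second delay schedule is an assumption, and "retried up to five times" is read as five retries after the first attempt.

```python
import os
import time
from typing import Callable, TypeVar
from urllib.parse import urlparse

T = TypeVar("T")

# URL schemes this guide lists for remote read files.
REMOTE_SCHEMES = {"http", "https", "ftp", "sftp"}


def is_remote(location: str) -> bool:
    """True when the location uses one of the supported remote URL schemes."""
    return urlparse(location).scheme.lower() in REMOTE_SCHEMES


def check_local_file(path: str) -> None:
    """Fail early if a local input is missing or has zero size."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"input not found: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"input is empty: {path}")


def with_backoff(action: Callable[[], T], retries: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call action(); on OSError, wait base_delay * 2**n seconds and retry.

    Makes at most retries + 1 attempts and re-raises the last error.
    `sleep` is injectable so the schedule can be tested without waiting.
    """
    for attempt in range(retries + 1):
        try:
            return action()
        except OSError:  # network errors such as ConnectionError are OSErrors
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("retries must be non-negative")
```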
Cataloging read resolution
When drakkar cataloging (or drakkar complete) loads a sample info table
with -f/--file, it resolves the reads to use for assembly and mapping in the
following priority order for each sample:
preprocessedreads1 / preprocessedreads2 columns: if both are present, the cataloging workflow uses these paths directly. This is the explicit override for cases where preprocessed reads live outside the default output tree.
preprocessing/final/<sample>_1.fq.gz: if neither preprocessedreads1 nor preprocessedreads2 is supplied, but a prior drakkar preprocessing run has already written quality-filtered reads into the output directory, cataloging detects and uses them automatically. This is the typical case when running cataloging as a follow-up step after preprocessing in the same output directory.
rawreads1 / rawreads2 or accession: fallback to the raw inputs. This path is taken when neither preprocessed column is present and no preprocessing/final/ files are found, so assembly runs directly on unfiltered reads.
This means you can keep a single input table that contains raw read paths (or
accessions) together with assembly and coverage grouping columns, and
cataloging will automatically pick up the quality-filtered reads from a
completed preprocessing run without any changes to the file.
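The priority order can be summarized as a small function. This is a sketch of the documented rules, not DRAKKAR's code. resolve_cataloging_reads is hypothetical, the <sample>_2.fq.gz mate name is assumed to mirror the documented _1 file, and checking raw paths before accession is an assumption, because the guide lists them together as a single fallback tier.

```python
import os
from typing import Callable, Dict, Optional, Tuple


def resolve_cataloging_reads(
    row: Dict[str, Optional[str]],
    output_dir: str,
    exists: Callable[[str], bool] = os.path.exists,
) -> Tuple[str, Tuple[str, ...]]:
    """Hypothetical sketch: return (source, locations) for one sample-table row."""

    def value(column: str) -> str:
        return (row.get(column) or "").strip()

    sample = value("sample")
    pre1, pre2 = value("preprocessedreads1"), value("preprocessedreads2")
    # 1. Explicit override: both preprocessed columns are filled in.
    if pre1 and pre2:
        return "preprocessedreads", (pre1, pre2)
    if pre1 or pre2:
        raise ValueError(f"{sample}: preprocessedreads1 and preprocessedreads2 must be given together")
    # 2. Quality-filtered reads from an earlier preprocessing run in the same output directory.
    final_dir = os.path.join(output_dir, "preprocessing", "final")
    final = (os.path.join(final_dir, f"{sample}_1.fq.gz"),
             os.path.join(final_dir, f"{sample}_2.fq.gz"))  # _2 name is an assumption
    if all(exists(path) for path in final):
        return "preprocessing/final", final
    # 3. Fallback to raw inputs; assembly then runs on unfiltered reads.
    raw1, raw2 = value("rawreads1"), value("rawreads2")
    if raw1 and raw2:
        return "rawreads", (raw1, raw2)
    accession = value("accession")
    if accession:
        return "accession", (accession,)
    raise ValueError(f"{sample}: no read source found")
```

Passing a stub for exists keeps the function testable without creating files; in practice the default os.path.exists checks the real output directory.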
Guide map
Use the following pages depending on what you need:

| Topic | Where to go next |
|---|---|
| Running the complete workflow or a specific module | See Workflow guide. |
| Databases, logging, config, transfer, outputs, and troubleshooting | See the operations documentation. |
| Command list only | See CLI Reference. |