nf-core/detaxizer

A pipeline to identify (and optionally remove) sequences of a chosen taxon from raw genomic data. The default taxon is Homo sapiens.

Topics: de-identification, decontamination, edna, fastq, filter, long-reads, metabarcoding, metagenomics, microbiome, nanopore, short-reads, shotgun, taxonomic-classification, taxonomic-profiling

Latest version: 1.3.0 (https://github.com/nf-core/detaxizer)

Version history

1.3.0

Summary of changes

Alternative filtering tool, selectable with --filtering_tool bbmap (seqkit remains the default, as in previous versions)
Added peer-reviewed paper as pipeline citation
Updated nf-core modules software versions
Updated nf-core template (now 3.4.1)

Detailed changes

`Added`

PR #80 - Added bbmap/filterbyname.sh as an alternative filtering tool, selectable via --filtering_tool bbmap. It natively handles paired-end reads, which removes the steps needed to modify FASTQ headers and improves run time (by @m3hdad)
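Why a name-based filter that understands mate suffixes needs no header rewriting can be sketched in Python: both mates of a pair reduce to the same base ID, so a single exclusion list removes them together. This is a minimal illustration under stated assumptions, not the code of filterbyname.sh or of the pipeline. The functions `read_fastq`, `base_name` and `filter_pairs` are hypothetical, and the sketch assumes uncompressed four-line FASTQ records whose mates carry `/1` and `/2` suffixes.

```python
import io
from typing import Iterator, Set, TextIO, Tuple

# One FASTQ record: header, sequence, '+' line, qualities
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Stream four-line FASTQ records one at a time."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def base_name(header: str) -> str:
    """Read ID without '@', any comment, or a trailing /1 or /2 mate suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def filter_pairs(r1: TextIO, r2: TextIO, exclude: Set[str]) -> Tuple[str, str]:
    """Drop both mates of every pair whose base ID is in `exclude`."""
    out1, out2 = io.StringIO(), io.StringIO()
    # zip() stops at the shorter file; a real tool should also check lengths
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        name = base_name(rec1[0])
        if name != base_name(rec2[0]):
            raise ValueError(f"mates out of sync: {rec1[0]} vs {rec2[0]}")
        if name in exclude:
            continue  # removed as a pair; headers are never rewritten
        out1.write("\n".join(rec1) + "\n")
        out2.write("\n".join(rec2) + "\n")
    return out1.getvalue(), out2.getvalue()


# Usage: read1 is on the exclusion list, so both of its mates are dropped
r1 = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCC\n+\nIIII\n")
r2 = io.StringIO("@read1/2\nTTTT\n+\nIIII\n@read2/2\nCCGG\n+\nIIII\n")
kept1, kept2 = filter_pairs(r1, r2, exclude={"read1"})
```

Because the ID list only needs base names, the same list serves both files without first editing headers so that the mates match.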

`Changed`

PR #79, PR #87 - Update pipeline version (by @d4straub)
PR #81 - Update citation from preprint to peer-reviewed publication (by @d4straub)
PR #84 - Template update for nf-core/tools 3.4.1 (by @d4straub)

`Fixed`

PR #90 - Updated the minimum Nextflow version to 25.04.8 to fix an issue with conda (by @d4straub)

`Dependencies`

PR #86 - Update nf-core modules (by @d4straub)

Software	Previous version	New version
MultiQC	1.31	1.32
Kraken2	2.1.3	2.1.6
fastp	0.23.4	1.0.1
BLAST	2.15.0	2.17.0
bbmap	39.10	39.18
nextflow	25.04.0	25.04.8

1.2.0

Summary of changes

Filtering is now enabled by default
Default settings reflect the best-performing settings from benchmarking of human decontamination
Lower memory and time requirements

Detailed changes

`Added`

PR #70 - Filtering is now the default; --skip_filter was added to turn it off (by @d4straub)
PR #71 - Add usage information learned from our benchmarking (by @d4straub)

`Changed`

PR #65, PR #69 - Template update for nf-core/tools 3.3.2 (by @d4straub)
PR #72 - Changed the default for --kraken2db from ‘https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240904.tar.gz’ to ‘https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240605.tar.gz’. The new database is much larger (60 GB), but with it the default settings reflect the best decontamination performance observed in benchmarks (by @d4straub)
PR #73 - Doubled memory allocation for ISOLATE_BBDUK_IDS (by @d4straub)
PR #75 - Updated version and contributors (by @d4straub)

`Fixed`

PR #62 - Use dnaio to reduce memory spikes during renaming (by @bede)
PR #77 - Fixed conda versions to exactly follow container versions (by @d4straub)
PR #78 - Fixed typos, re-added code comments (by @d4straub)

`Dependencies`

Software	Previous version	New version
MultiQC	1.27	1.29
tar	1.3	1.34

`Removed`

PR #70 - Filtering is now default, --enable_filter was removed and replaced by --skip_filter (by @d4straub)

1.1.0

`Added`

PR #34 - Added bbduk to the classification step (kraken2 remains the default; both can be run together) (by @jannikseidelQBiC)
PR #34 - Added --fasta_bbduk parameter to provide a fasta file with contaminants (by @jannikseidelQBiC)
PR #34 - Rewrote summary step of classification to be usable with bbduk and/or kraken2 (by @jannikseidelQBiC)
PR #34 - Made preprocessing with fastp optional and added the parameter --fastp_eval_duplication to turn on duplication removal (off by default; in v1.0.0 it was always on and could not be changed) (by @jannikseidelQBiC)
PR #34 - Removed reads can now optionally be written to the output folder (by @jannikseidelQBiC)
PR #34 - Added optional classification of filtered and removed reads via kraken2 (by @jannikseidelQBiC)
PR #39 - Added generation of input samplesheet for nf-core/mag, nf-core/taxprofiler (by @Joon-Klaps)
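BBDuk flags reads by k-mer matching against a reference, here the contaminant FASTA supplied with --fasta_bbduk. The idea can be sketched in Python, assuming exact matches of canonical (strand-independent) k-mers and a hypothetical `min_hits` threshold. BBDuk itself offers many more options, such as mismatch tolerance, and the `k` used below is arbitrary rather than taken from --bbduk_kmers. All function names are illustrative.

```python
from typing import Iterable, Iterator, Set

# Complement table; other characters (e.g. N) are left unchanged
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Strand-independent form: the smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def kmers(seq: str, k: int) -> Iterator[str]:
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: Iterable[str], k: int) -> Set[str]:
    """Canonical k-mers of all reference sequences, skipping ones with N."""
    return {
        canonical(km)
        for ref in references
        for km in kmers(ref.upper(), k)
        if "N" not in km
    }


def is_contaminant(read: str, index: Set[str], k: int, min_hits: int = 1) -> bool:
    """Flag a read that shares at least `min_hits` k-mers with the reference."""
    hits = sum(1 for km in kmers(read.upper(), k) if canonical(km) in index)
    return hits >= min_hits


# Usage with a toy reference and k = 5 (both arbitrary)
index = build_index(["GATTACAGATTACA"], k=5)
flag_rc = is_contaminant("CCTAATCC", index, k=5)  # TAATC is the revcomp of GATTA
flag_other = is_contaminant("CCCCCCCC", index, k=5)
```

Storing canonical k-mers means a read from either strand matches the same index entry, so the reference does not need to be indexed twice.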

Parameters

Added parameters:

Parameter
`--fasta_bbduk`
`--preprocessing`
`--output_removed_reads`
`--classification_kraken2`
`--classification_bbduk`
`--kraken2confidence_filtered`
`--kraken2confidence_removed`
`--classification_kraken2_post_filtering`
`--fastp_eval_duplication`
`--bbduk_kmers`

Changed default values of parameters:

Parameter	Old default value	New default value
`--fastp_cut_mean_quality`	15	1
`--kraken2db`	'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20231009.tar.gz'	'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240605.tar.gz'
`--kraken2confidence`	0.05	0.00
`--tax2filter`	'Homo'	'Homo sapiens'
`--cutoff_tax2filter`	2	0
`--cutoff_tax2keep`	0.5	0.0
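Assuming --kraken2confidence is forwarded to Kraken2's --confidence option, the effect of lowering it from 0.05 to 0.00 can be sketched with the scoring the Kraken2 manual describes. A label's score is C/Q, where C counts k-mers mapped into the clade rooted at the label and Q counts k-mers without ambiguous bases. If the score is below the threshold, the label moves up to its parent. With a threshold of 0.00 the original label is always kept. The taxonomy and per-k-mer assignments below are toy data, the function names are hypothetical, and real Kraken2 works on minimizers with details not modeled here.

```python
from typing import Dict, List, Optional

# Toy taxonomy (child -> parent); 1 is the root and points to itself
PARENT: Dict[int, int] = {1: 1, 10: 1, 20: 10, 30: 10}


def in_clade(taxid: int, label: int, parent: Dict[int, int]) -> bool:
    """True if `taxid` is `label` or one of its descendants."""
    while True:
        if taxid == label:
            return True
        up = parent.get(taxid)
        if up is None or up == taxid:
            return False  # reached the root or an unknown ID
        taxid = up


def confidence(kmer_taxa: List[int], label: int, parent: Dict[int, int]) -> float:
    """C / Q, with one entry per unambiguous k-mer (0 = no database hit)."""
    q = len(kmer_taxa)
    c = sum(1 for t in kmer_taxa if t != 0 and in_clade(t, label, parent))
    return c / q if q else 0.0


def apply_threshold(kmer_taxa: List[int], label: int,
                    parent: Dict[int, int], threshold: float) -> Optional[int]:
    """Move the label toward the root until its score reaches the threshold."""
    while confidence(kmer_taxa, label, parent) < threshold:
        up = parent.get(label)
        if up is None or up == label:
            return None  # not even the root qualifies: unclassified
        label = up
    return label


# Usage: four k-mers, read initially labelled 20
kmer_taxa = [20, 20, 30, 0]
score_20 = confidence(kmer_taxa, 20, PARENT)
```

In this toy case the read keeps label 20 at a threshold of 0.00 (and also at 0.05, since 0.5 ≥ 0.05), climbs to 10 at 0.6, and becomes unclassified at 0.9.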

`Changed`

PR #42 - Template update for nf-core/tools 3.0.2 (see the nf-core blog post on this release for details)

`Fixed`

PR #33 - Added quotation marks in parse_kraken2report.nf so that the pipeline no longer fails when a taxon name containing a space (e.g. Homo sapiens) is passed to --tax2filter (by @jannikseidelQBiC)
PR #34 - Made validation via blastn optional and disabled by default; enable it with --validation_blastn (by @jannikseidelQBiC)
PR #34 - Renamed parameter --fasta to --fasta_blastn (by @jannikseidelQBiC)
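The quoting fix in PR #33 can be illustrated with Python's `shlex` module, which follows POSIX shell word-splitting rules. An unquoted Homo sapiens becomes two arguments, while a quoted one stays a single argument. The command name `parse_report` and the flag `--taxon` are hypothetical placeholders, not the actual contents of parse_kraken2report.nf.

```python
import shlex

taxon = "Homo sapiens"

# Unquoted interpolation: the shell splits the taxon into two words
unquoted = shlex.split(f"parse_report --taxon {taxon}")

# Quoted interpolation: the taxon survives as a single argument
quoted = shlex.split(f"parse_report --taxon {shlex.quote(taxon)}")

print(unquoted)  # ['parse_report', '--taxon', 'Homo', 'sapiens']
print(quoted)    # ['parse_report', '--taxon', 'Homo sapiens']
```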

`Dependencies`

Updated and added dependencies

Tool	Previous version	Current version
bbmap	-	39.10
blastn	2.14.1	2.15.0
MultiQC	1.21	1.25.1
kraken2	2.1.2	2.1.3
seqkit	2.8.0	2.8.2

`Deprecated`

Parameter	New parameter	Reason
`--fasta`	`--fasta_blastn`	Introduction of fasta_bbduk; necessary to further distinguish the two parameters
`--skip_blastn`	`--validation_blastn`	blastn must now be enabled explicitly; it is too resource-intensive to run by default
`--max_cpus`	-	New Nextflow behavior: resource caps are now set with the `resourceLimits` directive in a config file
`--max_memory`	-	New Nextflow behavior: resource caps are now set with the `resourceLimits` directive in a config file
`--max_time`	-	New Nextflow behavior: resource caps are now set with the `resourceLimits` directive in a config file

1.0.0

First release of nf-core/detaxizer!

This is the initial version of the pipeline:

Read QC (FastQC)
Pre-processing (fastp)
Classification of reads (Kraken2)
Optional validation of searched taxon/taxa (blastn)
Optional filtering of the searched taxon/taxa from the reads (from either the raw or the preprocessed reads, based on the output of kraken2 or blastn)
Summary of the processes (how many reads were present after preprocessing, how many were classified as the tax2filter taxon or a taxon in its subtree, and optionally how many were validated)
QC report for raw reads (MultiQC)
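The summary step counts reads classified as the tax2filter taxon or any taxon below it. A minimal Python sketch of that counting follows. It assumes Kraken2's default per-read output without --use-names, a tab-separated line whose first three columns are the C/U flag, the read ID and a bare taxid, and uses a toy parent map in place of the NCBI taxonomy. Apart from 9606 (Homo sapiens), the taxids and the collapsed lineage are illustrative, and `subtree` and `summarize` are hypothetical names, not the pipeline's implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple


def subtree(root: int, parent: Dict[int, int]) -> Set[int]:
    """`root` plus every taxid below it in a child -> parent map."""
    children: Dict[int, List[int]] = {}
    for child, par in parent.items():
        if child != par:
            children.setdefault(par, []).append(child)
    found, stack = {root}, [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def summarize(lines: Iterable[str], target: int,
              parent: Dict[int, int]) -> Tuple[int, Set[str]]:
    """Return (total reads, IDs of reads classified in the target subtree)."""
    taxa = subtree(target, parent)
    total, hits = 0, set()
    for line in lines:
        # First three columns: C/U flag, read ID, taxid
        status, read_id, taxid = line.rstrip("\n").split("\t")[:3]
        total += 1
        if status == "C" and int(taxid) in taxa:
            hits.add(read_id)
    return total, hits


# Toy lineage with collapsed ranks: 9606 (Homo sapiens) under 9605,
# 63221 stands in for a child taxon, 562 for an unrelated bacterium
PARENT = {1: 1, 9605: 1, 9606: 9605, 63221: 9606, 562: 1}
kraken_lines = [
    "C\tr1\t9606\t150\t9606:116",
    "C\tr2\t63221\t150\t63221:116",
    "C\tr3\t9605\t150\t9605:116",
    "C\tr4\t562\t150\t562:116",
    "U\tr5\t0\t150\t0:116",
]
total, hits = summarize(kraken_lines, target=9606, parent=PARENT)
```

Here r3 is not counted: it was assigned to the parent of the target, which lies outside the target's subtree.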

Included subworkflows: utils_nextflow_pipeline, utils_nfcore_pipeline, utils_nfschema_plugin

Get help: ask a question on Slack or open an issue on GitHub.
