nf-core/detaxizer      
 A pipeline to identify (and remove) certain sequences from raw genomic data. The default taxon to identify (and remove) is Homo sapiens. Removal is optional.
 de-identification, decontamination, edna, fastq, filter, long-reads, metabarcoding, metagenomics, microbiome, nanopore, short-reads, shotgun, taxonomic-classification, taxonomic-profiling
   Version history
Summary of changes
- Filtering is now enabled by default
- Defaults reflect the best settings found when benchmarking human decontamination
- Improved memory and time requirements
 
Detailed changes
Added
- PR #70 - Filtering is now the default; `--skip_filter` was added (by @d4straub)
- PR #71 - Added usage information learned from our benchmarking (by @d4straub)
 
Changed
- PR #65, PR #69 - Template update for nf-core/tools 3.3.2 (by @d4straub)
- PR #72 - The default for `--kraken2db` was changed from 'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240904.tar.gz' to 'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240605.tar.gz'. This database is much larger (60 GB), but with it the default settings reflect the best decontamination performance in benchmarks (by @d4straub)
- PR #73 - Doubled memory allocation for ISOLATE_BBDUK_IDS (by @d4straub)
 - PR #75 - Updated version and contributors (by @d4straub)
 
Fixed
- PR #62 - Use dnaio to reduce memory spikes during renaming (by @bede)
 - PR #77 - Fixed conda versions to exactly follow container versions (by @d4straub)
 - PR #78 - Fixed typos, re-added code comments (by @d4straub)
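PR #62 moved read renaming to dnaio to avoid memory spikes. The sketch below does not use dnaio; it is a plain-Python stand-in, assuming the common 4-line FASTQ layout, that shows the underlying idea: stream one record at a time so memory use stays flat regardless of file size, instead of loading the whole file. The renaming rule (truncate each header at the first whitespace) and the function names are hypothetical and not necessarily what detaxizer does.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header without "@", sequence, qualities)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield FASTQ records one at a time (assumes the common 4-line layout)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed or truncated FASTQ record")
        yield header[1:].rstrip("\n"), seq, qual


def rename_stream(src: TextIO, dst: TextIO) -> int:
    """Rewrite headers record by record; returns the number of records written.

    Hypothetical renaming rule: keep only the first whitespace-delimited token of the header.
    """
    written = 0
    for header, seq, qual in read_fastq(src):
        dst.write(f"@{header.split()[0]}\n{seq}\n+\n{qual}\n")
        written += 1
    return written


if __name__ == "__main__":
    src = io.StringIO("@read1 extra info\nACGT\n+\nIIII\n")
    dst = io.StringIO()
    rename_stream(src, dst)
    print(dst.getvalue(), end="")
```

Because only the current record is held in memory, peak memory depends on the longest read rather than on the number of reads.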
 
Dependencies
| Software | Previous version | New version | 
|---|---|---|
| MultiQC | 1.27 | 1.29 | 
| tar | 1.3 | 1.34 | 
Removed
 Added
- PR #34 - Added bbduk to the classification step (kraken2 as default, both can be run together) (by @jannikseidelQBiC)
 - PR #34 - Added `--fasta_bbduk` parameter to provide a fasta file with contaminants (by @jannikseidelQBiC)
 - PR #34 - Rewrote summary step of classification to be usable with bbduk and/or kraken2 (by @jannikseidelQBiC)
 - PR #34 - Made preprocessing with fastp optional and added the parameter `--fastp_eval_duplication` to turn on duplication removal (off by default; in v1.0.0 it was always on and could not be changed) (by @jannikseidelQBiC)
 - PR #34 - Optionally, the removed reads can now be written to the output folder (by @jannikseidelQBiC)
 - PR #34 - Added optional classification of filtered and removed reads via kraken2 (by @jannikseidelQBiC)
 - PR #39 - Added generation of input samplesheet for nf-core/mag, nf-core/taxprofiler (by @Joon-Klaps)
 
Parameters
Added parameters:
| Parameter | 
|---|
| --fasta_bbduk |
| --preprocessing |
| --output_removed_reads |
| --classification_kraken2 |
| --classification_bbduk |
| --kraken2confidence_filtered |
| --kraken2confidence_removed |
| --classification_kraken2_post_filtering |
| --fastp_eval_duplication |
| --bbduk_kmers |
Changed default values of parameters:
| Parameter | Old default value | New default value | 
|---|---|---|
| --fastp_cut_mean_quality | 15 | 1 |
| --kraken2db | 'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20231009.tar.gz' | 'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240605.tar.gz' |
| --kraken2confidence | 0.05 | 0.00 |
| --tax2filter | 'Homo' | 'Homo sapiens' |
| --cutoff_tax2filter | 2 | 0 |
| --cutoff_tax2keep | 0.5 | 0.0 |
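For context on the `--kraken2confidence` change from 0.05 to 0.00: according to the Kraken2 manual, a read's confidence score at a label is C/Q, where C is the number of k-mers mapped to taxa in the clade rooted at that label and Q is the number of k-mers without ambiguous nucleotides. If the score is below the threshold, the label moves up the tree until the threshold is met, and the read becomes unclassified if not even the root qualifies. The sketch below re-implements that rule in Python for illustration only; it assumes the initial label and the per-read LCA mapping column are already known, uses a hypothetical, heavily truncated toy taxonomy, and is not Kraken2's code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, heavily truncated lineage (child -> parent); the root (taxid 1) points to itself.
TOY_PARENT: Dict[int, int] = {1: 1, 9604: 1, 9605: 9604, 9606: 9605, 562: 1}


def parse_kmer_mapping(mapping: str) -> Tuple[List[Tuple[int, int]], int]:
    """Parse a Kraken2 LCA mapping column such as "9606:4 0:1 A:2 |:| 9606:5".

    Returns (taxid, k-mer count) pairs and Q, the number of k-mers without ambiguous bases.
    """
    pairs: List[Tuple[int, int]] = []
    q = 0
    for token in mapping.split():
        if token == "|:|":  # separator between the mates of a read pair
            continue
        taxon, count_str = token.split(":")
        if taxon == "A":  # k-mers with ambiguous nucleotides are not queried
            continue
        count = int(count_str)
        pairs.append((int(taxon), count))
        q += count  # taxid 0 (no database hit) still counts towards Q
    return pairs, q


def in_clade(taxid: int, clade_root: int, parent: Dict[int, int]) -> bool:
    """True if taxid is clade_root or one of its descendants."""
    while taxid != clade_root:
        up = parent.get(taxid)
        if up is None or up == taxid:  # reached the root or an unknown taxid
            return False
        taxid = up
    return True


def apply_confidence(label: int, mapping: str, parent: Dict[int, int], threshold: float) -> Optional[int]:
    """Walk up from label until C/Q >= threshold; None means the read ends up unclassified."""
    pairs, q = parse_kmer_mapping(mapping)
    if q == 0:
        return None
    node: Optional[int] = label
    while node is not None:
        c = sum(n for taxid, n in pairs if in_clade(taxid, node, parent))
        if c / q >= threshold:
            return node
        up = parent.get(node)
        node = None if up is None or up == node else up
    return None


if __name__ == "__main__":
    mapping = "9606:4 9605:2 562:3 0:1 A:2"  # Q = 10
    for t in (0.0, 0.05, 0.5, 0.7, 0.95):
        print(t, apply_confidence(9606, mapping, TOY_PARENT, t))
```

With a threshold of 0.00, every classified read keeps its original label, so no read is pushed up to a less specific taxon or dropped to unclassified by the confidence check.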
Changed
- PR #42 - Template update for nf-core/tools 3.0.2, for details read this blog post
 
Fixed
- PR #33 - Addition of quotation marks in `parse_kraken2report.nf` prevents failure of the pipeline when a taxon name containing a space (e.g. Homo sapiens) is passed with the `--tax2filter` parameter (by @jannikseidelQBiC)
 - PR #34 - Made validation via blastn optional and disabled by default (by @jannikseidelQBiC)
 - PR #34 - Changed parameter `--fasta` to `--fasta_blastn` (by @jannikseidelQBiC)
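The PR #33 fix addresses a general problem: when a value containing a space is placed on a command line without quotes, it is split into separate arguments. A short Python illustration using the standard `shlex` module; `parse_report` is a hypothetical command name, not the pipeline's module script.

```python
import shlex

taxon = "Homo sapiens"

# Without quoting, shell-style word splitting breaks the taxon name into two arguments.
unquoted = shlex.split(f"parse_report --tax2filter {taxon}")

# With quoting, the full name reaches the (hypothetical) parse_report command as one argument.
quoted = shlex.split(f"parse_report --tax2filter {shlex.quote(taxon)}")

print(unquoted)  # ['parse_report', '--tax2filter', 'Homo', 'sapiens']
print(quoted)    # ['parse_report', '--tax2filter', 'Homo sapiens']
```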
Dependencies
Updated and added dependencies
| Tool | Previous version | Current version | 
|---|---|---|
| bbmap | - | 39.10 | 
| blastn | 2.14.1 | 2.15.0 | 
| MultiQC | 1.21 | 1.25.1 |
| kraken2 | 2.1.2 | 2.1.3 | 
| seqkit | 2.8.0 | 2.8.2 | 
Deprecated
| Parameter | New parameter | Reason | 
|---|---|---|
| --fasta | --fasta_blastn | Introduction of fasta_bbduk made it necessary to distinguish the two parameters |
| --skip_blastn | --validation_blastn | blastn must now be enabled explicitly; it is too resource-intensive for a default setting |
| --max_cpus | - | New Nextflow behavior: resource limits can now be set in a config via `resourceLimits` |
| --max_memory | - | New Nextflow behavior: resource limits can now be set in a config via `resourceLimits` |
| --max_time | - | New Nextflow behavior: resource limits can now be set in a config via `resourceLimits` |
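In recent Nextflow versions, the `resourceLimits` process directive, typically set in a config file, takes over the role of `--max_cpus`, `--max_memory` and `--max_time`: a task that requests more than a limit is given the limit instead. The following is a conceptual Python sketch of that clamping, not Nextflow configuration; the `Resources` class, its units and the `cap` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical container for a task's resource request."""
    cpus: int
    memory_gb: float
    time_h: float


def cap(requested: Resources, limits: Resources) -> Resources:
    """Clamp every requested resource to its limit (requests within limits pass unchanged)."""
    return Resources(
        cpus=min(requested.cpus, limits.cpus),
        memory_gb=min(requested.memory_gb, limits.memory_gb),
        time_h=min(requested.time_h, limits.time_h),
    )


if __name__ == "__main__":
    limits = Resources(cpus=8, memory_gb=64.0, time_h=24.0)
    print(cap(Resources(cpus=16, memory_gb=128.0, time_h=72.0), limits))
    # Resources(cpus=8, memory_gb=64.0, time_h=24.0)
```

Each resource is clamped independently, so a task that exceeds only its memory request keeps its original CPU and time requests.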
First release of nf-core/detaxizer!
This is the initial version of the pipeline:
- Read QC (FastQC)
- Pre-processing (fastp)
- Classification of reads (Kraken2)
- Optional validation of the searched taxon/taxa (blastn)
- Optional filtering of the searched taxon/taxa from the reads (either from the raw files or the preprocessed reads, using the output from either kraken2 or blastn)
- Summary of the processes (how many reads were initially present after preprocessing, how many were classified as the `tax2filter` taxon plus its potential taxonomic subtree, and, optionally, how many were validated)
- Presentation of QC for raw reads (MultiQC)
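The summary counts reads classified as the `tax2filter` taxon plus its potential taxonomic subtree. A minimal sketch of that idea, assuming Kraken2's standard per-read output (tab-separated status, read ID, taxid, length, LCA mapping; no `--use-names`) and a taxonomy given as a child-to-parent map. The toy lineage and function names are hypothetical, and detaxizer's own implementation may differ, for example in how it derives the subtree and applies cutoffs.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical, heavily truncated lineage (child -> parent); the root (taxid 1) points to itself.
TOY_PARENT: Dict[int, int] = {1: 1, 9604: 1, 9605: 9604, 9606: 9605, 562: 1}


def subtree(root: int, parent: Dict[int, int]) -> Set[int]:
    """Return root plus every taxid that descends from it."""
    children: Dict[int, Set[int]] = {}
    for child, par in parent.items():
        if child != par:  # skip the self-loop at the root
            children.setdefault(par, set()).add(child)
    found: Set[int] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(children.get(node, set()))
    return found


def summarize(kraken_lines: Iterable[str], target: int, parent: Dict[int, int]) -> Tuple[int, Set[str]]:
    """Count all reads and collect the IDs of reads classified within the target clade."""
    clade = subtree(target, parent)
    total = 0
    hits: Set[str] = set()
    for line in kraken_lines:
        status, read_id, taxid = line.rstrip("\n").split("\t")[:3]
        total += 1
        if status == "C" and int(taxid) in clade:
            hits.add(read_id)
    return total, hits


if __name__ == "__main__":
    lines = [
        "C\tr1\t9606\t150\t9606:116",
        "C\tr2\t9605\t150\t9605:116",
        "C\tr3\t562\t150\t562:116",
        "U\tr4\t0\t150\t0:116",
    ]
    total, hits = summarize(lines, 9605, TOY_PARENT)
    kept = [line.split("\t")[1] for line in lines if line.split("\t")[1] not in hits]
    print(total, sorted(hits), kept)  # 4 ['r1', 'r2'] ['r3', 'r4']
```

Filtering then amounts to dropping the reads whose IDs are in `hits`, which is what the `kept` list shows.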