# Introduction
**metagWGS** is a Nextflow bioinformatics analysis pipeline used for Metagenomic Shotgun Sequencing data.
The workflow processes raw data from FastQ inputs: it checks read quality (FastQC and MultiQC), trims adapter sequences and cleans reads (Cutadapt, Sickle), and removes contaminants (BWA-MEM), among other steps. See the output documentation for more details of the results.
The pipeline is built using [Nextflow](https://www.nextflow.io/docs/latest/index.html#), a bioinformatics workflow tool that runs tasks across multiple compute infrastructures in a very portable manner. It will come with a Singularity container, making installation trivial and results highly reproducible.
All the following tools must be installed and copied or moved to a directory in your $PATH:
* [Nextflow](https://www.nextflow.io/docs/latest/index.html#) v19.01.0
* [Cutadapt](https://cutadapt.readthedocs.io/en/stable/#) v1.15
* [Sickle](https://github.com/najoshi/sickle) v1.33
* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) v0.11.7
* [MultiQC](https://multiqc.info/) v1.5
* [Python](https://www.python.org/) v3.6.3
* [Kaiju](https://github.com/bioinformatics-centre/kaiju) v1.7.0
* [SPAdes](https://github.com/ablab/spades) v3.11.1
* [megahit](https://github.com/voutcn/megahit) v1.1.3
* [prokka](https://github.com/tseemann/prokka) v1.13.4 - WARNING: always use the latest release
* [cdhit](http://weizhongli-lab.org/cd-hit/) v4.6.8
* [samtools](http://www.htslib.org/) v0.1.19
* [bedtools](https://bedtools.readthedocs.io/en/latest/) v2.27.1
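
Before launching the pipeline, it can help to confirm that each tool is reachable from `$PATH`. The sketch below does this in Python with `shutil.which`. The executable names (for example `spades.py` and `cd-hit`) are assumptions based on how these tools are commonly installed and may differ on your system; the check does not verify versions.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = [
    "nextflow", "cutadapt", "sickle", "fastqc", "multiqc", "python3",
    "kaiju", "spades.py", "megahit", "prokka", "cd-hit", "samtools", "bedtools",
]


def find_missing(tools, which=shutil.which):
    """Return the tools that cannot be found on $PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Missing from $PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on $PATH.")
```

The `which` argument is only there so the lookup can be replaced in tests; in normal use the default is enough.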
## Installation
## Configuration
### Configuration profiles
A configuration file ([nextflow.config](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/blob/dev/nextflow.config)) has been developed to run the pipeline on a local machine or on a SLURM cluster.
To use these configurations, run the pipeline as follows:
* `nextflow run main.nf -profile standard` runs metagWGS on a local machine.
* `nextflow run main.nf -profile cluster_slurm` runs metagWGS on a SLURM cluster.
### Reproducibility with a Singularity container
A [Singularity](https://sylabs.io/docs/) container will soon be available to run the metagWGS pipeline.
### Basic usage
A basic command line running the pipeline is:
```
nextflow run -profile [standard or cluster_slurm] main.nf --reads '*_{R1,R2}.fastq.gz' --assembly [metaspades or megahit]
```
`'*_{R1,R2}.fastq.gz'` runs the pipeline on all the files in your working directory whose names end in `_R1.fastq.gz` or `_R2.fastq.gz`.
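
To preview which files such a pattern would pick up, the following sketch pairs file names by sample prefix. It assumes the pipeline groups reads the way Nextflow's `Channel.fromFilePairs` does, that is by the part of the name before `_R1`/`_R2`. `pair_reads` is a hypothetical helper, not part of metagWGS.

```python
import re
from collections import defaultdict

# Mirrors the glob '*_{R1,R2}.fastq.gz': any prefix, then _R1 or _R2.
READ_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>R1|R2)\.fastq\.gz$")


def pair_reads(filenames):
    """Group file names into {sample: (R1 file, R2 file)}, skipping incomplete pairs."""
    mates = defaultdict(dict)
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match:
            mates[match.group("sample")][match.group("mate")] = name
    return {
        sample: (found["R1"], found["R2"])
        for sample, found in sorted(mates.items())
        if "R1" in found and "R2" in found
    }


if __name__ == "__main__":
    import os

    # List the complete pairs found in the current working directory.
    for sample, (r1, r2) in pair_reads(os.listdir(".")).items():
        print(sample, r1, r2)
```

Samples with only one of the two mates are silently skipped here, which makes missing files easy to spot by comparing the output with the expected sample list.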
### Other parameters
Other parameters are available and can be used:

Mode:

* `--mode`: paired-end (`'pe'`) or single-end (`'se'`) reads. Default: `'pe'`. Single-end mode has not been developed yet.

Trimming options:

* `--adapter1`: sequence of adapter 1. Default: Illumina TruSeq adapter.
* `--adapter2`: sequence of adapter 2. Default: Illumina TruSeq adapter.

Quality options:

* `--qualityType`: Sickle supports three types of quality values: Illumina, Solexa, and Sanger. Default: `'sanger'`.

Alignment options:

* `--db_alignment`: alignment database.

Taxonomic classification options:

* `--kaiju_nodes`: file `nodes.dmp` built with kaiju-makedb.
* `--kaiju_db`: file `kaiju_db_refseq.fmi` built with kaiju-makedb.
* `--kaiju_names`: file `names.dmp` built with kaiju-makedb.

Other options:

* `--outdir`: the output directory where the results will be saved.
* `--help`: show the help message and exit.
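
The options above can also be assembled programmatically, which helps catch an invalid value before Nextflow starts. This is a minimal sketch: `build_command` and `to_shell` are hypothetical helpers, and the accepted values are taken from this README (the lowercase spellings of the quality types are assumed from the documented default `'sanger'`).

```python
import shlex

ASSEMBLERS = {"metaspades", "megahit"}
PROFILES = {"standard", "cluster_slurm"}
QUALITY_TYPES = {"illumina", "solexa", "sanger"}  # lowercase spelling is an assumption


def build_command(reads, assembly, profile="standard", quality_type="sanger", outdir=None):
    """Return the argument list for a metagWGS run, validating the documented choices."""
    if assembly not in ASSEMBLERS:
        raise ValueError("assembly must be one of %s" % sorted(ASSEMBLERS))
    if profile not in PROFILES:
        raise ValueError("profile must be one of %s" % sorted(PROFILES))
    if quality_type not in QUALITY_TYPES:
        raise ValueError("quality_type must be one of %s" % sorted(QUALITY_TYPES))
    cmd = [
        "nextflow", "run", "-profile", profile, "main.nf",
        "--reads", reads,
        "--assembly", assembly,
        "--qualityType", quality_type,
    ]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    return cmd


def to_shell(cmd):
    """Quote each argument so the shell passes the glob to Nextflow unexpanded."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    print(to_shell(build_command("*_{R1,R2}.fastq.gz", "megahit")))
```

The list returned by `build_command` can be passed directly to `subprocess.run`, in which case no shell quoting is needed; `to_shell` is only useful for printing a command to copy into a terminal.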
### Generated files
The pipeline will create the following files in your working directory:
```
work            # Directory containing the Nextflow working files
results         # Directory containing result files
.nextflow.log   # Log file from Nextflow
                # Other Nextflow hidden files, e.g. history of pipeline runs and old logs.
```
metagWGS is distributed under the GNU General Public License v3.