# **metagWGS**

# Introduction

**metagWGS** is a Nextflow bioinformatics pipeline for the analysis of metagenomic shotgun sequencing data.

The workflow processes raw reads from FastQ inputs: it checks read quality (FastQC and MultiQC), trims adapter sequences and cleans reads (Cutadapt, Sickle), removes contaminants (BWA-MEM), among other steps. See the output documentation for more details on the results.

The pipeline is built using [Nextflow](https://www.nextflow.io/docs/latest/index.html#), a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It will come with a Singularity container, making installation trivial and results highly reproducible.

## Prerequisites
All of the following tools must be installed and placed in a directory listed in your `$PATH`:

* [Nextflow](https://www.nextflow.io/docs/latest/index.html#) v19.01.0
* [Cutadapt](https://cutadapt.readthedocs.io/en/stable/#) v1.15
* [Sickle](https://github.com/najoshi/sickle) v1.33
* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) v0.11.7
* [MultiQC](https://multiqc.info/) v1.5
* [BWA](http://bio-bwa.sourceforge.net/) v0.7.17
* [Python](https://www.python.org/) v3.6.3
* [Kaiju](https://github.com/bioinformatics-centre/kaiju) v1.7.0
* [SPAdes](https://github.com/ablab/spades) v3.11.1
* [megahit](https://github.com/voutcn/megahit) v1.1.3
* [prokka](https://github.com/tseemann/prokka) v1.13.4 - WARNING: always use the latest release
* [cdhit](http://weizhongli-lab.org/cd-hit/) v4.6.8
* [samtools](http://www.htslib.org/) v0.1.19
* [bedtools](https://bedtools.readthedocs.io/en/latest/) v2.27.1
* [subread](http://subread.sourceforge.net/) v1.6.0
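
Before launching the pipeline, it can help to confirm that each tool really is reachable from your `$PATH`. The sketch below is one way to do that with the shell built-in `command -v`. The executable names are assumptions (for example `spades.py` for SPAdes, `cd-hit` for cdhit and `featureCounts` for subread) and may differ on your system; the helper `missing_tools` is hypothetical and not part of metagWGS. It also does not check versions.

```bash
#!/usr/bin/env bash
# Print the name of every requested executable that cannot be found in $PATH.
# NOTE: the names passed at the bottom are assumptions; adapt them to your install.
missing_tools() {
    local tool
    for tool in "$@"; do
        # `command -v` succeeds only when the name resolves to something runnable.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing_tools nextflow cutadapt sickle fastqc multiqc bwa python3 kaiju \
    spades.py megahit prokka cd-hit samtools bedtools featureCounts
```

An empty output means every name was found; any name printed still needs to be installed or added to `$PATH`.
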
## Installation

## Configuration

### Configuration profiles

A configuration file ([nextflow.config](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/blob/dev/nextflow.config)) has been developed to run the pipeline either on a local machine or on a SLURM cluster.

To use these configurations, run the pipeline as follows:
* `nextflow run main.nf -profile standard` runs metagWGS on a local machine.
* `nextflow run main.nf -profile cluster_slurm` runs metagWGS on a SLURM cluster.

### Reproducibility with a Singularity container

A [Singularity](https://sylabs.io/docs/) container will soon be available to run the metagWGS pipeline.

## Usage

### Basic usage

A basic command line running the pipeline is:

```bash
nextflow run -profile [standard or cluster_slurm] main.nf --reads '*_{R1,R2}.fastq.gz' --assembly [metaspades or megahit]
```

`'*_{R1,R2}.fastq.gz'` runs the pipeline on all the `_R1.fastq.gz` and `_R2.fastq.gz` files in your working directory.
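
The pattern is quoted so that the shell passes it to Nextflow unchanged instead of expanding it itself. To preview which samples such a pattern would pick up, a small shell helper can list every sample that has both an R1 and an R2 file. This is only a sketch: `list_read_pairs` is a hypothetical helper, not part of metagWGS, and it assumes file names of the form `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`.

```bash
#!/usr/bin/env bash
# Print the sample prefix of every complete R1/R2 pair found in a directory.
# Hypothetical helper for previewing inputs; the pipeline does its own pairing.
list_read_pairs() {
    local dir="${1:-.}" r1 sample
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        sample="${r1##*/}"                  # drop the directory part
        sample="${sample%_R1.fastq.gz}"     # drop the read suffix
        if [ -e "$dir/${sample}_R2.fastq.gz" ]; then
            printf '%s\n' "$sample"
        else
            printf 'warning: %s has no R2 file\n' "$sample" >&2
        fi
    done
}

list_read_pairs .
```

Samples with a missing mate are reported on standard error, so the standard output contains only complete pairs.
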

### Other parameters

Other parameters are available and can be used:

    Mode:
      --mode                        Paired-end ('pe') or single-end ('se') reads. Default: 'pe'. Single-end mode has not been developed yet.

    Trimming options:

      --adapter1                    Sequence of adapter 1. Default: Illumina TruSeq adapter.
      --adapter2                    Sequence of adapter 2. Default: Illumina TruSeq adapter.

    Quality options:
      --qualityType                 Sickle supports three types of quality values: Illumina, Solexa, and Sanger. Default: 'sanger'.

    Alignment options:
      --db_alignment                Alignment database.

    Taxonomic classification options:
      --kaiju_nodes                 File nodes.dmp built with kaiju-makedb.
      --kaiju_db                    File kaiju_db_refseq.fmi built with kaiju-makedb.
      --kaiju_names                 File names.dmp built with kaiju-makedb.

    Other options:
      --outdir                      The output directory where the results will be saved.
      --help                        Show this message and exit.
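
As an illustration of how these options combine, the sketch below assembles a full command line in a bash array and prints it without running anything. Every path is a placeholder, and the chosen values (`standard`, `megahit`, `sanger`, `results`) are just one possible combination, not recommended settings.

```bash
#!/usr/bin/env bash
# Assemble a metagWGS command line as an array so each argument stays intact.
# NOTE: every /path/to/... value is a placeholder, not a file shipped with the pipeline.
cmd=(
    nextflow run -profile standard main.nf
    --reads '*_{R1,R2}.fastq.gz'
    --assembly megahit
    --qualityType sanger
    --db_alignment /path/to/alignment_db
    --kaiju_nodes /path/to/kaijudb/nodes.dmp
    --kaiju_db /path/to/kaijudb/kaiju_db_refseq.fmi
    --kaiju_names /path/to/kaijudb/names.dmp
    --outdir results
)

# Dry run: print the command, shell-quoted, instead of executing it.
printf '%q ' "${cmd[@]}"
printf '\n'

# To launch the pipeline for real, run: "${cmd[@]}"
```

Keeping the arguments in an array means the quoted read pattern reaches Nextflow as a single, unexpanded argument.
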
      

### Generated files

The pipeline will create the following files in your working directory:

```
* work            # Directory containing the nextflow working files
* results         # Directory containing result files
* .nextflow.log   # Log file from Nextflow
*                 # Other Nextflow hidden files, e.g. history of pipeline runs and old logs.
```

## Contributing


## License
metagWGS is distributed under the GNU General Public License v3.

## Copyright

2019 INRA

## Citation

metagWGS has not been published yet.