BIOINFORMATICS
 

MASSIVE PARALLEL SEQUENCING ANALYSES 

Most NGS technologies generate data in the form of reads, gathered in a file (e.g. FASTQ for Illumina HiSeq, XSQ for Life Technologies SOLiD). These “raw” data are of little use until they are processed by a series of algorithms into “analyzable” data (e.g. variant calls from exome sequences). We call this process “tier-1 analysis”; it consists of a software pipeline that applies several treatments to the raw data. For example, if you have performed genome, exome or gene panel sequencing, your raw data will be one or more FASTQ files. A typical tier-1 pipeline aligns the reads to a reference genome, cleans the resulting alignment (e.g. by removing duplicate reads, local realignment and base quality score recalibration) and calls variants. The resulting files include an alignment (BAM file) and a set of variant calls (VCF file), which users can analyze further.
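
To give an idea of what “raw” reads look like, here is a minimal Python sketch that parses FASTQ records and computes a mean base quality per read. It is an illustration only, not part of our pipeline: the function names and the example reads are hypothetical, and it assumes the common four-lines-per-record layout with Phred+33 quality encoding.

```python
from io import StringIO
from statistics import mean


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a FASTQ stream.

    Assumes exactly four lines per record; stops at end of file
    (or at the first empty line).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return mean(ord(char) - offset for char in quality)


# Two made-up reads, for illustration only.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n"
records = list(read_fastq(StringIO(EXAMPLE)))
for read_id, sequence, quality in records:
    print(read_id, sequence, mean_phred(quality))
```

A real tier-1 pipeline hands such reads to an aligner rather than inspecting them one by one; the sketch only shows the structure of the input.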

Because NGS produces very large amounts of data, analyzing variant calls is not trivial. For this “tier-2 analysis”, bioinformatics support is generally essential, and most tools are too complex for non-informaticians. For this reason, we developed Highlander, which allows biologists to analyze their NGS data via a user-friendly graphical interface. Please check the project home page for further information.
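
As a rough idea of what a tier-2 step involves, the Python sketch below reads the fixed columns of VCF data lines and keeps only variants that pass a quality threshold. This is not how Highlander works internally (Highlander is a database-backed graphical application); the function names, the threshold and the example variants are all hypothetical.

```python
def parse_vcf_line(line):
    """Parse the first seven fixed columns of a tab-separated VCF data line."""
    chrom, pos, _id, ref, alt, qual, filt = line.rstrip("\n").split("\t")[:7]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
    }


def passing_variants(lines, min_qual=30.0):
    """Yield variants whose FILTER is PASS (or unset) and QUAL >= min_qual."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        variant = parse_vcf_line(line)
        if variant["filter"] not in ("PASS", "."):
            continue
        if variant["qual"] is not None and variant["qual"] >= min_qual:
            yield variant


# Made-up VCF content, for illustration only.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tG\t50.0\tPASS\t.",
    "chr1\t22222\t.\tC\tT\t12.5\tPASS\t.",
    "chr2\t33333\t.\tG\tA,C\t99.0\tLowQual\t.",
]
kept = list(passing_variants(EXAMPLE_VCF))
for variant in kept:
    print(variant["chrom"], variant["pos"], variant["ref"], variant["alt"])
```

Real analyses combine many more criteria (annotations, allele frequencies, inheritance models), which is where a dedicated tool becomes necessary.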


CLUSTER

The platform includes a high-performance cluster with BigData capabilities, which can run any kind of NGS analysis and host the Highlander databases. Here is the detailed configuration:

1 Machine "User"

The machine dedicated to users has 64 GB of RAM, 8 cores (16 threads) and 12 TB of local RAID 1 storage, and is equipped with one NVIDIA GTX 1070 GPU so that users can test their jobs locally before submitting them to the cluster. The local storage makes it easier for researchers to transfer data that should be analysed or stored directly on the cluster. Software widely used in genomics, such as RStudio, can also be deployed here easily before being replicated on every other machine. This allows MapReduce or Spark applications to be launched quickly, without the complications of packaging all the dependencies of a particular piece of software.
The configuration is:

  • RAM: 2x 32 GB DDR4 ECC CL17 PC4-2400
  • CPU: 1x Xeon Silver 4108 - 8 Cores - 16 Threads @ 1.8 GHz
  • SSD Data/OS: 1x SSD M.2 PCIe 128 GB
  • HDD Data: 4x HDD Western Digital WD60EFRX 3.5" 6 TB
  • 1 TB SSD in RAID 0, dedicated to MySQL

10 Machines "Node" BigData

"Worker" machines are used for all the services offered and represent the main workforce of the cluster. Their configuration maximizes the number of cores and the amount of RAM and HDD/SSHD storage. Hard drives are used for HDFS, while SSHDs are dedicated to services such as Cassandra and ElasticSearch. Together, these machines provide 320 cores (640 threads), 1.2 TB of RAM, 1.1 PB of storage and 10 GeForce GTX 1070 Ti GPUs. Storage is made available as a single volume of ~1 PB through GlusterFS on ZFS (RAID 5).
The configuration of one machine is:

  • RAM: 16x 8 GB DDR4 ECC
  • CPU: 2x AMD Epyc 7281 16 Cores @ 2.1 GHz
  • SSD Data/OS: 2x 2 TB Seagate FireCuda 2.5" SSHD
  • HDD Data: 12x HDD Seagate ST10000NM0016 10 TB 3.5" PMR Enterprise, 2.5M hours MTBF
  • GPU: 1x MSI Aero GeForce GTX 1070 Ti Blower 8G
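
The aggregate figures for the ten "Node" machines can be derived from the per-machine configuration above. The short Python sketch below does that arithmetic; the class and field names are hypothetical, and the thread count assumes simultaneous multithreading with two threads per core. The storage figure is raw hard-drive capacity, before any RAID or file-system overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    ram_modules: int
    ram_module_gb: int
    cpus: int
    cores_per_cpu: int
    threads_per_core: int  # assumption: SMT, 2 threads per core
    data_disks: int
    data_disk_tb: int
    gpus: int


# Values taken from the per-machine list above.
NODE = NodeSpec(
    ram_modules=16, ram_module_gb=8,
    cpus=2, cores_per_cpu=16, threads_per_core=2,
    data_disks=12, data_disk_tb=10,
    gpus=1,
)
NODE_COUNT = 10


def cluster_totals(spec, count):
    """Sum the resources of `count` identical machines."""
    cores = spec.cpus * spec.cores_per_cpu * count
    return {
        "cores": cores,
        "threads": cores * spec.threads_per_core,
        "ram_gb": spec.ram_modules * spec.ram_module_gb * count,
        "raw_hdd_tb": spec.data_disks * spec.data_disk_tb * count,
        "gpus": spec.gpus * count,
    }


totals = cluster_totals(NODE, NODE_COUNT)
print(totals)
```

The usable capacity exposed through GlusterFS is lower than the raw total because part of the disk space is spent on redundancy.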

2 Machines "Compute" HPC

The main purpose of these machines is to run classical HPC analyses that do not use the BigData architecture. Together they provide 128 cores, 512 GB of RAM and 8 TB of storage.
The configuration of one machine is:

  • RAM: 256 GB DDR3 ECC
  • CPU: AMD 64 cores
  • HDD: 4 TB

2 Machines "Scratch storage"

These machines are mainly dedicated to scratch storage (for running jobs only) and offer a total of ~35 TB through GlusterFS on ZFS.
The configuration of one machine is:

  • RAM: 24GB 1333MHz DDR3
  • CPU: Dual Xeon E5520 2.26GHz
  • HDD: approx. 33TB of useable RAID storage

Services and software

  • Distribution: Ubuntu 16.04 Linux
  • Provisioning: MAAS
  • GlusterFS
  • Scheduler: Slurm
  • Stack: Hadoop
  • Cassandra
  • ElasticSearch & Kibana

Please visit the cluster wiki for more information.
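
For users with personal cluster access, jobs go through the Slurm scheduler. As a hedged illustration, the Python sketch below renders a Slurm batch script from a few resource requests. The helper name, default values and example command are hypothetical; the `#SBATCH` options used are standard Slurm options, but partition names, limits and GPU (gres) configuration are specific to each cluster and are not shown here.

```python
def render_sbatch(job_name, command, cpus=4, mem_gb=16,
                  time_limit="02:00:00", gpus=0):
    """Return the text of a Slurm batch script for a single command."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus:
        # Only meaningful if GPUs are declared as a generic resource.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines += ["", command]
    return "\n".join(lines) + "\n"


# Hypothetical job: the command is a placeholder, not a real pipeline step.
script = render_sbatch("demo-job", "echo 'hello from the cluster'", cpus=8)
print(script)
```

The resulting text would typically be saved to a file and submitted with `sbatch`.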

 

AVAILABLE SERVICES 

We can provide:

  • Tier-1 analysis of raw NGS data through our pipeline (e.g. alignment, data cleaning and variant calling on genome/exome/panel data, generating BAM and VCF files from FASTQ files).
  • Tier-1 analysis of RNA-Seq data.
  • Tier-2 analysis of NGS data, through our in-house software Highlander.
  • Other NGS-related analyses are possible.
  • Access to the cluster.

All analyses are performed on our cluster by our bioinformatician (see contact below), but personal cluster access can also be granted. Please contact us for any collaboration.


PRICING

  • Highlander access for genomic data (WGS, WES, panels): 40€ per sample for UCLouvain members, 125€ per sample for externals (25% PAFG included, VAT not included).
    This fee includes download/upload of raw data to the cluster, Tier-1 analysis to generate BAM, VCF and QC files, import into the Highlander database, and unlimited access to the software for Tier-2 analysis.
    Note that a consultancy fee (see below) may be added if your data require custom adjustments to the pipeline that will benefit only you (e.g. setting up a specific reference genome).
  • Highlander access for RNA-Seq data: 5€ per sample for UCLouvain members, 15€ per sample for externals (25% PAFG included, VAT not included).
    This fee includes download/upload of raw data to the cluster, Tier-1 analysis to generate BAM, VCF and QC files, import into the Highlander database, and unlimited access to the software for variant analysis (Highlander does not support other transcriptomic analyses such as gene expression quantification).
    Note that a consultancy fee (see below) may be added if your data require custom adjustments to the pipeline that will benefit only you (e.g. setting up a specific reference genome).
  • Bioinformatics consultancy: 60€ per hour for de Duve members, 75€ per hour for UCLouvain members, 190€ per hour for externals (25% PAFG included, VAT not included).
    It includes:
    • Development of custom scripts or software
    • Data transfer (download of external data to the cluster, upload of internal data to public repository, ...)
    • Data quality checking 
    • Training (e.g. use of Highlander)
    • Custom data analysis

Alternatively, a collaboration can be set up without a fee; please contact us if you are interested.
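
To help estimate a budget, the fees listed above can be combined as in the following Python sketch. It is only an illustration: the function and key names are hypothetical, amounts are in euros excluding VAT, and the actual invoice may differ (for example if a consultancy fee applies).

```python
# Per-sample Highlander fees, in euros (VAT not included).
HIGHLANDER_FEE_EUR = {
    "genomic": {"uclouvain": 40, "external": 125},
    "rnaseq": {"uclouvain": 5, "external": 15},
}

# Hourly consultancy fees, in euros (VAT not included).
CONSULTANCY_EUR_PER_HOUR = {"de_duve": 60, "uclouvain": 75, "external": 190}


def highlander_cost(data_type, affiliation, samples):
    """Cost of Highlander access for a number of samples."""
    return HIGHLANDER_FEE_EUR[data_type][affiliation] * samples


def consultancy_cost(affiliation, hours):
    """Cost of bioinformatics consultancy for a number of hours."""
    return CONSULTANCY_EUR_PER_HOUR[affiliation] * hours


# Example: 24 exomes plus 2 hours of consultancy for a UCLouvain member.
total = highlander_cost("genomic", "uclouvain", 24) + consultancy_cost("uclouvain", 2)
print(f"Estimated total: {total} EUR (excl. VAT)")
```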

 

CONTACT AND INFORMATION 

Raphaël HELAERS, Ph.D.
Laboratory of Human Molecular Genetics
de Duve Institute, 74 (5th floor)
02/764.74.53
Email: raphael.helaers@uclouvain.be

 

USEFUL LINKS