Effects of 454 Sequencing Error Correction on HIV-1 Diversity

Effects of 454 Sequencing Error Correction on HIV-1 Diversity. M. Cristina Rodríguez 1 , Maria Casadellà 1 , Christian Pou 1 , Eloisa Yuste 3 , Víctor Sánchez-Merino 3 , Bonaventura Clotet 1,2 , Roger Paredes 1,2, Marc Noguera-Julian 1 1 Institut de Recerca de la SIDA IrsiCaixa-HIVACAT, 2 Unitat VIH, Hospital Universitari Germans Trias i Pujol, Badalona, Spain . 3 Hospital Clinic Barcelona, Spain. Presented at the 10th EU. Meeting on HIV & Hepatitis, 28 - 30 March 2012, Barcelona, Spain


Page 1: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

Effects of 454 Sequencing Error Correction on HIV-1 Diversity

M. Cristina Rodríguez1, Maria Casadellà1, Christian Pou1, Eloisa Yuste3, Víctor Sánchez-Merino3, Bonaventura Clotet1,2, Roger Paredes1,2, Marc Noguera-Julian1

1Institut de Recerca de la SIDA IrsiCaixa-HIVACAT, 2Unitat VIH, Hospital Universitari Germans Trias i Pujol, Badalona, Spain. 3Hospital Clinic Barcelona, Spain.

Presented at the 10th EU. Meeting on HIV & Hepatitis, 28 - 30 March 2012, Barcelona, Spain

Page 2: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

Background

454 sequencing (454 Life Sciences/Roche Diagnostics) can be used to detect low-frequency variants within an HIV-1 viral population. Its sensitivity is compromised by sequencing error.

Several bioinformatic tools use clustering-based algorithms to correct sequencing errors in 454 output. Amplicon Variant Analyzer (AVA) software (454 Life Sciences) and AmpliconNoise produce a set of consensus reads; ShoRAH yields a set of individually corrected reads.


Page 3: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

Materials & Methods

• We generated a low-diversity viral mix by cloning the full-length env gene from HIV-1 strain AC10 into a pNL4.3 context and generating a library of randomly mutated AC10 envelopes by a PCR-based method.
• We used three amplicons: amp_1, amp_3 and amp_4. Amplicons 3 and 4 overlap between positions 1595 and 1620.
• Single-amplicon (425 bp) sequences were obtained by 454 sequencing of single-strain AC10 and mutated AC10 (AC10-Mut).


Page 4: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

Amplicon Design

[Figure: amplicon design showing the three amplicons Amp_1, Amp_3 and Amp_4.]


Page 5: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

[Figure: data-processing workflow. Box labels as extracted (the connections between boxes are not recoverable): 454; SFF; Contamination Filter (pNL4.3); DeMult. & TRIM; AmpliconNoise; AVA software; Variants; SFF; In-house scripts; FASTA; Alignment (Mosaik, BWA); CLEAN(FF); CLEAN(SEQ); CLEAN(SEQ); ShoRAH; OUT.]

Filtering (CHREM):
• Homology ≥ 80% relative to AC10
• Read length within ±15% of the reference
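The two filtering criteria in the workflow can be sketched in a few lines of Python. This is an illustration only, not the in-house scripts used in the study: the `AlignedRead` type, the function names and the definition of homology as identity over gap-free aligned columns are all assumptions made for the example. Only the two thresholds (80% and ±15%) come from the slide.

```python
from dataclasses import dataclass

MIN_IDENTITY = 0.80      # homology threshold relative to AC10 (from the slide)
LENGTH_TOLERANCE = 0.15  # allowed read-length deviation from the reference (from the slide)


@dataclass
class AlignedRead:
    """Hypothetical container for one pairwise alignment against the AC10 reference."""
    name: str
    read: str       # read sequence in alignment coordinates, '-' marks a gap
    reference: str  # reference segment in the same coordinates, '-' marks a gap


def identity(aln: AlignedRead) -> float:
    """Fraction of gap-free aligned columns where read and reference agree (assumed definition)."""
    pairs = [(r, q) for r, q in zip(aln.reference, aln.read) if r != "-" and q != "-"]
    if not pairs:
        return 0.0
    return sum(r == q for r, q in pairs) / len(pairs)


def length_ok(aln: AlignedRead) -> bool:
    """True if the ungapped read length is within the tolerance of the ungapped reference length."""
    read_len = len(aln.read.replace("-", ""))
    ref_len = len(aln.reference.replace("-", ""))
    return abs(read_len - ref_len) <= LENGTH_TOLERANCE * ref_len


def passes_filter(aln: AlignedRead) -> bool:
    """Apply both criteria: identity and read length."""
    return identity(aln) >= MIN_IDENTITY and length_ok(aln)


if __name__ == "__main__":
    reads = [
        AlignedRead("full_match", "ACGTACGTAC", "ACGTACGTAC"),
        AlignedRead("truncated", "ACGTAC----", "ACGTACGTAC"),
        AlignedRead("divergent", "TTTTACGTAC", "ACGTACGTAC"),
    ]
    for aln in reads:
        print(aln.name, passes_filter(aln))
```

In practice the alignments would come from the aligner output (Mosaik or BWA in the workflow) rather than from hand-written strings, and the identity measure should be matched to whatever the actual pipeline reports.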


Page 6: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

• Only substitution variants were considered.
• The mutation rate for unprocessed data did not exceed 3%.
• The frequency of all non-reference nucleotide variants was measured on a per-position basis.
• Raw signal was divided into five categories (<0.5, <1.0, <1.5, <2, ≥2), and the percentage of signal reduction was calculated for each category.
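The per-position measurement and the category-wise signal reduction described above could be computed along the following lines. This is a minimal sketch under stated assumptions, not the authors' code: reads are assumed to be already aligned to the reference with substitutions only, the five categories are read as consecutive bins of raw frequency in percent, and signal reduction is assumed to mean 100 × (raw − corrected) / raw. All function names are hypothetical.

```python
from typing import Dict, List, Optional

# Upper edges of the first four raw-signal categories, in percent (from the slide).
CATEGORY_EDGES = [0.5, 1.0, 1.5, 2.0]
CATEGORY_LABELS = ["<0.5", "<1.0", "<1.5", "<2", ">=2"]


def variant_frequency(reads: List[str], reference: str) -> List[float]:
    """Percent of reads with a non-reference base at each reference position.

    Assumes every read has the same length as the reference (substitutions only);
    characters other than A, C, G, T are ignored at that position.
    """
    freqs = []
    for pos, ref_base in enumerate(reference):
        column = [read[pos] for read in reads if read[pos] in "ACGT"]
        if not column:
            freqs.append(0.0)
            continue
        non_ref = sum(base != ref_base for base in column)
        freqs.append(100.0 * non_ref / len(column))
    return freqs


def category(raw_percent: float) -> str:
    """Assign a raw frequency to one of the five bins (assumed to be non-overlapping)."""
    for edge, label in zip(CATEGORY_EDGES, CATEGORY_LABELS):
        if raw_percent < edge:
            return label
    return CATEGORY_LABELS[-1]


def signal_reduction(raw: float, corrected: float) -> Optional[float]:
    """Percent of raw signal removed by correction; None where there was no raw signal."""
    if raw == 0:
        return None
    return 100.0 * (raw - corrected) / raw


def mean_reduction_by_category(
    raw_freqs: List[float], corrected_freqs: List[float]
) -> Dict[str, Optional[float]]:
    """Average signal reduction over the positions that fall in each raw-signal category."""
    buckets: Dict[str, List[float]] = {label: [] for label in CATEGORY_LABELS}
    for raw, corrected in zip(raw_freqs, corrected_freqs):
        reduction = signal_reduction(raw, corrected)
        if reduction is not None:
            buckets[category(raw)].append(reduction)
    return {
        label: (sum(values) / len(values) if values else None)
        for label, values in buckets.items()
    }


if __name__ == "__main__":
    reference = "ACGT"
    raw_reads = ["ACGT", "ACGT", "ACGA", "TCGT"]
    print(variant_frequency(raw_reads, reference))
```

Running `mean_reduction_by_category` once per tool (AVA, ShoRAH, AmpliconNoise) on the same raw frequencies would give one value per category and tool, which is the kind of quantity the signal reduction plots summarize.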


Page 7: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

RESULTS

[Figure: AC10 raw data. x-axis: nucleotide position (254 to 1939), with regions labeled Ampl_1 and Ampl_3_4; y-axis: percentage (0 to 5).]

Nucleotide variant frequency from AC10 raw sequence data.

Random Mutagenesis


Page 8: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

[Figure: AC10_mut raw data. x-axis: nucleotide position (254 to 1939), with regions labeled Ampl_1 and Ampl_3_4; y-axis: percentage (0 to 5).]

Nucleotide variant frequency corresponding to raw sequence data was higher in AC10-Mut than in AC10.


Page 9: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

When AC10 data were processed, sequencing error was efficiently removed by AVA, ShoRAH, AmpliconNoise(S) and AmpliconNoise(F).


Page 10: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

Signal reduction plots for each raw signal category


Page 11: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

Conclusions

• While data cleaning methods are needed to efficiently correct for sequencing error, applying these tools to real sequence data may also result in the loss of true diversity.
• Low-diversity positive controls would help fine-tune error correction tools so that they selectively remove sequencing error while preserving true low-level signal.


Page 12: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

Acknowledgements


Page 13: Effects of 454 Sequencing Error Correction on HIV-1 Diversity

Molecular Epidemiology group, IrsiCaixa.
