Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored...

55
1 Lectures 20, 21: Singlecell Sequencing and Assembly Spring 2017 April 20,25, 2017

Transcript of Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored...

Page 1: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

1

Lectures  20,  21:  Single-­‐cell  Sequencing  and  Assembly  

Spring  2017  April  20,25,  2017  

Page 2: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

SINGLE-CELL SEQUENCING AND ASSEMBLY

2

Page 3: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Single-cell Sequencing �  Motivation:

◦  Vast majority of environmental bacteria are unculturable outside of their natural habitat.

◦  Cell culture may distort genomic information, e.g. cancerous cells.

�  Metagenomics:

◦  Superimposed sequencing of mixed cells of different species in one pool.

�  Single-cell genomics: sequencing of one DNA molecule from one cell.

3

Page 4: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Single Cell Genome Sequencing

4

Start with a single copy of genome.

Fragment the amplified DNA and sequence reads at both ends.

Amplify (copy only) the genome using multiple displacement amplification (MDA) technique.

F.B. Dean ,et al., PNAS (2002) 99(8): 5261-6

Page 5: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Multiple Displacement Amplification Video https://www.youtube.com/watch?v=CaFq9cnfTZI

5

Page 6: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Sequencing Coverage Normal multicell vs. single cell

Green regions are blackout

H. Chitsaz, et al., Nature Biotech 29(10): 915 – 921, Oct 2011

6

Number of reads: ~28 million, read length: 100 bp, genome size: 4.6 Mbp, coverage: ~600x

Page 7: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Distribution of Coverage

7

A cutoff threshold will eliminate about 25% of valid data in the single cell case, whereas it eliminates noise in the normal multicell case. H. Chitsaz, et al., Nature Biotech 29(10): 915 – 921, Oct 2011

Page 8: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Rescuing Low Coverage Contigs A quick example

8

We remove the lowest coverage contig, in blue.

Page 9: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Rescuing Low Coverage Contigs After error removal

9

Merged Contig. Coverage = 9

Page 10: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Our assembly algorithm

(a) EULER-SR error correction

(b) Velvet-SC assembly algorithm 1-7: Same as Velvet assembly algorithm.

8: for i =2 to cutoff do 9: Remove vertices with average coverage <

10: Clip tips of graph. 11: Correct graph by the Tour Bus algorithm. 12: Resolve repeats using read pairing. 13: Condense graph by merging 1-in1-out vertices. 14: end for 15: Return vertices of graph as contigs.

Velvet assembly algorithm 1: Build a roadmap rdmap from R by indexing all

k-mers. 2: Build a de Bruijn pregraph pg from rdmap. 3: Clip tips of pg. 4: Build a graph from pg by threading R. 5: Condense graph by merging 1-in 1-out vertices. 6: Clip tips of graph. 7: Correct graph by the Tour Bus algorithm. 8: Remove vertices with average coverage < 9: Clip tips of graph.

10: Correct graph by the Tour Bus algorithm. 11: Resolve repeats using read pairing. 12: Condense graph by merging 1-in 1-out vertices. 13: Return vertices of graph as contigs.

Velvet vs. Velvet-SC

cutoff i

Daniel Zerbino and Ewan Birney, Genome Research 18: 821-829, 2008 Hamidreza Chitsaz, et al., Nature Biotech

29(10): 915 – 921, Oct 2011

10

Page 11: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

E. coli Assembly Results Assembler #

contigs NG50 (bp)

Known genes

Complete genes

EULER-SR Edena

SOAPdenovo Velvet

E+V-SC

1344 1592 1240 428 501

26662 3919 18468 22648 32051

4324 3178 2425 3021 3055 3753

NG50 = the contig length at which longer contigs represent half of the total genome length.

11

H. Chitsaz, et al., Nature Biotech 29(10): 915 – 921, Oct 2011

Page 12: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

New Genome Deltaproteobacteria single cell assembly results

Assembler # of contigs

N50 (bp)

Velvet 1856 11531 E+V-SC 823 30293

12

N50 = the contig length at which longer contigs represent half of the total assembly length.

Page 13: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

October 2011

13

Page 14: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Single-cell Assemblers �  E+Velvet-SC ◦  H. Chitsaz, et al., Nature Biotech 29(10): 915 – 921, Oct 2011.

�  SPAdes ◦  Anton Bankevich, et al., Journal of Computational Biology 19(5): 455 –

477, 2012.

�  IDBA-UD ◦  Y. Peng, et al., Bioinformatics 28(11): 1420 – 8, 2012.

14

Page 15: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Coverage Bias Not Sequence Specific

15

Page 16: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Combining Multiple MDAs �  Combining DNA from multiple identical single cells, before or

after amplification, reduces non-uniformity.

�  In practice, combining MDA from 6-12 identical cells gives very high quality assemblies.

�  It is hard to pick identical cells before sequencing. Chicken and egg problem.

16

Page 17: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Synergistic Co-assembly �  Our solution: co-assembly ◦  N. Movahedi, et al., IEEE BiBM 2012. ◦  M. Embree, et al., The ISME J. 2013. ◦  N. Movahedi, et al., BMC Genomics, under review.

�  Another application of co-assembly: variation detection ◦  Iqbal, Z., et al. De novo assembly and genotyping of variants using colored

de Bruijn graphs, Nat Genetics, 44, 226 – 232, 2012.

17

Page 18: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

SYNERGISTIC CO-ASSEMBLY

18

Page 19: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

HyDA Single Cell Co-Assembler �  Isolate a number of single cells that are likely to be of the same

species. But don’t worry, if they are not, our algorithm will tell you in the end.

�  Amplify and sequence each of them individually.

�  Assign a unique color to each read dataset.

�  Build a colored de Bruijn graph from the colored datasets. �  Iteratively remove errors, condense, and finally output contigs.

19

Iqbal, et al., Nature Genetics 44: 226–232, 2012. J. Simpson, Genome Informatics 2011.

Page 20: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Small Toy Example Shred reads into k-mers (k = 3)

G G A C T A A AG G A

G A CA C T

C T AT A A

A A A

GGA (1x) ‏

GAC (1x) ‏

ACT (1x) ‏

CTA (1x) ‏

TAA (1x) ‏

AAA (1x) ‏

G A C C A A A TG A C

A C CC C A

C A AA A A

A A T

P. Pevzner, J Biomol Struct Dyn (1989) 7:63–73 R. Idury, M. Waterman, J Comput Biol (1995) 2:291–306

20

GAC (1x) ‏

ACC (1x) ‏

CCA (1x) ‏

CAA (1x) ‏

AAA (1x) ‏

AAT (1x) ‏

Green Read Red Read

Page 21: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Small Toy Example Merge vertices labeled by identical k-mers

Green Read: Red Read:

Resulting Graph:

21

GGA (1x) ‏

GAC (1x) ‏

ACT (1x) ‏

CTA (1x) ‏

TAA (1x) ‏

AAA (1x) ‏

GAC (1x) ‏

ACC (1x) ‏

CCA (1x) ‏

CAA (1x) ‏

AAA (1x) ‏

AAT (1x) ‏

GGA (1x) ‏

GAC (1x) (1x) ‏

ACT (1x) ‏

CTA (1x) ‏

TAA (1x) ‏

AAA (1x) (1x) ‏

ACC (1x) ‏

CCA (1x) ‏

CAA (1x) ‏

AAT (1x) ‏

Page 22: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Co-assembly �  Condensation is done solely based on graph structure, ignoring colorings.

�  Maximum colored coverage is used to determine erroneous sequences.

22

Page 23: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Relationships between Co-assembled Sequences

Exclusive portion:

Exclusivity ratio:

Assembly size:

23

Page 24: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Alkane-Degrading Bacterial Community �  An alkane-degrading community enriched from sediment from a

hydrocarbon-contaminated ditch in Bremen, Germany.

�  Consists of 3 species: Anaerolinea (2 cells), Smithella (6 cells), and Syntrophus (2 cells), that have sophisticated metabolic interactions. They cannot be cultured.

�  Finished reference genome for a member of Anaerolinea and a member of Syntrophus is available.

24

In collaboration with Karsten Zengler and Mallory Embree at UCSD. M. Embree, et al., The ISME J., 2013 N. Movahedi, et al., BMC Genomics, under review

Page 25: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Co-assembly Results QUAST results, comparison with the state-of-the-art

25

HyDA   SPAdes  Total (bp)   N50   Total (bp)   N50  

Syntrophus  K05   1,265,548   3,782   869,586   3,128  

C04   465,091   1,928   390,923   4,234  

Smithella  

MEL13   1,590,259   6,977   1,415,399   10,475  

MEK03   1,945,701   5,952   1,960,722   11,372  

MEB10   1,569,709   5,887   1,514,813   8,861  

K19   840,236   7,295   653,866   3,834  

K04   720,188   5,239   618,500   9,332  

F16   1,323,536   6,088   982,263   5,366  

Anaerolinea  F02   1,352,341   8,201   1,698,195   5,944  

A17   260,386   850   169,413   1,187  

Page 26: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Co-assembly Results RAST functional elements

26

  HyDA   SPAdes  

  Coding sequence  

subsystem   Coding sequence  

subsystem  

Anaerolinea  A17   212   8   146   9  F02   1,283   122   1,653   153  

Smithella  

F16   1,197   117   899   91  K04   659   89   559   75  K19   757   82   581   54  

MEB10   1,491   151   1,504   156  MEK03   1,856   180   1,955   200  MEL13   1,535   165   1,435   154  

Syntrophus  C04   416   48   375   49  K05   1,216   121   873   68  

Page 27: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Co-assembly Results Exclusivity ratios (%)

27

  Anaerolinea   Smithella   Syntrophus  A17   F02   F16   K04   K19   MEB10   MEK03   MEL13   C04   K05  

Ana.   A17   0   24   87   95   96   80   82   86   22   19  F02   77   0   96   98   99   95   95   96   74   73  

Smi.   F16   96   96   0   73   73   37   22   38   96   55  K04   97   97   49   0   67   42   25   45   97   57  K19   98   98   54   68   0   35   32   32   98   58  MEB10   96   96   48   74   69   0   24   39   95   56  MEK03   97   97   49   73   74   38   0   37   96   61  MEL13   97   97   50   76   68   39   22   0   97   59  

Syn.   C04   44   39   89   96   97   85   86   90   0   64  K05   77   75   54   76   75   45   41   49   73   0  

Page 28: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

HyDA Outline 1.  Index all distinct k-mers, storing their multiplicities and

connections, in a hash table. Each hash node is a self-balancing tree.

2.  Construct the condensed de Bruijn graph.

3.  Iteratively remove low coverage vertices and recondense.

4.  Under development and future work: resolve repeats using long reads and mate pairs.

28

Page 29: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

�  Up to 2 billion reads for a vertebrate genome. Up to several billion vertices in the graph.

�  Highly entangled and complex graph.

�  Current tools require special large shared memory hardware.

�  Our goal: reduce the memory footprint so that assembly becomes accessible on publicly available cloud computing environments, e.g. Amazon EC2, etc.

29

HyDA Need for parallelization

Page 30: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

HyDA Distributing k-mers among processing nodes

�  Design a hash function that computes a processing node for every k-mer. More precisely, design

in which K is the set of k-mers and n is the number of processors.

�  It should provide balance between processing nodes.

�  It should not impose communication or memory overhead.

30

},...,2,1{: nKh →

Page 31: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Let be 1+(k-mer mod n).

31

Naïve Approach A quick example

},...,2,1{: nKh →

GATCCT GGATCC‏ ATCCTC

1 2 1 h(k-mer):

TCCTCA

2 Communication Communication Communication

Page 32: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

�  Try to assign adjacent k-mers to the same processing node, as much as possible.

�  It is impossible to expect all adjacent k-mers to be assigned to the same processing node.

�  Try to keep the balance between processing nodes, i.e. keep memory usage almost equal.

32

HyDA Approach

Page 33: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

G G A T C CG G

G AA T

T CC C

M. Roberts, W. Hayes, B. R. Hunt, S. M. Mount and J. A. Yorke, Bioinformatics (2004) 20 (18): 3363-3369

Minimizer

Minimizer = lexicographically minimum m-mer in the k-mer

Minimizer: Minimizer hash:

33

GATCCT GGATCC‏ ATCCTC

AT 1

AT 1

AT 1

TCCTCA

CA 2

Communication

Minimizers A quick example (k = 6, m = 2)

Page 34: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

One single cell few single cells the entire sample

�  Goal: capture every distinct genome present in a microbial sample, even if represented by only one (few) single cell(s).

�  Exploit sparsity: there are billions of cells, but only thousands distinct genomes/species. A lot of duplicate information.

34

Page 35: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

COMPRESSIVE SINGLE-CELL GENOMICS

35

Page 36: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Assumptions Automated microfluidic devices are envisioned that will be capable of high throughput:

�  isolation of every individual cell in the sample,

�  DNA extraction,

�  DNA amplification,

�  selective sampling and grouping (and potential barcoding) of amplicons.

36

Page 37: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Cell Isolation

37

Input microfluidics channel containing separated single cells

Microfluidics channel containing water droplets in oil

Page 38: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Capture One Cell per Microdroplet

38

Page 39: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Assumptions Automated microfluidic devices are envisioned that will be capable of high throughput:

�  isolation of every individual cell in the sample,

�  DNA extraction,

�  DNA amplification,

�  selective sampling and grouping (and potential barcoding) of amplicons.

39

Page 40: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Lysis and DNA Extraction

40

Page 41: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Assumptions Automated microfluidic devices are envisioned that will be capable of high throughput:

�  isolation of every individual cell in the sample,

�  DNA extraction,

�  DNA amplification,

�  selective sampling and grouping (and potential barcoding) of amplicons.

41

Page 42: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

DNA Amplification

42

Amplification kit Primers + Φ29

DNA template

Amplicons

Page 43: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Assumptions Automated microfluidic devices are envisioned that will be capable of high throughput:

�  isolation of every individual cell in the sample,

�  DNA extraction,

�  DNA amplification,

�  selective sampling and grouping (and potential barcoding) of amplicons.

43

Page 44: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Selective Sampling and Grouping

44

1 2 3 4 5

Sample from droplets 2 and 5 not the entire droplets

Prepare and send for DNA sequencing, e.g. Illumina HiSeq, etc.

Page 45: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Naïve Method

� Recall the goal: capture every distinct genome.

� Exhaustive sequencing and co-assembly of every single cell in the sample: amplify the genome of each cell, sequence, and co-assemble.

Not tractable, because of the cost and duration of sequencing.

45

Page 46: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Adaptive Divide-and-Conquer �  Objective: capture all the genomes while minimizing the cost

which depends on

◦  the total number of bases required to be sequenced,

◦  the number of pools (sampling-and-sequencing runs) needed.

�  Method: iteratively pool samples of amplicons from different cells, sequence, co-assemble, and compare.

46

Page 47: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Adaptive Divide-and-Conquer (Example)

47 Z. Taghavi, et al., Bioinformatics 29 (19): 2395-2401, 2013

Page 48: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Relationships between Co-assembled Sequences (reminder)

Exclusivity ratio:

Subsumption:

: assembly size

τ : an input parameter

48

Page 49: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Computational Challenges

�  Resource allocation – choosing the required sampling amount from each cell in each round.

�  Assembly-assembly comparison – hard to choose a good genome distinction sensitivity parameter τ. Parameters are interdependent.

49

Page 50: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Squeezambler �  A software to implement both naïve and adaptive divide-and-

conquer methods.

�  Squeezambler is available at

http://chitsazlab.org/software/squeezambler/

50 Z. Taghavi, et al., Bioinformatics 29 (19): 2395-2401, 2013

Page 51: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Simulation Results

�  We selected 9 distinct genomes (species) from human gut microbiome and created three scenarios for our simulation. DNA amplification (MDA) and sequencing were simulated by MDAsim and ART software packages.

�  MDAsim: A software to simulate MDA process

Z. Taghavi and Sorin Draghici, IEEE BIBM, 2012.

51

Page 52: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Simulation Results (cont. )

52

Page 53: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Simulation Results (cont. )

53

Page 54: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Conclusions:

�  The number of required barcodes with our adaptive divide-and-conquer algorithm is less than that required by the naïve approach,

�  The amount of sequencing needed remains the same or decreases.

54

Page 55: Lectures(20,(21:(Single3cell( Sequencing(and(Assembly(cs425/spring17/slides/... · Build a colored de Bruijn graph from the colored datasets. ! Iteratively remove errors, condense,

Assembly Scaffolding and Verification Using optical mapping

55

D. C. Schwartz, et al. Science (1993) 262.5130: 110-114

Source:  Wikipedia