
Structure of the bacteriophage T4 long tail fiber receptor-binding tip

Sergio G. Bartual (a,1), José M. Otero (a,b), Carmela Garcia-Doval (a), Antonio L. Llamas-Saiz (c), Richard Kahn (b), Gavin C. Fox (b,2), and Mark J. van Raaij (a,d,3,4)

(a) Departamento de Bioquimica y Biologia Molecular, Facultad de Farmacia, Campus Vida, Universidad de Santiago de Compostela, E-15782 Santiago de Compostela, Spain; (b) Laboratoire de Proteines Membranaires, Institut de Biologie Structurale Jean-Pierre Ebel, 41 rue Jules Horowitz, F-38027 Grenoble, France; (c) Unidad de Rayos X, Red de Infraestructuras de Apoyo a la Investigacion y al Desarrollo Tecnologico, Campus Vida, Universidad de Santiago de Compostela, E-15782 Santiago de Compostela, Spain; and (d) Departamento de Biologia Estructural, Instituto de Biologia Molecular de Barcelona, Consejo Superior de Investigaciones Cientificas, calle Baldiri Reixac 4, E-08028 Barcelona, Spain

Edited by Jonathan A. King, Massachusetts Institute of Technology, Cambridge, MA, and accepted by the Editorial Board October 1, 2010 (received for review July 29, 2010)

Bacteriophages are the most numerous organisms in the biosphere. In spite of their biological significance and the spectrum of potential applications, little high-resolution structural detail is available on their receptor-binding fibers. Here we present the crystal structure of the receptor-binding tip of the bacteriophage T4 long tail fiber, which is highly homologous to the tip of the bacteriophage lambda side tail fibers. This structure reveals an unusual elongated six-stranded antiparallel beta-strand needle domain containing seven iron ions coordinated by histidine residues arranged colinearly along the core of the biological unit. At the end of the tip, the three chains intertwine forming a broader head domain, which contains the putative receptor interaction site. The structure reveals a previously unknown beta-structured fibrous fold, provides insights into the remarkable stability of the fiber, and suggests a framework for mutations to expand or modulate receptor-binding specificity.

gene product 37 ∣ host cell attachment ∣ octahedral coordination ∣ viral fibers ∣ X-ray crystallography

Bacteriophages are exploited in an emergent array of applications including the typing of bacteria (1), peptide display (2), and experimental phage therapy (3–5). Bacteriophages have also been extensively studied as model systems for fundamental processes such as viral infection and replication, gene transfer, protein folding, and assembly. Escherichia coli bacteriophage T4 (6), a member of the Myoviridae family of the Caudovirales order, has an exclusively lytic lifecycle. Host recognition occurs through a reversible interaction of the tip of the long tail fibers with lipopolysaccharides or with the outer membrane porin protein C (7) (Fig. 1). Upon receptor binding, a recognition signal is sent to the baseplate (8–11), causing the short tail fibers to extend and irreversibly bind to the outer core region of the lipopolysaccharides (12). This binding is followed by contraction of the outer tail sheath (13, 14), penetration of the bacterial membrane by the hollow inner tail tube (Fig. 1B), and ejection of the viral DNA into the bacterium (15). The T4 long tail fibers are an assembly of four different proteins [gene product (gp) 34, gp35, gp36, and gp37; ref. 16] and can be separated into proximal and distal half-fiber segments of approximately 70 nm (17), hinged at an angle of around 160° (Fig. 1C). Proximal half-fibers are composed of trimers of gp34, followed by a monomer of gp35 forming the hinge or “kneecap,” whereas the distal half-fibers contain a trimer of gp36 (closest to gp35) and a trimer of gp37. Gene product 34 and gp37, as well as the short tail fiber protein gp12, need the chaperone gp57 for proper trimeric assembly; gp37 also requires gp38 (18). Eleven domains (D1–11) have been observed in the distal half-fiber by electron microscopy and D3–11 were assigned to gp37. Gene product 37 comprises 1,026 amino acids per monomer and D9 is predicted to start at residue 651, D10 at residue 803, and D11 at residue 926 (assuming that the protein chain is colinear with the fiber; ref. 17). Antibodies to gp37 inactivate T4 by blocking infection (19, 20). Experiments with hybrid phages suggest the receptor-binding region encompasses residues 907–996 (21).

Results

Protein Expression, Purification and Crystallization. His-tagged gp37(651–1026) was coexpressed with its chaperones gp38 and gp57 in a manner similar to that previously described for gp37(12–1026) (22) and purified by metal affinity chromatography and anion exchange chromatography. After mild heat treatment (similar to that performed for gp12; ref. 23), trypsinization yielded the stable fragment gp37(785–1026), which was purified by size exclusion chromatography. Manganese (II) chloride was included in the final step of the preparation as it was identified as a stabilizing agent, although no evidence for ordered manganese ions was found in the final structure. Average yields of purified gp37(651–1026) were around 1.7 mg per liter of bacterial culture before proteolysis and highly purified gp37(785–1026) was obtained in yields of 0.36 mg/L. A stable fragment of 24.6 kDa, gp37(785–1026), was identified by N-terminal sequence analysis and mass spectroscopic peptide fingerprinting. Clusters of small gp37(785–1026) crystals were grown by vapor diffusion from solutions containing polyethyleneglycol and sodium citrate at pH 5.0.

Fig. 1. Bacteriophage T4 and its long tail fibers. (A and B) Schematic representations of bacteriophage T4 attached to a bacterial membrane before (Left) and after (Right) contraction of the outer tail tube. The tip domain of gp37 is boxed. (C) Schematic representation of the bacteriophage T4 long tail fiber. Domains P1–5 correspond to gp34; the kneecap domain (K-C) is formed by gp35, whereas the distal part of the fiber, consisting of gp36 and gp37, is divided into regions D1–11. The expressed protein, gp37(651–1026), corresponds to D9–11 (larger gray box), whereas the crystallized fragment, gp37(785–1026), corresponds to D10–11 (smaller gray box).

Author contributions: S.G.B. and M.J.v.R. designed research; S.G.B., J.M.O., C.G.-D., A.L.L.-S., R.K., G.C.F., and M.J.v.R. performed research; S.G.B. and C.G.-D. contributed new reagents/analytic tools; S.G.B., J.M.O., A.L.L.-S., R.K., G.C.F., and M.J.v.R. analyzed data; and S.G.B., J.M.O., A.L.L.-S., G.C.F., and M.J.v.R. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. J.A.K. is a guest editor invited by the Editorial Board.

Data deposition: The coordinates and structure factors (of remote, peak, and inflection point data) have been deposited in the Protein Data Bank, www.pdb.org (PDB ID code 2XGF).

1 Present address: Departamento de Cristalografia y Biologia Estructural, Instituto de Quimica-Fisica Rocasolano, Consejo Superior de Investigaciones Cientificas, calle Serrano 119, E-28006 Madrid, Spain.

2 Present address: Synchrotron Soleil, Ormes des Merisiers, F-91190 Saint Aubin, France.

3 Present address: Departamento de Estructura de Macromoleculas, Centro Nacional de Biotecnologia, Consejo Superior de Investigaciones Cientificas, calle Darwin 3, Campus de Cantoblanco, E-28049 Madrid, Spain.

4 To whom correspondence should be addressed. E-mail: [email protected].

www.pnas.org/cgi/doi/10.1073/pnas.1011218107 PNAS ∣ November 23, 2010 ∣ vol. 107 ∣ no. 47 ∣ 20287–20292

BIOCHEMISTRY

Data Collection and Structure Solution. An X-ray fluorescence emission spectrum recorded from a single crystal measuring 50 × 20 × 5 μm³ indicated the presence of iron, presumably Fe2+, because the crystals were not appreciably colored. Therefore, a multiwavelength anomalous diffraction dataset was collected around the iron absorption edge from the same crystal. Seven iron ion sites were located during experimental phasing, consistent with the presence of seven His-X-His motifs in the gp37(785–1026) sequence. After solvent flattening, the resulting map displayed a solvent boundary corresponding well with the expected outlines of the tip of the T4 long tail fiber. Although detail in the initial experimental map was poor, it was possible to manually position the homologous part of the collar domain of gp12 (24, 25) into the globular region. The density surrounding the iron ions was compatible with an octahedral coordination sphere involving six histidine residues per ion. This observation, combined with the realization that both the N and C termini are located in the collar domain, the geometric impossibility of the chain going all the way to the tip and back if it did not form an almost continuous extended strand, and the fact that the distance between the iron ions and the spacing of the His-X-His motifs in the sequence allowed unequivocal assignment of each iron ion to the correct His-X-His motif, made it possible to manually trace a partial model. Using this model and the peak wavelength data, a combined single anomalous diffraction/Fourier synthesis map was calculated, allowing tracing of additional residues. Subsequently, a complete simulated annealing omit map was calculated, which allowed further additions. Finally, using positive difference maps resulting from refinement, the model could be completed and refined to satisfactory geometry (Table 1). The final model, refined at 2.2-Å resolution to Rwork/Rfree of 17.9/23.8%, contains residues 811–1026 from all three chains; residues 785–810 are not visible in the electron density maps.
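The choice of wavelengths around the iron absorption edge can be made concrete by converting them to photon energies. The following is a minimal sketch, assuming the standard relation E = hc/λ with hc ≈ 12.39842 keV·Å and a tabulated Fe K-edge energy of about 7.112 keV; both constants are literature values and are not taken from this paper, and the function names are illustrative only. The wavelengths are those listed in Table 1.

```python
# Convert the X-ray wavelengths listed in Table 1 to photon energies.
# Assumption: E [keV] = hc / lambda, with hc ~= 12.39842 keV*Angstrom.
HC_KEV_ANGSTROM = 12.39842

# Tabulated Fe K absorption edge in keV (literature value, not from the paper).
FE_K_EDGE_KEV = 7.112

# Wavelengths in Angstrom, as listed in Table 1.
WAVELENGTHS = {"peak": 1.73945, "inflection": 1.74115, "remote": 0.87260}


def energy_kev(wavelength_angstrom: float) -> float:
    """Photon energy in keV for a wavelength given in Angstrom."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def energies() -> dict:
    """Photon energy (keV) for each dataset in Table 1."""
    return {name: energy_kev(w) for name, w in WAVELENGTHS.items()}


if __name__ == "__main__":
    for name, e in energies().items():
        offset_ev = 1000.0 * (e - FE_K_EDGE_KEV)
        print(f"{name:>10}: {e:.3f} keV ({offset_ev:+.0f} eV from tabulated Fe K edge)")
```

Under these assumptions the peak and inflection datasets fall within a few tens of eV of the tabulated edge, with the peak at slightly higher energy than the inflection point, whereas the remote dataset lies at roughly twice that energy.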

Description of the Structure. The structure (Figs. 2 and 3) reveals an elaborately interwoven trimer formed by a globular collar domain (about 45-Å wide), an elongated needle domain (around 15-Å wide), and a small head domain with a width of around 25 Å. The total length is just over 200 Å (20 nm). Each of the three chains runs from the collar domain to the end of the tip and twists around a neighboring chain before turning back, with both the N and C terminus located near the bottom of the collar domain. Amino acids 811–860 and 1016–1026 form the trimeric collar domain (Fig. 3A). Each monomer comprises a sandwich of two antiparallel beta-sheets (one containing three strands, the other two) with an alpha-helix (residues 843–850) at one end of the beta-sandwich.

Located next to the collar domain is an intricately intertwined region in which residues 861–880 encircle residues 1009–1015 of the neighboring chain. This region is followed by the needle domain, which is a 150-Å-long six-stranded antiparallel right-handed twisted circular sheet formed by residues 881–933 and 960–1008 from each of the three chains (Fig. 3B). Its diameter is roughly 15 Å, with each of the chains completing one-and-a-half turns (about 540°) around the fiber axis. In the core of the needle domain, hydrophobic and hydrophilic regions alternate, with the latter forming the metal-binding sites. The seven iron ions are coordinated octahedrally by the Nε atoms of two histidine residues from each chain, in a similar fashion to the zinc ion in the structure of gp12 (25). Iron ions 1, 5, and 7 are bound by the His-883/His-885, His-915/His-917, and His-929/His-931 doublets, respectively, whereas the His-966/His-968, His-980/His-982, His-989/His-991, and His-998/His-1000 pairs from the returning strand coordinate iron ions 6, 4, 3, and 2 (Fig. 3B). There are no water molecules on the threefold central axis, apart from in the collar domain, near the border with the needle domain. We also modeled a triad of water molecules around the threefold fiber axis between iron ions 6 and 7. The biological significance, if any, of these water molecules is not clear at this point, but they may indicate some flexibility in these regions.

Residues 934–959 from each of the three chains form a compact, interwoven head domain of 22 Å in diameter and 18-Å high (Fig. 3C and D). Amino acids 934–947 loop around a neighboring chain; the chain then threads through the loop of a neighbor, before turning back into the needle domain. The head domain, located at the extreme distal end of the long tail fiber, is likely to play a primary role in receptor binding. As gp37 is known to interact with the glucosyl-alpha-1,3-glucose terminus of lipopolysaccharides (7) and protein–saccharide interactions almost always involve stacking of sugar residues onto aromatic amino acid side chains (26), aromatic surface amino acids (Tyr-932, Trp-936, Tyr-949, and Tyr-953) are attractive candidates for receptor binding. Lys-945 (near Trp-936) and Arg-954 (near Tyr-932 of a neighboring chain) may also interact with the lipopolysaccharide phosphate groups.
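The residue ranges given in this section can be collected into a small lookup table, which also makes it easy to confirm that they tile the modeled chain (residues 811–1026) without gaps or overlaps. This is only an illustrative sketch: the dictionary layout and function names are hypothetical, and grouping residues 861–880 and 1009–1015 together as one "intertwined region" is a reading of the text rather than a definition taken from the paper.

```python
from typing import Dict, List, Tuple

# Inclusive residue ranges per region of one gp37 chain, as stated in the text.
# Assumption: 861-880 and 1009-1015 are both counted as the intertwined region.
DOMAINS: Dict[str, List[Tuple[int, int]]] = {
    "collar": [(811, 860), (1016, 1026)],
    "intertwined region": [(861, 880), (1009, 1015)],
    "needle": [(881, 933), (960, 1008)],
    "head": [(934, 959)],
}


def domain_of(residue: int) -> str:
    """Return the region name that contains the given residue number."""
    for name, ranges in DOMAINS.items():
        if any(lo <= residue <= hi for lo, hi in ranges):
            return name
    raise ValueError(f"residue {residue} is outside the modeled range")


def covers_without_gaps(first: int = 811, last: int = 1026) -> bool:
    """Check that the ranges tile first..last exactly once."""
    segments = sorted(r for ranges in DOMAINS.values() for r in ranges)
    expected = first
    for lo, hi in segments:
        if lo != expected:
            return False  # gap or overlap before this segment
        expected = hi + 1
    return expected == last + 1


if __name__ == "__main__":
    print(covers_without_gaps())
    for res in (932, 936, 945, 949, 953, 954):
        print(res, domain_of(res))
```

With this mapping, the candidate receptor-binding residues Trp-936, Lys-945, Tyr-949, Tyr-953, and Arg-954 fall in the head domain, while Tyr-932 sits at the very end of the outgoing needle strand.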

Stability and Folding. Of the total surface area of each gp37(811–1026) monomer, 57% (14.5 × 10³ Å²) is buried within the trimer interface, whereas the estimated energy gain upon complex formation is 270 kcal/mol. Like gp37(12–1026) (21), gp37(785–1026) is resistant to denaturation by sodium dodecylsulphate at room temperature. The complex interweaving suggests that gp37 is unlikely to exist as a stable monomer and potentially accounts for the requirement for chaperones gp57 and gp38 for correct folding. Gene product 57, also necessary for the productive folding of the short tail fiber protein gp12 and the proximal long tail fiber protein gp34, may be involved in keeping unfolded monomers apart until the collar domain trimerizes. Gene product 38 is exclusively required for gp37 folding and may have a more specialized function.

Structural Homologues. The closest structural homologue identified by the DALI server (27) is gp12 (24, 25), which trimerizes to form the T4 short tail fiber. The five beta-strands and the alpha-helix of the gp37 collar domain can be superimposed onto five of the six beta-strands and an alpha-helix of the gp12 collar domain (Fig. 3E) with an rmsd of 2.7 Å over 115 C-alpha atoms. Similarity extends to the intertwined region adjacent to the collar domain and the first metal-binding site. His-885 and His-887 superpose well onto His-445 and His-447 of gp12, respectively, although gp12 binds a zinc ion instead of iron (25, 28). Gene product 10 (29) also exhibits significant structural homology to gp37, and 87 C-alpha atoms can be superposed with an rmsd of 4.8 Å. Three beta-strands of the collar domain of gp37 can be superimposed onto similar strands in the collars of gp10 and the closely related protein, gp11 (30). In the case of gp10, structural similarity also extends into the intertwined region next to the collar domain (Fig. 3G). The structural similarity of gp10, gp11, gp12, and gp37 makes it probable that these genes evolved from a common ancestor (29).


Discussion

Bioinformatic analysis (31) reveals sequence similarity to fibers from various phages and prophages (including several pathogens such as Shigella dysenteriae, Yersinia pestis, and Salmonella enterica). When the sequences of the tip domains of bacteriophage T4 gp37, TuIa, and TuIb gp37 (all Myoviridae) and of the Siphovirus bacteriophage lambda Ur side tail fibers (32) are aligned (Fig. 4), extensive similarity is evident for residues 811–931 and 966–1026, i.e., for the whole tip domain except the putative receptor-binding head domain. The conservation pattern suggests the structural framework of the tip domain is maintained intact, whereas the head domain has diverged to acquire specific receptor-binding properties. It has been suggested (33) that T4, TuIa, and TuIb may have evolved from the T2 lineage and incorporated the C-terminal segment of the side tail fiber and the lambda tail fiber assembly protein by recombination with lambda or a close relative. This hypothesis was proposed based on experimental data showing that the C-terminal region of lambda side tail fibers can functionally substitute for gp37 in receptor binding, whereas the lambda tail fiber assembly protein ltfa can functionally substitute for gp38 in mediating the correct folding of gp37.

Table 1. Crystallographic data and refinement statistics

Data collection

Space group: C2
Cell parameters (a, b, c), Å: 157.3, 54.0, 112.8 (β = 100.4°)

| | peak | inflection | remote |
|---|---|---|---|
| Beamline (ESRF) | ID23-1 | ID23-1 | ID23-2 |
| Detector | ADSC Q315r CCD | ADSC Q315r CCD | MarMOSAIC CCD |
| Distance, mm | 179.9 | 179.9 | 210.9 |
| Wavelength, Å | 1.73945 | 1.74115 | 0.87260 |
| Resolution, Å | 22-3.0 (3.16-3.00)* | 22-2.5 (2.64-2.50) | 22-2.2 (2.32-2.20) |
| Observed reflections† | 19,041 (2,732) | 32,101 (4,563) | 47,653 (6,906) |
| Multiplicity | 3.5 (3.5) | 3.6 (3.6) | 3.8 (3.8) |
| Completeness, % | 99.2 (99.0) | 99.2 (98.5) | 99.9 (99.9) |
| Rsym,‡ % | 14.8 (44.4) | 10.4 (43.9) | 13.2 (44.0) |
| ⟨I/sigma(I)⟩ | 4.0 (1.6) | 5.3 (1.5) | 5.0 (1.6) |

Phasing

| | peak | inflection | remote |
|---|---|---|---|
| Resolution range used, Å | 22-2.2 (2.32-2.20) | | |
| No. of reflections | 46,766 (6,722) | | |
| Heavy atom sites§ | 7 Fe2+ | | |
| Correlation coefficient (all/weak) | 46.15/20.72 | | |
| Patterson figure of merit | 62.59 | | |
| Correlation coefficient (E) | 0.456 | | |
| R-Cullis,¶ isomorphous (acentric/centric) | 0.835/0.855 | 0.457/0.479 | -/- |
| R-Cullis,¶ anomalous (acentric) | 0.964 | 0.986 | 0.998 |
| Phasing power,¶ isomorphous (acentric/centric) | 0.268/0.230 | 0.873/0.654 | -/- |
| Phasing power,¶ anomalous (acentric) | 0.426 | 0.246 | 0.125 |
| Figure of merit cos(phase error) (acentric/centric)¶ | 0.178/0.195 | | |
| Solvent flattening∥ | 27 cycles with 56.0% solvent content | | |
| R factor (before/after density modification)∥ | 0.5250/0.2820 | | |
| Overall correlation on \|E\|² (before/after density modification)∥ | 0.1549/0.7196 | | |
| Hand score = correlation on \|E\|²/contrast (original/inverted) | 0.2190/0.2178 | | |

Refinement statistics

| | |
|---|---|
| Resolution range used, Å | 20-2.2 (2.32-2.20) |
| No. of reflections used | 45,619 (6,609) |
| No. of reflections used for R free | 2,028 (226) |
| R factor** | 0.179 (0.237) |
| R-free | 0.238 (0.292) |
| No. of protein/water atoms | 4,656/665 |
| Ions | 7 Fe2+/1 CO3 2− |
| Average B-value protein/water atoms, Å² | 19.5/27.5 |
| Average B-value iron/carbonate ions, Å² | 22.3/50.9 |
| Wilson B, Å² | 22 |
| Ramachandran statistics,†† % | 98.4/100.0 |
| rmsd‡‡ (bonds, Å; angles, °) | 0.014/1.4 |

*Values in parentheses are for the highest resolution bin, where applicable.
†No sigma cutoff was used for inclusion of observed reflections.
‡$R_{\mathrm{sym}} = \sum_h \sum_i |I_{hi} - \langle I_h \rangle| / \sum_h \sum_i |I_{hi}|$, where $I_{hi}$ is the intensity of the ith measurement of the same reflection and $\langle I_h \rangle$ is the mean observed intensity for that reflection.
§Determined with SHELXD (39).
¶Calculated with SHARP (40).
∥According to SOLOMON (41).
**$R = \sum ||F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)|| / \sum |F_{\mathrm{obs}}(hkl)|$.
††According to the program MOLPROBITY (46). The percentages are indicated of residues in favored and allowed regions of the Ramachandran plot, respectively.
‡‡Estimates provided by the program REFMAC (45).
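The two residuals defined in the Table 1 footnotes can be written out directly. The sketch below is a plain-Python illustration of those formulas on made-up toy numbers; it is not the implementation used by SCALA or REFMAC, and the function names and input layout (a list of repeated intensity measurements per reflection) are assumptions chosen for clarity.

```python
from typing import Iterable, Sequence


def r_sym(measurements: Iterable[Sequence[float]]) -> float:
    """Rsym = sum_h sum_i |I_hi - <I_h>| / sum_h sum_i |I_hi|.

    `measurements` holds, for each unique reflection h, the list of its
    repeated intensity measurements I_hi (hypothetical input layout).
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements:
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(abs(i) for i in intensities)
    return numerator / denominator


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum ||Fobs| - |Fcalc|| / sum |Fobs| over matching reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


if __name__ == "__main__":
    # Toy values for illustration only; not data from this study.
    print(r_sym([[10.0, 10.0], [20.0, 30.0]]))
    print(r_factor([10.0, 20.0], [9.0, 22.0]))
```

R-free is the same R expression evaluated only over the reflections set aside from refinement (2,028 reflections in Table 1).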


The outer diameter of the head domain and the inner diameter of the surface cavity of the also trimeric outer membrane porin protein C (34) are both just under 25 Å. The very tip of the bacteriophage long tail fiber fits snugly into the mainly negatively charged outer cavity of its receptor outer membrane porin protein C (the gp37 head domain is uncharged apart from two small positive patches on the sides corresponding to Lys-945 and Arg-954). Automated docking experiments were performed with outer membrane porin protein C and gp37(811–1026) or the head domain plus a part of the needle domain (residues 918–973). Of the solutions obtained, many docked the side of the gp37 tip onto the hydrophobic side of the porin. Because this region would normally be covered by lipids in the membrane, these solutions were rejected. No solutions were obtained with gp37 docked on the inner membrane side of the porin. Of the remaining solutions, many aligned the threefold axes of both trimeric molecules, although some solutions with the tip oriented at an angle were observed. In both kinds of solutions, the head domain is consistently placed inside the extracellular cavity of the porin, when interactions with the hydrophobic membrane-interacting regions are excluded. In Fig. 5 the top “symmetric” solution and the solution with the largest angle are shown; in the latter, the side of the gp37 needle contacts surface loops of the porin. Interactions are either “head-on” or transversal; both are potentially relevant in recognition and infection and could represent different phage approach angles and be compatible with a conformational change of the baseplate that varies the attachment angle of the long tail fiber.

Fig. 2. Structure of gp37(785–1026). Chains A, B, and C are colored red, green, and blue, respectively; iron ions are shown in yellow. (A) Ribbon representation. The N and C termini and every 10th residue of chain A are labeled. (B and C) Surface representations of the structure of gp37(785–1026) seen from the side (B) and top (C) to illustrate the extensive intertwining of the three protein chains in the trimer; domains are indicated; i.r. is the intertwined region between the collar and needle domains.

Fig. 3. Domain structure of gp37(785–1026). (A) Walleyed stereo representation of the collar domain viewed from the direction of the needle domain. (B) The needle domain viewed from the side. Iron ions are represented as yellow balls. The histidine doublets coordinating the iron ions are shown and labeled with letters a-g; iron ions with numbers 1–7. (C and D) Side (C) and top (D) view of the head domain. Aromatic and basic side chains are shown and labeled (R954 is hidden behind Y953 in D). (E) Comparison of gp37(811–1026) with the bacteriophage T4 baseplate proteins gp12 and gp10. Superposition of monomers of gp37(811–1026) in red, onto gp12(330–527) in blue, and gp10 in green. Iron ions belonging to the gp37(811–1026) structure are shown in yellow, whereas the zinc ion identified in gp12 is shown in gray.

The present structure provides insights into the conserved molecular architecture of the T4 bacteriophage fiber tip and suggests the surface and residues that are most likely to be involved in receptor binding. Several surface-exposed aromatic and positive residues are prime candidates for mutagenesis studies to dissect the binding determinants and modulate the receptor-binding properties of this fiber. Future studies directed at the remaining components of the long tail fiber will provide valuable insights into this remarkable molecular machine.

Materials and Methods

Construction of Expression Vectors. Sequences representing coding regions for gp38 and gp37(651–1026) were cloned into pCDF-Duet and pET30a(+) (Merck), respectively. The resulting plasmids were designated pCDF(Sm)g38 and pET(Kn)g37(651-1026). The vector pET(Ap)g57 was provided by Stefan Miller. Gene product 37(651–1026) was expressed with an additional N-terminal six-histidine tag.

Protein Expression and Purification. Four liters of growth media (22) supplemented with ampicillin (50 mg/L), streptomycin (50 mg/L), and kanamycin (25 mg/L) were inoculated with the E. coli strain JM109(DE3) (Promega) cotransformed with pET(Kn)g37(651-1026), pET(Ap)g57, and pCDF(Sm)g38. Expression was induced at 16 °C and bacteria were harvested as described (22). The cells were resuspended in 40 mL of 50 mM sodium phosphate pH 8.0, 0.3 M sodium chloride, 10 mM beta-mercaptoethanol, 10 mM imidazole, 1% glycerol, and protease inhibitors, frozen at −20 °C, and lysed by a double pass through an Avestin C5 emulsifier (Avestin Europe GmbH). Lysates were centrifuged at 39,000 × g and 10 °C for 40 min. Supernatant containing soluble His-tagged gp37(651–1026) was loaded onto a 5 mL nickel-nitrilotriacetic acid agarose (Qiagen) column preequilibrated with elution buffer (50 mM sodium phosphate pH 8.0, 0.3 M sodium chloride, 10 mM beta-mercaptoethanol). The recombinant protein was eluted with a step gradient of imidazole in elution buffer. The 0.25–0.4 mM imidazole fractions contained gp37(651–1026) and were combined and dialyzed overnight at 4 °C against 10 mM Tris·HCl pH 8.5. The protein was applied to a 6 mL Uno-Q column (Bio-Rad) equilibrated with the same buffer and eluted with a sodium chloride gradient. Highly purified gp37(651–1026) eluted at around 0.1 M sodium chloride. Gene product 37(651–1026) was concentrated to 10 mg/mL using 10 kDa cutoff centrifuge filters (Millipore) and buffer exchanged into 20 mM ammonium bicarbonate pH 7.8, 150 mM sodium chloride in the same step. One milliliter fractions of the concentrated protein were heat treated by incubation at 56 °C for 30 min. After cooling to 37 °C, 13.3 μg of sequencing grade modified trypsin (Promega) was added to the protein and the mixture was incubated for 80 min at 37 °C. The reaction was stopped using 1 mM phenylmethylsulfonyl fluoride. The resulting mixture was loaded onto a HiLoad 16/60 Sephacryl 300 column (GE Healthcare Bio-Sciences) equilibrated with 10 mM Tris·HCl pH 8.5, 150 mM sodium chloride, 1 mM manganese(II) chloride. Elution was done in the same buffer at a flow rate of 0.5 mL/min. Peak fractions containing proteolyzed gp37(785–1026) were concentrated to 8 mg/mL and buffer exchanged into 10 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid-NaOH pH 7.5, 1 mM manganese(II) chloride. Manganese(II) chloride was identified as a stabilizing agent by a thermofluor assay performed according to standard protocols (35).
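The limited proteolysis step can be put in perspective with a short calculation of the substrate-to-protease mass ratio. The sketch below uses only the quantities stated in the protocol (10 mg/mL protein, 1 mL fractions, 13.3 μg trypsin); it assumes that the 13.3 μg of trypsin was added to each 1 mL fraction, which the text implies but does not state outright. The function and variable names are illustrative.

```python
# Quantities taken from the purification protocol above.
protein_conc_mg_per_ml = 10.0  # gp37(651-1026) after concentration
fraction_volume_ml = 1.0       # "one milliliter fractions"
trypsin_ug = 13.3              # assumption: amount added per 1 mL fraction


def substrate_to_protease_ratio(protein_mg: float, protease_ug: float) -> float:
    """Return the substrate:protease mass ratio (w/w)."""
    if protease_ug <= 0:
        raise ValueError("protease mass must be positive")
    return (protein_mg * 1000.0) / protease_ug  # convert mg to ug first


protein_mg = protein_conc_mg_per_ml * fraction_volume_ml
ratio = substrate_to_protease_ratio(protein_mg, trypsin_ug)

print(f"gp37 per fraction: {protein_mg:.1f} mg")
print(f"substrate:trypsin is about {ratio:.0f}:1 (w/w)")
```

Under that assumption the digest uses roughly 750 parts of fiber protein per part of trypsin by mass.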

Fig. 4. Alignment of the tip domains of the long tail fiber gp37 proteins of bacteriophages T4 (UniProt code P03744), TuIa (S13237), and TuIb (S13239) and of the side tail fiber of bacteriophage lambda (P03764). Of the latter two, only the sequences of the C-terminal 382 and 267 amino acids are known, respectively, although their entire gp37 proteins are expected to be similar in size to gp37 of T4. Identical residues are indicated with asterisks and similar ones with dots. Hydrophobic residues contributing to the central longitudinal core of the needle domain are boxed in gray; His-X-His motifs are also labeled with letters on top of the alignment. A deletion of 10 amino acids in the T4 needle domain after residue 909 with respect to the others is compensated for by a deletion of nine amino acids just before residue 979 in the “return” strand; these last nine amino acids contain a putative eighth metal-binding site His-Ala-His for TuIa, TuIb, and lambda.

Fig. 5. Docking of gp37 into outer membrane porin protein C. Two superimposed results of automatic docking experiments are shown with the gp37(811–1026) trimer in blue, a truncated trimeric gp37(918–973) model in red, and the three subunits of the outer membrane porin protein C trimer (PDB code 2J1N) in green, magenta, and cyan.

Bartual et al. PNAS | November 23, 2010 | vol. 107 | no. 47 | 20291

Crystallization, Data Collection, and Structure Solution. Crystallization was by vapor diffusion in sitting drops of 2 μL protein solution plus 2 μL of a reservoir solution containing 5% (wt/vol) polyethylene glycol 6,000 and 0.1 M sodium citrate pH 5.0. Crystals were harvested in reservoir solution supplemented with 20% (wt/vol) glycerol, mounted into a cryoloop, and flash-frozen in liquid nitrogen for data collection. The presence of iron in the crystals was determined from X-ray fluorescence emission spectra recorded on BM30A at the European Synchrotron Radiation Facility (ESRF). A three-wavelength multiwavelength anomalous diffraction experiment was subsequently carried out on beamline ID23-1 at the ESRF. Due to radiation damage, the remote dataset was not included in the analysis; instead, a higher resolution dataset collected on ID23-2 was used as a remote and reference dataset. All data were processed and scaled using MOSFLM (36) and SCALA (37) and further analyzed using programs from the CCP4 suite (38). Reflections for calculation of Rfree were selected in thin resolution shells. The initial sites were located using SHELXD (39) and refined within AUTOSHARP (40). Solvent flattening was with SOLOMON (41) and model building was performed with COOT (42). Combined single wavelength anomalous diffraction/Fourier synthesis and simulated annealing omit maps were calculated using PHENIX (43), in which PHASER performs the phase combination using a maximum likelihood procedure (44). Refinement was performed with REFMAC (45) and validation was carried out using MOLPROBITY (46). Loose noncrystallographic restraints were used in the final refinement step. Complex assembly parameters were estimated with PISA (47). Automated docking was performed with PATCHDOCK (48) and HEX (49), using their respective Web servers and default parameters, inputting Protein Data Bank files from which water molecules had been removed. Structure figures were prepared with PYMOL (PyMOL Molecular Graphics System, Schrödinger, LLC).
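The docking input preparation mentioned above (Protein Data Bank files with water molecules removed) can be illustrated with a minimal sketch. This is not the authors' procedure, which is not described beyond that statement; it is one simple way to drop water records using the fixed-column layout of the PDB format, where the residue name occupies columns 18–20. The function name, the set of water residue names, and the sample records (including their coordinates and residue numbers) are illustrative placeholders, not data from the deposited structure.

```python
# Residue names commonly used for water in PDB files (illustrative set).
WATER_RESIDUES = {"HOH", "WAT", "H2O", "DOD"}
COORD_RECORDS = {"ATOM", "HETATM", "ANISOU"}


def strip_waters(pdb_text: str) -> str:
    """Return PDB text without coordinate records that belong to water."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()     # columns 1-6: record type
        resname = line[17:20].strip() # columns 18-20: residue name
        if record in COORD_RECORDS and resname in WATER_RESIDUES:
            continue                  # skip water atoms
        kept.append(line)
    return "\n".join(kept) + "\n"


# Placeholder records: one protein atom, one metal ion, one water, END.
SAMPLE = "\n".join([
    "ATOM      1  N   HIS A 932      11.104  13.207   2.100  1.00 20.00           N",
    "HETATM    2 FE    FE A1027      10.000  10.000  10.000  1.00 20.00          FE",
    "HETATM    3  O   HOH A1101       5.000   6.000   7.000  1.00 30.00           O",
    "END",
]) + "\n"

cleaned = strip_waters(SAMPLE)
print(cleaned, end="")
```

Filtering on the residue name rather than on the HETATM record type keeps bound metal ions, which are also stored as HETATM records, in the docking input.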

ACKNOWLEDGMENTS. We thank Javier Varela for N-terminal sequence analysis, Jana Alonso for mass spectroscopy, Stefan Miller for providing the gp57 expression vector, and the ESRF for measurement time on beamlines BM30A, ID23-1, and ID23-2. This research was sponsored by Grant BFU2008-01588 (to M.J.v.R.), a José Castillejo fellowship (J.M.O.), and a Formacion del Profesorado Universitario Fellowship (C.G.D.) from the Spanish Ministry of Education and Science. This work was also supported by the European Commission under Contract NMP4-CT-2006-033256 and by the Xunta de Galicia via an Angeles Alvariño fellowship (J.M.O.).

1. Hagens S, Loessner MJ (2007) Application of bacteriophages for detection and control of foodborne pathogens. Appl Microbiol Biotechnol 76:513–519.

2. Petrenko VA, Vodyanoy VJ (2003) Phage display for detection of biological threat agents. J Microbiol Methods 53:253–262.

3. Parisien A, Allain B, Zhang J, Mandeville R, Lan CQ (2008) Novel alternatives to antibiotics: Bacteriophages, bacterial cell wall hydrolases, and antimicrobial peptides. J Appl Microbiol 104:1–13.

4. Chanishvili N, Sharp RA (2009) Literature Review of the Practical Application of Bacteriophage Research (Eliava Inst of Bacteriophage, Microbiology and Virology, Tbilisi, Georgia).

5. Wright A, Hawkins CH, Anggard EE, Harper DR (2009) A controlled clinical trial of a therapeutic bacteriophage preparation in chronic otitis due to antibiotic-resistant Pseudomonas aeruginosa; a preliminary report of efficacy. Clin Otolaryngol 34:349–357.

6. Karam JD (1994) Molecular biology of bacteriophage T4 (Am Society for Microbiology, Washington, DC).

7. Yu F, Mizushima S (1982) Roles of lipopolysaccharide and outer membrane protein OmpC of Escherichia coli K-12 in the receptor function for bacteriophage T4. J Bacteriol 151:718–722.

8. Crowther RA, Lenk EV, Kikuchi Y, King J (1977) Molecular reorganization in the hexagon to star transition of the baseplate of bacteriophage T4. J Mol Biol 116:489–523.

9. Kostyuchenko VA, et al. (2003) Three-dimensional structure of bacteriophage T4 baseplate. Nat Struct Biol 10:688–693.

10. Leiman PG, Chipman PR, Kostyuchenko VA, Mesyanzhinov VV, Rossmann MG (2004) Three-dimensional rearrangement of proteins in the tail of bacteriophage T4 on infection of its host. Cell 118:419–429.

11. Aksyuk AA, Leiman PG, Shneider MM, Mesyanzhinov VV, Rossmann MG (2009) The structure of gene product 6 of bacteriophage T4, the hinge-pin of the baseplate. Structure 17:800–808.

12. Riede I (1987) Receptor specificity of the short tail fibres (gp12) of T-even type Escherichia coli phages. Mol Gen Genet 206:110–115.

13. Kostyuchenko VA, et al. (2005) The tail structure of bacteriophage T4 and its mechanism of contraction. Nat Struct Mol Biol 12:810–813.

14. Aksyuk AA, et al. (2009) The tail sheath structure of bacteriophage T4: A molecular machine for infecting bacteria. EMBO J 28:821–829.

15. Rossmann MG, Mesyanzhinov VV, Arisaka F, Leiman PG (2004) The bacteriophage T4 DNA injection machine. Curr Opin Struct Biol 14:171–180.

16. King J, Laemmli UK (1971) Polypeptides of the tail fibres of bacteriophage T4. J Mol Biol 62:465–477.

17. Cerritelli ME, Wall JS, Simon MN, Conway JF, Steven AC (1996) Stoichiometry and domainal organization of the long tail-fiber of bacteriophage T4: A hinged viral adhesin. J Mol Biol 260:767–780.

18. Hashemolhosseini S, Stierhof YD, Hindennach I, Henning U (1996) Characterization of the helper proteins for the assembly of tail fibers of coliphages T4 and lambda. J Bacteriol 178:6258–6265.

19. Edgar RS, Lielausis I (1965) Serological studies with mutants of phage T4D defective in genes determining tail fiber structure. Genetics 52:1187–1200.

20. King J, Wood WB (1969) Assembly of bacteriophage T4 tail fibers: The sequence of gene product interaction. J Mol Biol 39:583–601.

21. Montag D, Hashemolhosseini S, Henning U (1990) Receptor-recognizing proteins of T-even type bacteriophages. The receptor-recognizing area of proteins 37 of phages T4 TuIa and TuIb. J Mol Biol 216:327–334.

22. Bartual SG, Garcia-Doval C, Alonso J, Schoehn G, van Raaij MJ (2010) Two-chaperone assisted soluble expression and purification of the bacteriophage T4 long tail fibre protein gp37. Protein Expression Purif 70:116–121.

23. van Raaij MJ, et al. (2001) Identification and crystallisation of a heat- and protease-stable fragment of the bacteriophage T4 short tail fibre. Biol Chem 382:1049–1055.

24. van Raaij MJ, Schoehn G, Burda MR, Miller S (2001) Crystal structure of a heat and protease-stable part of the bacteriophage T4 short tail fibre. J Mol Biol 314:1137–1146.

25. Thomassen E, et al. (2003) The structure of the receptor-binding domain of the bacteriophage T4 short tail fibre reveals a knitted trimeric metal-binding fold. J Mol Biol 331:361–373.

26. Vyas NK (1991) Atomic features of protein-carbohydrate interactions. Curr Opin Struct Biol 1:732–740.

27. Holm L, Kaariainen S, Rosenstrom P, Schenkel A (2008) Searching protein structure databases with DaliLite v.3. Bioinformatics 24:2780–2781.

28. Zorzopulos J, Kozloff LM (1978) Identification of T4D bacteriophage gene product 12 as the base-plate zinc metalloprotein. J Biol Chem 253:5543–5547.

29. Leiman PG, Shneider MM, Mesyanzhinov VV, Rossmann MG (2006) Evolution of bacteriophage tails: Structure of T4 gene product 10. J Mol Biol 358:912–921.

30. Leiman PG, et al. (2000) Structure of bacteriophage T4 gene product 11, the interface between the baseplate and short tail fibers. J Mol Biol 301:975–985.

31. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410.

32. Hendrix RW, Duda RL (1992) Bacteriophage lambda PaPa: Not the mother of all lambda phages. Science 258:1145–1148.

33. Montag D, Schwarz H, Henning U (1989) A component of the side tail fiber of Escherichia coli bacteriophage lambda can functionally replace the receptor-recognizing part of a long tail fiber protein of the unrelated bacteriophage T4. J Bacteriol 171:4378–4384.

34. Basle A, Rummel G, Storici P, Rosenbusch JP, Schirmer T (2006) Crystal structure of osmoporin OmpC from E. coli at 2.0 Å. J Mol Biol 362:933–942.

35. Ericsson UB, Hallberg BM, Detitta GT, Dekker N, Nordlund P (2006) Thermofluor-based high-throughput stability optimization of proteins for structural studies. Anal Biochem 357:289–298.

36. Leslie AG (2006) The integration of macromolecular diffraction data. Acta Crystallogr, Sect D: Biol Crystallogr 62:48–57.

37. Evans P (2006) Scaling and assessment of data quality. Acta Crystallogr, Sect D: Biol Crystallogr 62:72–82.

38. Collaborative Computational Project Number 4 (1994) The CCP4 Suite: Programs for Protein Crystallography. Acta Crystallogr, Sect D: Biol Crystallogr 50:760–763.

39. Sheldrick GM (2008) A short history of SHELX. Acta Crystallogr, Sect A: Found Crystallogr 64:112–122.

40. Vonrhein C, Blanc E, Roversi P, Bricogne G (2007) Automated structure solution with AUTOSHARP. Methods Mol Biol 364:215–230.

41. Abrahams JP, Leslie AG (1996) Methods used in the structure determination of bovine mitochondrial F1 ATPase. Acta Crystallogr, Sect D: Biol Crystallogr 52:30–42.

42. Emsley P, Lohkamp B, Scott WG, Cowtan K (2010) Features and development of Coot. Acta Crystallogr, Sect D: Biol Crystallogr 66:486–501.

43. Adams PD, et al. (2010) PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr, Sect D: Biol Crystallogr 66:213–221.

44. McCoy AJ, Read RJ (2010) Experimental phasing: Best practice and pitfalls. Acta Crystallogr, Sect D: Biol Crystallogr 66:458–469.

45. Murshudov GN, Vagin AA, Dodson EJ (1997) Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr, Sect D: Biol Crystallogr 53:240–255.

46. Chen VB, et al. (2010) MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallogr, Sect D: Biol Crystallogr 66:12–21.

47. Krissinel E, Henrick K (2007) Inference of macromolecular assemblies from crystalline state. J Mol Biol 372:774–797.

48. Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ (2005) PatchDock and SymmDock: Servers for rigid and symmetric docking. Nucleic Acids Res 33:W363–W367.

49. Macindoe G, Mavridis L, Venkatraman V, Devignes MD, Ritchie DW (2010) HexServer: An FFT-based protein docking server powered by graphics processors. Nucleic Acids Res 38:W445–W449.

20292 ∣ www.pnas.org/cgi/doi/10.1073/pnas.1011218107 Bartual et al.
