
B Tagging in the DØ Experiment using an Artificial Neural Network

Michiel Vogelvang

University of Amsterdam

The National Institute for Nuclear Physics and High Energy Physics

The DØ Experiment



Specialization: Experimental Physics

Faculty of Natural Sciences, Mathematics and Computer Science

Universiteit van Amsterdam

NIKHEF, Nationaal Instituut voor Kern- en Hoge Energie Fysica

August 2004

Supervisor: Dr. M. Vreeswijk.

Abstract

This report summarizes the work I did during my research period at NIKHEF. The main goal of this thesis is to study the use of an artificial neural network for b-tagging in the DØ experiment at the Fermi National Accelerator Laboratory (Fermilab). I studied jet variables based on vertex-tracking and calorimeter information, using a top-antitop (tt̄) sample to provide the signal events and a QCD sample to provide the background events. A Probabilistic Neural Network is used to study different types of jet variables and their contribution to selecting the correct signal events. The results are analyzed and recommendations for further study are made.


Contents

Introduction

1 High Energy Physics
1.1 The Standard Model

2 The Tevatron and the DØ Detector
2.1 The Tevatron
2.2 The DØ Detector
    The Central Detector
    The Calorimeter System
    The Muon System

3 Physics at the Tevatron
3.1 Electroweak Physics
3.2 QCD Physics
3.3 Physics of the Bottom Quark (B Physics)
3.4 Top Physics
3.5 Higgs Physics
3.6 Searches for New Phenomena

4 B Tagging
4.1 B Tagging
4.2 Jet Variables
4.3 Overview Studies on B-Tagging and Related Topics

5 Artificial Neural Networks
5.1 The Human Brain
5.2 Artificial Neural Networks
    Pattern Recognition
    Feed Forward Network
    Probabilistic Neural Network

6 Results
6.1 The Event Sample
6.2 The Use of the PNN
6.3 Efficiency and Purity
6.4 The PNN Output
6.5 Results Using 2 Input Variables
6.6 Results Using 3 Input Variables
6.7 Physical Meaning of Some Results
6.8 Results Using 4 Input Variables

Conclusion


Introduction

In July 1998, NIKHEF groups in Amsterdam and Nijmegen formally joined the DØ experiment at the Fermi National Accelerator Laboratory (FNAL/Fermilab) in Batavia, Illinois, USA. The DØ detector is one of the two general-purpose detectors at the Tevatron, an accelerator and storage ring in which protons and antiprotons are accelerated and collided.

The DØ experiment was proposed in 1983 and approved in 1984. After 8 years of design, testing and construction of its hardware and software components, the experiment recorded its first antiproton-proton interaction on May 12, 1992. The data-taking period referred to as Run I lasted through the beginning of 1996. Collisions were studied at a center-of-mass energy of 1.8 TeV. Among the highlights from Run I are the discovery of the top quark and the measurements of its mass and production cross section.

The Main Injector Ring (front) and the 4-mile-long Tevatron dominate the aerial view of Fermilab.


The Tevatron and the DØ experiment have begun a new physics run (Run II), which started in 2001. The Tevatron has been upgraded to operate at an increased center-of-mass energy of 2 TeV and a much improved luminosity. The luminosity of an accelerator is a measure of the number of particle collisions that occur each second: the rate of a given process is the luminosity multiplied by the cross section of that process. Using existing technology, it appears possible to increase the luminosity of the collider to at least 10³³ cm⁻² s⁻¹, more than a 20-fold increase over the number of particle collisions observed and recorded in Run I. To match the more demanding environment and to improve the capabilities of the DØ detector, a substantial upgrade of the detector has been performed.
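The relation between luminosity and collision rate can be illustrated with a short calculation. The sketch below assumes the standard relation rate = luminosity × cross section; the 5 pb cross section is a hypothetical value chosen only for illustration and is not taken from this thesis.

```python
# Minimal sketch: event rate from luminosity and cross section.
# The cross section below is a hypothetical illustration value.

PICOBARN_CM2 = 1e-36  # 1 pb expressed in cm^2


def event_rate(luminosity_cm2_s: float, cross_section_cm2: float) -> float:
    """Expected number of events per second: luminosity times cross section."""
    return luminosity_cm2_s * cross_section_cm2


luminosity = 1e33                      # cm^-2 s^-1, the figure quoted in the text
cross_section = 5.0 * PICOBARN_CM2     # assumed 5 pb process (hypothetical)

rate = event_rate(luminosity, cross_section)
events_per_day = rate * 86_400         # seconds in one day

print(f"{rate:.3g} events per second, about {events_per_day:.0f} events per day")
```

The same function shows why a 20-fold gain in luminosity matters: for a fixed cross section, the number of recorded events of a rare process scales linearly with the luminosity.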

The study of B hadrons and of processes involving the production of b quarks has become an important direction for physics measurements at current and future experiments. In addition to the direct interest in the properties of the b quark, many new particles, like the Higgs boson or the top quark, have at least one b quark among their decay products. These events give rise to specific signatures in the detector (like jets) and are therefore easier to separate from background processes than other production channels. An efficient technique for selecting events with B hadrons is required both to detect the new particles and to study the B hadrons themselves.

In this thesis the use of a neural network in the selection of variables for b tagging is studied. First, the Standard Model, the Tevatron and the DØ detector are described. Next, the highlights of the physics from the DØ experiment are described, as well as the main targets for Run II. Artificial Neural Networks (ANNs) are then introduced, together with a literature study of previous research on b tagging using ANNs. Finally, using a Monte Carlo data sample consisting of b jets as well as background jets, the ability of the ANN to recognize the correct events (b jets) is tested.


The Standard Model

High-Energy Physics is the branch of science concerned with the ultimate constituents of matter and the fundamental interactions that occur amongst them. For over two thousand years people have thought about the fundamental particles from which all matter is made, starting with the gradual development of atomic theory, followed by a deeper understanding of the quantized atom, leading to the recent theory of the Standard Model. Experiments over the last 50 years have revealed whole families of short-lived particles that can be created from the energy released in the high energy collisions of ordinary particles, such as electrons or protons. The classification of these particles and the detailed understanding of the manner in which their interactions lead to the observable world has been one of the major scientific achievements of the twentieth century.

Understanding the composition of the universe involves identifying the basic particles, knowing the forces the particles feel, and then ascertaining the behavior of the particles given those forces. The Standard Model (SM) has been successful in providing answers to all three facets of the problem. The Standard Model is a Lagrangian quantum field theory based on the idea of local gauge invariance. The gauge symmetry group of the Standard Model is SU(3)C × SU(2)L × U(1)Y, where SU(3)C is the symmetry group describing the strong (color) interaction and SU(2)L × U(1)Y describes the electroweak interactions.

Lepton                    Charge (e)    Mass (MeV/c2)
Electron (e)
Electron neutrino (νe)
Muon (μ)
Muon neutrino (νμ)
Tau (τ)
Tau neutrino (ντ)

Table 1.1: Lepton Properties

Quark                     Charge (e)    Mass (GeV/c2)
Up (u)
Down (d)
Charm (c)
Strange (s)
Top (t)
Bottom (b)

Table 1.2: Quark Properties


There are two general classes of particles in the theory: the fermions, which have spin ½, and the gauge vector bosons, which have spin 1. Fermions are further subdivided into leptons and quarks. They are the basic constituents of matter and are considered to be pointlike particles. Both the leptons and the quarks are grouped into three families (or generations), with each family consisting of two members. Quarks may exist in one of three color states (blue, green and red), an internal degree of freedom. Tables 1.1 and 1.2 summarize the basic properties of leptons and quarks.

Local gauge invariance requires the introduction of massless vector gauge bosons, the particles that mediate the forces between the fermions. The strong force is also called the color force, because it acts only on particles carrying color. The quarks interact via the gauge bosons of SU(3)C; there are eight of these, and they are called gluons (g). Because the SU(3)C symmetry of the strong force is exact, the gluons are massless. The observed strongly interacting particles in nature are called hadrons, which are classified into mesons (quark-antiquark pairs) and baryons (triplets of quarks). The pions are examples of mesons; the least massive baryons are the proton and the neutron. Up to now no free quarks have been observed. Consequently, it is hypothesized that only color singlet states exist in nature.

Force                 Acts on                                         Mediator
Gravity               All particles                                   Graviton
Electromagnetism      Electrically charged particles                  Photon (γ)
Weak interaction      Fermions and gauge bosons                       W±, Z0
Strong interaction    Quarks and gluons (particles carrying color)    Gluons (g)

Table 1.3: Forces and their mediators.

Gauge boson     Charge    Mass (GeV/c2)
Gluons (g)
Photon (γ)
W±
Z0

Table 1.4: Properties of interaction mediators.

The electroweak sector of the SM, also called the Glashow-Weinberg-Salam model, unites the weak and electromagnetic interactions. The gauge symmetry group SU(2)L × U(1)Y has (2² − 1) + 1 = 4 massless gauge vector bosons. In order to describe the weak force phenomenology, it is required that some of the vector bosons acquire a non-zero mass. The masses are generated by spontaneously breaking the symmetry of the SU(2)L × U(1)Y group. This is implemented by the Higgs mechanism via the introduction of complex scalar fields. By allowing the scalar field to acquire a nonzero vacuum expectation value, three of the four vector gauge bosons gain mass.
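The counting of gauge bosons used above follows from the number of generators of each symmetry group: SU(N) has N² − 1 generators and U(1) has one. A minimal sketch of that bookkeeping is given here; the function and variable names are illustrative only.

```python
# Minimal sketch: counting gauge bosons from the number of group generators.

def su_generators(n: int) -> int:
    """Number of generators (and hence gauge bosons) of SU(n): n^2 - 1."""
    return n * n - 1


U1_GENERATORS = 1  # U(1) has a single generator

electroweak_bosons = su_generators(2) + U1_GENERATORS  # SU(2)_L x U(1)_Y
gluons = su_generators(3)                              # SU(3)_C

# After symmetry breaking, three electroweak bosons are massive (W+, W-, Z0)
# and one stays massless (the photon).
massive = 3
massless = electroweak_bosons - massive

print(electroweak_bosons, gluons, massless)  # prints: 4 8 1
```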

These particles are identified as the W bosons (W+ and W−, which mediate the charged-current weak interactions) and the Z boson (Z0, which mediates the neutral-current weak interactions). The remaining massless gauge boson is identified as the photon (γ), which mediates the electromagnetic interactions. All particles carrying electric charge feel electromagnetic interactions. One neutral scalar field, the Higgs boson (H), remains from the broken symmetry. To date, the Higgs boson, which has spin 0, has not been observed.

It is a remarkable feature of the Standard Model that, despite its extraordinary predictive power, it is almost surely incomplete. There are 26 parameters needed to specify the SM, and these can only be supplied by experiment (e.g. the quark and lepton masses). The strong and electroweak interactions that jointly make up the SM are seemingly unrelated entities; one would prefer to see a unification of these forces, but the SM does not provide it. The mechanism that breaks the underlying symmetry of the electroweak interaction, and thereby gives mass to the W and Z bosons while leaving the photon massless, is not understood; in the SM the Higgs boson is inserted to provide the symmetry breaking. Beyond these defects, the SM offers no clue as to why there are three generations of quark and lepton families with nearly identical properties apart from their mass. It can accommodate, but not explain, the existence of CP violation, and it does not show how to bring gravity into a unified framework with the other forces.

Figure 1.1: Schematic view of the three families of leptons and quarks. On the right side the force carriers.


The Tevatron and the DØ Detector

Fermilab is the largest U.S. laboratory for research in high-energy physics and is second only to CERN, the European Laboratory for Particle Physics, in the world. The primary instruments for high-energy physics are accelerators, especially colliders, in which counter-rotating beams of particles are brought into collision. In this chapter I will briefly describe the properties of the Tevatron accelerator and the DØ detector [1].

2.1 The Tevatron

Fermilab uses a series of accelerators in order to accelerate the protons and antiprotons to an energy of 1 TeV. The diagram below shows the paths taken by protons and antiprotons.

Figure 2.1: The diagram shows the paths taken by protons and antiprotons in the chain of accelerators at Fermilab.

The Cockcroft-Walton¹ pre-accelerator provides the first stage of acceleration. Inside this device, hydrogen gas is ionized to create negative ions, each consisting of two electrons and one proton. The ions are accelerated by a positive voltage and reach an energy of 750 keV.

¹ Named after Cockcroft and Walton, who in 1932 produced the first nuclear reaction using artificially accelerated particles, bombarding and disintegrating lithium nuclei with protons accelerated to several hundred keV. By opening and closing switches in the proper sequence they could build up a potential of 800 kilovolts.

Next, the negative hydrogen ions enter a linear accelerator (LINAC), approximately 500 feet long. Oscillating electric fields accelerate the negative hydrogen ions to 400 MeV. Before entering the third stage, the ions pass through a carbon foil, which removes the electrons, leaving only the positively charged protons.

The third stage, the Booster, is located about 6 meters below ground level. The Booster is a circular accelerator that uses magnets to bend the beam of protons in a circular path. The protons travel around the Booster about 20,000 times so that they repeatedly experience electric fields. With each revolution the protons pick up more energy, leaving the Booster with 8 GeV.

Protons at 8 GeV from the Booster are injected into Fermilab's Main Injector. It accelerates particles and transfers beams and has four functions:

(1) It accelerates protons from 8 GeV to 150 GeV.
(2) It produces 120 GeV protons, which are used for antiproton production (see figure 2.1).
(3) It receives antiprotons from the Antiproton Source and increases their energy to 150 GeV.
(4) It injects protons and antiprotons into the Tevatron.

To produce antiprotons, the Main Injector sends 120 GeV protons to the Antiproton Source, where the protons collide with a nickel target. The collisions produce a wide range of secondary particles including many antiprotons. The antiprotons are collected, focused and then stored in the Accumulator ring. When a sufficient number of antiprotons has been produced, they are sent to the Main Injector for acceleration and injection into the Tevatron.

The Antiproton Recycler is a fixed-energy storage ring placed in the Main Injector tunnel directly above the Main Injector beamline. The purpose of the Recycler is to further increase the luminosity of the Tevatron Collider. The Recycler is a storage ring for antiprotons and functions as a post-Accumulator ring. As the stack size in the Accumulator ring increases, there comes a point when the stacking rate starts to decrease. By emptying the contents of the Accumulator into the Recycler periodically, the Accumulator is always operating in its optimum antiproton intensity regime. The third role of the Recycler, and by far the leading factor in luminosity increase, is to act as a receptacle for antiprotons left over at the end of Tevatron stores.

The Tevatron receives 150 GeV protons and antiprotons from the Main Injector and accelerates them to almost 1 TeV. There are about 1000 superconducting magnets in the Tevatron. The protons and antiprotons circle the Tevatron in opposite directions. The beams cross each other at the centers of the 5000-ton CDF and DØ detectors located inside the Tevatron tunnel, creating bursts of new particles.
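The accelerator chain described in this section can be summarized as a list of stages with their output energies. The sketch below uses only the energies quoted in the text, with the Tevatron beam energy rounded to 1 TeV ("almost 1 TeV" in the text); the data layout and function name are illustrative assumptions.

```python
# Minimal sketch: the Fermilab proton acceleration chain as described above.
# Output energy of each stage in GeV, as quoted in the text.
CHAIN = [
    ("Cockcroft-Walton", 750e-6),  # 750 keV
    ("Linac", 0.4),                # 400 MeV
    ("Booster", 8.0),
    ("Main Injector", 150.0),
    ("Tevatron", 1000.0),          # "almost 1 TeV", rounded here
]


def stage_gains(chain):
    """Return (stage name, output energy divided by the previous stage's output)."""
    return [(name, e / prev_e) for (_, prev_e), (name, e) in zip(chain, chain[1:])]


for name, gain in stage_gains(CHAIN):
    print(f"{name:14s} multiplies the beam energy by {gain:.1f}")

# For two beams of equal energy colliding head-on, the center-of-mass
# energy is twice the beam energy.
sqrt_s_tev = 2 * CHAIN[-1][1] / 1000.0
print(f"center-of-mass energy: about {sqrt_s_tev:.0f} TeV")
```

The last line reproduces the center-of-mass energy of about 2 TeV quoted for Run II in the Introduction.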


Figure 2.2: Cross section of a Tevatron magnet.

2.2 The DØ Detector

The DØ detector, as it existed in Run I, is shown in Fig. 2.3. There are three major subsystems: a collection of tracking detectors extending from the beam axis to a radius of 75 centimeters; energy-measuring calorimeters surrounding the tracking region; and, on the outside, a muon detector that measures the deflection of muons in solid iron magnets. The entire detector is about 20 meters long, about 12 meters wide and high, and weighs 5500 tons. It rests on a moveable platform that permits detector assembly and commissioning in accessible areas, prior to positioning in the collision hall for operation. The cables carrying signals and services follow the detector, which allows the sensitive electronics for triggering and digitization to be housed in outer control rooms.


Figure 2.3: A schematic view of the DØ detector during Run I. The tracking chambers near the beam are shown in purple, gray and pink. The calorimeters are shown in yellow, blue, and green. The muon chambers are shown in orange, and surround the iron magnets (in red).

The DØ Run II detector builds on the detector’s previous strengths of good calorimetry and muon detection in an extended rapidity range. The major addition to the apparatus is a precision tracking system, consisting of a silicon vertex detector surrounded by an eight layer central fiber tracker. These detectors are located inside a new 2 T solenoid magnet.


The Central Detector

The DØ tracking and transition radiation detectors form the Central Detector (CD). A major element of the upgrade is the replacement of the inner tracking systems, required because of the new Tevatron time structure (more collisions), because of the expected radiation damage to those detectors in Run II (higher energy), and in order to improve the physics capabilities of the DØ detector. The Central Detector consists of:

Solenoid Magnet
Silicon Vertex Detector (SMT)
Scintillating Fiber Tracker (SFT)
Preshower Detectors (PS)

The momenta of charged particles will be determined from their curvature in the 2 T magnetic field provided by a 2.8 meter long solenoid magnet. The tracking system is designed to meet several goals: momentum measurement using the solenoidal field; good electron identification; tracking over a large range in pseudorapidity; secondary vertex measurement for the identification of b jets from top quark decays and for b physics; and radiation hardness.

The silicon vertex detector (SMT) is a hybrid barrel and disk design (see figure 2.4). Its central part consists of six barrels with disks interspersed between them. Each barrel module consists of four radial layers of detector ladders. The SMT is the high-resolution part of the tracking system and is the first set of detectors encountered by particles emerging from the collision.

The central fiber tracker consists of 74,000 scintillating fibers mounted on eight concentric carbon fiber cylinders and surrounds the SMT. It covers the central pseudorapidity region. Together with the silicon vertex detector, the tracker enables track reconstruction for all charged particles within the range |η| < 1.6 (see footnote 2). The fiber tracker provides fast "Level 1" track triggering. By combining information from the tracker with the muon and preshower detectors, triggers for both single muons and electrons will be formed at Level 1.

The central preshower detector is designed to aid electron identification and triggering and to correct the electromagnetic energy for effects of the solenoid. The detector functions as a calorimeter by early energy sampling and as a tracker by providing precise position measurements. The entire Central Detector fits in the inner cylindrical space of the calorimeters.

² The pseudorapidity (η) is a handy variable if the mass and momentum of a particle are not known. It is an angular variable defined as η = −ln[tan(θ/2)], with θ the polar angle with respect to the beam axis.
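To give a feeling for the pseudorapidity ranges quoted in this chapter, the definition in the footnote can be evaluated numerically. The sketch below converts between η and the polar angle θ; the function names are illustrative.

```python
# Minimal sketch: converting between pseudorapidity and polar angle.
import math


def pseudorapidity(theta: float) -> float:
    """eta = -ln(tan(theta / 2)), with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta: float) -> float:
    """Inverse relation: theta = 2 * atan(exp(-eta)), in radians."""
    return 2.0 * math.atan(math.exp(-eta))


# Polar angles corresponding to the detector coverage limits quoted in the text.
for eta in (0.0, 1.0, 1.6, 2.0):
    theta_deg = math.degrees(polar_angle(eta))
    print(f"eta = {eta:.1f} corresponds to theta = {theta_deg:.1f} degrees")
```

A particle emitted perpendicular to the beam has η = 0, and η grows without bound as the direction approaches the beam axis, so a larger |η| coverage means a detector that reaches closer to the beam line.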


Figure 2.4: The DØ silicon vertex detector. In the center 6 barrels and 12 H-disks are placed. Further away from the collision point are four F-disks.

The Calorimeter System

The calorimeter design is crucial for the optimization of the DØ detector. It plays an important role in the identification of electrons, photons and jets. The upgrade is driven by the need to preserve the good performance of the calorimeter for Run II running conditions.

No modifications to the uranium liquid-argon sampling calorimeter itself have been made. Its fine longitudinal and transverse segmentation allows electromagnetic showers to be distinguished from hadronic jets.

The Muon System

The DØ muon detection system consists of five separate solid-iron toroidal magnets, together with sets of proportional drift tube chambers (PDTs) to measure track coordinates. The purpose of this system is the identification of muons produced in proton-antiproton collisions and the determination of their trajectories and momenta. Since muons are measured after most of the debris of particle showers has been absorbed in the calorimeters, muons can be identified in the middle of hadron jets with much greater purity than electrons. The muon system offers excellent efficiency, purity and coverage.

The central part of the muon system (WAMUS), covering the |η| < 1 region, consists of 94 PDTs, barrel scintillator counters and cosmic ray veto scintillator counters. There are three PDT layers, with a toroid magnet placed between the first and the second layer, counted from the interaction region. The magnetic field produced by the magnet bends the muon tracks, so that the muon momentum can be calculated.
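The principle of measuring momentum from the bending of a track in a magnetic field, used in the muon toroids here and in the 2 T solenoid of the central tracker, can be sketched with the standard relation p ≈ 0.3 · B · R (p in GeV/c, B in tesla, R in metres) for a particle of unit charge moving perpendicular to a uniform field. This relation is a textbook result and is not taken from this thesis; the example momentum is an arbitrary illustration value.

```python
# Minimal sketch: momentum from the radius of curvature in a uniform field.
# Assumes a uniform field and considers only the momentum component
# perpendicular to it.

C_FACTOR = 0.2998  # GeV/c per (tesla * metre) for unit charge


def momentum_from_radius(b_tesla: float, radius_m: float, charge: int = 1) -> float:
    """Momentum component perpendicular to the field, in GeV/c."""
    return C_FACTOR * abs(charge) * b_tesla * radius_m


def radius_from_momentum(b_tesla: float, p_gev: float, charge: int = 1) -> float:
    """Radius of curvature in metres for a given perpendicular momentum."""
    return p_gev / (C_FACTOR * abs(charge) * b_tesla)


B_SOLENOID = 2.0   # tesla, the solenoid field quoted in the text
p_example = 10.0   # GeV/c, arbitrary illustration value

radius = radius_from_momentum(B_SOLENOID, p_example)
print(f"a {p_example:.0f} GeV/c track in a {B_SOLENOID:.0f} T field "
      f"has a radius of curvature of {radius:.1f} m")
```

Because the radius grows linearly with momentum, high-momentum tracks are nearly straight inside the detector, which is why precise position measurements are needed to determine their momenta.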

The Forward Muon System (FAMUS) covers the 1 < |η| < 2 region of the DØ detector. It includes plastic Mini-Drift Tubes (MDTs). The cells are square in cross section and 1 cm wide, and are also arranged in three layers. Iron toroids are located between the A and B layers of the MDT systems.


Physics at the Tevatron

In this chapter an overview of the physics topics at the Tevatron (mostly DØ) is briefly presented. This work concentrates on top and Higgs physics, which are therefore described in more detail [2].

3.1 Electroweak physics

One consequence of the unification of the electromagnetic and weak forces was the prediction of the existence of two new particles: the W and Z gauge bosons. After several years of searches by experiments around the world, two collaborations at CERN, using the world's most powerful accelerator at the time (the Super Proton Synchrotron operated as a proton-antiproton collider), announced in 1983 the first direct observation of these elusive particles. The experiments measured the masses of the W and Z to be ~80 GeV and ~90 GeV respectively, with an uncertainty of 5–10%. Recent measurements are so precise that they become sensitive to the higher-order quantum corrections that may be the gateway to new physics.

Notable is the measurement of the W boson mass at the Tevatron. W bosons are produced at the Tevatron mainly when a quark from a proton and an antiquark from an antiproton collide head-on inside the DØ detector. The W decays into other particles almost immediately, within about 10⁻²⁴ seconds. Roughly 10% of the time a W decays into an electron and a neutrino, and it is this decay mode that DØ uses to measure the W mass. The DØ value for the W mass is 80.482 ± 0.091 GeV, the world's most accurate measurement of this important parameter published to date. Z bosons are produced in nearly the same way as W bosons, and their decay particles can be used to calibrate the detector. With 10,000 Z bosons available, DØ was able to understand the apparatus to the required level of accuracy. The precision determination of the W mass, together with the mass of the top quark discussed below, can be combined to estimate the mass of the Higgs particle (see section 3.5, figure 3.4). DØ expects to collect over 2.5 million W events. The uncertainty on the W mass will be reduced by at least a factor of two, and significant improvements will be made in the measurements of the gauge boson couplings.

3.2 QCD Physics

Quantum Chromodynamics (QCD) is the part of the Standard Model that describes the strong interaction responsible for the nuclear force. The quarks that make up the proton and all hadrons interact with gluon force carriers by virtue of their "color" quantum number. Though the proton can be viewed simplistically as a collection of three quarks, when examined closely, it reveals substantially more complex structure. The additional quarks and gluons appear with increasing magnification (larger momentum transfers) commensurate with smaller distances, and are described by phenomenological functions called parton distribution functions (PDFs).


These PDFs are derived from data, and therefore have uncertainties that have to be taken into account in any QCD-based prediction.

Because of the excellent coverage for jets³ provided by the calorimeter, DØ has made detailed and accurate measurements of strong interaction processes that test the predictions of QCD in many domains. The jet cross sections can be measured in Run II with higher precision. These cross sections can be expressed in physical quantities like momentum and rapidity⁴ and depend on the parton densities in the proton and antiproton. These densities have been measured in fixed-target experiments and have to be extrapolated over many orders of magnitude in energy scale. The extrapolation uses the rules of QCD, and therefore these measurements provide a rigorous test of QCD.
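The rapidity mentioned above is defined in this chapter's footnotes by tanh(y) = pL/E. A minimal sketch of how it is evaluated is given here, together with the equivalent logarithmic form y = ½ ln[(E + pL)/(E − pL)]; the function names and the example four-momentum are illustrative assumptions.

```python
# Minimal sketch: rapidity from energy and longitudinal momentum.
import math


def rapidity(energy: float, p_long: float) -> float:
    """Rapidity y from tanh(y) = p_L / E, i.e. y = atanh(p_L / E)."""
    return math.atanh(p_long / energy)


def rapidity_log(energy: float, p_long: float) -> float:
    """Equivalent form: y = 0.5 * ln((E + p_L) / (E - p_L))."""
    return 0.5 * math.log((energy + p_long) / (energy - p_long))


# Illustration values in GeV (arbitrary): E = 50, p_L = 30.
y = rapidity(50.0, 30.0)
print(f"y = {y:.4f}")  # atanh(0.6), which equals ln(2)
```

For a massless particle the rapidity coincides with the pseudorapidity η, which is why η is used as a convenient substitute when the particle mass is unknown.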

In data that contain at least two high-ET jets⁵, a small fraction of events have the striking feature of sizeable gaps in energy deposition between the two jets, or between jets and the beam direction. The gaps are characterized by the absence of particles in extended regions of polar angle in the tracking detectors, calorimeters or forward trigger counters. Such events are termed "rapidity-gap" events (the rapidity variable is related to the polar angle). Explanations for the gap events are based on the supposition of the existence of a color-free object called the Pomeron. The jets produced in these events have ET distributions similar to those in standard QCD processes (quark and gluon exchange). This leads to the view that the Pomeron may have an internal structure, consisting at least partly of normal quarks and gluons arranged in such a way as to make the Pomeron colorless. This physics is linked with studies of the forward region related to diffractive physics.

3.3 Physics of the bottom quark (B physics)

Within the family of known quarks, the bottom (or b) quark is characterized by a set of rather peculiar and often intriguing properties, sufficiently so as to warrant dedicated facilities for its study. Discovered in an experiment at Fermilab in 1977, its unexpected appearance created an imbalance in the internal organization of the existing quarks. The absence of a "weak isospin" partner represented a theoretical discomfort that was only dispelled with the later discovery of its missing companion, the top quark.

When compared with its earlier known siblings, the bottom quark is considered heavy, with a mass about four times that of its next heaviest colleague, the charm quark (see table 1.2). This relatively high mass grants the b quark special status in studies of QCD. Bottom quarks are produced in proton-antiproton collisions dominantly by the strong QCD interactions of gluons and light quarks that reside within the colliding beam particles. A key factor in the experimental interest in B physics is the potential insight it affords into physics at very short distances. In particular, it is hoped that the high-precision study of phenomena such as CP violation, rare decays, and flavor-changing processes will provide precious insights into new interactions associated with the flavor sector of whatever theory lies beyond the Standard Model. However, in order for this information to become available, it is necessary to confront the fact that the b quarks, which are the ultimate objects of study, are bound by strong dynamics into color-neutral hadrons. While understood in principle, the nonperturbative nature of these bound states makes it difficult to extract precision information about physics at higher energies from even the most clever and precise experiments on B mesons. To explore new physics effects one faces a daunting theoretical challenge: untangling them from the effects of nonperturbative QCD.

³ In many collisions, observable secondary particles are produced in highly collimated form, called particle jets. This is a consequence of the hadronization of partons (quarks or gluons) produced in hard collisions (see section 4.1).

⁴ The rapidity (y) is a variable frequently used to describe the behaviour of particles in inclusively measured reactions. It is defined by tanh(y) = pL/E, where pL is the longitudinal momentum along the direction of the incident particle and E is the energy of the given particle.

⁵ ET is the transverse energy of a jet: its energy weighted by the sine of its polar angle with respect to the beam axis, ET = E sin θ.

In Run I the b production cross section was measured, as shown in Fig. 3.1. The measured cross section is significantly higher than the predicted one (by roughly a factor of 3). Although there are uncertainties in both theory and experiment, the present status represents an exciting challenge that is currently being addressed by theorists, and it motivates the program of increasingly accurate measurements for Run II.

We noted that the bottom quark is a heavy object when compared with its earlier known siblings; in striking contrast, when compared with its companion, the top quark, it is in fact remarkably light. This delicate placement in the mass scale, together with the tendency of quarks to interact mainly with their weak isospin partners, conspires to give the bottom quark yet another set of very welcome properties. The b quark has an unusually long lifetime (hadrons containing a b quark typically travel a few millimeters before decaying), and there are clear signatures associated with its decay products. Once an experiment is equipped to observe and analyze specific bottom-quark decay modes, another entirely new and rich chapter of physics is opened, which includes such fundamental topics as CP violation, and windows of exploration into particle physics phenomena beyond the scope of the Standard Model.


Figure 3.1: The measured b production cross-section as function of PT for several b decay channels. The curves represent the theoretical predictions.

The installation of a superconducting solenoid and of precision tracking sensors in its interior are two important features of the upgraded DØ detector for Run II, which will allow it to address a wide variety of B physics topics. As at LEP⁶ and BaBar⁷, $B_s$ mixing and CP violation will be studied. Unique to the Tevatron is the possibility to study the production dynamics through the $B_c$ mass spectrum.

3.4 Top physics

The top quark is a special particle because of its high mass. From the beginning of Run I, the search for the top quark was a very high priority at DØ. The top quark was discovered at the Tevatron in 1995. The final top mass from DØ analyses is 172.0 ± 7.1 GeV (an uncertainty of about 4%), far exceeding the initial expectation for precision, and making the top mass the most precisely known of all quark masses. Combining all mass measurements from both CDF and DØ yields a mass of 174.3 ± 5.1 GeV (< 3% uncertainty) for the top quark.

⁶ The LEP (Large Electron-Positron collider) accelerator was closed down on November 22nd, 2000, after just over eleven years of forefront research. The CERN laboratory in Geneva will proceed with the LHC (Large Hadron Collider) project. The LHC will collide protons with protons at a center-of-mass energy of about 14 TeV. When completed in the year 2005, it will be the most powerful particle accelerator in the world.

⁷ The BaBar detector was built in 1999 at SLAC (Stanford Linear Accelerator Center) to study the millions of B mesons produced by the PEP-II storage ring.

Figure 3.2: The top quark pair production cross-section versus its mass measured by the DØ and CDF collaboration. Several theoretical predictions are also shown.

The top quark decays nearly all the time to a W boson and a b quark, giving rise to a final state with two W bosons and two b quark jets. The W bosons decay either into charged leptons and their neutrinos or into quark-antiquark pairs. Thus the basic classes of final states arising from top and antitop production are the following:

1) six quark jets (four from the W bosons and two from b quarks);
2) a lepton and neutrino, accompanied by four quark jets (two from one W and two b jets);
3) two leptons and neutrinos and two b quark jets (see the diagram in Fig. 3.3).

The experimental challenges differ for the three classes of events: the six-jet class, with no leptons, is the most likely but suffers from huge backgrounds due to ordinary strong production of jets; the two-lepton class has relatively little background but a small rate; the single-lepton class is intermediate in both rate and background. The Run I analyses (performed without a vertex detector) used event samples with at least one of the W bosons decaying leptonically. These events can be reconstructed with high efficiency. With a vertex detector it will be possible to tag the b jets in top-antitop events and thereby significantly increase the top physics potential.
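The relative rates of the three classes can be estimated by simple counting of W decay modes. The sketch below assumes the lowest-order approximation in which a W decays to each of the three lepton flavours with probability 1/9 and to quark-antiquark pairs with total probability 6/9. These round numbers are an illustrative assumption rather than measured branching ratios, and the function name is hypothetical.

```python
from fractions import Fraction

# Assumed lowest-order W branching fractions (illustrative, not measured values):
# three lepton flavours at 1/9 each, quark-antiquark pairs at 6/9 in total.
BR_W_LEPTONIC = Fraction(3, 9)
BR_W_HADRONIC = Fraction(6, 9)


def ttbar_final_state_fractions(br_lep=BR_W_LEPTONIC, br_had=BR_W_HADRONIC):
    """Return the fraction of top-antitop events in each final-state class."""
    return {
        "all jets": br_had * br_had,           # both W bosons decay hadronically
        "lepton + jets": 2 * br_lep * br_had,  # either W may be the leptonic one
        "dilepton": br_lep * br_lep,           # both W bosons decay leptonically
    }


class_fractions = ttbar_final_state_fractions()
for name, value in class_fractions.items():
    print(f"{name:14s} {float(value):.3f}")
```

In this counting the all-jets and lepton + jets classes each take 4/9 of the events and the dilepton class takes 1/9. If only electrons and muons are counted as leptons (`br_lep = Fraction(2, 9)`), the lepton + jets fraction drops to 24/81, which leaves the all-jets class as the most likely one.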


The decay dynamics of the top quarks are very interesting to study. As mentioned, the top quark decays before it hadronizes so that its helicity is passed to its decay products. The helicity state of the W boson is experimentally accessible by the measurements of the decay angles, allowing interesting tests of the Standard Model. Furthermore, its very large mass suggests that it may well play a special role in the breaking of the electroweak symmetry, and could be partially responsible for the mechanism by which all particles acquire mass. It provides a probe for seeking new forces in which top and antitop quarks combine (annihilate) to make new particles, and a vehicle for the search for new massive particles in its decays. These are the themes that will dominate top-quark studies in the forthcoming Run II, where at least forty times more top events are expected in a substantially improved detector with greater capability for deciphering these complex signals.

Figure 3.3: A schematic of top-quark pair production, where both W bosons decay leptonically.

3.5 Higgs Physics

The observation of the Higgs particle would give access to an as yet completely unexplored sector of the Standard Model. The Standard Model provides a quantitative description of all interactions (except gravity) between all fundamental particles. It has predictive power and it is in excellent agreement with the available experimental data. The Higgs mechanism is an essential ingredient of the theory and endows the particles with mass. This mechanism leads to the prediction that a corresponding elementary particle, the Higgs boson, has to exist. The model does not predict its mass, leaving it to experiment to determine. So far the Higgs boson has been unusually elusive, although tantalizing hints have recently been observed at LEP.


The searches performed by the LEP experiments have excluded the existence of a Higgs particle with a mass below 113 GeV. However, these experiments also show evidence, albeit statistically not beyond doubt, for the production of a Higgs particle with a mass of 115 GeV. The upgrade of the Tevatron collider allows Higgs particle searches in the mass range up to 150 GeV. Precise measurement of the mass of the top quark at this facility puts a strong constraint on the Higgs mass and offers an indirect access to the Higgs sector (see figure 3.4).

Figure 3.4: The W mass versus the top mass. The Run I result of DØ is shown (black box) together with a hypothetical result for the mass measurement in run II (open circle). The bands present several Higgs masses.

Like any particle, the Higgs can be created in a collision of other particles. In case of proton-antiproton collisions, the dominant channel for the production of a Higgs boson is gluon-gluon fusion (see figure 3.5). The cross sections of the other channels are in general 1 to 2 orders of magnitude smaller.

Figure 3.5: The four main Higgs production channels: gluon-gluon fusion (a); WW and ZZ fusion (b); t-tbar fusion (c); and W and Z bremsstrahlung (d).


Figure 3.6 shows the Higgs production cross section as a function of the Higgs mass. The direct Higgs production channel (gluon-gluon fusion) is experimentally not accessible due to the overwhelming QCD background. The advantage of the other three channels is that the Higgs is accompanied by two quarks or an intermediate vector boson. These give rise to specific signatures in the detector (two jets, a lepton pair or an isolated lepton) that can be used to suppress most of the background. Therefore, the production of a Higgs together with a Z or W (associated production) has the most potential, although its cross section is a factor of ten lower than that of direct production.

Figure 3.6: Production cross sections for the main Higgs creation channels at the Tevatron.

Because of its nature, the coupling strength of the Higgs to other particles depends strongly on their mass. As a result, it decays preferentially into the heaviest particles available. But as its mass is not predicted by the Standard Model, it is not known which process that is. Some limits on the Higgs mass can, however, be set. Because it has not yet been observed by any existing experiment, a lower bound of around 109 GeV can be assumed. Indirect predictions from the precision fits using the full set of electroweak data (see figure 3.4) put the upper limit at a mass of 215 GeV (95% confidence level).

For a relatively low Higgs boson mass of just above 113 GeV, the search by the Tevatron experiments is of special importance. In this mass range, the decay mode to bottom quarks dominates (see figure 3.7), while the sensitivity of the LHC experiments is relatively low. In a worst case scenario the low mass Higgs may escape detection, unless the experiments are equipped for efficient B tagging. If the Higgs boson mass is higher (>130 GeV), other decay modes become gradually more significant.


Figure 3.7: Higgs branching ratios.

Figure 3.8 shows the reach of the Standard Model Higgs search at the upgraded Tevatron. Shown are the integrated luminosities delivered per experiment that are required to exclude the Standard Model Higgs boson at 95% CL, observe it at the 3 sigma level or discover it at the 5 sigma level, as a function of the Higgs mass. In the low-mass Higgs region, below 140 GeV, the curves shown are the result of combining the W+Higgs and Z+Higgs channels (where the Higgs decays to $b\bar{b}$ and the W and Z decay leptonically). In the high-mass Higgs region, above 140 GeV, the curves shown are the result of combining various channels in which the Higgs boson decays to WW (where one W may be virtual). The lower edge of the bands is the calculated threshold; the bands extend upward from these nominal thresholds by 30% as an indication of the uncertainties in b-tagging efficiency, background rate, mass resolution, and other effects.

Figure 3.8: The reach of the Standard Model Higgs search at the upgraded Tevatron.


3.6 Searches for new phenomena

Although the Standard Model has been very successful, it would be naïve to believe that it describes all physics. For example, where do all the parameters of the model come from? An appealing theoretical framework that addresses such questions is supersymmetry (SUSY), which has been suggested as a possible cure for many of the shortcomings of the Standard Model. The Run II studies will cover much of the parameter space of the popular Supersymmetric Model. When used as a phenomenological ingredient of physics at the scale of present-day experiments, supersymmetry provides a natural solution to the instability of the mass of the Higgs boson, and permits the unification of the strong and electroweak forces. It predicts that every Standard Model particle has a supersymmetric partner, with exotic names such as charginos, neutralinos, squarks, and gluinos, which could be discovered in the new Tevatron run. The mass reach will be about 100 GeV higher than present limits on superpartner masses, bringing DØ into a very interesting range of SUSY parameter space.


B Tagging

In order to study Higgs and top physics it is necessary to select the related events. An efficient technique of selection of events with B hadrons is required to detect the (new) particles. Usually the long lifetime of the B hadron is used for the tagging. In this chapter B tagging will be further explained and an overview will be given of some promising methods from earlier studies.

4.1 B Tagging

An advantageous characteristic of both the top and Higgs particle is that their decays can produce one or more bottom quarks:

\[ t \rightarrow W b, \qquad H \rightarrow b\bar{b} \]

For a Higgs mass range accessible to the Tevatron, the main decay mode is $H \rightarrow b\bar{b}$, with a branching ratio of about 90%. It is therefore necessary to be able to identify jets arising from b quark production with high efficiency. To identify a jet associated with a b quark, a technique referred to as b-tagging is used. The bottom quarks will appear as bottom mesons (B), which are bound states of a b quark with a light quark. These B mesons have lifetimes of about $10^{-12}$ s, leading to flight paths of the order of millimeters before they decay. This provides a very useful signature for identifying B mesons: the trajectories of the decay products do not point to the vertex of the primary interaction (see figure 4.1). Hence, reconstructing or tagging a secondary vertex away from the primary vertex allows interesting physics to be separated from background events.
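The step from a lifetime of about 10^-12 s to a flight path of millimeters can be checked with the relation L = βγcτ, where βγ = p/m. The following is a minimal sketch; the B meson momentum of 30 GeV is an assumed example value, and the function name is hypothetical.

```python
C_MM_PER_S = 2.998e11  # speed of light in mm/s


def mean_decay_length_mm(momentum_gev, mass_gev, lifetime_s):
    """Mean flight distance L = beta*gamma*c*tau, with beta*gamma = p/m."""
    beta_gamma = momentum_gev / mass_gev
    return beta_gamma * C_MM_PER_S * lifetime_s


# Assumed example: a B meson of mass 5.28 GeV carrying 30 GeV of momentum,
# with the lifetime of about 1e-12 s quoted in the text.
length = mean_decay_length_mm(momentum_gev=30.0, mass_gev=5.28, lifetime_s=1.0e-12)
print(f"mean decay length: {length:.2f} mm")
```

Without any boost (βγ = 1) the same lifetime corresponds to cτ of about 0.3 mm, so it is the relativistic boost of the B meson that stretches the flight path to the millimeter scale.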

Figure 4.1: A schematic view of the primary and secondary vertex.


4.2 Jet variables

Jets for a given initial parton can vary widely in shape, particle content, and energy spectrum; there is, of course, also substantial blurring due to instrumental effects: the finite resolution and granularity of detectors (calorimeter cells and muon measurements), and escaping neutrinos.

The earliest evidence for jets was in e+e- collisions (SLAC and DESY), producing secondary hadrons; subsequently, they were also observed in hadronic collisions (e.g. UA experiments and ISR at CERN). Frequently, two main jets are observed which dominate the energy balance of the collision; in hadronic collider events, the balance is observed only laterally, due to the difficulty of observing at large (absolute) rapidity, and due to the structure function, which leaves the hard quark encounter with a longitudinal boost. Often, the main jets are accompanied by one or more broader jet(s), interpreted as radiated gluons.

In the hadronization of a b quark, very little energy is radiated in the form of gluons. The resulting B hadron carries a large fraction of the beam energy, typically 70%. It then decays into several particles in a weak decay process. In contrast, most of the energy from a light quark is transferred into a broad energy spectrum of particles, particularly caused by the leading light hadron. Therefore, by comparing the energies of the leading particles, as well as the distribution of energies of the particles within a jet, one can distinguish between b quark and lighter quark jets. In addition, jets from b quarks are typically broader than light quark jets. None of the above properties is sufficient on its own to select a sample of events with high purity and good efficiency. However, when they are used in a neural net analysis that naturally takes care of the correlations between them, high selection efficiency with good sample purity can be achieved. In the following section some jet variables are discussed which were used in earlier jet studies.

4.3 Overview studies on b-tagging and related topics

The variables used in an ANN analysis [3], in which a study of $t\bar{t}$ production in the semileptonic channel is presented, are:

• Electron transverse energy $E_T$ and pseudorapidity $\eta = -\ln\tan(\theta/2)$ (with $\theta$ the polar angle).
• Missing transverse energy $E_T^{\mathrm{miss}}$ arising from undetected neutrinos.
• The azimuthal angle ($\Delta\phi$) between the electron and the $E_T^{\mathrm{miss}}$ vector.
• […] within $|\eta| < 4.0$, excluding […]'s and […]'s.
• Planarity $P$, defined as […]. Here, $p_{T1}$ and $p_{T2}$ are defined to be axes of a Cartesian coordinate system whose third axis is the longitudinal momentum $p_L$. The variable indicates how well a reaction satisfies the assumption of being in a plane.
• Sphericity $S$: the sphericity tensor is defined by

\[ S^{ab} = \frac{\sum_i p_i^a p_i^b}{\sum_i |\vec{p}_i|^2} \]

where $a, b = 1, 2, 3$ correspond to the x, y, and z directions and the sum runs over the particles $i$ in the event. The three eigenvalues of $S^{ab}$ are ordered as $\lambda_1 \geq \lambda_2 \geq \lambda_3$, with $\lambda_1 + \lambda_2 + \lambda_3 = 1$. Sphericity is given by $S = \frac{3}{2}(\lambda_2 + \lambda_3)$. The value of $S$ indicates the total $p_\perp^2$ with respect to the event axis. Collimated, pencil-like events have $S \approx 0$ and isotropic events tend to have $S \approx 1$.
• Aplanarity $A$: using the eigenvalues of the sphericity tensor, $A = \frac{3}{2}\lambda_3$. A planar event has $A \approx 0$ and an isotropic event has $A \approx \frac{1}{2}$.
• Transverse energies of the five leading jets, including the electron.
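Sphericity and aplanarity can be computed directly from the particle momenta of an event. The sketch below assumes the standard definitions, with the normalized momentum tensor $S^{ab} = \sum_i p_i^a p_i^b / \sum_i |\vec{p}_i|^2$, ordered eigenvalues $\lambda_1 \geq \lambda_2 \geq \lambda_3$, $S = \frac{3}{2}(\lambda_2 + \lambda_3)$ and $A = \frac{3}{2}\lambda_3$. The two toy events and the function names are illustrative assumptions.

```python
import numpy as np


def sphericity_tensor(momenta):
    """Normalized momentum tensor S^{ab} = sum_i p_i^a p_i^b / sum_i |p_i|^2."""
    p = np.asarray(momenta, dtype=float)  # shape (n_particles, 3)
    return p.T @ p / np.sum(p * p)


def sphericity_and_aplanarity(momenta):
    """Return (S, A) computed from the ordered eigenvalues l1 >= l2 >= l3."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    l3, l2, l1 = np.linalg.eigvalsh(sphericity_tensor(momenta))
    return 1.5 * (l2 + l3), 1.5 * l3


# Pencil-like toy event: two back-to-back particles along z.
pencil = [[0.0, 0.0, 10.0], [0.0, 0.0, -10.0]]
# Isotropic toy event: equal momenta along +/- x, y and z.
isotropic = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]

print(sphericity_and_aplanarity(pencil))
print(sphericity_and_aplanarity(isotropic))
```

The pencil-like event gives S = 0 and A = 0, while the isotropic one gives S = 1 and A = 1/2 (up to rounding), reproducing the limits quoted in the list.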

The variables used in an ANN analysis [4] studying $t\bar{t}$ production in the semileptonic channel are:

• Sphericity and aplanarity (see above).
• The invariant mass of the hadronically decaying W ($m_W$).
• The transverse momentum of the leptonically decaying W ($p_t$).
• The total transverse energy ($E_T$).
• The charged lepton transverse momentum ($p_t^l$) and its pseudorapidity ($\eta^l$).
• The transverse momenta of the jets ($p_t^i$, $i = 1, \ldots, 4$) and their pseudorapidities ($\eta^i$, $i = 1, \ldots, 4$) in decreasing order.

Figure 4.2: Values of the sphericity and aplanarity of certain particle trajectories in the center-of-mass system.

In a publication of CERN on tagging of B jets [5], the following variables are studied:

• the vertex decay length $L$, calculated as the length of the vector from the primary to the secondary vertex, the vector being constrained to be parallel to the jet axis. The decay length is given a positive sign if the secondary vertex is displaced from the primary vertex in the direction of the jet momentum.
• the decay length significance $L/\sigma_L$ (where $\sigma_L$ is the estimated error on $L$). Secondary vertices significantly separated from the primary vertex are selected with a cut $|L/\sigma_L| > 3$.
• the number of tracks in the secondary vertex, $N_s$.
• the reduced decay length significance $L_R/\sigma_{L_R}$, recalculated after removal of the secondary-vertex track with the highest impact parameter significance with respect to the primary vertex ($d_0/\sigma_{d_0}$).
• a parameter $X_D$, a more complicated variable related to the mass of the B hadron, which is explained in another article [6].

$X$ is the output of an artificial neural network (trained on Monte Carlo events) using six inputs:

• the scaled track momentum $x_p = p/E_{beam}$
• the sine of the angle of the track with respect to the jet axis, $\sin\theta = p_t/p$
• the impact parameter significances of the track with respect to the reconstructed secondary vertex in the $r$-$\phi$ and $r$-$z$ planes
• the impact parameter significances of the track with respect to the hemisphere primary vertex in the $r$-$\phi$ and $r$-$z$ planes


The output $X$ of this network represents the probability that the track came from a B hadron decay. Starting from the highest values of $X$, tracks within the jet are combined until the invariant mass of the combined tracks exceeds the charm hadron mass, taken to be 1.9 GeV. The $X$ of the last added track is defined as $X_D$.
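The procedure for obtaining X_D can be sketched in a few lines. This is a minimal illustration under stated assumptions: each track is represented as a pair of its network output X and a four-momentum (E, px, py, pz) in GeV, the toy tracks are invented for the example, and returning `None` when the threshold is never reached is a convention chosen here rather than one taken from [6].

```python
import math

CHARM_MASS_GEV = 1.9  # charm hadron mass threshold quoted in the text


def invariant_mass(four_momenta):
    """Invariant mass of a set of (E, px, py, pz) four-momenta in GeV."""
    e, px, py, pz = (sum(component) for component in zip(*four_momenta))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def critical_track_x(tracks, threshold=CHARM_MASS_GEV):
    """Return X_D: the X of the track that pushes the mass above threshold.

    `tracks` is a list of (X, (E, px, py, pz)) pairs. Returns None when the
    combined mass never exceeds the threshold (a convention assumed here).
    """
    selected = []
    # Add tracks one at a time, starting from the highest value of X.
    for x, four_momentum in sorted(tracks, key=lambda t: t[0], reverse=True):
        selected.append(four_momentum)
        if invariant_mass(selected) > threshold:
            return x
    return None


# Assumed toy tracks (massless for simplicity), given in arbitrary order.
tracks = [
    (0.6, (1.0, -1.0, 0.0, 0.0)),
    (0.9, (1.0, 1.0, 0.0, 0.0)),
    (0.2, (1.0, 0.0, 0.0, 1.0)),
    (0.8, (1.0, 0.0, 1.0, 0.0)),
]
print(critical_track_x(tracks))
```

In this toy jet the two tracks with the highest X have a combined mass of about 1.41 GeV, below the threshold; adding the third raises it to about 2.83 GeV, so the X of that third track is returned.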

In a publication using ALEPH events [7] the following 25 variables were used as inputs to a feed-forward network.

• the boosted sphericity $S_b$, the sphericity of the final-state charged and neutral particles in the rest frame of the jet. For calculating $S_b$, the event is separated into two hemispheres with respect to the plane perpendicular to the sphericity axis. All particles in each hemisphere are assumed to originate from the same hadron and are boosted into the rest frame of the hypothetical hadron. Decay products from b quark jets have a higher sphericity than light quark jets, which are more collimated.
• the transverse momentum of a jet particle relative to the jet axis, defined using […].
• the charged and neutral particle multiplicity of the jet divided by the logarithm of the jet energy ($N_{track}/\ln E$). The normalization by $\ln E$ reflects the energy dependence of the multiplicity.
• the directed sphericity, which represents an attempt to pick out the decay of the B meson in b jets, which is more isotropic than the longitudinal fragmentation of hadrons in light quark jets. For a set $Q$ of tracks in a jet, the directed sphericity is defined as

\[ S_D(Q) = \frac{\sum_{i \in Q} P_{T,i}^2}{\sum_{i \in Q} P_i^2} \]

where the $P_i$ are the momenta in the rest frame of the set $Q$ and the $P_{T,i}$ are their components perpendicular to the original jet direction in the laboratory frame. Several sets $Q$ were used: the four leading particles of the jet were found and the directed sphericities of all eleven subsets containing at least two of these tracks ($\binom{4}{2} + \binom{4}{3} + \binom{4}{4} = 11$) were used.
• the invariant mass for the same eleven particle combinations of the set $Q$.
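The count of eleven combinations follows from taking every subset of the four leading particles that contains at least two of them. The sketch below enumerates these subsets and evaluates the invariant mass of each; the four-momenta are assumed toy values and the helper names are hypothetical.

```python
import math
from itertools import combinations


def invariant_mass(four_momenta):
    """Invariant mass of a set of (E, px, py, pz) four-momenta in GeV."""
    e, px, py, pz = (sum(component) for component in zip(*four_momenta))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def multi_particle_subsets(particles):
    """All subsets of `particles` that contain at least two members."""
    return [
        subset
        for size in range(2, len(particles) + 1)
        for subset in combinations(particles, size)
    ]


# Assumed toy four-momenta (E, px, py, pz) in GeV for the four leading particles.
leading = [
    (5.0, 5.0, 0.0, 0.0),
    (4.0, 0.0, 4.0, 0.0),
    (3.0, 0.0, 0.0, 3.0),
    (2.0, -2.0, 0.0, 0.0),
]

subsets = multi_particle_subsets(leading)
masses = [invariant_mass(subset) for subset in subsets]
print(len(subsets), [round(m, 2) for m in masses])
```

Six pairs, four triplets and one quadruplet give the eleven combinations, so the invariant masses contribute eleven of the network inputs.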

Because of the large number of jet variables discussed above, the easiest way to assess the significance of these variables is to present them to an Artificial Neural Network, as is done in the studies above. An introduction to Artificial Neural Networks therefore follows in the next chapter.


Artificial Neural Networks

Artificial Neural Networks (ANNs) arose from a description of simple models of biological nervous systems and have been applied to pattern recognition problems in various fields. The interest in using ANNs in high energy physics has increased in recent years. They have been applied to the separation of quark jets from gluon jets, electron identification, triggering and so on. Before discussing the structure of the ANN, we first give a short introduction to the human brain [8].

5.1 The Human Brain

The brain is probably the most complex structure in the known universe. While it is the product of many millions of years of evolution, some of the structures unique to the human species have appeared only relatively recently. Only 100,000 years ago, the ancestors of modern humans had a brain weighing only a third of the current version. Most of this increased weight is associated with the most striking feature of the human brain, the cortex. Almost all the tasks that seem hard for human beings but that present computers can easily perform are associated with processing in parts of the relatively new cortex. Tasks that humans normally find easy but that are difficult for computers typically have a much longer evolutionary history. A simple operation like recognizing someone's face, which we find rather straightforward, is a huge problem for a computer. This is not so surprising, though, when one considers that a human is using multiple levels of processing that have evolved over many hundreds of thousands of years.

Figure 5.1: Schematic view of the human brain.


Nerve cells, called neurons, are the fundamental elements of the central nervous system. The central nervous system is made up of about 100 billion ($10^{11}$) neurons. Neurons are much like other cells of the body in their general organization and their biochemical systems. However, they also possess unique features, which are crucial to the functioning of the central nervous system. In essence, a given neuron is built up of three parts: the cell body, the dendrites and the axon. The body of the cell contains the nucleus of the cell and carries out the biochemical transformations necessary to synthesize enzymes and other molecules necessary to the life of the neuron. It is typically several microns in diameter. Each neuron is surrounded by hair-like structures, the dendrites. They are some tens of microns in length and branch out into a tree-like form around the cell body. The dendrites are like electrical cables that serve to conduct incoming signals to the cell. The axon, or nerve fiber, is the outgoing connection for signals emitted by the neuron. It is usually much longer than the dendrites, varying from a millimeter to one meter. At its end it branches into smaller structures that communicate with other neurons. Typically a given neuron is connected to about ten thousand other neurons. The specific point of contact between the axon of one cell and a dendrite of another is called a synapse.

Figure 5.2: A few neurons and their dendrites.

A simple description of the operation of a neuron is that it processes the electric currents that arrive on its dendrites and transmits the resulting electrical currents to other connected neurons using its axon. A simple explanation of the processing step is that the cell sums up the incoming signals and produces an output signal only if this sum exceeds some threshold; i.e. only if the total input signal is big enough will the cell ‘fire’ an output signal to its neighbors. Once the cell fires, an electrical signal travels down the axon at a speed of around 100 m/s, and will be received by many other neurons, changing their electrical state. If there are sufficient incoming signals to this neighbor neuron, the change in the electrical state can be big enough to generate a new pulse in the neuron. This cell is itself connected to many others and in this way a wave of electrical activity can be set up. Different types of brain activity correspond to different patterns of firings.

While we are born with a complete set of neurons, the connections between them are determined in major part by a learning process; external stimuli coming in the form of electrical currents from the sensory cells cause patterns of nerve impulses to be set up. These impulses can alter the strength of the coupling between different neurons. While the overall program for determining which neurons should be connected together is under genetic control, it is external stimuli that are crucially important in determining what network connections are made. Intelligence is determined partly by genetics (the program that governs the overall structure of what connections should be set up) and partly by our experience which can influence very strongly the nature and quality of our neural networks.

Two neurons are not merely joined or not - the nature of the synaptic connection between them determines whether one neuron firing has a strong or weak effect on the other - we talk of the strength of a connection between them. A strong connection between two neurons means that it is more likely that one of the neurons firing will stimulate the other to fire - with a weak connection it may only happen occasionally depending on the state of very many more neurons for example.
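The summing-and-threshold picture of a neuron, together with the notion of connection strength, can be captured in a few lines. The following is a deliberately simplified sketch: representing each connection strength by a numerical weight, and the particular numbers used, are assumptions made for illustration.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True if the weighted sum of incoming signals exceeds the threshold."""
    total = sum(weight * signal for weight, signal in zip(weights, inputs))
    return total > threshold


# Assumed connection strengths of three incoming synapses and a firing threshold.
weights = [0.5, 0.4, 0.9]
threshold = 0.8

print(neuron_fires([1, 1, 0], weights, threshold))  # two moderate inputs active
print(neuron_fires([1, 0, 0], weights, threshold))  # only one moderate input active
```

With these numbers the first pattern exceeds the threshold and the neuron fires, whereas a single moderate input is not enough; a single strong connection (weight 0.9) would suffice on its own.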

We may ask the question: what is it about the structure of neurons and their organization that determines the amazing computational power of the brain? Certainly it is not the raw processing power of a single neuron: it takes about one-thousandth of a second for a cell to return to a normal state after firing. While this seems quite a short time, it is ridiculously slow compared to even a modest home computer, whose silicon chip can perform operations in the incredibly short time of one-hundred-millionth of a second. The secret lies in the sheer number of neurons. If these neurons can be made to work efficiently and simultaneously on a given task, it is clear that the effective power of the brain is very much larger than that of current computers. The hint that such a scenario may indeed be realized lies in the detailed structure of the brain in terms of the connections between neurons. It is clear that in order for neurons to cooperate in performing some function, they must be able to talk to each other. We know that each neuron has many tens of thousands of connections to other neurons that function as communication channels. These connections have an incredibly complicated structure: different portions of the brain have different types of connection pattern, while these different sectors of the brain are themselves linked together by further specialized networking. We still have only the crudest understanding of why these neural pathways are connected up the way they are. But it is surely the very detailed way in which these connections are made that is at the heart of the power of the brain as a thinking machine.

The very complexity of these neural networks poses a formidable barrier to understanding. Nobody knows in detail how the individual firings of neurons coupled to their interconnections can lead to all of the features observed: short and long term memory, complex pattern recognition, logical reasoning, emotion and consciousness. Indeed, it is not known how even the lower level unconscious functions, such as those which regulate breathing and heart rate, emerge out of the complicated mutual interaction of millions of neurons. Furthermore, it is generally believed that at least a partial understanding will be necessary in order to build truly intelligent machines. Nevertheless, we are making rapid progress in understanding some of the simpler aspects of these systems, in part through the study of computers whose architecture resembles that of the brain. These computers go under the name of artificial neural networks.

5.2 Artificial neural networks

In recent years neural computing has emerged as a practical technology, with successful applications in many fields. The majority of these applications are concerned with problems in pattern recognition. From the perspective of pattern recognition, neural networks can be regarded as an extension of the many conventional techniques that have been developed over several decades. Historically, many concepts in neural computing have been inspired by studies of biological networks. The perspective of statistical pattern recognition, however, offers a much more direct and principled route to many of the same concepts. We therefore begin with an introduction to pattern recognition.

Pattern Recognition

The term pattern recognition encompasses a wide range of information processing problems of great practical significance, from speech recognition to the classification of handwritten characters. Often these are problems that humans solve easily. However, their solution using computers has, in many cases, proved to be difficult. The most general, and most natural, framework in which to formulate solutions to pattern recognition problems is a statistical one, which recognizes the probabilistic nature both of the information we seek to process, and of the form in which we should express the results. Statistical pattern recognition is a well established field with a long history and we shall build on the results that this field offers.

Classification

The goal in a classification problem is to develop an algorithm which will assign any sample, represented by a vector $\mathbf{x}$, to a certain class, which we shall denote by $C_k$, where $k = 1, \ldots, c$ labels the classes. We shall suppose that we are provided with a large number of examples of the samples corresponding to the possible classes, which have already been classified by a human. Such a collection will be referred to as a data set. The outcome of a simple system that classifies a pattern by assigning it to one of two classes ($C_1$ or $C_2$) can be represented in terms of a variable $y$ which takes the value 1 if the pattern is classified as $C_1$, and the value 0 if it is classified as $C_2$. Thus, the overall system can be viewed as a mapping from a set of input variables $x_1, \ldots, x_d$, representing some kind of a pattern, to an output variable $y$ representing the class label. In more complex problems there may be several output variables, which we shall denote by $y_k$ where $k = 1, \ldots, c$. In general, it will not be possible to determine a suitable form for the required mapping, except with the help of a data set of examples. The mapping is therefore modeled in terms of some mathematical function that contains a number of adjustable parameters, whose values are determined with the help of the data. We can write such functions in the form

\[ y_k = y_k(\mathbf{x}; \mathbf{w}) \tag{5.1} \]

where $\mathbf{w}$ denotes the vector of parameters. A neural network can be regarded as a particular choice for the set of functions $y_k(\mathbf{x}; \mathbf{w})$. In this case, the parameters comprising $\mathbf{w}$ are often called weights. The importance of neural networks in this context is that they offer a very powerful and very general framework for representing non-linear mappings from several input variables to several output variables, where the form of the mapping is governed by a number of adjustable parameters. The process of determining the values for these parameters on the basis of the data set is called learning or training. For this reason the data set of examples is generally referred to as a training set. Neural network models can be viewed as specific choices for the functional forms used to represent the mapping (5.1), together with particular procedures for optimizing the parameters in the mapping.
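To make the idea of a parametrized mapping y(x; w) and its training concrete, the sketch below uses the simplest possible choice: a logistic function of a weighted sum, with the weights adjusted by gradient descent. The functional form, the learning procedure and the toy training set are all assumptions chosen for illustration; this is not the Probabilistic Neural Network used later in this thesis.

```python
import numpy as np


def y(x, w):
    """Mapping y(x; w): logistic function of a weighted sum (w[0] is a bias)."""
    return 1.0 / (1.0 + np.exp(-(w[0] + x @ w[1:])))


def train(x, labels, steps=2000, rate=0.5):
    """Determine the weights w by gradient descent on the cross-entropy error."""
    w = np.zeros(x.shape[1] + 1)
    for _ in range(steps):
        error = y(x, w) - labels  # one entry per training sample
        w[0] -= rate * error.mean()
        w[1:] -= rate * (x.T @ error) / len(labels)
    return w


# Assumed toy training set: class C1 (label 1) has larger x1 and x2 than C2 (label 0).
x_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                    [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]])
labels = np.array([0, 0, 0, 1, 1, 1])

w = train(x_train, labels)
predicted = (y(x_train, w) > 0.5).astype(int)
print(w, predicted)
```

After training, the output y is above 0.5 for the samples of class C1 and below 0.5 for those of class C2, so thresholding y reproduces the labels of this linearly separable training set.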

Bayes’ Theorem

In this paragraph we introduce some of the basic concepts of the statistical approach to pattern recognition. We begin by supposing that we wish to classify a new sample but as yet we have not made measurements on the type of that sample. The goal is to classify the sample in such a way as to minimize the probability of misclassification. If we had collected a large number of examples of the samples, we could find the fractions that belong in each of the two classes. We formalize this by introducing the prior probabilities P(Ck) of a sample belonging to each of the classes Ck. These correspond to the fractions of samples in each class. If we were forced to classify a new sample without being allowed to see the corresponding properties, the best we can do is to assign it to the class having the higher prior probability. That is, we assign it to the class C1 if P(C1) > P(C2), and to class C2 otherwise. This procedure minimizes the probability of misclassification, even though we know that some of the samples will belong to the class having the lower prior probability.
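The decision rule based on priors alone can be illustrated with a two-line function. The prior values below are assumed for the example, and the function name is hypothetical.

```python
def classify_by_prior(priors):
    """Pick the class with the largest prior; return (class, P(misclassification))."""
    best = max(priors, key=priors.get)
    return best, 1.0 - priors[best]


priors = {"C1": 0.7, "C2": 0.3}  # assumed fractions of samples in each class
chosen, error_rate = classify_by_prior(priors)
print(chosen, round(error_rate, 3))
```

Every sample is assigned to C1, and the misclassification probability equals the prior of the other class, here 0.3. Always choosing C2 instead would be wrong 70% of the time.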

Now suppose that we have measured the value of the feature variable x1 for the sample. This will give us more information on which to base our classification decision, and we seek a formalism that allows this information to be combined with the prior probabilities, which we already possess. When we suppose that x1 is assigned to one of a discrete set of values {Xl}, the joint probability P(Ck, Xl) is defined to be the probability that the sample has the feature value Xl and belongs to class Ck. Next we introduce the conditional probability P(Xl|Ck,) which specifies the probability that the observation falls in column Xl of the array given that it belongs to class Ck. We now note that the fraction of the total number of samples which fall into cell (Ck, Xl) is given by the fraction of the total number of samples that fall in row Ck. This is equivalent to writing the joint probability in the form

$$P(C_k, X^l) = P(X^l \mid C_k)\,P(C_k) \qquad (5.2)$$

By a similar argument, the joint probability can also be written in the form

$$P(C_k, X^l) = P(C_k \mid X^l)\,P(X^l) \qquad (5.3)$$

where P(Ck|Xl) is the probability that the class is Ck given that the measured value of x1 falls in the cell Xl. The quantity P(Xl) is the probability of observing a value Xl with respect to the whole data set, irrespective of the class membership, and is therefore given by the fraction of the total number of samples that fall into column Xl. The two expressions for the joint probabilities in (5.2) and (5.3) must be equal, therefore we can write

$$P(C_k \mid X^l) = \frac{P(X^l \mid C_k)\,P(C_k)}{P(X^l)} \qquad (5.4)$$

This expression is referred to as Bayes’ theorem (after the Revd. Thomas Bayes, 1702 – 1761). The quantity on the left hand side is called the posterior probability, since it gives the probability that the class is Ck after we have made a measurement of x1. Bayes’ theorem allows the posterior probability to be expressed in terms of the prior probability P(Ck), together with the quantity P(Xl|Ck) which is called the class-conditional probability of Xl for class Ck. The denominator in Bayes’ theorem, P(Xl), plays the role of a normalization factor, and ensures that the posterior probabilities sum to unity. The posterior probability is a quantity of central interest since it allows us to make optimal decisions regarding the class membership of new data. In particular, assigning a new sample to the class having the largest posterior probability minimizes the probability of misclassification of that sample.

The importance of Bayes’ theorem lies in the fact that it re-expresses the posterior probabilities in terms of quantities which are often much easier to calculate. The prior and class-conditional probabilities can be estimated from the proportions of the training data. From these quantities we can also find the normalization factor in Bayes’ theorem. For a new sample, having feature value Xl, the probability of misclassification is minimized if we assign the sample to the class Ck for which the posterior probability P(Ck |Xl) is the largest.
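The estimation of the posterior probabilities from the proportions of the training data can be sketched in a few lines of Python. This is only an illustration: the function name `posteriors`, the class labels and the toy counts are assumptions made for the example and are not taken from the analysis.

```python
from collections import Counter

def posteriors(samples, x_value):
    """Estimate P(C_k | X^l = x_value) from a list of (class, feature) pairs."""
    n_total = len(samples)
    class_counts = Counter(c for c, _ in samples)   # samples per row C_k
    cell_counts = Counter(samples)                  # samples per cell (C_k, X^l)
    joint = {}
    for c, n_c in class_counts.items():
        prior = n_c / n_total                          # P(C_k)
        likelihood = cell_counts[(c, x_value)] / n_c   # P(X^l | C_k)
        joint[c] = likelihood * prior                  # P(C_k, X^l), eq. (5.2)
    evidence = sum(joint.values())                     # P(X^l), the normalization
    if evidence == 0:
        raise ValueError("feature value not present in the training data")
    return {c: j / evidence for c, j in joint.items()}  # eq. (5.4)

# Hypothetical toy data: 40 signal and 160 background samples.
samples = ([("b", "high")] * 30 + [("b", "low")] * 10
           + [("qcd", "high")] * 20 + [("qcd", "low")] * 140)

post = posteriors(samples, "high")
best = max(post, key=post.get)   # class with the largest posterior
```

In this toy example the prior favours the background class, but after the measurement of the feature value the posterior favours the signal class.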

Although we have focused on probability (distribution) functions, the decision on class membership in our classifiers has been based solely on the relative sizes of the probabilities. This observation allows us to reformulate the classification process in terms of a set of discriminant functions y1(x), …, yc(x) such that an input vector x is assigned to class Ck if

yk(x) > yj(x) for all j ≠ k (5.5)

The decision rule for minimizing the probability of misclassification may easily be cast in terms of discriminant functions, simply by choosing

yk(x) = P(Ck |x) (5.6)

Discriminant functions

We saw that optimal discriminant functions can be determined from class-conditional densities via Bayes’ theorem. Instead of performing density estimation, however, we can postulate specific parametrized functional forms for the discriminant functions and use the training data set to determine suitable values for the parameters. The simplest choice of a discriminant function consists of a linear combination of the input variables, in which the coefficients in the linear combination are the parameters of the model. This simple discriminant function can be generalized by transforming the linear combination with a non-linear function, called the activation function. Another extension involves transforming the input variables with fixed non-linear functions before forming the linear combination, to give generalized linear discriminants. These various forms of linear discriminant can be regarded as forms of neural network in which there is a single layer of adaptive weights between the inputs and the outputs. Single-layer networks with threshold activation functions were studied by Rosenblatt (1962), who called them perceptrons.

Feed Forward Network

Single-layer networks have a number of important limitations in terms of the range of functions that they can represent. We might therefore consider networks having several layers of adaptive weights. Networks with just two layers of weights are capable of approximating any continuous functional mapping, provided that enough hidden units are available. The only restriction is that the network must be feed-forward, so that it contains no feedback loops. This ensures that the network outputs can be calculated as explicit functions of the inputs and the weights.
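As an illustration of how the outputs of such a network follow explicitly from the inputs and the weights, the sketch below evaluates a network with two layers of adaptive weights. The tanh activation, the function name `forward` and the example weights are assumptions chosen for the illustration; this network is not used in the analysis.

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Evaluate a feed-forward network with two layers of adaptive weights."""
    # First layer: linear combination of the inputs followed by a
    # non-linear activation function (tanh is an assumption here).
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Second layer: linear combination of the hidden-unit activations.
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]

# Two inputs, one hidden unit, one output (hypothetical weights).
y = forward([0.5, 9.0],
            w_hidden=[[1.0, 0.0]], b_hidden=[0.0],
            w_out=[[2.0]], b_out=[1.0])
```

Because there are no feedback loops, a single pass from the inputs to the outputs is sufficient.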

Figure 5.3: An example of a feed-forward network having two layers of adaptive weights.

Probabilistic Neural Network

Sometimes the line between neural networks and traditional statistical techniques becomes a little thin. That is the case for members of the probabilistic neural network (PNN) family. The mother of all PNNs was described by Meisel in 1972, although it existed in other related forms even earlier. Despite its known theoretical power it remained practically unused, because of the unusually large computational requirements of the algorithm, until Donald Specht (1990) cast it in the form of a neural network by showing how the algorithm could be broken down into a great number of simple components that could largely operate in parallel. Because the algorithm has its roots in probability theory, he called it a probabilistic neural network. The PNN is strongly based on Bayes’ method of classification.

The PNN is fundamentally a classifier, designed and formulated as an algorithm that is trained on members of two or more classes. Its ultimate use is to examine unknowns and to decide to which class they belong. Nearly all standard statistical classification algorithms assume some knowledge of the distribution of the random variables used to classify. While some deviation from normality is tolerated, large deviations usually cause problems. One of the beauties of neural networks is that they can typically handle even the most complex distributions. Multiple-layer feed-forward networks (MLFNs) have been shown to be excellent classifiers, but they have two problems. One is that little is known about how they operate and what behavior is theoretically expected of them. The second is that their training can be seriously slow. The PNN has superb mathematical credentials, usually trains orders of magnitude faster than MLFNs, and classifies as well as or better than they do. Its principal disadvantages are that it is relatively slow to classify and that it requires large amounts of memory. Most important of all for many applications is that it often provides mathematically sound confidence levels for its decisions. Another situation in which the PNN is favored is when the data are likely to contain outliers, points that are very different from the majority. Outliers have no real effect on decisions regarding the more frequent cases, yet they are properly handled if they are valid data. Outliers are generally more of a threat to other neural network models, and they can totally devastate many traditional statistical techniques.

Parzen’s method of density estimation

Parzen presented an excellent method for estimating a univariate probability density function from a random sample. His estimator converges asymptotically to the true density as the sample size increases. Parzen’s PDF estimator uses a weight function, W(d) (called a potential function or kernel), which has its largest value at d = 0 and which decreases rapidly as the absolute value of d increases. One of these weight functions is centered at each training sample point, with the value of each sample’s function at a given abscissa x being determined by the distance d between x and that sample point.

Let us state Parzen’s method mathematically. We collect a sample of size n from a single population. The estimated density function for that population is shown in the equation below:

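$$g(x) = \frac{1}{n\sigma} \sum_{i=0}^{n-1} W\!\left(\frac{x - x_i}{\sigma}\right) \qquad (5.5)$$

where the x_i are the n sample points.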

The scaling parameter sigma (σ) defines the width of the bell curve that surrounds each sample point. The choice of σ can have a profound influence on the performance of the PNN. Values that are too small cause individual training cases to exert too much influence, losing the benefit of aggregate information. When the value of σ is too large, it causes so much blurring that the details of the density are lost, often distorting the density estimate badly.

We have considerable freedom in choosing the weight function. There are just a few restrictions on its properties. Parzen and Specht have stated them explicitly as follows:

The weight function must be bounded.

$$\sup_{x} |W(x)| < \infty \qquad (5.6)$$


The weight function must rapidly go to zero as its argument increases in absolute value.

$$\int_{-\infty}^{\infty} |W(x)|\,dx < \infty, \qquad \lim_{x \to \infty} |x\,W(x)| = 0 \qquad (5.7)$$

The weight function must be properly normalized if the estimate is going to be a density function.

$$\int_{-\infty}^{\infty} W(x)\,dx = 1 \qquad (5.8)$$

In order to achieve correct asymptotic behavior, the window must become narrower as the sample size increases. If we express σ as a function of n, the sample size, two conditions must be true.

$$\lim_{n \to \infty} \sigma_n = 0, \qquad \lim_{n \to \infty} n\,\sigma_n = \infty \qquad (5.9)$$

The weighting function W most often employed is the Gaussian function, because it is well behaved and easily computed. It satisfies the conditions mentioned above and it is known as a reliable performer. The Gaussian function is shown in the equation below:

$$W(d) = \frac{1}{\sqrt{2\pi}}\, e^{-d^2/2} \qquad (5.10)$$
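A minimal sketch of Parzen’s estimator with a Gaussian weight function is given below. It assumes the normalized form of the Gaussian, so that the weight function integrates to one; the function names and the three sample points are hypothetical and serve only to show the influence of σ.

```python
import math

def gaussian_kernel(d):
    """Normalized Gaussian weight function W(d); it integrates to one."""
    return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)

def parzen_density(x, sample, sigma):
    """Parzen estimate g(x) = 1/(n*sigma) * sum_i W((x - x_i) / sigma)."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / sigma) for xi in sample) / (n * sigma)

# Hypothetical sample of three points; evaluate the estimate between them
# for a narrow, a moderate and a very wide window.
sample = [0.0, 1.0, 4.0]
for sigma in (0.05, 0.5, 5.0):
    print(sigma, parzen_density(2.0, sample, sigma))
```

With the narrow window the estimate between the sample points is practically zero, because each training case only influences its immediate neighbourhood; with the very wide window the three points are blurred into a single broad bump.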

Despite the strong tradition behind the Gaussian function, it must be remembered that infinitely many other choices are possible. The Gaussian kernel is so widely accepted, however, that there is little reason to look for alternatives. Nevertheless, a few alternatives will be mentioned. The most primitive fast alternative to the Gaussian kernel is a histogram bin, given by:

(5.11)

The sharp corners of the bin approach can easily be chopped off, which probably gives a better kernel:

(5.12)

Both of those alternative kernels drop to zero at x = . That means that the estimated density can be zero in areas where the sample is sparse. It would be more realistic if the estimated density were some very small positive number. This can be attained with equation:


(5.13)

These four functions are plotted below.

Figure 5.4: Four common Parzen kernels.

Multivariate Extension of Parzen’s Method

Cacoullos extended Parzen’s method to the multivariate case. Things become much more complicated because the most general formulation allows each variable Xi to have its own scale factor σi, and the kernel function is multivariate. The fully general density estimator is shown in the equation below:

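$$g(x_0, \ldots, x_{p-1}) = \frac{1}{n\,\sigma_0 \sigma_1 \cdots \sigma_{p-1}} \sum_{i=0}^{n-1} W\!\left(\frac{x_0 - x_{0,i}}{\sigma_0}, \ldots, \frac{x_{p-1} - x_{p-1,i}}{\sigma_{p-1}}\right) \qquad (5.14)$$

where p is the number of variables and x_{j,i} is the value of variable j for training sample i.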

To reduce the complexity of the function, two simplifications can be employed. First, we may assume that all sigmas are equal: σ_0 = σ_1 = … = σ.

We cannot always afford this equality assumption, because the values of the individual σ_i can be optimized in such a way that each σ_i becomes almost a measure of that variable’s importance. However, for the remainder of this section we will assume it. The other simplification involves the nature of the multivariate kernel. By letting it be the product of univariate kernels, as shown below, we can achieve tremendous economy:

$$W(x_0, \ldots, x_{p-1}) = \prod_{j=0}^{p-1} W_j(x_j) \qquad (5.15)$$


That equation indexes the univariate kernels according to the variable on which they are used. In practice, we would nearly always use the same univariate kernel for every variable, with any differences being limited at most to different scalings. By far the most common multivariate density estimator uses both of those simplifications along with the univariate Gaussian kernel of equation 5.10. We can write that common density estimator as equation 5.16, using e^a e^b = e^(a+b) and the fact that the squared length of a vector is the sum of the squares of its components:

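$$g(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,\sigma^{p}\,n} \sum_{i=0}^{n-1} \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}_i \rVert^2}{2\sigma^2}\right) \qquad (5.16)$$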

This density estimator is the foundation of the original PNN as proposed by Donald Specht. It has a long history of good-to-excellent performance, and its training speed is fast.
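The common multivariate estimator can be sketched as follows, assuming a single σ for all variables and the normalized Gaussian kernel. The function name `pnn_density` and the example points are hypothetical.

```python
import math

def pnn_density(x, training, sigma):
    """Multivariate Parzen estimate with one common sigma and Gaussian kernels.

    x        : sequence with the p variables of the point to evaluate
    training : list of training points, each a sequence of p values
    """
    p = len(x)
    n = len(training)
    norm = (2.0 * math.pi) ** (p / 2.0) * sigma ** p * n
    total = 0.0
    for t in training:
        # Squared length of the difference vector: sum of squared components.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, t))
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / norm

# One training point at the origin in two dimensions: the estimate at the
# origin is then simply the peak value of a two-dimensional unit Gaussian.
peak = pnn_density((0.0, 0.0), [(0.0, 0.0)], sigma=1.0)
```

Training in this scheme amounts to storing the training points, which is consistent with the fast training and the large memory requirement mentioned earlier.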

The Original PNN

The PNN architecture is elegantly simple. Specht’s PNN architecture for a small network is shown below:

Figure 5.5: Specht’s PNN architecture.


The network in figure 5.5 is very small. It has two inputs, two classes, and two training cases in each class. The pattern layer contains one neuron for each training case. The summation layer has one neuron for each class. Execution starts by simultaneously presenting the input vector to all pattern layer neurons. Each pattern neuron computes a distance measure between the input and the training case represented by that neuron. It then subjects that distance measure to the neuron’s activation function, which is essentially the Parzen window. The following layer contains summation units, where each summation neuron is dedicated to a single class. It simply sums the outputs of the pattern layer neurons corresponding to members of that summation neuron’s class. The attained activation of summation neuron i is the estimated density function value of population i. The output neuron is a trivial threshold discriminator; it decides which of its inputs from the summation units is the maximum.
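The three layers can be sketched in Python for a network of the same size (two inputs, two classes, two training cases per class). This is a simplified illustration with one common σ and Gaussian pattern neurons; the function names and training values are hypothetical, and each class sum is divided by the number of training cases in that class so that classes of different size remain comparable.

```python
import math

def pattern_activation(x, case, sigma):
    """Pattern neuron: squared distance to one training case, then the Parzen window."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, case))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def pnn_classify(x, training_by_class, sigma):
    """Return the winning class and the summation-layer activations."""
    sums = {}
    for label, cases in training_by_class.items():
        # Summation neuron: combine the pattern neurons of one class.
        sums[label] = sum(pattern_activation(x, c, sigma) for c in cases) / len(cases)
    # Output neuron: pick the class with the largest activation.
    return max(sums, key=sums.get), sums

# Hypothetical training cases, two per class.
training = {
    "background": [(0.0, 0.0), (0.2, 0.1)],
    "b-jet": [(1.0, 1.0), (0.9, 1.2)],
}
label, activations = pnn_classify((0.8, 0.9), training, sigma=0.3)
```

The constant normalization factor of the density estimate is omitted here because it is the same for every class and does not change which activation is the largest.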

The use of the PNN

As described above, the PNN can be run with different settings. In this thesis I have used a separate sigma for each variable, but not for every class. The kernel used is the Gaussian function, because the input quantities originate from energy and position measurements, which are expected to have Gaussian errors.


Results

The main challenge in the tt̄ and/or Higgs analysis is the reduction of the overwhelming QCD background. We have discussed the importance of the ability to tag secondary vertices to identify events with a B meson. As discussed in earlier chapters, the tagging of B-jets is a promising tool to increase the efficiency of selecting genuine tt̄ and/or Higgs events. The goal of this work is to find an algorithm for B tagging in DØ. To obtain the maximum efficiency and purity, the algorithm can be implemented as an Artificial Neural Network, in this case the Probabilistic Neural Network (PNN). The use of a PNN has proven very successful in various high energy physics analyses. In this environment the precise architecture of the PNN is, in general, not of crucial importance; the key to success is a good choice of the physical quantities fed into the network.

In the Report of the Higgs Working Group of the Tevatron Run 2 SUSY/Higgs Workshop [9] the prospects for B tagging in DØ are discussed, based on a Monte Carlo simulation with a simple model for the detector smearing, ignoring inefficiencies of the pattern reconstruction and the effects of the tracking reconstruction. A rough estimate is obtained by requiring two displaced tracks, which results in a B-jet tagging efficiency of 60% with a tagging efficiency for background jets of less than 5%. These numbers imply that B tagging alone is not sufficient to deal with the QCD background, but that it provides additional information besides the event topology and cuts on the transverse energy.

In this work we study the B tagging in events that are fully simulated by the DØ detector simulation (version P8).

6.1 The event sample

We use the Monte Carlo simulation that included the detector smearing, ignoring inefficiencies of the pattern reconstruction and the effects of the tracking reconstruction. The event sample obtained from this MC simulation consists of 2857 B jets in tt̄ events and 6850 background jets in QCD (PT > 20 GeV) events. The typical cross sections at Tevatron energies are 6 picobarn (10⁻¹² barn) for tt̄ events and 50 microbarn (10⁻⁶ barn) for the QCD sample (PT > 20 GeV in the hard subprocess). 2000 events were generated for both the tt̄ and the QCD sample. The file has been split up so that different parts of the data could be used to train and to test the PNN. The training set consisted of 1000 B and 2500 background events. The test set contained the remaining events. From the events 24 variables were calculated, inspired by the studies described in chapter 4, which can be used as input to the PNN. These (initial) variables are shown in table 6.1. Because the number of events in the input file is small (compared to the studies described in chapter 4), statistical fluctuations make it impossible to obtain significant results with a large number of input variables. Therefore only a few input variables (3) will be used as input for the PNN, so that the statistical fluctuations will not be too large.

9 M. Carena, J. S. Conway, H. E. Haber, J. D. Hobbs, et al.: “Report of the Higgs Working Group of the Tevatron Run 2 SUSY/Higgs Workshop”, Fermilab-Conf-00/279-T, SCIPP-00/37, hep-ph/0010338, October 2000.


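| Number | Variable | Description | Sort |
|---|---|---|---|
| | iclass | iclass = 0: no B, no C; iclass = 1: no B, but C; iclass = 2: jet with some B tracks; iclass = 3: B jet | Classification |
| 1 | eneglo | The energy of the full event | Event |
| 2 | ptglo | The full event PT | Event |
| 3 | etglo | The full event ET | Event |
| 4 | njet | The number of jets of the full event | Event |
| 5 | enejetx | The jet energy measured from calorimeter | Jet in calorimeter |
| 6 | etjetx | The ET measured from calorimeter | Jet in calorimeter |
| 7 | massjx | The mass measured from calorimeter | Jet in calorimeter |
| 8 | sphebjet | The jet boosted sphericity | Jet-tracks |
| 9 | apbjet | The jet boosted aplanarity | Jet-tracks |
| 10 | enejet | The energy of the jet | Jet-tracks |
| 11 | etjet | The ET of the jet | Jet-tracks |
| 12 | massj | The mass of the jet | Jet-tracks |
| 13 | sprijet | The χ² of the jet tracks with respect to the primary vertex | Jet-tracks |
| 14 | ntrajet | The number of tracks of the jet | Jet-tracks |
| 15 | ddlife | The 3 dimensional decay length (x,y,z) | Vertex in jet |
| 16 | dlife | The 2 dimensional decay length (x,y) | Vertex in jet |
| 17 | sdlife | The χ² of the decay length (x,y) | Vertex in jet |
| 18 | ctptvjet | The opening angle weighted with PT | Vertex in jet |
| 19 | evtxjet | The energy | Vertex in jet |
| 20 | etvtxjet | The ET | Vertex in jet |
| 21 | massjv | The mass | Vertex in jet |
| 22 | sprivjet | The χ² with respect to the primary vertex | Vertex in jet |
| 23 | sassvjet | The χ² with respect to the secondary vertex | Vertex in jet |
| 24 | ntvtxjet | The number of tracks in the vertex | Vertex in jet |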

Table 6.1: The initial event variables.

6.2 The use of the PNN

Because the MC files that were used contain all the variables, it is not practical to use the entire dataset as input file for the PNN. Therefore the data are first read by a few simple programs (the C++ programs used can be found at http://www.nikhef.nl/~mvvang/neuralnets/cprograms). The program CreateTrainTestFile creates a train file and a test file out of the initial files with the data of the ttbar events and the background events. It writes the events with iclass = 0 into the qcd file, and the events with iclass = 3 into the tt̄ events file. The other data are not used. The programs FilemakerXvars, with X the number of variables to be selected, read the data and keep only the variables chosen by the user. These variables are written to a file together with the class they belong to (e.g. class = 0 means the data come from a fake (background) jet and class = 1 means they come from a real b-jet). A number of events do not have values for all variables, especially the jet variables. Since there is not always a jet, the value of the jet variables is in that case incorrect. Therefore these variables are all given a value that indicates that no jet variable was measured. When this is done, the output file is ready to be used as input for the PNN. Because the PNN requires a training set first, the data are split into three files: two training sets (one for each class) and one test set.
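The selection and splitting steps described above can be sketched in Python. This is not the C++ code referred to in the text: the record layout (one dictionary per jet), the function names and the sentinel value `MISSING` are assumptions made for the illustration.

```python
MISSING = -999.0  # hypothetical sentinel meaning "variable not measured"

def select_jets(records, variable_names):
    """Keep iclass 0 (background) and iclass 3 (b-jet) with the chosen variables."""
    rows = []
    for rec in records:
        if rec["iclass"] == 0:
            label = 0          # background jet
        elif rec["iclass"] == 3:
            label = 1          # real b-jet
        else:
            continue           # iclass 1 and 2 are not used
        values = [rec.get(name, MISSING) for name in variable_names]
        rows.append((values, label))
    return rows

def split_train_test(rows, n_train_signal, n_train_background):
    """Split into one training set per class and a common test set."""
    train_signal, train_background, test = [], [], []
    for values, label in rows:
        if label == 1 and len(train_signal) < n_train_signal:
            train_signal.append(values)
        elif label == 0 and len(train_background) < n_train_background:
            train_background.append(values)
        else:
            test.append((values, label))
    return train_signal, train_background, test

# Hypothetical records; the background jets have no vertex variable.
records = [
    {"iclass": 3, "etjetx": 55.0, "etvtxjet": 12.0},
    {"iclass": 0, "etjetx": 30.0},
    {"iclass": 1, "etjetx": 41.0, "etvtxjet": 3.0},
    {"iclass": 3, "etjetx": 62.0, "etvtxjet": 20.0},
    {"iclass": 0, "etjetx": 25.0},
]
rows = select_jets(records, ["etjetx", "etvtxjet"])
train_b, train_qcd, test = split_train_test(rows, 1, 1)
```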


As can be seen in the appendix, some variables have tails that extend to very large values. The PNN may not work well when it is confronted with such values. Therefore I have chosen to take the logarithm of each variable whose distribution has tails exceeding a value of 1000. These are variables 1, 10, 11, 13, 17, 22 and 23.
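A possible form of this transformation is sketched below. The natural logarithm is an assumption (the base is not essential for the compression of the tails), as are the function name and the rule that non-positive entries, such as a "not measured" marker, are left untouched.

```python
import math

def log_transform_columns(rows, threshold=1000.0):
    """Take the logarithm of every column whose largest value exceeds threshold.

    rows is a list of equally long lists of numbers. Returns the indices of
    the transformed columns and the transformed rows.
    """
    n_cols = len(rows[0])
    heavy = [j for j in range(n_cols) if max(r[j] for r in rows) > threshold]
    transformed = []
    for r in rows:
        # Only positive values can be log-transformed; others are kept as-is.
        transformed.append([math.log(v) if j in heavy and v > 0 else v
                            for j, v in enumerate(r)])
    return heavy, transformed

# Hypothetical example: the first column has a long tail, the second not.
heavy, scaled = log_transform_columns([[10.0, 5.0], [5000.0, 7.0]])
```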

6.3 Efficiency and purity

A way to measure the performance of the PNN is to define two quantities: efficiency and purity. An efficiency of 1 means that all the b-jets have been identified; if the efficiency equals zero, none of the b-jets has been found. The efficiency is defined as:

$$\text{efficiency} = \frac{\text{number of b-jets identified as b-jets}}{\text{total number of b-jets}}$$

Although an efficiency of 1 seems very good, it does not say anything about the way the PNN has handled the background events. If all background events are identified as b-jets, the efficiency can still be 1 but the PNN would be useless. The purity indicates how many background events the PNN has wrongly identified as b-jets. A purity of 1 means that the output of the PNN is 100% pure: the jets that are given the status of b-jet by the PNN were all actual b-jets. A purity of 0 means that all jets identified as b-jets were actually background events. The purity is defined as:

$$\text{purity} = \frac{\text{number of b-jets identified as b-jets}}{\text{total number of jets identified as b-jets}}$$

6.4 The PNN output

The PNN returns an output between 0 and 1. A value of 0 means that the PNN classifies the jet as background; a value of 1 means that the PNN classifies it as a real b-jet. By defining a decision border in PAW [8], which goes from 0 to 1 in a certain number of steps (e.g. 100), it is possible to calculate the efficiency and purity as defined in section 6.3 for every value of the decision border. In this way we can study the performance of the PNN using different sets of input variables.
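The scan over the decision border can be sketched as follows. The scan in this work was done in PAW; the Python functions below are only an illustration, and the convention that a jet is tagged when its output is at or above the border is an assumption.

```python
def efficiency_purity(outputs, labels, cut):
    """Efficiency and purity when jets with output >= cut are tagged as b-jets.

    outputs : PNN outputs between 0 and 1
    labels  : 1 for a real b-jet, 0 for a background jet
    """
    tagged = [lab for out, lab in zip(outputs, labels) if out >= cut]
    n_b = sum(labels)            # total number of real b-jets
    true_tagged = sum(tagged)    # real b-jets among the tagged jets
    efficiency = true_tagged / n_b if n_b else 0.0
    purity = true_tagged / len(tagged) if tagged else None  # undefined if nothing is tagged
    return efficiency, purity

def scan(outputs, labels, steps=100):
    """Evaluate efficiency and purity for decision borders from 0 to 1."""
    cuts = [i / steps for i in range(steps + 1)]
    return [(cut, *efficiency_purity(outputs, labels, cut)) for cut in cuts]

# Hypothetical network outputs for six jets.
outputs = [0.95, 0.80, 0.60, 0.40, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0]
points = scan(outputs, labels)
```

Each entry of `points` holds the decision border, the efficiency and the purity, which are the quantities plotted against the decision border in the figures of this chapter.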

Another way to study the performance is to make a table of the values of the efficiency and purity. In this table the value of the purity is given for an efficiency of at least 80% and the efficiency is given for a purity of at least 95%. Both procedures will be used to study the output of the PNN.
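The two table entries can be extracted from a scan over the decision border as sketched below. The interpretation used here is an assumption: the highest purity among all borders with an efficiency of at least 80%, and the highest efficiency among all borders with a purity of at least 95%. The function name and the example points are hypothetical.

```python
def working_points(points, min_efficiency=0.80, min_purity=0.95):
    """Return (purity at >= min_efficiency, efficiency at >= min_purity).

    points is a list of (decision border, efficiency, purity) tuples;
    entries with an undefined purity (None) are skipped.
    """
    purities = [pur for _, eff, pur in points
                if pur is not None and eff >= min_efficiency]
    efficiencies = [eff for _, eff, pur in points
                    if pur is not None and pur >= min_purity]
    best_purity = max(purities) if purities else None
    best_efficiency = max(efficiencies) if efficiencies else None
    return best_purity, best_efficiency

# Hypothetical scan with five decision borders.
example = [(0.00, 1.00, 0.50), (0.25, 0.90, 0.70), (0.50, 0.82, 0.91),
           (0.75, 0.60, 0.97), (1.00, 0.10, 1.00)]
purity_at_80, efficiency_at_95 = working_points(example)
```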

As explained in chapter 5, training the PNN gives every input variable a certain weight. The higher the weight, the higher the separating power that the variable offers. Running the network with all inputs could therefore provide an easy way to find the input variables that play an important role in calculating the network output. In this thesis this possibility has not been tested. The main reason to omit this method of finding the best input variables is that the network can only be used with a limited number of variables, because of the lack of statistics. A test run with all input variables would therefore not give a well-trained PNN, and the weight factors would not be very accurate. Secondly, all input variables would have to be normalized so that their values lie between 0 and 1; only in that case can the weight factors be compared with each other.

8 PAW is conceived as an instrument to assist physicists in the analysis and presentation of their data. It provides interactive graphical presentation and statistical or mathematical analysis, working on objects familiar to physicists like histograms, event files (Ntuples), vectors, etc. PAW is based on several components of the CERN Program Library.
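A simple way to bring all input variables to the range between 0 and 1 is min-max scaling, sketched below. This step was not performed in this work; the function name and the treatment of constant columns are assumptions.

```python
def min_max_normalize(rows):
    """Scale every column of rows linearly so that its values lie in [0, 1]."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    scaled = []
    for r in rows:
        # A constant column carries no information and is mapped to 0.
        scaled.append([(v - lo) / (hi - lo) if hi > lo else 0.0
                       for v, lo, hi in zip(r, lows, highs)])
    return scaled

# Hypothetical example with one varying and one constant column.
normalized = min_max_normalize([[10.0, 1.0], [20.0, 1.0], [30.0, 1.0]])
```

In practice the minimum and maximum would be taken from the training set and then applied unchanged to the test set, so that both are scaled in the same way.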


6.5 Results using two input variables

We start by using two variables as input for the PNN, and only the jet variables. When looking at the results using the vertex in jet variables (see table 6.1), the plots all look like the plot below. The step is caused solely by the absence or presence of a secondary vertex, which leads to a binary effect. Although an efficiency of 60% is visible at a purity of almost 100%, it is obvious that the PNN can use more information in order to achieve better results. Therefore we continue using three variables.

Figure 6.1: Plot showing the efficiency (open dots) and purity (black dots) for the opening angle weighted with PT (18) and the transverse energy (20) for different values of the decision border.

6.6 Results using three input variables

We will now use three variables as input for the PNN. Two variables are kept constant and we loop over the remaining variables. In the first run I chose to keep the opening angle weighted with PT (18) and the transverse energy (20) constant. The numbers between brackets indicate the variable numbers shown in table 6.1 and used in the software. Some of the first results are shown below.

Figure 6.2 shows the behavior of the efficiency and purity for the combination of the two dimensional decay length (16), the opening angle weighted with PT (18) and the transverse energy (20). The plot looks similar to figure 6.1. It is easy to see that the efficiency drops very fast at one value of the decision border and that the purity rises very fast at the same point. This is caused by the fact that all three variables are measured from the vertex in the jet. Because most of the background events do not have values for these variables (there is no b-jet in these events), the decision is effectively based only on whether a vertex was found in the jet. Although the purity is very high after the drop of the efficiency, better results should be achieved by choosing variables carrying more information, e.g. variables measured by the calorimeter. In that case, information from different stages of the tracks and from the whole event is given as input.

Figure 6.2: Plot showing the efficiency (open dots) and purity (black dots) for the two dimensional decay length (16), the opening angle weighted with PT (18) and the transverse energy (20) for different values of the decision border.

Figure 6.3 shows the results for the number of tracks of the jet (14), the opening angle of the vertex weighted with PT (18) and the transverse energy of the vertex (20). Because the number of tracks of the jet is a jet-track variable rather than a vertex variable (see table 6.1), the sudden drop of the efficiency (and rise of the purity) has been replaced by smaller drops and steps. Because the number of tracks is an integer, it may be assumed that the small steps represent a change in the number of tracks. Comparing figure 6.3 with figure 6.2, we conclude that replacing the two dimensional decay length (16) by the jet boosted number of tracks (14) has given the PNN more information to make a decision. Since the steps are (probably) caused by the number of tracks of the jet (14), the influence of the third variable is visible.


Figure 6.3: Plot showing the efficiency (open dots) and purity (black dots) for the number of tracks of the jet (14), the opening angle of the vertex weighted with PT (18) and the transverse energy of the vertex (20) for different values of the decision border.

The smooth curves shown in figure 6.4 suggest that the addition of another variable can indeed give promising results. Here the transverse energy of the jet (11), the opening angle of the vertex in the jet weighted with PT (18) and the transverse energy of the vertex (20) were used as inputs.

Figure 6.4: Plot showing the efficiency (open dots) and purity (black dots) for the transverse energy of the jet (11), the opening angle weighted with PT (18) and the transverse energy (20) for different values of the decision border.


To present these results more quantitatively, we list for the variable combinations the purity at an efficiency of 80% and the efficiency at a purity of 95%, as described in section 6.4. Only the combinations that reached sufficiently high efficiency and purity are listed.

| Variables | Purity at 80% efficiency | Efficiency at 95% purity |
|---|---|---|
| 4, 18, 20 | 0.8811 | 0.6177 |
| 5, 18, 20 | 0.7547 | 0.6559 |
| 6, 18, 20 | 0.9869 | 0.8961 |
| 7, 18, 20 | 0.9460 | 0.7911 |
| 8, 18, 20 | 0.6713 | 0.6338 |
| 9, 18, 20 | 0.7432 | 0.6419 |
| 10, 18, 20 | 0.8322 | 0.6451 |
| 11, 18, 20 | 0.9706 | 0.8455 |
| 12, 18, 20 | 0.9409 | 0.7868 |
| 13, 18, 20 | 0.8428 | 0.6354 |
| 14, 18, 20 | 0.9423 | 0.7307 |

Table 6.2: The purity and efficiency for a number of combinations of variables. The opening angle weighted with PT (18) and the transverse energy (20) were kept constant.

We find that the combination of the ET of the jet measured from calorimeter (6), the opening angle weighted with PT of the vertex (18) and the transverse energy of the vertex (20) has both the highest purity at an efficiency of 80% and the highest efficiency at a purity of 95%, followed by the ET of the jet (11) in combination with the same two variables. To find better results we should therefore investigate the output of the PNN using these variables. We drop the opening angle weighted with PT of the vertex (18), keep the ET of the jet measured from calorimeter (6) and the transverse energy of the vertex (20) constant, and look at the results again (see table 6.3). We also look at the results when instead the transverse energy of the vertex (20) is dropped and the ET measured from calorimeter (6) and the opening angle weighted with PT (18) are kept constant (table 6.4).



| Variables | Purity at 80% efficiency | Efficiency at 95% purity |
|---|---|---|
| 4, 6, 20 | 0.9922 | 0.9192 |
| 5, 6, 20 | 0.9880 | 0.8988 |
| 7, 6, 20 | 0.9855 | 0.8966 |
| 8, 6, 20 | 0.9855 | 0.8966 |
| 9, 6, 20 | 0.9853 | 0.8988 |
| 10, 6, 20 | 0.9853 | 0.8966 |
| 11, 6, 20 | 0.9855 | 0.8961 |
| 12, 6, 20 | 0.9860 | 0.8966 |
| 13, 6, 20 | 0.9885 | 0.9117 |
| 14, 6, 20 | 0.9862 | 0.8977 |
| 15, 6, 20 | 0.9874 | 0.8945 |
| 16, 6, 20 | 0.9864 | 0.8950 |
| 17, 6, 20 | 0.9848 | 0.8945 |
| 19, 6, 20 | 0.9869 | 0.8961 |
| 20, 6, 20 | 0.9890 | 0.8939 |
| 21, 6, 20 | 0.9853 | 0.8945 |
| 22, 6, 20 | 0.9871 | 0.8955 |
| 23, 6, 20 | 0.9860 | 0.8966 |
| 24, 6, 20 | 0.9848 | 0.8945 |

Table 6.3: The purity and efficiency for a number of combinations of variables. The ET measured from calorimeter (6) and the transverse energy (20) were kept constant.



| Variables | Purity at 80% efficiency | Efficiency at 95% purity |
|---|---|---|
| 4, 6, 18 | 0.9940 | 0.9160 |
| 5, 6, 18 | 0.9890 | 0.9015 |
| 7, 6, 18 | 0.9871 | 0.8939 |
| 8, 6, 18 | 0.9862 | 0.8864 |
| 9, 6, 18 | 0.9874 | 0.8901 |
| 10, 6, 18 | 0.9878 | 0.8993 |
| 11, 6, 18 | 0.9869 | 0.8961 |
| 12, 6, 18 | 0.9869 | 0.8961 |
| 13, 6, 18 | 0.9894 | 0.9079 |
| 14, 6, 18 | 0.9887 | 0.8966 |
| 15, 6, 18 | 0.9899 | 0.8966 |
| 16, 6, 18 | 0.9867 | 0.8966 |
| 17, 6, 18 | 0.9867 | 0.8955 |
| 19, 6, 18 | 0.9876 | 0.8961 |
| 20, 6, 18 | 0.9869 | 0.8961 |
| 21, 6, 18 | 0.9887 | 0.8928 |
| 22, 6, 18 | 0.9871 | 0.8950 |
| 23, 6, 18 | 0.9869 | 0.8950 |
| 24, 6, 18 | 0.9883 | 0.8955 |

Table 6.4: The purity and efficiency for a number of combinations of variables. The ET measured from calorimeter (6) and the opening angle weighted with PT (18) were kept constant.

It is clearly visible that the results are much better than those using the variables in the earlier test. Therefore we can state that the change of the variables was a good choice, since both efficiency and purity have increased for the dataset used. The best results are now achieved by using the number of jets of the full event (4), the ET of the jet measured from calorimeter (6) and the transverse energy of the vertex (20), and by using the jet boosted χ² of the jet tracks with respect to the primary vertex (13), the ET of the jet measured from calorimeter (6) and the transverse energy of the vertex (20). Because there is no reason why the opening angle weighted with PT of the vertex (18) should have been dropped instead of the transverse energy of the vertex (20), we also looked at the results where the ET measured from calorimeter (6) and the opening angle weighted with PT (18) were kept constant. With this combination the PNN also gives better output. In both cases the number of jets of the full event (4) and the jet boosted χ² of the jet tracks with respect to the primary vertex (13) give the best results. Therefore these variables will be used further in order to find better results.

The results were also checked by replacing the ET measured from calorimeter (6) by the jet boosted ET (11), as suggested by the first test. The results were worse than those achieved by using the ET measured from calorimeter (6). Therefore these results are not quoted here and we continue with the outcome of the earlier results. We will look at the results of runs where the ET measured from calorimeter (6) and the jet boosted χ² of the jet tracks with respect to the primary vertex (13) are held constant, and of runs where the ET measured from calorimeter (6) and the number of jets of the full event (4) are held constant. The results are quoted in tables 6.5 and 6.6.



| Variables | Purity at 80% efficiency | Efficiency at 95% purity |
|---|---|---|
| 1, 6, 13 | 0.9857 | 0.9176 |
| 4, 6, 13 | 0.9892 | 0.9300 |
| 5, 6, 13 | 0.9846 | 0.8918 |
| 6, 7, 13 | 0.9823 | 0.8885 |
| 6, 8, 13 | 0.9805 | 0.8826 |
| 6, 9, 13 | 0.9811 | 0.8783 |
| 6, 10, 13 | 0.9816 | 0.8880 |
| 6, 11, 13 | 0.9844 | 0.9015 |
| 6, 12, 13 | 0.9798 | 0.8799 |
| 6, 13, 14 | 0.9805 | 0.8799 |
| 6, 13, 15 | 0.9931 | 0.9122 |
| 6, 13, 16 | 0.9807 | 0.8826 |
| 6, 13, 17 | 0.9874 | 0.9074 |
| 6, 13, 18 | 0.9894 | 0.9079 |
| 6, 13, 19 | 0.9901 | 0.9106 |
| 6, 13, 20 | 0.9885 | 0.9117 |
| 6, 13, 21 | 0.9887 | 0.9095 |
| 6, 13, 22 | 0.9894 | 0.9117 |
| 6, 13, 23 | 0.9876 | 0.9041 |
| 6, 13, 24 | 0.9878 | 0.9106 |

Table 6.5: The purity and efficiency for a number of combinations of variables. The ET measured from calorimeter (6) and the jet boosted χ² of the jet tracks with respect to the primary vertex (13) were held constant.

| Variables | Purity at 80% efficiency | Efficiency at 95% purity |
|---|---|---|
| 1, 4, 6 | 0.9852 | 0.8923 |
| 4, 5, 6 | 0.9763 | 0.8821 |
| 4, 6, 7 | 0.9743 | 0.8735 |
| 4, 6, 8 | 0.9749 | 0.8745 |
| 4, 6, 9 | 0.9736 | 0.8745 |
| 4, 6, 10 | 0.9766 | 0.8788 |
| 4, 6, 11 | 0.9818 | 0.8977 |
| 4, 6, 12 | 0.9772 | 0.8756 |
| 4, 6, 13 | 0.9892 | 0.9300 |
| 4, 6, 14 | 0.9784 | 0.8885 |
| 4, 6, 15 | 0.9940 | 0.9198 |
| 4, 6, 16 | 0.9754 | 0.8756 |
| 4, 6, 17 | 0.9901 | 0.8665 |
| 4, 6, 18 | 0.9940 | 0.9160 |
| 4, 6, 19 | 0.9903 | 0.9149 |
| 4, 6, 20 | 0.9922 | 0.9192 |
| 4, 6, 21 | 0.9892 | 0.9133 |
| 4, 6, 22 | 0.9943 | 0.8896 |
| 4, 6, 23 | 0.9910 | 0.8788 |
| 4, 6, 24 | 0.9899 | 0.8686 |

Table 6.6: The purity and efficiency for a number of combinations of variables. The number of jets of the full event (4) and the ET measured from calorimeter (6) were held constant.

Again, good results are found for this combination of variables. The combination of the ET measured from calorimeter (6), the jet-boosted χ² of the jet tracks with respect to the primary vertex (13) and a jet vertex variable proves to give good results.

The best result is achieved by the combination of the number of jets of the full event (4), the ET measured from calorimeter (6) and the jet-boosted χ² of the jet tracks with respect to the primary vertex (13). This combination goes beyond the scope of this work, however, because it relies on the number of jets of the full event (4), a global event variable that carries no information about the individual jet.
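The two figures of merit quoted in Tables 6.5 and 6.6 can be illustrated with a minimal sketch. It assumes the usual definitions (efficiency is the fraction of signal jets that pass a cut on the network output, purity is the fraction of selected jets that are signal) and scans every possible cut. The function names and the toy outputs are hypothetical and are not taken from the analysis code used for this thesis.

```python
def scan_cuts(outputs, labels):
    """Return (cut, efficiency, purity) for every cut on the network output.

    outputs: network output per jet; labels: 1 for signal, 0 for background.
    """
    n_signal = sum(labels)
    points = []
    for cut in sorted(set(outputs)):
        selected = [lab for out, lab in zip(outputs, labels) if out >= cut]
        n_selected_signal = sum(selected)
        efficiency = n_selected_signal / n_signal
        purity = n_selected_signal / len(selected)
        points.append((cut, efficiency, purity))
    return points


def purity_at_efficiency(points, target):
    """Best purity among the cuts that keep at least `target` efficiency."""
    candidates = [pur for _, eff, pur in points if eff >= target]
    return max(candidates) if candidates else None


def efficiency_at_purity(points, target):
    """Best efficiency among the cuts that reach at least `target` purity."""
    candidates = [eff for _, eff, pur in points if pur >= target]
    return max(candidates) if candidates else None


# Toy data (invented for illustration only): ten jets, five of them signal.
toy_outputs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

toy_points = scan_cuts(toy_outputs, toy_labels)
toy_purity_80 = purity_at_efficiency(toy_points, 0.80)
toy_efficiency_95 = efficiency_at_purity(toy_points, 0.95)
```

A value of `None` corresponds to a working point that is never reached for the given sample.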


Figure 6.5: Plot showing the efficiency (open dots) and purity (black dots) for the number of jets of the full event (4), the ET measured from calorimeter (6) and the jet-boosted χ² of the jet tracks with respect to the primary vertex (13).

6.7 Physical meaning of some results

As seen in the previous sections, the best results are not achieved with jet variables measured by the calorimeter alone or by the jet tracks alone. The best results come from combining information measured by the calorimeter and by the tracks. The most probable explanation is that the jet is described better when track and calorimeter information are combined. The PNN then has more information on which to base its decision than when, for example, only the calorimeter measurements are used, and better results are therefore achieved. The same effect appears when a jet variable is combined with a global event variable, for example the event energy. In that case one looks not only at the b-jets but also at other event-related variables.

Some variables of course offer more separating power than others. An easy way to judge the potential of a variable is to look at its distribution. When I plot the distributions using PAW, separately for b-jet events and for background, one can see how the distribution of the variable differs between the two. For some variables this difference is very small, but for others the distributions clearly differ.
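The visual comparison of two distributions can also be expressed as a single number. The sketch below is only an illustration and was not part of this study: it assumes both samples are binned in the same histogram, normalises each histogram to unit area, and takes half the summed absolute bin-by-bin difference, which is 0 for identical shapes and 1 for shapes that do not overlap. The function names and toy values are hypothetical.

```python
def normalised_histogram(values, low, high, n_bins):
    """Fill a histogram with equal-width bins and normalise it to unit area."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for value in values:
        if low <= value < high:
            # min() guards against rounding just below the upper edge.
            index = min(int((value - low) / width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    return [count / total for count in counts]


def separation(signal_values, background_values, low, high, n_bins):
    """Half the summed absolute difference of the two normalised histograms."""
    signal_hist = normalised_histogram(signal_values, low, high, n_bins)
    background_hist = normalised_histogram(background_values, low, high, n_bins)
    return 0.5 * sum(abs(s - b) for s, b in zip(signal_hist, background_hist))


# Toy values (invented for illustration only).
toy_signal = [1, 2, 2, 3, 3, 3]
toy_background = [3, 4, 4, 5, 5, 5]

toy_separation = separation(toy_signal, toy_background, 0, 6, 6)
toy_self_separation = separation(toy_signal, toy_signal, 0, 6, 6)
```

Such a number only ranks single variables. It says nothing about how well two variables work together.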

Figure 6.6, for example, shows two such distributions. The left variable seems to offer more separating power than the right one, because its signal and background distributions differ more. A combination of variables can nevertheless work very well because the variables supply different information about the event. This cannot be seen from the distributions alone, because it depends on the physics. For example, transverse energy and momentum both have promising distributions, but the two variables do not complement each other because they are correlated. Adding another variable (such as an angle) gives better results, because in combination with an energy or momentum it gives more information about the jet. This can be seen in the results.
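One way to check whether two candidate inputs carry largely the same information is to compute their linear correlation coefficient. The sketch below uses the standard Pearson definition. The toy values for transverse energy, transverse momentum and angle are invented for illustration and are not taken from the samples used in this thesis.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson linear correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Toy jets (invented): ET and PT rise together, the angle does not follow them.
toy_et = [20.0, 35.0, 50.0, 65.0, 80.0]
toy_pt = [18.0, 33.0, 47.0, 66.0, 79.0]
toy_angle = [0.5, 0.1, 0.9, 0.2, 0.6]

corr_et_pt = pearson_correlation(toy_et, toy_pt)
corr_et_angle = pearson_correlation(toy_et, toy_angle)
```

A coefficient close to 1 suggests that the second variable adds little, while a small coefficient suggests that it may supply new information. A linear coefficient does not capture every kind of dependence, so it is an indication rather than a proof.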

Figure 6.6: Distributions of the transverse energy measured from calorimeter (left) and the jet-boosted aplanarity (right). The dotted line shows the distribution of the variable in case a b-jet was found, the black line if not. More distributions are shown in the appendix.

6.8 Results using four input variables

Now that we know a number of good combinations of variables, we will test the performance of the PNN with one extra input variable. I have chosen to use the best combination of the previous runs: the number of jets of the full event (4), the ET measured from calorimeter (6) and the jet-boosted χ² of the jet tracks with respect to the primary vertex (13) are kept constant.
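The scan over the fourth input can be sketched as follows. The sketch assumes that the input variables are numbered 1 to 24, as the variable numbers in the tables suggest. The merit function is a hypothetical placeholder for training the PNN on a combination and reading off a figure of merit. It does not reproduce the actual procedure used here.

```python
FIXED = (4, 6, 13)            # variables held constant in this section
ALL_VARIABLES = range(1, 25)  # assumption: inputs are numbered 1 to 24


def four_variable_combinations(fixed=FIXED, all_variables=ALL_VARIABLES):
    """Return every combination of the fixed variables plus one extra input."""
    return [(extra,) + tuple(fixed)
            for extra in all_variables if extra not in fixed]


def rank_combinations(combinations, merit):
    """Sort combinations by a figure of merit, best first."""
    return sorted(combinations, key=merit, reverse=True)


def toy_merit(combination):
    """Placeholder merit (invented): stands in for a trained-PNN result."""
    return -combination[0]


combinations = four_variable_combinations()
ranked = rank_combinations(combinations, toy_merit)
```

With three inputs fixed, this gives 21 runs, one per remaining variable.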

Figure 6.7: Plot showing the efficiency (open dots) and purity (black dots) for the full event ET (3), the number of jets of the full event (4), the ET measured from calorimeter (6) and the jet-boosted χ² of the jet tracks with respect to the primary vertex (13).




Variables | Purity at 80% efficiency | Efficiency at 95% purity
1, 4, 6, 13 | 0.9993 | -
2, 4, 6, 13 | 0.9966 | 0.9580
3, 4, 6, 13 | - | 0.9833
5, 4, 6, 13 | 0.9885 | 0.9203
7, 4, 6, 13 | 0.9887 | 0.9316
8, 4, 6, 13 | 0.9899 | 0.9257
9, 4, 6, 13 | 0.9897 | 0.9295
10, 4, 6, 13 | 0.9901 | 0.9262
11, 4, 6, 13 | 0.9901 | 0.9445
12, 4, 6, 13 | 0.9878 | 0.9251
14, 4, 6, 13 | 0.9876 | 0.9225
15, 4, 6, 13 | 0.9961 | 0.9413
16, 4, 6, 13 | 0.9947 | 0.9365
17, 4, 6, 13 | 0.9933 | 0.9386
18, 4, 6, 13 | 0.9966 | 0.9435
19, 4, 6, 13 | 0.9931 | 0.9386
20, 4, 6, 13 | 0.9929 | 0.9397
21, 4, 6, 13 | 0.9897 | 0.9289
22, 4, 6, 13 | 0.9963 | 0.9418
23, 4, 6, 13 | 0.9926 | 0.9391
24, 4, 6, 13 | 0.9936 | 0.9041

Table 6.7: The purity and efficiency for a number of combinations of variables. The number of jets of the full event (4), the ET measured from calorimeter (6) and the jet-boosted χ² of the jet tracks with respect to the primary vertex (13) were held constant.


These are clearly very good results, but the best of them rely on global event variables rather than on jet variables. They therefore fall outside the scope of this research.


Conclusion

The main goal of this thesis is to study the use of a neural network in b-tagging for the DØ experiment. I studied jet variables based on vertex-tracking and calorimeter information from a tt̄ sample to provide the signal events and a QCD sample to provide the background events.

A neural network is a suitable tool for studying a multivariable problem. In this thesis a Probabilistic Neural Network is used, because this algorithm has its roots in probability theory. It can also handle points that lie far from the majority, which is an advantage here because of the outlying points in the distributions of some variables.
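The probabilistic idea behind such a network can be illustrated with a minimal sketch. It assumes the textbook form of a PNN: each class density is estimated by averaging Gaussian kernels centred on the training patterns, and the output is the signal density divided by the sum of both densities. The smoothing parameter `sigma`, the equal class priors and the toy patterns are assumptions made for this illustration. This is not necessarily the implementation used in this thesis.

```python
import math


def class_density(point, patterns, sigma):
    """Parzen-window estimate: average Gaussian kernel over the patterns."""
    total = 0.0
    for pattern in patterns:
        distance_squared = sum((a - b) ** 2 for a, b in zip(point, pattern))
        total += math.exp(-distance_squared / (2.0 * sigma ** 2))
    return total / len(patterns)


def pnn_output(point, signal_patterns, background_patterns, sigma=0.5):
    """Signal probability for `point`, assuming equal class priors."""
    signal_density = class_density(point, signal_patterns, sigma)
    background_density = class_density(point, background_patterns, sigma)
    if signal_density + background_density == 0.0:
        return 0.5  # no training pattern nearby: undecided
    return signal_density / (signal_density + background_density)


# Toy training patterns with two input variables (invented for illustration).
toy_signal_patterns = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
toy_background_patterns = [(-1.0, -1.0), (-0.9, -1.2), (-1.1, -0.8)]

output_near_signal = pnn_output((1.0, 1.0), toy_signal_patterns, toy_background_patterns)
output_near_background = pnn_output((-1.0, -1.0), toy_signal_patterns, toy_background_patterns)
output_in_between = pnn_output((0.0, 0.0), toy_signal_patterns, toy_background_patterns)
```

Because every training pattern contributes only through a smooth kernel, a single outlying pattern changes the estimated density only in its own neighbourhood.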

Because of the limited statistics I started with combinations of only two variables to find the most discriminating ones. Two variables turned out to be too few, since the selection then effectively depended only on whether or not a secondary vertex was present. With most combinations of the jet vertex variables an efficiency of 60% is reached at a purity of almost 100%. It was therefore clear that the study should continue with three variables.

Using three variables instead of two significantly improved the output of the PNN. Combining jet variables based on tracking, vertexing and calorimeter information leads to the best discriminating power. A promising combination is the ET measured from calorimeter, the jet-boosted χ² of the jet tracks with respect to the primary vertex and a jet vertex variable. With the three-dimensional decay length (x, y, z) as third variable, a purity of 91.2% at an efficiency of 80% is achieved.

Since adding a third variable significantly improved the output of the PNN, continuing with four inputs was the logical next step. Although the study concentrated mainly on jet variables, some global event variables were also studied. It turns out that global variables such as the full event energy, the full event transverse energy and the number of jets of the full event have good discriminating power in combination with jet variables. When a larger number of input variables is used, adding these global event variables leads to a significant increase in purity and efficiency. Using four inputs (the full event transverse energy, the number of jets, the transverse energy measured from calorimeter and the χ² of the jet tracks with respect to the primary vertex) a purity of 98.3% was achieved at an efficiency of 80%.

Without the addition of global variables, a fourth input variable did not seem to lead to significantly better results. In most cases a fourth jet variable improved the purity by no more than 0.01 percentage point.

Since most of the gain came from adding global variables to the three-variable combinations found earlier, adding a fifth variable seemed to be beyond the scope of this project. For a follow-up of this study it is therefore recommended to include more global event variables to achieve better results.


Bibliography

[1] S. Abachi et al.: “The DØ Upgrade”, DØNote 2542, 1995.

[2] The DØ Collaboration: “Physics Highlights from the DØ Experiment 1992 – 1999”, January 2000.

[3] “New Computing Techniques in Physics Research IV”, page 473, 1995.

[4] “New Computing Techniques in Physics Research IV”, page 670, 1995.

[5] The OPAL Collaboration, G. Abbiendi et al.: “Production rates of quark pairs from gluons and events in hadronic Z decays”, CERN-EP-2000-123, June 2000.

[6] The OPAL Collaboration, G. Abbiendi et al.: “Search for Higgs Bosons in e+e- Collisions at 183 GeV”, CERN-EP/98-173, October 1998.

[7] L. Bellantoni, J. S. Conway, J. E. Jacobsen, Y. B. Pan and Sau Lan Wu: “Using neural networks with jet shapes to identify b jets in e+e- interactions”, CERN-PPE-91-80, July 1991.

[8] The Human Brain (http://brain.web-us.com).

[9] M. Carena, J. S. Conway, H. E. Haber, J. D. Hobbs, et al.: “Report of the Higgs Working Group of the Tevatron Run 2 SUSY/Higgs Workshop”, Fermilab-Conf-00/279-T, SCIPP-00/37, hep-ph/0010338, October 2000.


Appendix

This appendix contains plots of the distributions of the variables used as input for the PNN. The black lines show the values of the variables for a background process, the dotted lines show the values for a tt̄ event. The plots are generated using PAW.
