For a given rate of emitted photons, there is a probability distribution for
the number of photons we actually detect, even assuming 100% detection
efficiency, because of *statistical* uncertainties. In addition,
there may also be *instrumental* uncertainties. Consequently, we
now turn to the concepts of probability distributions, with particular
interest in the distribution which applies to the detection of photons.

*Distributions and characteristics thereof*

- concept of a distribution : define *p*(*x*)*dx* as the probability of an
event occurring between *x* and *x* + *dx*, normalized so that ∫*p*(*x*)*dx* = 1

Some definitions relating to values which characterize a distribution:

mean ≡ *μ* = ∫*x* *p*(*x*)*dx*

variance ≡ *σ*^{2} = ∫(*x* - *μ*)^{2}*p*(*x*)*dx*

standard deviation ≡ *σ* = √(*σ*^{2})

median : the mid-point value, *x*_{med}, defined by

∫_{-∞}^{*x*_{med}}*p*(*x*)*dx* = 1/2

mode : most probable value

Note that the geometric interpretation of above quantities depends on the nature of the distribution; although we all carry around the picture of the mean and the variance for a Gaussian distribution, these pictures are not applicable to other distributions, but the quantities are still well-defined.

Also, note that there is a difference between the *sample* mean,
variance, etc. and the *population* quantities. The latter apply
to the true distribution, while the former are estimates of the latter
from some finite sample (*N* measurements) of the population. The
sample quantities are derived from:

sample mean : *x̄* ≡ (1/*N*)∑_{i}*x*_{i}

sample variance : *s*^{2} ≡ (1/(*N* - 1))∑_{i}(*x*_{i} - *x̄*)^{2}

The sample mean and variance approach the true mean and variance as *N* approaches infinity. But note, especially for small samples, that your estimates of the mean and variance may differ from their true (population) values (more below)!
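As an illustration of this convergence, here is a small Python sketch (the function name and the parent-population parameters are our own, chosen for illustration):

```python
import random

def sample_stats(data):
    """Sample mean and sample variance (with the N - 1 denominator)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return mean, var

random.seed(42)
# Parent population: a Gaussian with true mean 10 and variance 4 (sigma = 2)
small = [random.gauss(10, 2) for _ in range(10)]
large = [random.gauss(10, 2) for _ in range(100000)]

m_small, v_small = sample_stats(small)
m_large, v_large = sample_stats(large)
# The large-sample estimates lie much closer to mu = 10, sigma^2 = 4
# than the 10-point estimates do
```
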

*The binomial distribution*

Now we consider what distribution is appropriate for the detection
of photons. The photon distribution can be derived from the *binomial*
distribution, which gives the probability of observing a number, *x*, of
successes out of a total number of trials, *n*, given the probability,
*p*, of observing a success in any single trial, under the assumption
that all trials are independent of each other:

*P*(*x*; *n*, *p*) = [*n*!/(*x*!(*n* - *x*)!)] *p*^{*x*}(1 - *p*)^{*n* - *x*}

For the binomial distribution, one can derive:

mean ≡ *μ* = ∑*x* *P*(*x*) = *np*

variance ≡ *σ*^{2} = ∑(*x* - *μ*)^{2}*P*(*x*) = *np*(1 - *p*)
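These relations can be checked numerically by summing the binomial probabilities directly; a Python sketch (function name is our own):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x; n, p): probability of x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 100, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
# Direct sums over the distribution recover mean = n*p and
# variance = n*p*(1-p)
mean = sum(x * q for x, q in enumerate(probs))
var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
```
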

*The Poisson distribution*

In the case of detecting photons, *n* is the total number of photons
emitted, and *p* is the probability of detecting a photon during our
observation out of the total emitted. We don't know either of these
numbers! However, we do know that *p* ≪ 1, and we know, or at least we
can estimate, the mean number detected:

*μ* = *np*

In this limit, the binomial distribution asymptotically approaches
the *Poisson* distribution:

*p*(*x*, *μ*) = *μ*^{*x*}*e*^{-*μ*}/*x*!

From the expressions for the binomial distribution in this limit,
the mean of the distribution is *μ*, and the variance is

variance = ∑[(*x* - *μ*)^{2}*p*(*x*, *μ*)] = *np* = *μ*

Note that the Poisson distribution is generally the appropriate distribution
not only for counting photons, but for *any* sort of counting experiment
in which a series of events occurs at a known average rate, independently
of the time since the last event.
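The key Poisson property, variance = mean, can be verified numerically from the probability function *p*(*x*, *μ*) = *μ*^{x}*e*^{-μ}/*x*!; a Python sketch (function name is our own):

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """p(x, mu) = mu**x * exp(-mu) / x!"""
    return mu**x * exp(-mu) / factorial(x)

mu = 10.0
xs = range(100)  # truncation: the tail beyond x = 100 is negligible for mu = 10
mean = sum(x * poisson_pmf(x, mu) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in xs)
# both the mean and the variance come out equal to mu
```
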

What does the Poisson distribution look like? See the plots for
*μ* = 2, *μ* = 10, and *μ* = 10000.

*The normal, or Gaussian, distribution*

Note, for large *μ*, the Poisson distribution is well-approximated around
the peak by a *Gaussian*, or *normal*, distribution:

*p*(*x*) = [1/(*σ*√(2*π*))] exp[-(*x* - *μ*)^{2}/(2*σ*^{2})],  with *σ*^{2} = *μ*

This is important because it allows us to use ``simple'' least-squares techniques to fit observational data, since these generally assume normally distributed data. However, beware that in the tails of the distribution, and at low mean rates, the Poisson distribution can differ significantly from a Gaussian distribution. In these cases, least squares may not be appropriate for modeling observational data; instead, one might need to consider maximum likelihood techniques.
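Both the large-*μ* agreement near the peak and the low-*μ*, tail disagreement can be seen numerically; a Python sketch (function names and the test values of *μ* are our own):

```python
from math import exp, factorial, pi, sqrt

def poisson_pmf(x, mu):
    return mu**x * exp(-mu) / factorial(x)

def gaussian_pmf(x, mu):
    # Gaussian approximation to the Poisson distribution, with sigma**2 = mu
    return exp(-((x - mu) ** 2) / (2 * mu)) / sqrt(2 * pi * mu)

# Near the peak at large mu, the two agree to better than a percent...
peak_ratio = poisson_pmf(100, 100) / gaussian_pmf(100, 100)
# ...but at low mean rates and in the tails they differ substantially
tail_ratio = poisson_pmf(0, 2) / gaussian_pmf(0, 2)
```
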

The normal distribution is also important because many physical variables
seem to be distributed accordingly. This is likely no accident, because
of the *central limit theorem*: if a quantity is the sum of many
independent random variables with ANY distribution (of finite variance), the
quantity itself will be distributed approximately normally (see statistics texts). In observational techniques, we
encounter the normal distribution because one important source of
instrumental noise, *readout noise*, is distributed normally.

*Importance of error distribution analysis*

You need to understand the expected uncertainties in your observations in order to:

- predict the amount of observing time you'll need to get
uncertainties as small as you need them to do your science,
- answer the question: is the scatter in observed data consistent with the expected
uncertainties? If the answer is no, then you know you've either learned some
astrophysics or you don't understand something about your observations.
This is especially important in astronomy, where objects are faint
and many projects push down into the noise as far as possible.
Usually we can only answer this question probabilistically. Generally,
tests compute the probability that the observations are consistent with
an expected distribution (the null hypothesis). You can then look to see
if this probability is low, and if so, reject the null hypothesis.
- interpret your results in the context of a scientific prediction

*Confidence levels*

For example, say we want to know whether a single point is consistent with expectations; e.g., we see a bright point in multiple measurements of a star, and want to know whether the star flared. Say we have a time sequence with known mean and variance, and we obtain a new point: is it consistent with the known distribution?

If the form of the probability distribution is known, then you can
calculate the probability of getting a measurement more than some
observed distance from the mean.
In the case where the observed distribution is Gaussian (or approximately
so), this is done using the *error function* (sometimes written
*erf*(*x*)), which is the integral of a Gaussian from some starting
value.

Some simple guidelines to keep in mind follow (the actual situation often
requires more sophisticated statistical tests). First, for
Gaussian distributions, you can calculate that 68.3% of the points
should fall within plus or minus one sigma of the mean, and 95.4%
within plus or minus two sigma of the mean. Thus, if you have a
time line of photon fluxes for a star, with *N* observed points and a
photon noise *σ* on each measurement, you can test whether the
number of points deviating more than 2*σ* from the mean is much
larger than expected. To decide whether any single point is really
significantly different, you might want to use a more stringent criterion,
e.g., a 5*σ* rather than a 2*σ* criterion; a 5*σ* deviation has a much
higher level of significance. On the other hand, if you have far more points
in the range 2 - 4*σ* brighter or fainter than you would expect,
you may also have a significant detection of intensity variations
(provided you really understand your uncertainties on the measurements!).
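These Gaussian confidence levels can be computed directly from the error function; a Python sketch (the function name and the example *N* are our own):

```python
from math import erf, sqrt

def frac_within(k):
    """Fraction of normally distributed points expected within +/- k sigma."""
    return erf(k / sqrt(2))

within_1 = frac_within(1.0)   # about 0.683
within_2 = frac_within(2.0)   # about 0.954

# Expected number of >2-sigma outliers in a time line of N points:
N = 1000
expected_outliers = N * (1 - within_2)   # roughly 45 of 1000 points
```
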

Also, note that your observed distribution should be consistent with
your uncertainty estimates given the above guidelines. If you have a whole
set of points that all fall within 1*σ* of each other, something
is wrong with your uncertainty estimates (or perhaps your measurements are
correlated with each other)!

For a series of measurements, one can calculate the *χ*^{2} statistic, and
determine how probable this value is, given the number of points.

*Signal-to-noise*

Astronomers often describe uncertainties in terms of the fractional error,
i.e. the amplitude of the uncertainty divided by the amplitude of the quantity
being measured; often the inverse of this, referred to as the
*signal-to-noise ratio* (S/N), is used.
Given an estimate of the number of photons expected from an object in
an observation, we can calculate the *signal-to-noise* ratio:

*S*/*N* = *S*/*σ*

which is the inverse of the predicted fractional error (*σ*/*S*).
Consider an object with observed photon flux (per unit area and time, e.g.
from the count equation above), *S*^{′},
leading to a signal,
*S* = *S*^{′}*Tt* where *T* is the telescope area and
*t* is the exposure time. In the simplest case, the only noise source
is Poisson statistics from the source, in which case:

*S*/*N* = *S*/√*S* = √*S* = √(*S*′*Tt*)

In other words, the S/N increases as the square root of the object
brightness, telescope area, efficiency, or exposure time.

*Background noise*

A more realistic case includes the noise contributed from Poisson
statistics of ``background'' light (more on the physical nature of this
later), *B*^{′}, which has
units of flux per unit area on the sky (i.e. a surface brightness); note
that this is usually quoted in magnitudes per square arcsecond.

The total number of photons observed, *O*, is

*O* = *S* + *B* = *S*′*Tt* + *B*′*ATt*

where *A* is the area of sky over which the object is measured.
This leads to a common form of the *noise equation*:

*S*/*N* = *S*′*Tt*/√(*S*′*Tt* + *B*′*ATt*)

Breaking out the dependence on exposure time and telescope area, this is:

*S*/*N* = √(*Tt*) [*S*′/√(*S*′ + *B*′*A*)]

In the *signal-limited* case, *S*′ ≫ *B*′*A*, we get

*S*/*N* = √(*S*′*Tt*)

as before. In the *background-limited* case, *B*′*A* ≫ *S*′, we get

*S*/*N* = *S*′*Tt*/√(*B*′*ATt*) = *S*′√(*Tt*/(*B*′*A*))

As one goes to fainter objects, the S/N drops, and it drops faster when
you're background limited. This illustrates the importance of dark-sky
sites, and also the importance of good image quality.
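The two regimes of the noise equation can be illustrated with a short Python sketch (the function name, parameter names, and the hypothetical numbers are our own; units need only be mutually consistent):

```python
from math import sqrt

def snr(Sprime, Bprime, A, T, t):
    """S/N from the noise equation with source + background Poisson noise.
    Sprime: source photon flux; Bprime: background surface brightness;
    A: sky area of the measurement aperture; T: telescope area; t: exposure time.
    """
    S = Sprime * T * t          # total source photons
    B = Bprime * A * T * t      # total background photons
    return S / sqrt(S + B)

# Signal-limited regime: S/N ~ sqrt(S' T t)
snr_bright = snr(100.0, 0.01, 1.0, 1.0, 100.0)
# Background-limited regime: S/N ~ S' sqrt(T t / (B' A)),
# dropping linearly with S' rather than as its square root
snr_faint = snr(0.01, 100.0, 1.0, 1.0, 100.0)
```
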
Consider two telescopes of collecting area *T*_{1} and *T*_{2}. If we
observe for the same exposure time on each and want to know how much
fainter we can see with the larger telescope at a given S/N, we
find *S*′_{2}/*S*′_{1} = *T*_{1}/*T*_{2} in the signal-limited case,
but only *S*′_{2}/*S*′_{1} = (*T*_{1}/*T*_{2})^{1/2} in the
background-limited case.

*Instrumental noise*

In addition to the uncertainties from Poisson statistics (statistical noise),
there may be additional terms from instrumental uncertainties. A common example
of this that is applicable for CCD detectors is readout noise, which
is additive *noise* (with zero mean!) that comes from the detector and
is independent of signal level. For a detector whose readout noise is
characterized by
*σ*_{rn}, the noise equation becomes

*S*/*N* = *S*/√(*S* + *B* + *σ*_{rn}^{2})

if a measurement is made in a single pixel. If an object is spread over
*N*_{pix} pixels, this generalizes to

*S*/*N* = *S*/√(*S* + *B* + *N*_{pix}*σ*_{rn}^{2})

For large
*σ*_{rn}, the behavior is the same as the background
limited case. This makes it clear that if you have readout noise, image
quality (and/or proper optics to keep an object from covering too many pixels)
is important for maximizing S/N. It is also clear that it
is critical to have minimum read-noise for low background applications
(e.g., spectroscopy).
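The effect of spreading an object over more pixels when read noise is present can be seen in a Python sketch (the function name, parameter names, and the example numbers are our own):

```python
from math import sqrt

def snr_ccd(Sprime, Bprime, A, T, t, npix, sigma_rn):
    """S/N with source + background Poisson noise plus readout noise
    accumulated over npix pixels."""
    S = Sprime * T * t
    B = Bprime * A * T * t
    return S / sqrt(S + B + npix * sigma_rn**2)

# Spreading the same object over more pixels costs S/N once read noise matters
compact = snr_ccd(10.0, 0.1, 1.0, 1.0, 10.0, npix=4, sigma_rn=5.0)
spread = snr_ccd(10.0, 0.1, 1.0, 1.0, 10.0, npix=100, sigma_rn=5.0)
```
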

There are other possible additional terms in the noise equation, arising from things like dark current, digitization noise, uncertainties in sky determination, uncertainties from photometric technique, etc. (we'll discuss some of these later on), but in most applications, the three sources discussed so far – signal noise, background noise, and readout noise – are the dominant noise sources.

Note the applications where one is likely to be signal dominated, background dominated, and readout noise dominated.

Why are the three uncertainty terms in the noise equation added in
quadrature? The estimated quantity is
*O* - ⟨*B*⟩ = *S* + (*B* - ⟨*B*⟩) + *R*,
where ⟨*R*⟩ = 0 since readout noise has zero mean. The uncertainty in a summed
series is computed by adding the individual uncertainties in quadrature;
in the equation above, we have neglected the uncertainty in ⟨*B*⟩. To
understand why they add in quadrature, let's consider general error propagation.

More reasons to consider error propagation: let's say we want to make some calculations (e.g., calibration, unit conversion, averaging, conversion to magnitudes, calculation of colors, etc.) using these observations: we need to be able to estimate the uncertainties in the calculated quantities that depend on our measured quantities.

Consider what happens if you have several known quantities with known error distributions and you combine these into some new quantity: we wish to know what the uncertainty is in the new quantity.

Writing the new quantity as *x* = *f*(*u*, *v*, ...), as long as the
uncertainties are small:

*σ*_{x}^{2} = *σ*_{u}^{2}(∂*x*/∂*u*)^{2} + *σ*_{v}^{2}(∂*x*/∂*v*)^{2} + 2*σ*_{uv}^{2}(∂*x*/∂*u*)(∂*x*/∂*v*) + ...

The last term is the *covariance*, which
relates to whether uncertainties are *correlated*.

Examples for *uncorrelated* errors:

- addition/subtraction: *x* = *u* ± *v*, so
*σ*_{x}^{2} = *σ*_{u}^{2} + *σ*_{v}^{2}. In this case, errors are said to
*add in quadrature*.
- multiplication (or division, if you formulate the division problem as a
multiplication!): *x* = *uv*, so
*σ*_{x}^{2} = *v*^{2}*σ*_{u}^{2} + *u*^{2}*σ*_{v}^{2}.
- natural logs: *x* = ln *u*, so *σ*_{x}^{2} = *σ*_{u}^{2}/*u*^{2}. For
base-10 logs, *x* = log *u* = (log *e*) ln *u*, so
*σ*_{x} = (log *e*) *σ*_{u}/*u*. Note that when dealing with logarithmic
quantities, uncertainties in the log correspond to
*fractional* uncertainties in the raw quantity.
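The quadrature rule for addition can be checked with a Monte Carlo experiment; a Python sketch (the parameter values are our own):

```python
import random
from math import sqrt

random.seed(1)
sigma_u, sigma_v = 3.0, 4.0
n = 200000
u = [random.gauss(10, sigma_u) for _ in range(n)]
v = [random.gauss(20, sigma_v) for _ in range(n)]
x = [ui + vi for ui, vi in zip(u, v)]   # x = u + v

mean_x = sum(x) / n
std_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
# Independent errors add in quadrature: sigma_x = sqrt(3**2 + 4**2) = 5
```
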

*Distribution of resultant uncertainties*

When propagating errors, even though you can calculate the variances of the
new variables, the distribution of uncertainties in a new variable is not, in
general, the same as the distribution of uncertainties in the original
variables; e.g., even if the uncertainties in the individual variables are
normally distributed, the uncertainties in the output variable need not be.

When two normally distributed variables are added, however, the result is
normally distributed.

We've covered uncertainties in single measurements. Next we turn to averaging measurements. Say we have multiple observations and want the best estimate of the mean and variance of the population, e.g. multiple measurements of stellar brightness. Here we'll define the best estimate of the mean as the value which maximizes the likelihood that our estimate equals the true parent population mean.

For equal uncertainties, this estimate gives our normal expression for the sample mean:

*x̄* = ∑*x*_{i}/*N*

Using error propagation, the estimate of the uncertainty in the sample mean
is given by:

*σ*_{x̄}^{2} = ∑(*σ*_{i}^{2}/*N*^{2}) = *σ*^{2}/*N*, i.e. *σ*_{x̄} = *σ*/√*N*

But what if the uncertainties on each observation aren't equal, say, for
example, because we have observations made with several different exposure
times? Then the optimal determination of the mean uses a *weighted mean*:

weighted mean = ∑(*x*_{i}/*σ*_{i}^{2}) / ∑(1/*σ*_{i}^{2})

and the estimated uncertainty in this value is given by:

*σ*^{2} = 1 / ∑(1/*σ*_{i}^{2})
This is a standard result for determining sample means from a set of observations with different weights.
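The inverse-variance weighted mean and its uncertainty can be sketched in a few lines of Python (the function name is our own):

```python
def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its estimated uncertainty."""
    weights = [1.0 / s**2 for s in sigmas]
    wsum = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / wsum
    sigma_mean = (1.0 / wsum) ** 0.5
    return mean, sigma_mean

# With equal sigmas this reduces to the ordinary sample mean,
# with uncertainty sigma / sqrt(N)
m, s = weighted_mean([40.0, 50.0, 60.0], [7.0, 7.0, 7.0])
```
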

However, there can sometimes be a subtlety in applying this formula, which
has to do with the question: how do we go about choosing the weights/errors,
*σ*_{i}? We know we can *estimate* *σ* using Poisson
statistics for a given count rate, but remember that this is a sample
variance (which may be based on a single observation!) not the true population
variance. This can lead to biases.

Consider observations of a star made on three nights, with measurements
of 40, 50, and 60 photons. It's clear that the mean observation is
50 photons. However, beware of being trapped by your uncertainty
*estimates*. From each observation alone, you would estimate uncertainties
of √40, √50, and √60. If you plug these uncertainty
estimates into a computation of the weighted mean, you'll get a mean
rate of 48.65!

Using the individual estimates of the variances, we'll bias values to lower rates, since these will have estimated higher S/N.
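The bias is easy to reproduce numerically; a Python sketch of the example above:

```python
values = [40, 50, 60]
# Naive per-observation Poisson estimates: sigma_i**2 = x_i,
# so the weights are 1 / x_i
weights = [1.0 / x for x in values]
biased = sum(w * x for w, x in zip(weights, values)) / sum(weights)
unweighted = sum(values) / len(values)
# biased comes out near 48.65, below the true mean of 50:
# low points get smaller error estimates and hence more weight
```
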

Note that it's pretty obvious from this example that you should just weight all observations equally. However, this certainly isn't always the right thing to do. For example, consider the situation in which you have three exposures of different exposure times, and you are calculating the photon rate (counts/s). Here you probably want to give the longer exposures higher weight (at least, if they are signal or background limited). In this case, you again don't want to use the individual uncertainty estimates or you'll introduce a bias. There is a simple solution here as well: just weight the observations by the exposure time.

However, while this works fine for Poisson uncertainties (variances proportional to count rate), it isn't strictly correct if there are also instrumental uncertainties that don't scale with exposure time. For example, the presence of readout noise can have this effect: if all exposures are readout noise dominated, one would want to weight them equally; if readout noise dominates the shorter but not the longer exposures, one would want to weight the longer exposures even more heavily than the exposure time ratios suggest! The only way to properly average measurements in this case is to estimate a sample mean, then use this value, scaled to the appropriate exposure times, as the input for the Poisson uncertainties.

Another subtlety: averaging counts and converting to magnitudes is not the same as averaging magnitudes!

*Can you split exposures?*

Although from *S*/*N* considerations one can determine the number
of counts (i.e., the exposure time) needed to do your science, when
observing one must also consider whether this time
should be collected in a single exposure or in multiple exposures, i.e. how long
individual exposures should be. There are several reasons why one might
imagine that it is nicer to have a sequence of shorter exposures rather
than one single longer exposure (e.g., tracking, monitoring of photometric
conditions, cosmic ray rejection, saturation issues), so we need to consider
under what circumstances doing this results in poorer S/N.

Consider the object with photon flux *S*^{′}, background surface brightness
*B*^{′},
and detector with readout noise
*σ*_{rn}. A single short
exposure of time *t* has a variance:

*σ*^{2} = *S*′*Tt* + *B*′*ATt* + *N*_{pix}*σ*_{rn}^{2}

If *N* such exposures are summed, the total signal is *NS*′*Tt* and the
total variance is

*σ*^{2} = *N*(*S*′*Tt* + *B*′*ATt* + *N*_{pix}*σ*_{rn}^{2})

whereas a single long exposure of time *Nt* has

*σ*^{2} = *NS*′*Tt* + *NB*′*ATt* + *N*_{pix}*σ*_{rn}^{2}

The only difference is in the readout noise term! In the signal- or background-limited regimes, exposures can be added with no loss of S/N. However, if readout noise is significant, then splitting exposures leads to reduced S/N.
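The cost of splitting can be sketched in Python (the function name, parameter names, and example numbers are our own, following the noise equation above):

```python
from math import sqrt

def snr_split(Sprime, Bprime, A, T, t_total, n_exp, npix, sigma_rn):
    """S/N of n_exp co-added exposures totalling t_total of integration."""
    t = t_total / n_exp
    S = Sprime * T * t               # per-exposure source counts
    B = Bprime * A * T * t           # per-exposure background counts
    var = n_exp * (S + B + npix * sigma_rn**2)
    return n_exp * S / sqrt(var)

# With negligible read noise, splitting costs nothing...
one_clean = snr_split(10, 1, 1, 1, 100, n_exp=1, npix=10, sigma_rn=0.0)
ten_clean = snr_split(10, 1, 1, 1, 100, n_exp=10, npix=10, sigma_rn=0.0)
# ...but with read noise, every extra readout lowers the S/N
one_rn = snr_split(10, 1, 1, 1, 100, n_exp=1, npix=10, sigma_rn=5.0)
ten_rn = snr_split(10, 1, 1, 1, 100, n_exp=10, npix=10, sigma_rn=5.0)
```
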

So far, we've been discussing *random* errors. There is an additional,
usually more troublesome, type of errors known as *systematic* errors.
These don't occur randomly but rather are correlated with some, possibly
unknown, variable relating to your observations, and can have the effect
of not just adding spread around the true value that you are trying to measure,
but actually measuring the wrong mean.

EXAMPLE : flat fielding

EXAMPLE : WFPC2 CTE

Note also that in some cases, systematic errors can masquerade as random errors in your test observations (or be missing altogether if you don't take data in exactly the same way), but actually be systematic in your science observations.

EXAMPLE: flat fielding, subpixel QE variations.

Note that error analysis from expected random errors may be the only clue you get to discovering systematic errors. To discover systematic errors, plot residuals vs. everything!