Up: AY535 class notes Previous: Uncertainties and error analysis

Subsections

Noise equation: how do we predict expected uncertainties?

(Entire section in one PDF file).

Signal-to-noise

Astronomers often describe uncertainties in terms of the fractional error, e.g. the amplitude of the uncertainty divided by the amplitude of the quantity being measured; often, the inverse of this, referred to as the signal-to-noise ratio is used. Given an estimate the number of photons expected from an object in an observation, we can calulate the signal-to-noise ratio:

$\displaystyle {S\over N}$ = $\displaystyle {S \over \sqrt{\sigma^2}}$

which is the inverse of the predicted fractional error (N/S).

Consider an object with observed photon flux (per unit area and time, e.g. from the signal equation above), S^′, leading to a signal, S = S^′Tt where T is the telescope area and t is the exposure time. In the simplest case, the only noise source is Poisson statistics from the source, in which case:

σ² = S = S^′Tt

$\displaystyle {S\over N}$ = $\displaystyle \sqrt{{S}}$ = $\displaystyle \sqrt{{S^\prime T t}}$

In other words, the S/N increases as the square root of the object brightness, telescope area, efficiency, or exposure time. Note that S is directly observable, so one can calculate the S/N for an observation without knowing the telescope area or exposure time! We've just broken S down so that you can specifically see the dependence on telescope area and/or exposure time.

$\begin{shaded} \textit{ Understand the concept of S/N and fractional error. Know how S/N depends on the signal for the Poisson-limited case.} \end{shaded}$

Background noise

A more realistic case includes the noise contributed from Poisson statistics of ``background'' light (more on the physical nature of this later), B^′, which has units of flux per area on the sky (i.e. a surface brightness); note that this is also usually given in magnitudes.

B^′ = $\displaystyle \int$ $\displaystyle {B_\lambda\over{hc\over\lambda}}$ q_λdλ

The amount of background in our measurement depends on how we choose to make the measurement (how much sky area we observe). Say we just use an aperture with area, A, so the total observed background counts is

AB = AB^′Tt

Again, B^′Tt is the directly observable quantity, but we split it into the quantities on which it depends to understand what factors are important in determining S/N.

The total number of photons observed, O, is

O = S + AB = (S^′ + AB^′)Tt

The variance of the total observed counts, from Poisson statistics, is:

σ_O² = O = S + AB = (S^′ + AB^′)Tt

To get the desired signal from the object only , we will need to measure separately the total signal and the background signal to estimate:

S≡S^′Tt = O - A

where is some estimate we have obtained of the background surface brightness. The noise in the measurement is

σ_S² $\displaystyle \approx$ σ_O² = S + AB = (S^′ + AB^′)Tt

where the approximation is accurate if the background is determined to high accuracy, which one can do if one measures the background over a large area, thus getting a large number of background counts (with correspondingly small fractional error in the measurement).

This leads to a common form of the noise equation:

$\displaystyle {S\over N}$ = $\displaystyle {S \over \sqrt{S + A B}}$

Breaking out the dependence on exposure time and telescope area, this is:

$\displaystyle {S\over N}$ = $\displaystyle {S^\prime \over \sqrt{S^\prime + A B^\prime }}$ $\displaystyle \sqrt{{T}}$ $\displaystyle \sqrt{{t}}$

In the signal-limited case, S^′ > > B^′A, we get

$\displaystyle {S\over N}$ = $\displaystyle \sqrt{{S}}$ = $\displaystyle \sqrt{{S^\prime t T}}$

In the background limited case, B^′A > > S^′, and

$\displaystyle {S\over N}$ = $\displaystyle {S\over \sqrt{AB}}$ = $\displaystyle {S^\prime\over \sqrt{B^\prime A}}$ $\displaystyle \sqrt{{t T}}$

As one goes to fainter objects, the S/N drops, and it drops faster when you're background limited. This illustrates the importance of dark-sky sites, and also the importance of good image quality.

Consider two telescopes of collecting area, T₁ and T₂. If we observe for the same exposure time on each and want to know how much fainter we can see with the larger telescope at a given S/N, we find:

S₂ = $\displaystyle {T_1\over T_2}$ S₁

for the signal-limited case, but

S₂ = $\displaystyle \sqrt{{T_1\over T_2}}$ S₁

for the background-limited case.

$\begin{shaded} \textit{ Understand how Poisson uncertainties in the background c... ...lity because the amount of background included in the measurement.} \end{shaded}$

Instrumental noise

In addition to the uncertainties from Poisson statistics (statistical noise), there may be additional terms from instrumental uncertainties. A common example of this that is applicable for CCD detectors is readout noise, which is additive noise (with zero mean!) that comes from the detector and is independent of signal level. For a detector whose readout noise is characterized by σ_rn,

$\displaystyle {S\over N}$ = $\displaystyle {S \over \sqrt{S + B A_{pix} + \sigma_{rn}^2}}$

if a measurement is made in a single pixel. If an object is spread over N_pix pixels, then

$\displaystyle {S\over N}$ = $\displaystyle {S \over \sqrt{S + B A + N_{pix}\sigma_{rn}^2}}$

For large σ_rn, the behavior is the same as the background limited case. This makes it clear that if you have readout noise, image quality (and/or proper optics to keep an object from covering too many pixels) is important for maximizing S/N. It is also clear that it is critical to have minimum read-noise for low background applications (e.g., spectroscopy).

There are other possible additional terms in the noise equation, arising from things like dark current, digitization noise, uncertainties in sky determination, uncertainties from photometric technique, etc. (we'll discuss some of these later on), but in most applications, the three sources discussed so far – signal noise, background noise, and readout noise – are the dominant noise sources.

Note the applications where one is likely to be signal dominated, background dominated, and readout noise dominated.

$\begin{shaded} \textit{ Understand readout noise and how it represented by a nor... ...nces readout noise is an important contributor to the total noise.} \end{shaded}$

Error propagation

Why are the three uncertainty terms in the noise equation added in quadrature? The measured quantity (S) is a sum of S + B - + < R >, where < R > = 0 since readout noise has zero mean. The uncertainty in a summed series is computed by addding the individual uncertainties in quadature; in the equation above, we have neglected the uncertainty in . To understand why they add in quadrature, let's consider general error propagation.

More reasons to consider error propagation: let's say we want to make some calculations (e.g., calibration, unit conversion, averaging, conversion to magnitudes, calculation of colors, etc.) using these observations: we need to be able to estimate the uncertainties in the calculated quantities that depend on our measured quantities.

Consider what happens if you have several known quantities with known error distributions and you combine these into some new quantity: we wish to know what the uncertainty is in the new quantity.

x = f (u, v,....)

The question is what is σ_x if we know σ_u, σ_v, etc.?

As long as uncertainties are small:

x_i - < x > ∼(u_i - ) $\displaystyle \left(\vphantom{{{\partial{x}}\over{\partial {u}}}}\right.$ $\displaystyle {{\partial{x}}\over{\partial {u}}}$ $\displaystyle \left.\vphantom{{{\partial{x}}\over{\partial {u}}}}\right)$ + (v_i - < v > ) $\displaystyle \left(\vphantom{{{\partial{x}}\over{\partial {v}}}}\right.$ $\displaystyle {{\partial{x}}\over{\partial {v}}}$ $\displaystyle \left.\vphantom{{{\partial{x}}\over{\partial {v}}}}\right)$ + ...

σ_x² = lim(N - > ∞) $\displaystyle {1\over N}$ $\displaystyle \sum$ (x_i - < x > )²

= σ_u² $\displaystyle \left(\vphantom{{{\partial{x}}\over{\partial {u}}}}\right.$ $\displaystyle {{\partial{x}}\over{\partial {u}}}$ $\displaystyle \left.\vphantom{{{\partial{x}}\over{\partial {u}}}}\right)^{2}_{}$ + σ_v² $\displaystyle \left(\vphantom{{{\partial{x}}\over{\partial {v}}}}\right.$ $\displaystyle {{\partial{x}}\over{\partial {v}}}$ $\displaystyle \left.\vphantom{{{\partial{x}}\over{\partial {v}}}}\right)^{2}_{}$ +2σ_uv² $\displaystyle {{\partial{x}}\over{\partial {u}}}$ $\displaystyle {{\partial{x}}\over{\partial {v}}}$ + ...

The last term is the covariance, which relates to whether uncertainties are correlated.

σ_uv² = lim(n - > ∞) $\displaystyle {1\over N}$ $\displaystyle \sum$ (u_i - )(v_i - < v > )

If uncertainties are uncorrelated, then σ_uv = 0 because there's equal chance of getting opposite signs on v_i for any given u_i. When working out uncertainties, make sure to consider whether there are correlated errors! If there are, you may be able to reformulate quantities so that they have independent errors: this can be very useful!

Examples for uncorrelated errors:

addition/subtraction:

x = u + v

σ_x² = σ_u² + σ_v²

In this case, errors are said to add in quadrature.
multiplication ( or division, if you formulate the division problem as a multiplication!):

x = uv

σ_x² = v²σ_u² + u²σ_v²
natural logs:

x = ln u

σ_x² = σ_u²/u²
For base-10 logs, note that log x = log e ln x
Note that when dealing with logarithmic quantities, uncertainties in the log correspond to fractional uncertainties in the raw quantity.

Distribution of resultant uncertainties

When propagating errors, even though you can calculate the variances in the new variables, the distribution of uncertainties in the new variables is not, in general, the same as the distribution of uncertainties in the original variables, e.g. if uncertainties in individual variables are normally distributed, uncertainties in output variable are not necessarily.

When two variables are added, however, the output is normally distributed.

$\begin{shaded} \textit{Know the error propagation formula and how to apply it.} \end{shaded}$

Determining sample parameters: averaging measurements

We've covered uncertainties in single measurements. Next we turn to averaging measurements. Say we have multiple observations and want the best estimate of the mean and variance of the population, e.g. multiple measurements of stellar brightness. Here we'll define the best estimate of the mean as the value which maximizes the likelihood that our estimate equals the true parent population mean.

For equal uncertainties, this estimate just gives our normal expression for the sample mean:

$\displaystyle \bar{x}$ = $\displaystyle {\sum x_i \over N}$

Using error propagation, the estimate of the uncertainty in the sample mean is given by:

σ² = $\displaystyle \sum$ $\displaystyle {\sigma_i^2\over N^2}$ = $\displaystyle {\sigma^2\over N}$

But what if uncertainties on each observation aren't equal, say for example we have observations made with several different exposure times? Then the optimal determination of the mean is using a:

weightedmean = $\displaystyle {\sum {x_i\over\sigma_i^2} \over \sum {1\over\sigma_i^2}}$

and the estimated uncertainty in this value is given by:

σ_μ² = $\displaystyle \sum$ $\displaystyle {{\sigma_i^2\over \sigma_i^4} \over ({\sum {1\over \sigma_i^2}})^2}$ = $\displaystyle {1 \over \sum{1\over\sigma_i^2}}$

where the σ_i's are individual weights/errors (people often talk about the weight of an observation, i.e. ${1\over\sigma_i^2}$ : large weight means small uncertainty.

This is a standard result for determining sample means from a set of observations with different weights.

However, there can sometimes be a subtlety in applying this formula, which has to do with the question: how do we go about choosing the weights/errors, σ_i? We know we can estimate σ using Poisson statistics for a given count rate, but remember that this is a sample variance (which may be based on a single observation!) not the true population variance. This can lead to biases.

Consider observations of a star made on three nights, with measurements of 40, 50, and 60 photons. It's clear that the mean observation is 50 photons. However, beware of the being trapped by your undertainty estimates. From each observation alone, you would estimate uncertainties of $\sqrt{{40}}$ , $\sqrt{{50}}$ , and $\sqrt{{60}}$ . If you plug these uncertainty estimates into a computation of the weighted mean, you'll get a mean rate of 48.64!

Using the individual estimates of the variances, we'll bias values to lower rates, since these will have lower estimated variances.

Note that it's pretty obvious from this example that you should just weight all observations equally. However, note that this certainly isn't always the right thing to do. For example, consider the situation in which you have three exposures of different exposure times, and you are calculating the photon rate (counts/s). Here you probably want to give the longer exposures higher weight (at least, if they are signal or background limited). In this case, you again don't want to use the individual uncertainty estimates or you'll introduce a bias. There is a simple solution here also: just weight the observations by the exposure time. However, while this works fine for Poisson uncertainties (variances proportional to count rate), it isn't strictly correct if there are instrumental uncertainties as well which don't scale with exposure time. For example, the presence of readout noise can have this effect; if all exposures are readout noise dominated, then one would want to weight them equally, if readout noise dominates the shorter but not the longer exposures, once would want to weight the longer exposures even higher than expected for the exposure time ratios! The only way to properly average measurements in this case is to estimate a sample mean, then use this value scaled to the appropriate exposure times as the input for the Poisson uncertainties. See the next subsection for a more comprehensive discussion.

Another subtlety: averaging counts and converting to magnitudes is not the same as averaging magnitudes!

Weighted mean of a series of exposures

Say you have a bunch of measurements of the brightness of an object. Under the assumption that the object is not variable, you wish to get the best estimate of the count rate of the object. Your measurements are not all at the same exposure time. So you have a bunch of measurements:

c_i = rt_i

where each of the c_i has noise associated with it.

For each measurement, you can calculate a rate:

r_i≡ $\displaystyle {c_i\over t}$

Now you wish to average all of the measured rates to get the best estimate of the true rate. Because the measurements were taken with different exposure times, they have different amount of uncertainty, so you want to use a weighted mean to get the best estimate.

To calculate a weighted mean, you need to determine the variances of the observations. Given the true rate, r, the variances of the measurements are

σ² = rt_i + nσ_rn²

where n is the number of pixels you've summed over to get the total counts, and σ_rn is the readout noise.

To get the variances of the rates, r_i = ${c_i\over t}$ , we need to divide the variances of the counts by t² (i.e. divide the standard deviations by t, so

σ² = r/t_i + nσ_rn²/t_i²

The problem is that you don't know what r is. Normally, you might just take

σ_i² = r_i + nσ_rn²/t_i²

but the problem with this is that you'd be weighting each measurment as if it had a different true rate, when in fact they all have the same true rate; this can lead to biases.

Let's go back to the correct formulation, which gives for the weighted mean:

< r > = $\displaystyle {\sum {r_i \over (r / t_i + n \sigma_{rn}^2 / t_i^2)} \over \sum { 1 \over (r / t_i + n \sigma_{rn}^2 / t_i^2)}}$

In the case of all equal exposure times, then the σ² come out of the sums and cancel between the numerator and denominator, leaving

< r > = $\displaystyle \sum$ $\displaystyle {r_i \over N}$

i.e., an unweighted mean, just as we'd expect

In the case of zero readout noise we have

< r > = $\displaystyle {\sum {r_i \over r / t_i} \over \sum { 1 \over r / t_i}}$

The r can now come out of the sums and cancel in the numerator and denominator, so the weighted mean is now independent of the true weight and becomes

< r > = $\displaystyle {\sum {t_i r_i } \over \sum {t_i}}$

i.e., calculate the mean by weighting by the exposure times.

For the general case of unequal exposure times in the presence of readout noise, you can avoid biases by choosing any reasonable estimate of r for use in calculating the weights. For example, you could use the observed rate with the highest S/N as the basis for calculating weights. Or you could even iterate the solution a couple of times. If you were to do this, you'd find that the weighted mean hardly depends at all on what you choose for the estimate of r so long as it is close to the true r.

$\begin{shaded} \textit{Understand how the distinction between sample variance an... ...biases when calculating a weighted mean, and how to overcome this.} \end{shaded}$

Can you split exposures?

Although from S/N considerations, one can determine the required number of counts you need (exposure time) to do your science, when observing, one must also consider the question of whether this time should be collected in single or in multiple exposures, i.e. how long individual exposures should be. There are several reasons why one might imagine that it is nicer to have a sequence of shorter exposures rather than one single longer exposure (e.g., tracking, monitoring of photometric conditions, cosmic ray rejection, saturation issues), so we need to consider under what circumstances doing this results in poorer S/N.

Consider the object with photon flux S^′, background surface brightness B^′, and detector with readout noise σ_rn. A single short exposure of time t has a variance:

σ_S² = S^′Tt + B^′ATt + N_pixσ_rn²

If N exposures are summed, the resulting variance will be

σ_N² = Nσ_S²

If a single long exposure of length Nt is taken, we get

σ_L² = S^′TNt + B^′ATNt + N_pixσ_rn².

The ratio of the noises, or the inverse ratio of the S/N (since the total signal measured is the same in both cases), is

$\displaystyle {\sigma_N\over \sigma_L}$ = $\displaystyle \sqrt{{ S^\prime T Nt + B^\prime A T Nt + NN_{pix} \sigma_{rn}^2 \over S^\prime T Nt + B^\prime A T Nt + N_{pix} \sigma_{rn}^2 }}$

The only difference is in the readout noise term! In the signal- or background-limited regimes, exposures can be added with no loss of S/N. However, if readout noise is significant, then splitting exposures leads to reduced S/N.

$\begin{shaded} \textit{Understand under what circumstances you can split exposur... ...asons why you might want to split an exposure into shorter pieces.} \end{shaded}$

Random errors vs systematic errors

So far, we've been discussing random errors. There is an additional, usually more troublesome, type of errors known as systematic errors. These don't occur randomly but rather are correlated with some, possibly unknown, variable relating to your observations, and can have the effect of not just adding spread around the true value that you are trying to measure, but actually measuring the wrong mean.

EXAMPLE : flat fielding

EXAMPLE : WFPC2 CTE

Note also that in some cases, systematic errors can masquerade as random errors in your test observations (or be missing altogether if you don't take data in exactly the same way), but actually be systematic in your science observations.

EXAMPLE: flat fielding, subpixel QE variations.

Note that error analysis from expected random errors may be the only clue you get to discovering systematic errors. To discover systematic errors, plot residuals vs. everything!

$\begin{shaded} \textit{Understand the distinction between random and systematic ... ...xpected scatter may be critical to discovering systematic errors. } \end{shaded}$

Up: AY535 class notes Previous: Uncertainties and error analysis