
Properties of light and error analysis

Properties of light

Magnitudes and the count equation

In astronomy, magnitudes are often used instead of fluxes:

\begin{displaymath}m = -2.5 \log F + const\end{displaymath}

The flux density for astronomical objects may be specified by a monochromatic magnitude:

\begin{displaymath}F_\lambda = K 10^{-0.4 m(\lambda)}\end{displaymath}

where the coefficient of proportionality, $K$, depends on the definition of the photometric system. In the STMAG system, $K = 3.63\times 10^{-9}\ \mathrm{erg\ cm^{-2}\ s^{-1}\ \AA^{-1}}$, which is approximately the flux of Vega at 5500 Å. Alternatively, we can write

\begin{displaymath}m_\lambda = -2.5 \log F_\lambda - 21.1\end{displaymath}

In the ABNU system, we have

\begin{displaymath}F_\nu = 3.63\times 10^{-20}\ \mathrm{erg\ cm^{-2}\ s^{-1}\ Hz^{-1}}\ \times\ 10^{-0.4 m_\nu}\end{displaymath}

or

\begin{displaymath}m_\nu = -2.5 \log F_\nu - 48.6\end{displaymath}

Related, but slightly different, are integrated magnitude systems; the STMAG and ABMAG integrated systems are defined relative to sources of constant $F_\lambda$ and constant $F_\nu$, respectively.

\begin{displaymath}m_{ST} = -2.5 \log {\int F_\lambda d\lambda\over \int 3.63\times 10^{-9} d\lambda}\end{displaymath}


\begin{displaymath}m_{AB} = -2.5 \log {\int F_\nu d\nu\over \int 3.63\times 10^{-20} d\nu}\end{displaymath}

The standard UBVRI broadband photometric system, as well as many other magnitude systems, however, are not defined for a constant $F_\lambda$ or $F_\nu$ spectrum; rather, they are defined relative to the spectrum of an A0V star. Most systems are defined (or at least were originally) to have the magnitude of Vega be zero in all bandpasses.

For the broadband UBVRI system, we have

\begin{displaymath}m_{UBVRI} \approx -2.5 \log
{\int_{UBVRI} F_\lambda(object) d\lambda\over
\int_{UBVRI} F_\lambda(Vega) d\lambda}\end{displaymath}

Here is a plot to demonstrate the difference between the different systems. Given a magnitude of an object in the desired band and absolute spectrophotometry of Vega (Hayes 1985, Calibration of Fundamental Stellar Quantities, IAU Symp. 111), one can estimate the flux. Alternatively, given just the V magnitude of an object, a spectral energy distribution (SED), and the filter profiles (Bessell 1990, PASP 102, 1181), one can compute the integral of the SED over the V bandpass, determine the scaling by comparing with the integral of the Vega spectrum over the same bandpass, and then use the normalized SED to compute the flux in any desired bandpass. Some possibly useful references for SEDs are Bruzual, Persson, Gunn, & Stryker; Hunter, Christian, & Jacoby; and Kurucz.
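As a concrete illustration of this bandpass-scaling procedure, here is a minimal sketch in Python. It assumes you already have the object SED, the Vega spectrum, and the filter transmission sampled on a common wavelength grid; the array and function names are illustrative, not from any particular package.

\begin{verbatim}
import numpy as np

def bandpass_flux(wave, sed, filt):
    # integrate an SED over a filter transmission curve (both sampled on wave)
    return np.trapz(sed * filt, wave)

def scale_sed_to_vmag(v_mag, wave, sed, vega_sed, v_filt):
    # scale a relative SED so that its V-band magnitude (with m_V(Vega) = 0) equals v_mag
    target_ratio  = 10.0 ** (-0.4 * v_mag)
    current_ratio = bandpass_flux(wave, sed, v_filt) / bandpass_flux(wave, vega_sed, v_filt)
    return sed * target_ratio / current_ratio

# once scaled, integrate over any other bandpass, e.g.:
#   flux_R = bandpass_flux(wave, scaled_sed, r_filt)
\end{verbatim}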

The photon flux from the source is not exactly what you measure. To get the number of photons that you count in an observation, you need to take into account the area of your photon collector (telescope), photon losses and gains from the Earth's atmosphere, and the efficiency of your collection/detection apparatus. Generally, the astronomical signal can be written

\begin{displaymath}S = T t \int {F_\lambda\over{hc\over\lambda}} q_\lambda a_\lambda d\lambda\end{displaymath}

where $T$ is the telescope collecting area, $t$ is the integration time, $a_\lambda$ is the atmospheric transmission (next lecture) and $q_\lambda$ is the system efficiency.
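A sketch of the count equation in Python, with the spectrum tabulated in cgs units per Angstrom (the variable names are illustrative and the constants approximate):

\begin{verbatim}
import numpy as np

h = 6.626e-27   # Planck constant, erg s
c = 2.998e10    # speed of light, cm/s

def counts(wave_aa, F_lambda, q, a, T_area, t):
    # S = T t * integral of F_lambda / (h c / lambda) * q_lambda * a_lambda dlambda
    # wave_aa in Angstrom, F_lambda in erg/cm^2/s/A, T_area in cm^2, t in s
    photon_energy = h * c / (wave_aa * 1.0e-8)        # erg per photon
    return T_area * t * np.trapz(F_lambda / photon_energy * q * a, wave_aa)
\end{verbatim}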

Usually, however, one doesn't use this information to go backward from S to $F_\lambda$ because it is very difficult to measure all of the terms precisely. Instead, most observations are performed differentially to a set of standard stars, which themselves have been (usually painfully) calibrated. We'll discuss this process in more depth later.

The count equation is, however, very commonly used to compute the approximate number of photons you will receive from a given source in a given amount of time for a given observational setup. This number is critical for estimating your expected errors and exposure times in observing proposals, observing runs, etc. Understanding errors is absolutely critical in all sciences, and maybe even more so in astronomy, where objects are faint, photons are scarce, and errors are not at all insignificant.

Errors in photon rates

For a given rate of emitted photons, there's a probability function which gives the number of photons we detect, even assuming 100% detection efficiency, because of statistical uncertainties. In addition, there may also be instrumental uncertainties. Consequently, we now turn to the concepts of probability distributions, with particular interest in the distribution which applies to the detection of photons.

Distributions and characteristics thereof

Some definitions relating to values which characterize a distribution:


\begin{displaymath}\mathrm{mean} \equiv \mu = \int x p(x) dx\end{displaymath}


\begin{displaymath}\mathrm{variance} \equiv \sigma^2 = \int (x-\mu)^2 p(x) dx\end{displaymath}


\begin{displaymath}\mathrm{standard\ deviation} \equiv \sigma = \sqrt{\mathrm{variance}}\end{displaymath}

median : mid-point value.

\begin{displaymath}{\int_{-\infty}^{x_{median}} p(x) dx \over \int_{-\infty}^{\infty} p(x) dx} =
{1\over 2}\end{displaymath}

mode : most probable value

Note that the geometric interpretation of the above quantities depends on the nature of the distribution; although we all carry around the picture of the mean and the variance for a Gaussian distribution, these pictures are not applicable to other distributions.

Also, note that there is a difference between the sample mean, variance, etc. and the population quantities. The latter apply to the true distribution, while the former are estimates of the latter from some finite sample of the population. The sample quantities are derived from:


\begin{displaymath}\mathrm{sample\ mean}: \bar x \equiv {\sum x_i \over N}\end{displaymath}


\begin{displaymath}\mathrm{sample\ variance} \equiv {\sum(x_i - \bar x)^2 \over N-1}\end{displaymath}


\begin{displaymath}= {\sum x_i^2 - (\sum x_i)^2/N \over N-1}\end{displaymath}

The sample mean and variance approach the true mean and variance as $N$ approaches infinity.
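For example, a minimal computation of the sample quantities with numpy (the numbers here are purely illustrative):

\begin{verbatim}
import numpy as np

x = np.array([40.0, 50.0, 60.0])                # e.g. three photon-count measurements
mean = x.sum() / x.size                          # sample mean
var  = ((x - mean)**2).sum() / (x.size - 1)      # sample variance (note the N-1)
# equivalently: np.mean(x) and np.var(x, ddof=1)
\end{verbatim}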

Now we consider what distribution is appropriate for the detection of photons. The photon distribution derives from the binomial distribution, which gives the probability of observing $x$ occurrences of some particular outcome out of a total of $n$ events, where $p$ is the probability of that outcome occurring in any single event:


\begin{displaymath}P(x,n,p) = { n! p^x (1-p)^{n-x} \over x! (n-x)!}\end{displaymath}

For the binomial distribution, one can derive:

\begin{displaymath}\mathrm{mean} \equiv \int x p(x) dx = np\end{displaymath}


\begin{displaymath}\mathrm{variance} \equiv \sigma^2 \equiv \int (x-\mu)^2 p(x) dx = n p (1-p)\end{displaymath}

In the case of detecting photons, $n$ is the total number of photons emitted, and $p$ is the probability of detecting a single photon out of the total emitted. We don't know either of these numbers! However, we do know that $p \ll 1$ and we know, or at least we can estimate, the mean number detected:

\begin{displaymath}\mu = np\end{displaymath}


In this limit, the binomial distribution asymptotically approaches the Poisson distribution:


\begin{displaymath}P(x,\mu) = { \mu^x e^{-\mu} \over x!}\end{displaymath}

From the expressions for the binomial distribution in this limit, the mean of the distribution is $\mu$, and the variance is

\begin{displaymath}\mathrm{variance} = \sum_x[ (x-\mu)^2 p(x,\mu)]\end{displaymath}


\begin{displaymath}= np = \mu\end{displaymath}

This is an important result: for a Poisson process, the variance equals the mean (the binomial variance $np(1-p) \rightarrow np = \mu$ when $p \ll 1$).

Note that the Poisson distribution is generally the appropriate distribution not only for counting photons, but for any sort of counting experiment.

What does the Poisson distribution look like? Consider plots for $\mu=2$, $\mu=10$, and $\mu=10000$. Note that for large $\mu$, the Poisson distribution is well-approximated around the peak by a Gaussian, or normal, distribution:


\begin{displaymath}P(x,\mu,\sigma) = {1 \over \sqrt{2\pi} \sigma} e^{ -{(x-\mu)^2\over 2\sigma^2} }\end{displaymath}

This is important because it allows us to use ``simple'' least squares techniques to fit observational data, because these generally assume normally distributed data. However, beware that in the tails of the distribution, the Poisson distribution can differ significantly from a Gaussian distribution.
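A short sketch that generates such comparisons, using scipy to evaluate the Poisson and Gaussian distributions (the plotting details are incidental):

\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson, norm

for mu in (2, 10, 10000):
    lo = max(0, int(mu - 5 * np.sqrt(mu)))
    hi = int(mu + 5 * np.sqrt(mu)) + 1
    x = np.arange(lo, hi)
    plt.figure()
    plt.plot(x, poisson.pmf(x, mu), drawstyle='steps-mid', label='Poisson')
    plt.plot(x, norm.pdf(x, loc=mu, scale=np.sqrt(mu)), label='Gaussian approximation')
    plt.title('mu = %g' % mu)
    plt.legend()
plt.show()
\end{verbatim}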

The normal distribution is also important because many physical variables seem to be distributed accordingly. This may not be an accident, because of the central limit theorem: if a quantity is the sum of many independent random variables with ANY distributions (of finite variance), the quantity itself will be approximately normally distributed (see statistics texts).

Importance of error distribution analysis

You need to understand the expected errors in your observations in order to, for example, estimate the exposure time needed to reach a desired S/N and to judge whether an apparent signal or variation in your data is statistically significant.

Confidence levels

For example, say we want to know whether some single point is consistent with expectations; e.g., we see a bright point in multiple measurements of a star, and want to know whether the star flared. Say we have a time sequence with known mean and variance, and we obtain a new point: we want to know whether it is consistent with the known distribution.

In the limit where the observed distribution is approximately Gaussian, we can calculate the error function, which is the integral of a Gaussian. From this, we can determine the probability of obtaining some point given the known distribution.

Some simple guidelines to keep in mind (the actual situation often requires more sophisticated statistical tests) follow. First, for Gaussian distributions, 68% of the points should fall within plus or minus one sigma of the mean, and 95.4% within plus or minus two sigma of the mean. Thus, if you have a time line of photon fluxes for a star, with N observed points and a photon noise $\sigma$ on each measurement, you can test whether the number of points deviating more than $2\sigma$ from the mean is much larger than expected. To decide whether any single point is really significantly different, you would want to use a $5\sigma$ rather than a $2\sigma$ criterion; the $5\sigma$ criterion has a much higher level of significance, of course. On the other hand, if you have far more points in the range $2$-$4\sigma$ brighter or fainter than you would expect, you may also have a significant detection of intensity variations (provided you really understand your uncertainties on the measurements!).
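For reference, the tail probabilities can be computed directly from the Gaussian integral; here is a sketch using scipy (the numbers in the final comment are illustrative):

\begin{verbatim}
from scipy.stats import norm

# two-sided probability of a deviation of at least n sigma, assuming Gaussian errors
for n in (1, 2, 3, 5):
    p = 2.0 * norm.sf(n)        # sf(n) = 1 - cdf(n), the upper-tail probability
    print("%d sigma: P = %.2e" % (n, p))

# for N independent points, the expected number of outliers beyond n sigma is ~ N * p;
# e.g. for N = 1000 points, roughly 46 points beyond 2 sigma is perfectly normal.
\end{verbatim}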

Noise equation: how do we predict expected errors?

Given an estimate of the number of photons expected from an object in an observation, we can then calculate the signal-to-noise ratio:

\begin{displaymath}{S\over N} = {S \over \sqrt{\sigma^2}}\end{displaymath}

which gives the inverse of the predicted fractional error ($N/S$).

Consider an object with photon flux (per unit area and time), $S^\prime$, leading to a signal, $S= S^\prime T t$ where $T$ is the telescope area and $t$ is the exposure time. In the simplest case, the only noise source is Poisson statistics from the source, in which case:

\begin{displaymath}\sigma^2 = S = S^\prime T t\end{displaymath}


\begin{displaymath}{S\over N} = \sqrt{S^\prime T t}\end{displaymath}

In other words, the S/N increases as the square root of the object brightness, telescope area, efficiency, or exposure time.

A more realistic case includes the noise contributed from Poisson statistics of background light (next lecture), $B^\prime$, which has units of flux per area on the sky (also usually given in magnitudes).

\begin{displaymath}B^\prime = \int {B_\lambda\over{hc\over\lambda}} q_\lambda d\lambda\end{displaymath}

The amount of background in our measurement depends on how we choose to make the measurement (how much sky area we observe). Say we just use an aperture with area, $A$.

The total number of photons observed, $F$, is

\begin{displaymath}F = (S^\prime + B^\prime A) T t\end{displaymath}

The variance of the total flux, from Poisson statistics, is:

\begin{displaymath}\sigma_F^2 = (S^\prime T t + B^\prime A T t)\end{displaymath}

since the variances add for object+background. To get the desired signal, we will need to measure separately the total signal and the background signal to estimate:

\begin{displaymath}S \equiv S^\prime T t = F - B\end{displaymath}

where $B$ is some estimate we have obtained of the background level multiplied by the area A ( $B=B^\prime ATt$). The noise in the measurement is

\begin{displaymath}\sigma_S^2 \approx \sigma_F^2 = (S^\prime + B^\prime A) T t\end{displaymath}

where the approximation is accurate if the background is determined to high accuracy. This leads to a common form of the noise equation:

\begin{displaymath}{S\over N} = {S^\prime \over \sqrt{S^\prime + B^\prime A}} \sqrt{T} \sqrt{t}\end{displaymath}

In the signal-limited case, $S^\prime \gg B^\prime A$, we get

\begin{displaymath}{S\over N} = \sqrt{S^\prime t T}.\end{displaymath}

In the background-limited case, $B^\prime A \gg S^\prime$, and

\begin{displaymath}{S\over N} = {S^\prime\over \sqrt{B^\prime A}} \sqrt{t T}.\end{displaymath}

S/N drops faster with decreasing source brightness when you're background limited (it scales linearly with $S^\prime$ rather than as $\sqrt{S^\prime}$). This illustrates the importance of dark-sky sites, and also the importance of good image quality, which allows a smaller aperture area $A$.

Consider two telescopes of collecting area, $T_1$ and $T_2$. If we observe for the same exposure time on each and want to know how much fainter we can see with the larger telescope at a given S/N, we find:

\begin{displaymath}S_2 = {T_1\over T_2} S_1\end{displaymath}

for the signal-limited case, but

\begin{displaymath}S_2 = \sqrt{T_1\over T_2} S_1\end{displaymath}

for the background-limited case.
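For example (illustrative numbers): quadrupling the collecting area ($T_2 = 4 T_1$, i.e., doubling the aperture diameter) lets you reach sources 4 times fainter (about 1.5 mag) at fixed S/N and exposure time in the signal-limited case, but only 2 times fainter (about 0.75 mag) in the background-limited case.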

Instrumental noise

In addition to the errors from Poisson statistics (statistical noise), there may be additional terms from instrumental errors. A prime example of this, for CCD detectors, is readout noise: additive noise (with zero mean) that comes from the detector and is independent of the signal level. For a detector whose readout noise is characterized by $\sigma_{rn}$,

\begin{displaymath}{S\over N} = {S^\prime T t \over \sqrt{S^\prime T t +B^\prime A T t +
\sigma_{rn}^2}} \end{displaymath}

if a measurement is made in a single pixel. If an object is spread over $N_{pix}$ pixels, then

\begin{displaymath}{S\over N} = {S^\prime T t \over \sqrt{S^\prime T t +B^\prime A T t +
N_{pix}\sigma_{rn}^2}} \end{displaymath}

For large $\sigma_{rn}$, the behavior is the same as the background limited case. This makes it clear that if you have readout noise, image quality (and/or proper optics to keep an object from covering too many pixels) is important for maximizing S/N. It is also clear that it is critical to have minimum read-noise for low background applications (e.g., spectroscopy).
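A sketch of the full noise equation as a Python function; the symbols follow the notation above, and the numbers in the usage comment are purely illustrative:

\begin{verbatim}
import numpy as np

def snr(S_rate, B_rate, A, T, t, rn=0.0, npix=1):
    # S_rate = source photon rate per unit telescope area (S')
    # B_rate = sky photon rate per unit telescope area per unit sky area (B')
    # A = aperture area on the sky, T = telescope area, t = exposure time
    # rn = readout noise per pixel, npix = number of pixels summed
    signal = S_rate * T * t
    noise  = np.sqrt(signal + B_rate * A * T * t + npix * rn**2)
    return signal / noise

# illustrative usage: a faint source on a moderate sky, 5 e- read noise over 20 pixels
# print(snr(S_rate=0.01, B_rate=0.001, A=4.0, T=1.0e4, t=300.0, rn=5.0, npix=20))
\end{verbatim}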

There are other possible additional terms in the noise equation, arising from things like dark current, digitization noise, errors in sky determination, errors from photometric technique, etc.; we'll discuss some of these later on.

Error propagation

Consider what happens if you have several known quantities with known error distributions and you combine these into some new quantity: we wish to know what the error is in the new quantity.

\begin{displaymath}x = f(u,v,....)\end{displaymath}

The question is what is $\sigma_x$ if we know $\sigma_u$, $\sigma_v$, etc.?

As long as errors are small:

\begin{displaymath}x_i - <x> \sim (u_i - <u>) \left({\partial x\over\partial u}\right) + (v_i-<v>) \left({\partial x\over\partial v}\right) + \ldots\end{displaymath}


\begin{displaymath}\sigma_x^2 = \lim_{N\rightarrow\infty} {1\over N} \sum (x_i - <x>)^2\end{displaymath}


\begin{displaymath}= \sigma_u^2 \left({\partial x\over\partial u}\right)^2 + \sigma_v^2 \left({\partial x\over\partial v}\right)^2 + \ldots + 2\sigma_{uv}^2 {\partial x\over\partial u} {\partial x\over\partial v}\end{displaymath}

The last term is the covariance, which relates to whether the errors are correlated.

\begin{displaymath}\sigma_{uv}^2 = \lim_{N\rightarrow\infty} {1\over N} \sum (u_i-<u>) (v_i-<v>)\end{displaymath}

If errors are uncorrelated, then $\sigma_{uv} = 0$ because there's equal chance of getting opposite signs on $v_i$ for any given $u_i$.

Examples:
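One standard example (not worked out in the original notes) is the error in a magnitude derived from a measured flux. From $m = -2.5 \log F + const$,

\begin{displaymath}\sigma_m = {2.5 \over \ln 10} {\sigma_F \over F} \approx 1.086 {\sigma_F \over F}\end{displaymath}

so a 10% flux error corresponds to about 0.1 mag.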

Distribution of resultant errors

When propagating errors, even though you can calculate the variances in the new variables, the distribution of errors in a new variable is not, in general, the same as the distribution of errors in the original variables; e.g., even if the errors in the individual variables are normally distributed, the errors in the output variable are not necessarily normally distributed.

When two normally distributed variables are added, however, the sum is normally distributed.

Determining sample parameters: averaging measurements

We've covered errors in single measurements. Next we turn to averaging measurements: say we have multiple observations and want the best estimate of the mean and variance of the population, e.g., multiple measurements of a star's brightness. Here we'll define the best estimate of the mean as the value which maximizes the likelihood that our estimate equals the true parent-population mean.

For equal errors, this estimate just gives our normal expression for the sample mean:

\begin{displaymath}\mu = {\sum x_i\over N}\end{displaymath}

Using error propagation, the estimate of the error in the sample mean is given by:

\begin{displaymath}\sigma_{\mu}^2 = \sum {\sigma_i^2\over N^2} = {\sigma^2\over N}\end{displaymath}

But what if the errors on each observation aren't equal, for example because the observations were made with several different exposure times? Then we determine the sample mean using a weighted mean:

\begin{displaymath}\mathrm{weighted\ mean} = {\sum {x_i\over\sigma_i^2} \over \sum {1\over\sigma_i^2}}\end{displaymath}

and the estimated error in this value is given by:

\begin{displaymath}\sigma_\mu^2 = \sum {{\sigma_i^2\over \sigma_i^4} \over ({\sum {1\over \sigma_i^2}})^2} = {1 \over \sum{1\over\sigma_i^2}} \end{displaymath}

where the $\sigma_i$ are the errors on the individual measurements (the weights are $1/\sigma_i^2$).

This is fine, but how do we go about choosing the $\sigma_i$? We know we can estimate $\sigma$ using Poisson statistics for a given count rate, but remember that this is a sample variance (based on a single observation!), not the true variance. This can lead to biases.

Consider observations of a star made on three nights, with measurements of 40, 50, and 60 photons. It's clear that the mean observation is 50 photons. However, beware of being trapped by your error estimates. From each observation alone, you would estimate errors of $\sqrt{40}$, $\sqrt{50}$, and $\sqrt{60}$. If you plug these error estimates into a computation of the weighted mean, you'll get a mean rate of 48.65!

Using the individual estimates of the variances biases the result toward lower rates, since lower measured counts imply smaller estimated variances and hence larger weights.
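A quick numerical check of this bias, using the numbers from the example above:

\begin{verbatim}
import numpy as np

counts = np.array([40.0, 50.0, 60.0])

# naive weights from each observation's own Poisson estimate (sigma_i^2 = x_i)
w = 1.0 / counts
print((counts * w).sum() / w.sum())   # 48.65 -- biased low
print(counts.mean())                  # 50.0  -- equal weights, unbiased
\end{verbatim}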

From this example it's pretty obvious that you should just weight all observations equally. However, that certainly isn't always the right thing to do. For example, consider the situation in which you have three exposures of different exposure times. Here you clearly want to give the longer exposures higher weight, but again you don't want to use the individual error estimates or you'll introduce a bias. There is a simple solution here also: just weight the observations by the exposure time. This works fine for Poisson errors (variances proportional to count rate), but not if there are also instrumental errors which don't scale with exposure time; the presence of read noise, for example, can have this effect. With read noise, the longer exposures should be weighted even more heavily than the exposure-time ratios suggest. The only way to properly average measurements in this case is to estimate a sample mean, then use this value, scaled to the appropriate exposure times, as the input for the Poisson errors.

Can you split exposures?

When observing, one must consider the question of how long individual exposures should be. There are several reasons why one might imagine that it is nicer to have a sequence of shorter exposures rather than one single longer exposure (e.g., tracking, monitoring of photometric conditions, cosmic ray rejection), so we need to consider whether doing this results in poorer S/N.

Consider the object with photon flux $S^\prime$, background surface brightness $B^\prime$, and detector with readout noise $\sigma_{rn}$. A single short exposure of time $t$ has a variance:

\begin{displaymath}\sigma_S^2 = S^\prime T t + B^\prime A T t + N_{pix} \sigma_{rn}^2\end{displaymath}

If $N$ exposures are summed, the resulting variance will be

\begin{displaymath}\sigma_N^2 = N \sigma_S^2\end{displaymath}

If a single long exposure of length $Nt$ is taken, we get

\begin{displaymath}\sigma_L^2 = S^\prime T Nt + B^\prime A T Nt + N_{pix} \sigma_{rn}^2.\end{displaymath}

The ratio of the noises, or the inverse ratio of the $S/N$ (since the total signal measured is the same in both cases), is

\begin{displaymath}{\sigma_N\over \sigma_L} = \sqrt{S^\prime T Nt + B^\prime A T Nt + N N_{pix} \sigma_{rn}^2 \over S^\prime T Nt + B^\prime A T Nt + N_{pix} \sigma_{rn}^2 }\end{displaymath}

In the signal- or background-limited regimes, exposures can be added with no loss of S/N. However, if readout noise is significant, then splitting exposures leads to reduced S/N.
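A quick numerical comparison of $N$ short exposures against one long one (the numbers are illustrative; the ratio approaches 1 as the readout-noise term becomes negligible):

\begin{verbatim}
import numpy as np

def noise(S_rate, B_rate, A, T, t, rn, npix):
    # single-exposure noise, symbols as in the notes
    return np.sqrt(S_rate * T * t + B_rate * A * T * t + npix * rn**2)

S_rate, B_rate, A, T, rn, npix = 0.01, 0.001, 4.0, 1.0e4, 5.0, 20
N, t = 10, 60.0

sigma_split  = np.sqrt(N) * noise(S_rate, B_rate, A, T, t, rn, npix)
sigma_single = noise(S_rate, B_rate, A, T, N * t, rn, npix)
print(sigma_split / sigma_single)   # > 1: splitting costs S/N when read noise matters
\end{verbatim}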

Random errors vs systematic errors

So far, we've been discussing random errors. There is an additional, usually more troublesome, type of error known as systematic error. Systematic errors don't occur randomly but rather are correlated with some, possibly unknown, variable relating to your observations.

EXAMPLE : flat fielding

EXAMPLE : WFPC2 CTE

Note also that in some cases, systematic errors can masquerade as random errors in your test observations, but actually be systematic in your science observations.

EXAMPLE: flat fielding, subpixel QE variations.

Note that error analysis from expected random errors may be the only clue you get to discovering systematic errors. To discover systematic errors, plot residuals vs. everything!


Rene Walterbos 2003-04-14