
Properties of light, magnitudes, errors and error analysis


Light

(References: Birney chapter 5, Sutton 2.1-2.5, Rieke 1.1)


\begin{shaded}
\textit{Have a complete understanding of the difference between i...
...velength or per unit frequency and how to
convert between the two.}
\end{shaded}

Magnitudes and photometric systems

In astronomy, however, magnitude units are often used instead of measuring the basic quantities in energy or photon flux. Magnitudes are dimensionless quantities, and are related to flux (the same holds for surface brightness or luminosity) by:

$m = -2.5 \log {F \over F_0}$

or

$m = -2.5 \log F + 2.5 \log F_0$

where the reference flux, $F_0$, depends on the definition of the photometric system; the quantity $2.5 \log F_0$ (the constant term above) may be referred to as the photometric system zeropoint. Inverting, one gets:

$F = F_0\, 10^{-0.4 m}$

Just as fluxes can be represented in magnitude units, flux densities can be specified by monochromatic magnitudes:

$F_\lambda = F_0(\lambda)\, 10^{-0.4\, m(\lambda)}$

although spectra are more often given in flux units than in magnitude units. Note that it is possible that $F_0$ is a function of wavelength!

Note that since magnitudes are logarithmic, the difference between magnitudes corresponds to a ratio of fluxes; ratios of magnitudes are generally unphysical! If one is just doing relative measurements of brightness between objects, this can be done without knowledge of $F_0$ (or, equivalently, the system zeropoint): objects that differ in brightness by $\Delta m$ mag have the same ratio of brightness ($10^{-0.4\,\Delta m}$) regardless of what photometric system they are in. The photometric system definitions and zeropoints are only needed when converting between calibrated magnitudes and fluxes. Note that this means that if one references the brightness of one object relative to that of another, a magnitude system can be set up relative to the brightness of the reference source. However, the utility of a system when doing astrophysics generally requires an understanding of the actual fluxes.
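
For example, two objects that differ by 5 mag differ in flux by a factor of $10^{-0.4\times 5} = 0.01$ in any photometric system. A one-line sketch of this conversion (illustrative only):

\begin{verbatim}
def flux_ratio(delta_m):
    """Flux ratio F1/F2 corresponding to a magnitude difference m1 - m2 = delta_m."""
    return 10.0 ** (-0.4 * delta_m)

# a source 5 mag fainter delivers 1% of the flux, in any photometric system
print(flux_ratio(5.0))   # 0.01
print(flux_ratio(1.0))   # ~0.398
\end{verbatim}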


\begin{shaded}
\textit{Know how magnitudes are defined, and that relative fluxes...
... be represented as magnitudes independent of the magnitude system.}
\end{shaded}

There are three main types of magnitude systems in use in astronomy. We start by describing the two simpler ones: the STMAG and the ABNU systems. In these simple systems, the reference flux is just a constant value in $F_\lambda$ or $F_\nu$. However, these are not always the most widely used systems in astronomy, because no natural source exists with a flat spectrum.

In the STMAG system, $F_{0,\lambda} = 3.63\times10^{-9}\ {\rm ergs/cm^2/s/\AA}$, which is the flux of Vega at 5500 Å; hence a star of Vega's brightness at 5500 Å is defined to have $m = 0$. Alternatively, we can write

$m_{STMAG} = -2.5 \log F_\lambda - 21.1$

(for $F_\lambda$ in cgs units).

In the ABNU system, things are defined for $F_\nu$ instead of $F_\lambda$, and we have

$F_\nu = 3.63\times10^{-20}\ {\rm erg/cm^2/s/Hz}\ \times\ 10^{-0.4\, m_\nu}$

or

$m_{ABNU} = -2.5 \log F_\nu - 48.6$

(for $F_\nu$ in cgs units). Again, the constant comes from the flux of Vega.

Usually, when using magnitudes, people are talking about flux integrated over a spectral bandpass. In this case, $F$ and $F_0$ refer to fluxes integrated over the bandpass. The integrated STMAG and ABNU systems are defined relative to sources of constant $F_\lambda$ and constant $F_\nu$, respectively:

$m_{STMAG} = -2.5 \log {\int F_\lambda\, \lambda\, d\lambda \over \int 3.63\times10^{-9}\, \lambda\, d\lambda}$

(the factor of $\lambda$ comes in for photon-counting detectors).

$m_{ABNU} = -2.5 \log {\int (F_\nu/\nu)\, d\nu \over \int (3.63\times10^{-20}/\nu)\, d\nu}$

(where the units are implicitly cgs with these numerical fluxes for Vega).

Note that these systems differ by more than a constant, because one is defined in units of $F_\lambda$ and the other in units of $F_\nu$, so the difference between the systems is a function of wavelength. They are defined to be the same at 5500 Å. (Question: what's the relation between $m_{STMAG}$ and $m_{ABNU}$?)

Note also that, using magnitudes, the measured magnitude is nearly independent of bandpass width (a broader bandpass does not imply a brighter (smaller) magnitude), which is not the case for fluxes!

The standard UBVRI broadband photometric system, as well as several other magnitude systems, however, are not defined for a constant $F_\lambda$ or $F_\nu$ spectrum; rather, they are defined relative to the spectrum of an A0V star. Most systems are defined (or at least were originally) to have the magnitude of Vega be zero in all bandpasses (VEGAMAGs); if you ever get into this in detail, note that this is not exactly true for the UBVRI system.

For the broadband UBVRI system, we have

$m_{UBVRI} \approx -2.5 \log {\int_{UBVRI} F_\lambda({\rm object})\, \lambda\, d\lambda \over \int_{UBVRI} F_\lambda({\rm Vega})\, \lambda\, d\lambda}$

(as above, the factor of $\lambda$ comes in for photon-counting detectors).

Here is a plot to demonstrate the difference between the different systems.

Why do the different systems exist? While it seems that STMAG and ABNU systems are more straightforward, in practice it is difficult to measure absolute fluxes, and much easier to measure relative fluxes between objects. Hence, historically observations were tied to observations of Vega (or to stars which themselves were tied to Vega), so VEGAMAGs made sense, and the issue of determining physical fluxes boiled down to measuring the physical flux of Vega. Today, in some cases, it may be more accurate to measure the absolute throughput of an instrumental system, and using STMAG or ABNU makes more sense.


\begin{shaded}
\textit{Know that there are several different magnitude systems i...
...important to
know what the magnitude system is, and when it isn't.}
\end{shaded}

Colors

Working in magnitudes, the difference in magnitudes between different bandpasses (called the color index, or simply the color) is related to the ratio of fluxes in those bandpasses. In the UBVRI system, the difference between magnitudes gives the ratio of the fluxes in different bandpasses relative to the ratio of the fluxes of an A0V star in those same bandpasses (for VEGAMAG). Note the typical colors of astronomical objects, which are different in different photometric systems!

Which is closer to the UBVRI system, STMAG or ABNU?

What would typical colors be in an STMAG or ABNU system?


\begin{shaded}
\textit{Understand how colors are represented by a difference
in ...
...rlying spectrum, with
differences for different magnitude systems.}
\end{shaded}

Magnitude-flux conversion

How would one go about converting Vega-based magnitudes to fluxes? Roughly, just look up the flux of Vega at the center of the passband (e.g., from Bessell et al. 1998, and references therein); note, however, that if the spectrum of the object differs from that of Vega, this won't be perfectly accurate. Given UBVRI magnitudes of an object in the desired band, filter profiles (e.g., Bessell 1990, PASP 102, 1181), and absolute spectrophotometry of Vega (e.g., Bohlin & Gilliland 2004, AJ 127, 3508), one can determine the flux.

If one wanted to estimate the flux of some object in an arbitrary bandpass given just the V magnitude of the object (a common situation when trying to predict exposure times, see below), this can be done if an estimate of the spectral energy distribution (SED) can be made (e.g., from the spectral type, or more generally, from the stellar parameters $T_{eff}$, log g, and metallicity). Given the filter profiles, one can compute the integral of the SED over the V bandpass, determine the scaling by comparing with the integral of the Vega spectrum over the same bandpass, then use the normalized SED to compute the flux in any desired bandpass, as sketched below. Some possibly useful references for SEDs are the Pickles atlas, the MILES library, Bruzual, Persson, Gunn, & Stryker; Hunter, Christian, & Jacoby; and Kurucz.
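
A schematic sketch of this procedure follows; it is not a complete implementation, and it assumes you already have the object SED, a Vega spectrum, and the filter transmission curves tabulated on a common wavelength grid (all names are invented for illustration):

\begin{verbatim}
import numpy as np

def vegamag(wave, f_lam, f_lam_vega, T):
    """Synthetic Vega-relative magnitude in a bandpass.

    wave       : wavelength grid (Angstroms)
    f_lam      : object SED, F_lambda, on that grid
    f_lam_vega : Vega spectrum, F_lambda, on the same grid
    T          : filter transmission on the same grid
    The extra factor of wave converts energy flux to photon flux
    (photon-counting detector), as in the definitions above.
    """
    num = np.trapz(f_lam * T * wave, wave)
    den = np.trapz(f_lam_vega * T * wave, wave)
    return -2.5 * np.log10(num / den)

def predict_flux(wave, sed, vega, T_V, T_other, V_obs):
    """Scale a model SED to match an observed V magnitude, then return its
    photon-weighted mean F_lambda in another bandpass."""
    scale = 10.0 ** (-0.4 * (V_obs - vegamag(wave, sed, vega, T_V)))
    sed_scaled = scale * sed
    return np.trapz(sed_scaled * T_other * wave, wave) / np.trapz(T_other * wave, wave)
\end{verbatim}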

Things are certainly simpler in the ABNU or STMAG system, and there has been some movement in this direction: the STScI gives STMAG calibrations for HST instruments, and the SDSS photometric system is close to an ABNU system.

Note, however, that even when the systems are conceptually well defined, determining the absolute calibration of any photometric system is very difficult in reality, and determining absolute fluxes to the 1% level is very challenging.

As a separate note on magnitudes themselves, note that some surveys, in particular the SDSS imaging survey, have adopted a modified type of magnitude, called asinh magnitudes, which behave like normal (also known as Pogson) magnitudes for brighter objects, but have different behavior for very faint objects (near the detection threshold); see Lupton, Gunn, & Szalay 1999, AJ 118, 1406 for details.
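
For reference, here is a sketch comparing the asinh magnitude of Lupton, Gunn, & Szalay to the ordinary Pogson magnitude; the softening parameter $b$ chosen here is an arbitrary illustrative value:

\begin{verbatim}
import numpy as np

def asinh_mag(x, b):
    """Asinh magnitude of Lupton, Gunn & Szalay (1999); x = f/f0, b = softening parameter."""
    return -2.5 / np.log(10.0) * (np.arcsinh(x / (2.0 * b)) + np.log(b))

def pogson_mag(x):
    return -2.5 * np.log10(x)

b = 1e-10   # arbitrary illustrative softening, roughly the noise level in f/f0
for x in (1e-8, 1e-9, 0.0):
    # well above the noise the two agree; at zero flux the asinh magnitude stays finite
    print(x, asinh_mag(x, b), pogson_mag(x) if x > 0 else "undefined")
\end{verbatim}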

Observed fluxes and the count equation

What if you are measuring flux with an actual instrument, i.e. counting photons? The intrinsic photon flux from the source is not trivial to determine from the observed photon flux, i.e., the number of photons that you count. The observed flux depends on the area of your photon collector (telescope), photon losses and gains from the Earth's atmosphere (which changes with conditions), and the efficiency of your collection/detection apparatus (which can change with time). Generally, the astronomical signal (which might be a flux or a surface brightness, depending on whether the object is resolved) can be written

$S = T\, t \int {F_\lambda \over hc/\lambda}\ a_\lambda\ tel_\lambda\ inst_\lambda\ filt_\lambda\ det_\lambda\ d\lambda\ \equiv\ T\, t\, S^\prime$

where $S$ is the observed photon flux (the signal), $T$ is the telescope collecting area, $t$ is the integration time, $a_\lambda$ is the atmospheric transmission (more later), and the other terms refer to the efficiency of various components of the system (telescope, instrument, filter, detector). $S^\prime$ is the observed photon rate per unit collecting area, i.e. with all of the real details of the observing system included. I refer to this as the count equation.
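
A numerical sketch of how the count equation is used: the flat source spectrum and the throughput numbers below are made up, and only the structure of the calculation (photon flux density times throughput terms, integrated over the bandpass, times area and time) is the point:

\begin{verbatim}
import numpy as np

h = 6.626e-27        # erg s
c = 2.998e10         # cm / s

# made-up inputs: a flat F_lambda source over a made-up 4000-6000 A bandpass
wave_A = np.linspace(4000.0, 6000.0, 2001)     # Angstroms
wave_cm = wave_A * 1e-8
F_lam = 3.6e-9 * np.ones_like(wave_A)          # erg / cm^2 / s / A
a_lam = 0.85                                   # atmospheric transmission
eff = 0.8 * 0.7 * 0.8 * 0.9                    # telescope * instrument * filter * detector

# S' : photons per cm^2 of collecting area per second, with all throughput terms applied
S_prime = np.trapz(F_lam / (h * c / wave_cm) * a_lam * eff, wave_A)

T = np.pi * (350.0 / 2.0) ** 2                 # collecting area of a 3.5-m aperture, cm^2
t = 10.0                                       # exposure time, s
S = S_prime * T * t                            # expected number of detected photons
print(S_prime, S)
\end{verbatim}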

Usually, however, one doesn't use this information to go backward from $S$ to $F_\lambda$ because it is very difficult to measure all of the terms precisely, and some of them (e.g. $a_\lambda$, and perhaps some of the system efficiencies) are time-variable; $a_\lambda$ is also spatially variable. Instead, most observations are performed differentially to a set of other stars of known brightness. If the stars of known brightness are observed in the same observation, then the atmospheric term is (approximately) the same for all stars; this is known as differential photometry. From the photon flux of the object with known brightness, one could determine an ``exposure efficiency'' or an ``effective area'' for this exposure. Equivalently, and more commonly, one can calculate an instrumental magnitude:

$m = -2.5 \log (S/t)$

(i.e., normalize by the exposure time to get counts/sec, although this is not strictly necessary) and then determine the zeropoint that needs to be added to give the calibrated magnitude ($M$; make sure you recognize that this is still an apparent magnitude!):

$M = m + z$

Note that in the real world, one has to also consider sensitivity differences (e.g., slightly different filter profiles) between a given experimental setup and the setup used to measure the reference brightnesses, so this is only a first approximation (i.e., the zeropoint may be different for different stars with different spectral properties)! If using instrumental mags that include exposure time normalization (as above), the zeropoint gives the magnitude of a star that will give 1 count/second.
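
A small sketch of this bookkeeping, with an invented standard-star magnitude and invented counts:

\begin{verbatim}
import numpy as np

def instrumental_mag(counts, exptime):
    """Instrumental magnitude m = -2.5 log10(counts per second)."""
    return -2.5 * np.log10(counts / exptime)

# hypothetical: a standard star of known magnitude 12.30 gives 150000 counts in 30 s
m_std = instrumental_mag(150000.0, 30.0)
zeropoint = 12.30 - m_std                  # from M = m + z

# apply the zeropoint to a target star in the same frame (same atmosphere, to first order)
M_target = instrumental_mag(42000.0, 30.0) + zeropoint
print(zeropoint, M_target)
\end{verbatim}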

If there are no stars of known brightness in the same observation, then calibration must be done against stars in other observations. This then requires that the different effects of the Earth's atmosphere in different locations in the sky be accounted for. This is known as all-sky, or absolute, photometry. To do this requires that the sky is ``well-behaved", i.e. one can accurately predict the atmospheric throughput as a function of position. This requires that there be no clouds, i.e. photometric weather. Differential photometry can be done in non-photometric weather, hence it is much simpler! Of course, it is always possible to obtain differential photometry and then go back later and obtain absolute photometry of the reference stars.

Of course, at some point, someone needs to figure out what the fluxes of the calibrating stars really are, and this requires understanding all of the terms in the count equation. It is challenging, and often, absolute calibration of a system is uncertain to a couple of percent!

It is also common to stop with differential photometry, even if there are no stars of known brightness in your field, if you are studying variable objects, i.e. where you are just interested in the change in brightness of an object, not the absolute flux level. In this case, one only has to reference the brightness of the target object relative to some other object (or ensemble of objects) in the field that is non-variable.

While the count equation isn't usually used for calibration, it is very commonly used for computing the approximate number of photons you will receive from a given source in a given amount of time for a given observational setup. This number is critical to know in order to estimate your expected errors and exposure times in observing proposals, observing runs, etc. Understanding errors is absolutely critical in all sciences, and maybe even more so in astronomy, where objects are faint, photons are scarce, and errors are not at all insignificant. The count equation provides the basis for exposure time calculator (ETC) programs, because it gives an expectation of the number of photons that will be received by a given instrument as a function of exposure time. As we will see shortly, this provides the information we need to calculate the uncertainty in the measurement as a function of exposure time.


\begin{shaded}
\textit{Understand the count equation and the terms in it. Unders...
... brightness. Know
what instrumental magnitudes and zeropoints are.}
\end{shaded}

Uncertainties in photon rates

For a given rate of emitted photons, there is a probability distribution for the number of photons that we detect, even assuming 100% detection efficiency, because of statistical uncertainties. In addition, there may also be instrumental uncertainties. Consequently, we now turn to the concepts of probability distributions, with particular interest in the distribution which applies to the detection of photons.

Distributions and characteristics thereof

Some definitions relating to values which characterize a distribution:

mean $\equiv \mu = \int x\, p(x)\, dx$

variance $\equiv \sigma^2 = \int (x - \mu)^2\, p(x)\, dx$

standard deviation $\equiv \sigma = \sqrt{\rm variance}$

median : mid-point value, defined by

${\int_{-\infty}^{x_{median}} p(x)\, dx \over \int_{-\infty}^{\infty} p(x)\, dx} = {1\over 2}$

mode : most probable value

Note that the geometric interpretation of the above quantities depends on the nature of the distribution; although we all carry around the picture of the mean and the variance for a Gaussian distribution, these pictures are not applicable to other distributions, but the quantities are still well-defined.

Also, note that there is a difference between the sample mean, variance, etc. and the population quantities. The latter apply to the true distribution, while the former are estimates of the latter from some finite sample (N measurements) of the population. The sample quantities are derived from:

sample mean : $\bar{x} \equiv {\sum x_i \over N}$

sample variance $\equiv {\sum (x_i - \bar{x})^2 \over N - 1} = {\sum x_i^2 - (\sum x_i)^2/N \over N - 1}$

The sample mean and variance approach the true mean and variance as N approaches infinity. But note, especially for small samples, your estimate of the mean and variance may differ from their true (population) values (more below)!
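
A minimal numerical illustration with an arbitrary made-up sample, mainly to emphasize the $N-1$ in the sample variance:

\begin{verbatim}
import numpy as np

x = np.array([9.8, 10.4, 10.1, 9.6, 10.7])            # made-up measurements

mean = x.sum() / x.size
variance = ((x - mean) ** 2).sum() / (x.size - 1)      # note the N-1

# numpy equivalents; ddof=1 gives the N-1 (sample) normalization
assert np.isclose(mean, x.mean())
assert np.isclose(variance, x.var(ddof=1))
print(mean, variance, np.sqrt(variance))
\end{verbatim}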

Now we consider what distribution is appropriate for the detection of photons. The photon distribution can be derived from the binomial distribution, which gives the probability of observing some number, x, of successes out of a total number of trials, n, when the probability of success in any single trial is p, under the assumption that all trials are independent of each other:

$P(x, n, p) = {n!\, p^x (1-p)^{n-x} \over x!\, (n-x)!}$

For the binomial distribution, one can derive:

mean $\equiv \int x\, p(x)\, dx = np$

variance $\equiv \sigma^2 \equiv \int (x - \mu)^2\, p(x)\, dx = np(1 - p)$


\begin{shaded}
\textit{Understand the concept of probability distribution functi...
...he difference between population
quantities and sample quantities.}
\end{shaded}

The Poisson distribution

In the case of detecting photons, $n$ is the total number of photons emitted, and $p$ is the probability of detecting a photon during our observation out of the total emitted. We don't know either of these numbers! However, we do know that $p \ll 1$, and we know, or at least we can estimate, the mean number detected:

$\mu = np$.

In this limit, the binomial distribution asymptotically approaches the Poisson distribution:

$P(x, \mu) = {\mu^x\, e^{-\mu} \over x!}$

From the expressions for the binomial distribution in this limit, the mean of the distribution is $\mu$, and the variance is

variance $= \sum_x \left[(x - \mu)^2\, P(x, \mu)\right]$

variance $= np = \mu$

This is an important result.

Note that the Poisson distribution is generally the appropriate distribution not only for counting photons, but for any sort of counting experiment where a series of events occurs with a known average rate, and are independent of time since the last event.

What does the Poisson distribution look like? Plots for $\mu = 2$, $\mu = 10$, and $\mu = 10000$.
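
One way to tabulate such distributions yourself is sketched below (assuming scipy is available); the values of $\mu$ mirror those in the plots, and the snippet also confirms that the mean and the variance both equal $\mu$:

\begin{verbatim}
import numpy as np
from scipy import stats

for mu in (2, 10, 10000):
    # tabulate the probability mass function over the bulk of the distribution
    x = np.arange(stats.poisson.ppf(1e-4, mu), stats.poisson.ppf(1.0 - 1e-4, mu) + 1)
    pmf = stats.poisson.pmf(x, mu)
    # the mean and the variance of a Poisson distribution are both mu;
    # pmf.sum() is close to 1 since the range covers nearly all the probability
    mean, var = stats.poisson.stats(mu, moments='mv')
    print(mu, float(mean), float(var), pmf.sum())
\end{verbatim}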


\begin{shaded}
\textit{
Understand the Poisson distribution and when it applies....
...e Poisson distribution
is related to the mean of the distribution.}
\end{shaded}

The normal, or Gaussian, distribution

Note, for large $\mu$, the Poisson distribution is well-approximated around the peak by a Gaussian, or normal, distribution:

$P(x, \mu, \sigma) = {1 \over \sqrt{2\pi}\,\sigma}\ e^{-{(x-\mu)^2 \over 2\sigma^2}}$

This is important because it allows us to use ``simple'' least squares techniques to fit observational data, because these generally assume normally distributed data. However, beware that in the tails of the distribution, and at low mean rates, the Poisson distribution can differ significantly from a Gaussian distribution. In these cases, least squares may not be appropriate for modeling observational data; one might need to consider maximum likelihood techniques instead.

The normal distribution is also important because many physical variables seem to be distributed accordingly. This may not be an accident, because of the central limit theorem: if a quantity is the sum of a large number of independent random variables with ANY distribution(s), the quantity itself will be approximately normally distributed (see statistics texts). In observational techniques, we encounter the normal distribution because one important source of instrumental noise, readout noise, is distributed normally.


\begin{shaded}
\textit{
Know what a Gaussian (normal) distribution is, including...
...nces the
Poisson distribution is similar to a normal distribution.}
\end{shaded}

Importance of error distribution analysis

You need to understand the expected uncertainties in your observations in order to:

Confidence levels

For example, say we want to know whether some single point is consistent with expectations, e.g., we see a bright point in multiple measurements of a star, and want to know whether the star flared. Say we have a time sequence with known mean and variance, and we obtain a new point: is it consistent with the known distribution?

If the form of the probability distribution is known, then you can calculate the probability of getting a measurement more than some observed distance from the mean. In the case where the observed distribution is Gaussian (or approximately so), this is done using the error function (sometimes called erf(x)), which is the integral of a Gaussian from some starting value.

Some simple guidelines to keep in mind follow (the actual situation often requires more sophisticated statistical tests). First, for Gaussian distributions, you can calculate that 68% of the points should fall within plus or minus one sigma of the mean, and 95.4% within plus or minus two sigma of the mean. Thus, if you have a time line of photon fluxes for a star, with N observed points and a photon noise $\sigma$ on each measurement, you can test whether the number of points deviating more than 2$\sigma$ from the mean is much larger than expected. To decide whether any single point is really significantly different, you might want to use a more stringent criterion, e.g., 5$\sigma$ rather than 2$\sigma$; a 5$\sigma$ criterion has a much higher level of significance. On the other hand, if you have far more points in the range 2-4$\sigma$ brighter or fainter than you would expect, you may also have a significant detection of intensity variations (provided you really understand your uncertainties on the measurements!).

Also, note that your observed distribution should be consistent with your uncertainty estimates given the above guidelines. If you have a whole set of points that all fall within 1$\sigma$ of each other, something is wrong with your uncertainty estimates (or perhaps your measurements are correlated with each other)!

For a series of measurements, one can calculate the $\chi^2$ statistic, and determine how probable this value is, given the number of points.

$\chi^2 = \sum_i {({\rm observed}(i) - {\rm model}(i))^2 \over \sigma_i^2}$

For a given value of $\chi^2$ and number of measurements, one can calculate the probability that the measurements are consistent with the model (and that the uncertainties are correctly estimated). A quick estimate of the consistency of the model with the observed data points can be made using the reduced $\chi^2$, defined as $\chi^2$ divided by the number of degrees of freedom (number of data points minus number of fitted parameters), which should be near unity if the measurements are consistent with the model.
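
A sketch of this test with invented measurements (assuming scipy is available); the model here is simply a constant:

\begin{verbatim}
import numpy as np
from scipy import stats

observed = np.array([10.2, 9.7, 10.5, 9.9, 10.4, 10.1])   # made-up measurements
model = np.full_like(observed, 10.0)                       # constant model
sigma = np.full_like(observed, 0.3)                        # estimated uncertainties

chi2 = np.sum(((observed - model) / sigma) ** 2)
dof = observed.size - 1                 # one fitted parameter (the constant)
p = stats.chi2.sf(chi2, dof)            # probability of chi^2 at least this large
print(chi2 / dof, p)
\end{verbatim}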

Noise equation: how do we predict expected uncertainties?

Signal-to-noise

Astronomers often describe uncertainties in terms of the fractional error, i.e. the amplitude of the uncertainty divided by the amplitude of the quantity being measured; often the inverse of this, referred to as the signal-to-noise ratio, is used. Given an estimate of the number of photons expected from an object in an observation, we can calculate the signal-to-noise ratio:

${S\over N} = {S \over \sqrt{\sigma^2}}$

which is the inverse of the predicted fractional error (N/S).

Consider an object with observed photon flux (per unit area and time, e.g. from the count equation above), $S^\prime$, leading to a signal, $S = S^\prime T t$, where $T$ is the telescope area and $t$ is the exposure time. In the simplest case, the only noise source is Poisson statistics from the source, in which case:

$\sigma^2 = S = S^\prime T t$

${S\over N} = \sqrt{S} = \sqrt{S^\prime T t}$

In other words, the S/N increases as the square root of the object brightness, telescope area, efficiency, or exposure time. Note that S is directly observable, so one can calculate the S/N for an observation without knowing the telescope area or exposure time! We've just broken S down so that you can specifically see the dependence on telescope area and/or exposure time.


\begin{shaded}
\textit{
Understand the concept of S/N and fractional error. Know how
S/N depends on the signal for the Poisson-limited case.}
\end{shaded}

Background noise

A more realistic case includes the noise contributed from Poisson statistics of ``background'' light (more on the physical nature of this later), $B^\prime$, which has units of flux per area on the sky (i.e. a surface brightness); note that this is also usually given in magnitudes.

$B^\prime = \int {B_\lambda \over hc/\lambda}\ q_\lambda\ d\lambda$

The amount of background in our measurement depends on how we choose to make the measurement (how much sky area we observe). Say we just use an aperture with area $A$; then the total number of observed background counts is

$A B = A B^\prime T t$

Again, $B^\prime T t$ is the directly observable quantity, but we split it into the quantities on which it depends to understand what factors are important in determining S/N.

The total number of photons observed, $O$, is

$O = S + A B = (S^\prime + A B^\prime)\, T t$

The variance of the total observed counts, from Poisson statistics, is:

$\sigma_O^2 = O = S + A B = (S^\prime + A B^\prime)\, T t$

To get the desired signal from the object only, we will need to measure the total signal and the background separately to estimate:

$S \equiv S^\prime T t = O - A \langle B \rangle$

where $\langle B \rangle$ is some estimate we have obtained of the background surface brightness. The noise in the measurement is

$\sigma_S^2 \approx \sigma_O^2 = S + A B = (S^\prime + A B^\prime)\, T t$

where the approximation is accurate if the background is determined to high accuracy, which one can do if one measures the background over a large area, thus getting a large number of background counts (with correspondingly small fractional error in the measurement).

This leads to a common form of the noise equation:

${S\over N} = {S \over \sqrt{S + A B}}$

Breaking out the dependence on exposure time and telescope area, this is:

${S\over N} = {S^\prime \over \sqrt{S^\prime + A B^\prime}}\ \sqrt{T}\, \sqrt{t}$

In the signal-limited case, $S^\prime \gg B^\prime A$, we get

${S\over N} = \sqrt{S} = \sqrt{S^\prime t T}$

In the background-limited case, $B^\prime A \gg S^\prime$, and

${S\over N} = {S \over \sqrt{A B}} = {S^\prime \over \sqrt{B^\prime A}}\ \sqrt{t T}$

As one goes to fainter objects, the S/N drops, and it drops faster when you're background limited. This illustrates the importance of dark-sky sites, and also the importance of good image quality.

Consider two telescopes of collecting areas $T_1$ and $T_2$. If we observe for the same exposure time on each and want to know how much fainter we can see with the larger telescope at a given S/N, we find:

$S_2 = {T_1 \over T_2}\, S_1$

for the signal-limited case, but

$S_2 = \sqrt{T_1 \over T_2}\, S_1$

for the background-limited case.


\begin{shaded}
\textit{
Understand how Poisson uncertainties in the background c...
...lity because
the amount of background included in the measurement.}
\end{shaded}

Instrumental noise

In addition to the uncertainties from Poisson statistics (statistical noise), there may be additional terms from instrumental uncertainties. A common example of this that is applicable for CCD detectors is readout noise, which is additive noise (with zero mean!) that comes from the detector and is independent of signal level. For a detector whose readout noise is characterized by $\sigma_{rn}$,

${S\over N} = {S \over \sqrt{S + B A_{pix} + \sigma_{rn}^2}}$

if a measurement is made in a single pixel. If an object is spread over $N_{pix}$ pixels, then

${S\over N} = {S \over \sqrt{S + B A + N_{pix}\, \sigma_{rn}^2}}$

For large $\sigma_{rn}$, the behavior is the same as the background-limited case. This makes it clear that if you have readout noise, image quality (and/or proper optics to keep an object from covering too many pixels) is important for maximizing S/N. It is also clear that it is critical to have minimum readout noise for low-background applications (e.g., spectroscopy).

There are other possible additional terms in the noise equation, arising from things like dark current, digitization noise, uncertainties in sky determination, uncertainties from photometric technique, etc. (we'll discuss some of these later on), but in most applications, the three sources discussed so far - signal noise, background noise, and readout noise - are the dominant noise sources.
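
The terms above can be collected into a single S/N estimate, as in the sketch below; the input numbers are invented, and the function simply evaluates the noise equation with source, background, and readout-noise terms:

\begin{verbatim}
import numpy as np

def signal_to_noise(S_rate, B_rate, area_per_pix, npix, read_noise, T, t):
    """S/N from the noise equation with source, background, and readout noise terms.

    S_rate        : source photon rate per unit collecting area (S' in the count equation)
    B_rate        : background photon rate per unit collecting area per unit sky area (B')
    area_per_pix  : sky area per pixel; npix : number of pixels in the measurement
    read_noise    : rms readout noise per pixel
    T, t          : collecting area and exposure time
    """
    S = S_rate * T * t
    B = B_rate * (area_per_pix * npix) * T * t
    return S / np.sqrt(S + B + npix * read_noise ** 2)

# invented example numbers
print(signal_to_noise(S_rate=0.05, B_rate=0.2, area_per_pix=0.04, npix=25,
                      read_noise=5.0, T=1.0e4, t=300.0))
\end{verbatim}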

Note the applications where one is likely to be signal dominated, background dominated, and readout noise dominated.


\begin{shaded}
\textit{
Understand readout noise and how it represented by a nor...
...nces readout noise is an
important contributor to the total noise.}
\end{shaded}

Error propagation

Why are the three uncertainty terms in the noise equation added in quadrature? The measured quantity ($S$) is a sum of $S + B - \langle B \rangle + \langle R \rangle$, where $\langle R \rangle = 0$ since readout noise has zero mean. The uncertainty in a sum is computed by adding the individual uncertainties in quadrature; in the equation above, we have neglected the uncertainty in $\langle B \rangle$. To understand why they add in quadrature, let's consider general error propagation.

Now that we know how to estimate uncertainties of observed count rates, let's say we want to make some calculations (e.g., calibration, unit conversion, averaging, conversion to magnitudes, calculation of colors, etc.) using these observations: we need to be able to estimate the uncertainties in the calculated quantities that depend on our measured quantities.

Consider what happens if you have several known quantities with known error distributions and you combine these into some new quantity: we wish to know what the uncertainty is in the new quantity.

$x = f(u, v, \ldots)$

The question is: what is $\sigma_x$ if we know $\sigma_u$, $\sigma_v$, etc.?

As long as uncertainties are small:

$x_i - \langle x \rangle \sim (u_i - \langle u \rangle)\left({\partial x \over \partial u}\right) + (v_i - \langle v \rangle)\left({\partial x \over \partial v}\right) + \ldots$

$\sigma_x^2 = \lim_{N\to\infty} {1\over N} \sum (x_i - \langle x \rangle)^2$

$= \sigma_u^2 \left({\partial x \over \partial u}\right)^2 + \sigma_v^2 \left({\partial x \over \partial v}\right)^2 + 2\, \sigma_{uv}^2\, {\partial x \over \partial u}\, {\partial x \over \partial v} + \ldots$

The last term is the covariance, which relates to whether uncertainties are correlated.

$\sigma_{uv}^2 = \lim_{N\to\infty} {1\over N} \sum (u_i - \langle u \rangle)(v_i - \langle v \rangle)$

If uncertainties are uncorrelated, then $\sigma_{uv} = 0$, because there is an equal chance of getting opposite signs of $(v_i - \langle v \rangle)$ for any given $u_i$. When working out uncertainties, make sure to consider whether there are correlated errors! If there are, you may be able to reformulate quantities so that they have independent errors: this can be very useful!

Examples for uncorrelated errors:
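
For instance, one common case is converting a measured flux with an uncertainty into a magnitude with an uncertainty; a quick sketch with made-up numbers:

\begin{verbatim}
import numpy as np

def mag_uncertainty(flux, sigma_flux):
    """Propagate a flux uncertainty into a magnitude uncertainty.

    m = -2.5 log10(F)  =>  dm/dF = -2.5 / (F ln 10),
    so sigma_m = (2.5 / ln 10) (sigma_F / F) ~ 1.0857 (sigma_F / F).
    """
    return 2.5 / np.log(10.0) * sigma_flux / flux

# a 10% flux uncertainty corresponds to about 0.11 mag (made-up numbers)
print(mag_uncertainty(1000.0, 100.0))
\end{verbatim}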

Distribution of resultant uncertainties

When propagating errors, even though you can calculate the variances of the new variables, the distribution of uncertainties in the new variables is not, in general, the same as the distribution of uncertainties in the original variables; e.g., if the uncertainties in the individual variables are normally distributed, the uncertainties in the output variable are not necessarily normally distributed.

When two normally distributed variables are added, however, the sum is normally distributed.


\begin{shaded}
\textit{Know the error propagation formula and how to apply it.}
\end{shaded}

Determining sample parameters: averaging measurements

We've covered errors in single measurements. Next we turn to averaging measurements. Say we have multiple observations and want the best estimate of the mean and variance of the population, e.g. multiple measurements of stellar brightness. Here we'll define the best estimate of the mean as the value which maximizes the likelihood of having obtained our set of measurements, given the parent population mean.

For equal uncertainties, this estimate just gives our normal expression for the sample mean:

$\bar{x} = {\sum x_i \over N}$

Using error propagation, the estimate of the error in the sample mean is given by:

$\sigma_{\bar x}^2 = \sum {\sigma_i^2 \over N^2} = {\sigma^2 \over N}$

But what if the errors on each observation aren't equal, say, for example, if we have observations made with several different exposure times? Then the optimal determination of the mean is obtained using a weighted mean:

${\rm weighted\ mean} = {\sum x_i/\sigma_i^2 \over \sum 1/\sigma_i^2}$

and the estimated error in this value is given by:

$\sigma_\mu^2 = \sum {\sigma_i^2/\sigma_i^4 \over \left(\sum 1/\sigma_i^2\right)^2} = {1 \over \sum 1/\sigma_i^2}$

where the $\sigma_i$'s are the individual errors (people often talk about the weight of an observation, i.e. $1/\sigma_i^2$: large weight means small error).

This is a standard result for determining sample means from a set of observations with different weights.

However, there can sometimes be a subtlety in applying this formula, which has to do with the question: how do we go about choosing the weights/errors, $\sigma_i$? We know we can estimate $\sigma$ using Poisson statistics for a given count rate, but remember that this is a sample variance (which may be based on a single observation!), not the true population variance. This can lead to biases.

Consider observations of a star made on three nights, with measurements of 40, 50, and 60 photons. It's clear that the mean observation is 50 photons. However, beware of being trapped by your error estimates. From each observation alone, you would estimate errors of $\sqrt{40}$, $\sqrt{50}$, and $\sqrt{60}$. If you plug these error estimates into a computation of the weighted mean, you'll get a mean rate of 48.65!

Using the individual estimates of the variances, we bias the result toward lower rates, since lower measured values have smaller estimated uncertainties and thus higher weights.
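
The bias is easy to reproduce numerically (using the counts from the example above):

\begin{verbatim}
import numpy as np

counts = np.array([40.0, 50.0, 60.0])

# naive weights from each observation's own Poisson estimate: sigma_i^2 = counts_i
w = 1.0 / counts
biased = np.sum(w * counts) / np.sum(w)   # ~48.65, biased low
unbiased = counts.mean()                  # 50.0, weighting all points equally
print(biased, unbiased)
\end{verbatim}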

Note that it's pretty obvious from this example that you should just weight all observations equally. However, note that this certainly isn't always the right thing to do. For example, consider the situation in which you have three exposures of different exposure times, and you are calculating the photon rate (counts/s). Here you probably want to give the longer exposures higher weight (at least, if they are signal or background limited). In this case, you again don't want to use the individual error estimates or you'll introduce a bias. There is a simple solution here also: just weight the observations by the exposure time. However, while this works fine for Poisson errors (variances proportional to count rate), it isn't strictly correct if there are instrumental errors as well, which don't scale with exposure time. For example, the presence of readout noise can have this effect: if all exposures are readout noise dominated, then one would want to weight them equally, while if readout noise dominates the shorter but not the longer exposures, one would want to weight the longer exposures even more heavily than the exposure time ratios would suggest! The only way to properly average measurements in this case is to estimate a sample mean, then use this value scaled to the appropriate exposure times as the input for the Poisson errors.

Another subtlety: averaging counts and converting to magnitudes is not the same as averaging magnitudes!


\begin{shaded}
\textit{Understand how the distinction between sample variance
an...
...biases when calculating a weighted
mean, and how to overcome this.}
\end{shaded}

Can you split exposures?

Although from S/N considerations one can determine the total number of counts (i.e., the total exposure time) required to do your science, when observing one must also consider whether this time should be collected in a single exposure or in multiple exposures, i.e. how long individual exposures should be. There are several reasons why one might imagine that it is nicer to have a sequence of shorter exposures rather than one single longer exposure (e.g., tracking, monitoring of photometric conditions, cosmic ray rejection, saturation issues), so we need to consider under what circumstances doing this results in poorer S/N.

Consider the object with photon flux $S^\prime$, background surface brightness $B^\prime$, and a detector with readout noise $\sigma_{rn}$. A single short exposure of time $t$ has a variance:

$\sigma_S^2 = S^\prime T t + B^\prime A T t + N_{pix}\, \sigma_{rn}^2$

If N exposures are summed, the resulting variance will be

$\sigma_N^2 = N\, \sigma_S^2$

If a single long exposure of length $Nt$ is taken, we get

$\sigma_L^2 = S^\prime T N t + B^\prime A T N t + N_{pix}\, \sigma_{rn}^2$.

The ratio of the noises, or the inverse ratio of the S/N (since the total signal measured is the same in both cases), is

${\sigma_N \over \sigma_L} = \sqrt{S^\prime T N t + B^\prime A T N t + N\, N_{pix}\, \sigma_{rn}^2 \over S^\prime T N t + B^\prime A T N t + N_{pix}\, \sigma_{rn}^2}$

The only difference is in the readout noise term! In the signal- or background-limited regimes, exposures can be added with no loss of S/N. However, if readout noise is significant, then splitting exposures leads to reduced S/N.
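
A sketch of this comparison with invented numbers; the function simply evaluates the two variances above:

\begin{verbatim}
import numpy as np

def noise_ratio(S_rate, B_rate, A, T, t, N, npix, read_noise):
    """sigma_N / sigma_L : N exposures of length t versus one exposure of length N*t."""
    source_plus_bg = (S_rate + B_rate * A) * T * N * t        # same total counts either way
    var_split = source_plus_bg + N * npix * read_noise ** 2   # readout noise paid N times
    var_single = source_plus_bg + npix * read_noise ** 2      # readout noise paid once
    return np.sqrt(var_split / var_single)

# with negligible readout noise the ratio is 1; with significant readout noise it grows
print(noise_ratio(10.0, 1.0, 4.0, 100.0, 60.0, N=10, npix=25, read_noise=0.0))
print(noise_ratio(10.0, 1.0, 4.0, 100.0, 60.0, N=10, npix=25, read_noise=10.0))
\end{verbatim}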


\begin{shaded}
\textit{Understand under what circumstances you can split exposur...
...asons why you might want to split an exposure into shorter
pieces.}
\end{shaded}

Random errors vs systematic errors

So far, we've been discussing random errors. There is an additional, usually more troublesome, type of error known as systematic error. These don't occur randomly but rather are correlated with some (possibly unknown) variable relating to your observations, and they can have the effect of not just adding spread around the true value that you are trying to measure, but of actually shifting the measured mean away from the true value.

EXAMPLE : flat fielding

EXAMPLE : WFPC2 CTE

Note also that in some cases, systematic errors can masquerade as random errors in your test observations (or be missing altogether if you don't take data in exactly the same way), but actually be systematic in your science observations.

EXAMPLE: flat fielding, subpixel QE variations.

Note that error analysis from expected random errors may be the only clue you get to discovering systematic errors. To discover systematic errors, plot residuals vs. everything!


\begin{shaded}
\textit{Understand the distinction between random and systematic errors.}
\end{shaded}


Jon Holtzman 2017-11-17