Astronomical optics

Because astronomical sources are faint, we need to collect light. We use telescopes/cameras to make images of astronomical sources. Example: a 20th magnitude star gives $\sim 0.01$ photons/s/cm$^2$ at 5000 Å through a 1000 Å wide filter! However, a 4 m telescope collects about 1200 photons/s from the same star.
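
The arithmetic in this example can be checked with a short sketch. It assumes an unobstructed circular aperture and takes the quoted flux of 0.01 photons/s/cm$^2$ at face value; the function name is only illustrative.

```python
import math

def photon_rate(flux_per_cm2, diameter_m):
    """Photons/s collected by a circular aperture (assumed unobstructed)."""
    radius_cm = 100.0 * diameter_m / 2.0
    collecting_area_cm2 = math.pi * radius_cm ** 2
    return flux_per_cm2 * collecting_area_cm2

# 20th magnitude star, 1000 A filter, 4 m aperture
rate_4m = photon_rate(0.01, 4.0)
print(f"4 m telescope: {rate_4m:.0f} photons/s")
```

The result is a bit above 1200 photons/s; a real telescope loses some of this to the secondary obscuration and to imperfect reflectivity and transmission.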

Single surface optics and definitions

We will define an optical system as a system which collects light; usually, the system will also make images. This requires the bending of light rays, which is accomplished using lenses (refraction) and/or mirrors (reflection).

The operation of refractive optical systems is given by Snell's law of refraction:

$n \sin i = n' \sin i'$

where $n$ and $n'$ are the indices of refraction on the two sides of the surface, and $i$ and $i'$ are the angles of incidence and refraction, measured relative to the normal to the surface. For reflection:

$i' = -i$

An optical element takes a source at s and makes an image at s'. The source can be real or virtual. A real image exists at some point in space; a virtual image is formed where light rays apparently emanate from or converge to, but at a location where no light actually appears. For example, in a Cassegrain telescope, the image formed by the primary is virtual, because the secondary intercepts the light and redirects it before light gets to the focus of the primary.

The image will not necessarily be a perfect image: all rays regardless of height at the surface, y , may NOT cross at the same point. This is the subject of aberrations, which we will get into in a while. Obviously, the degree of aberration will depend on how much the different rays differ in y , which depends on the shape of the surface. We define paraxial and marginal rays, as rays near the center of the aperture and those on the edge of the aperture. We define the chief ray as the ray that passes through the center of the aperture. To define nominal (unaberrated) quantities, we consider the paraxial regime. In this regime, all angles are small, aberrations vanish, and a surface can be wholly specified by its radius of curvature R.

The field angle gives the angle formed between the chief ray from an object and the z-axis. Note that paraxial does not necessarily mean a field angle of zero. One can have an object at a field angle and still consider the paraxial approximation.

Note also that for the time being, we are ignoring diffraction. But we'll get back to that too. We are considering geometric optics, which is what you get from diffraction as the wavelength tends to 0. For nonzero wavelength, geometric optics applies on scales $x \gtrsim \lambda$.

We can derive the basic relation between object and image location as a function of a surface where the index of refraction changes (Schroeder, chapter 2).

$\displaystyle \frac{n'}{s'} - \frac{n}{s} = \frac{n'-n}{R}$

The points at $s$ and $s'$ are called conjugate. If either $s$ or $s'$ is at infinity (true of $s$ for astronomical sources), the other distance is defined as the focal length, $f$, of the optical element. For $s = \infty$, $f = s'$.

We can define the quantity on the right side of the equation, which depends only on the surface parameters (not the image or object locations), as the power, $P$, of the surface:

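$\displaystyle P \equiv \frac{n'-n}{R} = \frac{n'}{f'} = -\frac{n}{f}$

where $f'$ is the image distance for an object at infinity ($s = \infty$) and $f$ is the object distance which puts the image at infinity ($s' = \infty$).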

We can make a similar derivation for the case of reflection; in fact, we can treat reflection by considering refraction with n' = - n .

$\displaystyle \frac{n'}{s'} + \frac{n'}{s} = \frac{n'+n'}{R}$

$\displaystyle \frac{1}{s'} + \frac{1}{s} = \frac{2}{R}$

This shows that the focal length for a mirror is given by R/2 .

We define the focal ratio to be the focal length divided by the aperture diameter. The focal ratio is also called the F-number and is denoted by the abbreviation f/. Note that f/10 means a focal ratio of ten; f is not a variable in this! The focal ratio gives the beam ``width'': systems with a small focal ratio have a short focal length compared with the diameter, and hence the incoming beam to the image is wide. Systems with small focal ratios are called ``fast'' systems; systems with large focal ratios are called ``slow'' systems.

The magnification of a system gives the ratio of the image height to the object height:

$\displaystyle \frac{h'}{h} = \frac{s'-R}{s-R} = \frac{n s'}{n' s}$

The magnification is negative for this case, because the image is inverted. The magnification is also negative for reflection, where $n' = -n$. Magnification is an important quantity for multi-element systems.

We define the scale as the motion of the image for a given change in the incident angle of a parallel beam from infinity. From a consideration of the chief rays for objects on-axis and at field angle $\alpha$, we get:

$\displaystyle \tan\alpha \approx \alpha = \frac{x}{f}$

or

$\displaystyle \mathrm{scale} \equiv \frac{\alpha}{x} = \frac{1}{f}$

In other words, the scale, in units of angular motion per physical motion in the focal plane, is given by 1/f . For a fixed aperture diameter, systems with a small focal ratio (smaller focal length) have a larger scale, i.e. more light in a patch of fixed physical size: hence, these are ``faster'' systems.
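
As a rough illustration of the scale, the sketch below converts $1/f$ from radians per mm to arcseconds per mm for two focal ratios on the same aperture. The 4 m aperture and the f/10 and f/3 values are hypothetical, chosen only to show the trend.

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi  # about 206265

def plate_scale(diameter_m, focal_ratio):
    """Scale 1/f in arcsec per mm of focal plane, with f = focal_ratio * diameter."""
    focal_length_mm = 1000.0 * diameter_m * focal_ratio
    return ARCSEC_PER_RADIAN / focal_length_mm

slow = plate_scale(4.0, 10.0)  # hypothetical 4 m telescope at f/10
fast = plate_scale(4.0, 3.0)   # same aperture at f/3
print(f"f/10: {slow:.2f} arcsec/mm   f/3: {fast:.2f} arcsec/mm")
```

The faster system packs more sky, and therefore more light, into each mm of the focal plane.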

Multi-surface systems

To combine surfaces, one just takes the image from the first surface as the source for the second surface, etc., for each surface. We can generally describe the basic parameters of multi-surface systems by equivalent single-surface parameters, e.g. you can define an effective focal length of a multi-surface system as the focal length of some equivalent single-surface system. The effective focal length is the focal length of the first element multiplied by the magnification of each subsequent element. The two systems (single and multi) are equivalent in the paraxial approximation ONLY.

a lens (has two surfaces)

Consider a lens of index $n$ in air (index $\sim 1$). The first surface gives:

$\displaystyle \frac{n}{s_1'} - \frac{1}{s_1} = \frac{n-1}{R_1} = P_1$

The second surface gives:

$\displaystyle \frac{1}{s_2'} - \frac{n}{s_2} = \frac{1-n}{R_2} = P_2$

but we have $s_2 = s_1' - d$, where $d$ is the thickness of the lens (remember we have to use the plane of the second surface to measure distances for the second surface).

After some algebra, we find the effective focal length (from center of lens):

$\displaystyle P = \frac{1}{f'} = P_1 + P_2 - \frac{d}{n} P_1 P_2$

$\displaystyle P = \frac{n-1}{R_1} + \frac{1-n}{R_2} - \frac{d}{n}\,\frac{(n-1)(1-n)}{R_1 R_2}$

From this, we derive the thin lens formula ($d \to 0$):

$\displaystyle P = \frac{1}{f'} = \frac{n-1}{R_1} + \frac{1-n}{R_2} = (n-1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right)$

$\displaystyle \frac{1}{f'} = \frac{1}{f_1} + \frac{1}{f_2}$
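
A minimal sketch of the lens power formula above, comparing the thin lens limit with a lens of finite thickness. The glass index and radii are made-up values (a symmetric biconvex lens, lengths in mm), not taken from any real design.

```python
def lens_power(n, R1, R2, d=0.0):
    """Power P = P1 + P2 - (d/n) P1 P2 of a lens in air; d = 0 is the thin lens."""
    P1 = (n - 1.0) / R1
    P2 = (1.0 - n) / R2
    return P1 + P2 - (d / n) * P1 * P2

# Hypothetical biconvex lens: n = 1.5, R1 = +100 mm, R2 = -100 mm
thin = lens_power(1.5, 100.0, -100.0)
thick = lens_power(1.5, 100.0, -100.0, d=10.0)
print(f"thin lens  f' = {1.0 / thin:.2f} mm")
print(f"10 mm lens f' = {1.0 / thick:.2f} mm")
```

For this example the thickness term lowers the power slightly, so the thick lens has a somewhat longer focal length than the thin lens estimate.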

plane-parallel plate

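Zero power, but shifts the image along the optical axis: $\Delta = d\,[1 - (1/n)]$, where $d$ is the thickness of the plate and $n$ its index of refraction. Application to filters: variation of focus when a filter is inserted or its thickness changes.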

Two-mirror telescopes:

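In astronomy, most telescopes are two-mirror telescopes of Newtonian, Cassegrain, or Gregorian design. The Cassegrain is the most common and is outlined here. First, accept some basic definitions (following Schroeder): $f_1$ and $F_1$ are the focal length and focal ratio of the primary, $f$ and $F$ are those of the full system, $m$ is the magnification of the secondary, $\rho = R_2/R_1$ is the ratio of the radii of curvature of the two mirrors, $k$ is the ratio of the marginal ray heights at the secondary and the primary (so the mirror separation is $(1-k) f_1$), and $f_1\beta$ is the back focal distance, i.e. the distance from the primary vertex to the final focus.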

Using some geometry, we can derive some basic relations between these quantities, in particular:

$\displaystyle \rho = \frac{m k}{m-1}$

and

$\displaystyle 1 + \beta = k(m+1)$

We can derive the effective focal length and focal ratio:

$f = m f_1$

$F = m F_1$

So we find that the basic (paraxial) parameters of the telescope are determined by choosing three of $f$, $f_1$, $m$, $\rho$, $k$, $\beta$, as long as the three are independent (not $m, \rho, k$ or $m, \beta, k$, which are tied together by the relations above). Usually, $f_1$ is limited by technology. One then chooses $m$ to match the desired scale. $k$ is related to the separation of the mirrors, and is a compromise between making the telescope shorter and blocking out more light vs. longer and blocking less light; in either case, we have to keep the focal plane behind the primary!

One final thing to note is how we focus a Cassegrain telescope. Most instruments are placed at a fixed location, $\beta$, behind the primary. Focussing is usually then done by moving the secondary mirror. Clearly, if you move the secondary you change $k$. Since $\rho$ is fixed by the mirror shapes, it's also clear that you change the magnification as you move the secondary; this is expected since you are changing the system focal length, $f = m f_1$. The amount of image motion for a given secondary motion is given by:

$\displaystyle \frac{d\beta}{dk} = \frac{d}{dk}\left[k(m+1) - 1\right]$

where

$\displaystyle m(k) = \frac{\rho}{\rho - k}$

so

$\displaystyle \frac{d\beta}{dk} = \frac{d}{dk}\left[\frac{k(2\rho - k)}{\rho - k} - 1\right]$

$\displaystyle = \frac{2(\rho-k)^2 + 2\rho k - k^2}{(\rho-k)^2}$

$\displaystyle = \frac{2(\rho-k)^2 + \rho^2 - (\rho-k)^2}{(\rho-k)^2}$

$\displaystyle = 1 + \frac{\rho^2}{(\rho-k)^2} = 1 + m^2$

where the motion of the secondary is $f_1\,dk$ and the motion of the image plane is $f_1\,d\beta$.
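
The result $d\beta/dk = 1 + m^2$ can be checked numerically with a central finite difference. The sketch below uses $\rho = 1/3$ and $k = 0.25$ (which give $m = 4$) purely as illustrative values; the step size is an arbitrary choice.

```python
def magnification(k, rho):
    """Secondary magnification m(k) = rho / (rho - k) for fixed mirror shapes."""
    return rho / (rho - k)

def back_focal_beta(k, rho):
    """beta from 1 + beta = k (m + 1)."""
    return k * (magnification(k, rho) + 1.0) - 1.0

rho, k, step = 1.0 / 3.0, 0.25, 1e-6   # illustrative values giving m = 4
numeric = (back_focal_beta(k + step, rho) - back_focal_beta(k - step, rho)) / (2.0 * step)
analytic = 1.0 + magnification(k, rho) ** 2
print(f"finite difference: {numeric:.6f}   1 + m^2: {analytic:.6f}")
```

For $m = 4$ the focus moves 17 times as far as the secondary, which is why small secondary motions are enough to focus the telescope.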

Definitions for multi-surface system: stops and pupils

In a two-mirror telescope, the location of the exit pupil is where the image of the primary is formed by the secondary. This can be calculated using $s = d$ as the object distance (where $d$ is the separation of the mirrors); the reflection equation then gives $s'$, the location of the exit pupil relative to the secondary mirror. If one defines the quantity $\delta$ such that $f_1\delta$ is the distance between the exit pupil and the focal plane, then (algebra not shown):

$\displaystyle \delta = \frac{m^2 k}{m + k - 1} = \frac{m^2 (1+\beta)}{m^2 + \beta}$
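
As a consistency check on the paraxial relations, the sketch below starts from $m$ and $\beta$, derives $k$ and $\rho$, and evaluates both forms of $\delta$. The inputs $m = \pm 4$, $\beta = 0.25$ are illustrative choices (Cassegrain and Gregorian); the function and key names are not from the notes.

```python
def two_mirror_paraxial(m, beta):
    """Normalized paraxial parameters of a two-mirror telescope from m and beta."""
    k = (1.0 + beta) / (m + 1.0)            # from 1 + beta = k (m + 1)
    rho = m * k / (m - 1.0)                 # ratio of mirror radii of curvature
    delta_from_k = m**2 * k / (m + k - 1.0)
    delta_from_beta = m**2 * (1.0 + beta) / (m**2 + beta)
    return {"k": k, "rho": rho,
            "delta_from_k": delta_from_k, "delta_from_beta": delta_from_beta}

cass = two_mirror_paraxial(4.0, 0.25)    # Cassegrain: m > 0
greg = two_mirror_paraxial(-4.0, 0.25)   # Gregorian: m < 0
for name, p in (("Cassegrain", cass), ("Gregorian", greg)):
    print(name, {key: round(val, 4) for key, val in p.items()})
```

Both forms of $\delta$ agree, and the mirror separation $(1-k) f_1$ comes out longer for the Gregorian than for the Cassegrain with the same primary.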

The exit pupil is an important concept. When we discuss aberrations, it is the total wavefront error at the exit pupil which gives the system aberration.

Aberrations

Surface requirements for unaberrated images

Next we consider non-paraxial rays. We first consider what surface is required to make an unaberrated image.

We can derive the surface using Fermat's principle. Fermat's principle states that light travels along the path for which infinitesimally small variations in the path do not change the travel time to first order: d(time)/d(path) = 0, i.e. the travel time is stationary. For a single surface, this reduces to the statement that light travels the path which takes the least time. An alternate way of stating Fermat's principle is that the optical path length (OPL) is unchanged to first order for a small change in path. The OPL is given by:

$\displaystyle \mathrm{OPL} = \int c\,dt = \int \frac{c}{v}\,v\,dt = \int n\,ds$

Fermat's principle has a physical interpretation when one considers the wave nature of light. It is clear that around a stationary point of the optical path length, the maximum amount of light can be accumulated over different paths with a minimum of destructive interference. By the wave theory, light travels over all possible paths, but the light coming over the ``wrong'' paths destructively interferes, and only the light coming over the ``right'' path constructively interferes.

Fermat's principle can be used to derive the basic laws of reflection and refraction (Snell's law).

Now consider a perfect imaging system that takes all rays from an object point and makes them all converge to an image point. Since Fermat's principle says the only paths taken will be those for which the OPL is unchanged to first order for small changes in path, the only way a perfect image will be formed is when all optical path lengths along a surface between an object point and an image point are the same; otherwise the light doesn't get to this point!

Instead of using Fermat's principle, we could solve for the parameters of a perfect surface using analytic geometry, but this would require an inspired guess for the correct functional form of the surface.

We find that the perfect surface depends on the situation: whether the light comes from a source at finite or infinite distance, and whether the mirror is concave or convex. We consider the various cases now, quoting the results without actually doing the geometry.

Consider a concave mirror with one conjugate at infinity. Fermat's principle gives:

$y^2 = 2Rz$

where $R = 2f$ is the radius of curvature at the mirror vertex. This equation is that of a parabola. Note, however, that a parabola makes a perfect image only for on-axis images (field angle = 0).

For a concave mirror with both conjugates finite, we get an ellipse. Again, this is perfect only for field angle = 0.

$\displaystyle \frac{(z-a)^2}{a^2} + \frac{y^2}{b^2} = 1$

$\displaystyle y^2 - \frac{2 z b^2}{a} + \frac{z^2 b^2}{a^2} = 0$

where

$a = (s + s')/2$

$b = \sqrt{s s'}$

$\displaystyle R = \frac{2 s s'}{s + s'} = \frac{b^2}{a}$

For a convex mirror with both conjugates finite, we get a hyperbola:

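$\displaystyle \frac{(z-a)^2}{a^2} - \frac{y^2}{b^2} = 1$

$\displaystyle y^2 + \frac{2 z b^2}{a} - \frac{z^2 b^2}{a^2} = 0$

where

$a = (s + s')/2$

$b^2 = -s s'$

($s$ is negative)

$\displaystyle R = -\frac{b^2}{a}$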

For a convex mirror with one conjugate at infinity, we get a parabola.

Note that in all cases we've considered a one-dimensional curve. We can generalize to 2D surfaces by rotating around the z-axis; for the equations, simply replace $y^2$ with $(x^2 + y^2)$.

As you may recall from analytic geometry, all of these figures are conic sections, and it is possible to describe all of these figures with a single equation:

$\rho^2 - 2Rz + (1 + K) z^2 = 0$

where

$\rho^2 = x^2 + y^2$

and $R$ is the radius of curvature at the mirror vertex. $K$ is called the conic constant ($K = -e^2$, where $e$ is the eccentricity, $e(b, a)$, for an ellipse).

$K > 0$ gives an oblate ellipsoid

$K = 0$ gives a sphere

$-1 < K < 0$ gives a prolate ellipsoid

$K = -1$ gives a paraboloid

$K < -1$ gives a hyperboloid
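
A small sketch of the general conic equation: it solves $\rho^2 - 2Rz + (1+K)z^2 = 0$ for the sag $z(\rho)$ of the branch through the vertex, written in a form that stays finite at $K = -1$, and checks the result by substituting back. The radius and ray height are arbitrary illustrative numbers.

```python
import math

def conic_sag(rho, R, K):
    """Sag z(rho) of a conic: the root of rho^2 - 2 R z + (1+K) z^2 = 0 with z(0) = 0."""
    root = math.sqrt(1.0 - (1.0 + K) * rho**2 / R**2)
    return (rho**2 / R) / (1.0 + root)

def conic_residual(rho, R, K):
    """Left-hand side of the conic equation evaluated at the computed sag."""
    z = conic_sag(rho, R, K)
    return rho**2 - 2.0 * R * z + (1.0 + K) * z**2

R, rho = 2000.0, 100.0                      # illustrative values, any length unit
conic_constants = (0.5, 0.0, -0.5, -1.0, -2.0)
residuals = [conic_residual(rho, R, K) for K in conic_constants]
for K, res in zip(conic_constants, residuals):
    print(f"K = {K:5.1f}   z = {conic_sag(rho, R, K):.6f}   residual = {res:.1e}")
```

All of the surfaces share the same vertex curvature, so their sags differ only at fourth order in $\rho$: the values printed here agree to about one part in a thousand.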

Aberrations: general description and low-order aberrations

Now consider what happens for surfaces that are not perfect, e.g. for the cases considered above at field angle $\neq 0$ (since only a sphere is symmetric for all field angles), or at field angle 0 for a conic surface which doesn't give a perfect image.

You get aberrations; the light from all locations in the aperture does not land at any common point.

One can consider aberrations in either of two ways:

  1. aberrations arise from all rays not landing at a common point,
  2. aberrations arise because wavefront deviates from a spherical wavefront.
These two descriptions are equivalent. For the former, one can talk about the transverse aberrations, which give the distance by which the rays miss the paraxial focus, or the angular aberration, which is the angle by which the rays deviate from the perfect ray which will hit paraxial focus. For the latter, one discusses the wavefront error, i.e., the deviation of the wavefront from a spherical wavefront as a function of location in the exit pupil.

First, consider the axisymmetric case of looking at an object on axis (field angle equal to zero) with an optical element that is a conic section. We can consider where rays from height $\rho$ in the aperture cross the axis, and derive the effective focal length $f_e(\rho)$ for an arbitrary conic section. A ray parallel to the axis that hits the mirror at height $\rho$, where the surface slope angle is $\phi$, is deflected by $2\phi$ and crosses the axis a distance $z_0$ beyond the point of reflection:

$\displaystyle z_0 = \frac{\rho}{\tan(2\phi)} = \frac{\rho\,(1 - \tan^2\phi)}{2\tan\phi}$

$\displaystyle \tan\phi = \frac{dz}{d\rho}$

from conic equation:

$\rho^2 - 2Rz + (1 + K) z^2 = 0$

$\displaystyle z = \frac{R}{1+K}\left[1 - \left(1 - \frac{\rho^2}{R^2}(1+K)\right)^{1/2}\right]$

$\displaystyle z \approx \frac{\rho^2}{2R} + (1+K)\frac{\rho^4}{8R^3} + (1+K)^2\frac{\rho^6}{16R^5} + \ldots$

$\displaystyle \frac{dz}{d\rho} = \frac{\rho}{R - (1+K)z}$

$\displaystyle z_0 = \frac{\rho}{2}\left[\frac{R - (1+K)z}{\rho} - \frac{\rho}{R - (1+K)z}\right]$

$f_e = z + z_0$

$\displaystyle f_e = \frac{R}{2} + \frac{(1-K)z}{2} - \frac{\rho^2}{2\,(R - (1+K)z)}$

$\displaystyle f_e = \frac{R}{2} - (1+K)\frac{\rho^2}{4R} - (1+K)(3+K)\frac{\rho^4}{16R^3} - \ldots$

$\displaystyle \Delta f = f_e - \frac{R}{2}$

Note that $f_e$ is independent of $\rho$ (and hence of $z$) only for $K = -1$, a parabola. Also note that $\Delta f$ is symmetric with respect to $\rho$.
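
The derivation above can be checked by tracing a single on-axis ray off a conic mirror and comparing the exact axis crossing with the series for $f_e$. This is a minimal sketch, assuming a hypothetical mirror with $R = 2000$ (paraxial focal length 1000, any length unit); the function names are illustrative.

```python
import math

def conic_sag(rho, R, K):
    """Sag z(rho) from rho^2 - 2 R z + (1+K) z^2 = 0 (root through the vertex)."""
    root = math.sqrt(1.0 - (1.0 + K) * rho**2 / R**2)
    return (rho**2 / R) / (1.0 + root)

def axial_crossing(rho, R, K):
    """Exact f_e = z + z0 for a ray parallel to the axis at height rho (rho != 0)."""
    z = conic_sag(rho, R, K)
    slope = rho / (R - (1.0 + K) * z)             # tan(phi) = dz/drho
    z0 = rho * (1.0 - slope**2) / (2.0 * slope)   # rho / tan(2 phi)
    return z + z0

def axial_crossing_series(rho, R, K):
    """Series for f_e, truncated after the rho^4 term."""
    return (R / 2.0 - (1.0 + K) * rho**2 / (4.0 * R)
            - (1.0 + K) * (3.0 + K) * rho**4 / (16.0 * R**3))

R = 2000.0
for K in (0.0, -1.0):            # sphere and paraboloid
    for rho in (50.0, 100.0):
        exact = axial_crossing(rho, R, K)
        series = axial_crossing_series(rho, R, K)
        print(f"K = {K:4.1f}  rho = {rho:5.1f}  exact = {exact:.5f}  series = {series:.5f}")
```

For the paraboloid every ray crosses at $R/2$, while for the sphere the marginal rays focus short of the paraxial focus by an amount that grows as $\rho^2$.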

We define spherical aberration as the aberration resulting from $K \neq -1$. It is an aberration which is present on axis as seen here. Spherical aberration is symmetric in the pupil. There is no location in space where all rays focus at a point. In fact, the behavior as a function of focal position is not symmetric. One can define several criteria for where the ``best focus'' might be, leading to the terminology paraxial focus, marginal focus, diffraction focus, and the circle of least confusion.

The asymmetric nature of spherical aberration as a function of focal position distinguishes it from other aberrations and is a useful diagnostic for whether a system has this aberration. This is shown in this figure which shows a sequence of images at different focal positions in the presence of spherical aberration. We define transverse spherical aberration (TSA) as the image size at paraxial focus. This is not the location of the minimum image size.

$\displaystyle \frac{\mathrm{TSA}}{\Delta f} = \frac{\rho}{f_e - z(\rho)}$

$\displaystyle \mathrm{TSA} = -(1+K)\frac{\rho^3}{2R^2} - 3(1+K)(3+K)\frac{\rho^5}{8R^4} + \ldots$

The difference in angle between the ``perfect'' ray from the parabola and the actual ray is called the angular aberration, in this case angular spherical aberration, or ASA.

$\displaystyle \mathrm{ASA} = 2(\phi_p - \phi) \approx \frac{d}{d\rho}(2\Delta z) \approx -(1+K)\frac{\rho^3}{R^3}$

where $2\Delta z$ gives the optical path difference between the two rays.

This is simply related to the transverse aberration:

$\displaystyle \mathrm{TSA} = \frac{R}{2}\,\mathrm{ASA}$

We can also consider aberration as the difference between our wavefront and a spherical wavefront, which in this case is the wavefront given by a parabolic surface.

$\displaystyle \Delta z = z_{\rm parabola} - z(K) = -\frac{\rho^4}{8R^3}(1+K) + \ldots$

This result can be generalized to any sort of aberration: the angular and transverse aberrations can be determined from the optical path difference between a given ray and that of a spherical wavefront. The relations are given by:

$\displaystyle \mbox{angular aberration} = \frac{d}{d\rho}(2\Delta z)$

$\displaystyle \mbox{transverse aberration} = s'\,\frac{d}{d\rho}(2\Delta z)$

If the aberrations are not symmetric in the pupil, then we could define angular and transverse x and y aberrations separately by taking derivatives with respect to $x$ or $y$ instead of $\rho$.

We can describe deviations from a spherical wavefront generally. Since all we care about are optical path differences, we write an expression for the optical path difference between an arbitrary ray and the chief ray, and in doing this, we can also include the possibility of an off-axis image, and get

$\mathrm{OPD} = \mathrm{OPL} - \mathrm{OPL(chief\ ray)}$

$\mathrm{OPD} = A_0 y + A_1 y^2 + A_1' x^2 + A_2 y^3 + A_2' x^2 y + A_3 \rho^4$

where we've kept terms only to fourth order and chosen our coordinate system such that the object lies in the y-z plane. The coefficients, $A$, depend on lots of things, such as ($\theta, K, n, R, s, s'$).

Note that rays along the y-axis are called tangential rays, while rays along the x-axis are called sagittal rays.

Analytically, people generally restrict themselves to talking about third-order aberrations, which are fourth-order (in powers of $x$, $y$, $\rho$, or $\theta$) in the optical path difference, because of the derivative we take to get transverse or angular aberrations. In the third-order limit, one finds that $A_2 = A_2'$, and $A_1 = -A_1'$. Working out the geometry, we find for a mirror that:

$A_0 = 0$

$\displaystyle A_1 = \frac{n\theta^2}{R}$

$\displaystyle A_2 = -\frac{n\theta}{R^2}\left(\frac{m+1}{m-1}\right)$

$\displaystyle A_3 = \frac{n}{4R^3}\left[K + \left(\frac{m+1}{m-1}\right)^2\right]$

From the general expression, we can derive the angular or the transverse aberrations in either the x or y direction. Considering the aberrations in the two separate directions, we find:

$AA_y = 2A_1 y + A_2 (x^2 + 3y^2) + 4A_3 y \rho^2$

$AA_x = 2A_1' x + 2A_2 x y + 4A_3 x \rho^2$

The first term is proportional to $\theta^2 y$ and is called astigmatism. The second term is proportional to $\theta(x^2 + 3y^2)$ and is called coma. The final term, proportional to $y\rho^2$, is spherical aberration, which we've already discussed (note for spherical, $AA_x = AA_y$ and in fact the AA in any direction is equal, hence the aberration is circularly symmetric).

For astigmatism, rays from opposite sides of the pupil focus in different locations relative to the paraxial rays. At the paraxial focus, we end up with a circular image. As you move away from this image location, you move towards the tangential focus in one direction and the sagittal focus in the other direction. At either of these locations, the astigmatic image looks like a line. Astigmatism goes as $\theta^2$, and consequently looks the same for opposite field angles. Astigmatism is characterized in the image plane by the transverse or angular astigmatism (TAS or AAS), which refer to the height of the marginal rays at the paraxial focus. Astigmatism is symmetric around zero field angle.

This figure shows the rays in the presence of astigmatism.

This figure shows the behavior of astigmatism as one passes through paraxial focus.

For coma, rays from opposite sides of the pupil focus at the same location. However, the tangential rays focus at a different location than the sagittal rays, and neither of these focus at the paraxial focus. The net effect is to make an image that vaguely looks like a comet, hence the name coma. Coma goes as $\theta$, so the direction of the comet flips sign for opposite field angles. Coma is characterized by either the tangential or sagittal transverse/angular coma (TTC, TSC, ATC, ASC) which describe the height/angle of either the tangential or sagittal marginal rays at the paraxial focus: TTC = 3 TSC.

This figure shows the rays in the presence of coma.

This figure shows the behavior of coma as one passes through paraxial focus.

In fact, there are two more third-order aberrations which we haven't yet discussed: distortion and field curvature. Neither affects image quality, only location (unless you are forced to use a flat image plane!). Distortion goes as $\theta^3$. The field curvatures can be derived from the aberration coefficients and the mirror parameters.

We can also determine the relevant coefficients for a surface with a displaced stop (Schroeder p 77), or for a surface with a decentered pupil (Schroeder p 89-90); it's just more geometry and algebra. With all these relations, we can determine the optical path differences for an entire system: for a multi-surface system, we just add the OPDs as we go from surface to surface. The final aberrations can be determined from the system OPD.

Aberration compensation and different telescope types

Using the techniques above, we can write expressions for the system aberrations as a function of the surface figures (and field angles). If we give ourselves the freedom to choose surface figures, we can eliminate one (or more) aberrations.

For example, given a conic constant of the primary mirror, we can use the aberration relations to determine K2 such that spherical aberration is zero; this will give us perfect images on-axis. We find that:

$\displaystyle K_2 = -\left(\frac{m+1}{m-1}\right)^2 + \frac{m^3}{k\,(m-1)^3}\,(K_1 + 1)$

satisfies this criterion. If we set the primary to be a parabola (K1 = - 1 ), this gives the conic constant of the secondary we must use to avoid spherical aberration. This type of telescope is called a classical telescope. Using the aberration relations, we can determine the amount of astigmatism and coma for such telescopes, and we find that coma gives significantly larger aberrations than astigmatism.

If we allow ourselves the freedom to choose both K1 and K2 , we can eliminate both spherical aberration and coma. Designs of this sort are called aplanatic. The relevant expression, in terms of the magnification and back focal distance (we could use the relations discussed earlier to present these in terms of other paraxial parameters), is:

$\displaystyle K_1 = -1 - \frac{2(1+\beta)}{m^2 (m-\beta)}$

We can only eliminate two aberrations with two mirrors, so even this telescope will be left with astigmatism.

There are two different classes of two-mirror telescopes: Cassegrain telescopes and Gregorian telescopes. For the classical telescope with a parabolic primary, the Cassegrain secondary is hyperbolic, whereas for a Gregorian it is ellipsoidal (because of the appropriate conic sections derived above for convex and concave mirrors with finite conjugates). For the aplanatic design, the Cassegrain telescope has two hyperbolic mirrors, while the Gregorian telescope has two ellipsoidal mirrors. An aplanatic Cassegrain telescope is called a Ritchey-Chretien telescope.
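
To make the surface types concrete, the sketch below evaluates the two conic constant relations above for classical and aplanatic designs with $m = \pm 4$. The back focal distance $\beta = 0.25$ is an assumed illustrative value, $k$ is obtained from $1 + \beta = k(m+1)$, and the function names are hypothetical.

```python
def k2_zero_spherical(m, k, K1):
    """Secondary conic constant that cancels spherical aberration for a given K1."""
    return -((m + 1.0) / (m - 1.0))**2 + m**3 * (K1 + 1.0) / (k * (m - 1.0)**3)

def k1_aplanatic(m, beta):
    """Primary conic constant that also removes coma (aplanatic design)."""
    return -1.0 - 2.0 * (1.0 + beta) / (m**2 * (m - beta))

def conic_constants(m, beta, aplanatic):
    k = (1.0 + beta) / (m + 1.0)
    K1 = k1_aplanatic(m, beta) if aplanatic else -1.0   # classical: parabolic primary
    return K1, k2_zero_spherical(m, k, K1)

beta = 0.25   # assumed back focal distance in units of f1
designs = {"CC": (4.0, False), "CG": (-4.0, False), "RC": (4.0, True), "AG": (-4.0, True)}
results = {name: conic_constants(m, beta, apl) for name, (m, apl) in designs.items()}
for name, (K1, K2) in results.items():
    print(f"{name}: K1 = {K1:.4f}   K2 = {K2:.4f}")
```

With these inputs the Cassegrain secondaries have $K_2 < -1$ (hyperboloids) and the Gregorian secondaries have $-1 < K_2 < 0$ (ellipsoids); in the aplanatic designs the primary moves slightly away from $K_1 = -1$ in the same direction as its secondary.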

The following table gives some characteristics of ``typical'' telescopes. Aberrations are given at a field angle of 18 arc-min in units of arc-seconds. Coma is given in terms of tangential coma.

Characteristics of Two-Mirror Telescopes
(CC = classical Cassegrain, CG = classical Gregorian, RC = Ritchey-Chretien, AG = aplanatic Gregorian)

Parameter          CC       CG       RC       AG
$m$                4.00    -4.00     4.00    -4.00
$k$                0.25    -0.417    0.25    -0.417
$1 - k$            0.75     1.417    0.75     1.417
$mk$               1.000    1.667    1.000    1.667
ATC                2.03     2.03     0.00     0.00
AAS                0.92     0.92     1.03     0.80
ADI                0.079    0.061    0.075    0.056
$\kappa_m R_1$     7.25    -4.75     7.625   -5.175
$\kappa_p R_1$     4.00    -8.00     4.00    -8.00

The image quality is clearly better for the aplanatic designs than for the classical designs, as expected because coma dominates off-axis in the classical design. In the aplanatic design, the Gregorian is slightly better. However, when considerations other than just optical quality are taken into account, the Cassegrain usually is favored: for the same primary mirror, the Cassegrain is considerably shorter and thus it is less costly to build an enclosure and telescope structure. To keep the physical length the same, the Gregorian would have to have a faster primary mirror, which is more difficult (i.e. costly) to fabricate, and which will result in a greater sensitivity to alignment errors. Both types of telescopes have a curved focal plane.

Sources of aberrations

So far, we have been discussing aberrations which arise from the optical design of a system when we have a limited number of elements. However, it is important to realize that aberrations can arise from other sources as well. These other sources can give additional third-order aberrations, as well as higher order aberrations. Some possible sources include:

Ray tracing

For a fully general calculation of image quality, one does not wish to be limited to third-order aberrations, nor does one often wish to work out all of the relations for the complex set of aberrations which result from all of the sources of aberration mentioned above. Real world situations also have to deal with vignetting in optical systems, in which certain rays may be blocked by something and never reach the image plane (e.g., in a two-mirror telescope, the central rays are blocked by the secondary).

Because of these and other considerations, analysis of optical systems is usually done using ray tracing, in which the parameters of an optical system are entered into a computer, and the computer calculates the expected images on the basis of geometric optics. Many programs exist with many features: one can produce spot diagrams which show the location of rays from across the aperture at an image plane (or any other location), plots of transverse aberrations, plots of optical path differences, etc., etc.

( Demo ray trace program. Start with on-axis object, single mirror. Where is focus? What will image look like with spherical mirror? What do we need to do to make it perfect? How does it depend on aperture size? Now what do off-axis images look like? Spot diagrams, through focus, ray fan, OPD plots, etc. Now introduce second mirror. What determines where focus will be? Magnification? What shape to make a perfect on-axis image? What do off-axis images look like? How do we make them better? Now how is performance? Real 3.5m and 1m prescriptions. Issue: guider. )

Physical (diffraction) optics

Up until now, we have avoided considering the wave nature of light which introduces diffraction from interference of light coming from different parts of the aperture. Because of diffraction, images of a point source will be slightly blurred. From simple geometric arguments, we can estimate the size of the blur introduced from diffraction:

We find that:

$\displaystyle \theta \sim \frac{\lambda}{D}$

Using this, we find that the diffraction blur is smaller than the blur introduced by seeing for $D > 0.2$ meters at 5500 Å, even for the excellent seeing conditions of 0.5 arcsecond images. However, the study of diffraction has become important recently for several reasons: 1) the existence of the Hubble Space Telescope, which is diffraction limited (no seeing), 2) the increasing use of infrared observations, where diffraction is more important than in the optical, and 3) the development of adaptive optics, which attempts to remove some of the distortions caused by seeing. Consequently, it's now worthwhile to understand some details about diffraction.
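
The 0.2 m figure follows from setting $\lambda/D$ equal to the seeing. A minimal sketch, with the simplifying assumption that the seeing is held at 0.5 arcsec at both wavelengths (in practice seeing depends on wavelength); the 2.2 micron case is an added illustrative example.

```python
import math

RADIANS_PER_ARCSEC = math.pi / (180.0 * 3600.0)

def crossover_diameter(wavelength_m, seeing_arcsec):
    """Aperture (m) at which the diffraction blur lambda/D equals the seeing."""
    return wavelength_m / (seeing_arcsec * RADIANS_PER_ARCSEC)

d_optical = crossover_diameter(5500e-10, 0.5)   # 5500 A
d_infrared = crossover_diameter(2.2e-6, 0.5)    # 2.2 microns, illustrative
print(f"optical: {d_optical:.2f} m   infrared: {d_infrared:.2f} m")
```

Any telescope larger than a few tens of cm is seeing limited in the optical under this assumption, while in the infrared the crossover moves out towards 1 m.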

To work out in detail the shape of the images formed from diffraction involves understanding wave propagation. Basically, one integrates over all of the source points in the aperture (or exit pupil for an optical system), determining the contribution of each point at each place in the image plane. The contributions are all summed taking into account phase differences at each image point, which causes reinforcement at some points and cancellation at others. The expression which sums all of the individual source points is called the diffraction integral. When the details are worked out, one finds that the intensity in the image plane is related to the intensity and phase at the exit pupil. In fact the wavefront is described at any plane by the optical transfer function, which gives the intensity and phase of the wave at all locations in that plane. The OTF at the pupil plane and at the image plane are a Fourier transform pair. Consequently, we can determine the light distribution in the image plane by taking the Fourier transform of the pupil plane; the light distribution, or point spread function, is just the modulus-squared of the OTF at the image plane. Symbolically, we have

$\displaystyle \mathrm{PSF} = \left|\,\mathrm{FT}(\mathrm{OTF(pupil)})\,\right|^2$

where FT represents a Fourier transform, and

$\displaystyle \mathrm{OTF(pupil)} = P(x, y)\,\exp\left[i k \phi(x, y)\right]$

$P(x, y)$ is the pupil function, which gives the transmission properties of the pupil, and usually consists of ones and zeros for locations where light is either transmitted or blocked (e.g., for a circular lens, the pupil function is unity within the radius of the lens, and zero outside; for a typical telescope the pupil function includes obscuration by the secondary and secondary support structure). $\phi$ is the phase in the pupil. More relevantly, $\phi$ can be taken to be the optical path difference in the pupil with some fiducial phase, since only OPDs matter, not the absolute phase. Finally the wavenumber $k$ is just $2\pi/\lambda$.

For the simple case of a plane wave with no phase errors, the diffraction integral can be solved analytically. For a circular aperture with a central obscuration of fractional radius $ \epsilon$ , the expression for the PSF is:

$\displaystyle \mathrm{PSF} \propto \left[{2J_1(v)\over v} - \epsilon^{2}{2J_1(\epsilon v)\over \epsilon v}\right]^{2}$

$\displaystyle v = {\pi r \over \lambda F}$

where $ J_1$ is the first-order Bessel function of the first kind, r is the distance in the image plane, $ \lambda$ is the wavelength, and F is the focal ratio (F = f /D ).

This expression gives the so-called Airy pattern which has a central disk surrounded by concentric dark and bright rings. One finds that the radius of the first dark ring is at the physical distance r = 1.22$ \lambda$F , or alternatively, the angular distance $ \alpha$ = 1.22$ \lambda$/D . This gives the size of the Airy disk.
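As a quick numerical check of the 1.22 coefficient, the sketch below evaluates the bracketed term of the PSF expression and bisects for its first zero. It is only an illustration: the function names are made up for this example, and $ J_1$ is computed from Bessel's integral with a midpoint rule rather than from a library routine. The 1.22 value applies to the unobscured case; with a central obscuration the first dark ring moves slightly inward.

```python
import math

def bessel_j1(x, steps=400):
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(t - x sin t) dt (midpoint rule)."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi

def amplitude(v, eps=0.0):
    """Bracketed term of the PSF expression: 2J1(v)/v - eps^2 * 2J1(eps v)/(eps v)."""
    a = 2.0 * bessel_j1(v) / v
    if eps > 0.0:
        a -= eps ** 2 * 2.0 * bessel_j1(eps * v) / (eps * v)
    return a

def first_dark_ring(eps=0.0, lo=1.0, hi=3.9):
    """Bisect for the first zero of the amplitude; returns r / (lambda F) = v / pi."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if amplitude(lo, eps) * amplitude(mid, eps) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi) / math.pi

if __name__ == "__main__":
    for eps in (0.0, 0.2, 0.33):
        print(f"eps = {eps:4.2f}: first dark ring at r = {first_dark_ring(eps):.3f} lambda F")
```

The bracket [1.0, 3.9] in v is an assumption that holds for modest obscurations (it was chosen to contain only the first zero); it would need adjusting for very large $ \epsilon$ .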

For more complex cases, the diffraction integral is solved numerically by doing a Fourier transform. The pupil function is often more complex than a simple circle, because there are often additional items which block light in the pupil, such as the support structures for the secondary mirror.
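A minimal sketch of the numerical approach, assuming NumPy is available: build a pupil function P(x, y) on a grid, multiply by the phase term, and take the modulus-squared of its discrete Fourier transform. The grid size, pupil radius, obscuration, and spider width are arbitrary illustrative values, and the function names are made up for this example.

```python
import numpy as np

def make_pupil(n=256, radius=32, eps=0.3, spider_half_width=1):
    """Binary pupil function: annulus of outer radius `radius` (pixels) and
    fractional obscuration `eps`, optionally crossed by a four-vane spider.
    Pass spider_half_width=None for no spider."""
    y, x = np.indices((n, n)) - n // 2
    r = np.hypot(x, y)
    pupil = ((r <= radius) & (r >= eps * radius)).astype(float)
    if spider_half_width is not None:
        vanes = (np.abs(x) <= spider_half_width) | (np.abs(y) <= spider_half_width)
        pupil[vanes] = 0.0
    return pupil

def psf_from_pupil(pupil, opd=None, wavelength=1.0):
    """PSF = |FT(P exp(i k phi))|^2, normalised to unit total flux.
    `opd` is the optical path difference map, in the same units as `wavelength`."""
    k = 2.0 * np.pi / wavelength
    phase = np.zeros_like(pupil) if opd is None else opd
    field = pupil * np.exp(1j * k * phase)
    image = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(field)))
    psf = np.abs(image) ** 2
    return psf / psf.sum()

if __name__ == "__main__":
    clear = psf_from_pupil(make_pupil(eps=0.0, spider_half_width=None))
    obscured = psf_from_pupil(make_pupil())
    print("peak fraction, clear aperture:", clear.max())
    print("peak fraction, obscured + spiders:", obscured.max())
```

Padding the pupil well inside the grid (here a radius of 32 pixels in a 256 pixel array) sets how finely the PSF is sampled; a non-zero `opd` array would add the phase errors discussed next.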

This figure shows the Airy pattern, both without obscurations, and with a central obscuration and spiders in a setup typical of a telescope.

In addition, there may be phase errors in the exit pupil, because of the existence of any one of the sources of aberration discussed above. For general use, $ \phi$ is often expressed as a series, where the expansion is over a set of orthogonal polynomials for the aperture which is being used. For circular apertures with (or without) a central obscuration (the case most often found in astronomy), the appropriate polynomials are called Zernike polynomials. The lowest order terms are just uniform slopes of phase across the pupil, called tilt, and simply correspond to motion in the image plane. The next terms correspond to the expressions for the OPD which we found above for focus, astigmatism, coma, and spherical aberration, generalized to allow any orientation of the phase errors in the pupil. Higher order terms correspond to higher order aberrations.

This figure shows the form of some of the low order Zernike terms: the first corresponds to focus aberration, the next two to astigmatism, the next two to coma, the next two to trefoil aberration, and the last to spherical aberration.

A wonderful example of the application of all of this was the diagnosis of spherical aberration in the Hubble Space Telescope, which has been corrected in subsequent instruments in the telescope, which introduce spherical aberration of the opposite sign. Performing this correction, however, required an accurate understanding of the amplitude of the aberration. This was derived from analysis of on-orbit images, as shown in this figure. Note that it is possible in some cases to recover the phase errors from analysis of images; this is called phase retrieval. There are several ways of trying to do this, some of which are complex, so we won't go into them, but it's good to know that it is possible. The derived value was later found to correspond almost exactly to the error expected from a mistake which was made in the testing facility for the HST primary mirror, and the agreement of these two values allowed the construction of new corrective optics to proceed.

Adaptive Optics

The goal of adaptive optics is to partially or entirely remove the effects of atmospheric seeing. Note that these days, this is to be distinguished from active optics, which works at lower frequency, and whose main goal is to remove aberrations coming from the change in telescope configuration as the telescope moves (e.g., small changes in alignment from flexure or sag of the primary mirror surface as the telescope moves). Active optics generally works at frequencies less than (usually significantly less than) 1 Hz, whereas adaptive optics must work at 10 to 1000 Hz. At low frequencies, the active optics can be done with actuators on the primary and secondary mirrors themselves. At the high frequencies required for adaptive optics, however, these large mirrors cannot respond fast enough, so one is required to form a pupil on a smaller mirror which can be rapidly adjusted; hence adaptive optics systems are really separate astronomical instruments.

Many adaptive optics systems are functioning and/or under development: see ESO/VLT adaptive optics, CFHT adaptive optics, Keck adaptive optics, Gemini adaptive optics, http://www.cfht.hawaii.edu/Instruments/Imaging/AOB/other-aosystems.html

The basic idea of an adaptive optics system is to rapidly sense the wavefront errors and then to correct for them on timescales faster than those at which the atmosphere changes. Consequently, there are really three parts to an adaptive optics system:

  1. a component which senses wavefront errors,
  2. a control system which figures out how to correct these errors, and
  3. an optical element which receives the signals from the control system and implements wavefront corrections.

There are several methods used for wavefront sensing. Two in fairly common use among today's adaptive optics systems are Shack-Hartmann sensors and wavefront curvature sensing devices. In a Shack-Hartmann sensor, an array of lenslets is put in a pupil plane and each lenslet images a small part of the pupil. Measuring image shifts between each of the images gives a measure of the local wavefront tilts. Wavefront curvature devices look at the intensity distribution in out-of-focus images. Other wavefront sensing techniques include pyramid wavefront sensors and phase diversity techniques. Usually, a star is used as the source, but this is not required for some wavefront sensors (i.e., an extended source can be used).

To correct wavefront errors, some sort of deformable mirror is used. These can be generically split into two categories: segmented and continuous faceplate mirrors, where the latter are more common. A deformable mirror is characterized by the number of adjustable elements: the more elements, the more correction can be done. LCD arrays have also been used for wavefront correction.

In general, it is very difficult to achieve complete correction even for ideal performance, and one needs to consider the effectiveness of different adaptive optics systems. This effectiveness depends on the size of the aperture, the wavelength, the number of resolution elements on the deformable mirror, and the quality of the site. Clearly, more resolution elements are needed for larger apertures. Equivalently, the effectiveness of a system will decrease as the aperture is increased for a fixed number of resolution elements. One can consider the return as a function of Zernike order corrected and aperture size. For large telescopes, you'll only get partial correction unless a very large number of resolution elements on the deformable mirror are available. The following table gives the mean square amplitude, $ \Delta_{j}$ , for Kolmogorov turbulence after removal of the first j terms; the rms phase variation is just $ \sqrt{\Delta_j}$/2$ \pi$ . For small apertures, you can make significant gains with removal of just low order terms, but for large apertures you need very high order terms. Note various criteria for quality of imaging, e.g. $ \lambda$/4 , etc.

Zj    n  m  Expression                          Description   $ \Delta_{j}$   $ \Delta_{j-1}$ - $ \Delta_{j}$
Z1    0  0  1                                   constant      1.030 S
Z2    1  1  2r cos$ \phi$                       tilt          0.582 S    0.448 S
Z3    1  1  2r sin$ \phi$                       tilt          0.134 S    0.448 S
Z4    2  0  $ \sqrt{3}(2r^2-1)$                 defocus       0.111 S    0.023 S
Z5    2  2  $ \sqrt{6}r^2\sin 2\phi$            astigmatism   0.0880 S   0.023 S
Z6    2  2  $ \sqrt{6}r^2\cos 2\phi$            astigmatism   0.0648 S   0.023 S
Z7    3  1  $ \sqrt{8}(3r^3-2r)\sin\phi$        coma          0.0587 S   0.0062 S
Z8    3  1  $ \sqrt{8}(3r^3-2r)\cos\phi$        coma          0.0525 S   0.0062 S
Z9    3  3  $ \sqrt{8}r^3\sin 3\phi$            trefoil       0.0463 S   0.0062 S
Z10   3  3  $ \sqrt{8}r^3\cos 3\phi$            trefoil       0.0401 S   0.0062 S
Z11   4  0  $ \sqrt{5}(6r^4-6r^2+1)$            spherical     0.0377 S   0.0024 S
r = distance from center of circle; $ \phi$ = azimuth angle; S = $ (D/r_0)^{5/3}$ .
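To get a feel for these numbers, the following sketch turns the tabulated $ \Delta_{j}$ values into an rms wavefront error in waves, $ \sqrt{\Delta_j}$/2$ \pi$ , for a given D/r0. The function names, the example D/r0 values, and the quality threshold are illustrative choices, not part of the original table.

```python
import math

# Residual mean-square phase error per unit S = (D/r0)^(5/3) after removing
# the first j Zernike terms, copied from the table above.
DELTA = {1: 1.030, 2: 0.582, 3: 0.134, 4: 0.111, 5: 0.0880, 6: 0.0648,
         7: 0.0587, 8: 0.0525, 9: 0.0463, 10: 0.0401, 11: 0.0377}

def residual_rms_waves(j, d_over_r0):
    """rms wavefront error in waves after correcting the first j terms."""
    s = d_over_r0 ** (5.0 / 3.0)
    return math.sqrt(DELTA[j] * s) / (2.0 * math.pi)

def lowest_order_for(d_over_r0, max_rms_waves):
    """Smallest tabulated j whose residual meets the threshold, or None."""
    for j in sorted(DELTA):
        if residual_rms_waves(j, d_over_r0) <= max_rms_waves:
            return j
    return None

if __name__ == "__main__":
    for ratio in (2.0, 10.0, 50.0):  # example D/r0 values
        j = lowest_order_for(ratio, max_rms_waves=0.1)  # example threshold
        print(f"D/r0 = {ratio:5.1f}: rms after tilt removal = "
              f"{residual_rms_waves(3, ratio):.2f} waves; first j meeting 0.1 waves: {j}")
```

A result of None for the larger D/r0 values simply reflects the point made above: for large apertures, correction well beyond the first eleven terms is needed.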

Another important limitation is that one needs an object on which you can derive the wavefront. Measurements of wavefront are subject to noise just like any other photon detection so bright sources may be required. This is even more evident when one considers that you need a source which is within the same isoplanatic patch as your desired object, and when you recall that the wavefront changes on time scales of milliseconds. These requirements place limitations on the amount of sky over which it is possible to get good correction. It also places limitations on the sorts of detectors which are needed in the wavefront sensors (fast readout and low or zero readout noise!).

band  $ \lambda$ ($ \mu$m)  r0 (cm)  $ \tau_{0}$ (s)  $ \tau_{det}$ (s)  Vlim  $ \theta_{0}$ (arcsec)  Coverage (%)
U     0.365    9.0     .009    .0027    7.4    1.2    1.8E-5
B     0.44     11.4    .011    .0034    8.2    1.5    6.1E-5
V     0.55     14.9    .015    .0045    9.0    1.9    2.6E-4
R     0.70     20.0    .020    .0060    10.0   2.6    0.0013
I     0.90     27.0    .027    .0081    11.0   3.5    0.006
J     1.25     40      .040    .0120    12.2   5.1    0.046
H     1.62     55      .055    .0164    13.3   7.0    0.22
K     2.2      79      .079    .024     14.4   10.1   1.32
L     3.4      133     .133    .040     16.2   17.0   14.5
M     5.0      210     .21     .063     17.7   27.0   71
N     10       500     .50     .150     20.4   64     100
Conditions are: 0.75 arcsec seeing at 0.5 $ \mu$m; $ \tau_{det}$ $ \sim$ 0.3 $ \tau_{0}$ = 0.3 r0/Vwind ; Vwind = 10 m/sec; H = 5000 m; photon detection efficiency (includes transmission and QE) = 20%; spectral bandwidth = 300 nm; SNR = 100 per Hartmann-Shack image; detector noise = 5e- .
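The r0, $ \tau_{0}$ , and $ \theta_{0}$ columns can be reproduced to within rounding by a few scaling relations. The sketch below assumes the standard Kolmogorov scaling r0 $ \propto$ $ \lambda^{6/5}$ , $ \tau_{0}$ = r0/Vwind as in the table conditions, and $ \theta_{0}$ $ \approx$ 0.314 r0/H with H taken as 5000 m. These relations and the function names are assumptions made for this illustration; they are not stated in the table itself, only consistent with it.

```python
import math

ARCSEC_PER_RAD = 206264.8

def r0_cm(wavelength_um, r0_ref_cm=14.9, ref_um=0.55):
    """Fried parameter scaled from the V-band table entry, r0 ~ lambda^(6/5)."""
    return r0_ref_cm * (wavelength_um / ref_um) ** 1.2

def tau0_s(r0, v_wind=10.0):
    """Coherence time r0 / Vwind, with r0 in cm and Vwind in m/s."""
    return (r0 / 100.0) / v_wind

def theta0_arcsec(r0, height_m=5000.0):
    """Isoplanatic angle, assuming theta0 ~ 0.314 r0 / H (r0 in cm, H in m)."""
    return 0.314 * (r0 / 100.0) / height_m * ARCSEC_PER_RAD

if __name__ == "__main__":
    for band, lam in (("V", 0.55), ("J", 1.25), ("K", 2.2)):
        r0 = r0_cm(lam)
        print(f"{band}: r0 = {r0:6.1f} cm, tau0 = {tau0_s(r0):.3f} s, "
              f"theta0 = {theta0_arcsec(r0):5.1f} arcsec")
```

The steep wavelength dependence of all three quantities is the main reason the sky coverage column rises so quickly toward the infrared.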

The isoplanatic patch limitation is severe. In many cases, we might expect non-optimal performance if the reference object is not as close as it should be ideally.

In most cases, both because of lack of higher order correction and because of reference star vs. target wavefront differences, adaptive optics works in the partially correcting regime. This typically gives PSFs with a sharp core, but still with extended wings.

The problem of sky coverage can be avoided if one uses so-called laser guide stars. The idea is to create a star by shining a laser up into the atmosphere. To date, two generic classes of lasers have been used, Rayleigh and sodium beacons. The Rayleigh beacons work by scattering off a layer roughly 30 km above the Earth's surface; the sodium beacons work by scattering off a layer roughly 90 km above the Earth's surface. Laser guide stars still have some limitations. For one, the path through the atmosphere which the laser traverses does not exactly correspond to the path that light from a star traverses, because the latter comes from an essentially infinite distance; this leads to the effect called focal anisoplanatism. In addition, laser guide stars cannot generally be used to track image motion since the laser passes up and down through the same atmosphere and image motion is cancelled out. To correct for image motion, separate tip-tilt tracking is required. Note that even with perfect correction, one is still limited by the isoplanatic patch size. As one moves further and further away from the reference object, the correction will gradually degrade.

In principle, correction over a wider field of view is possible with multiple deformable mirrors and multiple reference objects, giving rise to the concept of multi-conjugate adaptive optics (MCAO) systems.

Systems with single laser guide stars have certainly been tested and appear to work; but remember, only over an isoplanatic patch, and often with partially corrected images. Several implementations of systems with multiple guide stars actually exist (at VLT and Keck?) to allow sampling of a larger cylinder/cone through the atmosphere; some of these are designed to correct at particular layers to maximize FOV, e.g. ground layer adaptive optics (GLAO). The bulk of adaptive optics work has been done in the near-IR.

Extreme (high-contrast) AO.

A variant on adaptive optics: lucky imaging.

Science with adaptive optics. Typical AO PSFs. Morphology vs. photometry.

Telescope design considerations

Large mirror types

One real-world issue for large telescopes is the technology of how to build a large mirror which will not be so heavy that it will sag under its own weight. Additionally, since it has been recognized that good image quality requires that the mirrors be at the same temperature as the outside air, the mirror technology must be such that the mirror has a short thermal time constant, or, in other words, it must be able to change temperature to match the outside air fairly quickly. If necessary, one can consider thermally controlling the mirror, e.g., with heating or air conditioning.

In the large mirror regime, there are currently three leading technologies. The first is the construction of a single large mirror (monolithic) made from borosilicate glass, but having large hollowed out regions to keep the weight down. This borosilicate honeycomb design has been pioneered by Roger Angel at the Mirror Lab of the University of Arizona. This type of mirror has been successfully cast in a 3.5m size (used in the ARC 3.5m (APO), WIYN 3.5m (KPNO), and the Starfire Optical Range Telescope near Albuquerque), and in a 6.5m format for the MMT conversion (Mt. Hopkins, AZ) and the Magellan (Las Campanas Observatory, Chile) telescopes; they have also been made in an 8m format (x2) for the Large Binocular Telescope (Mt. Graham, AZ). The second design is also monolithic but has a mirror which is significantly thinner than the borosilicate mirror. These thin mirrors are being built primarily by two companies, Corning (USA) and Schott (Germany). They use materials with good thermal properties, ULE (Corning) and Zerodur (Schott). Thin mirrors are being used in ESO's 3.5m New Technology Telescope (La Silla, Chile), Japan's 8m Subaru telescope (Mauna Kea, Hawaii), the two 8m Gemini telescopes (Mauna Kea and Cerro Pachon, Chile), and ESO's Very Large Telescopes (4 8m's on Cerro Paranal). Finally, the third design makes use of segmented mirrors, in which a large mirror is made by combining many small mirrors. This design is currently operational in the 10m Keck telescope (Mauna Kea), the 11m Hobby-Eberly Telescope, the 11m SALT telescope, and the 10m Gran Telescopio Canarias. Future 30m class telescopes: TMT, GMT, and E-ELT.

See http://astro.nineplanets.org/bigeyes.html for a nice tabular summary.

The borosilicate mirrors have the advantage that they are stiffer than the other designs, so the mirror support is less complicated. For thin mirrors, the support system must be activated to allow for changing shape as a function of telescope pointing. For segmented mirrors, each segment must be controlled to make sure the entire surface is smooth. The thick mirror is also less susceptible to wind shake, which can adversely affect image quality. The thin and segmented mirrors have the advantage of better thermal properties since they contain less total material.

The choice of a primary mirror technology can be complicated. In designing a large telescope, one generally first decides on an optical prescription which is chosen considering the main scientific goals for the project (e.g., large field, IR, good image quality, etc.). The primary mirror choice is made considering the choice of site (e.g., are there large temperature changes, lots of wind, etc.), availability, issues of engineering complexity, and, especially, cost (and politics). The choice of a mount and control system to use is basically a cost and operations issue.

Telescope mounts

We've talked about the optics that go into telescopes. However, it's clear that these optics need to be supported in some structure and kept in alignment with each other. The support structures needed are really an engineering issue (and a challenging one for large telescopes), and we won't discuss it here. In addition to supporting the optics, the structure also needs to be capable of tracking astronomical objects as they move across the sky because of the rotation of the earth.

There are two main sorts of telescope mounts found in observatories: the equatorial mount and the altitude-azimuth (alt-az) mount. The equatorial mount is by far the most common for older telescopes, but the alt-az design is being used more frequently for newer, especially larger, telescopes. In the equatorial design, the telescope moves along axes which are parallel and perpendicular to the polar axis, which is the direction parallel to the earth's rotation axis. In such a mount, tracking the earth's rotation only requires motion along one axis, the one perpendicular to the polar axis, and the tracking motion is at a uniform rate. In the alt-az mount, the telescope moves along axes which are perpendicular and parallel to the local vertical axis. With this mount, however, tracking of celestial objects requires motions of variable speed along both axes. An additional complication of an alt-az mount is the fact that, for a detector which is fixed to the back of the telescope, the image field rotates as the telescope tracks an object. Note, however, that the telescope pupil does not rotate with the object.
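To see how variable the alt-az motions are, the sketch below computes altitude, azimuth, and parallactic angle (which measures the field rotation) from hour angle, declination, and latitude using the standard spherical-trigonometry relations, and then differentiates numerically with respect to hour angle. Azimuth is taken from north through east; the example latitude and declinations are arbitrary, and the function names are made up for this illustration.

```python
import math

def altaz_parallactic(ha_deg, dec_deg, lat_deg):
    """Return (altitude, azimuth, parallactic angle) in degrees.
    Azimuth is measured from north through east."""
    h, d, p = (math.radians(v) for v in (ha_deg, dec_deg, lat_deg))
    alt = math.asin(math.sin(p) * math.sin(d)
                    + math.cos(p) * math.cos(d) * math.cos(h))
    az = math.atan2(-math.cos(d) * math.sin(h),
                    math.sin(d) * math.cos(p) - math.cos(d) * math.cos(h) * math.sin(p))
    q = math.atan2(math.sin(h),
                   math.tan(p) * math.cos(d) - math.sin(d) * math.cos(h))
    return math.degrees(alt), math.degrees(az) % 360.0, math.degrees(q)

def rates(ha_deg, dec_deg, lat_deg, step_deg=0.01):
    """Central-difference d(alt)/dH, d(az)/dH, d(q)/dH in degrees per degree
    of hour angle, i.e. in units of the (uniform) equatorial tracking rate."""
    before = altaz_parallactic(ha_deg - step_deg, dec_deg, lat_deg)
    after = altaz_parallactic(ha_deg + step_deg, dec_deg, lat_deg)

    def wrap(x):  # keep angle differences in [-180, 180)
        return (x + 180.0) % 360.0 - 180.0

    return tuple(wrap(b - a) / (2.0 * step_deg) for a, b in zip(before, after))

if __name__ == "__main__":
    lat = 32.8  # example latitude in degrees
    for dec in (0.0, 30.0):
        for ha in (-30.0, 0.0):
            d_alt, d_az, d_q = rates(ha, dec, lat)
            print(f"dec = {dec:5.1f}, HA = {ha:6.1f}: "
                  f"alt rate = {d_alt:6.2f}, az rate = {d_az:6.2f}, rotation rate = {d_q:6.2f}")
```

Objects that transit close to the zenith require azimuth and field-rotation rates many times the sidereal rate, which is why alt-az telescopes have a small region around the zenith in which they cannot track.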

An equatorial mount is much easier to control for pointing and tracking. However, from an engineering point of view, it is much more demanding to construct, especially for large telescopes which have significant weight. The engineering complications generally result in a significantly larger cost (for large telescopes) than for an alt-az design. An alt-az telescope, however, has a significantly more complex control system, and must have an image rotator for the instruments.

Regardless of mount type, the mount is never built absolutely perfectly, i.e. with axes exactly perpendicular, exactly aligned as they should be, totally round surfaces, optics aligned with mechanics, etc. As a result, a telescope does not generally point perfectly. However, many effects of an imperfect telescope are quite repeatable, so they can be corrected for. This correction is done by something called a pointing model, which records the difference between the true position and the predicted position over the sky, and, once derived, the pointing model can be implemented to significantly improve pointing. A good telescope points to within a few arcseconds after implementation of a good pointing model.
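A pointing model is, at heart, a least-squares fit of measured pointing offsets to a set of smooth functions of sky position. The sketch below fits the altitude offset with four purely illustrative terms (a constant index error, two azimuth-dependent tilt terms, and a flexure term proportional to cos(alt)); real pointing models use more terms with conventions specific to each telescope, and none of the names or coefficients here come from any actual package. It assumes NumPy and uses synthetic data.

```python
import numpy as np

def design_matrix(az_deg, alt_deg):
    """Columns: constant, sin(az), cos(az), cos(alt). Illustrative basis only."""
    az = np.radians(np.asarray(az_deg, dtype=float))
    alt = np.radians(np.asarray(alt_deg, dtype=float))
    return np.column_stack([np.ones_like(az), np.sin(az), np.cos(az), np.cos(alt)])

def fit_pointing_model(az_deg, alt_deg, measured_offsets):
    """Least-squares coefficients (same units as the offsets, e.g. arcsec)."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(az_deg, alt_deg),
                                 np.asarray(measured_offsets, dtype=float), rcond=None)
    return coeffs

def predict_offset(coeffs, az_deg, alt_deg):
    """Offset the model predicts at the given positions."""
    return design_matrix(az_deg, alt_deg) @ coeffs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    az = rng.uniform(0.0, 360.0, 200)     # synthetic star positions
    alt = rng.uniform(20.0, 85.0, 200)
    true_coeffs = np.array([12.0, -5.0, 8.0, 30.0])  # made-up values, arcsec
    offsets = predict_offset(true_coeffs, az, alt) + rng.normal(0.0, 1.0, az.size)
    fit = fit_pointing_model(az, alt, offsets)
    residual = offsets - predict_offset(fit, az, alt)
    print("fitted coefficients:", np.round(fit, 2))
    print("rms before / after model:", offsets.std(), residual.std())
```

In practice the offsets come from centering a set of stars spread over the sky, and the fitted model is then applied to every subsequent slew.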

Related to pointing is tracking performance. The issue here is how long the telescope can stay pointed at a given target. You can consider this question as how well the telescope can point over the area of the sky through which your object will drift. Since your required pointing stability should be significantly less than one arcsec, so that tracking does not degrade the image quality significantly, almost no telescopes have sufficiently good pointing to track to within an arcsecond for an arbitrarily long time. Most telescopes can track successfully for several minutes, but will give significant image degradation for exposures longer than this. Consequently, most telescopes/instruments are equipped with guide cameras, which are used to continually correct the pointing by observing an object somewhere in the field of view of the telescope (possibly the object you are interested in, but usually not, since that's where your detector is looking). These days, most guiders are autoguiders, meaning that they automatically find the position of the guide object, compute the pointing offsets needed to keep this object in one position, and send these offsets as commands to the telescope. The observer generally just has to choose a guide object for the autoguider to use, though they also may have to adjust the guide camera sensitivity or gain to ensure that the guide star has a strong signal. These days, many autoguiders can automatically find guide stars in the field or from some on-line catalog (e.g., the HST Guide Star Catalog, which catalogs stars down to V 14). However, if one is taking long exposures and knows that they'll need to use guide stars, make sure to find out whether such a facility is available; if not, it may still be possible to find guide stars in advance of your observing run, e.g., from the sky survey. If so, you should seriously consider doing so, as it can take a frustratingly long amount of time to search for a guide star at the telescope in real time. Since telescope time is heavily oversubscribed at most facilities, you really want to maximize your efficiency, and doing so is a large part of what will make you an ``expert'' observer.
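The core of an autoguider loop can be sketched in a few lines: find the guide star position in each guide camera frame, compare it to the reference position, and convert the difference into an offset. The example below assumes NumPy and uses a simple intensity-weighted centroid; actual guiders differ in how they centroid, filter, and apply corrections, and the function names and sign convention here are illustrative only.

```python
import numpy as np

def centroid(image, background=0.0):
    """Intensity-weighted centroid (x, y) in pixels of a background-subtracted frame."""
    data = np.clip(np.asarray(image, dtype=float) - background, 0.0, None)
    total = data.sum()
    if total <= 0.0:
        raise ValueError("no signal above background; check exposure or gain")
    y, x = np.indices(data.shape)
    return (x * data).sum() / total, (y * data).sum() / total

def guide_correction(image, reference_xy, pixel_scale, background=0.0):
    """Offset (dx, dy), in pixel_scale units (e.g. arcsec), that would move the
    star from its measured position back to the reference position."""
    cx, cy = centroid(image, background)
    return ((reference_xy[0] - cx) * pixel_scale,
            (reference_xy[1] - cy) * pixel_scale)

if __name__ == "__main__":
    frame = np.zeros((21, 21))
    frame[12, 8] = 100.0  # synthetic star at x = 8, y = 12
    print("centroid:", centroid(frame))
    print("correction:", guide_correction(frame, (10.0, 10.0), pixel_scale=0.5))
```

How the pixel offsets map onto telescope axes depends on the guide camera orientation, which is why guiders usually need a calibration step before use.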

Note guiding in spectrographs is often done off of the slit with a slit-viewing camera.

Using telescopes

It is usually fairly straightforward to use an astronomical telescope. Most of the time after arrival at an observatory can be spent checking the instrument and detector performance rather than checking the telescope performance. You should carefully consider, however, how to maximize your efficiency at the telescope; telescope time is expensive and hard to come by.

Before going to a telescope, you might consider the following checklist of things to do:

One of the first things to be done just after dark is focussing the telescope. This generally involves taking images at a range of focus settings and comparing them to determine the best focus. One should be prepared with software for analyzing image quality to make this determination - also, software for looking at all images simultaneously. Generally, the focus position is encoded somehow so one gets a quantitative measure of the secondary location. One should be aware, however, of the possibility of slack in the gears controlling the focus mechanism, which can make the focus not repeat even when the readout position is the same; because of this, it is generally wise to always move to a focus position from one direction.
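One simple way to turn a focus run into a best-focus value is to fit a parabola to the measured image size as a function of focus position and take its minimum. The sketch below assumes NumPy and assumes the image size varies roughly quadratically near best focus, which is only an approximation; the focus values and FWHM measurements are invented for the example.

```python
import numpy as np

def best_focus(focus_positions, fwhm_values):
    """Fit FWHM = a*f^2 + b*f + c and return (focus at minimum, fitted FWHM there)."""
    a, b, c = np.polyfit(focus_positions, fwhm_values, 2)
    if a <= 0.0:
        raise ValueError("fit has no minimum; widen the range of focus settings")
    focus = -b / (2.0 * a)
    return focus, np.polyval([a, b, c], focus)

if __name__ == "__main__":
    focus = np.array([100.0, 110.0, 120.0, 130.0, 140.0])  # encoder units (made up)
    fwhm = np.array([2.3, 1.6, 1.25, 1.35, 1.8])            # arcsec (made up)
    position, size = best_focus(focus, fwhm)
    print(f"best focus near {position:.1f}, expected FWHM {size:.2f} arcsec")
```

If the fitted minimum lies outside the range of settings actually sampled, the run should be extended rather than trusted.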

While focussing, one can generally also get an idea of the quality of the seeing of the night. Remember that seeing varies from frame to frame, and because of this, multiple exposures even at the same focus can look very different. To minimize seeing effects, one may wish to choose a focus star on which exposures of several seconds can be made: for a brighter star with very short exposures, seeing changes may confuse you. Clearly, however, one doesn't want to use a very faint star because one would like to get the focussing procedure over as quickly as possible so you can get on with your science. Remember, however, the signal-to-noise gains are substantial for a more concentrated image, so it will be worth your while to do a good job: if you rush it, you may regret it later when you have more time to notice how blurred your images are!

You also need to remember that the focus is likely to change throughout the night as the temperature changes. So continue to inspect your images as you take them, and if the quality appears to be degrading, you should redo a focus run. Most likely, the telescope focus will consistently change in one direction (as it gets colder) and you may even be able to get a good estimate of how much it changes as a function of the temperature with experience. Which direction focus goes is a good thing to write down at a telescope, as you can save significant time during mid-night focus changes if you already know which direction you need to go (but always beware of someone coming and rewiring the focus motor/control since the time of your last run!). Determining the correct balance of time spent optimizing (i.e. focussing) vs. taking science data can be tricky, and likely depends on the nature of your program.

One may wish to quickly inspect an out-of-focus image for signs of large aberrations in the system. Almost certainly, nothing will be done about these immediately, but if the image quality is poor enough, it may be possible to have something (e.g., alignment) done the next day, so you still may possibly help your observing run, or certainly, you will help subsequent observers. At least, if something seems strange, you should let someone know so they can judge for themselves if there is really a problem.

Overall, this is a key point; you need to be vigilant to look for peculiarities in your data, and if you see something that hasn't previously been documented or that you don't understand, you need to ask someone about it rather than just assume that it is ``normal''!

INSTRUMENTATION

Often, astronomers use additional optics between the telescope and their detector. We can loosely define a camera to consist of any additional optics between the telescope and the detector. This is the most general definition; sometimes, camera is just used to refer to optics which contribute to an imaging system, as opposed to a spectrograph. Here we consider some of the more common optics used in imaging cameras.

Location of optics

Before going into specifics, consider the effect of placing optics at different locations within an optical system, like a telescope.

Optics placed in or near a focal plane will affect images at different field angles differently. Optics in a focal plane will not affect the image quality at any given field angle; however, such optics might be used to control the location of an image of the pupil of the telescope.

Optics placed in or near a pupil plane will affect images at all field angles similarly, and will have an effect on the image quality.

Field Flatteners

As we've discussed, all standard two-mirror telescopes have curved focal planes. It is possible to make a simple lens to correct the field curvature. We know that a plane-parallel plate will shift the focus along the optical axis by an amount that depends on the thickness of the plate. If we don't want to affect the image quality, only the location, we want the correcting element to be located near the focal plane.

Consequently, we can put a lens right near the telescope focal plane to flatten the field. For a field which curves towards the secondary mirror, one finds that the correct shape to flatten the field is just a plano-concave lens with the curved side towards the secondary. Often, the field flattener is incorporated into a detector dewar as the dewar window.

Focal plane reimagers

A focal reimager is a reimaging system which demagnifies/magnifies the telescope focal plane. In a simple form, it consists of two lenses: a collimator and a camera lens. The collimator lens has the same focal ratio as the telescope, and converts the telescope beam into a collimated beam. The camera lens then refocuses the light with the desired focal ratio. The magnification of the system is given by:

$\displaystyle m = {f_{camera}\over f_{collimator}}$

Consequently, the focal length of the entire system (telescope plus focal reimager) is:

$\displaystyle f_{system} = f_{telescope}\, m$

$\displaystyle f_{system} = f_{t}{f_{cam}\over f_{col}}$

$\displaystyle f_{system} = F_{t} D_{t}{F_{cam} D_{cam} \over F_{col} D_{col}}$

$\displaystyle f_{system} = D_{tel} F_{cam}$

where we have used the fact that Fcol = Ft (or else it won't collimate the light) and that the diameters of the beams on the camera and collimator are the same. Consequently, the scale in the image plane of the focal reimager is just the scale in the telescope focal plane multiplied by the ratio of the focal ratio of the camera to that of the telescope.
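As a worked example of these relations, the sketch below computes the effective focal length and the resulting plate scale, using the usual small-angle result that the scale is 206265 arcsec divided by the focal length. The telescope and camera parameters are invented for the illustration, as are the function names.

```python
ARCSEC_PER_RADIAN = 206264.8

def reimaged_focal_length(d_tel_m, f_ratio_camera):
    """f_system = D_tel * F_cam, valid when the collimator matches the telescope beam."""
    return d_tel_m * f_ratio_camera

def plate_scale_arcsec_per_mm(focal_length_m):
    """Image scale in the focal plane, arcsec per mm."""
    return ARCSEC_PER_RADIAN / (focal_length_m * 1000.0)

if __name__ == "__main__":
    d_tel, f_tel, f_cam = 3.5, 10.0, 3.0  # example: 3.5 m f/10 telescope, f/3 camera
    native = plate_scale_arcsec_per_mm(d_tel * f_tel)
    reimaged = plate_scale_arcsec_per_mm(reimaged_focal_length(d_tel, f_cam))
    print(f"native scale:   {native:.2f} arcsec/mm")
    print(f"reimaged scale: {reimaged:.2f} arcsec/mm")
    print(f"ratio = F_t / F_cam = {reimaged / native:.2f}")
```

With these numbers the reimager packs the sky onto the detector about 3.3 times more densely, so a given detector covers a correspondingly larger field with coarser sampling.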

Note that with a focal plane reimager, one does not necessarily get a new scale ``for free''. The focal reimaging system may introduce additional aberrations giving reduced image quality. In addition, one always loses some light at each additional optical surface from reflection and/or scattering, so the more optics in a system, the lower the total throughput.

Pupil reimagers

Often, an additional lens, called a field lens is placed in the telescope focal plane. This does not affect the focal reduction but is used to reimage the telescope pupil somewhere in the reimager. One reason this may be done is to minimize the size that the collimator lens needs to be to get off-axis images. The size of the field lens itself depends on the desired size of the field that one wishes to reimage.

Another use of reimaging the pupil is when one is building a coronagraph, an imaging system designed to observe faint sources nearby to very bright ones. The problem in seeing the faint source is light from the bright one, which arises from scattering, from diffraction, and sometimes from detector effects (e.g., charge bleeding in a CCD). A partial solution is to put an occulting spot in the telescope focal plane which removes most of the light from the bright object. However, the diffraction structure is still a problem. It turns out you can remove this by reimaging the pupil after the occulting spot and putting in a mask around the edges which are the source of the diffraction; this mask is called a Lyot stop. The resulting image in the focal plane of the focal reducer is free of both bright source and diffraction structure. To do really well with a coronagraph, one also needs to minimize scattering on the optical surfaces, which requires very smooth optics which are very clean.

Pupil reimagers are also widely used in IR systems to reduce emission via cold pupil stops. The issue here is that the telescope itself contributes infrared emission which acts as additional background in your observations. There is little you can do about emission from the primary, since you need to see light from the primary to see your object! However, you can block out emission from regions of the pupils which are obscured already, for example, by the secondary and/or secondary support structures. To do this you put a mask in the pupil plane. Obviously, however, the mask needs to be colder than the telescope itself or else the mask would contribute the background, so it is usually placed within the dewar that contains the detector and camera optics.

Filters

Filters are used in optical systems (usually imaging systems) to restrict the observed wavelength range. Using multiple filters thus provides color information on the object being studied. Generally, filters are loosely classified as broad band ($ \gtrsim$ 1000 Å wide), medium band ($ \sim$ 100 to 1000 Å), or narrow band ($ \sim$ 1 to 100 Å).

Perhaps a better distinction between different filters is by the way that they filter light. Many broad band filters work by using colored glass, which has pigments which absorb certain wavelengths of light and let others pass. Bandpasses can be constructed by using multiple types of colored glass. These are generally the most inexpensive filters. A separate filter technique uses the principle of interference, giving what are called interference filters. They are made by using two partially reflecting plates separated by a distance d. The principle is fairly simple:

[Figure: interference filter diagram]

When light from the different paths combines constructively, light is transmitted; when it combines destructively, it is not. Simple geometry gives:

$\displaystyle m\lambda = 2nd\cos\theta$

It is clear from this expression that the passband of the filter will depend on the angle of incidence. Consequently narrowband filters will have variable bandpasses across the field if they are located in a collimated beam; this can cause great difficulties in interpretation! If the filter is located in a focal plane or a converging beam, however, the mix of incident angles will broaden the filter bandpass. This can be a serious effect in a fast beam.

Since an interference filter passes every wavelength that satisfies this condition for some integer order m (i.e., $ \lambda$ = 2nd cos$ \theta$/m ), the unwanted orders often must be blocked. This can be done fairly easily with colored glass.
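Both effects, the multiple orders and the shift of the passband with angle, follow directly from the interference condition and can be sketched numerically. The example below takes $ \theta$ to be the angle inside the cavity and uses Snell's law to relate it to the angle of incidence in air; the refractive index, plate separation, and target wavelength are invented values, and the function names are made up for this illustration.

```python
import math

def passband_orders(n, d_nm, theta_deg=0.0, lam_min=300.0, lam_max=1100.0):
    """(m, wavelength in nm) pairs satisfying m*lambda = 2 n d cos(theta)
    that fall between lam_min and lam_max."""
    path = 2.0 * n * d_nm * math.cos(math.radians(theta_deg))
    m_lo = max(1, math.ceil(path / lam_max))
    m_hi = math.floor(path / lam_min)
    return [(m, path / m) for m in range(m_lo, m_hi + 1)]

def tilted_wavelength(lam0, incidence_deg, n):
    """Transmitted wavelength for light arriving at incidence_deg in air:
    lambda = lam0 * cos(theta), with sin(theta) = sin(incidence) / n."""
    sin_internal = math.sin(math.radians(incidence_deg)) / n
    return lam0 * math.sqrt(1.0 - sin_internal ** 2)

if __name__ == "__main__":
    n = 1.5                          # example cavity index
    d = 656.3 * 2 / (2.0 * n)        # separation putting order m = 2 at 656.3 nm
    for m, lam in passband_orders(n, d):
        print(f"order m = {m}: {lam:.1f} nm")
    for angle in (0.0, 2.0, 5.0):
        print(f"incidence {angle:3.1f} deg: passband at {tilted_wavelength(656.3, angle, n):.2f} nm")
```

For a filter only a few Angstroms wide, the shift of about a nanometer at 5 degrees in this example is larger than the bandpass itself, which is why the beam geometry matters so much for narrowband work.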

The width of the bandpass of a narrowband filter is determined by the amount of reflection at each surface. Both the wavelength center and the width can be tuned by using multiple cavities and/or multiple reflecting layers, and most filters in use in astronomy are of this more complex type.

The same principles by which interference filters are made are used to make antireflection coatings.

Note filters can introduce aberrations, dust spots, reflections, etc; one needs to consider these issues when deciding on the location of filters in an optical system.

Fabry-Perot Interferometer

A Fabry-Perot system makes use of a tunable interference filter. The filter is tuned in wavelength by adjusting one of

A tunable interference filter is called an etalon. Often, etalons are made to provide very narrow bandpasses, on the order of 1 Å.

A picture taken with a Fabry-Perot system covers multiple wavelengths because the etalon is located in the collimated beam between the two elements of the focal reducer. At each etalon setting, one observes an image which has rings of constant wavelength. By tuning the etalon to give different wavelengths at each location, one builds up a "data cube", through which observations at a constant wavelength carve some surface. Consequently, extracting constant-wavelength information from Fabry-Perot data takes some reasonably sophisticated reduction techniques. It is further complicated by the fact that accurate quantitative information requires the atmospheric conditions to be stable over the entire time the data cube is being taken.

Spectrographs

A spectrograph is an instrument which separates different wavelengths of light so they can be measured independently. Most spectrographs work by using a dispersive element, which directs light of different wavelengths in different directions.

A conventional spectrograph has a collimator, a dispersive element, a camera to refocus the light, and a detector. The performance of a spectrograph is characterized by the dispersion, which gives the amount by which different wavelengths are separated, and the resolution, which gives the smallest difference in wavelength at which two monochromatic sources can be separated. There are different sorts of dispersive elements with different characteristics; two common ones are prisms and diffraction gratings, with the latter the most commonly used in astronomy.

The dispersion depends on the characteristics of the dispersing element. Various elements can be characterized by the angular dispersion, $d\theta/d\lambda$, or alternatively, the reciprocal angular dispersion, $d\lambda/d\theta$. In practice, we are often interested in the linear dispersion, $dx/d\lambda = f_2\, d\theta/d\lambda$, or the reciprocal linear dispersion, $d\lambda/dx = {1\over f_2}\, d\lambda/d\theta$, where $f_2$ is the focal length of the camera. The latter is often referred to simply as the dispersion in astronomical contexts, and is usually specified in Å/mm or Å/pixel.

If the source being viewed is extended, light of different wavelengths coming from positions offset along the dispersion direction will overlap on the detector, leading to an image that is very confusing to interpret. For this reason, spectrographs are usually used with slits or apertures in the focal plane to restrict the incoming light. Note that one dimension of spatial information can be retained, leading to so-called long-slit spectroscopy. Also, if there is a single dominant point source in the image plane, or if the sources are spaced far enough apart (usually in combination with a low dispersion) that their spectra will not overlap, spectroscopy can be done in slitless mode. However, note that in slitless mode, one can be significantly impacted by sky emission.

The resolution depends on the width of the slit or on the size of the image in slitless mode, because all a spectrograph does is create an image of the focal plane after dispersing the light. The "width" of a spectral line will be given by the width of the slit or the image, whichever is smaller. In reality, the spectral line width is a convolution of the slit/image profile with diffraction. The spatial resolution of the detector may also be important.

Note that throughput may also depend on the slit width, depending on the seeing, so maximizing resolution may come at the expense of throughput.

Given a linear slit or image width, $\omega$ (or an angular width, $\phi = \omega/f$, where $f$ is the focal length of the telescope) and height $h$ (or $\phi' = h/f$), we get an image of the slit which has width $\omega'$ and height $h'$, given by

$$ h' = h {f_2 \over f_1} $$

$$ \omega' = r \omega {f_2 \over f_1} $$

where $f_1$ and $f_2$ are the focal lengths of the collimator and camera, and where we have allowed that the dispersing element might magnify/demagnify the image in the direction of dispersion by a factor $r$, which is called the anamorphic magnification.

Using this, we can derive the difference in wavelength between two monochromatic sources which are separable by the system.

$$ \delta\lambda = \omega' {d\lambda \over dx} $$

$$ \delta\lambda = r \omega {f_2 \over f_1} {d\lambda \over dx} $$

The bigger the slit, the lower the resolving power.

The resolution is often characterized in dimensionless form by

$$ R \equiv {\lambda \over \delta\lambda} = {\lambda f_1 \over r \omega f_2 (d\lambda/dx)} $$
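The slit-limited resolution can be evaluated directly from these expressions. The sketch below is only an illustration: the function names and the instrument numbers (slit width, focal lengths, dispersion) are assumptions, not the parameters of any particular spectrograph.

```python
def slit_limited_delta_lambda(slit_width_mm, f1_mm, f2_mm,
                              dlam_dx_A_per_mm, r=1.0):
    """delta_lambda = r * omega * (f2/f1) * (dlambda/dx), in Angstroms."""
    projected_width_mm = r * slit_width_mm * f2_mm / f1_mm   # omega'
    return projected_width_mm * dlam_dx_A_per_mm

def slit_limited_resolution(lam_A, slit_width_mm, f1_mm, f2_mm,
                            dlam_dx_A_per_mm, r=1.0):
    """R = lambda / delta_lambda for a slit-limited spectrograph."""
    return lam_A / slit_limited_delta_lambda(
        slit_width_mm, f1_mm, f2_mm, dlam_dx_A_per_mm, r)

# Hypothetical setup: 0.2 mm slit, 1000 mm collimator, 250 mm camera,
# reciprocal linear dispersion of 40 Angstroms/mm, r = 1.
dl = slit_limited_delta_lambda(0.2, 1000.0, 250.0, 40.0)
R = slit_limited_resolution(5000.0, 0.2, 1000.0, 250.0, 40.0)
print(f"delta_lambda = {dl:.2f} A, R = {R:.0f}")
```

Doubling the slit width in this sketch doubles the resolution element and halves R, which is the trade against throughput described above.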

Note that there is a maximum resolution allowed by diffraction. It can be estimated by noting that the minimum angle which can be resolved is approximately $\lambda/d_2$, where $d_2$ is the width of the beam at the camera lens, from which the minimum distance which can be separated is:

$$ \omega_{min} = f_2 {\lambda \over d_2} $$

The slit width which corresponds to this limit is given by:

$$ \omega' = r \omega {f_2 \over f_1} = f_2 {\lambda \over d_2} $$

or

$$ \omega = {f_1 \over r} {\lambda \over d_2} $$

and the maximum resolution is

$$ R_{max} = {d_2 \over f_2 (d\lambda/dx)} = d_2 {d\theta \over d\lambda} $$
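A short numerical sketch of the diffraction limit follows. The function names and the beam size, focal lengths, and dispersion are assumed values for illustration; the only care needed is to convert the dispersion from Å/mm to a dimensionless quantity before using it in $R_{max}$.

```python
ANGSTROM_PER_MM = 1.0e7

def diffraction_limited_slit_mm(lam_A, f1_mm, d2_mm, r=1.0):
    """Slit width omega = (f1 / r) * lambda / d2, in mm."""
    lam_mm = lam_A / ANGSTROM_PER_MM
    return (f1_mm / r) * lam_mm / d2_mm

def max_resolution(d2_mm, f2_mm, dlam_dx_A_per_mm):
    """R_max = d2 / (f2 * dlambda/dx), with dlambda/dx made dimensionless."""
    dlam_dx = dlam_dx_A_per_mm / ANGSTROM_PER_MM
    return d2_mm / (f2_mm * dlam_dx)

# Hypothetical setup: 100 mm beam at the camera, 1000 mm collimator,
# 250 mm camera, 40 Angstroms/mm, observed at 5000 Angstroms.
slit = diffraction_limited_slit_mm(5000.0, 1000.0, 100.0)
r_max = max_resolution(100.0, 250.0, 40.0)
print(f"slit = {slit * 1000:.1f} microns, R_max = {r_max:.0f}")
```

For these assumed numbers the matching slit is only a few microns wide, far smaller than a seeing-limited image at a typical telescope focus, so real slits usually set the resolution well below $R_{max}$.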

Astronomical spectrographs

Slitless spectrographs.

Long slit spectrographs

Image slicers: preserving resolution and flux.

Fiber spectrographs: multiobject data.

Slitlets: multiobject data.

Integral field spectrographs.

Dispersing elements

Prisms

Perhaps the simplest conceptual dispersing element is a prism, which disperses light because the index of refraction of many glasses is a function of wavelength. From Snell's law, one finds that:

$$ {d\theta \over d\lambda} = {t \over d} {dn \over d\lambda} $$

where $t$ is the base length, and $d$ is the beamwidth. Note that prisms do not have anamorphic magnification ($r = 1$). The limiting resolution of a prism, from above, is:

$$ R_{max} = {d_2 \over f_2 (d\lambda/dx)} = d_2 {d\theta \over d\lambda} $$

$$ R_{max} = t {dn \over d\lambda} $$

One finds that $dn/d\lambda \propto \lambda^{-3}$ for many glasses.

So dispersion and resolution are a function of wavelength for a prism. In addition, the resolution offered by a prism is relatively low compared with other dispersive elements (e.g. gratings) of the same size. Typically, prisms have R < 1000 . Consequently, prisms are rarely used as the primary dispersive element in astronomical spectrographs. They are occasionally used as cross-dispersing elements.
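The wavelength dependence can be made concrete with a simple dispersion model. The sketch below assumes a two-term Cauchy form, $n = A + B/\lambda^2$, which gives $dn/d\lambda = -2B/\lambda^3$ and so reproduces the $\lambda^{-3}$ scaling; the coefficient B is an assumed value of roughly the size found for crown glasses, not a catalog number, and the function names are hypothetical.

```python
def cauchy_dn_dlam(lam_um, b_um2):
    """dn/dlambda (per micron) for the assumed model n = A + B / lambda^2."""
    return -2.0 * b_um2 / lam_um ** 3

def prism_max_resolution(base_um, lam_um, b_um2):
    """R_max = t * |dn/dlambda| for a prism of base length t."""
    return base_um * abs(cauchy_dn_dlam(lam_um, b_um2))

B_ASSUMED = 0.0042   # micron^2, illustrative crown-glass-like value
BASE_UM = 1.0e4      # hypothetical 10 mm prism base, in microns

for lam_um in (0.4, 0.5, 0.8):
    R = prism_max_resolution(BASE_UM, lam_um, B_ASSUMED)
    print(f"lambda = {lam_um:.1f} micron: R_max ~ {R:.0f}")
```

In this model the limiting resolution drops by a factor of 8 between 0.4 and 0.8 microns, which is why prism spectra are much more compressed in the red than in the blue.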

Gratings

Diffraction gratings work using the principle of multi-slit interference. A diffraction grating is just an optical element with multiple grooves, or slits (not to be confused with the slit in the spectrograph!). Diffraction gratings may be either transmissive or reflective. Bright regions are formed where light of a given wavelength from the different grooves constructively interferes.

The location of bright images is given by the grating equation:

$$ m\lambda = \sigma (\sin\theta + \sin\alpha) $$

for a reflection grating, where $\sigma$ is the groove spacing, $m$ is the order, and $\alpha$ and $\theta$ are the angles of incidence and diffraction as measured from the normal to the grating surface.

The dispersion of a grating can then be derived:

$$ {d\theta \over d\lambda} = {m \over \sigma \cos\theta} $$

One can see that the dispersion is larger at higher order, and for a finer ruled grating. The equation can be rewritten as

$$ {d\theta \over d\lambda} = {\sin\theta + \sin\alpha \over \lambda \cos\theta} $$

from which it can be seen that high dispersion can also be achieved by operating at large values of $\alpha$ and $\theta$. This is the principle of an echelle grating, which has large $\sigma$, and operates at high $m$, $\alpha$ and $\theta$, and gives high dispersion and resolution.

Typical gratings have groove densities between 300 and 1200 lines/mm. Echelle gratings have groove densities between 30 and 300 lines/mm.
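The grating equation and its derivative are easy to evaluate numerically. The following sketch solves for the diffraction angle and the dispersion of a hypothetical 600 lines/mm grating in first order; the incidence angle, wavelength, and 250 mm camera focal length are assumed values, and the function names are illustrative.

```python
import math

def diffraction_angle(lam, sigma, m, alpha):
    """Solve m*lambda = sigma*(sin(theta) + sin(alpha)) for theta (radians).

    lam and sigma must share the same length units.
    """
    s = m * lam / sigma - math.sin(alpha)
    if abs(s) > 1.0:
        raise ValueError("no diffracted beam for these parameters")
    return math.asin(s)

def angular_dispersion(sigma, m, theta):
    """d(theta)/d(lambda) = m / (sigma * cos(theta)), radians per unit length."""
    return m / (sigma * math.cos(theta))

sigma_A = 1.0e7 / 600.0          # groove spacing in Angstroms (600 lines/mm)
m, lam_A = 1, 5000.0
alpha = math.radians(10.0)       # assumed angle of incidence
theta = diffraction_angle(lam_A, sigma_A, m, alpha)
dtheta_dlam = angular_dispersion(sigma_A, m, theta)     # rad / Angstrom
f2_mm = 250.0                    # assumed camera focal length
recip_linear = 1.0 / (f2_mm * dtheta_dlam)              # Angstroms / mm
print(f"theta = {math.degrees(theta):.2f} deg, "
      f"dispersion = {recip_linear:.1f} A/mm")
```

Raising the order or the groove density in this sketch increases $d\theta/d\lambda$ directly, as does pushing $\theta$ toward grazing angles where $\cos\theta$ becomes small.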

One can derive the anamorphic magnification for a grating by looking at how $\theta$ changes as $\alpha$ changes at fixed $\lambda$. Differentiating the grating equation at fixed $\lambda$ gives $\cos\theta\, d\theta + \cos\alpha\, d\alpha = 0$, so that in magnitude:

$$ r = \left| {d\theta \over d\alpha} \right| = {\cos\alpha \over \cos\theta} = {d_1 \over d_2} $$

where the $d$'s are the beam diameters. Note that higher resolution occurs when $r < 1$, or $\theta < \alpha$.
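The expression for r can be checked numerically by perturbing the angle of incidence at fixed wavelength and watching how the diffraction angle responds. This is only a sanity-check sketch; the grating parameters and function names are assumptions for illustration.

```python
import math

def diffraction_angle(lam, sigma, m, alpha):
    """theta from m*lambda = sigma*(sin(theta) + sin(alpha)), in radians."""
    return math.asin(m * lam / sigma - math.sin(alpha))

def anamorphic_magnification(alpha, theta):
    """r = cos(alpha) / cos(theta)."""
    return math.cos(alpha) / math.cos(theta)

# Hypothetical 600 lines/mm grating, first order, 5000 Angstroms,
# illuminated at an assumed 30 degree angle of incidence.
sigma_A, m, lam_A = 1.0e7 / 600.0, 1, 5000.0
alpha = math.radians(30.0)
theta = diffraction_angle(lam_A, sigma_A, m, alpha)

# Centered finite difference for d(theta)/d(alpha) at fixed wavelength.
h = 1.0e-6
slope = (diffraction_angle(lam_A, sigma_A, m, alpha + h)
         - diffraction_angle(lam_A, sigma_A, m, alpha - h)) / (2.0 * h)

r = anamorphic_magnification(alpha, theta)
print(f"r = {r:.4f}, |d(theta)/d(alpha)| = {abs(slope):.4f}")
```

The finite-difference slope comes out negative, since the diffracted beam swings the opposite way from the incident beam; its magnitude matches $\cos\alpha/\cos\theta$.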

The limiting resolution can be derived:

$$ R_{max} = {d_2 \over f_2 (d\lambda/dx)} = d_2 {d\theta \over d\lambda} $$

$$ R_{max} = {d_2 m \over \sigma \cos\theta} = {m W \over \sigma} = m N $$

where $W$ is the width of the grating ($= d_2/\cos\theta$), and $N$ is the total number of lines in the grating.
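As a quick check that the two forms of the limiting resolution agree, the sketch below computes $R_{max}$ both as $mN$ and from the beam width $d_2 = W\cos\theta$. The grating sizes, groove densities, and orders are assumed values, and the function names are hypothetical.

```python
import math

def max_resolution_from_lines(m, width_mm, grooves_per_mm):
    """R_max = m * N, with N = W / sigma the number of illuminated lines."""
    return m * width_mm * grooves_per_mm

def max_resolution_from_beam(m, d2_mm, grooves_per_mm, theta):
    """R_max = d2 * m / (sigma * cos(theta)), with sigma = 1 / groove density."""
    sigma_mm = 1.0 / grooves_per_mm
    return d2_mm * m / (sigma_mm * math.cos(theta))

# Hypothetical low-order grating: 100 mm wide, 600 lines/mm, first order.
print(max_resolution_from_lines(1, 100.0, 600.0))
# Hypothetical echelle: 200 mm wide, 50 lines/mm, order 60.
print(max_resolution_from_lines(60, 200.0, 50.0))

# Consistency check between the two forms for an assumed theta of 20 deg.
theta = math.radians(20.0)
d2 = 100.0 * math.cos(theta)     # beam width leaving a 100 mm grating
print(max_resolution_from_beam(1, d2, 600.0, theta))
```

The echelle in this example has far fewer lines than the low-order grating but a much higher limiting resolution, because the order m multiplies N.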

Note that light from different orders can fall at the same location, leading to great confusion! This occurs when

$$ m\lambda' = (m + 1)\lambda $$

or

$$ \lambda' - \lambda = {\lambda \over m} $$

The order overlap can be avoided using either an order-blocking filter or by using a cross-disperser. The former is more common for small m , the latter for large m .
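The overlap condition gives the wavelength range that can be observed in one order without contamination. The sketch below evaluates it for a low order and a high order; the function names and the chosen wavelengths and orders are illustrative assumptions.

```python
def overlapping_wavelength(lam, m):
    """Wavelength lambda' in order m that lands on top of lambda in order m+1.

    From m * lambda' = (m + 1) * lambda.
    """
    return (m + 1) * lam / m

def free_spectral_range(lam, m):
    """lambda' - lambda = lambda / m."""
    return lam / m

# Low order: 4000 Angstroms in second order falls on 8000 Angstroms
# in first order, so a red spectrum needs a blue-blocking filter.
print(overlapping_wavelength(4000.0, 1), free_spectral_range(4000.0, 1))

# High order (hypothetical echelle order 50 at 5000 Angstroms): adjacent
# orders overlap after only 100 Angstroms, hence the cross-disperser.
print(overlapping_wavelength(5000.0, 50), free_spectral_range(5000.0, 50))
```

The clean range per order shrinks as 1/m, which is why a filter suffices at small m while at large m the many short orders are instead separated spatially with a cross-disperser.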

One can compare gratings operating in low order, those operating in high order, and prisms, and one finds that higher resolution is available from gratings, and that echelles offer higher resolution than typical low order gratings.

We can also discuss grating efficiency, the fraction of incident light which is directed into a given diffracted order. One finds that for a simple grating, less light is diffracted into higher orders. However, one can construct a grating which can maximize the light put into any desired order by blazing the grating, which involves tilting each facet of the grating by some blaze angle. The blaze angle is chosen to maximize the efficiency at some particular wavelength in some particular order; it is set so that the angle of diffraction for this order and wavelength is equal to the angle of reflection from the grating surface.

Volume phase holographic (VPH) gratings.

Grisms

A grism is a combination of a prism and a diffraction grating. These are combined such that light is dispersed, but light at a chosen central wavelength passes through the grism with direction unchanged. This feature allows grisms to be placed in an imaging system (e.g., in a filter wheel) to provide a spectroscopic (usually low resolution) capability.

Operational items: using a spectrograph

Choice of dispersion: wavelength coverage vs. dispersion/resolution, available gratings, etc. Using grating tilt to select wavelength range.

Choice of slit width (science, seeing).

How to put object in slit. Imaging the slit. Slit viewing cameras.

(DEFER FOLLOWING TO SECTION ON DATA REDUCTION???)

Spectrograph calibration (not including basic detector calibration, to be discussed soon).

Wavelength calibration: correspondence between pixel position (in wavelength dimension) and wavelength. Arc lamps, wavelength solutions. Subtleties: extrapolation, line curvature, flexure (using skylines to calibrate).

Flux calibration: relative fluxes at different wavelengths. Spectrophotometric standards. Subtleties: differential refraction.

Spectral extraction: object extraction and sky subtraction. Subtleties: S-distortion, differential refraction: spectral traces. Issues: variation of focus along slit and implications for sky line subtraction, scattered light.

Relative fluxes along slit: slit width variations.

Examples of typical spectra: line lamps, flat fields, stellar spectra, galaxy spectra. Night sky emission.

Non-dispersive spectroscopy

It is also possible to use interference effects to measure spectral energy distributions instead of a dispersing element. The Fabry-Perot is an example of such a type of instrument, although it does not record all wavelengths simultaneously.

Another instrument which uses interference to infer spectroscopic information is the Fourier Transform Spectrometer (FTS), which is basically a scanning Michelson interferometer. The light from the source is split into two parts using a beamsplitter. One part of the light is reflected off a fixed flat mirror and the other is reflected off a mirror which can be translated along the beam to change its path length. The two beams are recombined to form fringes. The fringe pattern changes as the path length of the second beam is changed. The intensity modulation for a given wavelength ($\lambda$) or wavenumber ($k = 2\pi/\lambda$) is given by:

$$ T(k, \Delta x) = {T_{max} \over 2} [a + \cos(2k\Delta x)] $$

and the flux after integrating over all wavelengths is:

$$ F(\Delta x) = C \int I(k) T(k, \Delta x) dk = C \int I(k) \cos(2k\Delta x) dk $$

where $I(k)$ is the input spectrum, constant factors have been absorbed into $C$, and the term that does not depend on $\Delta x$ has been dropped. Consequently it is possible to recover the input spectrum by taking the Fourier cosine transform of the recorded intensity. In practice, a discrete Fourier transform is used.

The FTS requires scanning in path spacing. But unlike the Fabry-Perot, it yields information on intensity at all wavelengths simultaneously.
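The recovery of a spectrum from an interferogram can be illustrated with a toy model. The sketch below builds the modulated part of the flux, $\sum_j I_j \cos(2 k_j \Delta x)$, for two hypothetical emission lines, then applies a discrete cosine sum to recover their intensities. The sampling step, number of samples, and line positions are assumptions, and the lines are deliberately placed on the transform's wavenumber grid so that the recovery is exact; a real instrument would use an FFT and deal with noise, apodization, and the constant term.

```python
import math

def interferogram(lines, n_samples, step):
    """Modulated flux F(dx_n) = sum_j I_j * cos(2 * k_j * dx_n), dx_n = n * step.

    `lines` is a list of (wavenumber k, intensity I) pairs.
    """
    return [sum(I * math.cos(2.0 * k * n * step) for k, I in lines)
            for n in range(n_samples)]

def cosine_transform(flux, step):
    """Discrete cosine sum on the grid k_l = pi * l / (N * step), l = 1..N/2-1.

    Returns a list of (k_l, recovered intensity) pairs.
    """
    n_samples = len(flux)
    spectrum = []
    for l in range(1, n_samples // 2):
        k = math.pi * l / (n_samples * step)
        total = sum(f * math.cos(2.0 * k * n * step)
                    for n, f in enumerate(flux))
        spectrum.append((k, 2.0 * total / n_samples))
    return spectrum

N_SAMPLES, STEP = 128, 0.1       # hypothetical scan: 128 mirror positions

def grid_k(l):
    """Wavenumber of grid point l for this scan."""
    return math.pi * l / (N_SAMPLES * STEP)

# Two hypothetical lines with intensities 1.0 and 0.5.
lines = [(grid_k(20), 1.0), (grid_k(35), 0.5)]
flux = interferogram(lines, N_SAMPLES, STEP)
spectrum = cosine_transform(flux, STEP)

for k, intensity in spectrum:
    if abs(intensity) > 1.0e-6:
        print(f"k = {k:.4f}: I = {intensity:.3f}")
```

Every mirror position contributes to every recovered wavenumber, which is the sense in which the FTS measures all wavelengths at once; the spacing of the wavenumber grid, and hence the spectral resolution, is set by the total scan length N times the step.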


Jon Holtzman 2009-11-04