Eqs. (4) and (5) indicate that two quantities, the GOF criterion,
$p(D \mid I, M)$, and the image/model prior,
$p(I \mid M)$, are of key importance
in obtaining the most probable (i.e., M.A.P.) values for the
image/model. In this section we shall concentrate on the development of
a pixon-based prior.
Eq. (4) will form the basis of our pixon-based methods; our goal is to determine the M.A.P. image/model pair. Like ME methods we shall base our image/model prior on counting arguments. Unlike standard ME methods, however, we will allow certain aspects of the model to vary simultaneously with the image. By allowing both the image and model to vary simultaneously, we are optimizing our solution over a considerably larger solution space than methods which hold the model constant. Previous workers (Gull 1989, Sibisi 1989, Skilling 1991, MacKay 1992a,b) have already demonstrated the merits of varying the model and have shown the efficacy of selecting between models by maximizing the Evidence for the model.
Our pixon-based methods attempt to optimize the quality of the image
reconstruction by varying the portion of the model that deals with the
way in which the image is represented mathematically. To show how the
selection of image representation (we shall use the
term ``basis'') affects the quality of the reconstruction, let us first
consider the abstract nature of an image and how a generalized
image/model prior might be constructed. To do this we shall follow
closely the development of Piña and Puetter (1993) and Puetter and
Piña (1993a). They pointed out that in an abstract sense, an image
is a collection of distinguishable events which occur in distinct
cells. Hence the value for the image/model prior can be determined
from simple counting arguments. If there are $N_i$ events in cell
$i$, and a total of $n$ cells, then the prior probability of that
particular image is:

$$p(I \mid M) = p(\{N_i\}, n, N) = \frac{N!}{n^N \prod_i N_i!} \qquad (6)$$

where $\{N_i\}$ is the set of all numbers of events in cells $i$,
$N$ is the total number of events, i.e., $N = \sum_i N_i$,
and the image is now considered to be made up of these events.
[In practical terms, an event is a photon count in a photon
counting detector or the number of counts in units of the standard
deviation of the noise in non-photon counting systems. Furthermore,
these ``events'' are, in fact, units of information, i.e., the knowledge
that something (an event) has occurred is the minimal unit of
information - see Piña and Puetter (1993) for a discussion.
This sense of the term ``information'' is somewhat different from the
classical definition, which is in terms of the logarithm of the number
of states, i.e., Shannon information.] Also
note that the cells used in Eq. (6) are quite general. In their
definition we have not specified a size, shape, or position for the
cells. The cell concept simply serves to localize some collections of
events.
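As a minimal numerical sketch of this counting argument, the snippet below evaluates the logarithm of the prior in its multinomial form, $N!/(n^N \prod_i N_i!)$, for a few ways of distributing the same events. The function name `log_pixon_prior` and the example counts are illustrative assumptions, not part of the original development.

```python
import math

def log_pixon_prior(counts):
    """Natural log of N! / (n**N * prod(N_i!)) for events-per-cell `counts`.

    Hypothetical helper: `counts` lists the number of events N_i in each
    of the n cells; log-gamma is used so large counts do not overflow.
    """
    n = len(counts)
    total = sum(counts)
    log_p = math.lgamma(total + 1) - total * math.log(n)
    log_p -= sum(math.lgamma(c + 1) for c in counts)
    return log_p

# The same 4 events, distributed over progressively fewer cells.
four_cells = log_pixon_prior([1, 1, 1, 1])  # 4! / 4**4
two_cells = log_pixon_prior([2, 2])         # 4! / (2**4 * 2! * 2!)
one_cell = log_pixon_prior([4])             # 4! / (1**4 * 4!)
```

For these toy counts the prior probabilities are 24/256, 24/64, and 1 respectively, so the prior increases as the same events are packed into fewer cells.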
Since the goal of our reconstruction is to determine the M.A.P.
image/model pair, we must maximize the product of the image/model prior
given in Eq. (6) and the GOF term. Most people have a well-developed
intuition regarding how to maximize the GOF term, i.e., the
residuals must be comparable to the noise
(strictly speaking, they should be exactly equal to the noise).
Intuition concerning priors is usually less well developed. Eq. (6),
however, points out the a priori desirable properties of the
model for the image: the model should contain the
fewest cells, each containing the largest number of events consistent
with maintaining an adequate GOF. We shall call these generalized
cells pixons. The pixon name recognizes the pixel (or cell) heritage,
and the ``-on'' suffix recognizes the fundamental nature of the pixon in
that the pixons represent an optimal set of cells. Ideally, an image's
pixons represent the smallest number of cells (of arbitrary shape,
position, etc.) required to fit the data, and represent the minimum
Degrees of Freedom (DOFs) necessary to specify the image. If properly
selected, this set is irreducible to a smaller set. Hence pixons are
the fundamental units of information in the image. Using a pixon basis
for the image is the fulfillment of Occam's Razor formalized in Bayesian
terms - it forces the use of the simplest model consistent with the data.
The simple counting arguments presented in the section above point out the crucial feature of the pixon basis: there should be the fewest pixons consistent with fitting the data within the accuracy allowed by the noise. In our attempts to derive suitable, practical pixon bases for image reconstruction, we adopted techniques similar to those adopted by other authors, i.e., a correlation length method (cf. Weir 1991, 1993a). This approach controls the number of DOFs by reducing the independence of different parts of the image through explicit spatial correlation. It also causes the resulting degrees of freedom (or pixons) to be ``fuzzy'', i.e., to be localized but without hard boundaries. The pixon prior of Eq. (6) can still be used, although fuzzy pixons introduce a few computational complexities and make the method less intuitive for newcomers. Nonetheless, the practical and performance merits of this approach seem to warrant these modest burdens.
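Explicitly, then, our procedure for reducing the DOFs in the image
reconstruction is to define the image in terms of a pseudo-image,
$I_{\rm pseudo}$, convolved with a kernel whose width is the local
correlation length, $\delta(\vec{x})$:

$$I(\vec{x}) = \int dV_y \, K\!\left(\frac{\vec{x} - \vec{y}}{\delta(\vec{x})}\right) I_{\rm pseudo}(\vec{y}) \qquad (7)$$

where $K$ is a pixon shape function and
$\int dV_y \, K\!\left(\frac{\vec{x} - \vec{y}}{\delta(\vec{x})}\right) = 1$. Hence our procedure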
is somewhat akin to representing the image in terms of non-orthogonal
wavelets. As a matter of practicality, the
pseudo-image is defined on a pseudo-grid which typically has a
resolution as fine as or finer than the data pixel grid. The image is
then also defined on a grid with the resolution of the pseudo-grid.
Because of the local correlation in Eq. (7), however, the number
of DOFs in the image can be greatly reduced from the number of pixels
in the pseudo-grid. For example, if the local correlation length at
position $\vec{x}$ is 10 pseudo-pixels, then each block of 100 pixels
(10 by 10 pixels) represents a single DOF at this location. Reduction
in the DOFs greatly improves the formal value of the image/model prior
and removes many of the problems commonly seen in
competing methods, such as signal correlated residuals and the production
of spurious sources - see below.
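The local smoothing of Eq. (7) can be sketched in one dimension as follows, assuming the image at each position is a normalized, kernel-weighted average of the pseudo-image, with the kernel width set by the local correlation length at that position. The truncated-parabola shape function, the function names, and the rule that each pixel contributes one over the local scale (raised to the grid dimension) of a DOF are assumptions made for this illustration only.

```python
import numpy as np

def pixon_smooth(pseudo, delta):
    """Image value at each x: kernel-weighted mean of `pseudo`, width delta[x]."""
    pseudo = np.asarray(pseudo, dtype=float)
    delta = np.asarray(delta, dtype=float)
    positions = np.arange(pseudo.size)
    image = np.empty_like(pseudo)
    for x in positions:
        # Assumed shape function: truncated parabola K(u) = max(0, 1 - u**2).
        u = (x - positions) / delta[x]
        kernel = np.clip(1.0 - u**2, 0.0, None)
        kernel /= kernel.sum()  # discrete analogue of unit normalization
        image[x] = kernel @ pseudo
    return image

def approx_dof(delta, ndim=1):
    """Rough DOF count: each pixel contributes 1 / delta**ndim of a DOF."""
    return float(np.sum(1.0 / np.asarray(delta, dtype=float) ** ndim))
```

With all scales equal to 1 pseudo-pixel the kernel collapses to a single pixel and the image equals the pseudo-image. For a 100 by 100 pseudo-grid with a uniform scale of 10 pseudo-pixels, `approx_dof(..., ndim=2)` gives about 100 DOFs, matching the "100 pixels per DOF" example in the text.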
The iterative scheme we typically employ for calculating the M.A.P.
image/model pair starts
with an initial guess for the model, i.e., the spatial correlation
lengths. A common starting point is to assume that the scale lengths
are all equal to 1 pseudo-pixel. This is equivalent to starting out
with the standard ME solution for the image. In other words, for the
first estimate for the image the fuzzy pixon prior is essentially the
ME prior and the GOF criterion can be chosen to be the standard
chi-squared value of the residuals. In practice, however, we typically
use a simple GOF solution and ignore the ME prior. This is
considerably faster in practice and results in a very good first
guess. The next step estimates the new local scales, holding the image
fixed. This is done by maximizing $p(M \mid D, I)$,
i.e., finding the M.A.P. model given
the fixed data and current image estimate, $I$. [Note the
parallel and complementary nature of the M.A.P. model estimate relative
to the M.A.P. image estimate of Eq. (5).] In our current
implementations, this M.A.P. model is determined in only an approximate
manner. We simply note, for example, that the prior term,
$p(I \mid M)$, will insist on the largest possible correlation
lengths consistent with the GOF, while the GOF term is indifferent to
very small correlation lengths since they should always produce
acceptable fits. Our procedure is thus simply to find the largest
local correlation lengths that provide an acceptable fit. Once the
local scales have been determined, a new image is calculated, etc., and
the entire procedure iterated until convergence is obtained.
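The scale-selection step ("find the largest local correlation lengths that provide an acceptable fit") can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it assumes a one-dimensional signal with no blurring function, Gaussian noise of known standard deviation `sigma`, a box-shaped kernel in place of a pixon shape function, and a pseudo-image held equal to the data, so it performs a single scale-selection pass rather than the full alternating iteration. The candidate `widths`, the tolerance `tol`, and all function names are hypothetical.

```python
import numpy as np

def box_smooth(signal, width):
    """Moving average with an odd window `width` (reflected padding at edges)."""
    if width == 1:
        return signal.copy()
    half = width // 2
    padded = np.pad(signal, half, mode="reflect")
    return np.convolve(padded, np.ones(width) / width, mode="valid")

def largest_acceptable_scales(data, sigma, widths=(1, 3, 5, 9, 17), tol=1.0):
    """For each pixel, keep the largest width whose local fit is acceptable."""
    data = np.asarray(data, dtype=float)
    scales = np.full(data.size, widths[0])
    image = data.copy()
    still_ok = np.ones(data.size, dtype=bool)
    for width in widths[1:]:
        trial = box_smooth(data, width)
        # Mean squared normalized residual over the same local window.
        local_chi2 = box_smooth(((data - trial) / sigma) ** 2, width)
        # A pixel may grow only if every smaller width was also acceptable.
        still_ok &= local_chi2 <= tol
        scales[still_ok] = width
        image[still_ok] = trial[still_ok]
    return scales, image

# Noise-free step edge: flat regions can take the largest scale, while
# pixels next to the edge must stay at the smallest one.
step = np.zeros(100)
step[50:] = 10.0
scales, image = largest_acceptable_scales(step, sigma=1.0)
```

In this toy case the flat regions far from the step are assigned the largest candidate width, while the pixels adjacent to the step keep a width of 1, so the edge is not smeared. In a fuller treatment the image would be refitted to the data after each scale update, and the two steps repeated until convergence.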
A bit of intuition into this procedure reveals the fundamental reasons
for this method's success. Effectively, this is a fractal technique.
Here we use the term ``fractal'' in a limited sense.
Independent of the exact method for obtaining the local correlation
lengths (e.g., the brute force method or an iterative method as
described above), our procedure explicitly seeks the local smoothing
scale at which the GOF ceases to be sensitive,
i.e., we are looking for the smallest scale for which there is evidence
in the data. The use of the pixon prior ensures that our procedure
takes the largest, or least informative,
scale consistent with fitting the data. To be explicit about our
use of the term fractal, we do not mean to suggest that
self-similarity plays a central role in our methods, but rather
that just as in many fractal concepts, our procedure analyzes how a
geometric quantity varies as the local scale is varied. In this case
we ask how the GOF varies as the local scale is varied, just as the
definition of fractal dimension asks how the measure (length,
area, etc.) of a geometric object varies as the local scale is changed.
For this reason, we have named this entire class of methods
Fractal-Pixon methods and the pixon representation of the image the
Fractal-Pixon Basis (FPB).
