Source Segmentation and Classification
######################################

Background and Rationale
========================

We define "segmentation" as the process of breaking up an astronomical image into individual sources and measuring basic properties of those sources, such as position and flux. "Classification" is the process of analyzing the light distribution of each source in order to determine what type of object it is and to provide further detail on its properties. Several software packages exist to segment and/or classify sources (SExtractor, FOCAS, DAOPHOT, GIM2D). Segmentation is usually done with a peak-finding or thresholding algorithm, often after filtering to remove sky and to improve the ability to detect particular types of objects. Classification requires a more detailed look at the light distribution, and techniques vary widely. A simple goal of classification may be to discriminate between stars and galaxies, while a more ambitious goal may be to classify different galaxy types or to measure structural properties. The end product of segmentation and classification is a "source catalog" which lists the sources and their properties.

There are many uses for source segmentation and classification in HST direct images. They can be loosely grouped into operational and science uses.

The primary operational uses of source catalogs concern astrometry. By cross-correlating the source catalog of an image with the Guide Star Catalog or other large-scale astrometric surveys, one can improve the absolute World Coordinate System (WCS) in the image header. With or without astrometric standards in the field of view, one can use the relative positions of sources in overlapping images to fine-tune the alignment and registration of images, thus allowing improved dither-combined products. The need for better WCS headers is discussed in more detail in ****Megan's section****, while the need for making improved dithered products is discussed in ****Bill's section****.
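As a minimal sketch of the thresholding approach to segmentation described above, the following pure-Python example estimates a sky level, keeps pixels above a noise-scaled threshold, groups them into connected sources, and records a pixel position and flux for each. The function name ``segment``, the parameters ``nsigma`` and ``min_pixels``, the median/MAD sky estimate, and the toy image are all assumptions made for illustration; none of them is taken from SExtractor, DAOPHOT, or any other package named here.

```python
from statistics import median


def segment(image, nsigma=5.0, min_pixels=2):
    """Threshold an image and return one catalog row per connected source.

    image      -- list of equal-length rows of pixel values (e.g. DN)
    nsigma     -- detection threshold in units of the sky noise (assumed default)
    min_pixels -- discard detections with fewer pixels than this; a crude guard
                  against single-pixel spikes, not a real cosmic-ray rejection
    """
    ny, nx = len(image), len(image[0])
    values = [v for row in image for v in row]

    # Robust sky level and noise: median and scaled median absolute deviation.
    sky = median(values)
    sigma = 1.4826 * median([abs(v - sky) for v in values])
    threshold = sky + nsigma * sigma

    seen = [[False] * nx for _ in range(ny)]
    catalog = []
    for y0 in range(ny):
        for x0 in range(nx):
            if seen[y0][x0] or image[y0][x0] <= threshold:
                continue
            # Flood fill over the 8-connected neighbours above threshold.
            seen[y0][x0] = True
            stack, members = [(y0, x0)], []
            while stack:
                y, x = stack.pop()
                members.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if (0 <= yy < ny and 0 <= xx < nx
                                and not seen[yy][xx]
                                and image[yy][xx] > threshold):
                            seen[yy][xx] = True
                            stack.append((yy, xx))
            if len(members) < min_pixels:
                continue
            # Sky-subtracted flux and flux-weighted centroid, in pixel units.
            flux = sum(image[y][x] - sky for y, x in members)
            xc = sum(x * (image[y][x] - sky) for y, x in members) / flux
            yc = sum(y * (image[y][x] - sky) for y, x in members) / flux
            catalog.append({"x": xc, "y": yc, "flux": flux,
                            "npix": len(members), "sky": sky})
    return catalog


# Toy 8x8 image (hypothetical data): a checkerboard "sky" of 10 and 11 DN,
# one two-pixel source, and one isolated single-pixel spike.
image = [[10.0 + ((x + y) % 2) for x in range(8)] for y in range(8)]
image[2][3] = 110.0
image[2][4] = 60.0
image[6][1] = 200.0

catalog = segment(image)
for source in catalog:
    print(source)
```

Raising ``nsigma`` or ``min_pixels`` trades completeness for robustness against false detections, which is the kind of cut that an operational use such as WCS refinement would favor over depth.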
Many science uses can be made of source catalogs. The limits are set primarily by what the software measures, which could easily include: [1] positions; [2] fluxes (within apertures, within isophotes, or from model fits); [3] crude morphological information (position angle and ellipticity of an isophote, image moments); [4] star-galaxy discrimination; [5] local sky level; and [6] detailed morphological properties (such as the bulge/disk ratio, asymmetry parameters, and galaxy classification). These quantities are already valuable in instrumental units (e.g. pixel positions, flux in DN). Calibrating them adds further value (e.g. world coordinate positions and fluxes in f_lambda or f_nu). Since the calibrations are essentially in the image headers, this is readily achieved, either as part of the segmentation and classification process or offline by the user. Additional scientific utility can come from a statistical analysis of the source catalog or from matching several catalogs. Possible examples include determining the depth of an image from a magnitude histogram, plotting color-magnitude diagrams, looking for variability in repeated observations, determining photometric redshifts, and measuring lensing or shear with image shape parameters.

Segmentation and classification can be done at various stages of the image processing, and the uses one makes of the catalogs can be tailored to the needs of that stage. For example, structural properties may not be of much use before all the images of a field in a given filter are dither combined, but the pixel positions may be very useful for fine-tuning the registration so that a better dithered product can be made.

As one proceeds from the operational uses to the various scientific uses, [1] the quality requirements increase; [2] the need for intelligent processing increases (e.g. field dependence: sparse vs. crowded, distant vs.
nearby galaxies, etc.); [3] the applicability and target audience narrow; and [4] the methods become more specialized.

Assumptions
===========

Existing segmentation software works best with cosmic-ray-free images. If segmentation is to be done on single CCD frames, the software must be able to identify cosmic rays (and perhaps remove them).

- Are these catalogs going to be archived by MAST? ****
- OTFR implications ****.

Required Decisions
==================

****

Minimum, Intermediate and Maximum Goals
=======================================

[1] The minimum goal is to provide operational grade source catalogs. These can be constructed both before and after dither combining. The operational grade catalog will be geared towards tying down the WCS and improving the image registration and alignment prior to dither combining. The segmentation software must provide good pixel positions and crude fluxes of compact local maxima, and must be very robust against false detections (e.g. have a high threshold cutoff).

[2] An intermediate goal is to provide "easy" science grade catalogs. These should be made after cosmic-ray combining or dither combining the images. Their primary purpose is quick look and assessment of sources. The segmentation software must be geared to the simplest fields (e.g. distant galaxy fields), and must be reasonably robust against false detections. The users should be made aware of the limitations of these catalogs (i.e. buyer beware).

[3] The maximum goal is to provide complete science grade catalogs. These would have the deepest possible threshold and include more esoteric measurements (e.g. galaxy morphology). These should be constructed after cosmic-ray combining or dither combining the images. The software for constructing a complete science grade source catalog requires the most intelligence in processing (e.g.
field dependence in threshold setting, bulge/disk decomposition turned on only for the highest S/N extended sources) and requires the most quality control of output (i.e. error estimates). Again, the users should be made aware of the limitations of these catalogs (i.e. buyer beware).

Implementation Plan
===================

It is likely that the same software (e.g. SExtractor) can be used to achieve goals [1], [2], and much of goal [3] (specialized classification software may be needed for potential science goals such as bulge/disk decomposition and galaxy morphology determination). The difference between the grades of catalogs is largely a matter of which parameters are measured and how the threshold for object detection is set. Hence the easy science grade catalog can be considered a subset of the complete science grade catalog, and the operational grade catalog a subset of both.

Research phase: The main work required for implementing any segmentation and classification software is research. First, the various existing software packages for segmentation must be identified and tested on the complete range of existing HST data (e.g. all imaging instruments, crowded and sparse fields, some with foreground sources such as nearby galaxies, taken with a range of exposure times) as well as on simulated images (for which we know the truth, and which let us plan for upcoming instruments such as WF3). The aim of the research would be to find out which packages work best and which parameters should be adjusted to make the cuts between the different grades of catalogs.

Planning phase: Next, the data requirements (rules) for each level of implementation should be determined. These rules could include: only use CR-combined images; do not use on moving targets; position constraints (galactic latitude, distance from nearby galaxies, etc.).
This will also allow us to flesh out the goals of the different grades of source catalogs and hence make a more detailed implementation plan.

Implementation phase: The rest of the work would involve writing wrappers for the existing software to produce each catalog grade (the grades will be phased in successively). In addition, extensive stress testing will be required to make the scripts robust and to document the limitations of the different catalog grades (e.g. false detection rate as a function of S/N).

Required Resources and Timescales
=================================

**** should discuss this. This is vaguely based on ACS IDT experience.

Research phase: at least one scientist-year.

Planning phase: a scientist and a programmer working together for three months.

Implementation phase: on the order of three programmer-months to implement each catalog grade, followed by a similar amount of time by 1-2 scientists to stress test and document each grade.