X-Sender: gaffney@comet.stsci.edu
Date: Tue, 18 Dec 2001 17:23:18 -0500
To: "Howard A. Bushouse"
From: Niall Gaffney
Subject: Re: New "Customized Reprocessing" SHARE writeup
Cc: bushouse@stsci.edu, dickinson@stsci.edu, donahue@stsci.edu, fraquelli@stsci.edu, gaffney@stsci.edu, gak@stsci.edu, keyes@stsci.edu, koekemoe@stsci.edu, mauro@stsci.edu, meurer@pha.jhu.edu, padovani@stsci.edu, sparks@stsci.edu, swade@stsci.edu
Mime-Version: 1.0

Howard,

One missing issue here is how a user actually USES these resources. Creating such a potentially very useful facility without giving users a way to use it would be a problem, and this needs to be considered at this point. I can see several options here, each with its good and bad sides. I present these more to start the discussion than anything else...

1> The user is an expert and goes into a request and manually sets things there. Not a great solution for most users, but it is infinitely flexible. The downside is that this leads to more user errors on requests and many resubmissions, which tax the system, and it limits use of the system to those who probably know enough to do all this without it (as they are happy to run calXXX on their own).

2> The user interfaces have some scheme where you can go, for EACH instrument in the request, and set switches and values. This will require a lot of development for the web and StarView that is not in the estimate (perhaps 1 month each, plus testing and getting feedback). It also adds another layer of dependency, so if calXXX changes, so will at least 3 other systems. Hence turnarounds on changes to the calXXX systems will be very costly with regard to testing and integration. Another downside is that if a person wants to play with switches (say the cosmic ray rejection threshold), they have to wait for results from DADS to see what happened, and will probably need several requests to find that "sweet spot".
Users who do this will soon find it better to do this locally, where the turnaround is many times faster.

3> We limit the user interface options to a small subset of options that are generic enough that most users will understand them. Say things like CR rejection threshold, spectral extraction width, and extraction centroid. This requires somewhat less interface development but a bit more backend development, where combinations of instrument, parameter, and value get mapped to actual calXXX parameter values, files, and switches. The problem here is: how does the user know what to set these to without having done a data request already? This again leads to many requests to tweak parameters, with users waiting for their results and finding it much faster to get the raw data and play with parameters locally. Further, it significantly limits what the user can do with the system.

4> Give users the ability to ATTACH to a request a set of parameter files that were the output of a locally run pipeline that worked for them. This allows the user to get some small subset of raw data and calibration files and play with it until they find the OTFR parameters they like. Then the user can do the equivalent of attaching the correct .par files from the pipeline to the request and have them used (which for most non-IRAF experts could be tough to do, as many IRAF users have no idea where their param directory is, much less what it contains). The benefit here is that users can play with the parameters without taxing the OTFR/DADS system to find the way they want their pipeline run, and then have the pipelines here run that way so they don't have to do it. The downside is that it requires the user to install, run, and work with the pipeline (which may not be a drawback, as then at least they will understand what they are doing) and to know how to get raw data and calibration files.
This would also require some development on the web/StarView sides to make it possible to attach such parameters from IRAF, process them, and have them included in the request XML object in some way DADS/OTFR can use.

5> Make a version of calXXX that understands how to get data out of DADS. You could start off with simply a dataset name and have the calXXX task interact with the databases here to figure out what files are needed, look at its local disks to see what it's got, build a retrieval request that gets all the data it does not have locally, and then run, allowing the user to set the switches. This one's sort of pie in the sky, but it would be cool. The downside is that this would create a task that could run for a very long time (especially if the databases or DADS are down) and confuse the user to no end, as many would not be aware of what was going on.

I don't have an easy answer (obviously), and there are probably many other possible solutions which may be simpler to implement than some of those I pose. However, these issues need to be looked at, and the benefit of creating such a system weighed against both the cost and the usefulness of the enhanced system, before we implement the backend. Further, they need to be worked into the timeline and cost sections (with the exception of #1, of course).

Niall

At 03:58 PM 12/18/2001 -0500, Howard A. Bushouse wrote:
>Customized Reprocessing of Archived Data
>========================================
>
>Description:
>
>Allow users to specify customized parameters to be used when running OTFR
>to tailor the pipeline calibration to their scientific needs.
>
>Scientific Case:
>
>The standard processing applied in instrument calibration pipelines is
>not always optimized for individual observations. A typical routine
>example is the extraction of a spectrum for a STIS spectral observation.
>The pipeline assumes the target is a point source in all cases and uses
>an extraction aperture based on the point-source assumption. Summed or
>spatially resolved spectra of extended sources require extraction apertures
>that are customized in size and placement along the slit. Another example
>would be choosing different parameters for performing cosmic-ray rejection.
>
>Unique STScI capabilities:
>
>The calibration pipelines are STScI developed software, and the resident
>instrument groups in the HST Division are the best source of knowledge for
>what aspects of customized reprocessing would have the greatest utility.
>
>Drawbacks:
>
>Calibration pipeline software is already distributed to the community in
>STSDAS, and this currently allows users to reprocess their data locally
>in a customized fashion when necessary. As described, this capability
>would only add a special archive interface and offload the computations
>from the user's site to STScI. This may cause performance problems for
>local STScI servers, depending upon the load.
>
>Assumptions:
>
> Adding user-selectable processing options to StarView retrieval
> screens gives users access to only those processing options that
> are already available and built into the various HST instrument
> processing pipelines (calxxx routines). Furthermore, the only
> types of options that can be enabled via this interface are
> those that do not require interactive input or guidance from
> the user, nor can the user provide any type of input data file.
>
>Required Decisions:
>
> A decision must be made as to whether or not any particular
> processing option in a pipeline should be made available via
> OTFR processing and retrieval methods. Options that significantly
> increase the required processing time or require iterative
> settings of options to achieve optimum results, for example,
> may be best left to the individual user to perform on their own.
>
>Min and Max Goals:
>
> The range of goals for this particular project is quite narrow,
> because implementing the necessary mechanism for any processing
> option essentially meets the needs for access to all options.
> The goal is to implement the ability for allowing users to set all
> available pipeline options that meet the assumptions stated above.
>
>Implementation Plan:
>
> This work is already planned as a follow-on to the basic DADS
> redesign that is currently in progress. The enhancements listed
> below will be implemented after basic functionality of the new
> system has been finished.
>
> Necessary steps in DADS distribution:
>
> 1. Modify the distribution DTD by adding to OTFR processing options
> the ability to specify lists of "keyword = value" items, attaching
> them either globally or to specific rootnames.
>
> 2. Modify the XML parser to pick up these lists as part of the OTFR
> processing options.
>
> 3. Modify the transaction generator to pass these lists along to the
> OTFR transactions that are generated.
>
> 4. Modify the OTFR transaction and the OTFR message format to include
> these "keyword = value" lists.
>
> Necessary steps in OTFR processing:
>
> 1. Modify OTFR processing to accept "keyword = value" lists and give
> them to the appropriate calibration software step (which could be
> done via the command line).
>
>Required Resources:
>
> DADS: 1 FTE month for coding and unit testing
> OTFR: 1 FTE month for coding and unit testing
> Combined system: 1 FTE month for testing
> Total: 3 FTE months
>
>Time Scales:
>
> Expect that work can begin in late 2002, around Oct. 1. Operational
> implementation would then occur in early 2003.

--------------------------
Niall Gaffney - StarView
gaffney@stsci.edu
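
For discussion only, here is a rough sketch of how the "keyword = value" lists in the quoted implementation plan might travel from a request to a calibration command line. Everything specific in it is an assumption made up for illustration: the XML element and attribute names (request, otfr_options, param, dataset), the rootnames, the keyword names (cr_threshold, extract_width), and the "calxxx" task name. None of these come from the real distribution DTD or from any actual calXXX task. The one design choice it shows is that options attached to a specific rootname override global ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical request fragment. Element names, attribute names, rootnames,
# and keywords are invented here; the real distribution DTD may differ.
REQUEST_XML = """
<request>
  <otfr_options>
    <param keyword="cr_threshold" value="4.0"/>
    <param rootname="rootname_a" keyword="extract_width" value="11"/>
  </otfr_options>
  <dataset rootname="rootname_a"/>
  <dataset rootname="rootname_b"/>
</request>
"""


def collect_options(xml_text):
    """Return {rootname: {keyword: value}} for every dataset in the request.

    Params without a rootname attribute are treated as global; params with
    one apply only to that dataset and override a global of the same name.
    """
    root = ET.fromstring(xml_text)
    global_opts = {}
    per_root = {}
    for param in root.iter("param"):
        target = param.get("rootname")
        if target is None:
            bucket = global_opts
        else:
            bucket = per_root.setdefault(target, {})
        bucket[param.get("keyword")] = param.get("value")

    merged = {}
    for dataset in root.iter("dataset"):
        name = dataset.get("rootname")
        options = dict(global_opts)            # start from the global list
        options.update(per_root.get(name, {}))  # rootname-specific wins
        merged[name] = options
    return merged


def to_command_line(task, rootname, options):
    """Build an argument list such as [task, rootname, 'key=value', ...].

    The 'key=value' argument style is an assumption, not the syntax of any
    real calibration task.
    """
    return [task, rootname] + [f"{k}={v}" for k, v in sorted(options.items())]


OPTIONS = collect_options(REQUEST_XML)
for name, opts in OPTIONS.items():
    print(" ".join(to_command_line("calxxx", name, opts)))
```

The dictionary returned by collect_options stands in for what the transaction generator would pass along in the OTFR message. How the real transaction and message formats would carry these lists is left open here.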