# Full text of "Kepler Presearch Data Conditioning II - A Bayesian Approach to Systematic Error Correction"

## See other formats

(N o (N Kepler Presearch Data Conditioning II - A Bayesian Approach to Systematic Error Correction Jeffrey C. Smitli*'^'^, Martin C. Stumpe^'^, Jeffrey E. Van Cleve^'^, Jon M. Jenlcins^'^, Thomas S. Barclay^''^, Michael N. Fanelh^'^, Forrest R. Girouard^'^, Jeffery J. Kolodziejczalf^, Sean D. McCauliff^'^, Robert L. Morris^'^, Joseph D. Twiclcen^'^ ABSTRACT r- O > m 00 en rn O (N X With the unprecedented photometric precision of the Kepler Spacecraft, sig- nificant systematic and stochasti c errors on tran sit signal levels are observable in the Kepler photometric data (jjenkinsi l2010al ). These errors, which include discontinuities, outliers, systematic trends and other instrumental signatures, obscure astrophysical signals. The Presearch Data Conditioning (PDC) module of the Kepler data analysis pipeline tries to remove these errors while preserving planet transits and other astrophysically interesting signals. The completely new noise and stellar variability regime observed in Kepler data poses a signifi cant probl em to standar d cotrending methods such as SYSREM (JTamuzl |2005| ) and TEA ( lKovacsll2005l ). Variable stars are often of particular astrophysical interest so the preservation of their signals is of significant importance to the astro phys- ical c ommunity. We present a Bayesian Maximum A Posteriori (MAP) (JKay I993I ) approach where a subset of highly correlated and quiet stars is used to generate a cotrending basis vector set which is in turn used to establish a range of "reasonable" robust fit parameters. These robust fit parameters are then used to generate a Bayesian Prior and a Bayesian Posterior Probability Distribution Function (PDF) which when maximized finds the best fit that simultaneously removes systematic effects while reducing the signal distortion and noise injec- tion which commonly afflicts simple least-squares (LS) fitting. A numerical and empirical approach is taken where the Bayesian Prior PDEs are generated from fits to the light curve distributions themselves. email: jefFrey.sniith@nasa.gov iNASA Ames Research Center, Moffett Field, CA 94035, USA ^SETI Institute, 189 Bernardo Ave, Suite 100, Mountain View, CA 94043, USA ■^Bay Area Environmental Research Institute, 560 Third St West, Sonoma, CA 95476, USA "^Orbital Sciences Corporation, 21839 Atlantic Blvd, Dulles, VA 20166, USA ^Marshall Space FHght Center, Huntsville, AL 25812, USA -2- Subject headings: Stars; Extrasolar Planets; Data Analysis and Techniques 1. An Overview of the Kepler Data Pipeline Kepler's primary science objective is to determine the frequency of Earth-size planets transiting their Sun-like host stars in the habitable zono This daunting task demands an instrument capable of measuring the light output from each of over 100,000 stars simultane- ously with an unprecedented photometric precision of 20 parts per million (ppm) at 6.5-hour intervals. The large number of stars is required because the probability of the geometrical alignment of planetary orbits that permit observation of transits is the ratio of the size of the star to the size of the planetary orbit. For Earth-like planets in 1-Astronomical Unit (AU) orbits about Sun-like stars, only ~0.5% will exhibit transits. By observing such a large number of stars, Kepler is guaranteed to produce a robust result in the happy event that many Earth-size planets are detected in or near the habitable zone. The Kepler Data Pipeline is divided into several components in order to allow for efficient management and parallel processing of data. Raw pixel data downlinked from the Kepler photometer are calibrat ed by the Calib ration module (CAL) to produce cal ibrated targe t and background pixels flQuintanal l2010l ) and their associated uncertainties (jClarkd l2010l ). The calibrated pixels are then processed by the Photometric Analysis module (PA) to fit and remove sky background and e xtract sirnple ap erture photometry from the background- corrected, calibrated target pixelcl (lTwickenll2010al ). PA also measures the centroid locations of each star in each frame. The final step to produce light curves is performed in the Pre- search Data Conditioning module (PDC), where signatures in the light curves correlated with systematic error sources from the telescope and spacecraft, such as pointing drift, focus changes, and thermal transients are removed. Additionally, PDC identifies and removes Sudden Pixel Sensitivity Dropouts (SPSDs) which result in abrupt drops in pixel fiux with short recovery periods up to a few hours, but usually not to the same fiux level as before. These step discontinuities are identified separately from those due to operational activities, such as safe modes and pointin g tweaks, and are m ended using a sophisticated method described in a companion paper (JKolodzieiczakll2012l ). PDC also identifies residual isolated ^Thc habitable zone is defined as the range of orbital distances for which liquid water would pool on the surface of a terrestrial planet such as Earth, Mars, or Venus without greenhouse gas adjustments to the atmosphere. ^In simple aperture photometry, the brightness of a star in a given frame is measured by summing up the pixel values containing the image of the star. -3- outliers and fills data gaps (such as during intra-quarter downlinks) so that the data for each quarterly segment is contiguous when presented to later pipeline modules. In a final step, PDC adjusts the light curves to account for excess flux in the optimal apertures due to starfield crowding and the fraction of the target star flux in the aperture to make apparent transit depths uniform from quarter to quarter as the stars move from detector to detector with each roll maneuver. Output data products include raw and calibrated pixels, raw and systematic error-corrected flux time series, and centroids and associated uncertainties for each target star, which are archived to the Data Manager nent Center and made available to the public through the Multimission Archive at STS cIq (McCaulif!ll2010l ). A companion paper describes the details of overall PDC architecture ( Stumpell2012[ ). Data is then passed to the Transiting Planet Search module (TPS) fjjenkinsi l2010bf ) where a wavelet-based adaptive matched filter is applied to identify transit-like features with durations in the range of 1 to 16 hours. Light curves with transit-like features whose combined signal-to-noise ratio (SNR) exceeds 7.1a for a specified trial period and epoch are designated as Threshold Crossing Events (TCEs) and subjected to further scrutiny by the Data Validation module (DV). DV performs a suite of statistical tests to evaluate the confidence in the transit detection, to reject false positives by background eclipsing binaries, and to extract physical parameters of each syste ni (along w i th associated uncertainties and covariance matrices) for each planet candidate fjWu I l2010l : iTenenbaum 1 120101 ) . After the planetary signatures are fitted, DV removes them from the light curves and searches over the residual time series for additional transiting planets. This process repeats until no further TCEs are identified. The DV results and diagnostics are then furnishe d to the Scienc e Team to facilitate disposition by the Follow-up Observing Program (FOP) (JGautier Il2010l ). 2. A Bayesian Approach to Correcting Systematic Errors Kepler is opening up a new vista in astronomy and astrophysics and is operating in a new regime where the instrumental signatures compete with the minuscule signatures of terrestrial planets transiting their host stars. The dynamic range of the intrinsic stellar variability observed in the Kepler light curves is breathtaking: RR Lyrae stars explosively oscillate with periods of approximately 0.5 days, doubling their brightness over a few hours. Some flare stars double their brightness on much shorter time scales at unpredictable inter- vals. At the same time, some stars exhibit quasi- coherent oscillations with amplitudes of 50 ppm that can be seen by eye in the raw flux time series (jjenkinsll2010d ). The richness of ^ http://stdatu.stsci.edu/kcplcr/ -4- Kepler's data lies in the huge dynamic range for the variations in intensity by 4 orders of magnitude and the range of time scales probed by the data, from a few minutes for SC data to weeks, months, and ultimately, to years. Given that Kepler was designed to be capable of resolving small 100-ppm changes in brightness over several hours, it is remarkably rewarding that it is revealing so much more. The challenge is that an instrument so sensitive to the amount of light from a star striking a small collection of pixels is also very sensitive to small changes in its environment. The systematic errors observed in Kepler light curves exhibit a range of different time scales, from a few hours to several days to many days and weeks. Such phenomena include, for example, temperature variations of the reaction wheel housing over the 3 day momentum managements cycles and the resultant focus changes of ~2.2 /im per °C. Large thermal effects can be observed in the science data for ~5 days after recovering from intermittent safe modes, and for ~3 days after attitude changes required to downlink the data each month which is due to different sides of the spacecraft being heated during downhnks and subsequent thermal recoveries. Another prominent systematic is Differential Velocity Aberration (DVA) and the orbital period which results in gradual trends in the data over each quarter. The principle objective of PDC is to remove these systematic effects by cotrendingQ. The fact that most systematics such as these affect all the science data simultaneously, though to differing degrees, provides significant leverage in dealing with these effects. 2.1. The basic problem and the principle behind the solution It is standard practice when removing systematic errors in stellar data to use robust least-squares (LS ) on a set of basis vectors as is used in methods such as SYSREM (JTamuz 20051 ) and TEA flKovacsll2005l ). A robust least-squares approach, as outlined below in sec- tion 12. 2[ can find a chance linear combination of the systematic error model components that reduces the bulk Root Mean Square (RMS) at the expense of distorting the intrinsic stellar variations and introducing additional noise on short timescales. The fundamental problem with this approach is the fact that the implicit model fitted to the data for each star is incomplete. Least-squares cotrending projects the data vector onto the selected basis vectors and removes the components that are parallel to any linear combination of the basis vectors. This process is guaranteed to reduce the bulk RMS residuals, but may do so at ** Detrending is the removal of low- frequency signal content regardless of origin (intrinsic or systematic) . In contrast, cotrending is the removal of signal content common to multiple targets, which can better preserve intrinsic low-frequency signals while removing wide-band systematic signals. -5- the cost of injecting additional noise or distortion into the flux time series. Indeed, this occurs frequently for stars with high intrinsic variability, such as RR Lyrae stars, eclipsing binaries, and classical pulsators. For example, if one of the model terms is strongly related to focus variations and the long-term trend is for the width of the stellar point spread function (PSF) to broaden over the observation interval, then the flux for all stars should decrease over time. A least-squares fit, however, may invert the focus-related model term for a star whose flux increases over the observation interval, thereby removing the signature of intrinsic stellar variability from this light curve because there is a coincidental correlation between the observed change in flux and the observed change in focus. Given that the star would be expected to dim slightly over time, if anything, due to the focus change, PDC should be correcting the star so that it brightens slightly more than the original flux time series would indicate. The situation is analogous to opening a jigsaw puzzle box and finding only 30% of the pieces present. Least-squares gamely tries to put the jigsaw pieces together in order to match the picture on the box cover by stretching, rotating, and translating the pieces that were present in the box. The result is a set of pieces that roughly overlap the picture on the box cover, but one where the details don't necessary match up well, even though individual pieces may obviously fit. In order to improve the performance of robust LS, we need to provide the fitter with constraints on the magnitudes and signs of the fit coefficients. These constraints can be obtained by using the ensemble behavior of the stars to develop an empirical model of the underlying physics. For example, the photometric change that can be induced by a pointing change of 0.1 arcsec must be bounded, and this bound can be estimated by looking at how the collection of stars behaves for a pointing change of this magnitude. As an example of this analysis and to demonstrate systematic trends in the Kepler data, take channel 2.1 which is the most thermally sensitive CCD channel in Kepler's focal plane. Nearly all stars on this channel exhibit obvious focus- and pointing-related instrumental signatures in their pixel time series and flux time series. Figured] shows several characteristic light curves for typical targetqj on channel 2.1 during the Kepler Quarter 7 data season normalized by the median flux value. Note the long-term increase for all flux curves over the 90-day interval. This is due to seasonal changes in the shape of the telescope and therefore its focus as the Sun rotates about the barrel of the telescope while the spacecraft orbits the Sun and maintains its attitude fixed on the Field Of View (FOV). All light curves exhibit these long term trends but to differing degrees. Also present are some short-term oscillations evident but mainly obscured in variable targets, which are due to focus changes driven by ^The light curves are referred to as "targets" and not stars since not all objects in the Kepler FOV are stellar. Galactic studies are also performed with Kepler Data. -6- a heater cycling on and off to condition the temperature of the box containing reaction wheels 3 and 4 on the spacecraft bus. This component was receiving more and more shade throughout this time interval and the thermostat actuated more frequently over time. A target that is varying on levels and periods similar to these systematic effects can obscure the systematics making identification difficult. Looking at a single quiet target shown in Figure [2] (the same target shown in solid blue in Figure [1]) allows us to more clearly see the systematic trends which are exhibited in all the targets in Figure [1] but obscured by variability. Each Earth Point, one at the beginning of the quarter (cadence index 0) and after each monthly downlink (cadences 1500 and 2800) results in a heating of different sides of the telescope as the spacecrafts reorients the antennae to downlink data. The Earth Points themselves are gaps in the data. They result in periods of local heating and cooling distorting the telescope. A characteristic recovery time is also evident. The other trends as described above are also clearly evident. A final short data gap is also evident at cadence 3950, but this was not due to a reorientation of the spacecraft so no thermal recovery is present. As a counter example. Figure [3] shows the same highly variable target in solid black in Figure [H Notice how this highly variable star almost completely obscures the long term trend. The targets shown in Figures [2] and [3] will hereby be referred to as the Quiet Target and the Variable Target and used as canonical example targets in section [3l How can we separate intrinsic stellar variability from instrumental signatures? We do not expect intrinsic stellar variability to be correlated from target to target, except for rare coincidences, and even then one would not expect a high degree of correlation for all time scales. However, we do expect instrumental signatures to be highly correlated from target to target and can exploit this observation to provide constraints on the co-trending that PDC performs. Figure H] shows a histogram of the absolute value of the correlation coefficient for 1864 targets on channel 2.1. The targets' light curves are highly correlated as evidenced by the near complete pile- up near an absolute correlation coefficient of 1. Examination of individual light curves indicates that these light curves are contaminated to a large degree by instrumental signatures, as evidenced in Figures [1] and [21 But not all the targets are dominated by systematic errors. The trick is to come up with a method that can distinguish between intrinsic stellar variability and chance correlations with linear combinations of the diagnostic time series used to co-trend out systematic errors. -7- 0.04 Selection of Normalized Target Fluxes from Module 2.1 2000 3000 Cadence Index 5000 Fig. 1. — Selection of typical light curves on channel 2.1. Notice the long term trend exhibited in all light curves. However, for highly variable targets the trend is not entirely clear. This illustrates the need to separate intrinsic stellar variability from systematic trends. 0.02 0.01 X 3 LJ_ T3 0) N -0.01 (0 E ^ o z -0.02 -0.03 -0.04 A Quiet Target on Module 2.1 Showing Systematics i^'*fiM"*'*p«rt* 1000 2000 3000 Cadence Index 4000 5000 Fig. 2. — A particularly quiet target on channel 2.1 showing almost purely systematic trends. The long term trend is due to the seasonal changes to the shape of the telescope as the sun rotates around the barrel. Other spacecraft systematics are also visible such as monthly Earth-point downlinks and heater cycling. Data gaps and their thermal recoveries during the monthly downlinks are evident at cadences 1500 and 2800. -9- A highly Variable Target on Module 2.1 1000 2000 3000 Cadence index 4000 5000 Fig. 3. — A highly variable target where the variability completely obscures the systematic trend. -10- Histogram of Median Absolute Correlation Per Star for Uncorrected Flux 1400r 1200 1000 800 600 400 200 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Median Absolute Correlation Per Star Fig. 4. — Median Absolute Correlation for all targets on channel 2.1 Quarter 7. - 11 - 2.2. The MAP Approach, An Analytical Solution A Bayesian approach called the Maximum A Posteriori (MAP) method allows us to provide PDC with constraints on the fitted coefficients to help prevent over-fitting and disto rtion of intrinsic stellar variability. In this exposition we follow the notation of (JKay 1993|). The PDC-MAP technique examines the behavior of the robust least-squares fit coef- ficients across an ensemble of targets on each CCD readout channel in order to develop a description for the "typical" value for each model term. This description is a probability density function (PDF) that can be used to constrain the coefficients fitted in a second pass. To develop this approach, we build on a maximum likelihood approach. The maximum likelihood approach models each light curve, y, as a linear combination of instrumental systematic vectors, referred to as Cotrending Basis Vectors or CBVs, arranged as the columns of a design matrix, H, plus zero-mean, Gaussian observation noise, w: j/ = H6» + w. (1) The Maximum Likelihood Estimator (MLE) seeks to find the solution, ^mle? that maximizes the likelihood function, p(y; 6), given by P (y; 0) = N r exp (27r)^|Cw|^ -^{y-uefc^'iy-ue) (2) where C^ is the covariance of w and N is the number of data points. Taking the gradient of the log of Equation [2l setting it equal to zero, and solving for 6 yields the familiar least- squares solution, ^MLE = (H^C^^H)-' H^C-V. (3) This solution assumes the model H is a complete model to the data. We will show that the Bayesian model accounts for an incomplete model which is the common case when removing systematics from stellar signals. Adopting the Bayesian approach allows us to incorporate side information, such as knowledge of prior constraints on the model, in a natural way. Bayesianists view the under- lying model as being drawn from a distribution and the data as being one realization of this process. In this case we wish to find the Maximum A Posteriori (MAP) estimator of the model coefficients given the observations (data): ^MAP = argmaxp(0|y) = arg maxp (y|0)p (0) , (4) -12- where we've applied Bayes' rule fJD' Agostinil |2003| ) to simplify the expression. In this equa- tion, p [0) is the prior PDF of the model coefficients. The mathematical form for p{y\0) is the same as for the non-Bayesian likelihood function p (y; 6) in Equation |21 For illustration purposes, if we adopt a Gaussian form for the coefficient distribution, 0, then p{6) takes a closed form solution. P(^) = M r exp {2Tx)-\Ce\-- ^iO-t,,fC,'i0-fie) (5) where Cg and fig are the covariance and mean of 6, respectively, and we assume that the coefficients are uncorrelated (which will hold true for orthogonal basis functions), we can then maximize Equation Hj using Equation [5l by maximizing its log likelihood, \n[p{y\e)p{e)] = \n[p{y\e)]+\n[p{e)] = -^ In (27r) - i In |Cw| - ^ (y - HOf C^^ (y - H6) -|^ln(2vr)-lln|C,|-l(0-M,f C,M0-/Xe). (6) Taking the gradient of equation |6] with respect to 6, setting it to zero, and solving for 6 yields 0MAP = (H^C^'H + C, ^) "' (H^C- V + C, V.) • (7) If the observation noise, w, is zero-mean, white Gaussian noise with variance o"^, then Equa- tion [7] can be rewritten as ^MAP = (H^H + a'Cg') '' (H^y + a^C^ Ve) • (8) The key to this Bayesian technique is to determine when to preference, or weight, the prior PDF over the conditional PDF. If the variance in the data is large compared to the "spread" allowed by the prior PDF for the model, then the MAP estimator gives more weight to the prior so that ^map — ^ Me as a^ — ?■ oo. This case would correspond, for example, to targets with large stellar variability such as with the target given in Figure [31 In this case, the MAP weighting constrains the fitter from distorting the light curve and introducing noise on a short time scale. Conversely, if the variance in the data is small compared to the degree to which the prior PDF confines the model, the MAP estimator "trusts" the data over the prior knowledge and ^map -^ ^mle as a^ -^ 0. This case would correspond to targets with small stellar variability such as with the target in Figure |2] where there is little risk of over-fitting and distortion of the light curves and it is a "safe" bet to use the conditional, least-squares fit. -13- 3. The Empirical Bayesian MAP Approach and Implementation The above analytical solution to the Bayesian posterior PDF restricts the prior PDF to a Gaussian form. There is no a-priori reason to make this assumption and in general, since we are developing an empirical prior PDF, the least number of analytical constraints on the form, the more complete will be the empirical model. If a Gaussian form to the prior is no longer assumed then the prior formalism in equa- tion [5] can no longer be used. We can, however, still take the log form of equation |4] to obtain ^MAP = argmaxp(0|y) = argmax (log(p(y|0)) + \og{p{6))) . (9) e e Using the Maximum Likelihood Estimator in equation |2] for jo(y|^), removing the constant terms, inserting a weighting parameter and using normalized light curves, y, we obtain ^MAP = arg max e nT 2a2 (y-H0)^(y-H0) + Wprlogp(0) (10) where we assume the observation noise, w, is zero-mean, white Gaussian noise and has variance a"^. Since p{0) is no longer in closed form, the "spread" in the prior PDF (i.e. the covariance of 6, Cq in equation [8]) can no longer be expressed succinctly. In its stead, a generalized weighting parameter, Wpr, is used to characterize the "spread" in the prior PDF. Equation [10] must now be evaluated numerically. The overall flow of the algorithm is shown in Figure [51 We start by normalizing the flux light curves and calculating a relative stellar variability. We then flnd basis vectors using SVD based on a reduced set of flux light curves where cuts are made on target-to- target correlation and stellar variability. A robust least-squares flt is then performed on each target using the basis vectors just found. This ensemble of robust fit coefficients is used to generate the prior PDF. The conditional PDF is also found based on the same basis vectors. Once both prior and conditional PDFs are found they are combined to generate the posterior PDF where a weighting parameter, based on the stellar variability and the "goodness" of the prior flt, is used to weigh the prior relative to the conditional PDF. Details are elucidated in the following subsections. 3.1. Finding the Cotrending Basis Vectors The Cotrending Basis Vectors are obtained using Singular Value Decomposition. In or- der to have equal representation for all light curves independent of their absolute magnitude, we flrst normalize the targets by their median flux values ( mcdlan'rflux'i ) • ^^ then select the -14- Normalized Light Curves Calculate Stellar Variability Robust Fit to basis vectors Find Prior PDF using Robust Fit p{0) Find Conditional PDF ' 7 pim ^ Maximize Posterior Estimator e 7 Fig. 5. — Flowchart of the PDC-MAP cotrending algorithm. -15- 50% most highly correlated targets based on the median absolute Pearson correlation. This cut generates a set that exhibits the strongest trends in the data. It mostly removes targets with large variability but not completely. A variable star exhibiting a strong trend can still remain in the reduced list. We therefore first make a cut on the estimated variability of each target. An estimate of the intrinsic stellar variability of each target must be found. Herein lies the fundamental chicken-and-egg problem of the cotrending method. We need to know the stellar variability of each target in order to know how much to rely on the prior. But if we already knew the stellar variability then we would have no need for the prior - the cotrending solution would simply be the intrinsic stellar variability subtracted from the light curve, minus a Gaussian noise estimate. This issue is not specific to this particular cotrending method either. Whenever a system is characterized with an incomplete model there exists the problem of identifying the components in the system not represented in the model. We fortunately do not need to absolutely know the variability, we only need an estimated metric in order to weigh the prior. This estimate can be obtained by comparing a third-order polynomial to the light curve. The polynomial will remove any long term trends leaving behind a roughly detrended curve. The standard deviation of this polynomial removed light curve results in a rough calculation of the variability of the target. Removing a low-order polynomial is essentially a high-pass filter, we are therefore assuming any long term trends are systematic and short term trends are stellar. There are numerous counter-examples of short term trends that are actually systematic - reaction wheel zero crossingsn is a good example. However, short term systematic trends tend to be small in magnitude whereas long term systematics tend to result in large diversions in the flux amplitude. Likewise, there are examples of intrinsic long-term trends but they are generally smaller than the systematic trends. Since we are only concerned with the relative amplitude of stellar versus systematic variability, we are using the low pass filter to distinguish two characteristic realms of influence: long term trends dominated by systematics and short term trends dominated by intrinsic stellar variation. An example is shown in Figure |6l Here, a highly variable target is compounded with a long-term DVA and thermal trend. For periods less than 400 cadences the variance in the flux is dominated by stellar features. The long term variance, and the general trend to higher flux values is due to systematics. The variance of the residual after removing the polynomial fit, labeled as "Coarsely detrended light curve" in the figure, gives a rough estimate of the stellar variability of this target. Note that there are still systematic features in the detrended light curve. They are however small in magnitude compared to the stellar variabilitjo. ^This does not lessen the ability of PDC-MAP to remove short term systematics. Such short term -16- 0.03 -0.041 Polynomial Fit to a Highly Variable Target Raw light curve -Polynomial Fit Coarsly detrended light curve 1000 2000 3000 Cadence Index 4000 5000 Fig. 6. — By removing a 3rd order polynomial fit to the raw light curve an estimate of the intrinsic variability of the target can be calculated. -17- The variability, V, is measured using V=^^, (IV AyV where a^ is the standard deviation of the third-order polynomial detrended ligh t curve, A jy is the uncertainty of the flux data as determined by the PA pipeline component ( 1Clarkell2010[ ) and V is the median variability over all light curves in the sample. The normalization by the uncertainty is to ensure the noise in the data is not included in the stellar variability. The normalization by the median variability is so that a variability of 1 is considered typical thereby simplifying the analysis parameterization. Figure [7] shows a histogram of the mea- sured variability for all targets on channel 2.1. The median of all of these values is evidently 1 and the distribution is typical for all channels where most are close to typical variability but with a long tail to high variability (note the log scale for the x-axis). There are two cutoff thresholds plotted as well. The upper (in dashed red) is the threshold to determine if a target is "highly variable." The lower (in solid green) is to determine if a target is "very quiet." The very quiet targets have such a low amount of variability that using the prior PDF when generating the fit has been found to be problematic. Any targets above the high variability threshold are removed from the reduced list. The remaining targets are sorted with respect to median absolute correlation and the 50% most highly correlated are used for SVD. Due to all targets being normalized by their median, most targets pass through zero amplitude at the midpoint as can be seen as a "node" in the light curves at cadence 2200 in Figure [T] If SVD was performed on this set, as is, then all the strong cotrending basis vectors would have zero amplitude at the midpoint. The basis vectors would therefore be unable to remove systematics in the minority of targets that do not pass through zero at the midpoint. The light curves are therefore dithered slightly by a zero-mean Gaussian dithering magnitude in order to slightly "spread" the light curves about the zero flux value. Since the dithering is zero mean this has no effect on the resultant basis vectors other than to remove the artificial zero crossing node at the midpoint. Note that the dithering is only used to generate the basis vectors. The cotrending is performed on the non-dithered, but still median-normalized, light curves. Figure |H] shows the singular values from the singular value decomposition. This figure is characteristic of all channels; 2 or 3 strong singular values, then a slowly tapering region for about another dozen values until finally asymptotically approaching zero (as is expected with SVD). The first several left singular vectors are selected (typically the first 8) to become systematics are still present in the basis vectors and so when the PDF fit is performed the short term trends are removed. -18- 100 80 60 40 20 Stellar Variability Histogram for All Targets on Module 2.1 10" 10^ ISVD and Prior Generation; Cutoff = 1.3 I Prior Used in Posterior; Cutoff = 0.5 i Ml UkM 10' Stellar Variability 10^ 10^ Fig. 7. — Histogram of estimated variability for all targets on channel 2.1. This distribution is typical for all Kepler channels. The quiet targets below the "SVD and Prior Generation" threshold are used to generate the cotrending basis vectors. -19- the Cotrending Basis Vectors. These first singular vectors exhibit th e principle trends in the data due to DVA, pointing errors, impulses due to Argabrightenings (jWitteborrul201ll ). focus errors and reaction wheel zero crossings among other trends. The number of basis vectors used is generally 8, however a signal-to-noise ratio test is performed where the SNR is determined by SNRdb = 101og,o(^). (12) \ noise / ^signal and Anoisc being the Root Mean Square of the light curve and noise floor respectively. The noise floor is approximated by the first differences between adjacent flux values. Any of the 8 basis vectors with a SNR below a threshold of 5 decibels are removed but only a small number of basis vectors over the entire field of view are removed by the SNR test. Most have high SNR. There are other sophisticat ed methods to find the dimensionality of an eigensystem such as Bayesian Model Selection ( lMinkall2000l ). These are not used because they tend to pick too high a dimensionality in this particular situation. We wish to find only the singular vectors with systematics, the lesser singular vectors do contain light curve signal information, but not necessarily systematics and we have found including them in the MAP fit adds no value yet slows down the algorithm. For a minority of basis vectors, a few target light curves can dominate the signal. The normalization process attempts to "equalize" the strength of all targets, but a small number of light curves can be over-represented in the singular vectors from SVD. To eliminate this we calculate an entropy metric for each basis vector using the following entropy calculation hiPi) = — p{x) log p{x)dx, (13) where p(x) is a probability distribution function created from the right singular vectors from SVD (referred to as the V-Matrix), V^{X) = {^k^}. (14) The V-Matrix contains the contribution of the signal in the basis vector from each target light curve. We must first normalize the entropy calculation to a Gaussian distribution, which has the highest entropy of any continuous distribution with the same 2nd moment. The entropy of a Gaussian is H.M = i±ifM + ,ogM, (15) a being the 2nd central moment of Vki for fixed i. The resultant relative entropy is therefore h\p;) = h{p;)-H,{a). (16) -20- Singular Values from SVD for reduced Normalized Flux Fig. 8. — The singular values from SVD on the reduced set of "cleaned" targets that are highly correlated and quiet. -21 - If one (or a few) targets are domineering then they will have much larger values in the V- Matrix then all the other targets. A negative value of the entropy calculation will identify this condition. Bad entropy is somewhat arbitrary but we have found that a value below —0.7 is poor. For any basis vectors with identified poor entropy, the V-matrix column for that basis vector is examined for stand-out targets. The offending targets are removed and SVD is re-computed on the remaining targets. The process is iterated until the entropy of all basis vectors is below —0.7. Typically, no more than a couple iterations is necessary and fewer than 20 targets are removed (out of 2500 total targets). Figure [9] shows the first eight cotrending basis vectors generated for channel 2.1 and Figure [10] shows just the first basis vector. Trends can be found in all the vectors but it is useful to concentrate on the first, and strongest. Here the most characteristic trends and systematics in the data can be found. The general trend to higher flux is due to the seasonal change and solar orientation. The short recovery periods at cadence indices 0, 1500 and 2800 are due to monthly downlinks. The short spikes at 700 and 1450 are due to artifacts from correcting cosmic rays near reaction wheel zero crossing periodCl- The periodic oscillation is due to heater cycling. Notice how the Basis Vector in Figure [TU] closely follows the Flux Light Curve in Figure [21 This signifies that virtually all the features in this Flux Light Curve are due to systematic effects and not intrinsic stellar variability. In theory, any features in the light curves in Figure [1] that are not represented in the basis vectors in Figure [9] are intrinsic to the target. However, a simple least-squares projection of the light curves on the basis vectors will not produce desirable results for all targets as will be shown below. Once the cotrending basis vectors are found a robust LS fit is performed on each target. This creates the empirical data used to generate the prior PDF. 3.2. Numerically Generating p{6) The prior PDF is based on the distribution of robust fit coefficients of the basis vectors for all light curves using the method described in (iBowmanI [19871 ) . This method computes a probability density estimate of the sample data based on a normal kernel function using a window that is a function of the number of points in the data sample. The form of the prior PDF will depend on the parameterization of the robust fit coefficients. We must thus decide how to parameterize the coefficients to best extract the correlations. Some systematic effects are caused by focal plane irregularities and instrumental vibrations which are stronger ^These artifacts have been resolved in a recent version of the PA pipehne component but Argabrightenings stih persist. -22- Basis Vector 1 Basis Vector 2 Basis Vector 3 Basis Vector 4 /•y^ Basis Vector 5 Basis Vector 6 A^V VVA Basis Vector 7 Basis Vector 8 w^/V Fig. 9. — First 8 Cotrending Basis Vectors for channel 2.1. The gaps in the data have been hnearly filled so these curves are continuous. -23- First Cotrending Basis Vector for Module 2.1 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Cadence Index Fig. 10. — First Cotrending Basis Vector for channel 2.1. The aniphtude of the basis vector is arbitrary. -24- near the edges of the CCD frame (nearer to the spacecraft housing). There are also other issues that are dependent on the physical position of each pixel on the CCD. Therefore, the targets' locations in the sky as characterized by right ascension and declination are reasonable parameters to characterize target location with respect to the sources of systematic effects. The targets' influence by systematic effects is also directly related to the stellar magnitude since different magnitude targets result in different saturation levels of the CCD pixels. For example, the readout electronics for the CCDs are sensitive to temperature drift but the sensitivity is non-linear with respect to CCD flux levels. So brighter targets are affected by instrument temperature differently than dim targets. We therefore parameterize the prior PDF with three independent variables: 1. Stellar Magnitude [Kp) 2. Right Ascension (RA) and 3. Declination (Dec) . Figures [TTl [T2] and [T3] show the robust flt coefficients for a basis vector plotted against Kp, RA and Dec. The blue star data is for all targets whereas the red circle data is just for those targets remaining for SVD after the cuts discussed in section 13.11 The solid blue and dashed red curves in Figures [12] and [13] are the travelling window means of the blue star and red circle data respectively. The cuts clearly produce a bimodal distribution in Kp for this basis vector. A simple Gaussian flt would not reproduce this and demonstrates that the systematic trends are correlated with Kp but variable targets are masking the true correlation when a simple robust flt is performed. The correlations in RA and Dec are also evident but to a lesser extent. Notice also that the mean (solid blue curves) are biased compared to the dashed red. This is again because the variable targets are masking the true trends in the data. Some basis vectors exhibit stronger trends in Kp, RA or Dec but not necessarily all three simultaneously as is expected if the different systematics represented by the basis vectors have different instrumental sources. Plotting different Basis Vectors and/or channels reveals different trends and correlations. We want to mainly rely on targets in the neighborhood around the target we are fltting, referred to as the Target Under Study (TUS), in RA, Dec and Kp space when generating the prior PDF. If we simply found an evenly weighted PDF then a large cluster of targets with a certain coefficient value, even if non-local to the TUS, would always dominate the peak of the prior PDF. We therefore use a weighted probability density estimate based on the Standardized Euclidean Distance between targets x and y. D = y^(x - y) A-^ (x - y)^ (17) where A is a diagonal matrix whose diagonal elements give the relative weighting for each dimension. A straight normalization in each dimension by its standard deviation would result -25- 0) o o 0.6 0.4 0) 3 m i 0.2 c 0) -0.2 Robust Fit Coefficient vs Stellar Magnitude All Coefficients Coefficients used for SVD • ••• . > .% • 'i^* ♦J * ••• t «»^,«? rfi*?i« • • « *% •» ** # * * «• *% * » * "r ••5*», 0m*£* • • 11 11.5 12 12.5 13 13.5 14 Kepler Magnitude 14.5 15 15.5 Fig. 11. — Robust fit coefficients for Basis Vector 1 for all targets (Blue stars) and only those targets used for SVD (Red circles) plotted against Kepler Magnitude. By taking cuts on stellar variability, target-to-target correlation and entropy results in a bimodal distribution that would not be evident without the cuts. -26- Robust Fit Coefficient vs Riglit Ascension 19.62 19.64 19.66 19.68 Riglit Assension [degrees] 19.72 Fig. 12. — Robust fit coefficients for Basis Vector 1 for all targets (Blue stars) and only those targets used for SVD (Red circles) plotted against Riglit Ascension. -27- Robust Fit Coefficients vs Declination 0) 0.2 0.1 (0 > o % -0.1 o o -0.2 -0.3 * :.«.... * * ® ^ ^ ® ®^ ® * * » « * ** * . Ij^J*? "*-is,-«^5*f -;•**€ 1 • : » * * » * All Coefficients ° Coefficients used for SVD SB* SB » , » * ** '• *.* * * * « * * .-i » . .\ «%..r /^- * **%\ .* *- *» *,SB* * •«* .**.# , *»,»*» *t «» » *, SB * * * » *.*»•*» S » *« % «t » * SB* ** * % SB(B# s**» t^»,r#. a%*^''^l ^,1 t 1 III ("Ht afe^|'Vi|f'^*%*»*> *»t*#f***/ *{■ * ****\ % ^ •" * * * f- 1 -/ •* * J ****%* * % *;* »• **r **'* *,» •%* *** « *% *»•***„ .» € • * !*• * / " •.»;:**;**! .:... . .* ^ « )^ * * JJ * * SB » * * * * * » » • » » • ♦**•** * .* , * * 1 *SB "'*""f"' • „| *^ ».*/;#•*% *.* .* :., . •*. *: '** - ^ % . w. * . ♦ »^ SB » » » • *„ * * ♦ » * * » * * * * 1 » 1 , 42 42.2 42.4 42.6 42.8 Declination [hours] 43 43.2 Fig. 13. — Robust fit coefficients for Basis Vector 1 for all targets (Blue stars) and only those targets used for SVD (Red circles) plotted against Declination. -28- in equal weighting of all three dimensions, but we wish to overemphasize the prior PDF in dimensions that exhibit greater correlations, and in our case, the robust fit coefficients exhibit a stronger correlation in Kp than in RA or Dec. The A matrix diagonals are therefore A. = ^^^ (18) where mad [O.^) is the median absolute deviation of the coefficient distribution along dimen- sion i and Si is the scaling factor for dimension z. Si = {2, if i =^ Kp or 1, otherwise.} The above weighting results in the Kp dimension weighted twice as much as RA and Dec when generating the prior PDF. That is, targets further away in the Kp dimension are weighted proportionately less than in RA and Dec. This effectively results in taking a tighter cut in Kp space to emphasize the greater correlation in that dimension. Since the PDF is weighted by this distance metric, the PDF will emphasize the correlation in Kp and yet still be sensitive to the trends in RA and Dec. The median absolute deviation is used instead of the standard deviation in order to ignore outliers. The weighting by equation [T7] and how it affects the prior PDF is illustrated in Fig- ures dU [T5] and dSl The later two being the prior PDFs for the same two targets in Figures [2] and [3l The blue histogram in all three figures is exactly the same since they are generated from the same distribution of coefficients. However, the prior PDF (red curve) is dramat- ically different. In Figure [14] a bimodal PDF is evident due to targets nearby to the TUS containing two clusters, around -1.3 and -0.85, and suggests that the coefficient value for the TUS should be one of these two values. Which value that is actually chosen will be depen- dent on the form of the conditional PDF and the weighting of the prior PDF as discussed in Section [331 In Figure [16] the targets near the TUS have coefficients clustered around -0.34 which is far from the peak in the unweighted PDF. Using the unweighted PDF would have completely missed the actual systematic trend in the data near the TUS. The log of the prior PDF is plotted in these figures for direct comparison with equation [10] which results in a compression of the PDF near the top. In summary, the prior PDF is developed by generating a 3 dimensional weighted dis- tribution of robust LS fit coefficients in RA, Dec and Kp space. This methodology makes no assumptions on the form of the PDF, Gaussian or otherwise, and allows PDC-MAP to identify and characterize the form of the systematic trends across the full distribution of targets. -29- 200 150 100 Prior PDF (renormalized); Component 2 ■ 1864 coefficients — Weighted PDF 0.5 Coefficient Value Fig. 14. — Histogram of basis vector 2 robust fit coefficients for all 1864 targets on channel 2.1 and the weighted probability density for a particular target. The weighting by distance in Kp, RA and Dec clearly affects the PDF. -30- 200 150 100 Prior PDF (renormalized) for the Quiet Target; Component 2 ■ 1864 coefficients — Weiglited PDF 0.5 Coefficient Value Fig. 15. — Histogram of basis vector 2 robust fit coefficients for all 1864 targets on channel 2.1 and the weighted probability density for the Quiet Target shown in Figure [2l -31 - 200 150 100 50 Prior PDF (renormalized) for the Variable Target; Component 2 Hi 864 coefficients — Weiglited PDF -1 -0.5 Coefficient Value 0.5 Fig. 16. — Histogram of basis vector 2 robust fit coefficients for all 1864 targets on channel 2.1 and the weighted probability density for the Variable Target light curve shown in Figure [31 -32- 3.3. Finding the Weighting Parameter W pr For each hght curve the weighting parameter, Wpr, in equation [10] is an empirical weighting parameter that is principally based on the variability of each target. The greater the variability the greater we need to constrain the least-squares fit. However, there is another complication. A fit to the prior PDF is not always a good fit to the trend in the target. The reason for this disagreement is currently being investigated. One factor influencing the "goodness" of the prior fit is the sparseness of the targets in certain regions in Ra, Dec and Kp space. A sparse distribution will result in poor prior statistics. There are also other unknown causes resulting in poor priors for some targets and so an additional parameter in the prior weighting is an evaluation of the "goodness" of the prior fit. The goodness is evaluated using a method similar to the variability calculation above. The prior fit is compared to a 3rd order polynomial fit to the light curve with a soft- wall cutoff using the following equation G,= (^-('^)'- 'f«'"'<"« (19) I 0, otherwise where Graw is the "raw" goodness given by and -Fpr and Fpoiy are the prior PDF fit and the 3rd order polynomial fit to the data respec- tively. Normalization by the median absolute deviation (mad) of the polynomial fit removed light curve allows for a comparison of the difference between the polynomial fit and the prior fit with respect to the variance of the target. The soft cutoff is to ensure that small changes in the light curve will not have dramatic changes in the weighting. The scaling parameter ac is determined by when the deviation of the prior fit to the polynomial fit becomes too poor to be useful in constraining the Posterior fit. An example of a poor Prior Fit is given in Figure [T71 Notice how both the long term trend and the Earth-point recoveries are much larger in the prior fit than in the actual data. Examples such as this are in the minority, but frequent enough to require the additional test for prior goodness. It could be proposed that this target is trending downward cancelling out the upward trend of the prior fit. This is unfortunately not the case. Examination of Kepler Season Quarters 6 and 8 reveal that this target is not experiencing a general trend in Quarter 7. The prior fit is indeed poor and should not be used to any large degree. The unfortunate side effect of the prior goodness test is that PDC-MAP is less sensitive to very long term trends in the data. A true long term trend in the data that cancels out the systematic trend can confuse the prior goodness metric which would interpret the fit as a bad prior. The only way to surely know the actual -33- 1.55 x10 0) u c 0) ■a (0 o 0) '^ 1.5 0) o (A < •is 46 Raw Flux and Poor Prior Fit 5.548 5.55 5.552 Cadence mid-timestamp — Raw Flux Prior Fit 5.554 xlO 5.556 4 Fig. 17. — An example of a poor prior PDF fit to the trend in the target. -34- long term trend is to examine multi-quarter data. Future versions of the PDC module may indeed provide this functionality. The resultant full form to the prior weighting is Wpr = \/^^*G'^?. (21) The parameters /5y and /3g being scaling factors for the Variability and Prior Goodness respectively. Future work includes fully characterizing the conditions where the prior fit is poor and thereby remove the unfortunate need for an empirical prior "goodness" metric. In cases where the prior goodness is near zero the fit reverts to a reduced robust fit where the number of basis vectors is limited to just the first several (default is 4). A MAP fit has the pleasant feature where a large number of basis vectors can be used. The prior PDF restricts the fit from drifting drastically in function space searching the large set of basis vectors for a combination that reduces the bulk RMS at the expense of distorting stellar features. If the prior cannot be used then there is no such restriction and the posterior PDF becomes a least-squares fit, so a more limited number of basis vectors must be used in order to constrain the fit. The first several basis vectors have very strong trends in most of the data and have low noise components so they are generally safe to use even with an unrestricted least-squares fit. It is also generally true that a target with a bad prior is so because the target is quiet and any small deviation in the prior from a true trend is very noticeable and the prior is neither necessary or desirable to use. If the target is below the variability threshold shown in Figure [7] then the target is very quiet and in many cases the use of the prior fit only worsens the fit over a least-squares fit and so the prior weighting is zeroed. This is due to the prior fit never being an exact match to the target trend and even small deviations can "pull" the posterior fit away from a good fit. In such cases there is little risk of a quiet target biasing a least-squares fit away from a proper cotrending fit. The majority of targets do not fall into either of the above two cases and the prior is used to the degree dictated by the prior weight, and a Bayesian MAP fit is performed. 3.4. Maximization of the Posterior PDF Once the prior weighting is determined the posterior PDF can be assembled using equation [TUl Due to the empirical, and therefore non-analytical, form of the prior PDF the posterior must be maximized numerically. In general, a multidimensional maximization is difficult and time consuming due to the risk of only finding local maxima. Fortunately, due to the use of SVD, the basis vectors are all orthogonal so the various coefficients 9i can be -35- maximized sequentially. The process is therefore straightforward. The strongest singular vector is maximized first and then all subsequent singular vectors are maximized in turn. Following along with the same two examples of a quiet and a variable target, Figures and [19] show the final posterior PDF along with the prior and conditional PDFs. The black dots and magenta stars are the maxima of the prior and conditional PDFs and the blue circle is the maximum of the posterior PDF. Due to the varying scales of the three curves the prior and conditional curves have been renormalized to the same scale as the posterior for illustration purposes. The conditional fit has the smooth quadratic form characteristic of a least-squares fit. The prior appears Gaussian on this scale but in general is not. The title of each plot gives the prior weight. For the quiet target the variability is low so the prior weight is only 4.65 resulting in a minor correction to the conditional curve and so the conditional and posterior curves are virtually identical. Also, in cases where the width of the Prior PDF is large the maximum is low and it makes little contribution to the posterior. A wide prior is equivalent to saying that the prior information results in little added information to a good fit. The extreme case being a fiat prior PDF which provides no additional information. This is in contrast to the variable target where the conditional maximum is near the Prior PDF, meaning the fit almost completely relies on the prior data, due to the high weight on the Prior PDF of 976, resulting in only a slight influence of the conditional fit. In the case of the variable target in particular, if the prior PDF was not determined using a weighted distribution (in Kp, RA and Dec space) then the maximum of the prior would have been at about 0.6 as shown by the unweighted histogram in Figure [TBI This would have resulted in a poor fit to the systematics. The actual prior fit takes into account the location of the Target Under Study and the systematic trends in targets nearby to the TUS. Once the maximum of the posterior PDF is found for each basis vector the MAP fit is a linear combination of the basis vectors. The resultant fits are in Figure [2D] and [2U For the Quiet Target all three fits roughly overlap the actual trend in the data. The prior fit is not an exact match and the slight disagreement is to be expected since the prior is purely formulated using targets other than the TUS. For a quiet target such as this one highly weighting the prior PDF would result in a degradation of the fit and so instead the PDF relies mainly on the conditional PDF (i.e. the red dashed and green solid curves overlap). The resultant light curve after the trend removal is in the bottom figure for both the MAP fit and the conditional fit, the later being a least-squares fit. For the quiet target notice how the resultant curve is near featureless above the noise floor. Some slight artifacts are not fully removed and methods to correct these are discussed in Section [1] In the case of the variable target the conditional PDF results in a fit that attempts to remove all features in the light curve, whereas the Prior PDF correctly identifies just the systematic trends in the data. In this case it is beneficial to rely principally on the prior PDF. The prior cannot be well -36- ^ ^5 PDF Maximization for Quiet Target; Component 2; Prior Weighit = 4.6512 II — Posterior PDF 1 ---Prior PDF Conditional PDF m @ ° Posterior Maximum • Prior Maximum - = .1 ^^^^^-^ * Conditional Maximum Arbitrarv / \ ; » I * 1 * 1 » 1 » , \ ~~^--. - -3 1 » \. -4 1 1 'l 1 1 1 X -0.8 -0.6 -0.4 -0.2 Coefficient Value 0.2 Fig. 18. — Posterior, prior and conditional PDFs for the Quiet Target. The prior and conditional curves have been renormalized to the same scale as the Posterior for legibility. This target is quiet so the prior PDF weighting is low and does not influence the posterior by much. The maximum of the posterior is therefore very close to the conditional maximum. The width of the prior PDF can also influence its height and amount of influence on the conditional. -37- ^ ^6 PDF Maximization for Variable Target Component 2; Prior Weight = 976.1743 -0.5 ■^ -1 n .^-1.5 < -2 1 1 1 — Posterior PDF ---Prior PDF Conditional PDF ° Posterior Maximum • Prior Maximum * Conditional Maximum - / \ ^ — ' ^ / \ 1 \ 1 \ > \ ' \ . / / » / » / » / » / \ 1 1 1 1 \ \ 1 1 ^ -0.5 -0.4 ■0.3 -0.2 -0.1 Coefficient Value 0.1 Fig. 19. — Posterior, prior and conditional PDFs for the Variable Target. The prior and conditional curves have been renormalized to the same scale as the posterior for legibility. This target is highly variable so the prior PDFs is highly weighted at 976 and it influences the posterior considerably. The maximum of the posterior is therefore close to the prior maximum. -38- discerned in the figure because it lies under the MAP fit. The conditional fit also introduces a considerable amount of noise into the corrected light curve due to it being constructed more from the lesser basis vectors (shown in Figure ^ which contain larger noise components. 3.5. Propagation of Uncertainties Propagation of uncertainties is not necessarily straightforward because a covariance matrix is difficult to formulate for an empirical prior PDF. As a first approximation the propagation can be assumed to be through a least-squares solution - which is close to the solution for most targets. If C^aw and Ccot denote the covariance matrices for the temporal samples of the raw and cotrended flux time series for a given target, then the uncertainties may be propagated (disregarding the uncertainty in the mean level which can be considered to be negligible) by ^cot ^ -^ cot ^ raw -'cot; l^"^/ where the transformation Tcot is defined by Teot = (/ - HH^) . (23) H being the same design matrix as in equation [101 This is overly conservative since the pos- terior PDF is more constrained than a simple least-squares fit. A more accurate propagation of uncertainties would take into account the attenuation of the uncertainties due to the prior PDF. 4. Future Improvements The algorithm as presented works phenomenally well for the majority of ligh t curves in th e Kepler FOV. Overall PDC performance is discussed in a companion paper (jStumpe 2OI2I ). However, problems do arise. Remediations to these problems are discussed in this section. One of the main issues with the current method is different types of systematic effects are represented in each basis vector shown in Figure [91 For example, the long term trends associated with DVA and seasonal changes should not be represented by the same basis vectors as heater cycles and Earth point thermal recoveries. Given that these different systematics behave on different time-scales a reasonable solution is to band-split the light curves and generate separate basis vectors for each band. This method is currently in development. -39- 0) o 5.8r T3 (0 u X E 5.6 0) 5.5 Raw Flux and Fit Curves for the Quiet Target M^MMit^»N^*M>M^ — Raw Flux — Conditional Fit Prior Fit — - MAP Fit Jo 6.546 5.547 5.548 5.549 5.55 5.551 5.552 5.553 5.554 5.555 5.556 x10 Corrected Flux Time Series x10 5.69 Conditional Residual MAP Residual 5.546 5.547 5.548 5.549 5.55 5.551 5.552 5.553 5.554 5.555 5.556 Cadence mid-timestamp xlO Fig. 20. — Resultant fits to the Prior, conditional and posterior (MAP) PDFs for the Quiet Target. Target variability is low at 1.17 and Wpr is also low at 4.65. Quiet targets rely principally on the conditional PDF. -40- 0) o c 0) ■o n u o 0) *-• 3 O w ^.8 Raw Flux and Fit Curves for the Variable Target 39x10 3.8 3.7 3.6 3.5 5.546 5.547 5.548 5.549 5.55 5.551 5.552 5.553 5.554 5.555 5.556 8 Corrected Flux Time Series 1 1 1 1 1 1 1 1 1 1 1 ^»sJ^ 1 Raw Flux Conditional Fit Prior Fit --MAP Fit xlO xlO ''\ 3.9 3.8 3.7 3.6 5.546 5.547 5.548 5.549 5.55 5.551 5.552 5.553 5.554 5.555 5.556 N/i Conditional Residual MAP Residual Cadence mid-timestamp XlO Fig. 21. — Resultant fits to the prior, conditional and posterior (MAP) PDFs for the Variable Target. Target variability is high at 30.25 and Wpr is also high at 976.2. Variable targets rely principally on the prior PDF. It is evident that the Posterior (MAP) fit finds the systematic trend yet preserves the variability. -41- The current method relies solely on Singular Value Decomposition to generate the basis vectors after cuts on quiet and correlated targets. SVD is a reliable and often used tool to generate basis vectors that describe highly correlated trends in data. However, it does have its detractions. For non-Gaussian systematic tren ds, SVD is not i deal. Methods such as Independent Component Analysis as described in (j WaldmannI |20 1 ll ) can potentially better de-convolve independent systematic sources. Occasionally a single target can dominate one or more basis vectors. The resulting basis vectors essentially contain all stellar variability and noise of the offending target making the basis vector impotent in removing systematics. The entropy cleaning step removes the offending targets, but the source of the problem is partially a result of the normalization by the median value of each light curve. For the small number of dim targets with appreciable noise, the noise is overly represented in the basis vectors. Normalization by the standard deviation or noise floor of the light curves would limit the problem of over-represented targets, however at the expense of not equally normalizing the light curves by flux intensity which can introduce other problems. We have also discovered that over the channel field of view groups of targets exhibit similar systematics that are distinct from other targets so specific clusters of targets can be identified with similar trends. Using the same basis vectors for all targets is not ideal in this sit uation. We are investigating using a Hierarchical Clustering method such as described in ( 1jainlll999l ) to isolate the clusters and develop basis vectors separately for each cluster. The prior PDF generation is based on correlations in Stellar Magnitude, Right Ascension and Declination but in some targets the prior is poor. It is to be expected that systematics are also correlated with other stellar and instrumental parameters. A full parametric study finding these correlations is to be performed to identify hidden variables that further char- acterize the systematics. One strongly suspect hidden variable is sub-pixel centroid motion. Kepler collects both long cadence data at ~30 minute intervals and short cadence data at ~60 second intervals. No more than 512 short cadence targets are collected at any time and are spread over the entire field of view so the number of short cadence targets per channel is small and at most about a dozen. A dozen is too small of a sample for the prior PDF to be properly formulated. We are investigating ways to extend PDC-MAP to short cadence data. Options include using the prior PDF from long cadence data and using a single reference ensemble drawn from all 512 targets. -42- 5. Conclusions PDC-MAP dramatically improves Kepler's ability to understand the properties of parent stars. It preserves stellar signals and minimizes the noise while removing the systematic errors that can mask transit signals. PDC-MAP therefore ultimately improves Kepler's primary mission of detecting Earth-like planets. But PDC-MAP also improves the Kepler data's utility to the broader astrophysical community. Non-planet finding studies such as asteroseismology is greatly benefited by PDC-MAP's ability to preserve stellar signals in the light curves. Funding for this Discovery Mission is provided by NASA's Science Mission Directorate. We thank the thousands of people whose efforts made Kepler's grand voyage of discovery possible. The Kepler Science Office and Science Working Group work diligently in analyzing data products and have provided great insight into data reduction methods and data pro- cessing. We especially want to thank the Kepler SOC staff who helped design, build, and operate the Kepler Science Pipeline for putting their hearts into this endeavor. The authors would also like to thank the very productive conversations we have had with the Suzanne Aigrain group at Oxford University, especially Stephen Reece, Stephen Roberts and Amy McQuillan. Facilities: Kepler. REFERENCES Bowman, A. W. and Azzalini, A. Applied Smoothing Techniques for Data Analysis. New York: Oxford University Press, 1987. Clarke B. D., et al. A framework for propagation of uncertainties in the Kepler data analysis pipeline. In SPIE Conference Series, volume 7740 , July 2010. D'Agostini, G.. Bayesian Reasoning in Data Analysis. New Jersey: World Scientific, 2003 Gautier, T. N. et al. The Kepler Follow-up Observation Program. ArXiv e-prints, January 2010. M. R. Haas, M. R. et al. Kepler Science Operations. ApJL, 713:L115-L119, April 2010. A. K. Jain, A. K. et al.. Data Clustering: A Review. ACM Computing Surveys, 31 Issue 3, pp 264-323 (1999) Jenkins, J. M., et al., ApJL 713, L87 (2010) Jenkins, J. M. et al. Transiting planet search in the Kepler pipeline. In SPIE Conference Series, volume 7740, July 2010. Jenkins, J. M. et al. Initial Characteristics of Kepler Long Cadence Data for Detecting Transiting Planets. ApJL, 713:L120-L125, April 2010. Jenkins, J.M., et al. Planet Detection: The Kepler Mission, in Advances in Machine Learning and Data Mining for Astronomy (eds. M. Way, J. Scargle, K. AU, A. Srivastava), Chapman and Hall/CRC Press (2011) -43- Kay. S. M. Fundamentals of Statistical Signal Processing: Estimation Theory. New Jersey: Prentice Hall PTR, 1993. Kolodzicjczak J. et al, To be submitted to PASP (2012) Kovacs, G., Bakos, G., Noyes, R. A trend filtering algorithm for wide-field variability surveys. Mon. Not. R. Astron. Soc. 356, 557-567 (2005) McCauliff, S. et al. The Kepler DB: a database management system for arrays, sparse arrays, and binary data. In SPIE Conference Series, volume 7740, July 2010. Minka. T. P. Automatic choise of dimensionality for PGA. MIT Media Lab. Perceptual Gomputing Section Technical Report No. 514 (2000, revised 2008) Quintana, E. V. et al. Pixel-level calibration in the Kepler Science Operations Genter pipeline. In SPIE Conference Series, volume 7740 , July 2010. Stumpe M. G. et al. Architecture and Algorithms of the Presearch Data Gonditioning Module of the Kepler Science Processing Pipeline, Submitted to PASP (2012) Tamuz, O. Mazch, T. Zucker, S. Gorrecting systematic effects in a large set of photometric light curves. Mon. Not. R. Astron. Soc. 356, 14661470 (2005) Tencnbaum, P. et al. An algorithm for the fitting of planet models to Kepler light curves. In SPIE Conference Series, volume 7740, July 2010. Twicken, J. D. et al. Photometric analysis in the Kepler Science Operations Genter pipeline. In SPIE Conference Series, volume 7740, July 2010. Waldmann, I. P. Of 'Gocktail Parties' and Exoplanets. Astro-ph.EP (2011) Witteborn, F. G. et al. DEBRIS sightings in the Kepler field. In SPIE Conference Series, volume 8151, 815117 (2011) Wu, H. et al. Data validation in the Kepler Science Operations Genter pipeline. In SPIE Conference Series, volume 7740, page 42W, July 2010. This preprint was prepared with the AAS IATJt;X macros v5.2.