A Genomic Model for the Prediction of Ultraviolet Inactivation Rate Constants for RNA and DNA Viruses

Wladyslaw J. Kowalski¹, William P. Bahnfleth², Mark T. Hernandez³

1 Immune Building Systems, Inc., 575 Madison Ave., 10th Floor, New York, NY 10022

2 The Pennsylvania State University, Department of Architectural Engineering, University Park, PA 16802

3 University of Colorado, UCB 428, Department of Civil, Environmental, and Architectural Engineering, 1111 Engineering Drive #441, Boulder, CO 80309


Abstract

A mathematical model is presented to explain the ultraviolet susceptibility of viruses in terms of genomic sequences that have a high potential for photodimerization. The specific sequences with high dimerization potential include doublets of thymine (TT), thymine-cytosine (TC), and cytosine (CC), and triplets composed of a single purine combined with a pyrimidine doublet. The complete genomes of 49 animal viruses and bacteriophages were evaluated using base-counting software to establish the frequencies of dimerizable doublets and triplets. The model also accounts for the effects of ultraviolet scattering. Constants defining the relative lethality of the four dimer types were determined by curve fitting. A total of 77 water-based UV rate constant data sets were used to represent 22 DNA viruses, and a total of 70 data sets were used to represent 27 RNA viruses. Predictions are provided for dozens of viruses of importance to human health that have not previously been tested for UV susceptibility.


Keywords: Ultraviolet susceptibility, UV rate constants, D90 values, photodimerization, genomic modeling, pyrimidine dimers, viruses, ultraviolet germicidal irradiation, water disinfection, air disinfection, Z values, bioweapons, UVGI.


Introduction

UV rate constants and D90 values, as well as other terms defining UV susceptibility, have been determined in laboratory experiments and cataloged for decades, but as yet no one has produced a definitive theoretical model that can predict the ultraviolet susceptibility of microbes. The subject of virus UV susceptibility has been extensively studied and the processes that occur at the molecular level have been quantified to an extraordinary degree, but the complexity of these processes seems to have precluded development of a complete quantitative model of virus inactivation. The pieces of this complex puzzle have, in fact, been available in the literature for some time, particularly in the works of Setlow and Carrier (1966), Smith and Hanawalt (1969), Becker and Wang (1989), and others, but what was missing was specific knowledge of the genomes. Over the past two decades, molecular biologists have filled this gap by sequencing and publishing the genomes of a large number of viruses. This paper applies the basic inactivation models originally proposed by various researchers to an assortment of viral genomes from the NCBI database (NCBI 2009) and statistically evaluates their correlation with known UV D90 values. With some enhancements to the basic model and adjustments to its parameters, a new model is developed herein that provides fairly accurate predictions for both RNA and DNA viruses. This model also includes a new ultraviolet scattering model developed by the authors that contributes to the overall accuracy of the DNA model.

Rate Constant Determinants

Various intrinsic factors determine the sensitivity of a virus to UV exposure under any set of constant ambient conditions of temperature and humidity. These include, but may not be limited to, the following species-dependent properties:

• Physical size
• Molecular weight of DNA or RNA
• DNA Conformation (A or B)
• Presence of chromophores or UV absorbers
• Propensity for clumping or agglutination
• Presence of repair enzymes or dark/light repair mechanisms
• Hydrophilic surface properties
• Relative Index of Refraction
• Specific UV spectrum (broad band UVC/UVB vs. narrow band UVC)
• G+C% and T+A%
• % of Potential Pyrimidine or Purine Dimers

          The physical size of a virus bears no clear relationship with UV susceptibility, except that for the largest viruses, as size increases, the UV rate constant tends to decrease slightly (which is likely the result of UV scattering as discussed later). It might be expected that physical size would confer photoprotection through thickness alone, but it appears that the protein composition of the capsid, not its thickness, is a more important determinant due to the presence of UV-absorbing chromophores (Webb 1965).

         Molecular weight has sometimes been cited as a factor in UV susceptibility (David 1973). However, it has been demonstrated that shearing DNA molecules to half size and then irradiating them alters neither the number of pyrimidine dimers nor the degree of viral DNA inactivation, which suggests that UV-induced damage to DNA is independent of molecular weight (Scholes et al 1967). Figure 1 illustrates this for a large number of viruses: there is no clear relationship between molecular weight (or genome size) and D90 for any of the virus types, whether double-stranded DNA, single-stranded RNA, double-stranded RNA, or single-stranded DNA.


          Other suspected determinants, such as the specific UV spectrum, the presence of repair enzymes, and hydrophilic surface properties, can be dismissed because their effects are relatively minor or insignificant. DNA conformation is apparently a factor, but in this paper DNA viruses (in water, B conformation) are treated separately from RNA viruses (A conformation). The questions of chromophore content and relative index of refraction cannot be fully resolved at present, and although they might be major factors, they are left for future research. Similarly, the propensity for clumping, which has been noted to be a protective factor, cannot be resolved due to a lack of detailed knowledge regarding the chromophore content of envelopes and nucleocapsids.

          One criterion worth examining in detail is the genomic G+C (or, conversely, T+A) content. The genomic GC content of RNA viruses varies from about 30% to 60%, while that of DNA viruses varies from about 30% to 75%. One study on UVB irradiation of bacteria reports a strong correlation between increasing GC content and the formation of cytosine-containing photoproducts (Matallana-Surget et al 2008). Since an increased thymine content will likely result in a proportional increase in photodimers of the TT and CT variety, some statistical relationship between G+C% (or conversely T+A%) and UV susceptibility might be expected. Figure 2 shows the results of this comparison for 27 double-stranded DNA viruses, and Figure 3 shows the same for 28 single-stranded RNA viruses irradiated in water. These comparisons use average UV rate constants for virus species with more than one data set. The DNA viruses show no significant correlation. The RNA viruses, however, show a fairly good correlation, with an R² of about 45%. This latter result, however promising, represents the limit of GC content as a predictor of UV susceptibility, since GC content only indicates the likely presence of thymine doublets and triplets rather than providing an exact accounting of the potential dimers in a genome.
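The GC-content comparison in Figures 2 and 3 amounts to computing the G+C fraction of each genome and the coefficient of determination of a straight-line fit against the average rate constants. The following is a minimal Python sketch under those assumptions: the function names are hypothetical, sequences are assumed to be plain strings (U is counted with T so RNA genomes can be used directly), and the values in the usage lines are toy numbers, not data from this study.

```python
def gc_fraction(seq: str) -> float:
    """Return G+C as a fraction of unambiguous bases (A, C, G, T or U).

    Ambiguity codes such as N are ignored rather than counted.
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T") + s.count("U")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    return gc / (gc + at)


def r_squared(x: list[float], y: list[float]) -> float:
    """R^2 of an ordinary least-squares line y = a + b*x (the squared Pearson r)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return sxy * sxy / (sxx * syy)


# Toy usage: GC fractions of three made-up sequences against made-up rate constants.
gc = [gc_fraction(s) for s in ("AUGCAU", "GGCAUC", "GCGCGA")]
k = [0.30, 0.22, 0.15]  # hypothetical average UV rate constants
print([round(v, 3) for v in gc], round(r_squared(gc, k), 3))
```

For a single predictor, R² of the least-squares line equals the square of the Pearson correlation, which is why the sketch computes it from the centered sums directly.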


The UV Scattering Model


Viruses, which are about 0.02 μm (20 nm) in size or larger, are subject to ultraviolet scattering because their size is comparable to the wavelength of ultraviolet light. Scattering reduces the effective irradiance to which the microbe is exposed, so this attenuation must be accounted for before proceeding with the genomic model. The interaction between ultraviolet wavelengths and the particle is a function of the size of the particle relative to the wavelength, as defined by the size parameter:

$x = \pi D / \lambda$

where D is the particle diameter and λ is the wavelength.
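As a rough guide to where viruses fall relative to the 253.7 nm wavelength used here, the size parameter can be evaluated directly. The sketch below assumes the conventional definition x = πD/λ; because it is not stated whether the wavelength is taken in vacuum or in water (λ0/n_m), the medium index is exposed as a parameter. The function name and the example diameters are illustrative only.

```python
import math

UVC_WAVELENGTH_NM = 253.7  # germicidal wavelength used in the Mie calculations


def size_parameter(diameter_nm: float,
                   wavelength_nm: float = UVC_WAVELENGTH_NM,
                   n_medium: float = 1.0) -> float:
    """Mie size parameter x = pi * D / lambda.

    lambda is the wavelength inside the medium, i.e. the vacuum wavelength
    divided by n_medium (n_medium = 1.0 gives the vacuum value).
    """
    if diameter_nm <= 0 or wavelength_nm <= 0 or n_medium <= 0:
        raise ValueError("diameter, wavelength and medium index must be positive")
    return math.pi * diameter_nm * n_medium / wavelength_nm


# Illustrative diameters spanning small RNA viruses to large DNA viruses.
for d in (25, 100, 250):
    print(d, round(size_parameter(d), 2), round(size_parameter(d, n_medium=1.4), 2))
```

Values of x well below 1 indicate Rayleigh-like behavior, while values near or above 1 fall in the Mie regime where the full calculation is needed.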

The scattering of light is due to differences in the refractive indices of the medium and the particle (Modest 1993, Garcia-Lopez et al 2006). The scattering properties of a spherical particle in any medium are defined by the complex index of refraction:

$m = n - i\kappa$

where n is the real refractive index and κ is the imaginary refractive index (the absorptive index). The process of independent Mie scattering is also governed by the relative refractive index, defined as the ratio of the refractive index of the particle (subscript s) to that of the surrounding medium (subscript m):

$m_{rel} = m_s / m_m$

         The refractive index of microbes in visible light has been studied by several researchers. Balch et al (2000) found the median refractive index of four viruses to be 1.06, with a range of 1.03-1.26. Stramski and Kiefer (1991) assumed viruses to have a refractive index of 1.05. Mullaney and Dean (1970) assumed biological cells to have relative refractive indices of about 1.05 in visible light. Klenin (1965) found S. aureus to have a refractive index in the range 1.05-1.12. Petukhov (1964) gives the refractive index of certain bacteria as 1.37-1.4. No studies appear to address the real refractive index of bacteria or viruses at UV wavelengths except that of Hoyle and Wickramasinghe (1983), who suggest n_s = 1.43 as a reasonable choice for coliform bacteria. Water has a refractive index of about n_m = 1.4 in the ultraviolet range. If the refractive index of viruses (Balch's value) is scaled in proportion to that of water (from visible to UV), the estimated real refractive index is 1.06(1.4/1.33) = 1.12. Garcia-Lopez et al (2006) state that for soft-bodied biological particles n is between 1.04 and 1.45. All things considered, we choose n = n_s = 1.12 for the real refractive index of viruses under UV exposure. In multiple trials, any value in the range 1.03-1.45 had very little net impact on the fraction of scattered UV irradiance.
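The visible-to-UV scaling used to arrive at n_s = 1.12 is a single proportional step, reproduced below as a minimal sketch. The function name is hypothetical, and the water indices default to the values quoted in the text (1.33 in visible light, about 1.4 in the UV).

```python
def scale_index_to_uv(n_visible: float,
                      n_water_uv: float = 1.4,
                      n_water_visible: float = 1.33) -> float:
    """Scale a visible-light refractive index by the UV/visible ratio for water."""
    if n_water_visible <= 0:
        raise ValueError("n_water_visible must be positive")
    return n_visible * n_water_uv / n_water_visible


n_s = scale_index_to_uv(1.06)  # median visible-light value of Balch et al (2000)
print(round(n_s, 2))  # 1.12

# The same scaling applied to the 1.03-1.26 range reported for viruses:
print([round(scale_index_to_uv(n), 2) for n in (1.03, 1.26)])  # [1.08, 1.33]
```

The scaled range of 1.08-1.33 lies inside the 1.03-1.45 interval that, per the text, has little effect on the scattered fraction.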

          No information is available on the imaginary refractive index (the absorptive index) of viruses in the UV range. Per Garcia-Lopez et al (2006), hemoglobin has a κ of 0.01-0.15, while polystyrene has a κ of 0.01-0.82. We assumed κ = 1.4, numerically equal to the refractive index of water given above; any value in the range of the real refractive indices given above could be used instead, since the choice of κ has even less overall impact than the choice of the real refractive index. These values were used as input to a Mie scattering program (Prahl 2009) to estimate the effects of UV scattering at a wavelength of 253.7 nm, with a negligible particle concentration (0.000001 spheres/μm³).


          Table 1 summarizes, in its first eight columns, the primary parameters computed by the Mie scattering program (Prahl 2001), including the scattering efficiency (Qsca), the extinction efficiency (Qext), the absorption efficiency (Qabs), the scattering cross-section (Csca), the extinction cross-section (Cext), and the absorption cross-section (Cabs). The efficiency terms are essentially self-defining, but readers may consult the references for detailed definitions and further information on Mie theory (Modest 1993, van de Hulst 1957, Bohren and Huffman 1983). The scattering cross-section is the area which, when multiplied by the incident irradiance, gives the power scattered by the particle. The extinction cross-section is the area which, when multiplied by the incident irradiance, gives the total power removed from the incident wave by scattering and absorption. The final column shows the computed ratio of the scattering cross-section to the extinction cross-section, which represents the fraction of the irradiance removed from the incident wave that is scattered away rather than absorbed. This fraction is used to reduce the UV exposure dose for the microbes in the genomic model.
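The quantities in Table 1 are linked by standard Mie-theory identities: each efficiency is the corresponding cross-section divided by the particle's geometric cross-section πr², extinction is the sum of scattering and absorption, and the scatter correction factor is Csca/Cext. A minimal sketch follows, assuming cross-sections in μm² and diameter in μm; the function names are hypothetical and are not part of Prahl's program.

```python
import math


def geometric_cross_section(diameter_um: float) -> float:
    """Projected area pi * r**2 of a sphere, in um^2."""
    if diameter_um <= 0:
        raise ValueError("diameter must be positive")
    return math.pi * (diameter_um / 2.0) ** 2


def mie_summary(c_sca: float, c_abs: float, diameter_um: float) -> dict[str, float]:
    """Efficiencies and scattered fraction from scattering/absorption cross-sections.

    C_ext = C_sca + C_abs, Q = C / (pi r^2), scattered fraction = C_sca / C_ext.
    """
    if c_sca < 0 or c_abs < 0 or c_sca + c_abs == 0:
        raise ValueError("cross-sections must be non-negative and not both zero")
    g = geometric_cross_section(diameter_um)
    c_ext = c_sca + c_abs
    return {
        "Qsca": c_sca / g,
        "Qabs": c_abs / g,
        "Qext": c_ext / g,
        "scattered_fraction": c_sca / c_ext,
    }


# Hypothetical cross-sections for a 0.2 um particle (not values from Table 1).
print(mie_summary(c_sca=0.04, c_abs=0.02, diameter_um=0.2))
```

Note that nothing in these identities caps an efficiency at 1; in the example the efficiencies exceed unity, as the Table 1 values do in this size range.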


           Figure 4 illustrates three parameters from Table 1: the scattering efficiency, the absorption efficiency, and the scattered fraction of incident UV irradiance. The efficiencies exceed unity because of the extinction paradox: in this size range a particle can intercept more light than its geometric cross-section alone would suggest. The scattering efficiency increases sharply through the DNA virus size range, while the absorption efficiency peaks and then decreases. The fraction of scattered UV is relatively minor for most RNA viruses but increases sharply through the DNA virus size range, approaching a limit of about 0.68. The UV scatter values in the last column of Table 1 are hereafter used to decrease the incident UV irradiance (in effect decreasing the UV dose) and may be thought of as correction factors.


          Table 2 shows the diameters of the viruses used in this study and the associated UV scatter correction factors, which are later applied to the raw D90 values shown in Tables 3 and 4. Virus diameters were obtained from various sources (e.g., Kowalski 2006) and from several online databases. Diameters are generally logmean values of the smallest dimension or logmean values of ovoid envelopes, since the logmean value best represents the natural distribution when multiple sizes or a range of sizes are reported. For larger enveloped viruses, secondary UV scattering may also occur in the nucleocapsid, but this effect is ignored in the current model.
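The two steps described above, reducing a reported size range to a single logmean diameter and applying the scatter correction factor to a raw D90, can be sketched as follows. This assumes that "logmean" means the geometric (log-space) mean of the reported sizes and that the correction scales the dose by the unscattered fraction, 1 − f; both are interpretations rather than formulas stated in the text, and the function names are hypothetical.

```python
import math


def logmean_diameter(sizes_nm: list[float]) -> float:
    """Geometric (log-space) mean of reported sizes; sqrt(min*max) for a two-value range."""
    if not sizes_nm or any(s <= 0 for s in sizes_nm):
        raise ValueError("sizes must be a non-empty list of positive values")
    return math.exp(sum(math.log(s) for s in sizes_nm) / len(sizes_nm))


def effective_d90(raw_d90: float, scattered_fraction: float) -> float:
    """Reduce a raw D90 to the dose actually reaching the virion (assumed: raw * (1 - f))."""
    if not 0.0 <= scattered_fraction < 1.0:
        raise ValueError("scattered fraction must be in [0, 1)")
    return raw_d90 * (1.0 - scattered_fraction)


d = logmean_diameter([70.0, 90.0])   # hypothetical 70-90 nm size range
print(round(d, 1))                    # 79.4
print(effective_d90(raw_d90=40.0, scattered_fraction=0.25))  # 30.0
```

Averaging in log space keeps a wide range (for example 10-1000 nm) from being dominated by its upper end, which is the usual motivation for a logmean.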


The Genomic UV Susceptibility Model

Double-stranded DNA viruses are likely to be more resistant to UV than single-stranded viruses, and therefore separate models for ssRNA and dsDNA are appropriate (Gerba et al 2002). Van der Eb and Cohen (1967) demonstrated that the double-stranded form of Polyoma virus DNA was four times more resistant to UV inactivation. Capsid structure, as well as nucleic acid size, render double-stranded DNA less susceptible to UV inactivation (Thurston-Enriquez et al 2003). Based on an extensive review of UV rate constants (data not shown), this does appear to be the case, with dsDNA and dsRNA viruses having almost half the UV rate constant of ssRNA and ssDNA viruses.

          The disruption of normal DNA processes occurs as the result of the formation of photodimers, but not all photoproducts appear with the same frequency. Purines are approximately ten times more resistant to photoreaction than pyrimidines (Smith and Hanawalt 1969). Minor products other than cyclobutane pyrimidine dimers (CPDs), such as interstrand cross-links, chain breaks, and DNA-protein links, occur much less frequently, typically at less than 1/1000 of the number of cyclobutane dimers, while hydrates may occur at about 1/10 the frequency of cyclobutane dimers (Setlow 1966). Although irradiated vegetative cells produce large amounts of cyclobutane pyrimidine dimers, the thymine-containing photoproducts isolated from bacterial spores do not include cyclobutane pyrimidine dimers but instead include spore photoproducts. When the spore transforms to a vegetative state, spore photoproducts decrease and thymine dimers increase. Spore photoproducts also appear in dry DNA (A conformation) and in RNA, which is permanently in the A conformation. For DNA under dry conditions (A-DNA), thymine dimers decrease and the spore photoproduct forms and can become the dominant photoproduct (Rahn and Hosszu 1969). The rate of spore photoproduct formation is unaffected by high concentrations of thymine dimers, but high concentrations of spore photoproduct inhibit dimer formation. Wang (1964) first suggested that dimerization is favored when adjacent pyrimidine triplets in ice are suitably oriented and positioned. Base composition can affect the intrinsic sensitivity of DNA to UV irradiation (Smith and Hanawalt 1969). The specific sequence of adjacent base pairs, as well as the frequency of thymines, can be determinants of UV sensitivity. Setlow and Carrier (1966) stated that the probability of photodimerization is approximately proportional to the nearest-neighbor frequencies of the various pyrimidine sequences.
Some 80% of pyrimidines and 45% of purines form UV photoproducts in double-stranded DNA, per studies by Becker and Wang (1989), who also showed that purines form dimers only when adjacent to a pyrimidine doublet. The formation of purine dimers requires the transfer of energy from neighboring pyrimidines, and occurs only when those pyrimidines lie on the 5' side of the purine base (50% probability). Becker and Wang (1985) formulated these simple rules for sequence-dependent DNA photoreactivity:

1. Whenever two or more pyrimidine residues are adjacent to one another, photoreactions are observed at both pyrimidines.

2. Non-adjacent pyrimidines, surrounded on both sides by purines, exhibit little or no photoreactivity.

3. The only purines that readily form UV photoproducts are those that are flanked on their 5’ side by two or more contiguous pyrimidine residues.
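Rules 1-3 translate directly into a scan over a single strand read 5'→3'. The minimal Python sketch below marks each base that the rules classify as photoreactive. It assumes U is treated as a pyrimidine for RNA, treats any other symbol (e.g., N) as breaking a pyrimidine run, ignores the complementary strand, and does not weight sites by probability. The function name is hypothetical.

```python
PYRIMIDINES = frozenset("CTU")
PURINES = frozenset("AG")


def photoreactive_sites(seq: str) -> list[int]:
    """0-based indices of bases predicted photoreactive by Becker and Wang's rules.

    Rule 1: every pyrimidine in a run of two or more contiguous pyrimidines reacts.
    Rule 2: an isolated pyrimidine (flanked by non-pyrimidines) does not react.
    Rule 3: a purine reacts only if its 5' side carries two or more contiguous
            pyrimidines, i.e. it immediately follows (3' of) such a run.
    """
    s = seq.upper()
    reactive: list[int] = []
    i = 0
    while i < len(s):
        if s[i] not in PYRIMIDINES:
            i += 1
            continue
        j = i
        while j < len(s) and s[j] in PYRIMIDINES:  # find the end of this pyrimidine run
            j += 1
        if j - i >= 2:
            reactive.extend(range(i, j))           # rule 1
            if j < len(s) and s[j] in PURINES:     # rule 3
                reactive.append(j)
        i = j                                      # rule 2: runs of length 1 are skipped
    return reactive


print(photoreactive_sites("ATTGCACCCA"))  # [1, 2, 3, 6, 7, 8, 9]
```

In the example, the isolated C at index 4 is skipped (rule 2), while the purines at indices 3 and 9 are counted because each sits immediately 3' of a pyrimidine run (rule 3).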

These rules can be used to extract information from DNA and RNA genomes and allow computation of the relative probability of photoreactions, a parameter that can be compared directly with UV rate constants as a possible predictor. Table 1 summarizes these rules in terms that can be computed numerically. The doublets and triplets in Table 3 were counted using base-counting software written by the author (in C++) and applied to genomes obtained from NCBI (2009). Similar base-counting (wordcount) programs are publicly available, such as EMBOSS (Rice et al 2000).
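A wordcount-style tally of the dimerizable doublets and purine-containing triplets can be sketched in a few lines of Python. This is not the authors' C++ program. It assumes overlapping counting on the sequence as read, pools TC and CT into one "TC" class, maps U to T so RNA genomes can be passed directly, and optionally adds the reverse-complement strand for double-stranded genomes. All names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (U should already be mapped to T)."""
    return seq.translate(_COMPLEMENT)[::-1]


def _count_strand(s: str, counts: Counter) -> None:
    """Add overlapping doublet and pyrimidine-pyrimidine-purine triplet counts for one strand."""
    for a, b in zip(s, s[1:]):
        if a == "T" and b == "T":
            counts["TT"] += 1
        elif {a, b} == {"T", "C"}:        # TC and CT pooled
            counts["TC"] += 1
        elif a == "C" and b == "C":
            counts["CC"] += 1
    for a, b, c in zip(s, s[1:], s[2:]):
        if a in "CT" and b in "CT" and c in "AG":   # purine 3' of a pyrimidine doublet
            counts["YYR"] += 1


def count_dimer_sites(seq: str, double_stranded: bool = False) -> dict[str, int]:
    """Counts of TT, TC/CT, CC doublets and YYR triplets in a genome sequence."""
    s = seq.upper().replace("U", "T")
    counts: Counter = Counter()
    _count_strand(s, counts)
    if double_stranded:
        _count_strand(reverse_complement(s), counts)
    return {key: counts[key] for key in ("TT", "TC", "CC", "YYR")}


print(count_dimer_sites("TTCCAG"))                        # {'TT': 1, 'TC': 1, 'CC': 1, 'YYR': 1}
print(count_dimer_sites("TTCCAG", double_stranded=True))  # {'TT': 1, 'TC': 2, 'CC': 1, 'YYR': 2}
```

Counting is overlapping, so a run such as TTT contributes two TT doublets. Whether the original software counted this way is not stated in the text.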

Ignoring the other factors that may determine the UV rate constant, such as the protein coat in viruses and the cell walls of bacteria, about which not enough is known, a function can be written to sum the dimerization probabilities. The probability density map of a spherical genome can be represented by a circular cross-section of the sphere subject to a collimated beam of irradiance. The volume of the sphere will be directly proportional to the genome size, since the nucleic acids are essentially tightly packed inside a capsid, and because almost all animal viruses of interest are spherical or ovoid, or possess a spherical capsid atop a tail. The diameter of the model sphere is therefore proportional to the cube root of the number of base pairs (bp) in the genome, and the area of the cross-section is proportional to the square of that cube root, as illustrated in Figure 4. The dimerization probabilities can be viewed as collapsed onto a circular cross-section exposed to a collimated beam of UV rays. The probability map is for illustrative purposes only; in this model, the square root of the total dimer probabilities is assumed to be distributed evenly across the cross-sectional area.
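Taking the stated proportionality between sphere volume and genome size at face value, the radius scales as bp^(1/3) and the projected area as bp^(2/3). The short sketch below expresses the cross-sectional area of one genome relative to another under that assumption; it is a geometric illustration only, and the function name is hypothetical.

```python
def relative_cross_section(bp: float, bp_ref: float) -> float:
    """Projected area of the model sphere relative to a reference genome.

    Assumes volume is proportional to genome length, so radius ~ bp**(1/3)
    and cross-sectional area ~ bp**(2/3).
    """
    if bp <= 0 or bp_ref <= 0:
        raise ValueError("genome sizes must be positive")
    return (bp / bp_ref) ** (2.0 / 3.0)


# An 8-fold larger genome gives roughly a 4-fold larger cross-section.
print(round(relative_cross_section(80_000, 10_000), 6))  # 4.0
```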

Figure 4: The spherical model of DNA has a circular cross-section with a collapsed dimerization probability density map subject to collimated UV rays.

The square root of the dimer probabilities, counted as per Table 1, is used because analysis showed that it produces the best overall fit (for both RNA and DNA); without further theoretical justification, the dimerization probability equation for ssRNA viruses is therefore written:

Some evidence is available in the literature to provide starting estimates of the dimer proportionality constants. Per Setlow and Carrier (1966), the average ratio for three bacteria is 1:0.25:0.13. Patrick (1977) suggests ratios of 1:1:1, and Unrau (1973) found a ratio of 1:0.5:0.5. Meistrich et al (1970) indicate that in E. coli DNA the proportions of TT, CT, and CC dimers are in the ratio 1:0.8:0.2, as did Lamola (1973). Table 4 lists 62 of the 70 virus data sets that were used in the ssRNA model, along with the average rate constants and average D90 values representing 27 single-stranded RNA viruses. These D90 values are not adjusted for UV scatter (per the Table 2 correction factors).
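Because Table 4 reports both average rate constants and average D90 values, it may help to recall how the two are related under the single-stage first-order model commonly assumed for UV inactivation, S = exp(−kD), which gives D90 = ln(10)/k. The sketch assumes that model and consistent reciprocal units (for example k in cm²/mJ and D90 in mJ/cm²); the function names are hypothetical. Since D90 is inversely proportional to k, the average of several k values does not convert to the average of the corresponding D90 values.

```python
import math

LN10 = math.log(10.0)  # natural log of 10, about 2.303


def d90_from_k(k: float) -> float:
    """Dose giving 90% inactivation (one log reduction) for rate constant k."""
    if k <= 0:
        raise ValueError("rate constant must be positive")
    return LN10 / k


def k_from_d90(d90: float) -> float:
    """Rate constant implied by a D90 value (inverse of d90_from_k)."""
    if d90 <= 0:
        raise ValueError("D90 must be positive")
    return LN10 / d90


def surviving_fraction(k: float, dose: float) -> float:
    """First-order survival S = exp(-k * dose)."""
    return math.exp(-k * dose)


ks = [0.10, 0.20]                              # hypothetical rate constants, cm^2/mJ
mean_of_d90 = sum(d90_from_k(k) for k in ks) / len(ks)
d90_of_mean_k = d90_from_k(sum(ks) / len(ks))
print(round(mean_of_d90, 2), round(d90_of_mean_k, 2))  # 17.27 15.35
```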

Figure 6: Plot of Dv versus effective UV dose for DNA viruses. The D90 is the effective dose because it has been corrected for UV scattering. The line represents a curve fit (equation shown on graph). A total of 77 data sets, representing 22 viruses, were used and weighted in the curve fit.

          The lower R² value may be due to the previously mentioned factor of size: DNA viruses are larger than RNA viruses and may have more innate photoprotection. No available data were omitted from Figure 5 other than a few redundant data sets, and the only real outliers are the four Adenovirus data sets at about Dv = 0.7. Adenovirus is unusually resistant to UV and may have a chromophore-rich capsid that protects the DNA from UV damage, or may have robust photorepair mechanisms. Adenoviruses also have hemagglutinins on their outer surfaces that may cause them to clump or aggregate, and the aggregation of cells or virions can drastically affect absorbance through scattering of the incident light (Smith and Hanawalt 1969). Future research on such outliers may provide insight into photoprotection that will lead to improved models of UV susceptibility.
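The Figure 6 caption notes that the 77 DNA data sets were weighted in the curve fit for the 22 viruses. The form of the fitted curve appears only on the graph, so the sketch below assumes a straight line fitted by weighted least squares, with one point per virus and a weight such as the number of data sets behind that point; it also returns the weighted R². The function name and weighting scheme are assumptions.

```python
def weighted_linear_fit(x: list[float], y: list[float],
                        w: list[float]) -> tuple[float, float, float]:
    """Weighted least-squares line y = a + b*x.

    Returns (intercept a, slope b, weighted R^2). Integer weights behave like
    repeating each point w times.
    """
    if not (len(x) == len(y) == len(w)) or len(x) < 2:
        raise ValueError("x, y, w must have equal length >= 2")
    if any(wi <= 0 for wi in w):
        raise ValueError("weights must be positive")
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum(wi * (yi - (a + b * xi)) ** 2 for wi, xi, yi in zip(w, x, y))
    return a, b, 1.0 - ss_res / syy


# Toy data: three "viruses" with 1, 4 and 2 data sets each (not values from this study).
a, b, r2 = weighted_linear_fit([0.5, 1.0, 1.5], [20.0, 12.0, 7.0], [1, 4, 2])
print(round(a, 2), round(b, 2), round(r2, 3))
```

Weighting by the number of data sets lets well-studied viruses pull the line harder without duplicating rows; an unweighted fit is recovered by setting every weight to 1.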

           Table 6 compares published estimates of the relative proportions of the various dimer types with the values used in the previous models. The factors shown in the table are the three constants in equations (7) and (16). The best-fit constants are those used in the model in the previous figures. The zero values assumed for constants not given by the indicated sources did not greatly influence the R² value. The hyperchromicity factor was zero for all RNA models and was kept at 0.67 for all DNA models. The results for the DNA model are shown with and without corrections for UV scattering, which make about a 12% difference in the DNA model but only a 1% difference in the RNA model, as would be expected from the smaller size of RNA viruses. Hyperchromicity had no effect on the RNA model but produced a 1% improvement in the DNA model.




Conclusions

A mathematical model has been presented for predicting the UV susceptibility of RNA and DNA viruses based on base-counting of potential dimers in the virus genomes. The results correlate well with available data on UV rate constants. This model has been used to estimate UV rate constants for a range of pathogenic animal viruses and bioweapon agents whose complete genomes were available from the NCBI database, and Table 7 summarizes these predictions. Minimum and maximum D90 values are listed, corresponding to confidence intervals (CIs) of 86% for DNA viruses and 93% for RNA viruses. These CIs reflect only the spread of the data as summarized and do not include any uncertainty in the original 147 data sets, most of which included no error analysis. These rate constant predictions remain to be corroborated by future laboratory testing. Future research will include application of the DNA model to bacteria. Although this genomic model is based on UV rate constants in water, it also has a direct bearing on airborne UV rate constants: water-based rate constants represent a limit toward which airborne rate constants converge at high humidity (Peccia et al 2001), so establishing a theoretical basis for the UV susceptibility of viruses in water makes it possible to link them to airborne rate constants. The variation of UV rate constants with relative humidity (RH) in air is also a function of DNA conformation, which in turn determines the relative ratios of pyrimidine dimers, so a more fundamental understanding of RH effects, and a testable model, may now be possible. Future research into a more complete model of virus inactivation that addresses the photoprotective effects of UV scattering and UV absorption by viral envelopes and nucleocapsids may lead to even greater predictive accuracy.
The accuracy of the present model may also be improved as more genomes and UV rate constant data become available and as more precise UV experiments are performed using collimated beam systems. The authors hope that researchers will be inclined to either challenge or confirm the predictions in Table 7. If the predictions are confirmed, this model may ultimately enable UV susceptibilities of dangerous pathogens to be determined without the risk of handling them in laboratory tests. The novel approach developed for this research, the use of base-counting software to establish dimerization probabilities, may also have applications in fields unrelated to air and water disinfection, such as ultraviolet photochemistry, mutation research, and solar mutagenesis or skin cancer research.



