Dear list,

What on earth is spec.pgram() normalized too? If you would like to skip my

proof as to why it's not normed too the mean squared or sum squared

amplitude of the discrete function a[], feel free too skip the rest of the

message. If it is, but you know why it's not exact in spec.pgram() when it

should be, skip the rest of this message. The issue I refer herein refers

only too a single univariate time series, and may not reflect the issues

encountered in 3 or more dimensions.

I've been using Numerical Recipes too confirm my own routines, because the

periodogram estimate is not normalized to the sum squared, nor the mean

squared of the original discrete function, a[]. I'd expect the

non-overlapped, and non-tapered version of the periodogram too sum to

EXACTLY too some normalization of the discrete signal a[], and the word

"estimate" should come into play only when making inferences about the real

signal 'a' that the discretely sampled signal came from.

>a <- sin(2*pi*(0:127)/16)

>N <- length(a) # 128

>PSD <- spec.pgram(a, spans=NULL, detrend=F)

## Sum Squared amplitude of a[t] on [0, 127].

>sum(abs(a)^2)

[1] 64

## Mean Squared amplitude of a[t] on [0, 127]

sum(a^2)/N

[1] 0.5

By Parseval's theorem, the integral of the one-sided PSD over zero and

positive frequencies should equal the mean squared or sum squared amplitude

of the discrete signal a[], assuming the PSD is normalized too the

mean-square or sum squared of the signal a[] respectively.

## Integral of the PSD returned by spec.pgram() does not equal MS or SS of

a[]

>sum(PSD$spec)

[1] 32.28501

Since the integral over positive frequencies does not equal the mean or sum

squared of the signal, I did some digging. The documentation for

spec.pgram() states that only the power at positive frequencies is returned

'due to symmetry'. Press, et al (2002) make mention of this situation.

"Be warned that one occasionally sees PSDs defined without this factor two.

These, strictly speaking, are called two-sided power spectral densities, but

some books are not careful about stating whether one- or two-sided is to be

assumed" (2002, p. 504).

As a result, I infer that spec.pgram() is returning what Press, et al. would

call a two-sided PSD. Thus, the power spectrum returned by spec.pgram()

should be multiplied by 2 between (DC, nyquist) non-inclusive, and the mean

can be resolved separately as the DC component is not returned by

spec.pgram(), as:

## N/2 used here because the length of PSD$spec is N/2 rather than N/2 + 1

# as it does not return a DC value.

>mean(a) + PSD$spec[floor(N/2)] + 2 * sum(PSD$spec[2:floor(N/2)-1])

[1] 64.54347

This situation is closer, and indicates that the normalization is likely too

the sum squared amplitude of a[], but it is not EXACT, as I would expect.

But I also checked a manual PSD estimate just to show the power spectrum of

a single periodicity would actually match the mean or sum squared amplitude

of a[] (the discrete function) exactly.

>a.fft <- fft(a)

# Two-sided estimate normed to SS Amplitude

>1/N * sum(Mod(a.fft[1:floor(N/2)+1])^2)

[1] 32

# One-sided estimate normed to SS Amplitude. The first part gets the

# quantities across the nyquist range from 0 too fc. The next 2 lines

# take out the factor 2 from DC and nyquist since those 2 terms are not

# doubled in the one-sided estimate.

>a.PSD <- 1/N * 2 * Mod(a.fft[1:(floor(N/2)+1)])^2

>a.PSD[1] <- a.PSD[1]/2 #DC freq

>a.PSD[floor(N/2)+1] <- a.PSD[floor(N/2)+1]/2

>sum(a.PSD)

[1] 64

This proves that the integral of the one-sided PSD estimate of the discrete

function a[] is actually exactly the same as the sum squared amplitude of

a[], at least in this simple case. Likewise when the periodogram is

normalized to the mean-squared amplitude of a[],

>sum(a.PSD)/N # MS amplitude of a[]

[1] 0.5

## So I don't have to dimension a.PSD[], I took twice the modulus squared

# of f[0] too f[c] inclusive all at once, and took out the erroneous

# factor 2 from f[1] and f[c] independently

>a.PSD <- 1/(N^2) * 2 * Mod(a.fft[1:(floor(N/2)+1)])^2

>a.PSD[1] <- a.PSD[1]/2

>a.PSD[floor(N/2)+1] <- a.PSD[floor(N/2)+1]/2

Of note, is that I have yet to encounter a case where the normalized raw

periodogram didn't equal the quantity it was normalized too, even when

segmenting data into multiple PSD estimates, and taking their average at

each frequency. I have not applied a 'window function' (Welch, Hanning, etc)

yet, but the only case I encountered where it's not exact is with

overlapping segments, and that, I suspect, is because the first section of

the first window, and last section of the last window are only used once as

opposed to twice like all of the other data points. If the last segment

wrapped around too the first in order too form the last window, I'd expect

the norming to be exact again. I have yet to encounter an estimate returned

by spec.pgram() which actually did equal the norming conventions that I've

expected too see (MS or SS amplitude).

I have tried various combinations of parameters in spec.pgram() to turn off

tapering and spans, so that essentially there is no leakage correction of

the data so I could reproduce the MS and SS amplitudes exactly, just as I

did calculating them independently. Either that can not be done with

spec.pgram(), or spec.pgram() is normalized too something other than the sum

squared amplitude of the signal a[].

What does spec.pgram() normalize too? Perhaps the periodogram returned by

spec.pgram() is normalized to the variance or some other quantity? If so,

perhaps someone could indicate why that particular norming convention is

used?

The reason that this is so important is that the harmonic content fit by

least squares used in the chi-square periodogram is normalized to the

variance (I think), and so too is the Lomb periodogram. Those periodograms

can be used to get p-values from chi-square and exponential distributions

respectively, too assess "important" frequencies, but they're very slow

transforms. The fast fourier transform and/or periodograms can be related

too both methods, for fast computation (albeit, especially for the Lomb

method, it's not a direct relationship), so if the justification for

relating the fft's to any particular method for fast computation depends

partly on what the data are normed too, then it is important to be very

explicit about what conventions are used in any given routine. That way any

transforms on fourier amplitudes or, in the case of a periodogram, powers -

can be translated too and from one form of estimate or another; or if they

cannot be translated, the reason would be evident in the stated conventions.

Thus, I should hope that if spec.pgram() is normed too the SS amplitude of

the discrete function, someone could communicate why it isn't exactly the SS

amplitude, or conversely what the norming convention is.

Thank you very much for your feedback,

KeithC. - an aspiring signal analysis guru & honorary enginerd

[hidden email]
Psych Undergrad, CU Boulder

RE McNair Scholar

"Perhaps we're the reason they call it 'psycho'physics..."

______________________________________________

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-helpPLEASE do read the posting guide!

http://www.R-project.org/posting-guide.html