Hi,

Apologies for this question being off-topic (OT) for R, though I expect
this list might be the best place on the net to ask it.

In brief, the question is: what classification algorithm can one use if
the features are histograms?

I have a classification problem, and believe that histograms of the
distribution of some values may be the best "feature" to use.

To keep this mail short, here is a simpler example problem: try to
classify a person as, e.g., drunk or not, given the histogram of their
driving speed.

In the training phase, we have a table whose rows contain the driver,
whether they are drunk, and a sample of driving speed. From this one
can build separate histograms of driving speed for drunk and non-drunk
drivers.
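
To make the training step concrete, here is a minimal sketch in Python
of building one normalized histogram per class from such a table. The
bin range, the bin count, the function names `bin_index` and
`class_histograms`, and the rows themselves are made up for
illustration; they are not from my actual data.

```python
from collections import defaultdict

def bin_index(speed, lo, hi, n_bins):
    """Map a speed to a bin index in [0, n_bins), clamping out-of-range values."""
    width = (hi - lo) / n_bins
    idx = int((speed - lo) // width)
    return min(max(idx, 0), n_bins - 1)

def class_histograms(rows, lo=0.0, hi=200.0, n_bins=10):
    """Return {label: normalized histogram} from (driver, label, speed) rows."""
    counts = defaultdict(lambda: [0] * n_bins)
    for _driver, label, speed in rows:
        counts[label][bin_index(speed, lo, hi, n_bins)] += 1
    hists = {}
    for label, c in counts.items():
        total = sum(c)
        # Divide by the class total so each histogram sums to 1.
        hists[label] = [x / total for x in c]
    return hists

# Hypothetical training table: (driver, drunk?, speed in km/h).
rows = [
    ("a", True, 35.0), ("a", True, 95.0), ("a", True, 120.0),
    ("b", False, 60.0), ("b", False, 65.0), ("c", False, 70.0),
]
hists = class_histograms(rows, lo=0.0, hi=200.0, n_bins=10)
```

Normalizing each class separately keeps the two histograms comparable
even when one class has many more samples than the other.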

(In my actual application, I have several such histogram features, and
they are visibly different; they are also currently ranked by some
analytic pdf-distance measures such as KL divergence.)
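
For reference, a KL-based ranking score between two class histograms
could be sketched as below. The smoothing constant `eps` and the
symmetrized variant are assumptions added for the sketch (plain KL is
undefined when a bin is empty in one histogram), not necessarily what
my ranking uses.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in nats between two histograms over the same bins.

    eps is added to every bin before renormalizing, so empty bins do not
    cause log(0) or division by zero.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    ps = [x + eps for x in p]
    qs = [x + eps for x in q]
    sp, sq = sum(ps), sum(qs)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(ps, qs))

def symmetric_kl(p, q, eps=1e-9):
    """KL is not symmetric; summing both directions gives a symmetric score."""
    return kl_divergence(p, q, eps) + kl_divergence(q, p, eps)

# Hypothetical two-bin histograms for the two classes.
score = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

A larger score means the two class histograms for that feature differ
more, which is the sense in which the features can be ranked.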

Now, how to classify... Given a single speed, its probability can be
evaluated under the two classes, but a single speed sample is not going
to be reliable in this problem.
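
Evaluating speeds under the two class histograms might look like the
sketch below. Extending it from one speed to many by summing log
probabilities assumes the speed samples are independent given the
class, and taking the larger score assumes equal class priors; both are
assumptions of the sketch. The histograms, the speed range, the `floor`
value, and the function names are hypothetical.

```python
import math

def log_likelihood(speeds, hist, lo, hi, floor=1e-6):
    """Sum of log bin probabilities of the speeds under one class histogram.

    floor replaces zero-probability bins so a single unseen speed does not
    drive the whole score to minus infinity.
    """
    n_bins = len(hist)
    width = (hi - lo) / n_bins
    total = 0.0
    for s in speeds:
        idx = min(max(int((s - lo) // width), 0), n_bins - 1)
        total += math.log(max(hist[idx], floor))
    return total

def classify(speeds, class_hists, lo, hi):
    """Return (best label, per-class scores), assuming equal class priors."""
    scores = {label: log_likelihood(speeds, h, lo, hi)
              for label, h in class_hists.items()}
    return max(scores, key=scores.get), scores

# Hypothetical class histograms over four bins covering 0..120 km/h.
class_hists = {
    "drunk": [0.30, 0.10, 0.20, 0.40],
    "sober": [0.05, 0.45, 0.45, 0.05],
}
erratic_label, _ = classify([20, 110, 115, 25, 100], class_hists, 0.0, 120.0)
steady_label, _ = classify([50, 60, 70, 45], class_hists, 0.0, 120.0)
```

With a single speed the two scores are often close; with more samples
the gap between the scores generally widens, which is why the whole
sample is more informative than one reading.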

Suppose instead that the _distribution_ of speeds is sufficient to
discriminate.

We have a driver, and a distribution of their speeds over time. A
histogram can be built. What to do with this histogram?

Is there a standard classifier that can deal with this situation?

My thought(s):

- The test histogram could be compared to each of the training
  histograms with the Chi^2 measure (a sum of squared Gaussian
  deviations), and a probability then obtained from this?

- Alternatively, consider training histograms with N bins as points in
  N-dimensional space, and use Euclidean closeness in this space.
  This may not generalize to more than one such histogram feature,
  though.
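
Both thoughts amount to nearest-histogram classification with a
different distance, roughly as sketched below. Note that this uses the
symmetric chi-square distance often used to compare two histograms,
not Pearson's test statistic, and it returns a label rather than a
probability. The training histograms and function names are made up
for illustration.

```python
import math

def chi2_distance(p, q):
    """Symmetric chi-square distance: 0.5 * sum (p_i - q_i)^2 / (p_i + q_i).

    Bins that are empty in both histograms are skipped to avoid 0/0.
    """
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def euclidean_distance(p, q):
    """Treat each N-bin histogram as a point in N-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_label(test_hist, training, distance):
    """Return the label of the training histogram closest to test_hist.

    training is a list of (label, histogram) pairs.
    """
    label, _ = min(training, key=lambda item: distance(test_hist, item[1]))
    return label

# Hypothetical training histograms, one per class.
training = [
    ("drunk", [0.30, 0.10, 0.20, 0.40]),
    ("sober", [0.05, 0.45, 0.45, 0.05]),
]
test_hist = [0.25, 0.15, 0.20, 0.40]
by_chi2 = nearest_label(test_hist, training, chi2_distance)
by_euclid = nearest_label(test_hist, training, euclidean_distance)
```

With several histogram features, one possible extension (an assumption,
not something I have tested) is to sum the per-feature distances before
taking the minimum.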

Thanks for any thoughts.

(Also thanks for the replies to my recent question about
hashtable/dictionary.)
