Feature extraction in high-dimensional data
Hi All:
I have a question for you. Given a training set of n m-dimensional
samples, if n is not significantly larger than m, what would be a good
number of features to extract? We definitely don't want to use all m
dimensions.
In other words: for two sets of random noise samples drawn from the
same normal distribution, n samples each with m dimensions, what is the
expected separability value (e.g. Jeffries-Matusita distance) between
the two sets? Of course, the means and covariances will be estimated
from the sample sets. We all know that when n >> m the JM value
approaches zero, but what about when n is of the same magnitude as m?
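A quick way to get a feel for this is to simulate it. The sketch below (a minimal illustration, not a definitive answer) draws two sets of n standard-normal samples in m dimensions, fits a Gaussian to each with the sample mean and covariance, and computes the JM distance JM = 2(1 - exp(-B)), where B is the Bhattacharyya distance. The function name `jm_distance` and the particular n values are my own choices for illustration:

```python
import numpy as np

def jm_distance(x1, x2):
    """Jeffries-Matusita distance between two sample sets, each modelled
    as a Gaussian via its sample mean and sample covariance.
    JM = 2*(1 - exp(-B)), B = Bhattacharyya distance."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    c1 = np.cov(x1, rowvar=False)
    c2 = np.cov(x2, rowvar=False)
    c = (c1 + c2) / 2.0          # pooled covariance
    diff = m1 - m2
    # Mean term of the Bhattacharyya distance
    mean_term = 0.125 * diff @ np.linalg.solve(c, diff)
    # Covariance term, using log-determinants for numerical stability
    _, logdet_c = np.linalg.slogdet(c)
    _, logdet_c1 = np.linalg.slogdet(c1)
    _, logdet_c2 = np.linalg.slogdet(c2)
    b = mean_term + 0.5 * (logdet_c - 0.5 * (logdet_c1 + logdet_c2))
    return 2.0 * (1.0 - np.exp(-b))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = 10  # dimensionality; n must exceed m for invertible covariances
    for n in (15, 30, 100, 1000, 10000):
        jm = np.mean([jm_distance(rng.standard_normal((n, m)),
                                  rng.standard_normal((n, m)))
                      for _ in range(20)])
        print(f"n = {n:6d}   mean JM = {jm:.4f}")
```

In my runs the estimated JM between two identical-distribution noise sets is far from zero when n is only slightly larger than m (the estimated covariances are nearly singular, so the distributions look spuriously separable) and decays toward zero as n grows, which is exactly the bias the question is about.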
I'd appreciate your input.