Help developing a novel association measure between two datasets using t-SNE
I am an MSc student in computing, currently writing my thesis on radiogenomics (imaging genomics) for brain cancer research, and I would appreciate your help.
My supervisor and I are thinking about developing a novel tool to test for association between two datasets.
In this project I will have two datasets covering the same subjects: i) gene mutations, and ii) imaging data.
One way to find associations between the multi-view datasets is to perform Canonical Correlation Analysis (CCA). However, this has been done before and often with poor results.
So what we are thinking about is the following algorithmic approach:
i. Reduce the dimensionality of both datasets using either PCA (reduction) or a random forest (selection).
ii. Visualise both datasets separately with t-SNE.
iii. Compare neighbours: pick an ID/subject in one visualisation and find its neighbours. Then pick the same ID in the other visualisation and check whether its neighbours are the same.
iv. Do this for all IDs.
v. Use some metric to assess performance. If the neighbours are similar, the score should be high.
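To make steps iii to v concrete, here is a minimal sketch, assuming both t-SNE embeddings are already computed and stored as lists of 2-D coordinates in the same subject order. The function names `knn_sets` and `neighbour_overlap`, the toy coordinates, and the use of Jaccard overlap as the per-subject score are all illustrative assumptions, not a settled design.

```python
import math

def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point (self excluded).
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours

def neighbour_overlap(emb_a, emb_b, k):
    """Per-subject Jaccard overlap between the k-NN sets of two embeddings.

    Row i of emb_a and row i of emb_b must describe the same subject.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("both embeddings must cover the same subjects")
    nn_a = knn_sets(emb_a, k)
    nn_b = knn_sets(emb_b, k)
    return [len(a & b) / len(a | b) for a, b in zip(nn_a, nn_b)]

# Hypothetical toy embeddings for 6 subjects (two groups of three).
genes_2d = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (5.0, 5.3)]
imaging_2d = [(1.0, 1.0), (1.2, 1.0), (1.0, 1.1), (9.0, 9.0), (9.0, 9.4), (9.3, 9.0)]

scores = neighbour_overlap(genes_2d, imaging_2d, k=2)
mean_score = sum(scores) / len(scores)  # step v: one aggregate number
```

In this toy case both embeddings keep the same two groups together, so every subject has identical neighbours in both views. On real data the per-subject scores would vary, and they could be inspected individually as well as averaged.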
How does this sound? More concretely:
1) Does the overall approach make sense?
2) To identify neighbours, I am thinking of using the perplexity parameter of t-SNE (i.e. if perplexity = 30, pick the 30 closest neighbours). Is that reasonable?
3) What would be a good metric to report aggregate results for this new algorithm?
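For question 3, one candidate is the mean fraction of shared neighbours, compared against a permutation baseline in which the subject IDs of one embedding are shuffled. The sketch below is only an assumption about how that could look; `mean_knn_overlap`, `permutation_p_value`, the toy coordinates and the default of 200 permutations are hypothetical choices.

```python
import math
import random

def mean_knn_overlap(emb_a, emb_b, k):
    """Mean fraction of the k nearest neighbours shared between two embeddings."""
    def knn(points, i):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(points[i], points[j]))
        return set(order[:k])
    n = len(emb_a)
    return sum(len(knn(emb_a, i) & knn(emb_b, i)) / k for i in range(n)) / n

def permutation_p_value(emb_a, emb_b, k, n_perm=200, seed=0):
    """Compare the observed overlap with overlaps under shuffled subject IDs."""
    rng = random.Random(seed)
    observed = mean_knn_overlap(emb_a, emb_b, k)
    at_least_as_high = 0
    for _ in range(n_perm):
        shuffled = list(emb_b)
        rng.shuffle(shuffled)  # breaks the subject correspondence between views
        if mean_knn_overlap(emb_a, shuffled, k) >= observed:
            at_least_as_high += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_high + 1) / (n_perm + 1)

# Hypothetical toy embeddings for 6 subjects, same subject order in both lists.
view_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (5.0, 5.3)]
view_b = [(1.0, 1.0), (1.2, 1.0), (1.0, 1.1), (9.0, 9.0), (9.0, 9.4), (9.3, 9.0)]

observed, p_value = permutation_p_value(view_a, view_b, k=2)
```

The point of the baseline is that some overlap is expected by chance, especially when k is large relative to the number of subjects, so a raw mean overlap is hard to interpret on its own.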
Thanks a lot!