I have been learning about Machine Learning (via Udacity) and Statistics (via Coursera) over the past few months, and I am trying to figure out a good way to combine them into a general approach to explaining data trends. I am aware that much of ML and statistics relies on domain knowledge, and that without any domain knowledge we quickly run into the consequences of the "No Free Lunch" theorem. As such, I have incorporated domain knowledge into my approach. I walk through a simple example of stock market volumes below.
I know the answer to this question is "It depends on the data and a variety of factors", but if I wanted a general rule of thumb for approaching a situation like this, would a general approach such as the one below work?
Sample case study: The stock market is higher in year X compared to year Y, why?
Approach to analyzing the trend:
1) Exploratory Analysis: Using my domain knowledge of the stock market, it is reasonable to look at features X_1, X_2, ..., X_n. I then perform exploratory data analysis on each of the features relative to the stock market value (i.e., how the volatility marketplace has changed over time relative to the value of the Nasdaq, for example). I do this for each of the features I thought was worth investigating, to get a feel for the data.
2) Feature Selection: Use either information gain or regression (or both) to determine which features have the highest predictive power. Take the top n features and use them in the next step.
3) Hypothesis Tests: Run hypothesis tests on the data to try to see if each of the individual features had an impact on the market volume change.
4) Supervised Learning: With fewer features, we need less data to generalize our findings and are less likely to fall prey to the curse of dimensionality. So we can then train a decision tree on our historical data, use cross validation, and generate a model. We can then use this model as a heuristic to see which feature had the largest impact on classifying an "up" stock market instead of a "down" stock market.
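To make steps 2 and 3 concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the data is synthetic, the feature names `x_informative` and `x_noise` are hypothetical, information gain is computed for a single split at the feature's median, and a permutation test on the difference in class means stands in for whichever hypothesis test would actually be appropriate. Step 4 (the cross-validated decision tree) is not shown.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain from splitting one numeric feature at its median."""
    threshold = sorted(values)[len(values) // 2]
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def permutation_p_value(values, labels, n_permutations=2000, seed=0):
    """Two-sided permutation test: do "up" and "down" periods differ in mean?"""
    rng = random.Random(seed)

    def mean_gap(ys):
        up = [x for x, y in zip(values, ys) if y == "up"]
        down = [x for x, y in zip(values, ys) if y == "down"]
        return abs(sum(up) / len(up) - sum(down) / len(down))

    observed = mean_gap(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between feature and label
        if mean_gap(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Synthetic stand-in for historical data (an assumption, not real market data):
# one feature that shifts with the label and one that is pure noise.
rng = random.Random(42)
labels = ["up" if i % 2 == 0 else "down" for i in range(200)]
features = {
    "x_informative": [rng.gauss(1.0 if y == "up" else -1.0, 1.0) for y in labels],
    "x_noise": [rng.gauss(0.0, 1.0) for _ in labels],
}

# Step 2: rank features by information gain.
ranking = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)

# Step 3: test each feature individually.
p_values = {name: permutation_p_value(features[name], labels) for name in features}

print(ranking)
print(p_values)
```

On this synthetic data the feature that was built to depend on the label should rank first and get a very small p-value, while the noise feature should show an information gain close to zero.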
Is this a reasonable way to analyze events that have already transpired (in this hypothetical example, the "event" being that the stock market went up)? I am trying to integrate all of the content from my introductory courses in a way that is practical and aids in drawing insights from data. This is a general question that has nothing to do with stock market volume specifically; I just wanted to use an example for the sake of discussion.