[link]
Summary by Shagun Sodhani 8 years ago
It presents an informal discussion about do’s and dont’s of research — more specifically experimental study of algorithms. The author presents 10 important principles backed by examples from his long experience. I am listing points from the reading.
#### 1. Solve a problem worth solving.
Take some time to find a good problem. Decide what questions you want to answer before you actually start finding an answer. Finding a new solution would take up time and resources. So think before you compute. Do exploratory research and experimentation. But do not get into the endless loop of exploration.
#### 2. Tie your paper to the literature.
Know the prior art for your problem. Find what are the existing solutions. Think if another solution is actually needed and if yes, what needs to be improved. Tell how your work relates to existing literature. This will help you and your readers. It will make your experiments newsworthy.
#### 3. Use instance testbeds that can support general conclusions.
When doing research, especially experimental research, you are highly likely to use data — for making some hypothesis, for validating results and so on. More general your data set is, more general or applicable your results would be. Applying conclusions drawn from a specific data set on a general problem would be disastrous.
#### 4. Document everything.
Document your code. Document your data sets. Document your experimental setup, your results, your mistakes, your conclusions. Document everything that some other person would need to replicate your experiments. This will help your future-self save some time and he would be grateful to you. Read #6. When using randomly generated data sets, use the same data set for benchmarking all approaches. It would reduce time and variance. Use sampling and bootstrapping.
#### 5. Use reasonably efficient implementations.
Seems obvious. But efficiency requires effort and we may be tempted especially when execution time in not a metric of interest. More efficient code means you can perform more experiments and on larger data sets. But do not over-optimise. Remember — Premature optimization is the root of all evil.
#### 6. Ensure Reproducibility.
Provide information on how to reproduce your experiments. Read #4. See gitxiv. Share your code and data set with others. If the data set is generated using some algorithm, share its details. Use reproducible standards of comparison. Do not say “results after running the algorithm for 1 hour”. Machines are evolving every day and hence doing more operations than before for the same amount of time.
#### 7. Ensure Compatibility.
Comparability goes beyond reproducibility and documentation. Benchmark the machine speed and the other experimental setup that you use so that other researchers can accurately compare their result with yours. Take a full backup of your code and data.
#### 8. Report the full story.
Report the complete story, including the parts you do not like. Do not hide anomalous results. Instead, try to explain them. It could lead to more insights to the problem and to the solution. Report the full running time, even if time is not a metric of interest from the problem’s point of view. Many times readers want to see how your solution performs on the scale of time.
#### 9. Draw well-justified conclusions and look for explanations.
Summarize trends, infer implications and provide conclusions. If there are no conclusions and inferences, then you fail the newsworthiness test. Probably you picked the wrong question or your solution is incomplete. Justify your conclusions with data. If needed, use techniques like profiling.
#### 10. Present Your Data in Informative Ways.
Your data representation should highlight what you want the reader to infer from the data. This can include things like appropriate ordering of columns in a table or choice of variables to be shown along the axis. If you want the reader to compare percentage rise in time, provide a column for that. Do not make the reader do the arithmetic. Take care of not overwhelming your readers with data alone. Put additional data in appendix.
more
less