Coding

1. INTRODUCTION

Naturally, the question arises: which programming language should you use to implement factor investing strategies? As is so often the case, it depends on you and your programming skills, the resources available, and your goals and needs. Since we have experience with Excel, VBA, Matlab, R, and Python, we comment on each of them and highlight some pros and cons without going into detail. Note that other languages such as Mathematica, Java, C++, or C# are certainly also capable of solving most of the problems you will face. However, given our limited exposure to these languages, we skip them for now. We also do not cover how to design your data feed or database here. That is an important topic in its own right and deserves thorough treatment.

Before we turn to particular languages, one very general remark applies to all of them (and also to the data): garbage in, garbage out!

Try to avoid reinventing the wheel whenever you face a new problem. The more code you write yourself, the higher the chance that you make at least one error (eventually, the probability converges to one). Use the available resources as much as possible! Over time we will provide code examples on our page, mainly in Python.

 

2. POSSIBLE SOFTWARE/LANGUAGE CHOICES

 

Excel

Excel might be enough for some applications. If, for example, you simply plan to optimize a market value-size-momentum strategy at the ETF level, you might get away with an Excel sheet. You may want to have a look at Excel’s Solver (quite a robust optimization algorithm): define an objective function, get your optimal weights, done. If, however, you plan to start stock picking with an Excel sheet, it is not a good idea. We are not saying it does not work. We have seen it done, but sooner or later you will realize it was not a good idea. One advantage of Excel is that almost everyone knows how to use it. Hence, it is easy to share the results with almost anyone.

 

VBA

VBA is almost always used in combination with Excel. It can help in situations where Excel reaches its limits. However, VBA is not fun (at least for us). The lack of basic libraries forces you to implement many algorithms yourself (or to find code online), which is not something you want to spend time on. As an example, VBA does not come with a simple sort function (although you can fall back on Excel's sorting if you want). The statistical functions are very basic, so nothing beyond a simple linear regression is provided. Still, sometimes an Excel/VBA solution is sufficient.

 

Matlab

Matlab is definitely suitable for testing and applying factor strategies. It is, of course, harder if you have only the basic version without the useful toolboxes installed. However, if needed, you can build your own statistics or regression library fairly quickly. There is also a lot of Matlab code available online for free. Matlab becomes especially handy when you face linear algebra problems. For example, most portfolio problems can be stated in just a few lines of code. Matlab also comes with a sorting function, so you have the basics needed for building your portfolio. Matlab, of course, has its own limitations. A very important aspect in finance is data handling, which is, unfortunately, not Matlab’s key strength. This starts with performance issues in the built-in I/O parsers, especially when you parse non-numerical data. Generally, handling mixed data types in Matlab is not exactly fun. The date functionality lacks features such as frequency conversions, and aligning different data sets with each other always requires caution and resources. All of the above is definitely manageable but, in our opinion, not very convenient.
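The point that a portfolio problem reduces to a few lines of linear algebra holds in any array-oriented language. As a minimal sketch, written in Python with NumPy rather than Matlab to match the language planned for the examples on this page, the block below computes the fully invested minimum-variance portfolio, w = inv(C) 1 / (1' inv(C) 1). The function name min_variance_weights and the covariance numbers are made up for illustration only.

```python
import numpy as np


def min_variance_weights(cov):
    """Fully invested minimum-variance weights for a covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    # Solve cov @ raw = 1 instead of forming the inverse explicitly.
    raw = np.linalg.solve(cov, ones)
    # Rescale so that the weights sum to one.
    return raw / raw.sum()


# Hypothetical covariance matrix for three assets (illustrative numbers only).
cov = np.array([
    [0.040, 0.006, 0.004],
    [0.006, 0.090, 0.010],
    [0.004, 0.010, 0.160],
])

weights = min_variance_weights(cov)
print(weights)
```

This sketch imposes no constraints beyond full investment, so short positions are possible; a long-only version would need a numerical optimizer instead of a single linear solve.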

www.mathworks.com/products/matlab/

 

R

R would certainly be a good choice, mainly for two reasons: solid data handling and the variety of available statistical libraries. All the basic functions you need to get started with factor investing are available: sorting, linear regressions, optimizers, performance analytics, and great visualisation and plotting resources thanks to ggplot2, just to mention a few. The biggest downside, as we see it, is that R is already quite specialized and tilted towards statistics. You should be fine as long as you do pure research, especially when you require a recently introduced method or estimator. RStudio is probably the best available IDE for R, but it lacks some functionality compared, for example, to the IDEs available for Python. However, this shouldn't bother you if you focus on research and not on software development.

https://www.r-project.org/

 

Python

Probably the best choice. The language is rich, as it is a multipurpose programming language. Python is easy to learn. Python is open source. Its syntax forces you to keep your code structured, which pays off especially for bigger and longer-lasting projects, as maintenance is easier. Libraries like numpy (for matrix algebra), pandas (data handling), statsmodels (statistical functions), and matplotlib (for plotting) already provide most of the functions and classes you need to implement your own factor strategies. In particular, and not surprisingly, pandas is of great use for factor investing, as it emerged partly out of AQR (a quantitative asset management company). Some people criticize Python's performance. However, compared to the other languages presented here, Python is probably the best-performing language for implementing factor investing. Although pure Python can be too slow, there are many options for high-performance computing. For example, vectorization with NumPy or compiling with Numba can speed up code execution by orders of magnitude compared to pure Python. We recommend the Anaconda distribution of Python, as it comes with the most popular scientific packages - https://www.continuum.io/downloads. Our IDE recommendation is PyCharm - https://www.jetbrains.com/pycharm/. If you prefer a lighter IDE than PyCharm, try out Spyder (it comes with Anaconda). We will stop here; everything else you will find online, and https://www.Python.org/ provides all the basic information about Python. One last comment: if you are starting to learn it, start with Python 3.x.
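To give a flavour of how pandas and numpy fit factor work, here is a minimal sketch of a single-period portfolio sort: stocks are ranked on a momentum signal, split into five equally sized buckets, and the average return of each bucket is compared. The tickers, the column names momentum and next_return, and all numbers are randomly generated assumptions for illustration, not real data and not a recommended strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical cross-section of 20 stocks with made-up signal and return values.
rng = np.random.default_rng(0)
tickers = [f"S{i:02d}" for i in range(20)]
data = pd.DataFrame(
    {
        "momentum": rng.normal(size=20),                 # sorting signal
        "next_return": rng.normal(scale=0.05, size=20),  # next-period return
    },
    index=tickers,
)

# Assign each stock to a quintile of the signal: 1 = lowest, 5 = highest.
data["bucket"] = pd.qcut(data["momentum"], q=5, labels=False) + 1

# Equal-weighted average return per bucket (vectorized, no explicit loops).
bucket_returns = data.groupby("bucket")["next_return"].mean()

# Return of a portfolio long the top bucket and short the bottom bucket.
long_short = bucket_returns.loc[5] - bucket_returns.loc[1]

print(bucket_returns)
print(long_short)
```

The sort and the averaging are both done with vectorized pandas calls rather than Python loops, which is the style that keeps this kind of code fast on larger cross-sections.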

 

3. SUMMARY

To summarize the key points: it all depends, but generally, try to avoid Excel/VBA unless you are really forced to use it (unfortunately, this happens far too often). Matlab, R, and Python are all good choices; if in doubt, choose Python. And don’t forget, the wheel has already been invented; see http://www.stackoverflow.com/