Popular Post

Friday, June 14, 2019


Re-posting interesting article from


R vs. Python for Data Science

Norm Matloff, Prof. of Computer Science, UC Davis; my bio

Hello! This Web page is aimed at shedding some light on the perennial R-vs.-Python debates in the Data Science community. As a professional computer scientist and statistician, I hope to shed some useful light on the topic. I have potential bias — I've written 4 R-related books, and currently serve as Editor-in-Chief of the R Journal — but I hope this analysis will be considered fair and helpful.


Clear win for Python.
This is subjective, of course, but having written (and taught) in many different programming languages, I really appreciate Python's greatly reduced use of parentheses and braces:
if x > y: 
   z = 5
   w = 8
if (x > y)
   z = 5
   w = 8
Python is sleek!

Learning curve

Huge win for R.
To even get started in Data Science with Python, one must learn a lot of material not in base Python, e.g., NumPy, Pandas and matplotlib.
By contrast, matrix types and basic graphics are built-in to base R. The novice can be doing simple data analyses within minutes. Python libraries can be tricky to configure, even for the systems-savvy, while most R packages run right out of the box.

Available libraries

Call it a tie.
CRAN has over 12,000 packages. PyPI has over 183,000, but it seems thin on Data Science.
For example, I once needed code to do fast calculation of nearest-neighbors of a given data point. (NOT code using that to do classification.) I was able to immediately find not one but two packages to do this. By contrast, just now I tried to find nearest-neighbor code for Python and at least with my cursory search, came up empty-handed; there was just one implementation that described itself as simple and straightforward, nothing fast.
The following searches in PyPI turned up nothing: log-linear model; Poisson regression; instrumental variables; spatial data; familywise error rate; etc.

Machine learning

Slight edge to Python here.
The Pythonistas would point to a number of very finely-tuned libraries, e.g. AlexNet, for image recognition. Good, but R versions easily could be developed. The Python libraries' power comes from setting certain image-smoothing ops, which easily could be implemented in R's Keras wrapper, and for that matter, a pure-R version of TensorFlow could be developed. Meanwhile, I would claim that R's package availabity for random forests and gradient boosting are outstandng.

Statisical correctness

Big win for R.
In my book, the Art of R Programming, I made the statement, "R is written by statisticians, for statisticians," which I'm pleased to see pop up here and there on occasion. It's important!
To be blunt, I find the machine learning people, who mostly advocate Python, often have a poor understanding of, and in some cases even a disdain for, the statistical issues in ML. I was shocked recently, for instance, to see one of the most prominent ML people, state in his otherwise outstanding book that standardizing the data to mean-0, variance-1 means one is assuming the data are Gaussian — absolutely false and misleading.

Parallel computation

Let's call it a tie.
Neither the base version of R nor Python have good support for multicore computation. Threads in Python are nice for I/O, but parallel computation using them is impossible, due to the infamous Global Interpreter Lock. Python's multiprocessing package is not a good workaround, nor is R's 'parallel' package. External libraries supporting cluster computation are OK in both languages.
Currently Python has better interfaces to GPUs.

C/C++ interface

Slight win for R.
Though there are tools like swig etc. for interfacing Python to C/C++, as far is I know there is nothing remotely as powerful as R's Rcpp for this at present. The Pybind11 package is being developed.
In addition, R's new ALTREP idea has great potential for enhancing performance and usability.
On the other hand, the Cython and PyPy variants of Python can in some cases obviate the need for explicit C/C++ interface in the first place.

Object orientation, metaprogramming

Slight win for R.
For instance, though functions are objects in both languages, R takes that more seriously than does Python. Whenever I work in Python, I'm annoyed by the fact that I cannot print a function to the terminal, which I do a lot in R.
Python has just one OOP paradigm. In R, you have your choice of several, though some may debate that this is a good thing.
Given R's magic metaprogramming features (code that produces code), computer scientists ought to be drooling over R.

Language unity

Horrible loss for R.
Python is currently undergoing a transition from version 2.7 to 3.x. This will cause some disruption, but nothing too elaborate.
By contrast, R is rapidly devolving into two mutually unintelligible dialects, ordinary R and the Tidyverse. Sadly, this is a conscious effort by a commercial entity that has come to dominate the R world, RStudio. I know and admire the people at RStudio, but a commercial entity should not have such undue influence on an open-source project.
It might be more acceptable if the Tidyverse were superior to ordinary R, but in my opinion it is not. It makes things more difficult for beginners. E.g. the Tidyverse has so many functions, some complex, that must be learned to do what are very simple operations in base R. Pipes, apparently meant to help beginners learn R, actually make it more difficult, I believe. And the Tidyverse is of questionable value for advanced users.

Linked data structures

Likely win for Python.
Classical computer science data structures, e.g. binary trees, are easy to implement in Python. While this can be done in R using its 'list' class, I'd guess that it is slow.

R/Python interoperability

RStudio is to be commended for developing the reticulate package, to serve as a bridge between Python and R. It's an outstanding effort, and works well for pure computation. But as far as I can tell, it does not solve the knotty problems that arise in Python, e.g. virtual environments and the like.
At present, I do not recommend writing mixed Python/R code.


  1. Independent of the size of an endeavor, it in every case needs the influence that information examination (of the correct information obviously) can give. data science course in pune

  2. Well, the most on top staying topic is Data Analytics. Data Analytics is one of the most promising technique in the growing world. I would like to add Data Analytics training to the preference list. Out of all, Data analytics course in Mumbai is making a huge difference all across the country. Thank you so much for showing your work and thank you so much for this wonderful article.

  3. This is evidence that such innovation can possibly change organizations and altogether increment profitability. machine learning course

  4. Attend The Data Science Courses in Bangalore From ExcelR. Practical Data Science Courses in Bangalore Sessions With Assured Placement Support From Experienced Faculty. ExcelR Offers The Data Science Courses in Bangalore.
    ExcelR Data Science Course Bangalore

  5. Attend The Data Analytics Courses From ExcelR. Practical Data Analytics Courses Sessions With Assured Placement Support From Experienced Faculty. ExcelR Offers The Data Analytics Courses.
    ExcelR Data Analytics Courses

  6. Such a very useful article. I have learn some new information.thanks for sharing.
    data scientist course in mumbai

  7. I was blown out after viewing the article which you have shared over here. So I just wanted to express my opinion on Data Science, as this is best trending medium to promote or to circulate the updates, happenings, knowledge sharing.. Aspirants & professionals are keeping a close eye on Data science course in Mumbai to equip it as their primary skill.

    Data science course in mumbai


  8. Excelr is providing emerging & trending technology training, such as for data science, Machine learning, Artificial Intelligence, AWS, Tableau, Digital Marketing. Excelr is standing as a leader in providing quality training on top demanding technologies in 2019. Excelr`s versatile training is making a huge difference all across the globe. Enable ?business analytics? skills in you, and the trainers who were delivering training on these are industry stalwarts. Get certification on "
    data science training institute in hyderabad
    and get trained with Excelr.

  9. Such a very useful Blog. Very interesting to read this article. I have learn some new information.thanks for sharing. know more about

  10. After reading your article I was amazed. I know that you explain it very well. And I hope that other readers will also experience how I feel after reading your article.
    ExcelR Data Analytics courses

  11. Cool stuff you have and you keep overhaul every one of us.
    excelr data science

  12. Awesome..I read this post so nice and very imformative information...thanks for sharing
    Click here for data science course

  13. I am looking for and I love to post a comment that "The content of your post is awesome" Great work!
    ExcelR data science course in mumbai

  14. thank you for the valuable information giving on data science it is very helpful.
    Data Science Training in Hyderabad

  15. nice information on data science has given thank you very much.
    Data Science Training in Hyderabad

  16. It is extremely nice to see the greatest details presented in an easy and understanding manner...!. pmp training in Bangalore

  17. It is extremely nice to see the greatest details presented in an easy and understanding manner..!. data science course Bangalore

  18. You actually make it look so easy with your performance but I find this matter to be actually something which I think I would never comprehend. It seems too complicated and extremely broad for me. I'm looking forward for your next post, I’ll try to get the hang of it!
    ExcelR Data Analytics Course
    Data Science Interview Questions

  19. I want to say thanks to you. I have bookmark your site for future updates. ExcelR Data Scientist Course Pune

  20. Awesome blog. I enjoyed reading your articles. This is truly a great read for me. I have bookmarked it and I am looking forward to reading new articles. Keep up the good work!
    data analytics course mumbai
    data science interview questions

  21. Very interesting blog. Alot of blogs I see these days don't really provide anything that I'm interested in, but I'm most definately interested in this one. Just thought that I would post and let you know.
    Data science course
    Data analytics course
    Business analytics course
    Data science interview questions