
The Scicloj stack (https://scicloj.github.io/), in my opinion, competes with Python when it comes to DS/ML. I find it a lot more comfortable to use if you use Emacs bindings.




I am not in the ML/data science field, but I use Python a lot. PyCharm is my preferred IDE.

Python has sophisticated ML tools and Neural Net packages, I'm honestly not sure what they are called in Python land, but I see Python on the listings all the time for the top ML libraries. Clojure may also have really great frameworks available by now.

I was guessing C++, definitely not JS (very wild guess). There is a strong ecosystem for ML in Python (scikit-learn, for example).

All the good ML tools are in Python.

Seems everybody uses Python for ML.

Actually, I use both but my production models are in Clojure.

I often end up implementing minor things myself using lower level abstractions (e.g., Linear Regressions or PCA with whitening using Matrix libraries) and I check the results and/or try new things using scikit-learn.

So in general, I'd say I do the programming (outputting intermediate CSVs, tests, web service, thread handling, UI, ...) in Clojure(Script), and try other approaches (e.g., other models/parameters/...) in Python.

I'm quite happy with this pipeline but probably to some extent because I really love to understand how things work and nothing pushes you to learn as much as a missing function in your ML library :-)
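To make the "implement minor things myself" idea concrete, here is a minimal sketch of a one-feature linear regression using the closed-form least-squares solution. It is an illustration only, not the commenter's code: it uses plain Python lists instead of a matrix library, and the function names `fit_simple_linear_regression` and `predict` are made up for this example. A result like this is the kind of thing one could then cross-check against scikit-learn.

```python
from statistics import fmean


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ slope * x + intercept.

    Hypothetical helper for illustration; plain lists stand in for a matrix library.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    x_mean = fmean(xs)
    y_mean = fmean(ys)

    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a single input value."""
    slope, intercept = model
    return slope * x + intercept


# Example usage on points that lie exactly on a line.
model = fit_simple_linear_regression([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Writing the fit out by hand like this keeps every intermediate quantity visible, which is the learning effect described above.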


I'd hardly say MLlib matches scikit-learn in the number of algorithms available! For example, we recently had to resort to a third-party implementation of DBSCAN.

It does have most of the important ones though. Also, the pipeline API feels slightly cleaner than the one in scikit-learn.
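For readers who have not used a pipeline API, here is a rough sketch of the general idea: transformers and a final estimator are chained so that a single `fit` or `predict` call runs every step in order. This is neither MLlib's nor scikit-learn's actual API; the classes `MeanStdScaler`, `ThresholdClassifier`, and `Pipeline` below are hypothetical stand-ins written in plain Python.

```python
class MeanStdScaler:
    """Hypothetical transformer: centers and scales a list of numbers."""

    def fit(self, xs):
        n = len(xs)
        self.mean = sum(xs) / n
        variance = sum((x - self.mean) ** 2 for x in xs) / n
        # Fall back to 1.0 so constant input does not cause division by zero.
        self.std = variance ** 0.5 or 1.0
        return self

    def transform(self, xs):
        return [(x - self.mean) / self.std for x in xs]


class ThresholdClassifier:
    """Hypothetical estimator: predicts 1 above the midpoint of the two class means.

    Assumes the training labels contain both 0 and 1.
    """

    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, xs):
        return [1 if x > self.threshold else 0 for x in xs]


class Pipeline:
    """Chains transformers and a final estimator behind one fit/predict interface."""

    def __init__(self, transformers, estimator):
        self.transformers = transformers
        self.estimator = estimator

    def fit(self, xs, ys):
        # Each transformer is fitted on the output of the previous one.
        for transformer in self.transformers:
            xs = transformer.fit(xs).transform(xs)
        self.estimator.fit(xs, ys)
        return self

    def predict(self, xs):
        # At prediction time the already-fitted transformers are only applied.
        for transformer in self.transformers:
            xs = transformer.transform(xs)
        return self.estimator.predict(xs)


pipeline = Pipeline([MeanStdScaler()], ThresholdClassifier())
pipeline.fit([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])
```

The point of the design is that preprocessing parameters learned during `fit` are reused unchanged in `predict`, so training and serving cannot drift apart.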


Java and Scala? Who uses that in ML? Python has long been the best language for ML, with some competition from Matlab.

Our data scientists are learning Scala and Spark (MLlib) as a replacement for Python and R. So sure, maybe Python has long been the "best language for ML", but at one time in the not-so-distant past "MySpace was the best social network".

Your web stack and your machine learning stack don't have to be the same. http://scikit-learn.org/ is Python and very popular for machine learning. Here is an in-depth training video: http://pyvideo.org/video/972/tutorial-scikit-learn-machine-l...

Hello, people of HN. Let me first say that this post is about promoting an open source project which I've been working on for the past 10 months or so. I'm leaving it here with hopes of getting in touch with other devs who might be interested in machine learning, Scala or both.

Those of you who have stumbled upon ML before will know that Python is the go-to language for data-related things. It has high-quality libraries for analysis, modeling, and visualization. scikit-learn is a notable example and for good reasons; it's well maintained, has a large community, it's performant and it has a really good API (there's a paper about how they designed it: https://arxiv.org/abs/1309.0238).

I had been looking for a Scala equivalent for quite some time and then finally decided to start coding it myself. The main reason is that JVM-based languages are very common for building data pipelines, and having the ability to serve predictive models directly within the pipeline offers several advantages. Here's some data to back up my claims: https://cloud.google.com/solutions/comparing-ml-model-predic... (comparison of serving the model within the pipeline vs. calling a REST API).

The project currently has two main goals. It tries to expose its functionality through an intuitive API (mimicking scikit-learn but using idiomatic Scala features and functional constructs), and it tries to provide performant implementations of common algorithms (here is a limited set of comparisons with scikit-learn implementations: https://github.com/picnicml/doddle-benchmark).

Here are some links if you are interested in taking a look:
- website: https://picnicml.github.io
- GitHub repo: https://github.com/picnicml/doddle-model
- code examples: https://github.com/picnicml/doddle-model-examples
- a blog post: https://towardsdatascience.com/recognising-handwritten-digit...


If you are into ML libraries, take a look at fklearn, a scikit-learn-like lib, but written in a functional style. Fun read to compare both side by side.

Why use Python though? Once you are rolling with an ML language wouldn't it make sense to use the C++ interfaces of the machine learning/data science projects that make Python interesting at all?

As a researcher in RL & ML in a big industry lab, I would say most of my colleagues are moving to JAX [https://github.com/google/jax], which this article kind of ignores. JAX is XLA-accelerated NumPy, it's cool beyond just machine learning, but only provides low-level linear algebra abstractions. However you can put something like Haiku [https://github.com/deepmind/dm-haiku] or Flax [https://github.com/google/flax] on top of it and get what the cool kids are using :)

Particularly curious how it compares to Wes' Python for Data Analysis, aside from the sklearn stuff.

Simplicity is one of the main traits of Python. It's conducive to prototyping and getting things done quickly. Libraries like scikit-learn and PyTorch have also helped developers build larger solutions using smaller building blocks without worrying too much about implementations.

Nevertheless, Python suffers from serious performance constraints, and productionizing an ML service that requires real-time analysis is going to take some effort. For such systems, folks usually lean towards a hybrid stack.


Can someone explain why I should pick this over scikit-learn? I don't have any ML experience. I find ML quite magical :/ and really difficult to get started with if you don't have a PhD in mathematics.

As a side note, a lot of tutorials I've seen on machine learning use Python, and I'm curious as to why. Is it simply the number of libraries that have been developed for ML tasks, or is there something about Python the language that makes it especially suitable (versus, say, Ruby or Haskell)?

100% agree, and there are a number of efforts in the space. mlpack (https://www.github.com/mlpack/mlpack/), Shogun (https://www.shogun-toolbox.org/), and Shark (https://www.shark-ml.org/) are three that have been around for over a decade now. They're a little niche because C++ is not that popular for data science, but they are generally pretty fast (especially mlpack, which focuses on speed).