Don't use it:
> By providing the beta version of FireDucks free of charge and enabling data scientists to actually use it, NEC will work to improve its functionality while verifying its effectiveness, with the aim of commercializing it within FY2024.
In other words, it's free only to trap you.
Important to upvote this. If there's room for improvement for Polars (which I'm sure there is), go and support the project. But don't fall for a commercial trap when there are competent open source tools available.
I thought I saw in the documentation that it was released under the modified BSD license. I guess they could take future versions closed source, but the current version should remain available for folks to use and develop further.
I don't trust their benchmarks. I ran their benchmark source locally on my machine at TPC-H scale factor 10. Polars was orders of magnitude faster and, unlike FireDucks, didn't SIGABRT at query 10 (I wasn't OOM).
(.venv) [fireducks] ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch[SIGINT] $ SCALE_FACTOR=10.0 make run-polars
.venv/bin/python -m queries.polars
{"scale_factor":10.0,"paths":{"answers":"data/answers","tables":"data/tables","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"parquet","log_timings":false,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":true},"dataset_base_dir":"data/tables/scale-10.0"}
Code block 'Run polars query 1' took: 1.47103 s
Code block 'Run polars query 2' took: 0.09870 s
Code block 'Run polars query 3' took: 0.53556 s
Code block 'Run polars query 4' took: 0.38394 s
Code block 'Run polars query 5' took: 0.69058 s
Code block 'Run polars query 6' took: 0.25951 s
Code block 'Run polars query 7' took: 0.79158 s
Code block 'Run polars query 8' took: 0.82241 s
Code block 'Run polars query 9' took: 1.67873 s
Code block 'Run polars query 10' took: 0.74836 s
Code block 'Run polars query 11' took: 0.18197 s
Code block 'Run polars query 12' took: 0.63084 s
Code block 'Run polars query 13' took: 1.26718 s
Code block 'Run polars query 14' took: 0.94258 s
Code block 'Run polars query 15' took: 0.97508 s
Code block 'Run polars query 16' took: 0.25226 s
Code block 'Run polars query 17' took: 2.21445 s
Code block 'Run polars query 18' took: 3.67558 s
Code block 'Run polars query 19' took: 1.77616 s
Code block 'Run polars query 20' took: 1.96116 s
Code block 'Run polars query 21' took: 6.76098 s
Code block 'Run polars query 22' took: 0.32596 s
Code block 'Overall execution of ALL polars queries' took: 34.74840 s
(.venv) [fireducks] ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch$ SCALE_FACTOR=10.0 make run-fireducks
.venv/bin/python -m queries.fireducks
{"scale_factor":10.0,"paths":{"answers":"data/answers","tables":"data/tables","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"parquet","log_timings":false,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":true},"dataset_base_dir":"data/tables/scale-10.0"}
Code block 'Run fireducks query 1' took: 5.35801 s
Code block 'Run fireducks query 2' took: 8.51291 s
Code block 'Run fireducks query 3' took: 7.04319 s
Code block 'Run fireducks query 4' took: 19.60374 s
Code block 'Run fireducks query 5' took: 28.53868 s
Code block 'Run fireducks query 6' took: 4.86551 s
Code block 'Run fireducks query 7' took: 28.03717 s
Code block 'Run fireducks query 8' took: 52.17197 s
Code block 'Run fireducks query 9' took: 58.59863 s
terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_default_append
Code block 'Overall execution of ALL fireducks queries' took: 249.06256 s
Traceback (most recent call last):
  File "/home/ritchie46/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ritchie46/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ritchie46/Downloads/deleteme/polars-tpch/queries/fireducks/__main__.py", line 39, in <module>
    execute_all("fireducks")
  File "/home/ritchie46/Downloads/deleteme/polars-tpch/queries/fireducks/__main__.py", line 22, in execute_all
    run(
  File "/home/ritchie46/miniconda3/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/home/ritchie46/Downloads/deleteme/polars-tpch/.venv/bin/python', '-m', 'fireducks.imhook', 'queries/fireducks/q10.py']' died with <Signals.SIGABRT: 6>.
make: *** [Makefile:52: run-fireducks] Error 1
(.venv) [fireducks] ritchie46 /home/ritchie46/Downloads/deleteme/polars-tpch[2] $
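For a rough sense of scale, the per-query timings in the log above can be compared directly. This is only a quick sketch: the `parse_timings` helper and the variable names are made up for illustration and are not part of the benchmark repo, and it covers just the queries that both runs completed on this one machine.

```python
import re

# Timing lines copied from the two runs above. The FireDucks run aborted at
# query 10, so only queries 1-9 have a counterpart in both logs.
POLARS_LOG = """\
Code block 'Run polars query 1' took: 1.47103 s
Code block 'Run polars query 2' took: 0.09870 s
Code block 'Run polars query 3' took: 0.53556 s
Code block 'Run polars query 4' took: 0.38394 s
Code block 'Run polars query 5' took: 0.69058 s
Code block 'Run polars query 6' took: 0.25951 s
Code block 'Run polars query 7' took: 0.79158 s
Code block 'Run polars query 8' took: 0.82241 s
Code block 'Run polars query 9' took: 1.67873 s
Code block 'Run polars query 10' took: 0.74836 s
"""

FIREDUCKS_LOG = """\
Code block 'Run fireducks query 1' took: 5.35801 s
Code block 'Run fireducks query 2' took: 8.51291 s
Code block 'Run fireducks query 3' took: 7.04319 s
Code block 'Run fireducks query 4' took: 19.60374 s
Code block 'Run fireducks query 5' took: 28.53868 s
Code block 'Run fireducks query 6' took: 4.86551 s
Code block 'Run fireducks query 7' took: 28.03717 s
Code block 'Run fireducks query 8' took: 52.17197 s
Code block 'Run fireducks query 9' took: 58.59863 s
"""

# Matches the per-query lines only; the "Overall execution" line is skipped.
LINE_RE = re.compile(r"Code block 'Run \w+ query (\d+)' took: ([\d.]+) s")


def parse_timings(log: str) -> dict[int, float]:
    """Return {query_number: seconds} for every per-query timing line."""
    return {int(m.group(1)): float(m.group(2)) for m in LINE_RE.finditer(log)}


polars = parse_timings(POLARS_LOG)
fireducks = parse_timings(FIREDUCKS_LOG)

# Compare only the queries that finished in both runs.
common = sorted(polars.keys() & fireducks.keys())
ratios = {q: fireducks[q] / polars[q] for q in common}
total_ratio = sum(fireducks[q] for q in common) / sum(polars[q] for q in common)

for q in common:
    print(f"q{q}: FireDucks took {ratios[q]:.1f}x as long as Polars")
print(f"q{common[0]}-q{common[-1]} combined: {total_ratio:.1f}x")
```

On these numbers the gap ranges from about 3.6x (query 1) to about 86x (query 2), and is roughly 32x across queries 1-9 combined.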
If it's good, then why not just fork it when (if) the license changes? It is 3-clause BSD.
In fact, what's stopping the pandas library from incorporating fireducks code into the mainline branch? pandas itself is BSD.
Thanks for the warning.
I nearly made the mistake of merging Akka into a codebase recently; fortunately I double-checked the license and noticed it was the bullshit BUSL, which could have cost my employer tens of thousands of dollars a year [1]. I ended up switching everything to Vert.x, but I really hate how normalized it has become for ostensibly open source projects to sneak scary, expensive licenses into things.
[1] Yes I'm aware of Pekko now, and my stuff probably would have worked with Pekko, but I didn't really want to deal with something that by design is 3 years out of date.