
randomtoast yesterday at 11:43 AM

One of my biggest points of criticism of Python is its slow cold start time. I especially notice this when I use it as a scripting language for CLIs. The startup time of a simple .py script can easily be in the 100 to 300 ms range, whereas a C, Rust, or Go program with the same functionality can start in under 10 ms. This becomes even more frustrating when piping several scripts together, because the accumulated startup latency adds up quickly.
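
To make that pile-up concrete, here is a minimal sketch (assuming a POSIX shell and `python3` on PATH) that times a pipeline of trivial Python stages; each extra stage pays one more interpreter startup, and the numbers will vary by machine:

    import subprocess, time

    # Each stage is a do-nothing Python "cat"; its only real cost is startup.
    cat = "python3 -c 'import sys; sys.stdout.write(sys.stdin.read())'"
    for n in (1, 2, 4):
        cmd = "echo hi | " + " | ".join([cat] * n)
        t0 = time.perf_counter()
        subprocess.run(cmd, shell=True, check=True, stdout=subprocess.DEVNULL)
        print(f"{n} stage(s): {(time.perf_counter() - t0) * 1000:.0f} ms")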


Replies

az09mugen today at 8:30 AM

I don't know why people care so much about a few hundred milliseconds for Python scripts versus compiled languages that take ten times less.

Real question: what would you do with the time saved? Are you really in that much of a hurry?

smartmic yesterday at 1:58 PM

Yes, that is also my feeling. But comparing an interpreted language with a compiled one is not really fair.

Here is my quick benchmark. I refrain from using Python for most scripting/prototyping tasks but really like Janet [0]. Here is a comparison for printing the current time as a Unix epoch:

    $ hyperfine --shell=none --warmup 2 "python3 -c 'import time;print(time.time())'" "janet -e '(print (os/time))'"
    Benchmark 1: python3 -c 'import time;print(time.time())'
      Time (mean ± σ):      22.3 ms ±   0.9 ms    [User: 12.1 ms, System: 4.2 ms]
      Range (min … max):    20.8 ms …  25.6 ms    126 runs

    Benchmark 2: janet -e '(print (os/time))'
      Time (mean ± σ):       3.9 ms ±   0.2 ms    [User: 1.2 ms, System: 0.5 ms]
      Range (min … max):     3.6 ms …   5.1 ms    699 runs

    Summary
      'janet -e '(print (os/time))'' ran
        5.75 ± 0.39 times faster than 'python3 -c 'import time;print(time.time())''
[0]: https://janet-lang.org/

nickjj yesterday at 12:17 PM

> The startup time of a simple .py script can easily be in the 100 to 300 ms range

I can't say I've ever experienced this. Are you sure it's not related to other things in the script?

I wrote a single-file Python script that's a few thousand lines long. It can process a 10,000-line CSV file and do a lot of calculations; in fact, I built an entire CLI income/expense tracker with it[0].

End to end, the command takes 100 ms to process those 10k lines, measured with `time`. That's on hardware from 2014 using Python 3.13, too. It takes ~550 ms to fully process 100k lines. I spent zero time optimizing the script but did try to avoid common pitfalls (deeply nested loops, etc.).

[0]: https://github.com/nickjj/plutus
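
For scale, a stripped-down sketch of that kind of workload: timing a single pass over a CSV, so the per-row work can be separated from interpreter startup. The file and column names here are made up:

    import csv, time

    t0 = time.perf_counter()
    with open("transactions.csv", newline="") as f:  # hypothetical input file
        total = sum(float(row["amount"]) for row in csv.DictReader(f))
    print(f"total={total:.2f} in {(time.perf_counter() - t0) * 1000:.1f} ms")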

ndr yesterday at 4:03 PM

It might not be the fastest, but I suspect something weird is happening with Python resolution.

For instance, `uv run` has its own fair share of overhead.

    $ hyperfine --warmup 10 -L py "uv run python,~/.local/bin/python3.14,/usr/local/bin/python3.12,~/.local/share/uv/python/pypy-3.11.13-macos-aarch64-none/bin/pypy3.11" "{py} -c 'exit(0)'"
    Benchmark 1: uv run python -c 'exit(0)'
      Time (mean ± σ):      58.4 ms ±  19.3 ms    [User: 26.4 ms, System: 21.7 ms]
      Range (min … max):    48.2 ms … 138.0 ms    50 runs
    
    Benchmark 2: ~/.local/bin/python3.14 -c 'exit(0)'
      Time (mean ± σ):      13.3 ms ±   6.9 ms    [User: 8.0 ms, System: 2.5 ms]
      Range (min … max):     9.9 ms …  53.7 ms    174 runs
    
    Benchmark 3: /usr/local/bin/python3.12 -c 'exit(0)'
      Time (mean ± σ):      16.4 ms ±   7.6 ms    [User: 8.9 ms, System: 3.7 ms]
      Range (min … max):    12.2 ms …  65.2 ms    152 runs
    
    Benchmark 4: ~/.local/share/uv/python/pypy-3.11.13-macos-aarch64-none/bin/pypy3.11 -c 'exit(0)'
      Time (mean ± σ):      18.6 ms ±   7.4 ms    [User: 10.0 ms, System: 5.0 ms]
      Range (min … max):    14.4 ms …  63.5 ms    138 runs
    
    Summary
      ~/.local/bin/python3.14 -c 'exit(0)' ran
        1.23 ± 0.86 times faster than /usr/local/bin/python3.12 -c 'exit(0)'
        1.40 ± 0.92 times faster than ~/.local/share/uv/python/pypy-3.11.13-macos-aarch64-none/bin/pypy3.11 -c 'exit(0)'
        4.40 ± 2.72 times faster than uv run python -c 'exit(0)'
dekhn yesterday at 6:02 PM

Run strace on Python starting up and you will see it stat'ing hundreds if not thousands of files. That gets much worse the slower your filesystem is.

On my Linux system, where all the file attributes are cached, it takes about 12 ms to completely start, run a `pass` statement, and exit.
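
A minimal sketch of that measurement, run from Python itself and also comparing `-S` (which skips `site` initialization and with it some of that stat traffic); the numbers are machine-dependent:

    import subprocess, time

    def best_of(cmd, runs=20):
        times = []
        for _ in range(runs):
            t0 = time.perf_counter()
            subprocess.run(cmd, check=True)
            times.append(time.perf_counter() - t0)
        return min(times) * 1000  # best-case wall time in ms

    print(f"python3 -c pass   : {best_of(['python3', '-c', 'pass']):.1f} ms")
    print(f"python3 -S -c pass: {best_of(['python3', '-S', '-c', 'pass']):.1f} ms")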

syrusakbary yesterday at 2:10 PM

Completely agree on this.

Regarding cold starts, I strongly believe V8 snapshots are not the best way to achieve fast cold starts with Python (they may be if you are tied to using V8, though!), and they will have broad side effects if you go outside the standard packages included in the Pyodide bundle.

To put this in perspective: V8 snapshots store the whole state of an application (including its compiled modules). This means that for a Python package using Python (one Wasm module) + pydantic-core (one Wasm module) + FastAPI... all of those will be included in one snapshot (along with the application state). This makes sense for browsers, where you want to be able to inspect/recover everything at once.

The issue with this design is that the compiled artifacts and the application state are bundled into a single artifact (this is not great for AOT-designed runtimes, though it might be the optimal design for JITs).

Ideally, you would separate each of the compiled modules from the state of the application. Doing so has some advantages: you can deserialize the compiled modules in parallel, and untie the deserialization from recovering the state of the application. This design doesn't adapt that well to the V8 architecture (and how it compiles things) when JavaScript is the main driver of the execution; however, it's ideal when you just use WebAssembly.
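
A purely illustrative sketch (not Wasmer's actual API) of the shape of that split design: per-module deserialization can run in parallel, with the small state blob applied only afterwards:

    from concurrent.futures import ThreadPoolExecutor

    def load_compiled(name):
        # stand-in for deserializing one separately cached compiled module
        return f"<compiled {name}>"

    modules = ["python.wasm", "pydantic_core.wasm", "fastapi.wasm"]
    with ThreadPoolExecutor() as pool:
        compiled = list(pool.map(load_compiled, modules))  # in parallel

    state = {"heap": "..."}  # application state, recovered only after modules are ready
    print(compiled, state)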

This is what we have done at Wasmer, and it allows for cold starts much faster than 1 second. Because we cache each of the compiled modules separately and recover the state of the application later, we can achieve cold starts that are an order of magnitude faster than Cloudflare's state of the art (when using pydantic, FastAPI, and httpx).

If anyone is curious, here is a blog post where we presented fast cold starts for the application state (note that the deserialization technique for Wasm modules is applied automatically in Wasmer, and we don't showcase it in the blog post): https://wasmer.io/posts/announcing-instaboot-instant-cold-st...

An aside: congrats to the Cloudflare team on their work on Python on Workers; it's inspiring to all providers in the space... keep it up and let's keep challenging the status quo!

mixmastamyk yesterday at 4:42 PM

Big packages shouldn't be imported until the CLI arguments have been parsed and control is handed off to main. There's been work to do this automatically, but it's good hygiene to avoid early imports anyway.

A modern machine shouldn't take this long, so likely something big is being imported unnecessarily at startup. If the big package itself is the issue, file a report on its tracker.
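
A minimal sketch of that hygiene, deferring a heavy import (numpy here, purely as an example) until the parsed command actually needs it:

    import argparse

    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("command", choices=["plot", "version"])
        args = parser.parse_args()
        if args.command == "plot":
            import numpy as np  # deferred: only paid on this code path
            print(np.arange(5))
        else:
            print("1.0")

    if __name__ == "__main__":
        main()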

TudorAndrei yesterday at 1:47 PM

Are you comparing the startup time of an interpreted language with the startup time of a compiled language? Or do you mean that `time python hello.py` > `( time gcc -O2 -o hello hello.c ) && ( time ./hello )`?

rcarmo yesterday at 6:45 PM

You can run .pyc files “directly” with some creativity, and there are tools to pack “executables” that are just chunked blobs of bytecode.
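
A sketch using only stdlib pieces: `py_compile` writes the bytecode file, and CPython will happily execute a `.pyc` passed to it directly (the file names are hypothetical):

    import py_compile

    # compile once; afterwards `python3 hello.pyc` skips the parse/compile step
    py_compile.compile("hello.py", cfile="hello.pyc")

For the packed-executable variant, stdlib `zipapp` is one such tool: `python3 -m zipapp myapp/ -o myapp.pyz` (the directory needs a `__main__.py`) yields an archive runnable with `python3 myapp.pyz`.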

baq yesterday at 12:20 PM

It depends somewhat on what you import, too. Some people would sell their grandmothers to get below 1 s once you start importing numpys and scikits.
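
To see where that time goes, CPython's `-X importtime` flag (3.7+) prints a per-import cost breakdown to stderr; a small sketch driving it from Python, assuming numpy is installed:

    import subprocess

    result = subprocess.run(
        ["python3", "-X", "importtime", "-c", "import numpy"],
        capture_output=True, text=True,
    )
    # the last lines carry the cumulative cost of the top-level import
    print("\n".join(result.stderr.splitlines()[-5:]))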

dilawar yesterday at 4:17 PM

Reminds me of the Mercurial VCS!
