
westurner · today at 5:20 PM · 2 replies

cloudpickle serializes code without signatures, which makes deserializing it an RCE vulnerability.
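
A minimal sketch of why: plain pickle (which cloudpickle extends) will call whatever callable a payload's `__reduce__` names during `loads`, before any application code can inspect it. Here `eval("2 + 2")` stands in for a hostile payload such as `os.system(...)`:

```python
import pickle

class Payload:
    """An object whose pickle reduction runs arbitrary code on load."""
    def __reduce__(self):
        # __reduce__ may return any (callable, args) pair;
        # pickle.loads will invoke it during deserialization.
        return (eval, ("2 + 2",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # the attacker's callable runs right here
```

Unsigned code-bearing messages mean every worker that unpickles them trusts every sender completely.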

It is much safer to distribute signed code in signed packages out of band and send only non-executable data in messages.
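
A sketch of the data-only alternative, using stdlib HMAC with a hypothetical out-of-band shared key (a real deployment would use asymmetric signatures and key rotation):

```python
import hashlib
import hmac
import json

KEY = b"distributed-out-of-band"  # hypothetical shared secret

def sign(data: dict) -> dict:
    """Sign a non-executable, JSON-serializable message."""
    payload = json.dumps(data, sort_keys=True)
    sig = hmac.new(KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}

def verify(msg: dict) -> dict:
    """Reject tampered messages; return plain data, never code."""
    expected = hmac.new(KEY, msg["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, msg["sig"]):
        raise ValueError("bad signature")
    return json.loads(msg["payload"])
```

The worker then maps the verified task name onto code it already has from a signed package, rather than executing anything from the wire.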

It is safer still to store received messages in pages with the NX bit set, so message bytes can never be executed as code.
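
As a sketch: requesting pages without `PROT_EXEC` gives you exactly that NX guarantee (Unix-only; the `prot` argument to `mmap` is not available on Windows):

```python
import mmap

# Anonymous page mapped read+write but NOT executable (no PROT_EXEC),
# so nothing stored in it can ever run as code.
buf = mmap.mmap(-1, 4096, prot=mmap.PROT_READ | mmap.PROT_WRITE)
buf.write(b"non-executable message bytes")
buf.seek(0)
data = buf.read(28)
```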

A compromise of any client in this system results in DoS and arbitrary RCE; but that's an issue with most distributed task-worker systems.

To take a zero trust approach, you can't rely on the shared TLS key or the workers never being compromised.

mDNS doesn't scale beyond a single broadcast segment without mDNS repeaters that bridge multiple network segments, and the repeaters themselves don't scale.

Something centralized like Dask, for example, can log and track state centrally to handle task retries on network, task, and worker failure.
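
A sketch of what that central tracking buys you, with a generic retry wrapper that logs every attempt so task state survives network, task, and worker failures (names here are illustrative, not Dask's API):

```python
import logging

def run_with_retries(task, args=(), retries=3):
    """Run `task`, retrying on failure and logging each attempt centrally."""
    last_exc = None
    for attempt in range(1, retries + 1):
        try:
            result = task(*args)
            logging.info("%s succeeded on attempt %d", task.__name__, attempt)
            return result
        except Exception as exc:
            last_exc = exc
            logging.warning("%s attempt %d failed: %s",
                            task.__name__, attempt, exc)
    raise RuntimeError(f"{task.__name__} failed after {retries} attempts") from last_exc
```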

But Dask doesn't satisfy zero trust design guidelines either.

How are these systems distinct from botnets with a shared TLS key and no certificate revocation?


Replies

bzurak · today at 8:04 PM

I'm running this in a trusted environment, I'm not so ambitious as to try to make this some sort of whacky trustless distributed Python runtime. Just a fun project that's been marinating for a while, and now I have an army of clankers to do the dirty work of documenting and testing it.

westurner · today at 7:18 PM

Basically, there's no good way to sidestep the authentication and authorization and resource quota controls of a resource grid scheduler.

With redundancy and idempotency, distributed computation can work.

In order to run computation distributedly with the required keys, sharded distributed-ledger protocol nodes meter ("cost") the smart contracts that they execute, in a virtual machine whose network access is limited to in-protocol messages. Each smart contract costs money to run redundantly on multiple nodes, and so it must have an account with a verifiably sufficient balance in order to run.
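
The balance check reduces to something like this sketch (names hypothetical, not any particular chain's semantics):

```python
def can_execute(balance: float, gas_estimate: float, replication: int) -> bool:
    # A contract must pre-fund redundant execution on `replication` nodes:
    # the account balance has to cover the estimated cost on every one.
    return balance >= gas_estimate * replication
```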

Smart contracts must be uploaded using a private key with a corresponding public key.

Smart contracts are identified, and therefore addressed, by a hash of their bytecode.
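
Content addressing is just a hash of the code itself, along the lines of this sketch (SHA-256 here; real chains use their own digests):

```python
import hashlib

def contract_address(bytecode: bytes) -> str:
    # The ID is derived from the code, so identical bytecode always
    # resolves to the same address and tampering changes the address.
    return hashlib.sha256(bytecode).hexdigest()
```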

Wool, Dask, and Celery's @task don't solve for smart-contract-style output storage costs with redundancy. They don't set up replicated database(s) large enough for each computation step for you. Dask and Celery model and track the state of each executed DAG centrally, with logs only as trustworthy as the centralized nodes, which are a single point of failure.

Why isn't Docker Swarm - which L2-bridges among all nodes without restriction - appropriate for a given application, given the Access Control Lists and cloud (pod) configuration necessary on e.g. AWS or GCP to prevent budget overruns? And how do you quota grid users at all?

Serverless functions must be uploaded/deployed before being run, too. To orchestrate a bunch of web services is to execute the DAG and handle errors due to network, node, and service failures and latency.

But then what protocol do the (serverless function) services all implement, so that we don't have to have a hodgepodge of API clients to use all the services in the grid?

With (serverless) functions bound to URL routes, meter the cost of each function and estimate the resources required to continue running a function at that cost. To handle even benign resource exhaustion: scale up the databases and/or redundant block storage, and change other distributed-computation-grid parameters for the pod(s) of resources that service a named, signed function like /api/v1/helloworld_v2?q=select%20* when it creates contention for the organization's costed resources.
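
A sketch of a per-route cost ledger that refuses execution once a function's budget is spent (the class and route budgets are hypothetical):

```python
class RouteBudget:
    """Hypothetical per-route cost ledger gating function execution."""

    def __init__(self, budgets: dict):
        self.budgets = budgets  # route -> total allowed spend
        self.spent = {}         # route -> spend so far

    def charge(self, route: str, cost: float) -> bool:
        used = self.spent.get(route, 0.0)
        if used + cost > self.budgets.get(route, 0.0):
            return False  # would exceed the route's budget: refuse to run
        self.spent[route] = used + cost
        return True
```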

On what signals do you scale up or down - within a resource budget - to afford fanning out over multiple actually-parallel nodes to compute, sign, and store the data?
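
One naive signal, sketched with hypothetical parameters: scale to drain the queue, capped by what the budget can afford:

```python
import math

def desired_workers(queue_depth: int, tasks_per_worker: int,
                    worker_cost: float, budget: float,
                    min_workers: int = 1) -> int:
    # Scale-out signal: enough workers to drain the current queue,
    # capped by how many workers the remaining budget can pay for.
    needed = math.ceil(queue_depth / tasks_per_worker)
    affordable = int(budget // worker_cost)
    return max(min_workers, min(needed, affordable))
```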