logoalt Hacker News

A “frozen” dictionary for Python

184 pointsby jwilkyesterday at 9:51 AM145 commentsview on HN

Comments

pansa2yesterday at 12:45 PM

I wonder whether Raymond Hettinger has an opinion on this PEP. A long time ago, he wrote: "freezing dicts is a can of worms and not especially useful".

https://mail.python.org/pipermail/python-dev/2006-February/0...

show 7 replies
perrygeoyesterday at 3:02 PM

Python discovers immutable data, and gets it wrong. Frozendict is a blunt instrument - instead of a mutable free-for-all, it just locks the data for the entire lifecycle. Brace for the wave of code littered with deep copies, proudly proclaiming how functional it all is.

If you want real immutable data structures, not a cheap imitation, check out pyrsistent.

show 2 replies
polyrandyesterday at 11:33 AM

A frozen dictionary would be very welcome. You can already do something similar using MappingProxyType [0]

  from types import MappingProxyType
  
  d = {}
  
  d["a"] = 1
  d["b"] = 2
  
  print(d)
  
  frozen = MappingProxyType(d)
  
  print(frozen["a"])
  
  # Error:
  frozen["b"] = "new"

[0]: https://docs.python.org/3/library/types.html#types.MappingPr...
show 1 reply
code_biologistyesterday at 11:20 AM

Shoutout to pyrsistent, a personal favorite library for frozen/immutable/hashable data structures whenever I need to use lists, sets, or dicts as dict keys, or make sets of such items: https://github.com/tobgu/pyrsistent

sevensoryesterday at 1:37 PM

Concurrency is a good motivation, but this is super useful even in straight line code. There’s a huge difference between functions that might mutate a dictionary you pass in to them and functions that definitely won’t. Using Mapping is great, but it’s a shallow guarantee because you can violate it at run time.

sundarurfriendyesterday at 12:13 PM

Can someone ELI5 the core difference between this and named tuples, for someone who is not deep into Python? ChatGPT's answer boiled down to: unordered (this) vs ordered (NTs), "arbitrary keys, decided at runtime" vs "fixed set of fields decided at definition time" (can't an NT's keys also be interpolated from runtime values?), and a different API (`.keys()`, `.items()`), etc (I'm just giving this as context btw, no idea if there's inaccuracies in these).

So could this also have been approached from the other side, as in making unordered NamedTuples with support for the Mapping API? The line between dictionaries and named tuples and structs (across various languages) has always seemed a bit blurry to me, so I'm trying to get a better picture of it all through this.

show 5 replies
rurbanyesterday at 4:27 PM

> so having a safe way to share dictionaries between threads will be a boon

Since only the keys are const, the values not, frozendict is not thread-safe per se. There needs to be a small lock around the value getter and setter.

show 1 reply
bilsbieyesterday at 1:05 PM

Wow weird Mandela effect for me. I really remember this being a built and actually using it.

show 4 replies
Strilancyesterday at 2:34 PM

This is a type that I would use a lot.

For example, I often write classes that do cacheable analysis that results in a dict (e.g. the class stores a list of tiles defined by points and users want a point-to-tiles mapping for convenience). It's worth caching those transformations, e.g. using @functools.cached_property, but this introduces a risk where any caller could ruin the returned cached value by editing it. Currently, I take the safety hit (cache a dict) instead of the performance hit (make a new copy for each caller). Caching a frozendict would be a better tradeoff.

show 1 reply
codethiefyesterday at 11:17 AM

Next step: Auto-inferring the correct (most narrow) TypedDict type from a frozendict. (Think `const foo = { … } as const` in TypeScript.) I miss this TS feature in Python on the daily.

show 5 replies
mwshermanyesterday at 2:17 PM

I’ve found C#’s frozen dictionary to be useful: https://learn.microsoft.com/en-us/dotnet/api/system.collecti...

It’s optimized for fast reads in exchange for expensive creation.

drhagenyesterday at 11:42 AM

If this gets wide enough use, they could add an optimization for code like this:

    n = 1000
    a = {}
    for i in range(n):
        a[i] = str(i)
    a = frozendict(a)  # O(n) operation can be turned to O(1)
It is relatively easy for the JIT to detect the `frozendict` constructor, the `dict` input, and the single reference immediately overwritten. Not sure if this would ever be worth it.
show 1 reply
mont_tagtoday at 1:34 AM

Note that Python already offers MappingProxy that wraps a dictionary to hide mutating methods.

People hardly ever use this tool. That suggests that there isn't much of a need for a frozendict.

vlovich123yesterday at 9:06 PM

I’m curious why the dict->frozendict operation is not an O(1) operation when there’s a refcnt of 1 on the dict - that resolves the spooky action at a distance problem raised and solves the performance for the most common usage pattern (build up the dict progressively and convert to a frozendict for concurrent reads).

steadyelkyesterday at 3:31 PM

Agree it's about time to make this built in. Other functional packages like JAX [0] are already using the concept but they build it into their library from scratch.

[0] https://flax.readthedocs.io/en/latest/api_reference/flax.cor...

the__alchemistyesterday at 3:18 PM

I wish Python could move away from types-as-mutability-determiners. Or paper over them; all could have mutable and non-mutable variants, with "to_mut()" and "to_immut()" methods. And in the same vein, explicit control over whether functions mutate a variable in place, or make a local copy. Python is my first lang, and I still am unable to consistently reason about these.

show 1 reply
bvrmnyesterday at 7:26 PM

It would be great to have **kwargs as frozendict by default. It could help with caching decorators to speedup key construction. For example natural key is simply `(args, kwargs)`.

ok123456yesterday at 2:36 PM

Ruby has had frozen dictionaries since version 1.0, and they are generally considered a good thing to use where possible.

zzzeekyesterday at 12:48 PM

SQLAlchemy has its own frozendict which we've had in use for many years, we have it as a pretty well performing cython implementation these days, and I use it ubiquitously. Would be a very welcome addition to the stdlib.

This proposal is important enough that I chimed in on the thread with a detailed example of how SQLAlchemy uses this pattern:

https://discuss.python.org/t/pep-814-add-frozendict-built-in...

xamuelyesterday at 1:33 PM

This subject always seems to get bogged down in discussions about ordered vs. unordered keys, which to me seems totally irrelevant. No-one seems to mention the glaring shortcoming which is that, since dictionary keys are required to be hashable, Python has the bizarre situation where dicts cannot be dict keys, as in...

{{'foo': 'bar'}: 1, {3:4, 5:6}: 7}

...and there is no reasonable builtin way to get around this!

You may ask: "Why on earth would you ever want a dictionary with dictionaries for its keys?"

More generally, sometimes you have an array, and for whatever reason, it is convenient to use its members as keys. Sometimes, the array in question happens to be an array of dicts. Bang, suddenly it's impossible to use said array's elements as keys! I'm not sure what infuriates me more: said impossibility, or the python community's collective attitude that "that never happens or is needed, therefore no frozendict for you"

show 1 reply
shadowgovtyesterday at 3:11 PM

Regarding the spooky-action-at-a-distance concerns of a `.freeze()` method on dict:

`.freeze()` should probably just return a frozendict instead of in-place mutating the dict, and they should be separate types. Under the hood, you'll have to build the hashtable anyway to make the frozendict; as long as you're doing that work, you may as well build an object to contain the hashtable and just have that object be separate from the dict that birthed it.

The values referenced by both the original dict and the frozendict can be the same values; no need to clone them.

whimsicalismyesterday at 4:56 PM

yes! no more `MappingProxyType` hacks

westurneryesterday at 7:09 PM

PMap and PVector, functional Python libraries,

"PEP 351 – The freeze protocol" (2005, rejected) https://peps.python.org/pep-0351/ ; IIUC the freeze protocol proposed basically:

  def freeze(obj)
    return cache.setsefault(hash(obj),
      obj.__freeze__())

/? "Existing threads re: consts and applications thereof"

I wasn't able to find a URL to this post (2021) from the python-ideas mailing list archives using a double quoted search term today; I had to use the python mailing list search engine? Did something break crawling of mailing lists? Old mailman HTML archives were very simple to crawl.. ENH: pypa: add a sitemap.xml for/of mailing list archives, forums; @pypa: ask for search engine indexing advice: "How do we make sure that the python mailing list archives will be search indexed?" (as they traditionally were)

How to find the .txt of mailing list archives posts these days?

From "[Python-ideas] Re: Introduce constant variables in Python" (2021) https://mail.python.org/archives/list/[email protected]... :

- pyrsistent: PMap, PVector

I'm out of time for this; (reformatting this for HN so that URLs will be auto-linkified but newlines won't be eliminated) here's the full email as .txt, the mailing list archive has a hyperlinkified version with newlines preserved. GH Markdown and CommonMark Markdown also preserve newlines and auto-linkify:

  From: [@westurner]
  Date: Thu, Jun 17, 2021, 10:43 AM
  Subject: Re: [Python-ideas] Re: Introduce constant variables in Python
  Cc: python-ideas <[email protected]>
  
  
  On Mon, May 24, 2021 at 5:43 PM Chris Angelico <[email protected]> wrote:
  
  Requiring that a name not be rebound is well-defined and testable.
  Requiring that an object not change is either trivial (in the case of,
  say, an integer) or virtually impossible (in the case of most
  objects).
  
  What would be the advantage of such a declaration?
  
  ChrisA
  
  ## Existing threads re: consts and applications thereof
  
   So, `/? from:me pyrsistent` I found a few results:
  - "[Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)" 2016-04
    https://mail.python.org/pipermail/python-dev/2016-April/143958.html
    - ~Sandboxing python within python is nontrivial to impossible; consts might help a bit
      - https://mail.python.org/pipermail/python-dev/2016-April/143958.html
  
  - "Proposal to add const-ness type hints to PEP-484"
    https://mail.python.org/archives/list/[email protected]/thread/OVPF5I6IOVF6GOJQRH5UGCCU3R7PQHUF/
    - https://github.com/python/typing/issues/242
      - "Final names and attributes" https://github.com/python/mypy/pull/5522
         This is where `typing.Final` comes from.
  
  - "[Python-ideas] "Immutable Builder" Pattern and Operator"
    https://mail.python.org/pipermail/python-ideas/2017-January/044374.html
    - [pyrsistent] and "fn.py [do] immutables:
      https://github.com/kachayev/fn.py/blob/master/README.rst#persistent-data-structures "
  
  - "[Python-ideas] Add recordlcass to collections module"
    https://groups.google.com/g/python-ideas/c/9crHfcCBgYs/m/6_EEaWJAAgAJ
    - ORMs (e.g. Django, SQLAlchemy) require "dirty state" checking to know which object attributes have changed and need an SQL statement to be executed to synchronize the state; this is relevant because when we're asking for mutable namedtuple we're often trying to do exactly this pattern.
  
  - "[Python-ideas] Suggestions: dict.flow_update and dict.__add__"
    https://www.google.com/search?q=%22%5BPython-ideas%5D+Suggestions%3A+dict.flow_update+and+dict.__add__%22
  
    > dicttoolz has functions for working with these objects; including dicttoolz.merge (which returns a reference to the merged dicts but does not mutate the arguments passed).
    > 
    > https://toolz.readthedocs.io/en/latest/api.html#dicttoolz
    > https://toolz.readthedocs.io/en/latest/api.html#toolz.dicttoolz.merge
    > 
    > pyrsistent has a PRecord class with invariants and type checking that precedes dataclasses. pyrsistent also has 'freeze' and 'thaw' functions for immutability. PRecord extends PMap, which implements __add__ as self.update(arg) (which does not mutate self)
  https://github.com/tobgu/pyrsistent/blob/master/README.rst#precord
    > 
    > https://github.com/tobgu/pyrsistent/blob/master/pyrsistent/_pmap.py
  
  - "[Python-ideas] How to prevent shared memory from being corrupted ?"
    https://www.google.com/search?q=%22How+to+prevent+shared+memory+from+being+corrupted+%3F%22
    > PyArrow Plasma object ids, "sealing" makes an object immutable, pyristent
    > 
    > https://arrow.apache.org/docs/python/plasma.html#object-ids
    > https://arrow.apache.org/docs/python/plasma.html#creating-an-object-buffer
  
    > > Objects are created in Plasma in two stages. First, they are created, which allocates a buffer for the object. At this point, the client can write to the buffer and construct the object within the allocated buffer. [...]
  
  - [Python-ideas] Experimenting with dict performance, and an immutable dict
    https://mail.python.org/archives/list/[email protected]/message/DNBGUJHDH4UTPSETMFFWMJHNXQXIWX4I/
  
    > https://pyrsistent.readthedocs.io/en/latest/intro.html#pyrsistent :
    > 
    >> Pyrsistent is a number of persistent collections (by some referred to as functional data structures). Persistent in the sense that they are immutable.
    >>
    >> All methods on a data structure that would normally mutate it instead return a new copy of the structure containing the requested updates. The original structure is left untouched.
    >>
    >> This will simplify the reasoning about what a program does since no hidden side effects ever can take place to these data structures. You can rest assured that the object you hold a reference to will remain the same throughout its lifetime and need not worry that somewhere five stack levels below you in the darkest corner of your application someone has decided to remove that element that you expected to be there.
    >>
    >> Pyrsistent is influenced by persistent data structures such as those found in the standard library of Clojure. The data structures are designed to share common elements through path copying. It aims at taking these concepts and make them as pythonic as possible so that they can be easily integrated into any python program without hassle.
  
    
  
  > What would be the advantage of such a declaration?
  
  Constants don't need to be locked or unlocked; which is advantageous for parallelism and reasoning about program correctness.
  True consts (wherein everything referred to in that object is 'frozen' and immutable or at least only modifiable with e.g. copy-on-write)
  wouldn't require locks,
  which would be post-GIL advantageous.
  
  You could do consts by never releasing a threading.Lock (or similar):
  - https://docs.python.org/3/library/asyncio-sync.html#locks
  - https://docs.python.org/3/library/threading.html#lock-objects
  - This from 
  https://docs.python.org/2/library/sets.html?highlight=immutable#immutable-transforms re ImmutableSet/FrozenSet
  is not present in the python 3 docs: 
  https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset 
  
  Though - even if Python enforced normal consts in the language - all of the other code objects would still be mutable, so you still have the impossibility of sandboxing python.
  
  Functional and contracts coding styles rely upon invariance; 
  which can be accomplished with various third-party packages that enforce const-ness throughout what may be an object tree behind that reference that would otherwise need to be copy.deepcopy()'d.
  
  ## pyrsistent
  Src: https://github.com/tobgu/pyrsistent
  
  > - PVector, similar to a python list
  > - PMap, similar to dict
  > - PSet, similar to set
  > - PRecord, a PMap on steroids with fixed fields, optional type and invariant checking and much more
  > - PClass, a Python class fixed fields, optional type and invariant checking and much more
  > - Checked collections, PVector, PMap and PSet with optional type and invariance checks and more
  > - PBag, similar to collections.Counter
  > - PList, a classic singly linked list
  > - PDeque, similar to collections.deque
  > - Immutable object type (immutable) built on the named tuple
  > -   freeze and thaw functions to convert between python standard collections and pyrsistent collections.
  > - Flexible transformations of arbitrarily complex structures built from PMaps and PVectors.
  
  
  ## icontract
  Src: https://github.com/Parquery/icontract
  
  > icontract provides design-by-contract to Python3 with informative violation messages and inheritance.
  >
  > It also gives a base for a flourishing of a wider ecosystem:
  > 
  > - A linter pyicontract-lint,
  > - A sphinx plug-in sphinx-icontract,
  > - A tool icontract-hypothesis for automated testing and ghostwriting test files which infers Hypothesis strategies based on the contracts,
  together with IDE integrations such as icontract-hypothesis-vim, icontract-hypothesis-pycharm, and icontract-hypothesis-vscode,
  > - Directly integrated into CrossHair, a tool for automatic verification of Python programs,
  together with IDE integrations such as crosshair-pycharm and crosshair-vscode, and
  > - An integration with FastAPI through fastapi-icontract to enforce contracts on your HTTP API and display them in OpenAPI 3 schema and Swagger UI.
  
  
https://en.wikipedia.org/wiki/Design_by_contract

https://en.wikipedia.org/wiki/Invariant_(mathematics)#Invari... [ https://en.wikipedia.org/wiki/Class_invariant ]

> What is the difference between "invariant" and "constant" and "final"?

DeathArrowyesterday at 6:45 PM

>But dictionaries are mutable, which makes them problematic for sharing data in concurrent code.

Not really, C# has ConcurrentDictionary.

semiinfinitelyyesterday at 6:27 PM

its called dataclass

lou1306yesterday at 1:30 PM

Can someone give a strong rationale for a separate built-in class? Because "it prevents any unintended modifications" is a bit weak.

If you have fixed keys, a frozen dataclass will do. If you don't, you can always start with a normal dict d, then store tuple(sorted(d.items())) to have immutability and efficient lookups (binary search), then throw away d.

drhagenyesterday at 11:45 AM

Great! Now make `set` have a stable order and we're done here.

show 1 reply
malkiayesterday at 3:57 PM

Welcome to starlark :)

neonsunsetyesterday at 3:09 PM

[dead]