Project Valhalla, Explained: How a Decade of Work Arrives in JDK 28

349 points • by philonoist • today at 6:35 AM • 192 comments

Comments

> But the difference in memory is fundamental. The JVM can now store the values themselves in the array, laid out densely one after another: 8 bytes per point (plus a possible null flag), in a contiguous block. No headers per element. No pointers. No jumping around the heap.

How closely was this article proofread? Didn't they just finish explaining that heap flattening won't work for objects with > 64-bit representations? Their `Point` is at least 65 bits (two 32-bit ints plus the null flag). The "plus a possible null flag" aside and the oddly short sentences that follow suggest this was written by an AI that got sidetracked trying to make emphatic statements... oh, and the "[IMAGE: the same Point[] array in two variants..." block halfway down the page is unfortunate.
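The arithmetic behind that objection, as a small sketch. It assumes the article's `Point` has exactly two `int` fields and that a nullable `Point` needs at least one extra bit for the null flag; the 64-bit limit is the figure quoted from the article, not something checked against the JEP. `PointBits` and its constants are made-up names.

```java
// Hypothetical helper: counts the bits a flattened Point would need.
final class PointBits {
    // Assumption: Point has exactly two int fields (x and y).
    static final int PAYLOAD_BITS = 2 * Integer.SIZE;   // 2 * 32 = 64
    // Assumption: a nullable Point needs at least one extra bit for the null flag.
    static final int NULLABLE_BITS = PAYLOAD_BITS + 1;  // 65
    // Limit as quoted from the article (atomic 64-bit reads and writes).
    static final int ATOMIC_LIMIT_BITS = Long.SIZE;     // 64

    static boolean fitsAtomically(int bits) {
        return bits <= ATOMIC_LIMIT_BITS;
    }

    public static void main(String[] args) {
        System.out.println("payload only: " + PAYLOAD_BITS + " bits, fits = " + fitsAtomically(PAYLOAD_BITS));
        System.out.println("with null flag: " + NULLABLE_BITS + " bits, fits = " + fitsAtomically(NULLABLE_BITS));
    }
}
```

Under these assumptions the payload alone fits, and the null flag is what pushes a nullable `Point` over the limit.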

rf15 • today at 7:49 AM

I appreciate the hard work that went into the things that did make it into Valhalla eventually, but:

> The model was powerful, but also mentally heavy

No it isn't! It is this interpretation that kills off the null-safety debate entirely. Saying you have a variable that cannot be null is not a mentally taxing distinction, especially since everything is labelled thoroughly.

> The team, faithful to the lesson “simplify the model for the user, even at the cost of the performance ceiling,” ultimately dismantled this dualism.

But it would have simplified it for the user.

The whole attitude and process around this and the other topics gives me very little faith that Java can be steered in a sensible direction here. The type system of a programming language is supposed to give convenient guarantees to the developer on a CPU that can only do numbers. There is no reason to reduce the optional(!) safety guarantees you can offer with the excuse of "too mentally taxing".

Hell, they even get halfway there by recognising:

> the language model and the JVM model don’t have to overlap one hundred percent

cogman10 • today at 2:52 PM

> There’s a catch worth knowing about here, though: flattened data has to be readable and writable atomically (otherwise it risks “tearing” under concurrent access).

I really hope they give an escape hatch for this. It will be really hard to extract much of the benefit of Valhalla if you can't make a thread-unsafe value class. It's also one of those problems that is quite hard to run into. You basically need something like this:

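    class Bar {
      // Preview syntax: 3 x 32-bit fields = 96 bits of payload, more than one 64-bit write.
      value record Foo(int x, int y, int z) {}

      static Foo[] value = new Foo[10];

      // Called from many threads with no synchronization: a data race on value[0].
      static void setFooFromManyThreads(Foo foo) {
        value[0] = foo;
      }
    }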

Not something you typically run into and generally already a thread safety problem.

The solution is also simple, a `synchronized{}` block will fix it if you need to have a tearable class that's written from multiple threads.
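A minimal sketch of that workaround, written so it runs on a current JDK. It assumes an ordinary `record Foo` as a stand-in for the preview-only `value record Foo`, and it locks on a separate object, since a value object would have no monitor to synchronize on. `GuardedSlots` and its method names are hypothetical.

```java
// Ordinary record standing in for the preview-only `value record Foo`.
record Foo(int x, int y, int z) {}

final class GuardedSlots {
    private final Foo[] slots = new Foo[10];
    // Dedicated lock object with identity; all reads and writes go through it.
    private final Object lock = new Object();

    void set(int i, Foo foo) {
        synchronized (lock) {
            slots[i] = foo;
        }
    }

    Foo get(int i) {
        synchronized (lock) {
            return slots[i];
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedSlots g = new GuardedSlots();
        Thread a = new Thread(() -> { for (int n = 0; n < 10_000; n++) g.set(0, new Foo(1, 1, 1)); });
        Thread b = new Thread(() -> { for (int n = 0; n < 10_000; n++) g.set(0, new Foo(2, 2, 2)); });
        a.start();
        b.start();
        a.join();
        b.join();
        Foo f = g.get(0);
        // Either writer may win, but all three fields come from a single write.
        System.out.println(f.x() == f.y() && f.y() == f.z());
    }
}
```

The lock serializes every access to the array, so it trades away some of the throughput that flattening was meant to buy.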

But the other thing is that for SIMD operations, you really need flattening, and that really does typically mean having something like `Foo(double x, double y, double z)` in play. It'd be a shame if the way we have to do this is a struct of arrays.
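For reference, a sketch of the two layouts being contrasted, in plain Java that runs today. `Vec3` and `Vec3Columns` are illustrative names, and nothing here measures performance or claims what the JIT does with either loop.

```java
// Array-of-objects layout: today each array element is a reference to a separate heap object.
record Vec3(double x, double y, double z) {}

// Struct-of-arrays layout: three primitive arrays, each contiguous in memory.
final class Vec3Columns {
    final double[] xs;
    final double[] ys;
    final double[] zs;

    Vec3Columns(int n) {
        xs = new double[n];
        ys = new double[n];
        zs = new double[n];
    }

    void set(int i, Vec3 v) {
        xs[i] = v.x();
        ys[i] = v.y();
        zs[i] = v.z();
    }

    // Walks one contiguous primitive array.
    double sumX() {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum;
    }

    // Same computation over the object layout: one dereference per element.
    static double sumXOfObjects(Vec3[] vs) {
        double sum = 0.0;
        for (Vec3 v : vs) sum += v.x();
        return sum;
    }

    public static void main(String[] args) {
        Vec3[] vs = { new Vec3(1, 2, 3), new Vec3(4, 5, 6) };
        Vec3Columns cols = new Vec3Columns(vs.length);
        for (int i = 0; i < vs.length; i++) cols.set(i, vs[i]);
        System.out.println(cols.sumX() + " " + sumXOfObjects(vs));
    }
}
```

The struct-of-arrays version gets contiguity without any JVM support, at the cost of losing `Vec3` as the unit you store and pass around.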

tomaytotomato • today at 8:45 AM

A lot of the comments on here are a bit unfair on what is great work being done and even more awesome work (JEPs) in the pipeline for the future.

If Java was a child, imagine it being brought up by loving parents for the first few years (Sun), then thrown in a garage with some other children and neglected by its evil guardian (Oracle).

Neglected and unloved till JDK 8, it's basically been playing catch-up.

So when people say "oh so it's now got structs or value types of X", yes it has, but that's because it has been stunted in its development by big bureaucratic and hostile corporate processes. It's free now and is getting love through the OpenJDK family.

I will continue to enjoy writing once and deploying anywhere!

narag • today at 2:59 PM

I found a solution for what seems to be the same problem, in a different language: a particular type of lists, where the class metadata is stored once and the data for each instance is contiguously stored in a flat array.

Not sure if it covers exactly the same terrain, but perusing the article, it seems to be the case, with a single instance being the degenerate case.

DarkNova6 • today at 7:36 AM

You could probably write a whole tech thriller on the evolution of value types in Java.

I’ve been reading the mailing lists and have watched all the videos on the topic, and it is truly inspiring how much they managed to consolidate the design into something that always looked like Java.

All while going far deeper in granularity and in understanding what it even means to be a value type and which optimizations can be done where.

dllrr • today at 2:59 PM

And here I thought engineers were mostly logical and objective. This thread is very entertaining.

layer8 • today at 8:18 AM

> But careful: == looks at internal state, which isn’t always what the object represents, so for “is this the same data” comparisons keep using equals.

So == for value classes will basically be like memcmp(). That is a bit unfortunate, as it breaks encapsulation, exposing implementation details. Client code can use this to do case distinctions based on how a given value is internally represented. In a way, it’s worse than identity comparison, because identity comparison at least doesn’t expose internal state.
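A sketch of the distinction, runnable on a current JDK. `Fraction` is an ordinary class here; the assumption is that under the preview it would be declared a value class and that `==` would then behave like the hypothetical `sameState` helper below, which compares fields one by one as the article describes.

```java
// Ordinary class today; imagine it declared as a value class under the preview.
// Assumes den > 0 so that equal fractions always hash alike.
final class Fraction {
    private final int num;
    private final int den;

    Fraction(int num, int den) {
        this.num = num;
        this.den = den;
    }

    // What the type represents: 1/2 and 2/4 are the same number.
    @Override
    public boolean equals(Object o) {
        return o instanceof Fraction f && (long) num * f.den == (long) f.num * den;
    }

    @Override
    public int hashCode() {
        return Double.hashCode((double) num / den);
    }

    // Emulates a field-by-field comparison of internal state.
    static boolean sameState(Fraction a, Fraction b) {
        return a.num == b.num && a.den == b.den;
    }

    public static void main(String[] args) {
        Fraction half = new Fraction(1, 2);
        Fraction twoQuarters = new Fraction(2, 4);
        System.out.println(half.equals(twoQuarters));     // same number
        System.out.println(sameState(half, twoQuarters)); // different representation
    }
}
```

Client code that can tell `1/2` from `2/4` this way is observing a representation choice that `equals` deliberately hides.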

newsoftheday • today at 2:44 PM

I just got my projects up to JDK 21 a few months ago. Working on trying to get one upgraded to JDK 25 now and now they're talking about delivering JDK 28 in less than a year from now. How are you supposed to keep up with these rapid updates?

torginus • today at 7:55 AM

I know it's a faux pas in the Java world to acknowledge the existence of .NET, but how does this differ from .NET structs?

Value types, generic specialization, boxing - a quick skim makes it look like they made the same choices.

vlovich123 • today at 1:28 PM

> Before we pop the champagne, though: this is preview, disabled by default, and, as Brian Goetz was quick to cool everyone down, “only the first part of Valhalla.” Goetz added a great observation that the “they’ll never ship it” crowd will now smoothly switch over to “but they didn’t ship the most important part” (and a joke has been going around the community for years that we’ll sooner end up in Valhalla ourselves, the Norse-afterlife one, than the project ships).

I don’t know if this is a fair way to try to disarm your critics. The only thing that’s remained after this decade is the slogan, so it’s a real ship of Theseus question whether Valhalla has shipped, since what’s delivered doesn’t achieve it. Congrats on the accomplishment, but from looking at what ended up shipping, I’m not sure it’s a huge improvement.

> The trouble is that this optimization is unpredictable and fragile.

Is this describing escape analysis or value classes? Because the list of exclusions where this does anything is so large and the conversion to a heap type under the hood is so transparent and opaque, I think it can describe this technique as well.

Also, the whole “works like an int” motto is violated - int is never null, and int -> Integer boxing is explicit and well understood.

> In the new model, the wrapper classes themselves become value classes (when preview is on, Integer, Long, Double, and company lose their identity

Oh neat, they sidestep that by changing the definition of an int. I’m sure it’ll be trivial to turn this on in the wild on code that may be relying on identity for boxed numerics. I think this alone shows this project can’t ever be turned on by default, and now we’ll have a decade of two Java languages (one with value types and one without) as they try to convince everyone to migrate and then just turn it on (i.e. Python 3).
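The kind of code in question, as a sketch that runs on a current JDK. `BoxedIdentity` is a made-up name; the comments on what changes under the preview are an assumption based on the article's statement that the wrapper classes lose identity.

```java
final class BoxedIdentity {
    // Identity comparison: true only if both references point at the same box.
    static boolean sameObject(Integer a, Integer b) {
        return a == b;
    }

    // Value comparison: does not depend on identity.
    static boolean sameValue(Integer a, Integer b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        Integer a = 1000;
        Integer b = 1000;
        // On current JDKs with default settings this is typically false, because 1000
        // is outside the small-value box cache. Assumption: once Integer has no
        // identity, == would compare the values instead.
        System.out.println(sameObject(a, b));
        // Value comparison gives the same answer either way.
        System.out.println(sameValue(a, b));
    }
}
```

Code that uses `equals` is unaffected; only code that depends on `==` distinguishing two boxes with the same value would notice.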

So much opportunity squandered and dismissing critics as always having something to complain about is a neat way to sidestep legitimate criticism that this approach is not going to work out for Java.

orthoxerox • today at 8:24 AM

> Will I get a fast, flat `ArrayList<Point>`? Not yet.

Sad. Hope they can do this by the next LTS JDK.

minitech • today at 1:40 PM

> How is this different from struct in C#? A struct in C# has identity

Since when? I’m pretty sure structs didn’t have identity last time I used C#, and that would be a very surprising thing to add.

spbaar • today at 11:29 AM

I have such an urge to comment "lgtm" on the 197k line change PR

nu11ptr • today at 2:23 PM

I'm a little unclear as to when and under what conditions this results in non-heap objects, now (<= 64-bits?) and in the future (???). I thought that was the _ENTIRE_ point of this project, so I was surprised to see they can be null (did that change from before?). If it is always and forever limited to 64-bits, I fail to see the point of this entire project, as it would have been far simpler to add syntactic sugar (simply pass primitives underneath the covers) as Scala did to create value types vs. JVM changes.

leiroigh • today at 10:17 AM

I'll be interested in seeing the fallout of the (unavoidable) compat issue:

If I have a function with a value `x` whose type erases to `java.lang.Object` (e.g. a generic function with an unbounded type parameter), then it used to be safe to check for null and then synchronize on the object.

This is no longer safe: This can now throw `IdentityException` into your face. (it was _never_ a good idea)

In other words, a lot of old code must be reviewed.

I suspect that `-XX:DiagnoseSyncOnValueBasedClasses=2` will need to stay (with the semantics: if user tries to synchronize on identity-less object, then log a JFR event and make it a NOP, don't throw an exception)!

The current JEP text is a little too ambiguous to figure out whether that is the plan, anyways.
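A sketch of the pattern and one possible way around it, runnable on a current JDK. `ErasedLocking` and both method names are hypothetical. The second method is not a drop-in replacement: it swaps a per-object lock for one shared lock, which changes contention and is only meant to show locking on an object the code itself owns.

```java
final class ErasedLocking {
    // Risky pattern: T erases to Object, so `x` may be a boxed number or,
    // under the preview, any value object with no monitor to lock.
    static <T, R> R lockOnArgument(T x, java.util.function.Supplier<R> body) {
        if (x == null) {
            return body.get();
        }
        synchronized (x) {
            return body.get();
        }
    }

    // Alternative: lock on an object this class created and knows has identity.
    private static final Object LOCK = new Object();

    static <T, R> R lockOnOwnMonitor(T x, java.util.function.Supplier<R> body) {
        synchronized (LOCK) {
            return body.get();
        }
    }

    public static void main(String[] args) {
        System.out.println(lockOnArgument("key", () -> 42));
        System.out.println(lockOnOwnMonitor(null, () -> "ok"));
    }
}
```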

exabrial • today at 2:12 PM

Tons of armchair critics but dang this is freaking cool!!! Thanks everyone for working on this and THANK YOU for moving slow and getting the design right!

DarkmSparks • today at 2:14 PM

Why remove identity from Double and Integer? This is going to break so much stuff for no reason when double and int were already a thing.

Alexander-Barth • today at 9:18 AM

I think this is quite similar to Julia's handling of a struct. An array of mutable structs is just an array of pointers, where every pointer leads to the underlying structure. However, with an array of immutable structs (immutable is the default), there is no such indirection. The values of all fields are stored directly in the array element (unless you have an array of heterogeneous elements).

If you want to change an element of such an array you need to create a new immutable struct, which in practice is quite fast but a bit verbose to write.
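The Java counterpart of that update style, sketched with an ordinary record so it runs on a current JDK. `withX` and `PointUpdate` are illustrative names, and the sketch makes no claim about how such an array is laid out in memory.

```java
record Point(int x, int y) {
    // "Wither": returns a modified copy instead of mutating in place.
    Point withX(int newX) {
        return new Point(newX, y);
    }
}

final class PointUpdate {
    // Changing one field means building a new Point and replacing the whole element.
    static void moveRight(Point[] points, int i, int dx) {
        Point old = points[i];
        points[i] = old.withX(old.x() + dx);
    }

    public static void main(String[] args) {
        Point[] points = { new Point(1, 2), new Point(3, 4) };
        moveRight(points, 0, 3);
        System.out.println(points[0]);
    }
}
```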

drzaiusx11 • today at 12:57 PM

Am I understanding this correctly: a value type really only works when it fits in a 64-bit "cache line", and when larger, it falls back to normal heap-allocated objects as before? Seems extremely limiting, no? Great for a boxing optimization, but not much else unless you're dealing with very small data types regularly...

maelito • today at 12:20 PM

What I have in mind when I read Valhalla : https://valhalla.openstreetmap.de/

GYLQ • today at 2:19 PM

The gap between demo and production is always bigger than it looks. Things that work great on the examples in the README tend to fall apart on edge cases that aren't covered. Worth running it against your actual data before committing to it.

jessinra98 • today at 12:28 PM

The article has a section about that. For me, a struct in C/C# can be modified and is passed by copy while a value class can not be modified and is passed by value.

I do not think you can do stack allocation in Java.

Hendrikto • today at 11:48 AM

> The pull request alone adds over 197 thousand lines of code across 1,816 files.

And that across 2819 commits.

Wow, that’s insane.

smallnix • today at 12:00 PM

> [IMAGE: the same Point[] array in two variants: “before” (an array of arrows → scattered boxes with headers) and “after” (a uniform strip of number pairs)]

The `Point[]` in the image tag of your LLM output crashed your image generation post processing.

pregnenolone • today at 10:22 AM

Looking into the negative comments is quite amusing. Not only do most of them contain technical inaccuracies, but of course, they also need to mention how great .NET supposedly has been from the beginning and how Java supposedly copied everything.

Let's take a stroll down memory lane. First of all, .NET literally started as a Java copy. On top of it, a non-cross-platform one for almost two decades! After having shamed Linux for so long Microsoft finally started porting .NET to other platforms in a non-backward compatible way. A lot of .NET proponents will tell you porting from legacy .NET to .NET Core (which was renamed once again to .NET) would be a quick fix, but it isn't. For example, the shop I used to work in had some important cryptographic libraries which were very painful to port. And then, there's .NET's simplistic garbage collector, which can be quite annoying because it tries to be a one-fit-all solution that basically cannot be tweaked at all, often resulting in unresolvable latency problems. There’s a lot of other stuff, like its ghetto-like ecosystem and the insane fragmentation of GUI libraries.

I also don't get the C# praise. Over the years, it has become quite the bloated language. It feels like Microsoft tries to implement every feature possible without realizing that an enterprise language is supposed to be streamlined. Async/await? Very ugly, very annoying. Java has solved this a lot better with virtual threads and structured concurrency.

I could go on, but these "language wars" are silly and pointless. Both platforms have their pros and cons. Besides, I have a lot of bad things to say about the JVM as well, but it's nice to see Valhalla finally becoming reality. Too late for me personally though.

aykutseker • today at 9:25 AM

I think a lot of people will file this under Java got structs.

That seems off. They're still objects, the new thing is that they can give up identity.

rom1v • today at 9:51 AM

> The difference in the code is exactly one word: value.

What is unclear to me is why the decision to use a Point instance as a value or as a reference is made in the class definition rather than by the caller.

> Point[] point = new Point[10];

For the same class, I might need an array of values in one place and an array of references elsewhere within the same codebase.

ahartmetz • today at 8:26 AM

From the article:

> In 1995, a memory access cost roughly the same as a CPU operation

Uhm... no?!

Here's a CS paper from 1993(!) about prefetching from cache(!!) because the cache was slower than the ALU. https://www.eecs.umich.edu/techreports/cse/93/CSE-TR-152-93....

It would perhaps make Java look a little bad to say that, in 1995, the prevailing attitude in certain circles was "If it's too slow, just wait for faster hardware - Moore's Law forever baby!" (Of course, Sun was selling, at the time, relatively fast hardware - the slower the software, the faster the required hardware)

jeandrek • today at 11:43 AM

Anyone know why the article's 4th picture is about the Jobs obituary gaffe? (It's not just for me, right?)

LelouBil • today at 12:55 PM

I'm wondering what this means for Kotlin now

greekrich92 • today at 1:29 PM

Great write-up. Java is getting so good. The improvements over the last decade have been unbelievable. The negativity here is bizarre. Just a reflex I suppose.

nasso_dev • today at 10:24 AM

stopped reading when i saw the AI illustration. wholly unnecessary, and it feels insulting to be fed slop like this...

if you really want a fun drawing get a human artist to do it. it doesn't need to be complicated, for example https://www.code-cartoons.com/ is mostly just stick figures and does an excellent job

but you don't even need any of that, a mermaid diagram would have worked perfectly fine too. instead you chose to use a technology that is known to be harmful

geokon • today at 8:18 AM

a few questions for the pros

> "The defining trait: no identity"

I get that this makes objects behave like primitive types. Maybe that's reason enough. But is it necessary for the performance boost and de-fluffing the objects? Seems like an orthogonal objective.

> There’s a catch worth knowing about here, though: flattened data has to be readable and writable atomically (otherwise it risks “tearing” under concurrent access).

Isn't this a race condition and "undefined behavior"..? Having to limit yourself to atomic sizes seems like a huge limitation to accommodate what is most likely buggy code. Is all the effort only gonna help lil toy ColorRGB examples?

> The points array is a million pointers. Each pointer leads to a separate Point object lying somewhere on the heap.

Does this happen in actuality? One would assume the allocator tries to put stuff sequentially on the heap? It's not a guarantee as with these value types, but I'd think you could get similar-ish perf with prefetching into cache. I dunno what's happening under the hood... But when writing Clojure apps the JVM always reserves absurd amounts of heap space on my machine (to my annoyance). I'd assume it can find some place to do contiguous allocations...

Which I guess gets me to my last question... where are the benchmarks, broski? It all sounds great, but does it actually yield the insane speedups promised?

Great article, well written. But a benchmark would have been a nice "punchline"

theanonymousone • today at 7:29 AM

Dupe? https://news.ycombinator.com/item?id=48590056

FrustratedMonky • today at 1:42 PM

So, is this going to allow an F# like language to run on the JVM?

vanyaland • today at 1:36 PM

[dead]

evdubs • today at 8:05 AM

[flagged]

fsuts • today at 12:31 PM

Java = Oracle = Ellison's way of doing business

Unless your company forces you to use Java for new projects, consider a change
