I have a hot take. Modern computer graphics is very complicated, and it's better to build up fundamentals than to dive off the deep end into Vulkan, which is really geared towards engine professionals who want to shave every last microsecond off their frame times. Vulkan and D3D12 are great: they provide very fine-grained host-device synchronisation mechanisms that seasoned engine programmers can push to their limits. At the same time, a newbie can easily get bogged down by the sheer verbosity, and don't even get me started on the initial setup boilerplate, which is daunting for someone just starting out.
GPUs expose a completely different programming and memory model, and the real issue, I would say, is conflating computer graphics with GPU programming. The two are obviously related, don't get me wrong, but they can and do diverge quite significantly. This has become even more true recently with the push towards GPGPU: GPUs now combine several coprocessors beyond just the shader cores, and can be programmed through something like a dozen different APIs.
I would instead suggest:
1) Implement a CPU rasteriser, with just two stages: a primitive assembler, and a rasteriser.
2) Implement a CPU ray tracer (a minimal ray-generation sketch follows the tutorial links below).
Tutorials for each, respectively: https://haqr.eu/tinyrenderer/
https://raytracing.github.io/books/RayTracingInOneWeekend.html
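To give a flavour of suggestion 2), here is a minimal sketch (in C++) of the per-pixel ray-generation loop that a tracer in the spirit of the second tutorial starts from; the Vec3/Ray types, the placeholder trace(), and the PPM output are my own assumptions for illustration, not anything taken from the linked material:

    #include <cstdio>

    struct Vec3 { double x, y, z; };
    Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
    Vec3 operator*(double s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }

    struct Ray { Vec3 origin, dir; };

    // Stand-in for real intersection/shading code: just a vertical sky gradient.
    Vec3 trace(const Ray& r) {
        double t = 0.5 * (r.dir.y + 1.0);
        return (1.0 - t) * Vec3{1, 1, 1} + t * Vec3{0.5, 0.7, 1.0};
    }

    int main() {
        const int W = 400, H = 225;
        const double aspect = double(W) / H;
        std::printf("P3\n%d %d\n255\n", W, H);          // plain-text PPM image
        for (int j = 0; j < H; ++j)
            for (int i = 0; i < W; ++i) {
                // Map the pixel centre onto a viewport one unit in front of a
                // pinhole camera sitting at the origin, looking down -z.
                double u = (i + 0.5) / W * 2.0 - 1.0;   // -1..1, left to right
                double v = 1.0 - (j + 0.5) / H * 2.0;   //  1..-1, top to bottom
                Ray r{{0, 0, 0}, {u * aspect, v, -1.0}};
                Vec3 c = trace(r);
                std::printf("%d %d %d\n", int(255 * c.x), int(255 * c.y), int(255 * c.z));
            }
    }

Everything interesting (spheres, materials, bounces) then grows inside trace().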
These can be extended in many, many ways that will keep you sufficiently occupied trying to maximise performance and features. In fact, even achieving basic correctness requires a fair degree of complexity: the primitive assembler will of course need frustum- and back-face culling (and clipping against the frustum means re-triangulating some primitives). The rasteriser will need z-buffering. The ray tracer will need intersection routines for camera, shadow, and lighting rays against different primitives, accounting for floating-point precision issues; spheres, planes, and triangles can all be individually optimised. Try adding various anti-aliasing algorithms to the rasteriser. Add shading: begin with flat, then extend to per-vertex and then per-fragment. Try adding a tessellator where the level of detail is controlled by camera distance. Add early discard instead of the usual z-buffering.
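Picking out just the rasteriser's inner loop, here is a minimal sketch of edge-function rasterisation with a z-buffer and a signed-area back-face cull; the screen-space Vtx layout, the winding convention, and the plain std::vector framebuffers are assumptions for illustration:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    struct Vtx { float x, y, z; };   // already projected to screen space

    // Twice the signed area of (a, b, c); the sign also gives the winding.
    float edge(const Vtx& a, const Vtx& b, const Vtx& c) {
        return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    }

    // zbuf is assumed to be cleared to +infinity, colour to the clear colour.
    void raster_triangle(const Vtx& v0, const Vtx& v1, const Vtx& v2,
                         int W, int H, std::vector<float>& zbuf,
                         std::vector<uint32_t>& colour, uint32_t rgba) {
        float area = edge(v0, v1, v2);
        if (area <= 0) return;       // back-facing (or degenerate): cull

        // Only walk the triangle's screen-space bounding box.
        int x0 = std::max(0,     (int)std::floor(std::min({v0.x, v1.x, v2.x})));
        int x1 = std::min(W - 1, (int)std::ceil (std::max({v0.x, v1.x, v2.x})));
        int y0 = std::max(0,     (int)std::floor(std::min({v0.y, v1.y, v2.y})));
        int y1 = std::min(H - 1, (int)std::ceil (std::max({v0.y, v1.y, v2.y})));

        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x) {
                Vtx p{x + 0.5f, y + 0.5f, 0.0f};
                float w0 = edge(v1, v2, p), w1 = edge(v2, v0, p), w2 = edge(v0, v1, p);
                if (w0 < 0 || w1 < 0 || w2 < 0) continue;   // outside the triangle
                // Barycentric interpolation of depth, then the z-test.
                float z = (w0 * v0.z + w1 * v1.z + w2 * v2.z) / area;
                if (z < zbuf[y * W + x]) {
                    zbuf[y * W + x] = z;
                    colour[y * W + x] = rgba;
                }
            }
    }

Per-vertex attributes (colours, normals, UVs) interpolate with those same w0/w1/w2 weights, which is exactly where flat, per-vertex, and per-fragment shading start to diverge.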
To the basic Whitted CPU ray tracer, add BRDFs and microfacet theory; add subsurface scattering, caustics, and photon mapping/light transport; and work towards a general global-illumination implementation. Add denoising algorithms. And of course, implement and use acceleration data structures for faster intersection lookups; there are many.
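As one example of those acceleration structures, most BVH traversals bottom out in the ray/AABB 'slab' test sketched below; the Vec3-as-array layout and the precomputed inverse ray direction are assumptions about how the surrounding tracer is organised:

    #include <algorithm>
    #include <array>

    using Vec3 = std::array<double, 3>;            // x, y, z
    struct AABB { Vec3 lo, hi; };

    // True if origin + t*dir hits the box for some t in [tmin, tmax].
    // inv_dir holds 1/dir per component, precomputed once per ray.
    bool hit_aabb(const AABB& box, const Vec3& origin, const Vec3& inv_dir,
                  double tmin, double tmax) {
        for (int a = 0; a < 3; ++a) {
            // Intersect the ray with the pair of parallel planes (the "slab")
            // on this axis, then shrink the running [tmin, tmax] interval.
            double t0 = (box.lo[a] - origin[a]) * inv_dir[a];
            double t1 = (box.hi[a] - origin[a]) * inv_dir[a];
            if (inv_dir[a] < 0) std::swap(t0, t1);
            tmin = std::max(tmin, t0);
            tmax = std::min(tmax, t1);
            if (tmax <= tmin) return false;        // the slab intervals no longer overlap
        }
        return true;
    }

A BVH node then just stores such a box plus either two children or a small handful of primitives, and the same idea carries over to grids and k-d trees.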
Working on all of these will, frankly, give you a more detailed and intimate understanding of how GPUs work, and why they have developed the way they have, than programming against something like Vulkan and spending your time filling in struct after struct.
After this, feel free to explore either of the two more 'basic' graphics APIs: OpenGL 4.6 or D3D11. shadertoy.com and shaderacademy.com are great resources for understanding fragment shaders. There are again several widespread shading languages, though most of the industry uses HLSL. GLSL can be simpler, but HLSL is definitely more flexible.
At this point, explore more complicated scenarios: deferred rendering, pre- and post-processing for things like ambient occlusion, mirrors, temporal anti-aliasing, render-to-texture for lighting and shadows, etc. This is all video-game focused; you could go in another direction by exploring 2D UIs, text rendering, compositing, and more.
As for why I recommend starting with CPUs only to end up back at GPUs again, one may ask: 'hey, who uses CPUs for graphics any more?' Let me answer: WARP[1] and LLVMpipe[2] are both production-quality software rasterisers, frequently loaded during remote-desktop sessions. In fact 'rasteriser' is an understatement: they expose full-fledged software implementations of D3D10/11 and OpenGL/Vulkan devices respectively. And naturally, most film renderers still run on the CPU, due to their improved floating-point precision; films can't really get away with the ephemeral smudging of video games. CPU cores are also quite cheap nowadays, so it's not unusual to see a render farm of a million-plus cores chewing away at a complex Pixar or DreamWorks frame.
[1]: https://learn.microsoft.com/en-gb/windows/win32/direct3darti...
[2]: https://docs.mesa3d.org/drivers/llvmpipe.html
I would simplify further:
1) Implement 2D shapes and sprites with blits
With modern compute shaders, this has 95% of "How to use a GPU" while omitting 99% of the "Complicated 3D Graphics" that confuses everybody.
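For what it's worth, a compute-shader blit runs one invocation per destination pixel; below is a minimal sketch of that same per-pixel work written as a plain CPU loop, assuming packed 8-bit RGBA and straight (non-premultiplied) alpha:

    #include <cstdint>

    struct Image {
        int w, h;
        uint32_t* pixels;            // packed 8-bit RGBA, row-major
    };

    // Blit src onto dst at (dx, dy) with straight-alpha blending.
    void blit_sprite(Image dst, const Image& src, int dx, int dy) {
        for (int y = 0; y < src.h; ++y)
            for (int x = 0; x < src.w; ++x) {
                int tx = dx + x, ty = dy + y;
                if (tx < 0 || ty < 0 || tx >= dst.w || ty >= dst.h) continue;  // clip
                uint32_t s = src.pixels[y * src.w + x];
                uint32_t d = dst.pixels[ty * dst.w + tx];
                uint32_t a = (s >> 24) & 0xFF;     // source alpha, 0..255
                uint32_t out = 0xFF000000u;        // write an opaque result
                for (int c = 0; c < 24; c += 8) {  // blend the three colour channels
                    uint32_t sc = (s >> c) & 0xFF, dc = (d >> c) & 0xFF;
                    out |= ((sc * a + dc * (255 - a)) / 255) << c;
                }
                dst.pixels[ty * dst.w + tx] = out;
            }
    }

Porting the body of those two loops to a compute shader, with (x, y) coming from the thread ID, is then most of the exercise.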