My impression was that the state of the art was still to generate high-level data from your inputs, then react to that data with a mixture of ML and algorithmic rules. For example, you'd use a mix of LIDAR and vision to detect that there's a pedestrian, use past frames and ML to predict the pedestrian's next position, then algorithmically check whether your vehicle's path is likely to intersect the pedestrian's path, and take appropriate action if so.
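To make that pipeline concrete, here's a minimal sketch of the last, algorithmic step: a constant-velocity extrapolation standing in for the ML position predictor, plus a rule that flags a conflict when the predicted paths get too close at the same time step. All the names (`Track`, `predict_path`, `paths_conflict`) and the numbers are illustrative, not taken from any real stack.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    """A detected object's recent positions, one (x, y) per frame."""
    positions: list = field(default_factory=list)  # most recent last
    dt: float = 0.1  # seconds between frames

def predict_path(track, horizon_s=2.0):
    """Constant-velocity extrapolation from the last two frames.
    A stand-in for the ML predictor mentioned above."""
    (x0, y0), (x1, y1) = track.positions[-2], track.positions[-1]
    vx = (x1 - x0) / track.dt
    vy = (y1 - y0) / track.dt
    steps = int(horizon_s / track.dt)
    return [(x1 + vx * i * track.dt, y1 + vy * i * track.dt)
            for i in range(1, steps + 1)]

def paths_conflict(ego_path, other_path, safety_radius=1.5):
    """Algorithmic rule: conflict if the two paths come within
    safety_radius metres of each other at the same time step."""
    for (ex, ey), (ox, oy) in zip(ego_path, other_path):
        if ((ex - ox) ** 2 + (ey - oy) ** 2) ** 0.5 < safety_radius:
            return True
    return False

# Ego drives along the x axis at 10 m/s; a pedestrian crosses at x = 5.
ego_path = [(float(i), 0.0) for i in range(1, 21)]  # 2 s horizon, 0.1 s steps
pedestrian = Track(positions=[(5.0, -2.3), (5.0, -2.0)])  # walking +y at 3 m/s
print(paths_conflict(ego_path, predict_path(pedestrian)))
```

The interesting property is that the safety logic itself stays inspectable and testable: only the perception and prediction stages are learned.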
Under that model, LIDAR training data is easy to generate: create situations in a lab or take recordings from real drives, label them with the high-level information they contain, and train your models to extract it. Making use of that information is the next step, but it doesn't fundamentally change with your sensor choice, apart from the amount of information available at different speeds, distances, and driving conditions.
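As a rough illustration of what that labeling step produces, the sketch below pairs one recorded LIDAR frame with its human-supplied high-level annotations to form a supervised training example. The structure (point tuples, class/bbox labels, the `make_example` helper) is made up for the sake of the example, not any real annotation format.

```python
import json

def make_example(lidar_points, annotations):
    """Pair one recorded LIDAR frame with its high-level labels
    (object class and 2D bounding box) to form one training example."""
    return {
        "input": lidar_points,  # raw sensor data, e.g. (x, y, z) points
        "targets": [
            {"class": a["class"], "bbox": a["bbox"]} for a in annotations
        ],
    }

# One frame from a staged lab scene: a pedestrian-shaped cluster of points,
# labeled by hand with the high-level fact the model should learn to extract.
frame = [(5.0, -2.0, 0.5), (5.1, -2.0, 1.0), (5.0, -1.9, 1.6)]
annotation = [{"class": "pedestrian", "bbox": [4.9, -2.1, 5.2, -1.8]}]

example = make_example(frame, annotation)
print(json.dumps(example["targets"]))
```

The point is that the label lives at the same high level of abstraction regardless of sensor; only the `input` half of each example changes if you swap LIDAR for cameras.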