
How an Autonomous Vehicle's AI Got 40% Faster Without Losing Accuracy

Autonomous vehicles need to see the world quickly and precisely. But real-time perception is hard when your model is drowning in millions of LiDAR points and high-resolution images. The solution? Sensor fusion, model quantization, and TensorRT acceleration.

Let me walk you through how I optimized a multimodal perception model to run 40% faster while staying accurate enough to avoid pedestrians and parked cars.


The bottleneck: a million points, a million problems

Autonomous vehicles rely on LiDAR and cameras, each with strengths and weaknesses. LiDAR excels at depth estimation but spits out millions of points per second. Cameras provide rich texture and color but lack depth. Fusing them in one pipeline means balancing speed and accuracy.

Problem 1: The lag between LiDAR and camera data

By the time the LiDAR finishes a sweep, the camera has already captured a new frame. This misalignment means detected object positions no longer match across sensors.

The fix: I used ego-motion compensation with a Kalman filter to adjust the LiDAR points dynamically. Here is an excerpt of how it worked in Python:

import numpy as np

class EgoMotionCompensation:
    def __init__(self, initial_pose, process_noise, measurement_noise):
        # state vector: [x, y, yaw, vx, vy]
        self.state = np.asarray(initial_pose, dtype=float)
        self.P = np.eye(5) * 0.01            # state covariance
        self.Q = np.diag(process_noise)      # process noise (5 values)
        self.R = np.diag(measurement_noise)  # measurement noise (3 values)

    def predict(self, dt):
        # Unpack state
        x, y, yaw, vx, vy = self.state
        x_pred = x + vx * dt
        y_pred = y + vy * dt
        yaw_pred = yaw 
        self.state = np.array([x_pred, y_pred, yaw_pred, vx, vy])
        F = np.eye(5)
        F[0, 3] = dt
        F[1, 4] = dt
        self.P = F @ self.P @ F.T + self.Q

    def update(self, measurement):
        #measurement = [x_meas, y_meas, yaw_meas]
        H = np.array([
            [1, 0, 0, 0, 0],
            [0, 1, 0, 0, 0],
            [0, 0, 1, 0, 0]
        ])
        z = np.array(measurement)
        y_err = z - H @ self.state
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.state += K @ y_err
        I = np.eye(5)
        self.P = (I - K @ H) @ self.P

    def apply_correction(self, lidar_points, dt):
        #predict and update using IMU/odometry
        self.predict(dt)
        # Placeholder: in the real pipeline the measurement [x, y, yaw] comes
        # from the IMU/odometry. Feeding back the predicted state, as here,
        # makes the update a no-op but keeps the interface intact.
        measurement = [self.state[0], self.state[1], self.state[2]]
        self.update(measurement)

        x, y, yaw, _, _ = self.state
        cos_yaw, sin_yaw = np.cos(yaw), np.sin(yaw)
        transformed = []
        for px, py, pz in lidar_points:
            #rotate and translate
            x_new = (px * cos_yaw - py * sin_yaw) + x
            y_new = (px * sin_yaw + py * cos_yaw) + y
            transformed.append((x_new, y_new, pz))
        return np.array(transformed)
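
A minimal usage sketch (the pose, noise values, and point cloud below are made-up placeholders, not values from our vehicle):

# Hypothetical setup: state is [x, y, yaw, vx, vy]
initial_pose = np.array([0.0, 0.0, 0.0, 5.0, 0.0])  # driving 5 m/s along x
comp = EgoMotionCompensation(
    initial_pose,
    process_noise=[0.01, 0.01, 0.001, 0.1, 0.1],
    measurement_noise=[0.05, 0.05, 0.01],
)

lidar_points = np.array([[10.0, 2.0, 0.5], [12.0, -1.0, 0.3]])  # (x, y, z) meters
corrected = comp.apply_correction(lidar_points, dt=0.1)  # 100 ms since last sweep
print(corrected)  # points shifted by the ego motion accumulated over dt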

This delivered an 85% reduction in the LiDAR-to-camera misalignment, improving object detection accuracy.


Problem 2: Sensor fusion: making LiDAR and cameras play well together

Now that we have fixed the timing mismatch, we hit another problem: LiDAR produces point clouds, while cameras produce RGB images. We needed one representation for both.

We built a bird's-eye-view (BEV) model that projects LiDAR points onto the image plane and aligns them with camera pixels.

How we did it:

  1. Calibrate the camera with intrinsic and extrinsic matrices.
  2. Project LiDAR points into image space.
  3. Fuse features using a cross-attention transformer.

Here's a sketch of the projection in Python:

import torch
import torch.nn.functional as F
import cv2
import numpy as np

def voxelize_points(lidar_points, voxel_size=(0.1, 0.1, 0.2)):
    #Transform LiDAR points into 3D voxel indices
    coords = np.floor(lidar_points / np.array(voxel_size)).astype(int)
    coords -= coords.min(0)
    voxels = np.zeros((coords[:,0].max()+1, coords[:,1].max()+1, coords[:,2].max()+1))
    for (x_idx, y_idx, z_idx) in coords:
        voxels[x_idx, y_idx, z_idx] += 1
    return voxels

def project_voxel_to_image(voxel_coords, camera_matrix, rvec, tvec, voxel_size):
    """
    Projects 3D voxel centers into image space using a known camera matrix (including distortion).
    """
    #compute the center of each voxel (indices live in the shifted voxel frame)
    voxel_centers = []
    for (x_idx, y_idx, z_idx) in voxel_coords:
        center_x = (x_idx + 0.5) * voxel_size[0]
        center_y = (y_idx + 0.5) * voxel_size[1]
        center_z = (z_idx + 0.5) * voxel_size[2]
        voxel_centers.append([center_x, center_y, center_z])

    voxel_centers = np.array(voxel_centers, dtype=np.float32).reshape(-1, 1, 3)
    #project to image plane
    img_pts, _ = cv2.projectPoints(voxel_centers, rvec, tvec, camera_matrix, None)
    return img_pts

def fuse_features(camera_features, lidar_features):
    """
    Simplified stand-in for the cross-attention fusion used in production
    (queries from camera_features, keys/values from lidar_features): here
    we just concatenate along channels and mix with a 3x3 convolution.
    """
    #camera_features, lidar_features are [Batch, Channels, Height, Width]
    fused = torch.cat([camera_features, lidar_features], dim=1)
    #NOTE: random weights for illustration; a real model uses a learned nn.Conv2d
    weight = torch.randn(128, fused.size(1), 3, 3)
    fused = F.relu(F.conv2d(fused, weight, padding=1))
    return fused
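
Tying the three steps together, a hypothetical end-to-end call (the calibration matrices and feature tensors below are dummy placeholders):

#dummy calibration and data for illustration only
lidar_points = np.random.rand(1000, 3).astype(np.float32) * 20.0
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]], dtype=np.float32)
rvec = np.zeros(3, dtype=np.float32)  # extrinsic rotation (Rodrigues vector)
tvec = np.zeros(3, dtype=np.float32)  # extrinsic translation

voxels = voxelize_points(lidar_points)
occupied = np.argwhere(voxels > 0)  # indices of occupied voxels
img_pts = project_voxel_to_image(occupied, camera_matrix, rvec, tvec,
                                 voxel_size=(0.1, 0.1, 0.2))

camera_features = torch.randn(1, 64, 32, 32)  # stand-in camera CNN features
lidar_features = torch.randn(1, 64, 32, 32)   # stand-in BEV LiDAR features
fused = fuse_features(camera_features, lidar_features)  # [1, 128, 32, 32]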

This yielded a 29% improvement in detection, helping the model better understand its surroundings.


40% faster inference with TensorRT and quantization

Even with better fusion, inference time was killing us. Running deep learning models on embedded hardware is not the same as running them on cloud GPUs. We needed to shrink the model without losing accuracy.

The solution: TensorRT + quantization

  1. Convert PyTorch to TensorRT: TensorRT optimizes the compute graph, fuses layers, and reduces memory overhead.

  2. Apply INT8 quantization: this maps 32-bit floating-point values to an 8-bit representation, making computation up to 4x faster (a toy numeric sketch of the mapping follows).
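
For intuition, here is the affine mapping behind INT8 quantization, on made-up values; the scale and zero-point come from the observed value range, which is exactly what the calibration step below estimates:

import numpy as np

#toy affine INT8 quantization: q = round(x / scale) + zero_point
x = np.array([-1.7, 0.0, 0.42, 3.1], dtype=np.float32)  # made-up activations

x_min, x_max = x.min(), x.max()
scale = (x_max - x_min) / 255.0               # map observed range onto 256 levels
zero_point = int(round(-x_min / scale))

q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
x_approx = (q.astype(np.float32) - zero_point) * scale  # dequantize

print(q)         # 8-bit codes, e.g. [  0  90 112 255]
print(x_approx)  # close to x, within one quantization step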

The quantization technique we used:

import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

def apply_quantization(model, calibration_data):
    model.eval()

    #default INT8 qconfig mapping for the x86 (fbgemm) backend
    qconfig_mapping = get_default_qconfig_mapping("fbgemm")
    #calibration_data is a list of (camera_tensor, lidar_tensor) pairs
    example_inputs = tuple(t.unsqueeze(0) for t in calibration_data[0])
    prepared_model = prepare_fx(model, qconfig_mapping, example_inputs)
    
    #Run calibration
    with torch.no_grad():
        for data in calibration_data:
            fused_input = data[0].unsqueeze(0), data[1].unsqueeze(0)
            prepared_model(*fused_input)
    
    quantized_model = convert_fx(prepared_model)
    return quantized_model
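
Called on the fused model with a handful of representative input pairs; the names below are hypothetical:

#calibration_data: (camera_tensor, lidar_tensor) pairs from the validation set
calib_pairs = [(torch.randn(64, 32, 32), torch.randn(64, 32, 32))
               for _ in range(32)]
quantized = apply_quantization(fusion_model, calib_pairs)  # fusion_model: nn.Module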

And here is an example of building the TensorRT engine from the exported ONNX model:

import tensorrt as trt
import os

def build_tensorrt_engine(onnx_file_path, precision="int8", max_batch_size=4, input_shape=(3, 256, 256)):
    trt_logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(trt_logger)
    network_flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags=network_flags)
    parser = trt.OnnxParser(network, trt_logger)

    if not os.path.exists(onnx_file_path):
        raise FileNotFoundError(f"ONNX file not found: {onnx_file_path}")

    with open(onnx_file_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise ValueError("Failed to parse the ONNX file.")

    config = builder.create_builder_config()
    if precision == "int8":
        #NOTE: a full INT8 build also needs an int8 calibrator
        #(config.int8_calibrator) or explicit dynamic ranges; the flag alone
        #is not enough in practice.
        config.set_flag(trt.BuilderFlag.INT8)
    elif precision == "fp16":
        config.set_flag(trt.BuilderFlag.FP16)

    profile = builder.create_optimization_profile()
    #define min/opt/max batch sizes
    min_shape = (1,) + input_shape
    opt_shape = (max_batch_size // 2,) + input_shape
    max_shape = (max_batch_size,) + input_shape

    input_name = network.get_input(0).name
    profile.set_shape(input_name, min_shape, opt_shape, max_shape)
    config.add_optimization_profile(profile)

    #NOTE: build_engine is deprecated (removed in TensorRT 10); newer code uses
    #builder.build_serialized_network() plus trt.Runtime to deserialize.
    engine = builder.build_engine(network, config)
    return engine
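
The builder above assumes the ONNX file was exported with a dynamic batch dimension; a minimal export sketch (the model name and shapes are placeholders):

import torch

dummy_input = torch.randn(1, 3, 256, 256)
torch.onnx.export(
    fusion_model.eval(),            # hypothetical nn.Module from earlier
    dummy_input,
    "perception.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=13,
)
engine = build_tensorrt_engine("perception.onnx", precision="fp16")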

Results:

  • Inference time dropped from 250 ms to 75 ms per frame.
  • Memory footprint decreased by 40%.
  • Throughput rose from roughly 4 to over 13 frames per second, putting us firmly in real-time territory.

Next steps: end-to-end learning

We are still exploring end-to-end models, where perception and planning happen in a single deep learning pipeline. The idea? Instead of separate modules for object detection, drivable-corridor estimation, and path planning, we let one neural network learn everything from scratch.

Some early results show promise, but training requires huge datasets and interpretability is a challenge. Still, we are heading toward fully AI-driven navigation.


Final thoughts: AI that sees and thinks fast

By combining sensor fusion, ego-motion correction, and TensorRT acceleration, we made our autonomous vehicle's AI faster and smarter.

The next step? Taking it beyond research and getting it on the road. If you are working on AV models, let's connect; I would love to hear how you are tackling inference speed and perception challenges.

