vitruvianos display pipeline: from pixels to screen

This is a technical deep dive into how V\OS gets pixels from application code to your display. The display pipeline in V\OS inherits from Haiku (which inherits from BeOS) but adds Linux DRM/GBM integration, per-window compositing, and an optional GPU compositor. Understanding the layers helps explain why certain design decisions were made and where the system is heading.

the BeOS/Haiku display model

BeOS had an elegant display architecture. The app_server owned the entire display pipeline. Applications communicated with it via IPC ports (message queues). There was no X11, no Wayland, no separate compositor process. One server, one buffer, one path from application draw call to screen pixel.

graph TD
    A[BApplication] -->|BMessage IPC| B[app_server]
    B --> C[ServerWindow]
    C --> D[DrawingEngine]
    D --> E[Painter / AGG]
    E --> F[Shared Back Buffer]
    F -->|CopyBackToFront| G[Front Buffer]
    G -->|Display Hardware| H[Screen]

Every BWindow in an application maps to a ServerWindow in the app_server. Drawing commands flow through the DrawingEngine which drives the Painter (AGG, a CPU software rasterizer). All windows render into a single shared back buffer, clipped to their visible region. When drawing completes, CopyBackToFront copies the dirty rectangles from back to front buffer, composites the software cursor, and updates the display.

This model was simple and fast for its era. The shared buffer means no extra copies, no texture uploads, no GPU involvement. But it has a cost: when you drag a window, every window behind it must redraw its exposed region. The app_server does not remember what was behind the window: that content was drawn once, then overwritten, and now has to be drawn again.
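
To make that cost concrete, here is a minimal sketch of the area that becomes exposed when a single window moves: the old frame minus the new frame, split into at most four rectangles. The `Rect` type and `ExposedRects` function are hypothetical names for illustration; the real app_server works with full regions across many windows.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical rectangle with exclusive right/bottom edges (chosen for simplicity).
struct Rect {
    int left, top, right, bottom;
    bool IsValid() const { return left < right && top < bottom; }
    long Area() const { return IsValid() ? long(right - left) * (bottom - top) : 0; }
};

// Returns the parts of oldFrame that newFrame no longer covers.
// Everything in these rectangles must be redrawn by the windows behind.
std::vector<Rect> ExposedRects(const Rect& oldFrame, const Rect& newFrame)
{
    std::vector<Rect> exposed;

    // Overlap between the old and the new position.
    Rect overlap{ std::max(oldFrame.left, newFrame.left),
                  std::max(oldFrame.top, newFrame.top),
                  std::min(oldFrame.right, newFrame.right),
                  std::min(oldFrame.bottom, newFrame.bottom) };

    if (!overlap.IsValid()) {
        // The window moved completely away: the whole old frame is exposed.
        exposed.push_back(oldFrame);
        return exposed;
    }

    const Rect bands[4] = {
        { oldFrame.left, oldFrame.top, oldFrame.right, overlap.top },       // above
        { oldFrame.left, overlap.bottom, oldFrame.right, oldFrame.bottom }, // below
        { oldFrame.left, overlap.top, overlap.left, overlap.bottom },       // left
        { overlap.right, overlap.top, oldFrame.right, overlap.bottom }      // right
    };
    for (const Rect& band : bands) {
        if (band.IsValid())
            exposed.push_back(band);
    }
    return exposed;
}
```

Even a 10-pixel horizontal drag of a 200x150 window exposes a 10x150 strip, and every drag step repeats that redraw for whatever lies behind.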

the V\OS display stack

V\OS runs on Linux, so the display path goes through DRM (Direct Rendering Manager) and GBM (Generic Buffer Management) instead of a proprietary framebuffer driver. The stack looks like this:

graph TD
    subgraph "Application Process"
        A1[BApplication]
        A2[BWindow / BView]
    end
    
    subgraph "app_server Process"
        S1[ServerWindow]
        S2[DrawingEngine]
        S3[Painter / AGG]
        S4[HWInterface]
        
        subgraph "Display Backend"
            D1[GBMHWInterface]
            D2[GBMBuffer - Front]
            D3[GBMBuffer - Back]
            D4[GLCompositor]
        end
    end
    
    subgraph "Kernel"
        K1[DRM / KMS]
        K2[Nexus IPC]
    end
    
    subgraph "Hardware"
        H1[GPU]
        H2[Display]
    end
    
    A1 -->|Nexus ports| S1
    A2 -->|Draw commands| S2
    S2 --> S3
    S3 --> D3
    D3 -->|CopyBackToFront| D2
    D2 -->|drmModeSetCrtc| K1
    K1 --> H1
    H1 --> H2
    D4 -.->|GPU path| D3
    S4 --> D1

The GBMHWInterface manages two GBMBuffer objects (front and back). Each is a GBM buffer object mapped into CPU memory via gbm_bo_map. AGG renders to the back buffer. CopyBackToFront copies dirty regions from back to front, draws the cursor, flushes the buffer (gbm_bo_write for drivers that use shadow copies), and calls drmModeSetCrtc to tell the GPU to scan out the front buffer.
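
The dirty-region copy at the heart of CopyBackToFront can be sketched as a row-by-row memcpy. This is an illustration under assumptions, not the app_server code: `PixelBuffer` stands in for a CPU-mapped 32-bit buffer, `CopyDirtyRects` is a hypothetical name, and the cursor, flush, and drmModeSetCrtc steps are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Rect { int left, top, right, bottom; }; // exclusive right/bottom

// Hypothetical stand-in for a mapped 32-bit-per-pixel buffer.
struct PixelBuffer {
    int width, height;
    std::vector<uint32_t> pixels;

    PixelBuffer(int w, int h) : width(w), height(h), pixels(size_t(w) * h, 0) {}
    uint32_t* Row(int y) { return pixels.data() + size_t(y) * width; }
    const uint32_t* Row(int y) const { return pixels.data() + size_t(y) * width; }
};

// Copies only the dirty rectangles from back to front, one row at a time.
void CopyDirtyRects(const PixelBuffer& back, PixelBuffer& front,
    const std::vector<Rect>& dirty)
{
    assert(back.width == front.width && back.height == front.height);

    for (Rect r : dirty) {
        // Clamp to the buffer so a sloppy rectangle cannot write out of bounds.
        r.left = std::max(r.left, 0);
        r.top = std::max(r.top, 0);
        r.right = std::min(r.right, back.width);
        r.bottom = std::min(r.bottom, back.height);
        if (r.left >= r.right || r.top >= r.bottom)
            continue;

        const size_t bytes = size_t(r.right - r.left) * sizeof(uint32_t);
        for (int y = r.top; y < r.bottom; y++)
            std::memcpy(front.Row(y) + r.left, back.Row(y) + r.left, bytes);
    }
}
```

Pixels outside the dirty rectangles are never touched, which is what keeps the software path cheap when only a small part of the screen changes.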

the drawing cycle

A single frame follows this sequence:

sequenceDiagram
    participant App as BApplication
    participant SW as ServerWindow
    participant DE as DrawingEngine
    participant P as Painter/AGG
    participant HW as GBMHWInterface
    participant DRM as DRM/KMS
    
    Note over App,DRM: Window needs redraw
    
    SW->>SW: RedrawDirtyRegion()
    SW->>DE: LockParallelAccess()
    SW->>DE: ConstrainClippingRegion(dirtyRegion)
    
    Note over DE,P: Decorator draws border/title
    DE->>P: StrokeRect, FillRect, DrawString
    P->>P: AGG rasterizes to back buffer
    
    SW->>App: AS_BEGIN_UPDATE (via Nexus port)
    App->>SW: Drawing commands (FillRect, DrawString, etc.)
    SW->>DE: Execute drawing commands
    DE->>P: AGG rasterizes to back buffer
    App->>SW: AS_END_UPDATE
    
    SW->>DE: CopyToFront(dirtyRegion)
    DE->>HW: InvalidateRegion(dirtyRegion)
    HW->>HW: CopyBackToFront(frame)
    
    Note over HW: Base class copies back->front
    Note over HW: Draw software cursor
    Note over HW: GBMBuffer::Flush()
    
    HW->>DRM: drmModeSetCrtc(frontBuffer)
    DRM->>DRM: GPU scans out front buffer

The key insight: drawing happens in two phases. First, the server draws the window decorator (title bar, borders, resize handles). Then it sends an update message to the client application, which draws its content. The client’s drawing commands are serialized over a Nexus port and executed by the ServerWindow on the server side. When the client calls EndUpdate, the dirty region is flushed to the display.
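
The serialization step can be pictured as flattening an opcode plus its arguments into a byte buffer on the client and reading them back in the same order on the server. The sketch below assumes made-up opcode values and hypothetical `CommandWriter` / `CommandReader` classes; it does not reproduce the real PortLink wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Made-up opcode values for illustration; not the real protocol codes.
enum class DrawOp : uint32_t { FillRect = 1, StrokeRect = 2 };

// Hypothetical argument block for a rectangle drawing command.
struct RectArgs { float left, top, right, bottom; uint32_t color; };

// Client side: append plain-data values to a byte buffer.
class CommandWriter {
public:
    template <typename T>
    void Attach(const T& value)
    {
        static_assert(std::is_trivially_copyable<T>::value,
            "only plain data can be flattened");
        const uint8_t* bytes = reinterpret_cast<const uint8_t*>(&value);
        fBytes.insert(fBytes.end(), bytes, bytes + sizeof(T));
    }
    const std::vector<uint8_t>& Bytes() const { return fBytes; }

private:
    std::vector<uint8_t> fBytes;
};

// Server side: read values back in the order they were attached.
class CommandReader {
public:
    explicit CommandReader(const std::vector<uint8_t>& bytes)
        : fBytes(bytes), fPos(0) {}

    template <typename T>
    bool Read(T& value)
    {
        if (fBytes.size() - fPos < sizeof(T))
            return false; // truncated message
        std::memcpy(&value, fBytes.data() + fPos, sizeof(T));
        fPos += sizeof(T);
        return true;
    }

private:
    const std::vector<uint8_t>& fBytes;
    size_t fPos;
};
```

A server loop would read the opcode first, then the matching argument block, and call the corresponding drawing method. Because both processes run on the same machine, copying raw structs is acceptable here; a network protocol would need explicit byte order and padding rules.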

the GBM buffer lifecycle

GBM buffers are the bridge between CPU rendering and GPU display. Here is how they are managed:

stateDiagram-v2
    [*] --> Allocated: gbm_bo_create(LINEAR | WRITE)
    Allocated --> Mapped: gbm_bo_map(READ_WRITE)
    Mapped --> Drawing: AGG writes via CPU pointer
    Drawing --> Drawing: Multiple draw operations
    Drawing --> Flushing: Drawing complete
    Flushing --> Flushing: gbm_bo_write() syncs shadow copy
    Flushing --> Scanout: drmModeSetCrtc(fb_id)
    Scanout --> Drawing: Next frame
    Scanout --> [*]: Window closed / buffer freed
    
    note right of Flushing
        Required for virtio-gpu/virgl where
        gbm_bo_map gives a shadow copy.
        On direct-mapped hardware this is
        effectively a no-op.
    end note

The Flush() step is critical. On virtio-gpu (QEMU’s paravirtualized GPU device), gbm_bo_map returns a shadow buffer in guest memory. CPU writes land in this shadow, and the GPU-visible buffer object is not updated until gbm_bo_write explicitly transfers the data. On real hardware with direct-mapped LINEAR buffers, the flush is effectively a no-op. We create all buffers with both the LINEAR and WRITE usage flags so the flush works everywhere.
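
The shadow-copy behaviour is easy to model without any GPU. The toy class below is not libgbm: `ShadowMappedBuffer` is a hypothetical stand-in in which `Map()` plays the role of gbm_bo_map and `Flush()` plays the role of the gbm_bo_write transfer. It only demonstrates why a frame drawn without a flush never reaches the screen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a buffer whose CPU mapping is a shadow copy.
class ShadowMappedBuffer {
public:
    explicit ShadowMappedBuffer(size_t pixelCount)
        : fShadow(pixelCount, 0), fDeviceVisible(pixelCount, 0) {}

    // Stand-in for the mapping step: the caller draws into the shadow.
    uint32_t* Map() { return fShadow.data(); }

    // Stand-in for the flush: transfer the shadow to the device-visible copy.
    void Flush() { fDeviceVisible = fShadow; }

    // What the display would scan out for pixel i.
    uint32_t ScanoutPixel(size_t i) const { return fDeviceVisible[i]; }

private:
    std::vector<uint32_t> fShadow;        // what the CPU writes to
    std::vector<uint32_t> fDeviceVisible; // what the "GPU" reads from
};
```

Writing through `Map()` and then reading `ScanoutPixel()` still returns the old value until `Flush()` runs, which mirrors the blank-screen symptom a missing flush would cause on virtio-gpu.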

drawing strategies: what we tried

We explored three display strategies. Each has different tradeoffs:

graph LR
    subgraph "Strategy 1: Software Path"
        S1A[All windows] --> S1B[Shared back buffer]
        S1B --> S1C[Partial region copy]
        S1C --> S1D[Software cursor]
        S1D --> S1E[SetCrtc]
    end
    
    subgraph "Strategy 2: Per-Window CPU"
        S2A[Each window] --> S2B[Own buffer]
        S2B --> S2C[CPU blit to back buffer]
        S2C --> S2D[Base class copy + cursor]
        S2D --> S2E[SetCrtc]
    end
    
    subgraph "Strategy 3: Per-Window GPU"
        S3A[Each window] --> S3B[Own buffer]
        S3B --> S3C[Upload as GL texture]
        S3C --> S3D[Render quads to FBO]
        S3D --> S3E[glReadPixels to back buffer]
        S3E --> S3F[SetCrtc]
    end

strategy 1: software path (default, production-ready)

The original Haiku model adapted for Linux DRM. All windows share one back buffer. Partial dirty-region copies minimize work. The base class HWInterface::CopyBackToFront handles cursor compositing with save/restore.

Pros: Simple, correct, no extra memory per window, cursor works perfectly, no coordinate translation issues.

Cons: Window drag causes full redraw of exposed content. No path to transparency or composition effects.

Status: Default. Rock solid.

strategy 2: per-window CPU compositing (implemented, opt-in)

Each window gets its own WindowBuffer (a MallocBuffer). AGG renders into the window’s buffer using a virtual base pointer trick that maps screen coordinates to buffer-local positions. Desktop::_CompositeAllWindows assembles all window buffers onto the shared back buffer via memcpy, back-to-front in stacking order.

graph TD
    W1[Desktop Window Buffer] -->|blit at 0,0| BB[Shared Back Buffer]
    W2[Deskbar Window Buffer] -->|blit at 504,0| BB
    W3[App Window Buffer] -->|blit at 80,40| BB
    BB -->|CopyBackToFront| FB[Front Buffer]
    FB -->|SetCrtc| Display
    
    style BB fill:#e1f5fe
    style FB fill:#c8e6c9

Pros: Window drag is just re-blitting at new position (no content redraw). Foundation for composition effects. Each window’s content is preserved independently.

Cons: Extra memory per window. Virtual base pointer causes crashes when windows go off-screen (mitigated by screen bounds clamping). Menu timing issues (Haiku’s async update model means menus flash briefly before items draw). Stale content visible during initial window show.

Status: Implemented, disabled by default. Enable with VOS_PER_WINDOW_COMPOSITING=1.
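
A minimal sketch of the CPU compositing pass, assuming 32-bit pixels and a window list already sorted back-to-front. `WindowBuffer` here is a simplified hypothetical struct and `CompositeWindows` is an illustrative name, not the Desktop::_CompositeAllWindows implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified per-window buffer: screen position plus tightly packed pixels.
struct WindowBuffer {
    int x, y, width, height;
    std::vector<uint32_t> pixels; // width * height entries
};

// Blits every window onto the shared back buffer in stacking order.
// Later windows overwrite earlier ones, so no visibility test is needed.
void CompositeWindows(const std::vector<WindowBuffer>& backToFront,
    std::vector<uint32_t>& backBuffer, int screenWidth, int screenHeight)
{
    assert(backBuffer.size() == size_t(screenWidth) * screenHeight);

    for (const WindowBuffer& window : backToFront) {
        assert(window.pixels.size() == size_t(window.width) * window.height);

        // Clip the window frame to the screen.
        const int left = std::max(window.x, 0);
        const int top = std::max(window.y, 0);
        const int right = std::min(window.x + window.width, screenWidth);
        const int bottom = std::min(window.y + window.height, screenHeight);
        if (left >= right || top >= bottom)
            continue; // fully off-screen

        const size_t bytes = size_t(right - left) * sizeof(uint32_t);
        for (int y = top; y < bottom; y++) {
            const uint32_t* src = window.pixels.data()
                + size_t(y - window.y) * window.width + (left - window.x);
            uint32_t* dst = backBuffer.data() + size_t(y) * screenWidth + left;
            std::memcpy(dst, src, bytes);
        }
    }
}
```

Moving a window then means changing its `x`/`y` and compositing again. No window has to redraw its content, which is the main advantage over the shared-buffer path.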

strategy 3: per-window GPU compositing (implemented, readback issue on virtio)

Same per-window buffers as strategy 2, but instead of CPU memcpy blit, the GLCompositor uploads each window buffer as a GL texture and renders positioned quads via GLES2 into an FBO. glReadPixels reads the composited result back to the back buffer.

graph TD
    subgraph "GLCompositor (GLES2)"
        T1[Window 1 Texture] --> Q1[Quad at position 1]
        T2[Window 2 Texture] --> Q2[Quad at position 2]
        T3[Window 3 Texture] --> Q3[Quad at position 3]
        Q1 --> FBO[Framebuffer Object]
        Q2 --> FBO
        Q3 --> FBO
    end
    
    FBO -->|glReadPixels| BB[Back Buffer]
    BB -->|CopyBackToFront| FB[Front Buffer]
    FB -->|SetCrtc| Display
    
    note1[Upload: glTexImage2D per window]
    note2[Render: back-to-front quad draw]
    note3[Readback: row-by-row with Y-flip]

Pros: GPU does the compositing work. Foundation for shader effects (blur, transparency, color correction). On real hardware, the texture upload and quad render are fast.

Cons: Requires GPU with working EGL/GLES2. glReadPixels returns zeros intermittently on virtio-gpu (virgl), making it unusable in QEMU. Row-by-row readback is slow through virgl. The ideal path would skip readback entirely and render directly to the scanout buffer, but that requires a different EGL surface model.

Status: Code complete, tested on virgl (RTX 4090 passthrough). Disabled due to readback reliability. Expected to work on real hardware, but not yet verified there.
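
The Y-flip in the readback exists because OpenGL places row 0 at the bottom of the framebuffer, while the back buffer stores the top row first. The sketch below flips an image that has already been read into memory; `FlipRows` is a hypothetical helper, and the actual compositor is described as reading row by row rather than flipping a full copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies a bottom-up image (GL row order) into a top-down buffer.
void FlipRows(const std::vector<uint32_t>& bottomUp,
    std::vector<uint32_t>& topDown, int width, int height)
{
    assert(bottomUp.size() == size_t(width) * height);
    assert(topDown.size() == bottomUp.size());

    const size_t rowBytes = size_t(width) * sizeof(uint32_t);
    for (int y = 0; y < height; y++) {
        // Destination row y comes from source row (height - 1 - y).
        const uint32_t* src = bottomUp.data() + size_t(height - 1 - y) * width;
        uint32_t* dst = topDown.data() + size_t(y) * width;
        std::memcpy(dst, src, rowBytes);
    }
}
```

Rendering the quads with a flipped projection would avoid the extra pass, at the price of handling the inverted orientation inside the compositor's vertex setup.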

the compositor loop

When per-window compositing is active, the compositor runs on every display update:

flowchart TD
    A[Window draws to per-window buffer] --> B{All windows have content?}
    B -->|No| C[CPU Fallback]
    B -->|Yes| D{GPU compositor available?}
    
    D -->|No| C
    D -->|Yes| E[GPU Path]
    
    subgraph CPU[CPU Fallback Path]
        C --> C1[Blit each window buffer to back buffer]
        C1 --> C2[memcpy, back-to-front order]
    end
    
    subgraph GPU[GPU Compositor Path]
        E --> E1[Upload window buffers as GL textures]
        E1 --> E2[Clear FBO to workspace bg color]
        E2 --> E3[Render positioned quads back-to-front]
        E3 --> E4[glReadPixels to back buffer]
    end
    
    C2 --> F[CopyBackToFront]
    E4 --> F
    
    F --> G[Base class: copy back to front buffer]
    G --> H[Draw software cursor]
    H --> I[GBMBuffer::Flush]
    I --> J[drmModeSetCrtc]
    J --> K[Display updated]

The fallback is important. If the GPU compositor fails (EGL init error, readback zeros, shader compile failure), the CPU path takes over and produces the same composited frame. The display always works.
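
One way to structure such a fallback chain is sketched below. The names (`Compositor`, `ComposeWithFallback`, `Path`) are hypothetical, and treating an all-zero frame as a failed readback is an assumption made for illustration; the real detection logic may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using Frame = std::vector<uint32_t>;
// A compositor fills the frame and returns false if it could not run.
using Compositor = std::function<bool(Frame&)>;

enum class Path { Gpu, Cpu };

// Assumption: a frame of all zeros means the readback failed.
static bool AllZero(const Frame& frame)
{
    return std::all_of(frame.begin(), frame.end(),
        [](uint32_t pixel) { return pixel == 0; });
}

// Tries the GPU compositor first and falls back to the CPU compositor.
// The GPU renders into a scratch frame so that a failed attempt never
// damages the back buffer.
Path ComposeWithFallback(const Compositor& gpu, const Compositor& cpu,
    Frame& backBuffer)
{
    if (gpu) {
        Frame scratch(backBuffer.size(), 0);
        if (gpu(scratch) && !AllZero(scratch)) {
            backBuffer.swap(scratch);
            return Path::Gpu;
        }
    }

    const bool ok = cpu(backBuffer);
    assert(ok); // the CPU path is expected to always succeed
    (void)ok;
    return Path::Cpu;
}
```

The scratch frame costs one extra allocation per frame in this sketch; a real implementation would more likely reuse a buffer and validate the readback in place.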

the virtual base pointer trick

Per-window compositing requires translating screen coordinates to window-local buffer positions. Haiku’s entire drawing stack uses screen coordinates. Every Painter method, every AGG rasterizer, every view clipping region operates in screen space.

Rather than modifying hundreds of drawing methods, we shift the AGG rendering buffer’s base pointer:

graph LR
    subgraph "Screen Space"
        SS[Screen 640x480]
        WF["Window at (100, 50)<br/>Size: 200x150"]
    end
    
    subgraph "Buffer Space"
        B["WindowBuffer 200x150<br/>Actual allocation"]
    end
    
    subgraph "Virtual Space"
        VB["Virtual buffer 300x200<br/>Base pointer shifted back<br/>by (-100, -50) * stride"]
    end
    
    SS --> VB
    VB --> B
    
    note["Screen coord (100,50) = <br/>virtualBase + 50*stride + 100*4 = <br/>bufferBits + 0 = buffer[0][0]"]

The virtual buffer appears to start at screen position (0,0) and span 300x200 pixels, but the actual pixel data only occupies (100,50) through (299,199) in this virtual space. AGG’s clip box limits rendering to the window’s region. Screen coordinate (150, 100) maps to virtual buffer position (150, 100), which maps to actual buffer position (50, 50). No coordinate translation is needed in any drawing method.
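
The arithmetic can be checked with a small sketch. It assumes 4 bytes per pixel and a stride equal to the window width times 4. The struct is a hypothetical simplification, and the shift is kept as an integer offset rather than a real pointer, because forming a pointer in front of an allocation is out of bounds in strict C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-window buffer addressed with screen coordinates.
struct WindowBuffer {
    int left, top, width, height; // window frame in screen space
    std::vector<uint8_t> bits;    // width * height * 4 bytes

    WindowBuffer(int l, int t, int w, int h)
        : left(l), top(t), width(w), height(h), bits(size_t(w) * h * 4, 0) {}

    std::ptrdiff_t Stride() const { return std::ptrdiff_t(width) * 4; }

    // Offset of the "virtual base" relative to bits. It is negative for any
    // window that is not at the screen origin.
    std::ptrdiff_t VirtualBaseOffset() const
    {
        return -(top * Stride() + std::ptrdiff_t(left) * 4);
    }

    // Address of a pixel given in SCREEN coordinates. The rasterizer-style
    // formula (base + y * stride + x * 4) is used unchanged; only the base moved.
    uint8_t* PixelAt(int screenX, int screenY)
    {
        assert(screenX >= left && screenX < left + width);
        assert(screenY >= top && screenY < top + height);
        const std::ptrdiff_t offset = VirtualBaseOffset()
            + screenY * Stride() + std::ptrdiff_t(screenX) * 4;
        return bits.data() + offset;
    }
};
```

For a 200x150 window at (100, 50), the stride is 800 bytes and the virtual base sits 40400 bytes before the allocation, so screen pixel (100, 50) resolves to the first byte of the buffer.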

The tradeoff: if a window is dragged so its top-left goes off-screen (negative coordinates), the virtual base points before the buffer allocation, causing segfaults. We handle this by clamping all drawing regions to screen bounds and falling back to the software path for off-screen windows.
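
The guard can be sketched as two small helpers: one that clamps a drawing rectangle to the screen, and one that decides whether a window should take the software path. Both names are hypothetical, and the rule "negative left or top means fallback" is an assumption based on the description above.

```cpp
#include <algorithm>
#include <cassert>

struct Rect { int left, top, right, bottom; }; // exclusive right/bottom

// Restricts a drawing rectangle to the visible screen area.
Rect ClampToScreen(Rect r, int screenWidth, int screenHeight)
{
    r.left = std::max(r.left, 0);
    r.top = std::max(r.top, 0);
    r.right = std::min(r.right, screenWidth);
    r.bottom = std::min(r.bottom, screenHeight);
    return r;
}

// Assumed rule: a window whose top-left corner is off-screen must not use
// the per-window buffer and falls back to the shared software path.
bool NeedsSoftwareFallback(const Rect& windowFrame)
{
    return windowFrame.left < 0 || windowFrame.top < 0;
}
```

Clamping before rasterizing keeps every write inside the region the clip box promises, so an off-screen window costs a slower path rather than a crash.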

the Nexus IPC layer

The drawing commands between application and app_server flow through Nexus, V\OS’s kernel IPC module:

graph TD
    subgraph "Application"
        BV[BView::Draw]
        BV --> BC[BView drawing API]
        BC --> PL[PortLink - serialize commands]
        PL --> NP[Nexus Port - write]
    end
    
    subgraph "Kernel Module"
        NP --> NQ[Nexus Port Queue]
        NQ --> NR[Nexus Port - read]
    end
    
    subgraph "app_server"
        NR --> SW[ServerWindow::_DispatchMessage]
        SW --> DE[DrawingEngine methods]
        DE --> PA[Painter / AGG rasterize]
    end
    
    style NQ fill:#fff3e0

Nexus ports are bounded message queues exposed through a character device (/dev/nexus). They provide blocking read/write semantics with proper sleep/wake, which is why Nexus is implemented as a kernel module rather than a userspace shim. The port system handles the thread synchronization that makes BeginUpdate/EndUpdate work correctly across process boundaries.
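
The semantics of a bounded, blocking port can be modelled in userspace with a mutex and two condition variables. This is a behavioural sketch only: `BoundedPort` is a hypothetical class working within one process, while the real Nexus ports live in the kernel and work across processes.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

using Message = std::vector<uint8_t>;

// In-process model of a bounded message queue with blocking read/write.
class BoundedPort {
public:
    explicit BoundedPort(size_t capacity) : fCapacity(capacity)
    {
        assert(capacity > 0);
    }

    // Blocks while the queue is full.
    void Write(Message message)
    {
        std::unique_lock<std::mutex> lock(fLock);
        fNotFull.wait(lock, [this] { return fQueue.size() < fCapacity; });
        fQueue.push_back(std::move(message));
        fNotEmpty.notify_one();
    }

    // Non-blocking variant: returns false instead of sleeping.
    bool TryWrite(Message message)
    {
        std::lock_guard<std::mutex> lock(fLock);
        if (fQueue.size() >= fCapacity)
            return false;
        fQueue.push_back(std::move(message));
        fNotEmpty.notify_one();
        return true;
    }

    // Blocks while the queue is empty.
    Message Read()
    {
        std::unique_lock<std::mutex> lock(fLock);
        fNotEmpty.wait(lock, [this] { return !fQueue.empty(); });
        Message message = std::move(fQueue.front());
        fQueue.pop_front();
        fNotFull.notify_one();
        return message;
    }

private:
    const size_t fCapacity;
    std::mutex fLock;
    std::condition_variable fNotFull;
    std::condition_variable fNotEmpty;
    std::deque<Message> fQueue;
};
```

The bound matters for drawing traffic: a client that issues commands faster than the server can rasterize them sleeps in `Write` instead of growing the queue without limit.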

outcomes and what we learned

timeline
    title Display Pipeline Development Timeline
    
    section Boot & Basics
        DRM backend boots : Software cursor
                          : GBM buffer management
                          : drmModeSetCrtc display
    
    section Per-Window Foundation
        WindowBuffer class : DrawingEngine attachment
                           : Virtual base pointer
                           : Compositor orchestration
    
    section GPU Compositor
        EGL/GLES2 FBO : Multi-texture rendering
                      : Readback (virgl zeros)
                      : CPU fallback chain
    
    section Polish & Demos
        BMenu fixes : Painter bounds checking
                    : Screen clamp safety
                    : es2gears demo
                    : setres tool

The biggest lesson: the Haiku display pipeline is well-designed for its model. Every attempt to “improve” it by bypassing the base class (CopyBackToFront, cursor handling, partial region copy) broke something. The successful approach was always to build on top of what exists rather than replacing it.

The per-window compositing model is architecturally sound but collides with Haiku’s assumption of a single shared buffer in subtle ways. Menu timing, update session clipping, buffer lifecycle during window creation – these are all places where the single-buffer assumption is baked into the design. A complete per-window compositor would need to rethink the BeginUpdate/EndUpdate model to work more like Wayland’s “commit when ready” approach.

The GPU compositor works technically (confirmed by pixel-level verification of the readback data) but is limited by virtio-gpu’s inability to reliably transfer rendered pixels back to guest memory. On real hardware, where readback should be reliable, the GPU path is expected to work without further code changes.

where this is going

graph TD
    subgraph "Current (Working)"
        A[Software Path] --> B[GBM + SetCrtc]
        C[Per-Window CPU] --> B
    end
    
    subgraph "Near Term"
        D[GPU Compositor on Real HW]
        E[BGLView for App Rendering]
        F[Desktop Icons Fix]
    end
    
    subgraph "Future"
        G[DMA-buf Window Sharing]
        H[Per-Window GPU Rendering]
        I[Shader Effects - Blur, Transparency]
        J[Hardware Cursor Planes]
    end
    
    B --> D
    D --> E
    E --> G
    G --> H
    H --> I
    D --> J

The immediate priority is testing the GPU compositor on real hardware. The code is complete and wired up. After that, implementing BGLView would let applications render directly with OpenGL into GPU-backed window buffers, which the compositor could sample as textures without any CPU readback. That is the path to true GPU-accelerated application rendering on V\OS.

The code is at github.com/VitruvianOS.