STRAMER: A Structural Tree-Aware Multi-Modal Architecture for Handwritten Mathematical Expression Recognition

R. Sridevi, G. Sudheer, D. Lalitha Bhaskari

Abstract

Handwritten Mathematical Expression Recognition (HMER) remains challenging because mathematical notation is both spatially structured and highly variable across writers. Existing methods either operate on offline raster images, thereby discarding pen-stroke dynamics, or rely on modular online pipelines in which segmentation and relation errors propagate downstream. We present STRAMER (Structural Tree-Aware Mathematical Expression Recognizer), an end-to-end multi-modal architecture that integrates a DenseNet image encoder and a BiLSTM stroke encoder through bidirectional cross-modal fusion, combines a coverage-augmented Transformer decoder with a tree-aware biaffine dependency module for explicit syntactic supervision, and appends a spatial relation head that yields a complete Stroke Label Graph (SLG). To ensure globally valid structure, tree decoding is performed with maximum spanning arborescence inference and SLG relation labels are obtained with constrained CRF decoding. The model is trained with a triple joint objective comprising sequence, tree, and relation losses. Over five independent runs, STRAMER achieves  ExpRate on CROHME 2019 while additionally producing interpretable SLG outputs. We further report an inference latency of  per expression on a single GPU and provide a full complexity analysis of the relation component. These results indicate that jointly modelling visual appearance, stroke dynamics, and explicit structure is a promising direction for robust HMER.
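
To illustrate why the abstract pairs the biaffine dependency module with maximum spanning arborescence inference, the following minimal sketch contrasts per-token greedy head selection with a globally valid tree. It is not the authors' implementation: the score matrix is invented for illustration, the function names (`greedy_heads`, `is_arborescence`, `best_arborescence`) are hypothetical, and the search is exhaustive, which is only practical for a handful of nodes. A real system would more likely use an efficient algorithm such as Chu-Liu/Edmonds.

```python
from itertools import product

NEG_INF = float("-inf")


def greedy_heads(scores):
    """Pick the best-scoring head for each node independently.

    scores[h][d] is the (hypothetical) biaffine score of the arc h -> d.
    Node 0 is an artificial root; nodes 1..n are symbols.
    The result may contain cycles, so it is not guaranteed to be a tree.
    """
    n = len(scores) - 1
    return tuple(
        max((h for h in range(n + 1) if h != d), key=lambda h: scores[h][d])
        for d in range(1, n + 1)
    )


def is_arborescence(heads):
    """Return True if following heads from every node reaches root 0.

    heads[i] is the parent of node i + 1. Self-loops and longer cycles
    both cause a node to be revisited, which makes the check fail.
    """
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def best_arborescence(scores):
    """Exhaustively search for the highest-scoring valid tree.

    Assumption: this brute-force search stands in for an efficient
    maximum spanning arborescence algorithm and is for tiny inputs only.
    """
    n = len(scores) - 1
    best, best_score = None, NEG_INF
    for heads in product(range(n + 1), repeat=n):
        if not is_arborescence(heads):
            continue
        total = sum(scores[h][d] for d, h in enumerate(heads, start=1))
        if total > best_score:
            best, best_score = heads, total
    return best, best_score


# Invented arc scores: rows are heads, columns are dependents (column 0 unused).
example_scores = [
    [0, 5, 1, 1],   # root -> 1, 2, 3
    [0, 0, 10, 2],  # 1 -> 2, 3
    [0, 9, 0, 8],   # 2 -> 1, 3
    [0, 0, 0, 0],   # 3 -> 1, 2
]

greedy = greedy_heads(example_scores)
tree, tree_score = best_arborescence(example_scores)
print("greedy heads:", greedy, "valid tree:", is_arborescence(greedy))
print("arborescence heads:", tree, "score:", tree_score)
```

In this toy example the greedy choice makes nodes 1 and 2 each other's head, forming a cycle, whereas the arborescence search trades the locally best arc 2 -> 1 for root -> 1 and returns a structure in which every symbol is reachable from the root.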
