TFLite Articles

From PyTorch to TFLite on Android: A Practical Model Format Conversion Guide

A hands-on guide to converting PyTorch models through ONNX into TFLite and MediaPipe for on-device Android deployment, covering dynamic shapes, operator compatibility, and INT8 quantization.

A Practical Guide to On-Device AI Pipelines with Kotlin Flow

Model on-device inference as a declarative, composable pipeline of Preprocess, Inference, and Postprocess stages wired together with Kotlin Flow operators, backpressure, and layered tests.

Android On-Device AI Inference Warmup: From Model Loading to First-Token Latency

A practical breakdown of on-device AI cold-start latency: model loading, GPU Delegate initialization, KV cache prefill, warmup inference, long-lived contexts, and memory tradeoffs.

Designing an On-Device LLM Inference Scheduler: Priority Queues and Backpressure in Practice

This article shows how to build a scheduling layer above an on-device inference engine, using priority queues, preemption, and backpressure to avoid OOMs, unpredictable latency, and out-of-order results.

Android On-Device AI Image Preprocessing: From Bitmap Pixels to Tensor Input

November 14, 2025

A full Android on-device AI preprocessing pipeline from Bitmap pixels to tensor input, covering memory layout, pixel conversion, resize choices, normalization, and zero-copy optimization.