Developer Reference

Base models: where specialisation starts

The models we customise into specialised language models, and the platforms they deploy to.

From Base to Specialised

A base model is the starting point, not the product

A base model ships with broad general capability but no knowledge of your standards. We post-train it against a verification signal built with your experts until it performs your work to your bar. On open weights, the result is a specialised language model you own and run; for Azure-hosted models, the same customisation runs through Microsoft Foundry.
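In reinforcement-learning post-training, a reward function scores each sampled output, and that score is the signal the model is trained against. The sketch below is one hypothetical shape such a verification signal could take, with expert knowledge encoded as weighted pass/fail checks. The `Check` type, the example checks, their weights, and the scoring rule are all illustrative assumptions, not the training setup this page describes.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical: each check encodes one rule an expert would apply when reviewing output.
@dataclass(frozen=True)
class Check:
    name: str
    weight: float
    passes: Callable[[str], bool]


def verification_reward(output: str, checks: List[Check]) -> float:
    """Weighted share of expert checks the output passes, in [0, 1]."""
    total = sum(c.weight for c in checks)
    if total == 0:
        return 0.0
    earned = sum(c.weight for c in checks if c.passes(output))
    return earned / total


# Illustrative checks only; a real verifier would be domain-specific.
checks = [
    Check("cites a clause", 2.0, lambda s: "clause" in s.lower()),
    Check("under 200 words", 1.0, lambda s: len(s.split()) <= 200),
    Check("no placeholder text", 1.0, lambda s: "TODO" not in s),
]

print(verification_reward("Per clause 4.2, the fee is waived.", checks))  # 1.0
print(verification_reward("TODO: fill in", checks))  # 0.25
```

A scalar reward like this can be computed for every sampled output during training; richer verifiers (rubric graders, test suites) fit the same interface.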

Supported Families

Four families, from on-device to frontier scale

Most engagements start from an open-weight base we post-train with reinforcement learning. On Azure, we also customise hosted models through Microsoft Foundry. Smaller bases are cheaper to train and faster to serve, so we pick the smallest model that clears your quality bar.
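The selection rule in the last sentence amounts to a filter followed by a minimum. A minimal sketch, assuming each candidate base has already been scored on your evaluation set; the candidate names and scores are hypothetical placeholders, not measured results, and size is taken as total parameter count.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    params_b: float  # total parameters, in billions
    score: float     # hypothetical evaluation score on your task, 0..1


def smallest_passing(candidates: List[Candidate], bar: float) -> Optional[Candidate]:
    """Return the smallest candidate whose score meets the bar, or None if none do."""
    passing = [c for c in candidates if c.score >= bar]
    return min(passing, key=lambda c: c.params_b, default=None)


# Placeholder scores for illustration only.
candidates = [
    Candidate("hypothetical-0.8B", 0.8, 0.61),
    Candidate("hypothetical-4B", 4.0, 0.83),
    Candidate("hypothetical-9B", 9.0, 0.88),
    Candidate("hypothetical-31B", 31.0, 0.91),
]

choice = smallest_passing(candidates, bar=0.80)
print(choice.name)  # hypothetical-4B
```

Returning `None` when nothing clears the bar makes the "no base is good enough yet" case explicit rather than silently picking the largest model.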

Gemma 4

Google

Google's open-weight family, from a workstation-grade flagship down to phones.

31B: The flagship dense model. Server-grade quality that still fits a single workstation GPU.
12B: The mid-size model. Unified multimodal: text, vision, and audio in one set of weights.
E4B: Built for on-device use. Per-layer embeddings keep the working footprint near 4.5B parameters.
E2B: The smallest Gemma. Runs on mobile hardware and in the browser.

Qwen3.5

Alibaba

Alibaba's open-weight family, from a 27B mid-size model down to 0.8B.

27B: The largest size we support. 262K-token native context, extensible toward one million.
9B: The largest of the small series. A compact multimodal model built for edge serving.
4B: A small vision-language model covering 201 languages, with a 262K-token context.
2B: A tiny multimodal model for prototyping and task-specific training.
0.8B: The smallest base, for research and rapid iteration.

gpt-oss

OpenAI

OpenAI's open-weight family under Apache 2.0. Both sizes are sparse mixture-of-experts (MoE) models.

120B: The larger MoE, with 117B total parameters and 5.1B active per token. Runs on a single 80GB GPU.
20B: The smaller MoE, with 21B total parameters and 3.6B active per token. Runs in 16GB of memory.
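In a sparse mixture-of-experts model, each token is routed through only a subset of the weights. A quick arithmetic check using the parameter counts above shows how small that subset is. The active fraction is a rough proxy for per-token compute; memory still has to hold all of the weights.

```python
# Parameter counts for gpt-oss as listed above, in billions.
MODELS = {
    "gpt-oss-120b": {"total_b": 117.0, "active_b": 5.1},
    "gpt-oss-20b": {"total_b": 21.0, "active_b": 3.6},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of all parameters used to process a single token."""
    return active_b / total_b


for name, p in MODELS.items():
    frac = active_fraction(p["total_b"], p["active_b"])
    print(f"{name}: {frac:.1%} of parameters active per token")

# gpt-oss-120b: 4.4% of parameters active per token
# gpt-oss-20b: 17.1% of parameters active per token
```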

Microsoft Foundry

Microsoft

Hosted models customised through Microsoft Foundry, plus Microsoft's own MAI line.

gpt-4.1: Hosted on Azure. Customisable with supervised fine-tuning (SFT) and direct preference optimisation (DPO).
gpt-5: Hosted on Azure. Customisable with reinforcement fine-tuning, currently by invitation.
MAI-Thinking-1 (coming soon): Microsoft's first in-house reasoning model, a sparse MoE with 35B active parameters and 256K context.
MAI-Code-1-Flash (coming soon): A lightweight agentic coding model, rolling out in GitHub Copilot and VS Code.

Deployment

Trains and serves where your data lives

Training runs inside your environment and the finished model deploys there too, on the platform your stack already trusts.

AWS
Modal
Azure
Databricks
Snowflake
Custom environments: on-premises, sovereign cloud, or your own cluster.

We support these, and more on request.

Start From the Right Base

Tell us your domain, your constraints, and where your data lives. We will recommend the base model and deployment that fit.