Cirrascale Cloud Services

The Cerebras AI Model Studio

The Cerebras AI Model Studio is a simple pay-by-the-model computing service powered by dedicated clusters of Cerebras CS-2s and hosted by Cirrascale Cloud Services. It is a purpose-built platform, optimized for training large language models on dedicated clusters of millions of cores. It provides deterministic performance, avoids distributed computing headaches, and is push-button simple to start.

The Problem

Training large Transformer models such as GPT and T5 on traditional cloud platforms can be painful, expensive, and time consuming. Gaining access to the large instances typically offered in the cloud can often take weeks. Networking, storage, and compute can cost extra, and setting up the environment is no joke. Models with tens of billions of parameters end up taking weeks to get going and months to train.

If you want to train in less time, you can attempt to reserve additional instances, but unpredictable inter-instance latency makes distributing AI work difficult and achieving high performance across multiple instances challenging.

Our Solution

The Cerebras AI Model Studio makes training large Transformer models fast, easy, and affordable. With Cerebras, you get millions of cores, predictable performance, and no parallel distribution headaches. All of this enables you to quickly and easily run existing models on your data or to build new models from scratch, optimized for your business.

A dedicated cloud-based cluster powered by Cerebras CS-2 systems with millions of AI cores for large language models and generative AI:

- Train 1-175 billion parameter models quickly and easily
- No parallel distribution pain: single-keystroke scaling over millions of cores
- Zero DevOps or firewall pain: simply SSH in and go
- Push-button performance: models in standard PyTorch or TensorFlow
- Flexibility: pre-train or fine-tune models with your data
- Train in a known amount of time, for a fixed fee

Should You Fine-Tune or Train from Scratch?

Ultimately, the decision to fine-tune a pre-trained model or to train a model from scratch depends on factors such as the size of the dataset, the similarity between the pre-trained model's task and the new task, the availability of the necessary computational resources (we've got you covered there), and overall time constraints. To make it as easy as possible, Cerebras developed the flow chart to the right to help guide you.

If you have a small dataset, fine-tuning a pre-trained model can be a good option. In fine-tuning, you take a pre-trained model and retrain it on a new dataset specific to your task. This approach can save you time since the pre-trained model has already learned general features from a large dataset. Fine-tuning can also help to avoid overfitting on small datasets.

However, if you have a large dataset, training a model from scratch may be a better option. Training a model from scratch allows you to have more control over the architecture, hyperparameters, and optimization strategy, which can lead to better performance on the specific task. Additionally, if the pre-trained model's task is significantly different from your task, fine-tuning may not be as effective.

1. Dataset size depends on the model architecture used for training and on the task. If you are unsure, our world-class engineers will be happy to help.
2. Domain similarity assesses whether the data used to pre-train a generic model and your data are similar enough that your fine-tuned model will perform well on downstream tasks.

Fine-Tuning

Standard Offering

The Fine-Tuning Standard Offering is a self-service process, similar to the Training from Scratch Standard Offering. Pricing is per 1,000 tokens, so there are no surprises. Minimum spend is $10,000.
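
As a rough illustration of per-1,000-token pricing, the sketch below estimates a fine-tuning bill. The function name, the example job size and rate, and the treatment of the $10,000 minimum spend as a floor on the total are assumptions made for illustration, not Cirrascale's published billing logic.

```python
def estimate_fine_tuning_cost(num_tokens, price_per_1k_tokens, minimum_spend=10_000.0):
    """Estimate a fine-tuning bill in dollars.

    Assumption: the minimum spend acts as a floor on the total charge.
    """
    if num_tokens < 0 or price_per_1k_tokens < 0:
        raise ValueError("token count and price must be non-negative")
    # Pricing is quoted per 1,000 tokens, so scale the token count first.
    usage_cost = (num_tokens / 1_000) * price_per_1k_tokens
    return max(usage_cost, minimum_spend)


# Hypothetical job: 40 billion tokens at an illustrative rate of $0.00055 per 1,000 tokens.
cost = estimate_fine_tuning_cost(40_000_000_000, 0.00055)
print(f"Estimated cost: ${cost:,.2f}")
```

Under these assumptions, a small job whose usage cost falls below the minimum would be billed at the minimum spend instead.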

White-Glove Support with Cerebras Experts

With White-Glove Support, Cerebras thought leaders will fine-tune a model on the Cerebras Wafer-Scale Cluster on your behalf and deliver the trained weights to you. Contact us directly for pricing.

Train From Scratch

Train your own state-of-the-art GPT model for your application on your data. The process is simple:

1. Pick a large model from the list below (or contact us for custom projects)
2. See the price and time to train: no surprises
3. SSH in and get going
   - Enjoy secure, dedicated access to a programming environment for the training period
   - The Cerebras implementation of the chosen model is available
   - Systems, code examples, and documentation are at your fingertips
   - Scripts let you vary training parameters, e.g. batch size, learning rate, training steps, checkpointing frequency
   - Use the Cerebras-curated Pile dataset to train upon, if desired
4. Save and export trained weights and training log data from your work to use as you see fit

Additional Services Available

Cirrascale and Cerebras provide additional services as needed, such as:

- Bigger dedicated clusters, available to reduce time to accuracy and to work on larger models
- Additional cluster time by the hour for hyperparameter tuning, pre-production training runs, and post-production continuous pre-training or fine-tuning
- CPU hours from Cirrascale for dataset preparation
- CPU or GPU support from Cirrascale for production model inference

Pricing

AMD Instinct Series Instance Pricing

8X AMD Instinct MI325X | Dual 48-Core | 2.3TB | (1) 960 NVMe, (4) 3.84TB NVMe | 25Gb Bonded (3200Gb Available)
8X AMD Instinct MI300X | Dual 48-Core | 2.3TB | (1) 960 NVMe, (4) 3.84TB NVMe | 25Gb Bonded (3200Gb Available) | $22,499 | $20,249 | $17,999
4X AMD Instinct MI250 | Dual 64-Core | 1TB | (1) 960 NVMe, (1) 3.84TB NVMe | 25Gb Bonded | $4,679 | $4,211 | $3,743

All pricing above is based on Cirrascale's No Surprises billing model. There are no hidden fees, and discounts may apply for long-term commitments depending on the service requested. All server pricing shown is per server per month.
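
The three prices listed for each priced instance are not labeled in this text. As a quick check on how they relate, the sketch below computes each price's reduction relative to the first one. The helper name is hypothetical, and reading the lower prices as commitment discounts is an assumption rather than something the pricing text states.

```python
def discount_vs_first(prices):
    """Return each price's percentage reduction relative to the first price, rounded to 0.1%."""
    base = prices[0]
    return [round((base - price) / base * 100, 1) for price in prices]


# Monthly prices listed above for the 8X AMD Instinct MI300X and 4X AMD Instinct MI250.
mi300x_prices = [22_499, 20_249, 17_999]
mi250_prices = [4_679, 4_211, 3_743]

print(discount_vs_first(mi300x_prices))  # [0.0, 10.0, 20.0]
print(discount_vs_first(mi250_prices))   # [0.0, 10.0, 20.0]
```

Both rows work out to steps of roughly 10% and 20% below the first listed price.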

Pricing

NVIDIA GPU Cloud

8-GPU NVIDIA B300 | Dual 64-Core | 3TB | (1) 960 NVMe, (4) 3.84TB NVMe | 25Gb Bonded (3200Gb Available)
8-GPU NVIDIA B200 | Dual 48-Core | 2TB | (1) 960 NVMe, (4) 3.84TB NVMe | 25Gb Bonded (3200Gb Available) | $34,999 | $31,499 | $27,999
8-GPU NVIDIA H200 | Dual 48-Core | 2TB | (1) 960 NVMe, (4) 3.84TB NVMe | 25Gb Bonded (3200Gb Available) | $26,499 | $23,849 | $21,199
8-GPU NVIDIA H100 | Dual 48-Core | 2TB | (1) 960 NVMe, (4) 3.84TB NVMe | 25Gb Bonded (3200Gb Available) | $24,999 | $22,499 | $19,999

Cirrascale Cloud Services has one of the largest selections of NVIDIA GPUs available in the cloud.
The above represents our most popular instances, but check out our pricing page for more instance types.
Not seeing what you need? Contact us for a specialized cloud quote for the configuration you need.

Pricing

Qualcomm Cloud AI 100 Series Bare-Metal Pricing

8X AI 100 Ultra | 128 | 512GB | (2) 3.84TB NVMe | $4,699 | $3,759
Octo AI 100 Pro | 384GB | 1TB NVMe | $2,499 | $2,019
Quad AI 100 Pro | 182GB | 1TB NVMe | $1,259 | $1,009
Dual AI 100 Pro | 48GB | 1TB NVMe | $629 | $519
Single AI 100 Pro (128) | 128GB | 1TB NVMe | $549 | $439
Single AI 100 Pro (64) | 64GB | 1TB NVMe | $369 | $289
Single AI 100 Pro (48) | 48GB | 1TB NVMe | $329 | $259

Pricing

The Cerebras AI Model Studio

Fine-Tuning - Standard Offering Pricing

Eleuther GPT-J | $0.00055 | $0.0011 | $0.0023 | 132
Eleuther GPT-NeoX | $0.00190 | $0.0039 | $0.0078 | 451
CodeGen* 350M | 0.35 | $0.00003 | $0.00006 | $0.00013
CodeGen* 2.7B | 2.7 | $0.00026 | $0.0005 | $0.0027
CodeGen* 6.1B | 6.1 | $0.00065 | $0.0013 | $0.0030 | 154
CodeGen* 16.1B | 16.1 | $0.00147 | $0.0030 | $0.011 | 350

* T5 tokens to train from the original T5 paper. Chinchilla scaling laws not applicable.
** Note that GPT-J was pre-trained on ~400B tokens. Fine-tuning jobs can employ a wide range of dataset sizes, but often use on the order of 1-10% of the pre-training tokens. As such, one might fine-tune a model like GPT-J with ~4-40B tokens. The table above gives estimated wall clock times to fine-tune the model checkpoints with 10B tokens on the Cerebras AI Model Studio and on an AWS p4d instance, to give you a sense of how long jobs of this scale could take.
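
The 1-10% guideline in the note above is simple arithmetic. A minimal sketch, assuming a hypothetical helper name and integer percentages:

```python
def fine_tuning_token_range(pretraining_tokens, low_pct=1, high_pct=10):
    """Return (low, high) fine-tuning token counts as percentages of the pre-training tokens."""
    if not 0 <= low_pct <= high_pct:
        raise ValueError("percentages must satisfy 0 <= low_pct <= high_pct")
    # Integer arithmetic keeps the token counts exact.
    low = pretraining_tokens * low_pct // 100
    high = pretraining_tokens * high_pct // 100
    return low, high


# GPT-J was pre-trained on roughly 400B tokens.
low, high = fine_tuning_token_range(400_000_000_000)
print(f"{low / 1e9:.0f}B to {high / 1e9:.0f}B tokens")  # prints: 4B to 40B tokens
```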

Fixed-Price Production Model Training

GPT3-XL | 1.3 | 0.4 | $2,500
GPT-J | 120 | $45,000
GPT-3 6.7B | 6.7 | 134 | $40,000
T-5 11B | 34* | $60,000
GPT-3 13B | 260 | $150,000
GPT NeoX | 400 | $525,000
GPT 70B | 1,400 | Contact For Quote
GPT 175B | 175 | 3,500 | Contact For Quote

* T5 tokens to train from the original T5 paper. Chinchilla scaling laws not applicable.
** Expected number of days, based on training experience to date, using a 4-node Cerebras Wafer-Scale Cluster. Actual training of model may take more or less time.
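
Apart from the T5 row, the token counts in the table appear to follow the commonly cited Chinchilla rule of thumb of roughly 20 training tokens per parameter (for example, 6.7B parameters and 134B tokens). The sketch below reproduces that arithmetic. The 20x ratio and the reading of the table's numbers as parameters and tokens in billions are assumptions, since this text does not state them.

```python
CHINCHILLA_TOKENS_PER_PARAM = 20  # rule-of-thumb ratio; an assumption, not stated in the table


def chinchilla_tokens_billions(params_billions):
    """Approximate compute-optimal training tokens (in billions) for a given model size."""
    return round(params_billions * CHINCHILLA_TOKENS_PER_PARAM, 1)


# Rows where both the parameter count and the token count are listed above.
for name, params_b in [("GPT-3 6.7B", 6.7), ("GPT 175B", 175)]:
    print(f"{name}: ~{chinchilla_tokens_billions(params_b):,.0f}B tokens")
```

Under this assumption the two rows come out to about 134B and 3,500B tokens, matching the listed values.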
