In an unmarked office building in Austin, Texas, two small rooms contain a handful of Amazon employees designing two types of microchips for training and accelerating generative AI. These custom chips, Inferentia and Trainium, offer AWS customers an alternative to training their large language models on Nvidia GPUs, which have been getting difficult and expensive to procure.
“The entire world would like more chips for doing generative AI, whether that’s GPUs or whether that’s Amazon’s own chips that we’re designing,” Amazon Web Services CEO Adam Selipsky told CNBC in an interview in June. “I think that we’re in a better position than anybody else on Earth to supply the capacity that our customers collectively are going to want.”
Yet others have acted faster, and invested more, to capture business from the generative AI boom.
When OpenAI launched ChatGPT in November 2022, Microsoft gained widespread attention for hosting the viral chatbot and for investing a reported $13 billion in OpenAI. It was quick to add the generative AI models to its own products, incorporating them into Bing in February.
That same month, Google launched its own large language model, Bard, followed by a $300 million investment in OpenAI rival Anthropic.
It wasn’t until April that Amazon announced its own family of large language models, called Titan, along with a service called Bedrock to help developers enhance software using generative AI.
“Amazon is not used to chasing markets. Amazon is used to creating markets. And I think for the first time in a long time, they are finding themselves on the back foot and they are working to play catch up,” said Chirag Dekate, VP analyst at Gartner.
Meta also recently released its own LLM, Llama 2. The open-source ChatGPT rival is now available for people to test on Microsoft’s Azure public cloud.
Chips as ‘true differentiation’
In the long run, Dekate said, Amazon’s custom silicon could give it an edge in generative AI.

“I think the true differentiation is the technical capabilities that they’re bringing to bear,” he said. “Because guess what? Microsoft does not have Trainium or Inferentia.”
AWS quietly started production of custom silicon back in 2013 with a piece of specialized hardware called Nitro. It’s now the highest-volume AWS chip. Amazon told CNBC in August that there is at least one in every AWS server, with a total of more than 20 million in use.
In 2015, Amazon bought Israeli chip startup Annapurna Labs. Then in 2018, Amazon launched its Arm-based server chip, Graviton, a rival to x86 CPUs from giants like AMD and Intel.
“Probably high single-digit to maybe 10% of total server sales are Arm, and a good chunk of those are going to be Amazon. So on the CPU side, they’ve done quite well,” said Stacy Rasgon, senior analyst at Bernstein Research.
Also in 2018, Amazon launched its AI-focused chips. That came two years after Google announced its first Tensor Processing Unit, or TPU. Microsoft has yet to announce the Athena AI chip it’s been working on, reportedly in partnership with AMD.
CNBC got a behind-the-scenes tour of Amazon’s chip lab in Austin, Texas, where Trainium and Inferentia are developed and tested. VP of product Matt Wood explained what both chips are for.
“Machine learning breaks down into these two different stages. So you train the machine learning models and then you run inference against those trained models,” Wood said. “Trainium provides about 50% improvement in terms of price performance relative to any other way of training machine learning models on AWS.”
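To make the two stages Wood describes concrete, here is a minimal sketch in plain PyTorch that trains a toy model and then runs inference against it. The model, data and hyperparameters are illustrative only; nothing here is AWS-specific.

```python
# Minimal sketch: the two stages of machine learning Wood describes.
import torch
import torch.nn as nn

# Toy data: learn y = 2x from random samples.
x = torch.randn(256, 1)
y = 2 * x

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Stage 1: training -- repeatedly adjust weights to reduce error.
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# Stage 2: inference -- run predictions against the trained model.
model.eval()
with torch.no_grad():
    print(model(torch.tensor([[3.0]])))  # prints roughly 6.0
```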
Trainium first came on the market in 2021, following the 2019 release of Inferentia, which is now on its second generation.
Inferentia allows customers “to deliver very, very low-cost, high-throughput, low-latency machine learning inference, which is all the predictions of when you type in a prompt into your generative AI model, that’s where all that gets processed to give you the response,” Wood said.
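In practice, customers reach these chips through EC2 instance families. The sketch below uses boto3 to launch a Trainium-backed instance; the AMI ID is a placeholder, and the instance type names (the “trn1” family for Trainium, “inf2” for second-generation Inferentia) are assumptions based on AWS’s public naming, so check current documentation before relying on them.

```python
# Hedged sketch: launching a Trainium-backed EC2 instance with boto3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: a Neuron-compatible AMI
    InstanceType="trn1.2xlarge",      # assumed Trainium type; "inf2.xlarge" for inference
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```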
For now, however, Nvidia’s GPUs are still king when it comes to training models. In July, AWS launched new AI acceleration hardware powered by Nvidia H100s.
“Nvidia chips have a massive software ecosystem that’s been built up around them over the last like 15 years that nobody else has,” Rasgon said. “The big winner from AI right now is Nvidia.”
[Photo: Amazon’s custom chips, from left to right, Inferentia, Trainium and Graviton, shown at Amazon’s Seattle headquarters on July 13, 2023. Credit: Joseph Huerta]
Leveraging cloud dominance
AWS’ cloud dominance, however, is a big differentiator for Amazon.
“Amazon does not need to win headlines. Amazon already has a really strong cloud install base. All they need to do is to figure out how to enable their existing customers to expand into value creation motions using generative AI,” Dekate said.
When choosing between Amazon, Google and Microsoft for generative AI, there are millions of AWS customers who may be drawn to Amazon because they’re already familiar with it, running other applications and storing their data there.
“It’s a question of velocity. How quickly can these companies move to develop these generative AI applications is driven by starting first on the data they have in AWS and using compute and machine learning tools that we provide,” explained Mai-Lan Tomsen Bukovec, VP of technology at AWS.
AWS is the world’s biggest cloud computing provider, with 40% of the market share in 2022, according to technology industry researcher Gartner.
Although operating income has been down year-over-year for three quarters in a row, AWS still accounted for 70% of Amazon’s overall $7.7 billion operating profit in the second quarter.
AWS’ operating margins have historically been far wider than those at Google Cloud.
AWS also has a growing portfolio of developer tools focused on generative AI.
“Let’s rewind the clock even before ChatGPT. It’s not like after that happened, suddenly we hurried and came up with a plan because you can’t engineer a chip in that quick a time, let alone you can’t build a Bedrock service in a matter of 2 to 3 months,” said Swami Sivasubramanian, AWS’ VP of database, analytics and machine learning.
Bedrock gives AWS customers access to large language models made by Anthropic, Stability AI, AI21 Labs and Amazon’s own Titan.
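As a rough illustration of the developer-facing side, the sketch below calls a Bedrock-hosted model through boto3. The model ID and the Anthropic-style request body are assumptions that vary by provider and API version; Bedrock’s documentation has the current values.

```python
# Hedged sketch: invoking a Bedrock-hosted model with boto3.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Assumed Anthropic-style payload; other providers use different formats.
body = json.dumps({
    "prompt": "\n\nHuman: Summarize what AWS Bedrock does.\n\nAssistant:",
    "max_tokens_to_sample": 200,
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",  # assumed model ID; varies by provider
    body=body,
    contentType="application/json",
    accept="application/json",
)
print(json.loads(response["body"].read())["completion"])
```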
“We don’t believe that one model is going to rule the world, and we want our customers to have the state-of-the-art models from multiple providers because they are going to pick the right tool for the right job,” Sivasubramanian said.
[Photo: An Amazon employee works on custom AI chips, in a jacket branded with AWS’ chip Inferentia, at the AWS chip lab in Austin, Texas, on July 25, 2023. Credit: Katie Tarasov]
One of Amazon’s newest AI offerings is AWS HealthScribe, a service unveiled in July to help doctors draft patient visit summaries using generative AI.
Amazon also has SageMaker, a machine learning hub that offers algorithms, models and more.
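A minimal sketch of what kicking off a managed training job through SageMaker’s Python SDK can look like follows; the training script, IAM role ARN and S3 path are placeholders, and the instance type and framework versions are assumptions that depend on what a given region supports.

```python
# Hedged sketch: a managed training job via SageMaker's Python SDK.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",  # placeholder training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
    instance_count=1,
    instance_type="ml.g5.xlarge",  # assumption; Trainium types such as ml.trn1.2xlarge also exist
    framework_version="2.0",
    py_version="py310",
)

# Launches the job against data in S3 (placeholder path).
estimator.fit({"training": "s3://my-bucket/training-data"})
```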
Another big tool is coding companion CodeWhisperer, which Amazon said has enabled developers to complete tasks 57% faster on average. Last year, Microsoft also reported productivity boosts from its coding companion, GitHub Copilot.
In June, AWS announced a $100 million generative AI innovation “center.”
“We have so many customers who are saying, ‘I want to do generative AI,’ but they don’t necessarily know what that means for them in the context of their own businesses. And so we’re going to bring in solutions architects and engineers and strategists and data scientists to work with them one on one,” AWS CEO Selipsky said.
Although so far AWS has focused largely on tools instead of building a competitor to ChatGPT, a recently leaked internal email shows Amazon CEO Andy Jassy is directly overseeing a new central team building out expansive large language models, too.
On the second-quarter earnings call, Jassy said a “very significant amount” of AWS business is now driven by AI and the more than 20 machine learning services it offers. Examples of customers include Philips, 3M, Old Mutual and HSBC.
The explosive growth in AI has come with a flurry of security concerns from companies worried that employees are putting proprietary information into the training data used by public large language models.
“I can’t tell you how many Fortune 500 companies I’ve talked to who have banned ChatGPT. So with our approach to generative AI and our Bedrock service, anything you do, any model you use through Bedrock will be in your own isolated virtual private cloud environment. It’ll be encrypted, it’ll have the same AWS access controls,” Selipsky said.
For now, Amazon is only accelerating its push into generative AI, telling CNBC that “over 100,000” customers are using machine learning on AWS today. Although that’s a small percentage of AWS’ millions of customers, analysts say that could change.
“What we are not seeing is enterprises saying, ‘Oh, wait a minute, Microsoft is so ahead in generative AI, let’s just go out and let’s switch our infrastructure strategies, migrate everything to Microsoft,’” Dekate said. “If you’re already an Amazon customer, chances are you’re likely going to explore Amazon ecosystems quite extensively.”
— CNBC’s Jordan Novet contributed to this report.
CORRECTION: This article has been updated to reflect Inferentia as the chip used for machine learning inference.