This is where inference speed starts to matter. An H100 might be cheaper per inference than Groq, but cutting the wait time from 1 minute to 10 seconds could be a big deal.
Some kinds of inference are expensive, yes, not going to dispute that. But 99.95% of it is actually surprisingly inexpensive. Hell, a lot of useful workloads can be deployed on a cell phone nowadays, and that fraction will increase over time, further reducing inference costs or eliminating them outright (or rather moving them to the consumer).
For the vast majority of people the main expense is creating the combination of a dataset and model that works for their practical problem, with the dataset being the harder (and sometimes more expensive) problem of the two.
The dataset is also their "moat", even though most of them don't realize it, and don't put enough care into that part of the pipeline.
Are you referring to the (one-time) training cost or the cost per inferred token? The latter is pretty acceptable these days, especially with smaller models.
Agreed that there are workloads where inference is not expensive, but it's really workload dependent. For applications that run inference over large amounts of data in the computer vision space, inference ends up being a dominant portion of the spend.
They're saying that with this architecture there's a tradeoff between training and inference cost: a 10x smaller model (much cheaper to run inference on) can match a bigger model if it's trained on 100x the data (much more expensive to train), and the improvement continues log-linearly.
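A rough back-of-envelope sketch of that tradeoff, using the common approximations that training costs about 6 * params * tokens FLOPs and inference about 2 * params FLOPs per token; the model sizes and token counts below are purely illustrative, not from the paper:

    # Illustrative only: compare a "big" model to one 10x smaller trained on 100x data.
    def train_flops(params, tokens):
        return 6 * params * tokens          # standard training-cost approximation

    def infer_flops_per_token(params):
        return 2 * params                   # standard inference-cost approximation

    big_params, big_tokens = 70e9, 1.4e12       # hypothetical "big" model
    small_params, small_tokens = 7e9, 140e12    # 10x smaller, 100x more data

    train_ratio = train_flops(small_params, small_tokens) / train_flops(big_params, big_tokens)
    infer_ratio = infer_flops_per_token(small_params) / infer_flops_per_token(big_params)

    print(f"training cost ratio (small/big): {train_ratio:.0f}x")   # ~10x more expensive
    print(f"inference cost ratio (small/big): {infer_ratio:.1f}x")  # ~0.1x, i.e. 10x cheaper

So the extra one-time training spend pays for itself once enough tokens are served at the cheaper per-token rate.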
Inference costs are non-trivial, and I wouldn’t be surprised if the cost of running ChatGPT (given the 3M/day figure) has surpassed the cost of training it. Without optimizations, training only uses about 3 times as much memory as inference, so exponential parameter/cost scaling still affects both.
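The ~3x figure presumably comes from training having to hold gradients and an optimizer state alongside the weights, while inference only needs the weights; activations and Adam's second moment would push the multiple higher. A minimal sketch under those assumptions, with a made-up 7B-parameter model:

    # Illustrative memory estimate: weights only (inference) vs.
    # weights + gradients + one optimizer state, e.g. SGD momentum (training).
    BYTES_PER_PARAM = 4  # fp32

    def inference_memory_gb(params):
        return params * BYTES_PER_PARAM / 1e9

    def training_memory_gb(params):
        weights = params * BYTES_PER_PARAM
        grads = params * BYTES_PER_PARAM
        momentum = params * BYTES_PER_PARAM
        return (weights + grads + momentum) / 1e9

    params = 7e9  # hypothetical 7B-parameter model
    print(f"inference: ~{inference_memory_gb(params):.0f} GB")                   # ~28 GB
    print(f"training:  ~{training_memory_gb(params):.0f} GB "
          f"({training_memory_gb(params) / inference_memory_gb(params):.0f}x)")  # ~84 GB, 3x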
There’s ongoing research to reduce the computational costs of inference, but to my knowledge they only offer linear improvements (although I wouldn’t bet against more substantial reductions in the near future, particularly as these techniques are compounded).
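As one concrete example of what a constant-factor ("linear") improvement looks like in practice, here's post-training dynamic int8 quantization in PyTorch on a toy model (the model is just a placeholder); it roughly quarters weight memory and gives a constant-factor CPU speedup, but doesn't change how cost scales with model size:

    import torch
    import torch.nn as nn

    # Toy model, purely illustrative.
    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

    # Quantize the Linear layers' weights to int8 for inference.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 512)
    print(model(x).shape, quantized(x).shape)  # same interface, smaller/faster linears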