How I cut GTA Online loading times by 70%


GTA Online. Infamous for its slow loading times. Having picked up the game again to finish some of the newer heists I was shocked (/s) to discover that it still loads just as slowly as the day it was released 7 years ago.

It was time. Time to get to the bottom of this.

Recon

First I wanted to check if someone had already solved this problem. Most of the results I found pointed towards anecdata about how the game is so sophisticated that it needs to load for so long, stories on how the p2p network architecture is rubbish (not saying that it isn’t), some elaborate ways of loading into story mode and a solo session after that, and a couple of mods that allowed skipping the startup R* logo video. Some more reading told me we could save a whopping 10-30 seconds with these combined!

Meanwhile on my PC…

Benchmark

Story mode load time:  ~1m 10s
Online mode load time: ~6m flat
Startup menu disabled, time from R* logo until in-game (Social Club login time isn’t counted).

Old but decent CPU:   AMD FX-8350
Cheap-o SSD:          KINGSTON SA400S37120G
We have to have RAM:  2x Kingston 8192 MB (DDR3-1337) 99U5471
Good-ish GPU:         NVIDIA GeForce GTX 1070

I know my setup is dated but what on earth could take 6x longer to load into online mode? I couldn’t measure any difference using the story-to-online loading technique as others have found before me. Even if it did work the results would be down in the noise.

I Am (Not) Alone

If this poll is to be trusted then the issue is widespread enough to mildly annoy more than 80% of the player base. It’s been 7 years R*!

[Image: 🎵What does the poll say?🎵]

Looking around a bit to find who the lucky ~20% with sub-3-minute load times are, I came across a few benchmarks with high-end gaming PCs and an online mode load time of about 2 minutes. I would kill hack for a 2 minute load time! It does seem to be hardware-dependent but something doesn’t add up here…

How come their story mode still takes nearly a minute to load? (The M.2 one didn’t count the startup logos btw.) Also, loading story to online takes them only a minute more while I’m getting about five more. I know that their hardware specs are a lot better but surely not 5x better.

Highly accurate measurements

Armed with such powerful tools as the Task Manager I began to investigate what resources could be the bottleneck.

[Image: Can you smell it?]

After taking a minute to load the common resources used for both story and online modes (which is nearly on par with high-end PCs) GTA decides to max out a single core on my machine for four minutes and do nothing else.

Disk usage? None! Network usage? There’s a bit, but it drops basically to zero after a few seconds (apart from loading the rotating info banners). GPU usage? Zero. Memory usage? Completely flat…

What, is it mining crypto or something? I smell code. Really bad code.

Single thread-bound

While my old AMD CPU has 8 cores and it does pack a punch, it was made in the olden days. Back when AMD’s single-thread performance was way behind Intel’s. This might not explain all of the load time differences but it should explain most of them.

What’s odd is that it’s using up just the CPU. I was expecting vast amounts of disk reads loading up resources or loads of network requests trying to negotiate a session in the p2p network. But this? This is probably a bug.

Profiling

Profilers are a great way of finding CPU bottlenecks. There’s only one problem: most of them rely on instrumenting the source code to get a perfect picture of what’s happening in the process. And I don’t have the source code. Nor do I need microsecond-perfect readings; I have 4 minutes’ worth of a bottleneck.

Enter stack sampling: for closed source applications there’s only one option. Dump the running process’ stack and the current instruction pointer’s location at set intervals to build a calling tree. Then add them up to get statistics on what’s going on. There’s only one profiler that I know of (might be ignorant here) that can do this on Windows. And it hasn’t been updated in over 10 years. It’s Luke Stackwalker! Someone, please give this project some love 🙂

[Image: The power of statistics compels you!]

Normally Luke would group the same functions together but since I don’t have debugging symbols I had to eyeball nearby addresses to guess if it’s the same place. And what do we see? Not one bottleneck but two of them!
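To make the "eyeball nearby addresses" step concrete, here is a minimal sketch of how sampled instruction pointers could be tallied into coarse buckets. It only covers the counting part, not the actual stack capture, and it is not how Luke Stackwalker works internally. The names `tally_samples`, `hottest_bucket`, `BUCKET_SHIFT` and `MAX_BUCKETS` are all made up for illustration, and the 256-byte bucket size is an arbitrary guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bucket size: addresses inside the same 256-byte window are
   treated as "probably the same function". This is a guess, not a rule. */
#define BUCKET_SHIFT 8
#define MAX_BUCKETS  64

typedef struct {
    uint64_t bucket;  /* sampled address >> BUCKET_SHIFT */
    size_t   hits;    /* number of samples that landed in this bucket */
} bucket_t;

/* Tally samples into buckets and return how many distinct buckets were used.
   Samples that would need more than MAX_BUCKETS buckets are dropped. */
size_t tally_samples(const uint64_t *samples, size_t n, bucket_t *out)
{
    size_t used = 0;
    assert(samples != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t b = samples[i] >> BUCKET_SHIFT;
        size_t j = 0;
        while (j < used && out[j].bucket != b)
            j++;
        if (j == used) {
            if (used == MAX_BUCKETS)
                continue;          /* table full: ignore this sample */
            out[used].bucket = b;
            out[used].hits = 0;
            used++;
        }
        out[j].hits++;
    }
    return used;
}

/* Index of the bucket with the most hits (the likeliest hot spot).
   Only meaningful when used > 0. */
size_t hottest_bucket(const bucket_t *buckets, size_t used)
{
    size_t best = 0;
    for (size_t j = 1; j < used; j++)
        if (buckets[j].hits > buckets[best].hits)
            best = j;
    return best;
}
```

Feeding it a few thousand samples and sorting the buckets by `hits` gives roughly the same picture as the profiler view: a couple of address ranges that soak up almost all of the time.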

Down the rabbit hole

Having borrowed my friend’s completely legitimate copy of the industry-standard disassembler (no, I really can’t afford the thing… gonna learn to ghidra one of these days) I went to take GTA apart.

[Image: Gibberish Galore]

That doesn’t look right at all. Most high-profile games come with built-in protection against reverse engineering to keep away pirates, cheaters, and modders. Not that it has ever stopped them.

There seems to be some sort of obfuscation/encryption at play here that has replaced most instructions with gibberish. Not to worry, we simply need to dump the game’s memory while it’s executing the part we want to look at. The instructions have to be de-obfuscated before running one way or another. I had Process Dump lying around, so I used that, but there are plenty of other tools available to do this sort of thing.

Problem one: It’s… strlen?!

Disassembling the now-less-obfuscated dump reveals that one of the addresses has a label pulled out of somewhere! It’s strlen? Going down the call stack the next one is labeled vscan_fn and after that the labels end, tho I’m fairly confident it’s sscanf.

[Image: A graph a day keeps the skeptics away]

It’s parsing something. Parsing what? Untangling the disassembly would take forever so I decided to dump some samples from the running process using x64dbg. Some debug-stepping later it turns out it’s… JSON! They’re parsing JSON. A whopping 10 megabytes’ worth of JSON with some 63k item entries.

...,
{
    "key": "WP_WCT_TINT_21_t2_v9_n2",
    "price": 45000,
    "statName": "CHAR_KIT_FM_PURCHASE20",
    "storageType": "BITFIELD",
    "bitShift": 7,
    "bitSize": 1,
    "category": ["CATEGORY_WEAPON_MOD"]
},
...

What is it? It appears to be data for a “net shop catalog” according to some references. I assume it contains a list of all the possible items and upgrades you can buy in GTA Online.

Clearing up some confusion: I believe these are items purchasable with in-game money, not directly linked with microtransactions.

But 10 megs? That’s nothing! And using sscanf may not be optimal but surely it’s not that bad? Well…

[Image: Ouch!]

Yeah, that’s gonna take a while… To be fair I had no idea most sscanf implementations called strlen so I can’t blame the developer who wrote this. I would assume it just scanned byte by byte and could stop on a NULL.
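A toy model shows why this hurts. The sketch below does not reproduce the game’s parser or any particular sscanf implementation. It just assumes that every token read first runs strlen over whatever is left of the buffer, and counts the bytes touched. The function names and the comma-separated input format are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts how many bytes get scanned if every token read first runs
   strlen() over the rest of the buffer. Tokens are comma-separated
   purely to keep the example small. */
size_t bytes_scanned_naive(const char *buf)
{
    size_t scanned = 0;
    const char *p = buf;
    assert(buf != NULL);
    while (*p != '\0') {
        scanned += strlen(p) + 1;      /* full scan, terminator included */
        size_t tok = strcspn(p, ",");  /* length of the current token */
        p += tok;
        if (*p == ',')
            p++;                       /* skip the separator */
    }
    return scanned;
}

/* Same input, but the length is measured exactly once up front. */
size_t bytes_scanned_once(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf) + 1;
}
```

For the 8-byte buffer `"a,b,c,d"` the naive walk already touches 20 bytes against 8 for the single measurement. The gap grows roughly with the square of the input size, which is where the minutes go once the input is 10 MB with tens of thousands of tokens.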

Problem two: Let’s use a Hash- … Array?

Turns out the second offender is called right next to the first one. They’re both even called in the same if statement as seen in this ugly decompilation:

[Image: Beggar thy neighbour]

All labels are mine, no idea what the functions/parameters are actually called.

The second problem? Right after parsing an item, it’s stored in an array (or an inlined C++ list? not sure). Each entry looks something like this:

struct {
    uint64_t *hash;
    item_t   *item;
} entry;

But before it’s stored? It checks the entire array, one by one, comparing the hash of the item to see if it’s in the list or not. With ~63k entries that’s (n^2+n)/2 = (63000^2+63000)/2 = 1984531500 checks if my math is right. Most of them useless. You have unique hashes, why not use a hash map?
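For comparison, here is a minimal sketch of the two strategies side by side: the linear-scan insert described above and a simple open-addressing hash set. Neither is the game’s code. `insert_linear`, `insert_hashed` and `TABLE_SIZE` are hypothetical names, and the hash set is deliberately simplified (fixed size, a hash of 0 is reserved to mean "empty slot").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linear-scan insert: compare against every stored hash before appending.
   *checks accumulates the comparisons made. Returns 1 if inserted,
   0 if the hash was already present or the array is full. */
int insert_linear(uint64_t *arr, size_t *len, size_t cap,
                  uint64_t hash, uint64_t *checks)
{
    for (size_t i = 0; i < *len; i++) {
        (*checks)++;
        if (arr[i] == hash)
            return 0;
    }
    if (*len == cap)
        return 0;
    arr[(*len)++] = hash;
    return 1;
}

/* Open-addressing hash set with linear probing. TABLE_SIZE is a
   hypothetical power of two; a slot value of 0 means "empty", so a hash
   of 0 cannot be stored in this simplified version. */
#define TABLE_SIZE 1024u

int insert_hashed(uint64_t *table, uint64_t hash, uint64_t *checks)
{
    assert(hash != 0);
    size_t slot = (size_t)(hash & (TABLE_SIZE - 1));
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        (*checks)++;
        if (table[slot] == 0) {
            table[slot] = hash;  /* free slot: store and finish */
            return 1;
        }
        if (table[slot] == hash)
            return 0;            /* already present */
        slot = (slot + 1) & (TABLE_SIZE - 1);
    }
    return 0;                    /* table full */
}
```

In this model, inserting n unique hashes costs n(n-1)/2 comparisons with the linear version, the same order of magnitude as the estimate above, while the hashed version stays close to one check per insert as long as the table is not crowded.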

[Image: Oof!]

I named it hashmap while reversing but it’s clearly not_a_hashmap. And it gets even better. The hash-array-list-thing is empty before loading the JSON. And all of the items in the JSON are unique! They don’t even need to check if it’s in the list or not! They even have a function to directly insert the items! Just use that! Srsly, WAT!?

PoC

Now that’s nice and all, but no one is going to take me seriously unless I test this so I can write a clickbait title for the post.

The plan? Write a .dll, inject it in GTA, hook some functions, ???, profit.

The JSON problem is hairy, I can’t realistically replace their parser. Replacing sscanf with one that doesn’t depend on strlen would be more realistic. But there’s an even easier way.

  • hook strlen
  • wait for a long string
  • “cache” the start and length of it
  • if it’s called again within the string’s range, return cached value

Something like:

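size_t strlen_cacher(char* str)
{
    static char* start;
    static char* end;
    size_t len;
    const size_t cap = 20000;

    // if we have a "cached" string and the current pointer is within it
    if (start && str >= start && str <= end) {
        // calculate the new strlen
        len = end - str;

        // if we're near the end, unload self
        // we don't want to mess something else up
        if (len < cap / 2)
            MH_DisableHook((LPVOID)strlen_addr);

        // super-fast return!
        return len;
    }

    // count the actual length
    // we need at least one measurement of the large JSON
    // or normal strlen for other strings
    len = builtin_strlen(str);

    // if it was the really long string
    // save its start and end addresses
    if (len > cap) {
        start = str;
        end = str + len;
    }

    // slow, boring return
    return len;
}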
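The same caching idea can be tried outside the game with a portable sketch that has no hooking library in it. This is not part of the PoC: `strlen_cache_t`, `cached_strlen` and the `threshold` parameter are hypothetical names, and the cache lives in an explicit struct instead of static variables so the number of slow calls can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *start;      /* first byte of the remembered long string */
    const char *end;        /* its terminating NUL */
    size_t      slow_calls; /* how often we fell back to a real strlen() */
} strlen_cache_t;

/* threshold is a hypothetical cut-off for "long enough to be worth caching". */
size_t cached_strlen(strlen_cache_t *c, const char *str, size_t threshold)
{
    assert(c != NULL && str != NULL);

    /* Compare as integers: relational comparison of pointers into
       unrelated objects is not defined for raw pointers in C. */
    if (c->start != NULL &&
        (uintptr_t)str >= (uintptr_t)c->start &&
        (uintptr_t)str <= (uintptr_t)c->end)
        return (size_t)(c->end - str);  /* answered from the cache */

    size_t len = strlen(str);           /* the one unavoidable full scan */
    c->slow_calls++;
    if (len > threshold) {
        c->start = str;
        c->end = str + len;
    }
    return len;
}
```

The cached answer is only valid while the remembered buffer is neither modified nor freed, which is presumably why the hook above disables itself once parsing gets close to the end of the string.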

And as for the hash-array problem, it’s more straightforward: just skip the duplicate checks entirely and insert the items directly since we know the values are unique.

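char __fastcall netcat_insert_dedupe_hooked(uint64_t catalog, uint64_t* key, uint64_t* item)
{
    // didn't bother reversing the structure
    uint64_t not_a_hashmap = catalog + 88;

    // no idea what this does, but repeat what the original did
    if (!(*(uint8_t(__fastcall**)(uint64_t*))(*item + 48))(item))
        return 0;

    // insert directly
    netcat_insert_direct(not_a_hashmap, key, &item);

    // remove hooks when the last item's hash is hit
    // and unload the .dll, we are done here :)
    if (*key == 0x7FFFD6BE) {
        MH_DisableHook((LPVOID)netcat_insert_dedupe_addr);
        unload();
    }

    return 1;
}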

Full source of PoC here.

Results

Well, did it work then?

Original online mode load time:         ~6m flat
Time with only duplication check patch:  4m 30s
Time with only JSON parser patch:        2m 50s
Time with both issues patched:           1m 50s

(6*60 - (1*60+50)) / (6*60) = 69.4% load time improvement (nice!)

Hell yes, it did! :))

Most likely, this won’t solve everyone’s load times. There might be other bottlenecks on different systems, but it’s such a gaping hole that I have no idea how R* has missed it all these years.

tl;dr

  • There’s a single thread CPU bottleneck while starting up GTA Online
  • It turns out GTA struggles to parse a 10MB JSON file
  • The JSON parser itself is poorly built / naive and
  • After parsing there’s a slow item de-duplication routine

R* please fix

If this
