.le_strings file format

[V] IdolNinja · Jun 15, 2013

I would love to be able to add new variables for things like the Extended Taunts mod so the engine doesn't just output the code name of the taunt preceded by !!. It would also be nice for Sandbox+, which is simply using mission_help_table() and then outputting whatever the first argument is as the error (but it looks like an actual message to the player). The problem is that the output from this function only displays correctly for the host and not for the client in coop. The client gets an unknown hashtag error output to the HUD. Being able to edit the string files and create new variables to reference would fix all this hack-y nonsense.

EDIT:
I would like this info for both SR2 (to fix GotR new weapon names) and SRTT (if the formats are different.)

Minimaul · Jun 15, 2013

I think the formats are the same, as I can extract both. It's putting them back together correctly that's the hard part.

[V] Knobby · Jun 17, 2013

uint32 id (0xa84c7f73)
uint16 version
uint16 num_buckets
uint32 num_pairs
hash bucket array

a hash bucket is a uint32 num pairs followed by a pointer to the pair array
a pair is a uint32 crc and a wchar_t for the text itself.

Generally speaking we do pointer to offset stuff when writing things to files and then the opposite when reading the file, so the pointer in the hash bucket is an offset into the file.
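For anyone who wants to poke at a file, here is a minimal Python sketch of reading the header and bucket table described above. It is not the game's own loader. It assumes little-endian fields, a 32-bit offset in each bucket, and that the bucket array starts right after the 12-byte header; none of that is confirmed in this thread. The function name read_header_and_buckets is made up for illustration.

```python
import struct

MAGIC = 0xA84C7F73  # the id value quoted in the post above


def read_header_and_buckets(data: bytes):
    """Parse the fixed header and the bucket table from raw file bytes.

    Assumptions (not confirmed in the thread): little-endian fields,
    32-bit file offsets, bucket array directly after the header.
    """
    header_fmt = "<IHHI"  # uint32 id, uint16 version, uint16 num_buckets, uint32 num_pairs
    file_id, version, num_buckets, num_pairs = struct.unpack_from(header_fmt, data, 0)
    if file_id != MAGIC:
        raise ValueError(f"unexpected id 0x{file_id:08x}")

    buckets = []
    pos = struct.calcsize(header_fmt)  # 12 bytes
    for _ in range(num_buckets):
        # Each bucket: uint32 pair count, then the offset of its pair array.
        count, offset = struct.unpack_from("<II", data, pos)
        buckets.append((count, offset))
        pos += 8
    return version, num_pairs, buckets
```

The pair arrays themselves are left unparsed here, since the exact layout of the text after each crc is not spelled out.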

Minimaul · Jun 17, 2013

[V] Knobby said:
uint32 id (0xa84c7f73)
uint16 version
uint16 num_buckets
uint32 num_pairs
hash bucket array

a hash bucket is a uint32 num pairs followed by a pointer to the pair array
a pair is a uint32 crc and a wchar_t for the text itself.

Generally speaking we do pointer to offset stuff when writing things to files and then the opposite when reading the file, so the pointer in the hash bucket is an offset into the file.

Are they split into buckets in any particular way - and are they sorted in any particular order?

[V] Knobby · Jun 17, 2013

Minimaul said:
Are they split into buckets in any particular way - and are they sorted in any particular order?

Yes, sorry. It looks like to find a string we do the following:

uint32 hash = hash of string id name
for each loaded .le file:
    bucket_index = hash & bucket->m_hash_mask
    bucket = m_hash_buckets[bucket_index];
    linear search on bucket pairs for crc match

Minimaul · Jun 17, 2013

[V] Knobby said:
Yes, sorry. It looks like to find a string we do the following:

uint32 hash = hash of string id name
for each loaded .le file:

bucket_index = hash & bucket->m_hash_mask
bucket = m_hash_buckets[bucket_index];
linear search on bucket pairs for crc match

Do we have any way of knowing what the hash masks are from the le_strings file or are they hardcoded?

[V] Knobby · Jun 17, 2013

Minimaul said:
Do we have any way of knowing what the hash masks are from the le_strings file or are they hardcoded?

Actually, it turns out to be simple:

m_hash_mask = num_buckets - 1;
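Putting the lookup steps and the mask together, a rough Python sketch could look like the following. It assumes each loaded file has already been parsed into a list of buckets, with each bucket a list of (crc, text) tuples; that in-memory shape and the name find_string are hypothetical. The hash of the string id name is passed in as a number, because the hash function itself is not described in this thread.

```python
def find_string(files, name_hash):
    """Return the text whose crc matches name_hash, or None if not found.

    files: list of bucket tables, one per loaded file (assumed shape).
    Each table is a list of buckets; each bucket is a list of (crc, text).
    The bucket count must be a power of two for the mask to work.
    """
    for buckets in files:
        hash_mask = len(buckets) - 1            # m_hash_mask = num_buckets - 1
        bucket = buckets[name_hash & hash_mask]  # pick the bucket by masked hash
        for crc, text in bucket:                 # linear search within the bucket
            if crc == name_hash:
                return text
    return None
```

The mask only selects a valid index because the bucket count is a power of two, so num_buckets - 1 has all low bits set.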

Minimaul · Jun 18, 2013

OK, can someone confirm that I'm not going insane here?

I've put a test tool at http://minimaul.thirdstreetsaints.com/files/temp/StringsExtractor.zip (source at https://github.com/saintsrowmods/ThomasJepp.SaintsRow). Testing this tool with SR:TT le_strings files produces the correct result. Testing it with SR2 le_strings files (I used static_US.le_strings) produces garbage - if I try and decode the _TEXT ONLY_ as big endian it seems to work though.

Edit: OK, I think we've confirmed this in steam chat. The rest of the file structure is identical between SR2 on PC and SRTT on PC, but SR2 on PC has the text in a big-endian format, not little-endian. That makes perfect sense.
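To illustrate the byte-order difference, here is a small Python sketch. It assumes the text is stored as 16-bit UTF-16 code units, which is a guess based on wchar_t on Windows and not something confirmed here; decode_text is a made-up helper name.

```python
def decode_text(raw: bytes, big_endian: bool) -> str:
    """Decode string bytes as UTF-16 in the requested byte order.

    Assumption: the text is UTF-16. big_endian=True would be the SR2 PC
    case described above, big_endian=False the SRTT PC case.
    """
    codec = "utf-16-be" if big_endian else "utf-16-le"
    return raw.decode(codec)
```

Decoding big-endian bytes as little-endian swaps the two bytes of every character, which is why ordinary Latin text comes out as unrelated characters when the wrong order is used.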

Hopefully last question:

Is there a rule defining how many buckets are used? I can see bucket counts between 32 and 1024 used in the various files I have.

[V] Knobby · Jun 18, 2013

Minimaul said:
OK, can someone confirm that I'm not going insane here?

I've put a test tool at http://minimaul.thirdstreetsaints.com/files/temp/StringsExtractor.zip (source at https://src.tomjepp.co.uk/hg/ThomasJepp.SaintsRow/). Testing this tool with SR:TT le_strings files produces the correct result. Testing it with SR2 le_strings files (I used static_US.le_strings) produces garbage - if I try and decode the _TEXT ONLY_ as big endian it seems to work though.

Edit: OK, I think we've confirmed this in steam chat. The rest of the file structure is identical between SR2 on PC and SRTT on PC, but SR2 on PC has the text in a big-endian format, not little-endian. That makes perfect sense.

Hopefully last question:

Is there a rule defining how many buckets are used? I can see bucket counts between 32 and 1024 used in the various files I have.

Yeah, we outsourced the PC port on sr2 so it might be hard to get sr2 PC answers. I am not surprised to hear about crazy things in that port.

Bucket counts are determined by comparing the "valid" sizes to the ideal. Ideal is considered to be num_entries/5 and the valid sizes are 32, 64, 128, 256, 512, and 1024.
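One possible reading of that rule, sketched in Python: pick the valid size closest to num_entries / 5. The post only says the valid sizes are compared to the ideal, so "closest" is an assumption, as is the tie-breaking (this sketch picks the smaller size on a tie). pick_bucket_count is a hypothetical name.

```python
VALID_SIZES = (32, 64, 128, 256, 512, 1024)


def pick_bucket_count(num_entries: int) -> int:
    """Choose a bucket count for a file with num_entries strings.

    Assumption: "comparing to the ideal" means taking the valid size
    with the smallest absolute difference from num_entries / 5.
    """
    ideal = num_entries / 5
    # min() returns the first minimum, so ties go to the smaller size.
    return min(VALID_SIZES, key=lambda size: abs(size - ideal))
```

Under this reading, a file with 1000 strings has an ideal of 200 and would get 256 buckets. Tools that rebuild files may want to compare the result against the bucket count of the original file before trusting it.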

Minimaul · Jun 18, 2013

Just thought I'd link this: http://steamcommunity.com/sharedfiles/filedetails/?id=153634495

Now I need to make it work for SR2.
