.le_strings file format

Discussion in 'Ask Volition!' started by [V] IdolNinja, Jun 15, 2013.

  1. [V] IdolNinja

    [V] IdolNinja Volition Staff

    I would love to be able to add new variables for things like the Extended Taunts mod so the engine doesn't just output the code name of the taunt preceded by !!. It would also be nice for Sandbox+, which simply uses mission_help_table() and outputs whatever the first argument is as the error (though it looks like an actual message to the player). The problem is that the output from this function only displays correctly for the host and not the client in coop. The client gets an unknown hashtag error output to the HUD. Being able to edit the string files and create new variables to reference would fix all this hack-y nonsense. :)

    I would like this info for both SR2 (to fix GotR new weapon names) and SRTT (if the formats are different.)
  2. Minimaul

    Minimaul Site owner Staff Member

    I think the formats are the same, as I can extract both. It's putting them back together correctly that's the hard part. ;)
  3. [V] Knobby

    [V] Knobby Volition Staff

    uint32 id (0xa84c7f73)
    uint16 version
    uint16 num_buckets
    uint32 num_pairs
    hash bucket array

    a hash bucket is a uint32 num pairs followed by a pointer to the pair array
    a pair is a uint32 crc and a wchar_t for the text itself.

    Generally speaking we do pointer to offset stuff when writing things to files and then the opposite when reading the file, so the pointer in the hash bucket is an offset into the file.
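
A minimal Python sketch of reading the header and bucket table described above. Several details are assumptions not stated in the thread: that all fields are little-endian, that the bucket array starts immediately after the 12-byte header, and that each bucket's "pointer" is stored as a 32-bit file offset. The name `parse_header` is hypothetical.

```python
import struct

LE_STRINGS_ID = 0xA84C7F73  # id value quoted in the post above


def parse_header(data: bytes):
    """Read the fixed header and the bucket table from raw file bytes."""
    # Assumption: little-endian fields in the order listed in the post.
    header_format = "<IHHI"  # id, version, num_buckets, num_pairs
    file_id, version, num_buckets, num_pairs = struct.unpack_from(header_format, data, 0)
    if file_id != LE_STRINGS_ID:
        raise ValueError("unexpected id 0x%08x" % file_id)

    buckets = []
    offset = struct.calcsize(header_format)  # 12 bytes
    for _ in range(num_buckets):
        # Assumption: a bucket is a uint32 pair count plus a uint32 file offset.
        count, pairs_offset = struct.unpack_from("<II", data, offset)
        buckets.append((count, pairs_offset))
        offset += 8
    return version, num_pairs, buckets
```

The layout of the pair array itself is left out here because the thread does not spell out how the text is stored next to each crc.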
  4. Minimaul

    Minimaul Site owner Staff Member

    Are they split into buckets in any particular way - and are they sorted in any particular order?
  5. [V] Knobby

    [V] Knobby Volition Staff

    Yes, sorry. It looks like to find a string we do the following:

    uint32 hash = hash of string id name
    for each loaded .le file:
        bucket_index = hash & bucket->m_hash_mask
        bucket = m_hash_buckets[bucket_index];
        linear search on bucket pairs for crc match
  6. Minimaul

    Minimaul Site owner Staff Member

    Do we have any way of knowing what the hash masks are from the le_strings file or are they hardcoded?
  7. [V] Knobby

    [V] Knobby Volition Staff

    actually it turns out to be simple:

    m_hash_mask = num_buckets - 1;
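
Putting the lookup steps and the mask together, a rough Python sketch might look like the following. It works on in-memory lists rather than a real file, and it assumes the hash used to pick the bucket is the same value stored as the pair's crc. The hash function itself is not given in the thread, so the sketch takes the hash as an integer. `build_buckets` and `find_string` are hypothetical names.

```python
def build_buckets(pairs, num_buckets):
    """Distribute (crc, text) pairs into num_buckets lists using crc & mask."""
    # The mask trick only selects a valid index when num_buckets is a power of two.
    if num_buckets <= 0 or (num_buckets & (num_buckets - 1)) != 0:
        raise ValueError("num_buckets must be a power of two")
    hash_mask = num_buckets - 1
    buckets = [[] for _ in range(num_buckets)]
    for crc, text in pairs:
        buckets[crc & hash_mask].append((crc, text))
    return buckets


def find_string(crc, buckets):
    """Return the text stored for crc, or None if it is not in this file."""
    hash_mask = len(buckets) - 1  # m_hash_mask = num_buckets - 1
    bucket = buckets[crc & hash_mask]
    for pair_crc, text in bucket:  # linear search on the bucket's pairs
        if pair_crc == crc:
            return text
    return None
```

With 32 buckets, the hashes 5 and 37 both land in bucket 5, which is why the linear search on the crc is still needed after the bucket is chosen.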
  8. Minimaul

    Minimaul Site owner Staff Member

    OK, can someone confirm that I'm not going insane here?

    I've put a test tool at http://minimaul.thirdstreetsaints.com/files/temp/StringsExtractor.zip (source at https://src.tomjepp.co.uk/hg/ThomasJepp.SaintsRow/). Testing this tool with SR:TT le_strings files produces the correct result. Testing it with SR2 le_strings files (I used static_US.le_strings) produces garbage - if I try and decode the _TEXT ONLY_ as big endian it seems to work though.

    Edit: OK, I think we've confirmed this in steam chat. The rest of the file structure is identical between SR2 on PC and SRTT on PC, but SR2 on PC has the text in a big-endian format, not little-endian. That makes perfect sense. o_O
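
To illustrate the byte-order difference, here is a small Python sketch that decodes the text either way. It assumes the text is 16-bit UTF-16 code units with an optional NUL terminator, which the thread does not state explicitly. `decode_text` is a hypothetical helper, not part of the tool linked above.

```python
def decode_text(raw: bytes, big_endian: bool) -> str:
    """Decode 16-bit text, cutting at the first NUL terminator if one is present."""
    # Assumption: SR2 PC text would use big_endian=True, SRTT PC big_endian=False.
    codec = "utf-16-be" if big_endian else "utf-16-le"
    text = raw.decode(codec)
    return text.split("\x00", 1)[0]
```

Decoding big-endian bytes with the little-endian codec does not raise an error for most input. It simply yields unrelated characters, which matches the "garbage" described above.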

    Hopefully last question:

    Is there a rule defining how many buckets are used? I can see bucket counts of between 32 and 1024 used in various files I have.
  9. [V] Knobby

    [V] Knobby Volition Staff

    Yeah, we outsourced the PC port on sr2 so it might be hard to get sr2 PC answers. I am not surprised to hear about crazy things in that port.

    Bucket counts are determined by comparing the "valid" sizes to the ideal. Ideal is considered to be num_entries/5 and the valid sizes are 32, 64, 128, 256, 512, and 1024.
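
One possible reading of that rule, sketched in Python. The post does not say exactly how the valid sizes are compared to the ideal, so this assumes the valid size nearest to num_entries/5 wins, with ties going to the smaller size. `choose_bucket_count` is a hypothetical name.

```python
VALID_BUCKET_COUNTS = (32, 64, 128, 256, 512, 1024)


def choose_bucket_count(num_entries: int) -> int:
    """Pick the valid bucket count closest to the ideal of num_entries / 5."""
    ideal = num_entries / 5
    # Assumption: "comparing" means nearest by absolute difference.
    # min() returns the first minimum, so ties resolve to the smaller size.
    return min(VALID_BUCKET_COUNTS, key=lambda size: abs(size - ideal))
```

Under this reading, a file with 1000 entries has an ideal of 200 and would get 256 buckets, while very small and very large files clamp to 32 and 1024.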
  10. Minimaul

    Minimaul Site owner Staff Member