Automatic Translator

Moderator
Staff member
Moderator
Joined
Feb 22, 2008
Messages
2,404
Reaction score
723
I am building an Automatic Translator for any PT executable. This is how it will work:

1 - The program searches for all strings in the given exe, then applies some filters to clean up the result, such as:

The final buffer contains only strings that come after "DRExp", which marks the end of the item table; from what I could see, that is where all the strings we want are.

Only strings longer than 4 characters are kept.

2 - You select the desired text in the listbox and can edit it to any size you want. If the new string is the same size as the old one, it won't be moved. Otherwise, it is moved to an empty location (a section you must add to hold the translated strings), and all references to that string are updated as well.
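Step 1 above can be sketched roughly as follows. This is a minimal Python sketch, not the actual tool: the function name `extract_strings` and its parameters are hypothetical, and it assumes single-byte, NUL-terminated strings and that "DRExp" appears once in the file.

```python
import re

def extract_strings(image: bytes, marker: bytes = b"DRExp", min_len: int = 5):
    """Return (file_offset, raw_bytes) for each NUL-terminated run after the marker."""
    start = image.find(marker)
    # If the marker is missing, fall back to scanning the whole file.
    start = 0 if start < 0 else start + len(marker)
    # A candidate is min_len or more non-NUL bytes followed by a NUL terminator
    # ("longer than 4 characters" means at least 5).
    pattern = re.compile(rb"[^\x00]{%d,}(?=\x00)" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(image, start)]
```

The offsets are kept next to the raw bytes because the second step needs to know where each string lives in order to move it or patch references to it.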


- Issues

Too many unreadable strings and C++ environment strings, such as error strings... I was thinking about keeping only A-Z/0-9 characters and removing everything else. Good solution?
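One possible variation on that idea, sketched in Python: instead of stripping characters out of a string, score each candidate by how much of it is letters, digits, or spaces, and drop the low scorers. The name `looks_like_text` and the 0.8 threshold are assumptions for illustration only. Note that this would also reject Korean text, whose bytes are 0x80 and above, so it only suits strings already in a Latin alphabet.

```python
import string

# Bytes treated as "readable": A-Z, a-z, 0-9 and the space character.
ALNUM = set((string.ascii_letters + string.digits + " ").encode("ascii"))

def looks_like_text(raw: bytes, min_ratio: float = 0.8) -> bool:
    """Keep a candidate when most of its bytes are letters, digits, or spaces."""
    if not raw:
        return False
    good = sum(1 for b in raw if b in ALNUM)
    return good / len(raw) >= min_ratio
```

Scoring leaves the original bytes untouched, so a string that passes can still be shown and patched exactly as it appears in the exe.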


Right now I haven't done anything related to actually translating the strings; I'm just cleaning the output. Once it reads only the strings that really matter, I'll start on the second part. I will post screenshots when the program is more solid.

Any ideas?
 
Joined
Jan 19, 2007
Messages
400
Reaction score
38
Very nice. I have an idea: you could make a 'pre-made' section holding all your words, like that old section of RPT. You could load this new section from an .ini file that lists all the words, like "Connection Lost", and each string found could then be replaced by a pointer to its location in the section. I think a compatible string-to-string replacement would be a lot of work... You may be able to search by hex string for Korean, Japanese, Thai... and the first match would be replaced by line one of your .ini...
You don't need to write translation software; you can write replacement software that finds all the words and replaces each one with a NEW word in your new section... There's a lot of work to do, but it's one-time work.
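The "replace by a pointer into a new section" idea might look something like this in Python. It is only a sketch under loud assumptions: `relocate_string` and its parameters are hypothetical, the new section is assumed to already exist in the image at `new_off`, and every 4-byte little-endian value equal to the old address is treated as a reference, which can produce false positives on real code.

```python
import struct

def relocate_string(image: bytearray, old_va: int, new_va: int,
                    new_off: int, new_text: bytes) -> int:
    """Repoint every dword equal to old_va to new_va, then store new_text at new_off."""
    data = new_text + b"\x00"
    if new_off + len(data) > len(image):
        raise ValueError("new section is too small for this string")
    old = struct.pack("<I", old_va)
    new = struct.pack("<I", new_va)
    patched = 0
    pos = image.find(old)
    while pos != -1:
        image[pos:pos + 4] = new          # same length, so offsets stay valid
        patched += 1
        pos = image.find(old, pos + 4)
    image[new_off:new_off + len(data)] = data
    return patched
```

The return value is the number of references patched; a result of zero is a hint that the string is reached indirectly and needs manual attention.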
 
Joined
Jan 19, 2007
Messages
400
Reaction score
38
I use this, in Delphi:
// Returns the file offset of the first occurrence of ForString, or -1 if it is not found.
function ScanFile( const filename: String; ForString: String; caseSensitive: Boolean ): LongInt;
const
  BufferSize = $8001;  // $8000 bytes of data plus one byte for the terminating #0
var
  pBuf, pEnd, pScan, pPos: PChar;
  filesize: LongInt;
  bytesRemaining: LongInt;
  bytesToRead: Word;
  F: File;
  SearchFor: PChar;
  oldMode: Word;
begin
  Result := -1;
  if (Length(ForString) = 0) or (Length(filename) = 0) then
    Exit;
  SearchFor := nil;
  pBuf := nil;
  AssignFile( F, filename );
  oldMode := FileMode;
  FileMode := 0;  // open read-only
  Reset( F, 1 );
  FileMode := oldMode;
  try
    SearchFor := StrAlloc( Length(ForString) + 1 );
    StrPCopy( SearchFor, ForString );
    if not caseSensitive then
      AnsiUpper( SearchFor );
    GetMem( pBuf, BufferSize );
    filesize := System.FileSize( F );
    bytesRemaining := filesize;
    pPos := nil;
    while bytesRemaining > 0 do
    begin
      // Read one byte less than the buffer so there is room for the terminating #0.
      if bytesRemaining >= BufferSize then
        bytesToRead := Pred( BufferSize )
      else
        bytesToRead := bytesRemaining;
      BlockRead( F, pBuf^, bytesToRead );
      pEnd := @pBuf[ bytesToRead ];
      pEnd^ := #0;
      pScan := pBuf;
      // The chunk may contain embedded #0 bytes, so scan it one zero-terminated piece at a time.
      while pScan < pEnd do
      begin
        if not caseSensitive then
          AnsiUpper( pScan );
        pPos := StrPos( pScan, SearchFor );
        if pPos <> nil then
        begin
          // Offset of this chunk in the file plus the position inside the chunk.
          Result := filesize - bytesRemaining + LongInt( pPos ) - LongInt( pBuf );
          Break;
        end;
        pScan := StrEnd( pScan );
        Inc( pScan );
      end;
      if pPos <> nil then
        Break;
      bytesRemaining := bytesRemaining - bytesToRead;
      if bytesRemaining > 0 then
      begin
        // Step back so a match that straddles two chunks is not missed.
        Seek( F, FilePos(F) - Length(ForString) );
        bytesRemaining := bytesRemaining + Length(ForString);
      end;
    end;
  finally
    CloseFile( F );
    if SearchFor <> nil then
      StrDispose( SearchFor );
    // Free the buffer here so it is released even when an exception is raised.
    if pBuf <> nil then
      FreeMem( pBuf, BufferSize );
  end;
end;
 
Custom Title Activated
Loyal Member
Joined
May 26, 2007
Messages
5,545
Reaction score
1,314
Need a better algorithm to find the strings... :(:
Yes... I never found anything completely accurate, to the extent that no manual intervention is necessary.

I did wonder if searching for references in the code would provide better "confidence" in prediction.
 
Initiate Mage
Joined
Apr 6, 2013
Messages
4
Reaction score
0
Need a better algorithm to find the strings... :(:
Since I know you speak Portuguese, I'll use it.

[Translated from Portuguese] Finding strings is very easy. Do you want to do this in virtual memory or in physical memory (the file on disk)?
If physical, you can use CreateFileA and ReadFile. In the .text section (where the text is most of the time) or in the whole executable, you just use *(CHAR*) or LPCSTR; both keep taking bytes until they reach the zero byte (every ANSI string ends with a zero byte). In virtual memory it is the same thing; the difference is that with the physical file you use the file size as the reference for the end of memory.
 
Moderator
Staff member
Moderator
Joined
Feb 22, 2008
Messages
2,404
Reaction score
723
Errrr... that's not the problem. The problem is identifying only the strings we want to translate and removing all the undesired garbage.

After all, I don't want to build a program that makes translation harder; in fact it's the opposite =p

n0HX4eB - Automatic Translator - RaGEZONE Forums


I found a way to filter some strings. As you can see, only a few strings are left (considering everything the algorithm can find in the exe), but there are still some "??????" strings. I assume they show up this way because they are remnants of Korean text inside QF 1873 (I'm using QF 1873 for basic tests).

I modified the algorithm so that, instead of returning ready-made strings, it collects byte arrays, and I convert them to strings after I have all the values.

This way I have more flexibility over the strings. Now, one question:
Is it possible to guess a string's codepage from the text? How could I replace those ???? with the corresponding Korean character, or whatever character it might be?

I am using ASCIIEncoding() to convert from byte[] to string, but it doesn't seem to work for everything. Which codepage are the Korean strings inside the game in?
 

Initiate Mage
Joined
Apr 6, 2013
Messages
4
Reaction score
0
Errrr... that's not the problem. The problem is identifying only the strings we want to translate and removing all the undesired garbage.

After all, I don't want to build a program that makes translation harder; in fact it's the opposite =p
LPCSTR | CHAR reads just one string, for example:
Offset Ansi
001 "Juan is gay" (terminated by a 00 byte)
013 "Juan isn't Gay."
"*(CHAR*) (001)" receives just the ANSI string "Juan is gay". In other words, with a loop you can pick out only the strings you want.
If you want, I can show you some source code, but in private, because I hate writing in English :x
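The read-until-zero-byte behaviour described above, expressed as a small Python sketch. The helper `read_cstring` is a hypothetical name, and the buffer in the usage note is made-up sample data laid out at the same offsets (1 and 13) as the example.

```python
def read_cstring(buf: bytes, offset: int) -> bytes:
    """Return the bytes from offset up to (not including) the next NUL byte."""
    end = buf.find(b"\x00", offset)
    # No terminator found: the string runs to the end of the buffer.
    return buf[offset:] if end == -1 else buf[offset:end]
```

For instance, with `buf = b"\x00Hello world\x00Goodbye all.\x00"`, `read_cstring(buf, 1)` stops at the first zero byte and never reaches the second string, which starts at offset 13.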
 
Moderator
Staff member
Moderator
Joined
Feb 22, 2008
Messages
2,404
Reaction score
723
I'm not searching for a particular string; it is more like teaching the program how to identify the strings we want to translate... It's a little bit abstract, so I understand if you find it a little incomprehensible...

Take another look at my post and see if you can understand what I am talking about.

Simple: it gets all the text inside the executable and iterates through every string, identifying which ones are crap and which are not.
 
Joined
Jan 19, 2007
Messages
400
Reaction score
38
Sheen, maybe I see something... In your software you find all the mixed Korean strings, but you miss a lot of them. That happens because there is some problem in your string engine. I don't know how you find them, but maybe try searching by hex character...
Didn't Bob make something similar, to find all strings in hex? I remember something like that. I will search, and if I find it I will post it here.
Maybe this helps:
http://forum.ragezone.com/f740/beta-technical-demonstration-translation-without-688118/
 
Custom Title Activated
Loyal Member
Joined
May 26, 2007
Messages
5,545
Reaction score
1,314
The Korean client is written for codepage 949 (Korean, a superset of EUC-KR).

You could (maybe) use the MultiByteToWideChar() I mentioned to Vormav, when he was trying to make his h/w accelerated fonts work in different languages, to create a translation from CP949 to UTF16LE and count the number of "default characters". That's the placeholder character that is put in place of any code sequence which isn't understood by CP949: the ones which usually display as question marks or boxes (depending on how the placeholder is shaped in the current font).
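The same "decode and count the placeholders" idea can be tried outside the Win32 API. The sketch below uses Python's built-in `cp949` codec rather than MultiByteToWideChar(), so it is a stand-in for the approach and not the call itself; `cp949_score` is a hypothetical name.

```python
def cp949_score(raw: bytes) -> int:
    """Number of byte sequences CP949 could not decode (0 means a clean decode)."""
    # errors="replace" substitutes U+FFFD for every undecodable sequence.
    text = raw.decode("cp949", errors="replace")
    return text.count("\ufffd")
```

A candidate that scores 0 is at least valid CP949; a high score relative to its length suggests the bytes are code or binary data rather than Korean text.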

This may help:-


@microamazing: Yes, I did release it as an aid to translation. It pulls out everything which could possibly be a string and lists its memory offset, so you can manually look for uses of those offsets in the code... or use something like the International DLL I wrote to patch API calls, replacing those addresses with alternate ones containing your translation. ^_^

Sheen is looking to take that to the next level by completely automating the process, but patching internally to the exe, and not with a hooking DLL. ;)
 
Custom Title Activated
Loyal Member
Joined
Jan 28, 2009
Messages
1,320
Reaction score
616
UTF16BE :junglejane:


Yes, you should use ICONV to convert between code pages.
To detect language and cp you can use ICU... but I got poor results on kPT

There is even a .NET version, but I can't tell if it's any good.


How about checking whether a string is referenced and showing a "confidence" next to it?
e.g. if the address of "bla bla bla" is PUSHed or MOVed in the exe, then that adds +1 confidence for every push/mov.
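A rough Python sketch of that confidence count, assuming 32-bit x86 code and looking only at two encodings: PUSH imm32 (opcode 0x68) and MOV r32, imm32 (opcodes 0xB8 to 0xBF). The name `reference_confidence` is hypothetical, other addressing forms are ignored, and this is byte-pattern matching rather than real disassembly, so stray data can inflate the score.

```python
import struct

# PUSH imm32 is 0x68; MOV r32, imm32 is 0xB8..0xBF (one opcode per register).
IMM32_OPCODES = {0x68} | set(range(0xB8, 0xC0))

def reference_confidence(code: bytes, string_va: int) -> int:
    """Count places where string_va directly follows a PUSH/MOV imm32 opcode."""
    needle = struct.pack("<I", string_va)   # addresses are stored little-endian
    score = 0
    pos = code.find(needle, 1)              # start at 1 so code[pos - 1] exists
    while pos != -1:
        if code[pos - 1] in IMM32_OPCODES:
            score += 1
        pos = code.find(needle, pos + 1)
    return score
```

A string with a score of zero is not necessarily junk, since it may be reached through a table or a computed pointer, which is one reason a manual ignore option still makes sense.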

This way you can start developing some 'learning' mechanism. You could also add an option to down-vote strings and send the results to a database.


And as bob said, results will never hit 100%, so add a REMOVE / IGNORE btn :D
e.g. WP120 is a string... but it should not be translated ^^
Well, you are a smart cookie, you will find your way.
 
Moderator
Staff member
Moderator
Joined
Feb 22, 2008
Messages
2,404
Reaction score
723
That's exactly what I was looking for, thx bob and vormav. I think I can start making it more accurate now.

--edit

Built the find-references function, but found a problem: I have an array with 191,069 elements (all of them byte[], strings from the game.exe), and iterating through every element to check whether its offset appears anywhere (in all the bytes of game.exe), 191,069 times, is taking too long. I need to find a workaround... any options?



I could read in chunks, but then, to get the references, I'd still need to search the entire buffer =(
 
Custom Title Activated
Loyal Member
Joined
Jan 28, 2009
Messages
1,320
Reaction score
616
Use vectors: make a vector containing everything that was MOVed or PUSHed and compare it with a vector that contains the strings and string addresses.
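One way to avoid searching the whole exe once per string is to walk the exe a single time and look each 4-byte value up in a hash set of known string addresses. The Python sketch below illustrates that under assumptions: `count_all_references` is a hypothetical name, addresses are 32-bit little-endian, and every byte position is tested because x86 instructions are not aligned.

```python
import struct
from collections import Counter

def count_all_references(code: bytes, string_vas) -> Counter:
    """One pass over code: tally every dword that equals a known string address."""
    wanted = set(string_vas)                # O(1) membership test per position
    hits = Counter()
    for pos in range(len(code) - 3):
        value = struct.unpack_from("<I", code, pos)[0]
        if value in wanted:
            hits[value] += 1
    return hits
```

The cost becomes roughly one lookup per byte of the exe, regardless of whether there are 130k or 191k strings, instead of one full scan per string.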
 
Moderator
Staff member
Moderator
Joined
Feb 22, 2008
Messages
2,404
Reaction score
723
Already done, but the search is taking too long. I managed to shrink my biggest array from 191k to 130k, but it is still taking too long. I PMed Negata to see if he can help me out (he was helping me in coders paradise), let's see what happens =]

I'm pretty excited about this program =]
 
Custom Title Activated
Loyal Member
Joined
May 26, 2007
Messages
5,545
Reaction score
1,314
PUSH comes in several operand sizes, if memory serves, but I would be looking at the C string API calls rather than PUSHes. This is how the International DLL works: if a memory buffer is passed to strcmp or strcmpi, then it's probably some form of string. The problem with doing that offline is that what is passed is often an offset to a temporary variable, which will hold the offset to one of a number of static string variables at runtime. That's not so easy to trace.

However, there are a couple of key routines in the client which call TextOutA() or similar, and I'm sure they will be the ones patched by Vormav's bitmap font routines. If you find that API call, and then find anything which is PUSHed to the stack before CALLing the routine it's in, then you can count those memory addresses as confident ++. ^_^
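A heuristic version of that last step could look like the Python sketch below. Everything here is an assumption for illustration: `pushes_before_calls` is a hypothetical name, `target_va` is the address of the text-drawing routine found by hand, `base_va` is the virtual address of the first byte of `code`, and the look-back is plain byte matching within a fixed window rather than real disassembly, so a 0x68 byte inside another instruction will be misread as a PUSH.

```python
import struct

def pushes_before_calls(code: bytes, base_va: int, target_va: int, window: int = 32):
    """Collect imm32 values PUSHed shortly before each CALL rel32 to target_va."""
    found = []
    for pos in range(len(code) - 4):
        if code[pos] != 0xE8:                       # CALL rel32
            continue
        rel = struct.unpack_from("<i", code, pos + 1)[0]
        # rel32 is relative to the address of the instruction after the CALL.
        if (base_va + pos + 5 + rel) & 0xFFFFFFFF != target_va:
            continue
        # Look back a few bytes for PUSH imm32 (0x68 followed by 4 bytes).
        for p in range(max(0, pos - window), pos - 4):
            if code[p] == 0x68:
                found.append(struct.unpack_from("<I", code, p + 1)[0])
    return found
```

Each returned value that also matches the address of an extracted string would earn that string an extra confidence point.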
 