Beta Technical Demonstration - Translation without Hexing the EXE
The attached links contain 1 executable (derived from Butchered 2b) and 3 DLLs, plus a folder containing 009.lng (language code 009), an example text file which, with the help of the DLLs, begins to translate Butchered into English. The latest release also includes a script to switch between the Release and Logging versions of the translation DLL.
If you are Brazilian, Spanish or German, your default system language code will not be 009, and you should create a new file for your language. This version of the EXE should only accept a language file which matches your system default; otherwise it will revert to the normal Korean text. Oh! And it only takes notice of the primary language, so it doesn't matter if it's British English, US or Canadian English, Australian English or English International... they are all 009. The same should apply to French French, Canadian French and South African French, or Portuguese Portuguese and Brazilian Portuguese, etc.
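To make the "primary language only" rule concrete, here is a minimal sketch in C. It mirrors what the Win32 PRIMARYLANGID macro does (the primary language sits in the low 10 bits of a LANGID). The function names are made up for illustration, and the decimal, zero-padded file name is an assumption: the post only shows the English case, 009.lng.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same arithmetic as the Win32 PRIMARYLANGID macro: the low 10 bits of a
   LANGID are the primary language, the bits above are the sublanguage. */
uint16_t primary_language(uint16_t langid)
{
    return (uint16_t)(langid & 0x3FFu);
}

/* Hypothetical helper: true when two LANGIDs would share one .lng file. */
int same_lng_file(uint16_t a, uint16_t b)
{
    return primary_language(a) == primary_language(b);
}

/* Hypothetical helper: builds a name such as "009.lng". Decimal with three
   zero-padded digits is an ASSUMPTION; the thread does not say. */
void lng_file_name(uint16_t langid, char *buf, size_t size)
{
    assert(buf != NULL && size >= 9);  /* "1023.lng" plus terminator */
    snprintf(buf, size, "%03u.lng", (unsigned)primary_language(langid));
}
```

With this, 0x0409 (US English) and 0x0809 (British English) both reduce to 9, and 0x0416 (Brazilian Portuguese) and 0x0816 (Portuguese Portuguese) both reduce to 0x16, so each pair would load the same file.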
Extract all the files into a working client setup. Preferably a >= 197x one, but that's up to you.
Originally 2 of the three DLLs had an extra extension, either '.rel' or '.dev'. You need to delete the extension from the end of one or the other of these.
In the current release the files are "IntStr[Release].dll" and "IntStr[Logging].dll", and the script "SetLogging.cmd" should allow you to choose which one to make active. (There are some reports that it doesn't present the question. I can't imagine why, as the syntax is valid from DOS 5 [where the extension would have to be .bat not .cmd] to Win7.)
The '.dev' or [Logging] version will slow the game down considerably, as any string-based API is not only patched, but the string and its location in memory are also logged to a file called "StringDbg.txt"... this is to help you locate the strings you want to patch into your own language.
The original log was really only Dev friendly, but the latest release produces a file which should import into spreadsheets easily and therefore allow you to remove duplicate lines and identify string address offsets easily.
Import settings in OpenOffice.org Calc:-
I'm not using OOo because I don't have to pay for it; read why I am in the spoiler. The last version of MS Excel I *really* used (2003) had something very similar, and so do Gnumeric and Lotus 123, so I'm sure you can figure it out for whatever you use.
And how the file looks directly after importing into a spreadsheet.
I've only translated a few strings in the 009.lng file, as I'm more concerned with getting the libraries working flawlessly and then getting an executable which uses them more cleanly. However, I don't expect the format of the .lng files to change much, so this is an opportunity for you guys to start building up and sharing some .lng files which you would like to have implemented.
Now... if you are using German or Spanish, or especially if you are trying for Chinese or Japanese, you will want to patch the code page passed to CreateFont() in the executable before you do anything else. Ultimately I hope to implement a "fix" in the executable which will pass CreateFont() the default system codepage, but for now you will have to do this manually yourself. (I wrote a guide to doing that a long, long time ago.)
Also, if you look at the log file you will see that this system not only shows where each string is located in memory when it is used, but also shows up many many strings which are never displayed. Firstly all the strings in the configuration files, 'hotuk.ini' and "ptreg.rgx" and such, but also the strings which represent the resource files, meshes, animations, textures and sounds. You may well want to customise these, and (with care) you can do so using this method, but if you do start working along those lines, please keep those changes clearly defined at one end or the other of your .lng file, and cut those changes before sharing it... that is likely to be a highly personal preference.
Following the discussions here, I will probably implement reading from a Global.lng file as well as your language file, so changes for your server (regardless of the user's language) can be made there.
Smart folks might like to comment on "how this system works" and continue to "suggest" alternatives... I'm still open to that, and working hard on trying out different ways of achieving the same thing. But as I say, I have not made any major change to the format of the .lng files in a couple of months, [strike]and feel that it is pretty much as standard as it can be[/strike]. I'm likely to implement fully sectioned Windows .ini or .inf style of .lng files in the near future to allow for colour and font face / style changes in displayed text, as well as altering locations and alignment.
I've implemented it as 2 DLLs (three files in the end, since you have a choice of the slow developer-logging IntStr.dll or the fast(er) release IntStr.dll). I wanted a high level language for IntStr.dll so I could rapidly develop reading the .lng file, parsing a list of addresses looking for a match, and passing back a pointer to an alternative string. 'International.dll', on the other hand, is written in pure assembler. That is partly because, by reading the return address off the stack, it can see what code is looking for a certain string, and can modify the API call even more depending on where the API was called from; and partly because only in assembler do I have the ability to take off just the arguments I'm interested in altering and then push them back on the stack in the correct order before directly calling the API.
Ultimately, I hope to make them one DLL all written in C(++). Right now, IntStr.dll (both versions from the same source under conditional compilation) is written in freeBASIC, and that will certainly change to Visual C++ 6 - 9. (already started)
As the example 009.lng file states, don't worry about alignment of text and text overlap too much. I will fix that in International.dll as time goes on. Also, please don't worry about the messy way I'm importing these DLLs into the game... that will be changed once things settle down a bit. It's just a RAD shortcut, but it has meant changing the OEP of the main executable, and disabling the 'write' protection on the .rdata section of the PE. (I'm sure you know me well enough to know I won't allow that to remain the case for long.) I will ultimately link the DLL and its APIs into a new Import table which will completely replace the one in .rdata.
Other than that, all feedback is welcome. Once the port to C++ is complete (which will only happen once the complete feature set and .lng file format is set in stone) I will open the source to you so you can implement any encryption, signing or compression you think is important to you.
So far, although I can see a couple of "risks" with this method, (user skinning etc. which they do anyway) I can also see it saving an awful lot of people a heck of a lot of time and effort.
New links for latest developments:-
MEGAUpload
DepositFiles
Hotfile
zShare
Uploading
or
All these links thanks to Multiupload.
File scanned by Virus Total for your safety. Please confirm the MD5 checksums etc. of the file you receive against those VT lists.
And here's the original links:-
SendSpace
MediaFire
Let me know if you need others. :wink:
--- EDIT ---
When the links died I attached a backup from 18th Dec 2010. But I attached to the wrong post. :doh:
Attachment 96768
Re: Beta Technical Demonstration - Translation without Hexing the EXE
very nice!
---------- Post added at 11:15 AM ---------- Previous post was at 10:15 AM ----------
One question. How can I translate the strings that appear together with the Tribe, Level and Charname on the login screen?
String @6053916 = '%s' not replaced.
String @1309792 = '템스크론' not replaced.
I believe this one is for the Class, but I don't know how to proceed.
---------- Post added at 12:05 PM ---------- Previous post was at 11:15 AM ----------
String @6091092 = '%d' not replaced.
String @1309792 = '73' not replaced.
String @53923952 = 'aaaaaaaaa' not replaced.
String @6053916 = '%s' not replaced.
String @1309792 = '템스크론' not replaced.
String @6053916 = '%s' not replaced.
String @1309792 = '파이터' not replaced.
aaaaaaaaa = my char
73 - his level
Re: Beta Technical Demonstration - Translation without Hexing the EXE
Thread teleported from Priston Tale Releases to main page O.o
Are you going to develop this? This is actually a cool idea, from translation through the item table and ending with a PT GUI/SKIN language selector :)
Re: Beta Technical Demonstration - Translation without Hexing the EXE
Yes, I'd like to keep this going, it's much easier than hexing. And faster. I think bob should develop some kind of encryption; it could be XOR key based. Otherwise, players could easily mess with the texts.
Re: Beta Technical Demonstration - Translation without Hexing the EXE
Thanks for the recent input guys.
It's not in release as I'm working in ways which are quick to develop and easy to change, but not good for a release... I wanted people's input, especially people who want Br or Vn type translations... or even Fr, De, Es, No etc. which don't normally come to PT.
If (l)users want to skin their PT so that it's full of rude words or all in l33t5P34k, I don't think they gain any advantage in that from the server's point of view.
If item and level tables were handled as well, there would have to be some method of "signing" the files with a private key, that much is for sure.
Lelejau asked earlier about '%s'. There is no point in replacing that... I've said before that it is stupid code, where they do wsprintf(buffer, "%s", szptrRealString); TextOut(buffer); when they should just TextOut(szptrRealString);
The problem with 1309792 = '템스크론' and 1309792 = '파이터' is that they appear at the same location... the fact is they are transferred there with a memcpy() instead of a strcpy(). (The difference is a fixed-length "byte" copy versus a null-terminated "character" copy.)
I'm not sure if I should patch memcpy() as well, or just replace the memcpy() instructions in the game with strcpy() calls.
memcpy() doesn't check for null characters (end of string) and such, so it's a bit faster. strcpy() copies up to the terminating null, so it can cope with strings of any length.
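A small sketch in C of the difference being described, in case it helps translators follow why those strings slip past: a hook placed only on strcpy() would never see a transfer made with memcpy(). The two wrapper names are invented for this illustration and are not part of the DLLs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the way strcpy() does: everything up to and including the first
   null. Returns the number of bytes written. */
size_t copy_as_string(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    assert(len < dst_size);   /* strcpy() itself performs no such check */
    strcpy(dst, src);
    return len + 1;
}

/* Copies the way memcpy() does: exactly n bytes, nulls are just data. */
size_t copy_as_bytes(char *dst, size_t dst_size, const char *src, size_t n)
{
    assert(n <= dst_size);
    memcpy(dst, src, n);
    return n;
}
```

Both calls can target the same scratch buffer one after the other, which is consistent with the log showing two different strings at one address.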
I think I should find a way to patch the TextOut() APIs to allow custom text location adjustments for different languages (centre stuff up, align it and so on).
Re: Beta Technical Demonstration - Translation without Hexing the EXE
So, you think you can handle it bob?
Re: Beta Technical Demonstration - Translation without Hexing the EXE
I can... I could use some feedback from people, such as you've given here.
Different people will want different translations, and it takes a lot of in-game testing to find how all the strings appear in the debug log (or, don't appear in any unique way).
If there are too many of them, patching memcpy() would probably be good; if there are only a few (like the login screen), I would modify the exe to use strcpy() instead of memcpy(). But I need play-testers and translators to give a quantitative assessment.
I also needed to know I wasn't just doing this for myself. It's not worth the effort if it's only me that's ever going to use it. I don't run an open server, I just develop. So there's no point me developing something nobody will use.
Once we have translation files for different languages, and can prove the API patches work, I can replace the import table completely and lock down the memory sections I unsecured to make development quicker. Replaced import tables means I don't need code to modify API vectors, so no slow down, and no need to remove Write Protection in the import table, so no risk of overwriting constants etc.
When I'm happy I'm not prototyping any more, but producing a practical final release implementation, the slowdown that Vormav is concerned about should be minimal... and I can already see speed gains that can be achieved at the same time, as well as memory savings to counter the extra load placed by the DLL(s) and their temporary data. (You don't need to keep the Korean strings to fall back on, etc.)
In short, once I have enough test data from people developing different languages, and can be sure that the solution doesn't need any more tweaking to fit different scenarios, I can re-implement the design as if it were always a part of the client.
I think TextOut() should be patched too, with offsets against the original location... that will help the Chat Overlap problem in different languages. You can also use TextOut() with centred text for "Connecting to server", "Connection failed" etc. but I'd have to store another, fast access lookup table for such changes and decide how to store those details. + / - X and Y in pixels... probably, applying different justification? Not sure.
Something like a [Positioning] section with Address = x, y, j where j = "l", "r" or "c" maybe? Or maybe AddressX = int, AddressY = int, AddressJ = char. That would make a bigger translation file, which would require less logic in the dll to store the info in an array or linked list.
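As a rough sketch of the first option (Address = x, y, j on one line), this is how little parsing it would need in C. The struct, the function name and the exact spacing rules are assumptions for illustration only; the format is still undecided in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record for one "[Positioning]" entry: Address = x, y, j */
struct text_offset {
    unsigned long address;  /* string address as it appears in the log */
    int dx, dy;             /* + / - pixels from the original location */
    char justify;           /* 'l', 'r' or 'c' */
};

/* Returns 1 and fills *out on success, 0 if the line does not match the
   assumed format. *out is left untouched on failure. */
int parse_positioning(const char *line, struct text_offset *out)
{
    unsigned long address;
    int dx, dy;
    char j;

    assert(line != NULL && out != NULL);
    if (sscanf(line, " %lu = %d , %d , %c", &address, &dx, &dy, &j) != 4)
        return 0;
    if (j != 'l' && j != 'r' && j != 'c')
        return 0;
    out->address = address;
    out->dx = dx;
    out->dy = dy;
    out->justify = j;
    return 1;
}
```

For example, the line "6053916 = 10, -4, c" would parse to address 6053916, ten pixels right, four pixels up, centred. The three-key variant (AddressX, AddressY, AddressJ) would trade this one sscanf() for three simpler lookups.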
So there are still design considerations... and actual input from admins and other devs would be much appreciated on such issues.
Re: Beta Technical Demonstration - Translation without Hexing the EXE
I guess I am doing the same thing you are doing right now, but in a different (much simpler) way.
Would it not be better to add ENCODING and FONT to the translation file:
ISO 8859-1 Western Europe
ISO 8859-8 Hebrew
etc.
and font Arial, Tahoma etc.
So the code can patch this, and the translation will be added with the right encoding and font.
Anyway, some stable method needs to be created so people will start porting this to their servers.
Re: Beta Technical Demonstration - Translation without Hexing the EXE
I could test it. I want to use it. I did something like that in Delphi, using an API hook on the TextOut() function. It was OK, but some messages, like the welcome message, came out very weird...
Something like this happened:
The right:
"Welcome to MyPT.
The bosstime is x:14."
With my dll:
"Welcome to MyPT (a weird character here) The bosstime is x:14.
The bosstime is x:14."
So I stopped with it. I want to try your method. What kind of feedback do you need? I already gave you one. If you can fix that, and release it here, so there, I can start again with it and give you more feedback.
Re: Beta Technical Demonstration - Translation without Hexing the EXE
@Vormav: Yes... absolutely. I had already considered that the language .ini files should have font encoding details, but I have demonstrated how to do that in Olly, so if people need a specific codepage to start making translations, they can do that.
I also missed replying to one of your earlier comments
Quote:
This is actually a cool idea, from translation through the item table and ending with a PT GUI/SKIN language selector :)
Yes, the strings for the location of TGA and BMP files come up in the debug version's log, and you can actually point different languages to different folders with those strings... it's not easy to do it for just 1 or 2 files though, and usually means you need to keep 2 folders full of every texture if you want to support 2 languages with an appropriately "skinned" GUI. XD
Most useful... it can point to different folders for .sin files. Very important for multi-lingual client.
@lelejau: That's great! Actually, the best feedback I could have is if people would try adding as many translations to their own .ini files as possible, and comment out ones (like Clan, Tribe, Skill etc) which are problematic.. and also comment what issues they present if you try them that way.
This is why I included commented out "bad examples" and commented "better" or "good" examples of how to use it. So you can see how you can comment things you'd like to achieve, or would like better implemented.
Also, list the Debug version's log lines for strings you'd like to replace but are finding problematic.
This gives a quantitative assessment of where issues lie, and gives me an idea of what I need to prioritise, and what bias I should place on optimisation... do I keep size low, or number of patches low, or intrusion low... all these things are a matter of "balance".
@all: I'll say again, I didn't put this in the release section because it's a development idea that I've proven can be worked, but haven't completed.
I'm still designing this development... the implementation isn't what I want to release, but it's the quickest way for me to test and prove theories.
If I was working as my Tutors taught me, I'd have a program design before I began... but we all know PT development can't happen like that. I can whip up prototypes like this, prove they work and refine them to the point where I have a design, and then implement it properly.
That proper implementation will be released with source code... I hope the source will be in C. This release has 1 Assembler DLL, and one Basic (freeBasic) DLL, but I have a folder which builds in MS Visual C++ with project files for versions 6 - 9 inclusive. (VC6, VC .Net 2003, VC 2005 & VC 2008)
This should allow each private server to implement extra securities, signing, compression, encryption etc. to their own tastes, while still having a common base which we will all understand when new server devs come here asking questions about it... as they do about QF's idea of putting the IP in the executable, and many of us hide, encrypt, move, compare or do other things to that IP for our own server.
I really hope this is clear. What I've done and released is for speed of development, and for feedback on the design. It does slow the client down (because it's a prototype full of beta debug info and dynamic patches that wouldn't be necessary in a fixed final release), it isn't secure to release to players (sometimes it leaks memory, and players could easily put ridiculously long strings where very short ones should be), and it's not at all complete or how a final version should be... it's just a proof of concept that needs your input to get a final design and proper release.
Thank you, especially to those who have posted their ideas and desires here.
Re: Beta Technical Demonstration - Translation without Hexing the EXE
Thank you for showing all you are capable of again. And good luck with this project. I think we're all anxious for it, at least I am.
I want to help as I can, with feedback etc.
Re: Beta Technical Demonstration - Translation without Hexing the EXE
This is a great project but I am concerned about SPEED.
With a few records it is OK, but what will happen when you patch 200+ records?
Would it not be faster if you put that file in memory and then "push" new translation offsets there?
One file that translates inventory, .sin files, messages and many other things would be awesome! GL!
Re: Beta Technical Demonstration - Translation without Hexing the EXE
But if you do that, the whole point of this project, translating without hexing the EXE, would be invalidated, don't you think?
I can't see any slowdown in my client, only when I use the dll to write the strings into the exe.
And BOB, another idea, to put in the txt, is this:
We can see a lot of the same string. Example:
[code]
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @6186476 = '|' not replaced.
String @53940384 = '' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @6186476 = '|' not replaced.
String @53940384 = '' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @53940384 = '' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @53940384 = '' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @53940384 = '' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @53940384 = '' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @53940384 = '' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @53940384 = '' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @53940384 = '' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @53940384 = '' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @53940384 = '' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @53940384 = '' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
String @1309984 = 'Ver: 1.99.7' not replaced.
String @53940384 = '' not replaced.
String @6102924 = '/도우미' not replaced.
String @6191640 = '%s%d.%d.%d' replaced with 'Ver: 1.99.7'
[/code]
It would be easier if it checks:
Code:
If Exists(string) Then
do not write in txt
Else
WriteTXT(string)
End If
:P
Re: Beta Technical Demonstration - Translation without Hexing the EXE
Quote:
Originally Posted by Vormav
This is a great project but I am concerned about SPEED.
With a few records it is OK, but what will happen when you patch 200+ records?
Would it not be faster if you put that file in memory and then "push" new translation offsets there?
One file that translates inventory, .sin files, messages and many other things would be awesome! GL!
Ahh... this is the failing of not showing the source code. What you suggest I do to speed it up is what it already does.
The number of patches (or active lines in the ini / lng file) is not a big problem. I'm iterating an array of DWords looking for a match... yes, the more DWords the longer it will take to get to the end of the array if a match isn't found, but it's not a big thing, compared to the time it takes to render the text on screen.
No code is patched, except the existing APIs overridden, and it's not like I'm even looking at the strings until I can find an address match.
Right now the array isn't sorted... so you can put the most frequent string patches near the front, and they will operate the fastest.
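Since the source isn't public yet, here is a minimal sketch in C of the kind of lookup described above: an unsorted array compared by address only, with the string never touched unless the address matches. The struct and function names are invented for illustration; the real IntStr.dll is written in freeBASIC and may be laid out quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One active line of a .lng file, as this sketch imagines it in memory. */
struct patch {
    uint32_t    address;      /* where the original string lives */
    const char *replacement;  /* text to hand back instead */
};

/* Unsorted linear scan: entries near the front are found soonest, and an
   address that is not listed costs a walk over the whole table. */
const char *find_replacement(const struct patch *table, size_t count,
                             uint32_t address)
{
    size_t i;

    assert(table != NULL || count == 0);
    for (i = 0; i < count; ++i) {
        if (table[i].address == address)
            return table[i].replacement;
    }
    return NULL;  /* no match: the caller keeps the original string */
}
```

Each iteration is a single integer compare, which is why the cost only starts to matter when the table gets very long and most lookups miss.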
Basic (the language) dynamic arrays are not the most memory-efficient of structures, but they are very easy to program... that's why speed is not great at this point. A C++ linked list should be faster, and I can optimise in assembler if the compiler produces crap code, which I can't easily do in Basic. I'd be better off writing 100% pure assembler if it weren't for the availability of C, which is often halfway between assembler and languages like Basic, Fortran, Perl, Ruby etc.
Other than that, I could also write a custom memory storage class in C++... break the list into ranges of patches so I have lots of small arrays / linked lists. If I use a pure static array, then everything is very fast, but it will also take up a heck of a lot of memory when the game is running and means I need a fixed maximum string length... so I want to avoid that.
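One possible reading of "lots of small arrays / linked lists" is sketched below in plain C rather than a C++ class: the address picks a range, and only that range's short list is walked. The bucket count and the choice of address bits are arbitrary assumptions, and every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUCKET_COUNT 256u  /* arbitrary for the sketch */

struct node {
    uint32_t     address;
    const char  *replacement;
    struct node *next;
};

struct patch_index {
    struct node *bucket[BUCKET_COUNT];  /* zero-initialise before use */
};

/* Assumed split: bits 8..15 of the address choose the list. */
static size_t bucket_of(uint32_t address)
{
    return (size_t)((address >> 8) % BUCKET_COUNT);
}

/* Pushes a caller-owned node onto the front of its range's list. */
void index_add(struct patch_index *ix, struct node *n)
{
    size_t b;

    assert(ix != NULL && n != NULL);
    b = bucket_of(n->address);
    n->next = ix->bucket[b];
    ix->bucket[b] = n;
}

/* Walks only the one list the address maps to. */
const char *index_find(const struct patch_index *ix, uint32_t address)
{
    const struct node *n;

    assert(ix != NULL);
    for (n = ix->bucket[bucket_of(address)]; n != NULL; n = n->next) {
        if (n->address == address)
            return n->replacement;
    }
    return NULL;
}
```

The nodes carry pointers to strings of any length, so this avoids the fixed maximum string length that a pure static array would force, at the cost of one pointer per entry.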
Needless to say... this method is faster than the MUI method Microsoft recommends now, or the "strings" resources that they used to recommend, by a long long way, and for many reasons.
Quote:
Originally Posted by lelejau
But if you do that, the whole point of this project, translating without hexing the EXE, would be invalidated, don't you think?
Nope... because the .lng file may have options added, but I don't plan on taking them away... even if small changes are made, it would still be only a matter of Search & Replace to update your .lng to a new format.
What is "in flux" is not the interception point, but how it's intercepted, what is intercepted (do I catch TextOut() and memcpy() too?) and extra features (like the font creation routines).
Quote:
Originally Posted by lelejau
I can't see any slowdown in my client, only when I use the dll to write the strings into the exe.
I agree... if I'm not using the debug logging version I see little slow down... but Vormav is correct that any string address not listed in the language file will be slowed by checking it against every possibility in the array, and when the array has more than 16K patches that will be quite some time difference. Less than 4K patches... not so much.
Quote:
Originally Posted by lelejau
And BOB, another idea, to put in the txt, is this:
We can see a lot of the same string. Example:
It would be easier if it checks:
Code:
If Exists(string) Then
do not write in txt
Else
WriteTXT(string)
End If
:P
It would make it much harder and slower, for the exact reason Vormav has pointed out. That will create a very large array that has to be checked, written and rechecked each time a string is processed. It actually takes less time to just write it out to the log.
If I do DebugPrint() instead of file write, that is faster (I know, I tried) but that requires that people have tools and know how to handle the, normally invisible, Debug stream from a program... and PT 1977 creates DebugPrint logs anyway, so it would also have those logs jumbled up in it.
In case you're not quite with me as to why your method would be so much slower. You are requiring that I keep a separate array of addresses that have already had their action logged, which should be parsed each time a potential log entry might be made. That array will get really big, really quickly, and will have to be checked for each log line.
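To put a number on that, here is a small C sketch of the "have I logged this already?" check, with a counter so the cost is visible. The names and the fixed cap are assumptions made for the example; nothing like this exists in the released DLLs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEEN 1024u  /* arbitrary cap for the sketch */

struct seen_log {
    uint32_t address[MAX_SEEN];
    size_t   count;
    size_t   comparisons;  /* running total, to make the cost visible */
};

/* Returns 1 if the address is new (and so would be written to the log),
   0 if it was already logged or the table is full. */
int should_log(struct seen_log *s, uint32_t address)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < s->count; ++i) {
        ++s->comparisons;
        if (s->address[i] == address)
            return 0;
    }
    if (s->count == MAX_SEEN)
        return 0;
    s->address[s->count++] = address;
    return 1;
}
```

Every address not yet seen is compared against everything recorded so far, so the work per log line grows with the number of distinct addresses, and it is paid on every string call in addition to the normal patch lookup.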
The simple solution to the long and repetitive log list is to load it in your spreadsheet, sort it (either on address or on string) and tell the spreadsheet to remove duplicate rows during analysis. I hope the problem isn't that the file becomes so big that it fills your hard drive... these devices usually have pretty high capacity and are reasonably inexpensive... still, it's just text, and for a little overhead, you can create the file with NTFS compression if that is an issue. (I suspect it isn't.) Repetitive text information compresses very, very well, even with the simplest of compression techniques. (Okay, RLE is probably too simple, but Squeeze, Deflate or LZH are all excellent... I believe NTFS compressed files use an LZ variant (LZNT1), which is ideal, and the reason it was implemented: to allow big logs on NT servers without taking up lots of disk space.)
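For anyone without a spreadsheet to hand, the same sort-then-drop-duplicates step can be done after the fact in a few lines of C. This is only a sketch of the offline clean-up, assuming the log lines are already loaded into an array of strings; the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C string pointers. */
static int compare_lines(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts the line pointers in place and squeezes out exact duplicates.
   Returns the number of unique lines left at the front of the array. */
size_t sort_unique(const char **lines, size_t count)
{
    size_t i, kept;

    assert(lines != NULL || count == 0);
    if (count == 0)
        return 0;
    qsort(lines, count, sizeof lines[0], compare_lines);
    kept = 1;
    for (i = 1; i < count; ++i) {
        if (strcmp(lines[i], lines[kept - 1]) != 0)
            lines[kept++] = lines[i];
    }
    return kept;
}
```

Because this runs once, outside the game, the sorting cost never touches the client, which is the point of leaving the logger itself dumb and fast.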
Re: Beta Technical Demonstration - Translation without Hexing the EXE
Quote:
Originally Posted by bobsobol
It would make it much harder and slower, for the exact reason Vormav has pointed out. That will create a very large array that has to be checked, written and rechecked each time a string is processed. It actually takes less time to just write it out to the log.
Yes, I didn't think about that. But no problem. :) Bob, when do you think you can release a second version to us? I really want to try out and give you more feedback.