[Help] Working together to solve this vital channel connect bug

  1. #1
    Akaruz - The Legend [hidden]
    Join Date: Jun 2006
    Location: Classified
    Posts: 1,120

    [Help] Working together to solve this vital channel connect bug

    before reading this, a note: those who have nothing useful to say had better keep quiet and not say anything at all.

    here we go :

    i figured that with everyone else's input, we might arrive at a logical solution to this issue. this is a very critical bug: for most of us, fixing it means a consistent, headache-free server vs. giving up in the long run.

    unlike other bugs which we can avoid, this is one that everyone has to get past. other servers, which we would never expect to share info in a million years, have fixed this problem. that is why we have to pool our ideas and help each other solve it.

    to get a clean log i rebuilt the server from scratch. when i checked the worldserver log, even before my first login, the error had already appeared. i am not 100% sure whether this is related to the channel connect bug, or whether it leaves the server in a bad state that eventually causes the failure on channel connection.

    this is the only error in my logs :

    [Sat Jul 26 2008 13:07:27.853215 3086948032]: try listening 38111 port
    [Sat Jul 26 2008 13:07:44.436482 3086948032]: sock 5 ->LinkHandler close
    [Sat Jul 26 2008 13:07:49.441898 3086948032]: [##ERROR##] remoteAddr(127.0.0.1), connectRetryNum_(9,0), serverAddr(127.0.0.1:38170)
    [Sat Jul 26 2008 13:07:49.442713 3086948032]: try connect 127.0.0.1:38170, Sync
    [Sat Jul 26 2008 13:07:49.457828 3086948032]: sock 5 ->LinkHandler Open
    [Sat Jul 26 2008 13:07:49.458712 3086948032]: SendIPSConnect2Svr (161:24:01:00)
    [Sat Jul 26 2008 13:07:49.970767 27077536]: ChannelType changed(0 -> 1)
    [Sat Jul 26 2008 13:22:27.838209 3086948032]: sock 5 ->LinkHandler close
    [Sat Jul 26 2008 13:22:32.839633 3086948032]: [##ERROR##] remoteAddr(127.0.0.1), connectRetryNum_(9,0), serverAddr(127.0.0.1:38170)
    [Sat Jul 26 2008 13:22:32.840713 3086948032]: try connect 127.0.0.1:38170, Sync


    now, i'm sure that everyone else who got their server working has this error in their logs. is this what causes the channel connect bug? if so, how can we solve it, and what do you make of this error msg?
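    To make logs from different servers easier to compare, here is a minimal sketch (not part of the original post) that splits each line into timestamp, thread id and message, and keeps only the [##ERROR##] entries. The line layout is inferred from the excerpt above; the function names are made up for illustration.

```python
import re
from datetime import datetime

# Layout seen in the logs above: [<date time.micros> <thread id>]: <message>
LINE_RE = re.compile(
    r"^\[(?P<ts>\w{3} \w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<tid>\d+)\]: (?P<msg>.*)$"
)

def parse_line(line):
    """Split one log line into (timestamp, thread id, message), or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%a %b %d %Y %H:%M:%S.%f")
    return ts, int(m.group("tid")), m.group("msg")

def error_lines(lines):
    """Keep only the entries whose message carries the [##ERROR##] tag."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and "[##ERROR##]" in p[2]]

# Three lines copied from the WorldSvr log above.
SAMPLE = [
    "[Sat Jul 26 2008 13:07:44.436482 3086948032]: sock 5 ->LinkHandler close",
    "[Sat Jul 26 2008 13:07:49.441898 3086948032]: [##ERROR##] remoteAddr(127.0.0.1), "
    "connectRetryNum_(9,0), serverAddr(127.0.0.1:38170)",
    "[Sat Jul 26 2008 13:07:49.442713 3086948032]: try connect 127.0.0.1:38170, Sync",
]

for ts, tid, msg in error_lines(SAMPLE):
    print(ts.isoformat(), tid, msg)
```

    Note that the serverAddr in the error is port 38170, which is the GlobalMgrSvr port in the port list quoted from the manual later in this thread, so this error looks like WorldSvr losing and retrying its link to GlobalMgrSvr.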

    solutions :
    "keyserver" "authserver" "vcode"

    these were suggested, but at the moment we have no guide that explains the link between the error msg, what's causing it, and how these files could fix the problem, nor do we have a guide on how to set up these files.

    I'm sure there are a few people reading this who have ideas.

    so speak up here, and let's try to solve this.


  2. #2
    Account Upgraded | Title Enabled! AMD79
    Join Date: Mar 2006
    Location: Sabah, Malaysia
    Posts: 290

    Re: [Help] Working together to solve this vital channel connect bug

    I think we need to hex GlobalMgrSvr to remove the auth process... because from my point of view this is the file that always fails to connect to an unknown server...

  3. #3
    Akaruz - The Legend [hidden]
    Join Date: Jun 2006
    Location: Classified
    Posts: 1,120

    Re: [Help] Working together to solve this vital channel connect bug

    ok let's review the basic functions of the various bins as presented in the manual, for reference:

    Linux application servers:
    GlobalDBAgent - communication between the servers and the ACCOUNT DB;
    DBAgent - communication between the servers and the GAME DB;
    GlobalMgrSvr - links GDBA, LoginSvr, WorldSvr (and Server Monitor); handles communication between the servers;
    LoginSvr - communication with the client; login authentication server;
    WorldSvr - game server;
    ChatNode - chat server.

    Port information:
    GlobalDBAgent    38180
    DBAgent          38181-38189
    GlobalMgrSvr     38170
    LoginSvr         38101-38109
    WorldSvr         38111-38119
    ChatNode         38121
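    A quick way to see which of these ports actually have something listening is to try a TCP connect to each one. The sketch below is only an illustration, assuming everything runs on 127.0.0.1 as in the logs in this thread; the port ranges are taken from the list above, the function names are made up.

```python
import socket

# Port ranges as given in the manual excerpt above.
PORTS = {
    "GlobalDBAgent": range(38180, 38181),
    "DBAgent": range(38181, 38190),
    "GlobalMgrSvr": range(38170, 38171),
    "LoginSvr": range(38101, 38110),
    "WorldSvr": range(38111, 38120),
    "ChatNode": range(38121, 38122),
}

def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connect to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False

def listening_ports(host="127.0.0.1"):
    """Map each server name to the ports in its range that accept a connect."""
    return {
        name: [p for p in ports if is_listening(host, p)]
        for name, ports in PORTS.items()
    }

# Usage on the server box:
#   for name, ports in listening_ports().items():
#       print(f"{name:<14} {ports or 'nothing listening'}")
```

    A successful connect only shows that a process has the port open. It says nothing about whether the auth handshake behind it works.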

  4. #4
    The Dinosaur chumpywumpy
    Join Date: Jun 2008
    Location: /f451/
    Posts: 5,127

    Re: [Help] Working together to solve this vital channel connect bug

    My theory is that we are missing a process and a database. I found another DB called CabalLoginSvr (trouble is i can't remember where i got it).

    This script will create it (2005 format as that's all i have on this machine):
    Code:
    USE [LoginSvr]
    GO
    /****** Object:  Table [dbo].[CabalLoginSvr]    Script Date: 07/26/2008 12:58:05 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    SET ANSI_PADDING ON
    GO
    CREATE TABLE [dbo].[CabalLoginSvr](
    	[CabalLoginSvrid] [int] NOT NULL,
    	[CabalLoginSvruser] [varchar](50) NOT NULL,
    	[account] [char](10) NOT NULL,
    	[u://account//CabalLoginSvr;192.168.0.1] [char](10) NOT NULL,
     CONSTRAINT [PK_CabalLoginSvr] PRIMARY KEY CLUSTERED 
    (
    	[CabalLoginSvrid] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    
    GO
    SET ANSI_PADDING OFF
    Not sure if this helps any as i have not really done anything with it yet.

    EDIT: The other thing i have found is a "GlobalDBAgent" in some releases including the /usr/bin folder at http://www.arcadianpride.com/CabalServer/Home/cabal/. We all use links to point GlobalDBAgent to DBAgent but this has been posted in a few places which makes me curious.

    Here are the sizes and CRCs of the files i have. The -1 files are the original ep1 files taken from cabal.rpm (or 1.0-2 if you look at its actual version), the -2 files are the ones we copy over the top (the ep2 binaries), and the -eh/-eh2 files are this extra GlobalDBAgent i have, which seems to be a copy of the ep1 DBAgent.

    Code:
    ; File                 Size (Bytes)    Time/Date
    ; -----------------    ------------    ---------
    ; ChatNode-1           600,768         12:53:54 26/07/2008
    ; ChatNode-2           701,832         12:52:29 26/07/2008
    ; DBAgent-1            621,552         12:53:54 26/07/2008
    ; DBAgent-2            842,584         12:52:29 26/07/2008
    ; GlobalDBAgent-eh     621,552         13:08:50 26/07/2008
    ; GlobalDBAgent-eh2    621,552         13:09:18 26/07/2008
    ; GlobalMgrSvr-1       293,760         12:53:54 26/07/2008
    ; GlobalMgrSvr-2       307,984         12:52:29 26/07/2008
    ; LoginSvr-1           215,916         12:53:54 26/07/2008
    ; LoginSvr-2           362,976         12:52:29 26/07/2008
    ; WorldSvr-1           1,402,000       12:53:54 26/07/2008
    ; WorldSvr-2           2,353,916       12:52:29 26/07/2008
    ;
    ChatNode-1 1292FBB1
    ChatNode-2 CFCD71E5
    DBAgent-1 8C69B727
    DBAgent-2 522E4948
    GlobalDBAgent-eh 8C69B727
    GlobalDBAgent-eh2 8C69B727
    GlobalMgrSvr-1 AF4AF67D
    GlobalMgrSvr-2 1371F4AA
    LoginSvr-1 1C708479
    LoginSvr-2 C41356F8
    WorldSvr-1 41C91FF7
    WorldSvr-2 BD89F464
    Hopefully i will get a bit of time to try using this older DBAgent in place of GlobalDBAgent later today as GlobalDBAgent does seem to be the cause of the problems as others have pointed out.
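    For anyone who wants to compare their own binaries against the list above, here is a rough sketch. It assumes the CRCs above are standard CRC-32 values (what zlib computes), which the post does not actually state. The function names are made up for illustration.

```python
import zlib
from collections import defaultdict

def file_crc32(path, chunk_size=65536):
    """CRC-32 of a file as 8 upper-case hex digits, read in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"

def group_by_crc(crcs):
    """Return only the CRC values shared by more than one file name."""
    groups = defaultdict(list)
    for name, crc in crcs.items():
        groups[crc].append(name)
    return {crc: names for crc, names in groups.items() if len(names) > 1}

# CRC values copied from the listing above.
LISTED = {
    "ChatNode-1": "1292FBB1", "ChatNode-2": "CFCD71E5",
    "DBAgent-1": "8C69B727", "DBAgent-2": "522E4948",
    "GlobalDBAgent-eh": "8C69B727", "GlobalDBAgent-eh2": "8C69B727",
    "GlobalMgrSvr-1": "AF4AF67D", "GlobalMgrSvr-2": "1371F4AA",
    "LoginSvr-1": "1C708479", "LoginSvr-2": "C41356F8",
    "WorldSvr-1": "41C91FF7", "WorldSvr-2": "BD89F464",
}

print(group_by_crc(LISTED))
```

    On the listed values this reports a single group, 8C69B727, shared by DBAgent-1 and both GlobalDBAgent copies, which matches the identical 621,552 byte sizes in the table.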

    Still, hexing any "keyserver" junk out of it would be better as AMD79 says. This is assuming a "keyserver" actually exists and certain people (who i won't name) are not just lying to try and confuse the community and feel all superior as their server doesn't have the problem.

  5. #5
    Akaruz - The Legend [hidden]
    Join Date: Jun 2006
    Location: Classified
    Posts: 1,120

    Re: [Help] Working together to solve this vital channel connect bug

    i also have the db, but I'm not sure which file is using it atm.



    on second glance at my log with the server i rebuilt from ground zero, i noticed this error on the globalmgrsvr:


    [Sat Jul 26 2008 13:06:47.843163 3086931648]: try listening 38170 port
    [Sat Jul 26 2008 13:06:47.851804 3086461856]: CProcessLayer(0)::svc Start [3086461856]
    [Sat Jul 26 2008 13:06:48.970073 3086931648]: accept success 10(127.0.0.1:32770)
    [Sat Jul 26 2008 13:06:48.972247 3086931648]: 1.open user 10 [86B5B80] (127.0.0.1)
    [Sat Jul 26 2008 13:06:50.353387 3086931648]: accept success 11(127.0.0.1:32772)
    [Sat Jul 26 2008 13:06:50.355699 3086931648]: 2.open user 11 [86BBFE8] (127.0.0.1)
    [Sat Jul 26 2008 13:06:50.381677 3086461856]: new server session (128:01)
    [Sat Jul 26 2008 13:06:52.343668 3086931648]: accept success 12(127.0.0.1:32774)
    [Sat Jul 26 2008 13:06:52.345451 3086931648]: 3.open user 12 [86C2250] (127.0.0.1)
    [Sat Jul 26 2008 13:07:42.397154 3086461856]: [##ERROR##] OnIPCNFYUserCnt0(): pUserDataCtx(bServerIdx:0, bGroupIdx:0), pIPSNFYUserCnt0(bServerIdx:24, bGroupIdx:1)
    [Sat Jul 26 2008 13:07:42.397850 3086461856]: [##ERROR##] 'OnIPCNFYUserCnt0' fail (Proc/Global.cpp:21)
    [Sat Jul 26 2008 13:07:42.398141 3086461856]: [##ERROR##] UsrMap Fail : MainCmd(52) Ret(15:0:21) Addr(127.0.0.1)
    [Sat Jul 26 2008 13:07:44.399623 3086931648]: sock 12 ->handle_timeout

    this and the first error, however, don't cause any of the bins to crash. they may (directly or indirectly) have something to do with the difficulty entering channels.

  6. #6
    searching... chacina
    Join Date: Jan 2005
    Location: www
    Posts: 272

    Re: [Help] Working together to solve this vital channel connect bug

    i think it is here... a connection timeout, i think it hangs or something like that...
    login connects fast
    world does not
    or it needs that server monitor :P
    Attached Thumbnails: cabal2.jpg

  7. #7
    Enthusiast bestfood
    Join Date: Nov 2004
    Location: Thailand
    Posts: 25

    Re: [Help] Working together to solve this vital channel connect bug

    [Sun Jul 27 2008 09:56:31.270468 3085523872]: [##ERROR##] 1:2 server notify time over (time: 65375)
    [Sun Jul 27 2008 09:56:31.276686 3085523872]: 8.close user -1 [9A66EA8] (127.0.0.1)
    [Sun Jul 27 2008 09:56:36.286186 3085993664]: accept success 17(127.0.0.1:39777)
    [Sun Jul 27 2008 09:56:36.286913 3085993664]: 8.open user 17 [9A66EA8] (127.0.0.1)
    [Sun Jul 27 2008 09:56:45.886195 3085523872]: [##ERROR##] OnIPCNFYUserCnt0(): pUserDataCtx(bServerIdx:0, bGroupIdx:0), pIPSNFYUserCnt0(bServerIdx:1, bGroupIdx:2)
    [Sun Jul 27 2008 09:56:45.886368 3085523872]: [##ERROR##] 'OnIPCNFYUserCnt0' fail (Proc/Global.cpp:21)
    [Sun Jul 27 2008 09:56:45.886395 3085523872]: [##ERROR##] UsrMap Fail : MainCmd(52) Ret(15:0:21) Addr(127.0.0.1)
    [Sun Jul 27 2008 09:56:47.975713 3085993664]: sock 17 ->handle_timeout
    [Sun Jul 27 2008 09:56:47.975947 3085993664]: 8.close user -1 [9A66EA8] (127.0.0.1)
    [Sun Jul 27 2008 09:56:53.071281 3085993664]: accept success 17(127.0.0.1:39786)
    [Sun Jul 27 2008 09:56:53.071388 3085993664]: 8.open user 17 [9A66EA8] (127.0.0.1)
    [Sun Jul 27 2008 09:56:53.073477 3085523872]: new server session (01:02)

  8. #8
    Akaruz - The Legend [hidden]
    Join Date: Jun 2006
    Location: Classified
    Posts: 1,120

    Re: [Help] Working together to solve this vital channel connect bug

    After a while, the error boils down to GlobalMgrSvr. it communicates and sends your login to the world. the problem is that it allows communication for a very short time, then fails; that is when you are not able to log in. it can communicate with login and world, but world and login often cannot communicate with it because of a problem.

    idea:

    the fix is a globalmgrsvr that can run on windows and functions the same as the globalmgrsvr on linux. communication in the configs would be redirected to this fixed globalmgrsvr, which would solve the problem.

  9. #9
    Alpha Member WulfgarXX
    Join Date: Jan 2007
    Posts: 1,507

    Re: [Help] Working together to solve this vital channel connect bug

    I was told that we're missing another server side program...

  10. #10
    The Dinosaur chumpywumpy
    Join Date: Jun 2008
    Location: /f451/
    Posts: 5,127

    Re: [Help] Working together to solve this vital channel connect bug

    I'm pretty convinced it is the server monitor that is missing.

    Code:
    [##ERROR##] OnIPCNFYUserCnt0(): pUserDataCtx(bServerIdx:0, bGroupIdx:0), pIPSNFYUserCnt0(bServerIdx:24, bGroupIdx:1)
    I'm pretty convinced that there should be a master process with serveridx 0 and group 0, which would be the server monitor or something to do with the cabal_managerdb, which is the link that is failing. Trying to set one up with 0,0 doesn't work well though as you get invalid config errors in the logs.
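    To check how often the two index pairs disagree across a whole log, something like the sketch below could be used. It only pulls the numbers out of the error text. What pUserDataCtx and pIPSNFYUserCnt0 really represent inside GlobalMgrSvr is not known, so the dictionary keys simply reuse the names from the log line.

```python
import re

# Matches the OnIPCNFYUserCnt0 error format shown above.
IDX_RE = re.compile(
    r"pUserDataCtx\(bServerIdx:(\d+), bGroupIdx:(\d+)\), "
    r"pIPSNFYUserCnt0\(bServerIdx:(\d+), bGroupIdx:(\d+)\)"
)

def parse_usercnt_error(line):
    """Return the two (bServerIdx, bGroupIdx) pairs from the error, or None."""
    m = IDX_RE.search(line)
    if m is None:
        return None
    ctx_server, ctx_group, nfy_server, nfy_group = map(int, m.groups())
    return {
        "pUserDataCtx": (ctx_server, ctx_group),
        "pIPSNFYUserCnt0": (nfy_server, nfy_group),
    }

def is_mismatch(parsed):
    """True when the two pairs in one error line differ."""
    return parsed["pUserDataCtx"] != parsed["pIPSNFYUserCnt0"]

# Error line copied from the GlobalMgrSvr log quoted above.
LINE = (
    "[##ERROR##] OnIPCNFYUserCnt0(): pUserDataCtx(bServerIdx:0, bGroupIdx:0), "
    "pIPSNFYUserCnt0(bServerIdx:24, bGroupIdx:1)"
)

parsed = parse_usercnt_error(LINE)
print(parsed, "mismatch" if is_mismatch(parsed) else "match")
```

    In both logs posted in this thread the first pair is (0, 0) while the second is the reporting server's own index and group, (24, 1) in one and (1, 2) in the other.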

    Given that certain servers

    That LoginSvr database isn't going to have anything to do with it. There are a lot of mentions of "version control" and SourceSafe-related things, and i think the db is part of the version control system EST use for managing the login server source code. Most of the tables and stored procedures exist elsewhere in our databases too.

  11. #11
    Proficient Member 13n00b37
    Join Date: Feb 2008
    Location: SG
    Posts: 173

    Re: [Help] Working together to solve this vital channel connect bug

    That's right, but I don't think it's a server monitor. The guy who released these files in the first place mentioned ripping his quick-fix verification process from the zhengtu online source code. If it's the same as silkroad online, then these files connect through that process, which links to the database.

  12. #12
    Valued Member SAUR0N
    Join Date: Sep 2004
    Posts: 144

    Re: [Help] Working together to solve this vital channel connect bug

    use a higher log level and filter the logs like this..
    # tail -f /var/log/cabal/WorldSvr_01_36.log | grep -v "process command: 3" | grep -v "PutPacket" | grep -v "ITC_TIMERINTRPT" | grep -v "dequeues" | grep -v "getq EW"
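    If the chain of grep -v calls gets unwieldy, the same filter can be written as a small script. This is only a sketch of an equivalent, using the five patterns from the command above; the function name is made up.

```python
# The same five patterns that the grep -v chain above drops.
NOISE = (
    "process command: 3",
    "PutPacket",
    "ITC_TIMERINTRPT",
    "dequeues",
    "getq EW",
)

def drop_noise(lines, noise=NOISE):
    """Yield only the lines that contain none of the noise patterns."""
    for line in lines:
        if not any(pattern in line for pattern in noise):
            yield line

# Usage, with this file saved as filter.py (hypothetical name) and these
# three lines appended to it:
#   import sys
#   for line in drop_noise(sys.stdin):
#       sys.stdout.write(line)
# then run:
#   tail -f /var/log/cabal/WorldSvr_01_36.log | python3 filter.py
```

    Plain substring matching is enough here because none of the patterns contain regex metacharacters, so it behaves the same as the grep version.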

    worldserver connects somewhere and sometimes gets some bytes and other times not.. i sniffed globalmanager and dbagent but worldserver doesn't send anything there in the timeout period.. so it must be connecting/trying to connect somewhere else.

    this is when it works (i don't have the logs around, that's what i found in a msn log.. maybe the sock 5 was a gms check it does all the time, anyway in the 2do code you can see the timeout)
    [Sat Aug 2 2008 18:21:17.479737 3082135776]: sock 11=> handle_output
    [Sat Aug 2 2008 18:21:17.479839 3082135776]: sock 11=>send 18 bytes, ret = 18 bytes
    [Sat Aug 2 2008 18:21:17.646384 3082135776]: sock 5=>handle_input
    [Sat Aug 2 2008 18:21:17.646418 3082135776]: sock 5=>recv wait 7947 bytes, ret = 106 bytes
    [Sat Aug 2 2008 18:21:17.646449 3082135776]: ExtractPacket : recvBuf Size(106)

    and this when it doesn't
    [Sat Aug 2 2008 18:22:08.014671 3082135776]: sock 11=> handle_output
    [Sat Aug 2 2008 18:22:08.014714 3082135776]: sock 11=>send 18 bytes, ret = 18 bytes
    [Sat Aug 2 2008 18:22:47.085274 3082135776]: sock 11=>handle_input
    [Sat Aug 2 2008 18:22:47.085325 3082135776]: sock 11=>recv wait 8178 bytes, ret = 0 bytes
    [Sat Aug 2 2008 18:22:47.085357 3082135776]: sock 11=>recv error(Success)
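    The difference between the two cases is easier to see as time gaps between consecutive lines. The sketch below is an illustration only (function names made up); it parses the timestamps from the two excerpts above and prints the largest gap in each.

```python
from datetime import datetime

def parse_ts(line):
    """Timestamp from a line like '[Sat Aug 2 2008 18:22:08.014714 <tid>]: ...'."""
    stamp = line.strip()[1:].split("]", 1)[0]
    text, _thread_id = stamp.rsplit(" ", 1)
    return datetime.strptime(text, "%a %b %d %Y %H:%M:%S.%f")

def gaps(lines):
    """Seconds elapsed between each pair of consecutive log lines."""
    times = [parse_ts(line) for line in lines]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

# Lines copied from the two excerpts above.
WORKS = [
    "[Sat Aug 2 2008 18:21:17.479737 3082135776]: sock 11=> handle_output",
    "[Sat Aug 2 2008 18:21:17.479839 3082135776]: sock 11=>send 18 bytes, ret = 18 bytes",
    "[Sat Aug 2 2008 18:21:17.646384 3082135776]: sock 5=>handle_input",
    "[Sat Aug 2 2008 18:21:17.646418 3082135776]: sock 5=>recv wait 7947 bytes, ret = 106 bytes",
    "[Sat Aug 2 2008 18:21:17.646449 3082135776]: ExtractPacket : recvBuf Size(106)",
]
FAILS = [
    "[Sat Aug 2 2008 18:22:08.014671 3082135776]: sock 11=> handle_output",
    "[Sat Aug 2 2008 18:22:08.014714 3082135776]: sock 11=>send 18 bytes, ret = 18 bytes",
    "[Sat Aug 2 2008 18:22:47.085274 3082135776]: sock 11=>handle_input",
    "[Sat Aug 2 2008 18:22:47.085325 3082135776]: sock 11=>recv wait 8178 bytes, ret = 0 bytes",
    "[Sat Aug 2 2008 18:22:47.085357 3082135776]: sock 11=>recv error(Success)",
]

print("works:", max(gaps(WORKS)), "s")
print("fails:", max(gaps(FAILS)), "s")
```

    In the working case the reply arrives about 0.17 s after the send. In the failing case nothing happens for about 39 s and then recv returns 0 bytes. On a normal TCP socket a 0-byte recv means the other side closed the connection, which would also explain why the error text is "Success" (errno is not set).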

    i couldn't find where it is connecting. once that's known, it's easy to fix..

  13. #13
    Valued Member CrazyArcad
    Join Date: Apr 2007
    Location: USA
    Posts: 117

    Re: [Help] Working together to solve this vital channel connect bug

    The release you found at http://www.arcadianpride.com is from my website... lol the GlobalDBAgent file was included with the release files that i had gotten from asdsf.com...

  14. #14
    Am i? ScriptKid
    Join Date: Oct 2006
    Location: $Location="??";
    Posts: 1,810

    Re: [Help] Working together to solve this vital channel connect bug

    i also had the problem with the channel login. my theory was that after joining the server, the connectsrv restarts, then restarts again.. there is a missing server side program here. if i get any update from the server, i'll help you guys.. for now let's stick together on this problem..


