[PHP REGEX] Recursive BBcode with [quote] [solved]

Joined Jun 8, 2007 | Messages 1,985 | Reaction score 490
I have an issue with this BBcode parser I've been working on, specifically with the [ quote="name"] tags (minus the first space). I got it working recursively when no name is used, but when someone adds a name, it only works one level deep. If the first level uses name="name" and all the inner quotes have no name, it works too.

The problem only comes up when I have [ quote="name"][ quote="name"][/quote][/quote].

Rather than rendering two nested quotes, it does this:
name said:
[ quote="name"] (minus the space)
[ /quote] (minus the space)

I took the code from an example on PHP.net, in the preg_replace_callback() function reference, and modified it. (Their example was for [indent] tags, and it only had one capture group, not two.)
PHP:
if(!function_exists(parseQuotesRecursive))
    {
        function parseQuotesRecursive($input)
        {
            $regex = '#\[quote\=?"?(.*?)"?\]((?:[^[]|\[(?!/?quote\])|(?R))+)\[/quote\]#i';
            if (is_array($input)) 
            {
                $input = '<blockquote style="margin: 5px 20px 20px;">'
                    .'<div style="margin-bottom: 2px;font-size:10px;font-family:sans-serif">'
                    .'Quote'.( (strlen($input[1])>0) ? ' From <strong>'.$input[1].'</strong>' : '' ).','
                    .'</div>'
                    .'<div style="padding:4px;padding-top:0;border:#000 1px inset">'
                    .$input[2]
                    .'</div>'
                    .'</blockquote>';
            }
            return preg_replace_callback($regex, 'parseQuotesRecursive', $input);
        }
    }
    $text = parseQuotesRecursive($text);

It would be nice to have the quote parser for RageZone. :cool:

Has anyone done this before? I searched Google but, surprisingly, couldn't find anything specific to the recursive 'name="name"' problem, or at least not a solved one. :(

Edit: Silly me, I forgot to include '\=?"?(.*?)"?' in the lookahead of the recursive section, lol. Now it works great!
PHP:
'#\[quote\=?"?(.*?)"?\]((?:[^[]|\[(?!/?quote\=?"?(.*?)"?\])|(?R))+)\[/quote\]#i'
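For reference, here is a minimal sketch of the function with the fixed pattern, assuming PHP 7.1 or later. It is not the poster's exact code: the name convertQuotes() is hypothetical, the callback is a closure instead of a function name, the unused capture group inside the lookahead is written without a group, the inline styles are dropped, and the quoted name is passed through htmlspecialchars(), which the original does not do.

```php
<?php
// Hypothetical sketch (not the poster's exact code): recursive [quote] /
// [quote="name"] conversion with the fixed pattern. Escaping the name with
// htmlspecialchars() is an addition; inline styles are omitted.
function convertQuotes(string $text): string
{
    // Group 1: optional name. Group 2: body, built from
    //   [^[]                        any character except "["
    //   \[(?!/?quote=?"?.*?"?\])    a "[" that does not begin an opening or closing quote tag
    //   (?R)                        a complete nested quote (recursion)
    $regex = '#\[quote=?"?(.*?)"?\]((?:[^[]|\[(?!/?quote=?"?.*?"?\])|(?R))+)\[/quote\]#i';

    return preg_replace_callback($regex, function (array $m): string {
        $name = htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8');
        $head = 'Quote' . ($name !== '' ? ' From <strong>' . $name . '</strong>' : '') . ',';
        // Convert quotes nested in the body first, then wrap the result.
        return '<blockquote><div>' . $head . '</div><div>'
            . convertQuotes($m[2]) . '</div></blockquote>';
    }, $text);
}

echo convertQuotes('[quote="a"][quote="b"]hi[/quote][/quote]'), "\n";
// <blockquote><div>Quote From <strong>a</strong>,</div><div><blockquote><div>Quote From <strong>b</strong>,</div><div>hi</div></blockquote></div></blockquote>
```

The fix works because the negative lookahead now also recognises an opening [quote="name"], so the literal-"[" branch no longer swallows it and (?R) gets the chance to match the nested quote.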
 
Ginger by design.
Loyal Member
Joined Feb 15, 2007 | Messages 2,340 | Reaction score 653

Get the dragon book and make an LALR parser.

preg'ing stuff isn't "parsing"; it's multi-pass filtering, which is horribly inefficient. Your parser should be a one-pass state machine that outputs the resulting HTML you want.
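To make the one-pass idea concrete, here is a rough sketch for the quote tag only. It is a hand-written stack machine, not an LALR parser generated from a grammar, and the tokenising step still uses preg_split(). The function name quotesOnePass() is hypothetical, and escaping all plain text with htmlspecialchars() is an assumption about how the rest of a converter would behave.

```php
<?php
// Hypothetical sketch: one left-to-right pass over the tokens with an explicit
// stack (a hand-written stack machine, not an LALR parser). Handles only
// [quote] and [quote="name"]; all plain text is HTML-escaped (an assumption).
function quotesOnePass(string $text): string
{
    // Lexer: split on quote tags, keeping the tags themselves as tokens.
    $tokens = preg_split('#(\[quote(?:="[^"\]]*")?\]|\[/quote\])#i', $text, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // Each frame holds: [raw opening tag, escaped name, HTML built so far].
    $stack = [['', '', '']];
    foreach ($tokens as $tok) {
        $top = count($stack) - 1;
        if (preg_match('#^\[quote(?:="([^"\]]*)")?\]$#i', $tok, $m)) {
            // Opening tag: start a new frame.
            $stack[] = [$tok, htmlspecialchars($m[1] ?? '', ENT_QUOTES, 'UTF-8'), ''];
        } elseif (strcasecmp($tok, '[/quote]') === 0 && $top > 0) {
            // Closing tag with a quote open: wrap the frame and hand it to its parent.
            [, $name, $body] = array_pop($stack);
            $head = 'Quote' . ($name !== '' ? ' From <strong>' . $name . '</strong>' : '') . ',';
            $stack[$top - 1][2] .= '<blockquote><div>' . $head . '</div><div>'
                . $body . '</div></blockquote>';
        } else {
            // Plain text, or a stray [/quote] with nothing open: keep it as text.
            $stack[$top][2] .= htmlspecialchars($tok, ENT_QUOTES, 'UTF-8');
        }
    }
    // Any quote still open at the end is emitted literally, followed by its contents.
    while (count($stack) > 1) {
        [$raw, , $body] = array_pop($stack);
        $stack[count($stack) - 1][2] .= htmlspecialchars($raw, ENT_QUOTES, 'UTF-8') . $body;
    }
    return $stack[0][2];
}
```

Each token is visited once; a nested quote just pushes another frame, and unbalanced tags fall back to literal text instead of producing broken HTML.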

(Or you could be lame and use Bison, but that might actually be MORE involved >.>)
 
Infraction Baɴɴed
Loyal Member
Joined Apr 9, 2008 | Messages 1,416 | Reaction score 169
How does vBulletin do it, then?
 
Joined Jun 8, 2007 | Messages 1,985 | Reaction score 490
I got it working on my own perfectly, btw. I edited it and also fixed the RageZone glitch by adding a backslash; it functions the same.

Copy the function, and replace the regex with the one in my edit.

Yeah, Merlin, technically I'm not making a "parser"; they just call it that in the documentation of "MarkItUp" (the BBcode editor). It's just a BBcode-to-HTML converter, so writing it in PHP with regular expressions is fine, and it actually loads fairly quickly. It's really cool now, and it works recursively very, very well. I also rigged it so the code and PHP boxes have line numbers, and if you put PHP inside the 'code' tags, it still shows up as PHP.

The BBcode is very flexible and scalable, especially the 'quote' tags.
 
Newbie Spellweaver
Joined Nov 2, 2009 | Messages 54 | Reaction score 21
Small remark: function_exists() isn't gonna help you if you don't pass the function name as a string :tongue:

PHP:
if(!function_exists('parseQuotesRecursive'))
 
Ginger by design.
Loyal Member
Joined Feb 15, 2007 | Messages 2,340 | Reaction score 653
Is it converting before storing, or storing and then converting when it's pulled out for display?

If you're going to be modifying the conversion code a lot, use the latter until development is done; after that, in release mode, you could switch to the former. That way all the processing is done when someone writes a message, and from then on all that happens is a mysql_query (which can be memcached for awesomeness) and an echo. You might also play around with caching during development: a really popular thread is going to get lots of hits and refreshes, and it makes no sense to re-convert the code over and over if it isn't changing.
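A minimal sketch of read-through caching, assuming some converter callable is available. The names cachedConvert() and $converterVersion are hypothetical, and a static array stands in for memcached.

```php
<?php
// Hypothetical sketch: read-through cache for converted BBcode. A static array
// stands in for memcached; cachedConvert() and $converterVersion are
// illustrative names, not part of the thread's code.
function cachedConvert(string $bbcode, callable $convert, int $converterVersion = 1): string
{
    static $cache = [];
    // Include the converter version in the key so a converter change never
    // returns HTML produced by the old converter.
    $key = $converterVersion . ':' . sha1($bbcode);
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $convert($bbcode); // only convert on a cache miss
    }
    return $cache[$key];
}
```

A static array only lives for one PHP request, so to help across page views the same key/value pair would have to go into memcached (or the database); keeping the converter version in the key means a converter change never serves stale HTML.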
 
Joined Jun 8, 2007 | Messages 1,985 | Reaction score 490
I plan on having two tables, almost identical. One stores the converted version, for display purposes. The other stores the BBcode, so I can run an update anytime the converter is changed, if necessary. I plan on finishing the converter before I release, too, but anything can happen.
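A sketch of the two-table idea, with plain arrays standing in for the tables: $source holds the BBcode by post id, and $rendered holds the HTML plus the converter version that produced it. The version column, CONVERTER_VERSION and rerenderStale() are assumptions added for illustration; the post itself only says an update would be run when the converter changes.

```php
<?php
// Hypothetical sketch of the two-table design, with arrays standing in for the
// tables. $source: post id => BBcode. $rendered: post id => ['html', 'version'].
// The version column and CONVERTER_VERSION are assumptions for illustration.
const CONVERTER_VERSION = 2;

// Re-runs the converter for every post whose HTML is missing or was produced by
// an older converter; returns how many posts were re-rendered.
function rerenderStale(array $source, array &$rendered, callable $convert): int
{
    $updated = 0;
    foreach ($source as $id => $bbcode) {
        if (($rendered[$id]['version'] ?? 0) < CONVERTER_VERSION) {
            $rendered[$id] = ['html' => $convert($bbcode), 'version' => CONVERTER_VERSION];
            $updated++;
        }
    }
    return $updated;
}
```

Bumping CONVERTER_VERSION after a fix (for example an XSS patch) and running this once re-renders only the rows that need it; everything else keeps being served straight from the HTML table.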

For example, I found a security risk with the URL BBcode. I can use JavaScript in a place I thought I couldn't, like this:
[ url=http://#"target="_self"onclick="alert('hello')]Muahaha[ /url]
I had already patched the link pattern, so it works like this:
PHP:
'#\[url\="?((?:ftp|https?)://.*?)"?\](.*)\[\/url\]#si'
I thought requiring a prefix like 'http' made it safe; I was wrong. Now I have to fix it somehow, when before I assumed it was safe. Good thing I tried to break it, I suppose. :grr: The problem comes up when you use it inside the [code] or [php] tags, since those let you use quotes and apostrophes. It won't work if you use the URL tag on its own, and spaces are still disabled, so the output won't be valid XHTML, though the injection works in Firefox.
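A quick way to see the problem, and one possible fix: the lazy .*? in the pattern above will capture double quotes, so the "URL" can close the href attribute and add onclick. Escaping both captured groups with htmlspecialchars(..., ENT_QUOTES) keeps them inside the attribute value. This is a sketch, not the poster's later fix, and convertUrl() is a hypothetical name.

```php
<?php
// Hypothetical helper: the [url] pattern from the post, but with both captured
// groups escaped before they are placed into the <a> tag.
function convertUrl(string $text): string
{
    $regex = '#\[url\="?((?:ftp|https?)://.*?)"?\](.*)\[\/url\]#si';
    return preg_replace_callback($regex, function (array $m): string {
        // ENT_QUOTES turns " into &quot; and ' into &#039;, so neither can end the attribute.
        $href  = htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8');
        $label = htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8');
        return '<a href="' . $href . '">' . $label . '</a>';
    }, $text);
}

$payload = '[url=http://#"target="_self"onclick="alert(\'hello\')]Muahaha[/url]';
echo convertUrl($payload), "\n";
// <a href="http://#&quot;target=&quot;_self&quot;onclick=&quot;alert(&#039;hello&#039;)">Muahaha</a>
```

The link is still odd, but the onclick text stays inside the href value instead of becoming an attribute.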

I assume it needs to be more like the IMG pattern, since there both the start and the end of the URL are constrained:
PHP:
'#\[img\](https?://.*?\.(?:jpg|jpeg|gif|png|bmp))\[\/img\]#si'
I don't know for sure; maybe you'll prove it wrong in a second. It's just a stupid rant anyway, as an example.
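Taking up the challenge, a sketch that checks whether the IMG pattern really constrains the URL. The lazy .*? still lets a quote through as long as the string ends in an image extension. IMG_STRICT is a hypothetical tightened pattern (not from the thread) whose URL part cannot contain whitespace, quotes, angle brackets or square brackets; imgMatches() is likewise a hypothetical helper.

```php
<?php
// The [img] pattern from the post, and a hypothetical stricter variant (not
// from the thread) whose URL may not contain whitespace, quotes, < > or [ ].
const IMG_ORIGINAL = '#\[img\](https?://.*?\.(?:jpg|jpeg|gif|png|bmp))\[\/img\]#si';
const IMG_STRICT   = '#\[img\](https?://[^\s"\'<>\[\]]+?\.(?:jpg|jpeg|gif|png|bmp))\[/img\]#i';

function imgMatches(string $pattern, string $bbcode): bool
{
    return preg_match($pattern, $bbcode) === 1;
}

// No spaces (matching the note that spaces are disabled), and it ends in ".png".
$payload = '[img]http://x/"onerror="alert(1)"a=".png[/img]';
var_dump(imgMatches(IMG_ORIGINAL, $payload));                            // bool(true)
var_dump(imgMatches(IMG_STRICT, $payload));                              // bool(false)
var_dump(imgMatches(IMG_STRICT, '[img]http://example.com/a.png[/img]')); // bool(true)
```

Controlling the front and back of the URL is not enough on its own; the characters in the middle need to be restricted too (or escaped on output).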

If I had already released, that would be a problem requiring an update, especially since (hypothetically) there might already be XSS injections stored in existing posts.

By saving the BBcode, I can fix the existing issues, and still save time by not calling a parser every time I select the data.

Same basic concept, but simpler for my taste. And you never know when something will come up: like I said, I thought I was safe, and I broke it myself. Things come up sometimes, no matter how confident you are. :8:
 
Newbie Spellweaver
Joined Nov 2, 2009 | Messages 54 | Reaction score 21
Well, you could try securing it more, though I don't know if this will catch all kinds of URLs:

#\[url\="?((?:ftp|https?)://[a-zA-Z0-9/\._&-]+[a-zA-Z0-9/\._\#&-]*?)"?\](.*)\[\/url\]#isU

Try to break it, though; we're never careful enough against XSS injections ^^
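A small harness for the pattern above, assuming it is applied with preg_match(). Note that the U modifier inverts greediness, so + becomes lazy and *? becomes greedy. The harness shows that the onclick payload from earlier in the thread is rejected, and also that a URL with a query string (? and =) is rejected, which is the kind of URL the pattern may not catch. whitelistUrl() is a hypothetical helper.

```php
<?php
// The whitelist pattern suggested above, wrapped in a hypothetical helper.
// With the U modifier greediness is inverted: "+" is lazy, "*?" is greedy.
const URL_WHITELIST = '#\[url\="?((?:ftp|https?)://[a-zA-Z0-9/\._&-]+[a-zA-Z0-9/\._\#&-]*?)"?\](.*)\[\/url\]#isU';

// Returns the accepted URL, or null when the tag does not match the whitelist.
function whitelistUrl(string $bbcode): ?string
{
    return preg_match(URL_WHITELIST, $bbcode, $m) === 1 ? $m[1] : null;
}

var_dump(whitelistUrl('[url=http://example.com/a_b]x[/url]'));   // string(22) "http://example.com/a_b"
var_dump(whitelistUrl('[url=http://#"target="_self"onclick="alert(\'hello\')]Muahaha[/url]')); // NULL
var_dump(whitelistUrl('[url=http://example.com/?q=1]x[/url]'));  // NULL
```

The whitelist closes the quote hole, at the cost of rejecting some legitimate URLs until more characters are allowed.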
 
Joined Jun 8, 2007 | Messages 1,985 | Reaction score 490
Oh, check my tutorial; I got it fixed ^_^ I used a similar approach, but a few more characters are allowed. I went and researched which characters are allowed in a URL, and voilà! (Hope I didn't miss any!)

The point is, things change all the time, and in case another XSS injection comes up, the stored BBcode of user-generated content needs a way to be converted again, for whatever reason. (Mostly XSS injections, yes.)

I have a feeling some other tags allow an easy XSS injection as well, like color and anything else that ends up in a <span> tag. The list tags probably let mouse events in too, which can be exploited. Anything that enables JavaScript events (pretty much every parameter passed between HTML brackets) can be cracked.

<strong>text</strong> is safe, while <span style="font-weight:(.*)">text</span> would not be, since an attacker can just close the quote, add a new attribute, and they're in. It's nasty. But none of these work unless quotes get through, so that's the key.
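A sketch of the "quotes are the key" idea for an attribute-bearing tag, assuming a [color=...] tag rendered into <span style="...">: the whole input is escaped first, so no raw quote survives, and the color value is then checked against a narrow whitelist (a name or a 3- or 6-digit hex code) so it cannot smuggle in extra CSS either. convertColor() and the exact whitelist are assumptions, not code from the thread.

```php
<?php
// Hypothetical [color=...] converter: escape everything, then whitelist the value.
function convertColor(string $text): string
{
    // Escape the whole input first, so no raw quote or angle bracket survives anywhere.
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return preg_replace_callback('#\[color=([^\]]*)\](.*?)\[/color\]#is', function (array $m): string {
        // Whitelist (an assumption): a plain colour name or a 3/6-digit hex code, nothing else.
        if (preg_match('/^(?:[a-z]{3,20}|#[0-9a-f]{3}|#[0-9a-f]{6})$/i', $m[1]) !== 1) {
            return $m[2]; // reject the value: keep the text, drop the styling
        }
        return '<span style="color:' . $m[1] . '">' . $m[2] . '</span>';
    }, $text);
}

echo convertColor('[color=red]hi[/color]'), "\n";                        // <span style="color:red">hi</span>
echo convertColor('[color=red" onmouseover="alert(1)]hi[/color]'), "\n"; // hi
```

Escaping alone would already stop the attribute break-out; the whitelist additionally stops values like red;background:url(...) from adding their own CSS.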
 