On Thu, Jul 31, 2008 at 11:10 PM, Daniel Friesen <dan_the_man(a)telus.net> wrote:
We're escaping for content, not escaping for attributes (attribute escaping should be handled by different code).
So does anyone remember the parameters of htmlspecialchars?
http://ca.php.net/htmlspecialchars
string htmlspecialchars ( string $string [, int $quote_style [, string $charset [, bool $double_encode ]]] )
($charset since 4.1.0; $double_encode since 5.2.3)
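A quick sketch of the last two parameters, assuming PHP 7 or later and UTF-8 input. With $double_encode set to false, entities already present in the string are left alone, while a bare & is still converted.

```php
<?php
// Input that already contains one entity (&amp;) plus a bare "&" and a "<".
$s = 'Fish &amp; chips < 5 & more';

// double_encode = true (the default): the existing &amp; becomes &amp;amp;
echo htmlspecialchars($s, ENT_NOQUOTES, 'UTF-8', true), "\n";
// Fish &amp;amp; chips &lt; 5 &amp; more

// double_encode = false: existing entities are kept, the bare "&" is still escaped
echo htmlspecialchars($s, ENT_NOQUOTES, 'UTF-8', false), "\n";
// Fish &amp; chips &lt; 5 &amp; more
```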
You know that you can use:
$text = htmlspecialchars( $text, ENT_NOQUOTES );
And the quotes won't be encoded.
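For comparison, a small sketch (PHP 7+) of ENT_QUOTES against ENT_NOQUOTES. The flag is passed explicitly because the default quote style changed in PHP 8.1. Note that &, < and > are converted either way; only the quote handling differs.

```php
<?php
$text = 'He said "a < b" & it\'s true';

// ENT_QUOTES: both double and single quotes are encoded
echo htmlspecialchars($text, ENT_QUOTES), "\n";
// He said &quot;a &lt; b&quot; &amp; it&#039;s true

// ENT_NOQUOTES: quotes pass through, but &, < and > are still encoded
echo htmlspecialchars($text, ENT_NOQUOTES), "\n";
// He said "a &lt; b" &amp; it's true
```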
Yes, but something like
html > body { color: red; }
will still break. You miss the point, I think. *Nothing* should be
encoded inside <script> or <style>, if you want to remain compatible
with HTML.
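To make the objection concrete, here is a sketch assuming plain HTML output (not XHTML parsed as XML), where <style> content is raw text and entities inside it are not decoded. The helper wrapStyle is hypothetical, and its check is an added assumption: since nothing inside a raw-text element can be escaped, the one thing to keep out is its own end tag, so any "</style" is conservatively rejected.

```php
<?php
$css = 'html > body { color: red; }';

// Escaping it like ordinary text content: the CSS parser would see "&gt;"
// literally, so the child selector no longer works.
echo '<style>' . htmlspecialchars($css, ENT_NOQUOTES) . "</style>\n";
// <style>html &gt; body { color: red; }</style>

// Hypothetical helper: emit CSS verbatim, refusing input that could close
// the element early (case-insensitive, and stricter than strictly needed).
function wrapStyle(string $css): string {
    if (stripos($css, '</style') !== false) {
        throw new InvalidArgumentException('CSS must not contain "</style"');
    }
    return "<style>$css</style>";
}

echo wrapStyle($css), "\n";
// <style>html > body { color: red; }</style>
```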
Though personally... When I make a sanitizer, I go for what it's meant to
do. Things like my cleanHtml are meant to make things safe, not to escape
things.
They're meant to make things not just safe but valid. This requires
escaping everything that has a special meaning.
So on that, my sanitizers only convert < and > into &lt; and &gt;; they
don't do any other encoding, and they don't double-encode the entities
for <>. The point is to make the syntax such that it won't be treated as
evil HTML, and only <> needs to be escaped for that purpose.
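A minimal sketch of the "only < and >" approach, for illustration only: cleanHtml's actual code isn't shown in the thread, so escapeAnglesOnly is a hypothetical stand-in. Because & is never touched, existing entities are naturally not double-encoded, but a bare & also passes through unchanged, unlike with htmlspecialchars.

```php
<?php
// Hypothetical stand-in for a sanitizer that escapes only < and >.
function escapeAnglesOnly(string $text): string {
    return str_replace(['<', '>'], ['&lt;', '&gt;'], $text);
}

// Markup is neutralized, and the existing entity is not double-encoded...
echo escapeAnglesOnly('<b>1 &lt; 2</b>'), "\n";
// &lt;b&gt;1 &lt; 2&lt;/b&gt;

// ...but a bare "&" is left as-is, where htmlspecialchars would encode it.
echo escapeAnglesOnly('Fish & chips'), "\n";
// Fish & chips
echo htmlspecialchars('Fish & chips', ENT_NOQUOTES), "\n";
// Fish &amp; chips
```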
Quotes also need to be escaped if there's any possibility you'd be in
an attribute. And & must always be escaped for normal HTML output if
you want to ensure validity, which we do.
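A sketch of the attribute case, assuming a hypothetical helper attr() built on htmlspecialchars with ENT_QUOTES. Escaping only < and > lets a double quote end the attribute early, while ENT_QUOTES keeps the whole value inside it and also turns & in a URL into &amp;.

```php
<?php
// Hypothetical helper: build name="value" with the value fully escaped
// (&, <, >, " and ').
function attr(string $name, string $value): string {
    return $name . '="' . htmlspecialchars($value, ENT_QUOTES) . '"';
}

$title = 'x" onmouseover="alert(1)';

// Escaping only < and >: the quote closes the attribute and injects a handler.
$naive = str_replace(['<', '>'], ['&lt;', '&gt;'], $title);
echo '<a title="' . $naive . '">link</a>', "\n";
// <a title="x" onmouseover="alert(1)">link</a>

// With ENT_QUOTES the whole value stays inside the attribute.
echo '<a ' . attr('title', $title) . '>link</a>', "\n";
// <a title="x&quot; onmouseover=&quot;alert(1)">link</a>

// The & in a query string is escaped as well.
echo '<a ' . attr('href', 'index.php?a=1&b=2') . '>link</a>', "\n";
// <a href="index.php?a=1&amp;b=2">link</a>
```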