After a short investigation, the answer is pretty straightforward and is
explained in
https://bugzilla.mozilla.org/show_bug.cgi?id=839023
quoting:
U+0000-U+001F are illegal in HTML 4.0 and XML 1.0 (except the
characters HT, LF and CR). And it's not permitted to use named
character references such as  either (although it is permitted
in XML 1.1, except for NUL):
http://www.w3.org/International/questions/qa-controls
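To make the quoted rule concrete, here is a rough sketch of a check for the C0 control characters that XML 1.0 forbids. The name find_illegal_controls is made up for illustration; it is not something that exists in Bugzilla.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bugzilla): return the code points of
# all C0 control characters in $text that XML 1.0 does not allow.
# The legal exceptions are HT (0x09), LF (0x0A) and CR (0x0D).
sub find_illegal_controls {
    my ($text) = @_;
    my @chars = $text =~ /([\x00-\x08\x0b\x0c\x0e-\x1f])/g;
    return map { ord($_) } @chars;
}

my @bad = find_illegal_controls("ok\ttab\x01and\x0e\n");
# Prints "illegal: 0x01, 0x0e"; the tab and newline are allowed.
printf "illegal: %s\n", join ', ', map { sprintf '0x%02x', $_ } @bad;
```

Running something like this over a few exported comments should show quickly which characters are tripping up the XML parser.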
Possible fixes:
* Run an SQL query that finds and replaces these characters
* Patch Bugzilla so that it replaces them during XML conversion
Inside Bugzilla/WebService/Server/XMLRPC.pm, in _strip_undefs, at the
end of the function (around line 250):
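    if (ref($initial) eq '')
    {
        # Escape every C0 control character that XML 1.0 forbids,
        # i.e. everything below 0x20 except HT, LF and CR.
        $initial =~ s/([\x00-\x08\x0b\x0c\x0e-\x1f])/sprintf "\\x%02x", ord($1)/ge;
    }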
should do the trick, but it does indeed damage some binaries. Do we
actually want to export them? XML is not a good format for exporting
binary files, as it doesn't allow some characters. What about getting
them out using an SQL query? Why do we even need to use XML? Is it the
only way to import to Phab?
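One way around the "break comments or break binaries" trade-off might be to treat the two kinds of data differently: escape control characters in text, and base64-encode raw bytes. The sketch below only illustrates that idea. xml_safe_value and its $is_binary flag are hypothetical, and it assumes the caller knows which fields hold binary data; it is not how Bugzilla currently does it.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper, not Bugzilla code: make a scalar safe for XML 1.0.
# $is_binary is an assumption: the caller has to know which fields hold
# raw bytes (e.g. attachment data) and which hold text (e.g. comments).
sub xml_safe_value {
    my ($value, $is_binary) = @_;

    # Leave undef and references (hashes, arrays, objects) alone.
    return $value if !defined $value || ref $value;

    # Binary data: base64 passes through XML untouched and decodes losslessly.
    return encode_base64($value, '') if $is_binary;

    # Text: turn forbidden control characters into a visible \xNN escape.
    (my $clean = $value) =~ s/([\x00-\x08\x0b\x0c\x0e-\x1f])/sprintf "\\x%02x", ord($1)/ge;
    return $clean;
}

print xml_safe_value("comment\x01text", 0), "\n";  # control char becomes a literal \x01
print xml_safe_value("\x00\x01\x02", 1), "\n";     # prints AAEC
```

The text branch is lossy (the original byte is replaced by four visible characters), which is acceptable for comments but is exactly why it should not be applied to attachments.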
On Fri, Oct 24, 2014 at 11:12 PM, Petr Bena <benapetr(a)gmail.com> wrote:
Isn't Marc expert? :P
I will have a look as well...
On Fri, Oct 24, 2014 at 10:55 PM, Quim Gil <qgil(a)wikimedia.org> wrote:
> The Wikimedia Phabricator team needs help from someone familiar with PERL.
>
> The Bugzilla API has a bug, which we tried to fix with a patch, but now
> that patch creates another problem. Now we either break comments or binary
> attachments. The details:
>
> Upstream Bugzilla XML-RPC API issue creates invalid XML
> https://phabricator.wikimedia.org/T815
>
> Your help is welcome! It doesn't seem to be too complicated. The task
> doesn't require any background on Phabricator or Bugzilla.
>
> --
> Quim Gil
> Engineering Community Manager @ Wikimedia Foundation
> http://www.mediawiki.org/wiki/User:Qgil
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l