Brion has recently added code to store articles in the 'old' SQL table in
compressed format, so I will need to adjust the scripts for the
international stats.
I spent several hours on it, and despite some useful tips from Brion I
can't get the article data inflated; all I get is a Z_DATA_ERROR (-3).
Brion sent me a small sample of the articles in the fr: 'old' dump in
compressed raw format, without escape sequences and other fields, just
article data. Even this I could not tackle.
Brion wrote:
Here's a zip file containing the raw bytes of compressed old_text from
the first up to 100 rows in the table:
http://leuksman.com/misc/raw.zip
They do decompress with gzinflate() in PHP.
Here is my test script:
#!/usr/bin/perl
use CGI::Carp qw(fatalsToBrowser);
use Compress::Zlib;

$path = "raw/" ;
($refinf, $status) = inflateInit();
for ($i = 1 ; $i <= 100 ; $i++)
{ &ReadFile ($i) ; }
exit ;

sub ReadFile
{
  $file_in = $path . "old-" . $i . ".raw" ;
  open (FILE_IN, "<", $file_in) ||
    die ("Input file " . $file_in . " could not be opened.") ;
  binmode FILE_IN ;
  $article = "" ;
  while ($line = <FILE_IN>)
  {
    chomp ($line) ;
    $article .= $line ;
  }
  ($article2, $status) = $refinf->inflate ($article) ;
  if ($status == Z_OK) # Z_OK = 0
  { print "$i:OK: " . substr ($article2,0,50) . "\n" ; }
  else
  { print "$i:Unzip error: $status\n" ; } # Z_DATA_ERROR = -3
}
Can someone help me out with this?
I can deflate/inflate dummy texts, so libraries are all in place.
(I use ActivePerl 5.8, on Windows)
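
A possible direction, offered as a sketch rather than a confirmed fix: if
the samples are raw deflate streams (RFC 1951, which is what PHP's
gzdeflate()/gzinflate() pair works with), they carry no zlib header, while
inflateInit() expects one by default. Reading binary data line by line
with chomp would also drop newline bytes from the compressed stream. The
helper names inflate_raw and slurp_binary below are made up for
illustration.

```perl
use strict;
use warnings;
use Compress::Zlib;

# Inflate a raw deflate stream (assumed: no zlib header, no checksum).
# Returns the inflated bytes, or undef if the stream did not end cleanly.
sub inflate_raw {
    my ($data) = @_;    # work on a copy; inflate() consumes its input buffer

    # A negative WindowBits tells zlib not to look for a zlib header.
    my ($inf, $init_status) = inflateInit(-WindowBits => -MAX_WBITS());
    return undef unless defined $inf && $init_status == Z_OK;

    my ($out, $status) = $inf->inflate($data);

    # A complete stream finishes with Z_STREAM_END (1), not Z_OK (0).
    return $status == Z_STREAM_END ? $out : undef;
}

# Read a whole file as bytes, with no line splitting and no chomp.
sub slurp_binary {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode $fh;
    local $/;           # slurp mode: read everything in one go
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}
```

With the sample files this would be called as
inflate_raw(slurp_binary("raw/old-1.raw")). A fresh inflate object is
created for every article, since each file is its own stream.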
-------------------------------------------------------------
There is a second problem, possibly trivial once the problem above has
been solved (actually, I hope the problem above is a trivial oversight of
mine too).
The SQL dump contains escape sequences. A small section of the new-style
fr: 'old' dump that Brion sent me contains:
\Z: 3541 times
\\: 3497
\": 3428
\n: 3598
\r: 3550
\0: 3190
\Z is not listed on
http://www.mysql.com/doc/en/String_syntax.html
I could not find any other doc referring to it.
\z is listed, so maybe upper or lower case makes no difference, but I
doubt it.
Anyone encountered this before?
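
For what it is worth, MySQL's escaping routine (mysql_real_escape_string)
writes Control-Z (ASCII 26) as \Z, next to \0, \n, \r, \\, \' and \", which
matches the list above. Assuming the dump was escaped that way, a field
could be unescaped roughly as follows before it is inflated. The name
unescape_sql is made up for illustration, and dropping the backslash in
front of any other character is an assumption on my part.

```perl
use strict;
use warnings;

# Escape letters and the bytes they stand for (assumed mysqldump-style escaping).
my %UNESCAPE = (
    '0'  => "\x00",   # NUL
    'n'  => "\x0A",   # newline
    'r'  => "\x0D",   # carriage return
    't'  => "\x09",   # tab
    'b'  => "\x08",   # backspace
    'Z'  => "\x1A",   # Control-Z (ASCII 26)
    '\\' => "\\",
    "'"  => "'",
    '"'  => '"',
);

sub unescape_sql {
    my ($s) = @_;
    # One left-to-right pass, so that backslash-backslash-n becomes
    # a backslash followed by "n" and not a newline.
    # Assumption: for any unlisted character the backslash is simply dropped.
    $s =~ s/\\(.)/exists $UNESCAPE{$1} ? $UNESCAPE{$1} : $1/ges;
    return $s;
}
```

The single pass matters: replacing \\ first and \n afterwards in two
separate substitutions would corrupt data that contains a literal
backslash followed by "n".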
Thanks for any help.
Erik Zachte