Discussion:
[phpxmlrpc] Special special chars in XML Response
Matthias Korn
2007-09-17 17:35:40 UTC
Permalink
Hi,

I have an encoding problem of some sort. The data (strings) I'm sending
through xmlresp contain some really nasty characters (e.g. ? ? ? ?) that
break the XML parser on the client side. Most characters get
automatically converted to their corresponding XML entities by your
library, but not those listed above.

How can I convert them so that my XML parser doesn't break? (I can
verify it's broken in Internet Explorer, which probably uses the same
parser)


Best regards,
Matthias Korn
Matthias Korn
2007-09-18 16:28:54 UTC
Permalink
Hi Gaetano,

thank you for your fast reply and advice! I implemented the steps as you
described, but when setting
$GLOBALS['xmlrpc_internalencoding']='CP1252'; I am now getting the
following error:

Warning: xml_parser_set_option() [function.xml-parser-set-option]:
Unsupported target encoding "CP1252" in
...\module_xmlrpc\lib\xmlrpcs.inc on line 922

The PHP documentation says that only ISO-8859-1, US-ASCII and UTF-8 are
supported: http://de3.php.net/xml_parser_set_option

How can I further tackle this issue?

Thanks and best regards,
Matthias Korn
Post by Gaetano Giunta
The characters you are sending are very likely part of the Windows
charset, a.k.a. CP1252.
There is no support for that right now, but it is quite easy to add:
in xmlrpc.inc, on line 152, an array is already defined with the
necessary translation. Use array_keys() and array_values() on it:
    $escaped_data = str_replace(array('&', '"', "'", '<',
    $escaped_data = str_replace($GLOBALS['xml_iso88591_Entities']['in'],
        $GLOBALS['xml_iso88591_Entities']['out'], $escaped_data);
    $escaped_data = str_replace(array_keys($GLOBALS['cp1252_to_xmlent']),
        array_values($GLOBALS['cp1252_to_xmlent']), $escaped_data);
    break;
then of course you have to declare your internal encoding as CP1252
... and maybe check whether there is any decoding function to be
patched...
Bye
Gaetano
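The escaping step described above can be sketched as a standalone function. This is only a minimal illustration, not the library's code: the function name escapeCp1252ForXml and the table $cp1252ToXmlEntities are hypothetical, and the table covers just a handful of the CP1252 bytes in the 0x80-0x9F range.

```php
<?php
// Hypothetical, partial table: CP1252 bytes mapped to XML numeric entities.
$cp1252ToXmlEntities = array(
    "\x80" => '&#x20AC;', // euro sign
    "\x85" => '&#x2026;', // horizontal ellipsis
    "\x91" => '&#x2018;', // left single quotation mark
    "\x92" => '&#x2019;', // right single quotation mark
    "\x93" => '&#x201C;', // left double quotation mark
    "\x94" => '&#x201D;', // right double quotation mark
    "\x96" => '&#x2013;', // en dash
);

// Hypothetical helper: escape XML special characters, then CP1252 bytes.
function escapeCp1252ForXml($data, array $table)
{
    // '&' is listed first so that the entities produced afterwards
    // are not escaped a second time.
    $escaped = str_replace(
        array('&', '"', "'", '<', '>'),
        array('&amp;', '&quot;', '&apos;', '&lt;', '&gt;'),
        $data
    );

    // array_keys() yields the raw bytes, array_values() the entities.
    return str_replace(array_keys($table), array_values($table), $escaped);
}

$result = escapeCp1252ForXml("5 \x80 <b>\x93hi\x94</b>", $cp1252ToXmlEntities);
echo $result, "\n";
```

The table is passed through array_keys() exactly once. Applying array_keys() twice would return the integer positions 0, 1, 2, ... as search strings, and str_replace() would then rewrite digits in the payload instead of the CP1252 bytes.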
_______________________________________________
phpxmlrpc mailing list
http://lists.usefulinc.com/cgi-bin/mailman/listinfo/phpxmlrpc
Gaetano Giunta
2007-09-20 00:11:49 UTC
Permalink
My previous answer was clearly given without enough thought...

The first question is: are you using the lib to write the client, the
server or both?

Then some explanations:
- on line 922, if the server has received some CP1252 text, it should
default to $GLOBALS['xmlrpc_defencoding']='UTF-8'. Did you also change
that variable? Otherwise I cannot explain it...
- are you using PHP 4 or 5? There are some differences between the XML
parsers used by the two versions
- there is surely some more work to be done for everything to work fine.
Setting internalencoding to CP1252 before emitting (encoding) data is
fine but, as you have seen, it cannot be used when decoding data. And
both server and client decode data (request and response, respectively).
Since CP1252 is not supported by the PHP 4 XML parser, we have to find
some workaround
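To illustrate the decoding side, here is a small sketch that parses a fragment with UTF-8 as the target encoding, one of the three the parser accepts. The helper name parseStringValue and the sample fragment are made up for the example. The ext/xml functions are the standard PHP ones, and the closure syntax assumes a PHP version newer than the ones discussed in this thread.

```php
<?php
// Hypothetical helper: collect the character data of a small XML fragment
// as user code would see it with a UTF-8 target encoding.
function parseStringValue($xml)
{
    $text = '';
    $parser = xml_parser_create();

    // UTF-8 is one of the target encodings the parser accepts.
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');

    // Character data may arrive in several chunks, so append each one.
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$text) {
            $text .= $data;
        }
    );

    xml_parse($parser, $xml, true);

    return $text;
}

// The euro sign travels as a numeric entity, as the encoding side emits it.
$decoded = parseStringValue('<string>5 &#x20AC;</string>');
echo bin2hex($decoded), "\n";
```

The euro sign comes out of the parser as the three UTF-8 bytes E2 82 AC, not as the single CP1252 byte 0x80, so any code that expects CP1252 has to convert the data after parsing.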

Bye
Gaetano
Gaetano Giunta
2007-09-20 04:20:21 UTC
Permalink
OK, I have seen that line 922 is actually line 932 in my version of the lib.

This suggests that you are writing an XML-RPC server.
xmlrpc_defencoding has nothing to do with the problem.

The patch I would recommend for xmlrpcs.inc is the following:
    if (!in_array($GLOBALS['xmlrpc_internalencoding'],
        array('UTF-8', 'ISO-8859-1', 'US-ASCII')))
    {
        xml_parser_set_option($parser,
            XML_OPTION_TARGET_ENCODING, 'UTF-8');
    }
    else
    {
        xml_parser_set_option($parser,
            XML_OPTION_TARGET_ENCODING, $GLOBALS['xmlrpc_internalencoding']);
    }

What this patch does:
- it makes sure that no warning is emitted
- most importantly, it makes sure the charset encoding of the data as
seen by the user code does not depend on the encoding of the data received
over the net (as opposed to just prepending an @ to
xml_parser_set_option)
- it picks the charset encoding with the widest range, to avoid data loss

This means that, when $xmlrpc_internalencoding is set to a charset other
than the three allowed, incoming data will always be in UTF-8.
It is up to your code to treat it appropriately in the xmlrpc method
handlers (e.g. via utf8_decode, which yields ISO-8859-1, or using
mbstring for UTF-8 -> CP1252 translation).
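The mbstring route mentioned above could look roughly like this inside a method handler. It is a sketch under assumptions: the mbstring extension must be available, and the helper names utf8ToCp1252 and cp1252ToUtf8 are invented for the example.

```php
<?php
// Hypothetical helpers wrapping mb_convert_encoding() for the two directions.
function utf8ToCp1252($utf8)
{
    // Incoming data is UTF-8 after parsing; convert it for CP1252-based code.
    return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
}

function cp1252ToUtf8($cp1252)
{
    // Convert back before handing data to anything that expects UTF-8.
    return mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');
}

$incoming = "Price: 5 \xE2\x82\xAC";   // euro sign as UTF-8 bytes
$internal = utf8ToCp1252($incoming);   // euro sign as the CP1252 byte 0x80
$outgoing = cp1252ToUtf8($internal);   // round trip back to UTF-8

echo bin2hex($internal), "\n";
```

Unlike utf8_decode, this keeps characters such as the euro sign or curly quotes, which exist in CP1252 but not in ISO-8859-1.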

Bye
Gaetano
Matthias Korn
2007-09-21 15:11:52 UTC
Permalink
Hi Gaetano,

thank you again for your valuable hints.

I managed to get some response from my XML-RPC server by following your
suggestions and additionally adding case 'CP1252_': and case
'CP1252_UTF-8': to the switch statement we talked about earlier (not
sure which one it chooses, though). Still, what I get does not seem right.

I get this if I set xmlrpc_internalencoding to CP1252 and
XML_OPTION_TARGET_ENCODING to UTF-8:
<member>
<name>SessionID</name>
<value><string>a? d? ???? ?? ????????d???c?? ????d? ?? cc? d?????????????????b???a</string></value>
</member>

And this if XML_OPTION_TARGET_ENCODING is ISO-8859-1:
<member>
<name>SessionID</name>
<value><string>cb???d?????????? ????? ???? e??ba???eeae?????d?? ?????????c???f???</string></value>
</member>

(It is just a sessionID with numbers and letters)

Unfortunately, I do not understand much of all this encoding stuff.

For now I am switching over to another approach where data loss cannot
be ruled out completely (in case of CP1252-encoded chars).

Thanks for your help.
Matthias Korn