<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Python And Unicode&#8230; What a Crap</title>
	<atom:link href="http://simononsoftware.com/python-and-unicode-what-a-crap/feed/" rel="self" type="application/rss+xml" />
	<link>http://simononsoftware.com/python-and-unicode-what-a-crap/</link>
	<description>programming, databases and other IT something</description>
	<lastBuildDate>Fri, 06 Aug 2010 22:57:30 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Jeff McNeil</title>
		<link>http://simononsoftware.com/python-and-unicode-what-a-crap/comment-page-1/#comment-646</link>
		<dc:creator>Jeff McNeil</dc:creator>
		<pubDate>Fri, 26 Feb 2010 19:27:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.simononsoftware.com/?p=697#comment-646</guid>
		<description>Oh, and yes, it is goofy. Not defending it =)</description>
		<content:encoded><![CDATA[<p>Oh, and yes, it is goofy. Not defending it =)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff McNeil</title>
		<link>http://simononsoftware.com/python-and-unicode-what-a-crap/comment-page-1/#comment-645</link>
		<dc:creator>Jeff McNeil</dc:creator>
		<pubDate>Fri, 26 Feb 2010 19:25:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.simononsoftware.com/?p=697#comment-645</guid>
		<description>Py3k fixes most of that oddness as it differentiates between text &amp; binary data pretty cleanly. Look at the codecs module:

Test file: 
-------------
[root@virtapi01 platform]# cat myfile 
здравствуйте!!
[root@virtapi01 platform]# 

Just reading -- notice it&#039;s a str type.
---------------------------------------------
&gt;&gt;&gt; type(open(&#039;myfile&#039;).read())
&lt;type &#039;str&#039;&gt;
&gt;&gt;&gt; 

Using codecs -- notice it&#039;s the correct unicode type.
----------------------------------------------------------------------------
&gt;&gt;&gt; import codecs
&gt;&gt;&gt; type(codecs.open(&#039;myfile&#039;, encoding=&#039;utf8&#039;).read())
&lt;type &#039;unicode&#039;&gt;
&gt;&gt;&gt; 

You could also decode by hand, since the byte stream *is* valid UTF-8 data:
---------------------------
&gt;&gt;&gt; open(&#039;myfile&#039;).read().decode(&#039;utf8&#039;)
u&#039;\u0437\u0434\u0440\u0430\u0432\u0441\u0442\u0432\u0443\u0439\u0442\u0435!!\n\n&#039;

And the reason printing straight to the console works?
---------------------------------------------------------------------------------
&gt;&gt;&gt; import sys
&gt;&gt;&gt; sys.stdin.encoding
&#039;UTF-8&#039;
&gt;&gt;&gt; sys.stdout.encoding
&#039;UTF-8&#039;
&gt;&gt;&gt; 

HTH =) I guess I know this better than I thought I did. Interesting blog, btw.</description>
		<content:encoded><![CDATA[<p>Py3k fixes most of that oddness as it differentiates between text &amp; binary data pretty cleanly. Look at the codecs module:</p>
<p>Test file:<br />
&#8212;&#8212;&#8212;&#8212;-<br />
[root@virtapi01 platform]# cat myfile<br />
здравствуйте!!<br />
[root@virtapi01 platform]# </p>
<p>Just reading &#8212; notice it&#8217;s a str type.<br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br />
&gt;&gt;&gt; type(open(&#039;myfile&#039;).read())<br />
&lt;type &#039;str&#039;&gt;<br />
&gt;&gt;&gt; </p>
<p>Using codecs &#8212; notice it&#8217;s the correct unicode type.<br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br />
&gt;&gt;&gt; import codecs<br />
&gt;&gt;&gt; type(codecs.open(&#039;myfile&#039;, encoding=&#039;utf8&#039;).read())<br />
&lt;type &#039;unicode&#039;&gt;<br />
&gt;&gt;&gt; </p>
<p>You could also decode by hand, since the byte stream *is* valid UTF-8 data:<br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br />
&gt;&gt;&gt; open(&#039;myfile&#039;).read().decode(&#039;utf8&#039;)<br />
u&#039;\u0437\u0434\u0440\u0430\u0432\u0441\u0442\u0432\u0443\u0439\u0442\u0435!!\n\n&#039;</p>
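<p>For comparison, here is a minimal sketch of the same read under Python 3, where binary mode gives bytes and text mode with an explicit encoding gives str. It assumes a Python 3 interpreter rather than the 2.x sessions shown here, and the temporary file is only a stand-in for myfile:</p>

```python
import os
import tempfile

# Write a small UTF-8 test file (a hypothetical stand-in for 'myfile').
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("здравствуйте!!\n".encode("utf-8"))

# Binary mode returns the raw bytes; nothing is decoded.
with open(path, "rb") as f:
    raw = f.read()

# Text mode with an explicit encoding returns decoded text.
with open(path, encoding="utf-8") as f:
    text = f.read()

os.remove(path)

print(type(raw).__name__, type(text).__name__)
```

<p>The explicit encoding argument matters here: without it, Python 3 falls back to a locale-dependent default.</p>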
<p>And the reason printing straight to the console works?<br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br />
&gt;&gt;&gt; import sys<br />
&gt;&gt;&gt; sys.stdin.encoding<br />
&#039;UTF-8&#039;<br />
&gt;&gt;&gt; sys.stdout.encoding<br />
&#039;UTF-8&#039;<br />
&gt;&gt;&gt; </p>
<p>HTH =) I guess I know this better than I thought I did. Interesting blog, btw.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Simon</title>
		<link>http://simononsoftware.com/python-and-unicode-what-a-crap/comment-page-1/#comment-644</link>
		<dc:creator>Simon</dc:creator>
		<pubDate>Fri, 26 Feb 2010 19:14:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.simononsoftware.com/?p=697#comment-644</guid>
		<description>@Jeff
Right... I&#039;ve already found that in the documentation... but it means that I need to decide whether the string is unicode or not at the time of writing the code. I don&#039;t know Python that well. Should I do something magical with the variables that I read from files or get from standard input? I want a script that uses unicode or not depending only on the input (or maybe on some environment variables).</description>
		<content:encoded><![CDATA[<p>@Jeff<br />
Right&#8230; I&#8217;ve already found that in the documentation&#8230; but it means that I need to decide whether the string is unicode or not at the time of writing the code. I don&#8217;t know Python that well. Should I do something magical with the variables that I read from files or get from standard input? I want a script that uses unicode or not depending only on the input (or maybe on some environment variables).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff McNeil</title>
		<link>http://simononsoftware.com/python-and-unicode-what-a-crap/comment-page-1/#comment-643</link>
		<dc:creator>Jeff McNeil</dc:creator>
		<pubDate>Fri, 26 Feb 2010 19:06:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.simononsoftware.com/?p=697#comment-643</guid>
		<description>Python 2.4.3 (#1, Jun 11 2009, 14:09:58) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
Type &quot;help&quot;, &quot;copyright&quot;, &quot;credits&quot; or &quot;license&quot; for more information.
&gt;&gt;&gt; len(&quot;hello&quot;)
5
&gt;&gt;&gt; len(&quot;здравствуйте&quot;)
24
&gt;&gt;&gt; len(u&quot;здравствуйте&quot;)
12
&gt;&gt;&gt; 

Note the &#039;u&#039; prefix on the correct unicode version. Without it, you get a plain str: the raw bytes the terminal fed into Python, which len() counts one by one. In UTF-8 each Cyrillic letter takes two bytes, hence 24 rather than 12.</description>
		<content:encoded><![CDATA[<p>Python 2.4.3 (#1, Jun 11 2009, 14:09:58)<br />
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2<br />
Type &quot;help&quot;, &quot;copyright&quot;, &quot;credits&quot; or &quot;license&quot; for more information.<br />
&gt;&gt;&gt; len(&quot;hello&quot;)<br />
5<br />
&gt;&gt;&gt; len(&quot;здравствуйте&quot;)<br />
24<br />
&gt;&gt;&gt; len(u&quot;здравствуйте&quot;)<br />
12<br />
&gt;&gt;&gt; </p>
<p>Note the &#039;u&#039; prefix on the correct unicode version. Without it, you get a plain str: the raw bytes the terminal fed into Python, which len() counts one by one. In UTF-8 each Cyrillic letter takes two bytes, hence 24 rather than 12.</p>
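<p>The same 12 versus 24 difference can be reproduced as a rough sketch in Python 3, assuming UTF-8 as the encoding. The two types are spelled str and bytes there, and the variable names are only illustrative:</p>

```python
# Python 3 sketch: the same word as text and as UTF-8 bytes.
text = "здравствуйте"         # str: a sequence of characters
data = text.encode("utf-8")   # bytes: what a UTF-8 terminal would send

char_count = len(text)        # counts characters
byte_count = len(data)        # counts bytes, two per Cyrillic letter in UTF-8

print(char_count, byte_count)
```

<p>Decoding the bytes with the same encoding gives back the original text, which is what the u prefix arranges for a literal in Python 2.</p>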
]]></content:encoded>
	</item>
</channel>
</rss>
