16 | | Therefore, `unicode` can be seen as the ''safe side'' of textual data: |
17 | | once you're in `unicode`, you know that your text data can contain any |
18 | | kind of multilingual characters, and that you can safely manipulate it |
19 | | the expected way. |
20 | | |
21 | | On the other hand, a `str` object can be used to contain anything, |
22 | | binary data, or some text using any conceivable encoding. |
23 | | But if it supposed to contain some text, it is crucial to know |
24 | | which encoding was used. That knowledge must be known or inferred |
25 | | from somewhere, which is not always a trivial thing to do. |
26 | | |
27 | | In summary, it is not manipulating `unicode` object which is |
28 | | problematic (it is not), but how to go from the "wild" side |
29 | | to the "safe" side... |
30 | | Going from `unicode` to `str` is usually less problematic, |
31 | | because you can always control what kind of encoding you |
32 | | want to use for serializing your Unicode data. |
| 16 | `unicode` provides a real representation of textual data: once you're in `unicode`, you know that your text can contain characters from any language, and that you can safely manipulate it in the expected way. |
| 17 | |
| 18 | On the other hand, a `str` object can contain anything: binary data, or text in any conceivable encoding. But if it's supposed to contain text, it is crucial to know which encoding was used. That encoding must be known or inferred from somewhere, which is not always a trivial thing to do. |
| 19 | |
| 20 | In summary, the problem is not manipulating `unicode` objects (that part is safe), but getting from the "wild" side (`str`) to the "safe" side (`unicode`)… Going from `unicode` to `str` is usually less problematic, because you always control which encoding is used to serialize your Unicode data. |
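
The two directions described above can be sketched in a few lines. This is only an illustration, not part of the original page: the sample bytes and every variable name are assumptions chosen for the example. It uses `b"..."` and `u"..."` literals so that it runs on Python 2.6+ as well as on Python 3, where the same two types are named `bytes` and `str`.

```python
# "Wild" side: raw bytes. Nothing in the object says which encoding was used;
# here we *assume* they are the UTF-8 bytes of "cafe" with an acute accent.
raw = b"caf\xc3\xa9"

# Wild -> safe: decoding only gives the right text with the right encoding.
text = raw.decode("utf-8")      # u"caf\xe9", four characters
wrong = raw.decode("latin-1")   # no error, but garbled: u"caf\xc3\xa9"

# An encoding that cannot interpret the bytes at all fails loudly.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Safe -> wild: we pick the target encoding ourselves.
as_utf8 = text.encode("utf-8")      # b"caf\xc3\xa9"
as_latin1 = text.encode("latin-1")  # b"caf\xe9"
```

Note that the wrong guess (`latin-1`) does not raise an error; it silently produces different text. That is why the encoding has to be known rather than guessed.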