Converting between integers, strings, and byte strings in Python · python

When parsing data, Python can convert between all the common data types. While working through the problem sets on the site for the Stanford open course *Cryptography*, I found I was not fluent with these conversions, so this post records the usage. Some examples below are Python 2 and some are Python 3; the difference is noted where it matters.

Navigation

| | Number | String | Bytes |
| --- | --- | --- | --- |
| To number | [Base conversion](https://lixingcong.github.io/2016/03/06/convert-data-in-python/#進制轉換) | [String to integer](https://lixingcong.github.io/2016/03/06/convert-data-in-python/#字符to整數) | [Bytes to integer](https://lixingcong.github.io/2016/03/06/convert-data-in-python/#字節串to整數) |
| To string | str() | [String encoding and decoding](https://lixingcong.github.io/2016/03/06/convert-data-in-python/#字節串to字符串) | decode('hex') |
| To bytes | [Integer to bytes](https://lixingcong.github.io/2016/03/06/convert-data-in-python/#整數to字節串) | [String to bytes](https://lixingcong.github.io/2016/03/06/convert-data-in-python/#字符串to字節串) | n/a |

There are also the common single-character conversions:

| Function | Purpose | Mnemonic | Notes |
| --- | --- | --- | --- |
| chr | Number to the corresponding ASCII character | `chr` looks like `char`, so it converts to a char | Range 0~255 in Python 2; Python 3 accepts any Unicode code point |
| ord | Single character to its ASCII ordinal | The final letter `d` stands for digit | |

### Base conversion

Decimal to hexadecimal (the result is a string):

~~~
hex(16)  ==>  '0x10'
~~~

Hexadecimal to decimal: `int(STRING, BASE)` converts the string `STRING`, written in base `BASE`, to an int. The first argument must be a string.

~~~
int('0x10', 16)  ==>  16
~~~

Similarly, there is `oct()` for octal and `bin()` for binary.

Hex string to binary string. Prepending `'1'` keeps the leading zeros alive through `int()`; the slice `[3:]` then strips the `0b1` prefix:

~~~
hex_str = '00fe'
bin(int('1' + hex_str, 16))[3:]  # keeps leading zeros
# result: '0000000011111110'
bin(int(hex_str, 16))[2:]  # drops leading zeros
# result: '11111110'
~~~

Binary string to hex string:

~~~
bin_str = '0b0111000011001100'
hex(int(bin_str, 2))
# result: '0x70cc'
~~~

### String to integer

Decimal string:

~~~
int('10')  ==>  10
~~~

Hexadecimal string:

~~~
int('10', 16)  ==>  16
# or
int('0x10', 16)  ==>  16
~~~

### Bytes to integer

Use `struct`, which is common for network packets and is compatible with C data structures.

The formats supported by `struct` are listed below. Sizes are the standard sizes used with `=`, `<`, `>` and `!`. The Python types are the Python 2 names; in Python 3, `long` is simply `int` and `string` is `bytes`.

| Format | C type | Python type | Size (bytes) | Notes |
| --- | --- | --- | --- | --- |
| x | pad byte | no value | 1 | |
| c | char | string of length 1 | 1 | |
| b | signed char | integer | 1 | |
| B | unsigned char | integer | 1 | |
| ? | _Bool | bool | 1 | |
| h | short | integer | 2 | |
| H | unsigned short | integer | 2 | |
| i | int | integer | 4 | |
| I | unsigned int | integer or long | 4 | |
| l | long | integer | 4 | |
| L | unsigned long | long | 4 | |
| q | long long | long | 8 | In native mode, only where the C compiler supports `long long` |
| Q | unsigned long long | long | 8 | In native mode, only where the C compiler supports `long long` |
| f | float | float | 4 | |
| d | double | float | 8 | |
| s | char[] | string | 1 per count (e.g. `4s` is 4 bytes) | |
| p | char[] | string | 1 per count | Pascal string (a length byte followed by the data) |
| P | void * | long | platform dependent | Pointer; native mode only |

Byte order, size and alignment are selected by the first character of the format string:

| Character | Byte order | Size | Alignment |
| --- | --- | --- | --- |
| @ | native | native | native |
| = | native | standard | none |
| < | little-endian | standard | none |
| > | big-endian | standard | none |
| ! | network (= big-endian) | standard | none |

Interpret as short integers:

~~~
import struct
struct.unpack('<hh', bytes(b'\x01\x00\x00\x00'))  ==>  (1, 0)
~~~

Interpret as a long integer:

~~~
struct.unpack('<L', bytes(b'\x01\x00\x00\x00'))  ==>  (1,)
~~~

### Integer to bytes

Pack each value into two bytes:

~~~
struct.pack('<HH', 1, 2)  ==>  b'\x01\x00\x02\x00'
~~~

Pack each value into four bytes:

~~~
struct.pack('<LL', 1, 2)  ==>  b'\x01\x00\x00\x00\x02\x00\x00\x00'
~~~

### Integer to string

Just call the built-in function:

~~~
str(100)
~~~

### String to bytes

[My C++ implementation of encode(hex) and decode(hex)](https://lixingcong.github.io/2016/03/06/convert-data-in-python/#CPP實現encode)

*The difference between decode and encode*

`decode` re-decodes: it reads the text 69dda8455c7dd425 held in CT two characters at a time and turns each pair into one byte, giving \x69\xdd\xa8\x45\x5c\x7d\xd4\x25 (Python 2 only):

~~~
CT = '69dda8455c7dd425'
print "%r" % CT.decode('hex')
~~~

`encode` re-encodes: it turns every single character of the text 69dda8455c7dd425 into its ASCII value, written as two hexadecimal digits. Running the code below prints 36396464613834353563376464343235; that is, the first character '6' is encoded as 0x36 and the second character '9' as 0x39 (Python 2 only):

~~~
CT = '69dda8455c7dd425'
print "%r" % CT.encode('hex')
~~~

*In short: decoding halves the length of the string, encoding doubles it.*

`decode('ascii')` decodes into a Unicode string; in Python 2 the output carries a `u` prefix.
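The `decode('hex')` and `encode('hex')` calls above only work in Python 2. As a rough sketch of the same two operations in Python 3, assuming the standard `bytes.fromhex`, `bytes.hex` and `binascii.hexlify` functions (the names `raw`, `doubled` and `back` are purely illustrative):

```python
import binascii

CT = '69dda8455c7dd425'

# Counterpart of CT.decode('hex'): every two hex digits become one byte.
raw = bytes.fromhex(CT)

# Counterpart of CT.encode('hex'): every character becomes the two hex
# digits of its ASCII code, so the text doubles in length.
doubled = binascii.hexlify(CT.encode('ascii')).decode('ascii')

# Going back from raw bytes to the hex text.
back = raw.hex()

print(len(CT), len(raw), len(doubled))  # 16 8 32
print(doubled)
print(back)
```

The length check makes the rule of thumb concrete: decoding halves the length, encoding doubles it.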
`encode('ascii')` encodes text into ASCII bytes. For a pure-ASCII Python 2 string the output is the same as the original string. The remaining examples in this section are Python 3.

String to bytes:

~~~
'12abc'.encode('ascii')  ==>  b'12abc'
~~~

List of numbers or characters:

~~~
bytes([1, 2, ord('1'), ord('2')])  ==>  b'\x01\x0212'
~~~

Hex string:

~~~
bytes.fromhex('010210')  ==>  b'\x01\x02\x10'
~~~

String of escaped characters:

~~~
bytes(map(ord, '\x01\x02\x31\x32'))  ==>  b'\x01\x0212'
~~~

List of hex values:

~~~
bytes([0x01, 0x02, 0x31, 0x32])  ==>  b'\x01\x0212'
~~~

### Bytes to string

Decode bytes into a string:

~~~
bytes(b'\x31\x32\x61\x62').decode('ascii')  ==>  '12ab'
~~~

Bytes to a hex representation mixed with ASCII (the result contains the literal backslashes):

~~~
str(bytes(b'\x01\x0212'))[2:-1]  ==>  \x01\x0212
~~~

Bytes to a hex representation with exactly two characters per byte:

~~~
import binascii
str(binascii.b2a_hex(b'\x01\x0212'))[2:-1]  ==>  01023132
~~~

Bytes to a list of hex strings:

~~~
[hex(x) for x in bytes(b'\x01\x0212')]  ==>  ['0x1', '0x2', '0x31', '0x32']
~~~

Question: when should a string literal be prefixed with 'r', 'b' or 'u'? The official documentation covers this. In Python 2 the `b` prefix has no effect, so `b'...'` and `'...'` are the same `str` type. The `r` prefix is a different matter: it marks a raw string in which backslashes are not treated as escapes.

The Python 2.x documentation:

> A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.

In other words, a 'b' in front of a string is ignored by Python 2. It exists only for compatibility with Python 3, where it makes the character or string be stored as the bytes type (values 0~255).

The Python 3.3 documentation states:

> Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.

In other words, bytes literals always carry the 'b' prefix and may contain only ASCII characters; larger byte values have to be written as escapes.

Below is an answer from Stack Overflow that I found helpful and want to share.

In Python 2.x

> Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:
>
> * unicode = u'...' literals = sequence of Unicode characters = 3.x str
> * str = '...' literals = sequences of confounded bytes/characters
>   Usually text, encoded in some unspecified encoding.
>   But also used to represent binary data like struct.pack output.
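Python 3.x makes a clear distinction between the types:

> * str = '...' literals = a sequence of Unicode characters (UTF-16 or UTF-32, depending on how Python was compiled)
> * bytes = b'...' literals = a sequence of octets (integers between 0 and 255)

### C++ implementation of encode

This is just a note to self: handling strings in C++ was painful while doing the Cryptography exercises! To avoid reinventing the wheel again, I am writing it down.

~~~
#include <cstring>  // strlen

// Lookup table: nibble value (0-15) -> lowercase hex digit.
static const unsigned char ByteMap[] = {
    '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
};

// Convert one hex digit to its value (0-15).
unsigned char hex_2_dec(unsigned char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return 0;  // not a hex digit
}

// Like encode('hex'): each source byte becomes two hex digits.
// dest must hold at least 2 * len_of_src + 1 bytes; the last one stores '\0'.
void str_encode(const unsigned char *src, unsigned char *dest, int len_of_src) {
    for (int i = 0; i < len_of_src; ++i) {
        int t1 = src[i];
        dest[2 * i] = ByteMap[t1 / 16];
        dest[2 * i + 1] = ByteMap[t1 % 16];
    }
    dest[2 * len_of_src] = 0;  // the terminating '\0' is required
}

// Like decode('hex'): each pair of hex digits becomes one byte.
// src is a '\0'-terminated hex string; dest must hold at least strlen(src) / 2 bytes.
// The output is raw bytes and is not '\0'-terminated; a trailing odd digit is ignored.
void str_decode(const unsigned char *src, unsigned char *dest) {
    int len_of_src = static_cast<int>(strlen((const char *)src));
    for (int i = 1; i < len_of_src; i += 2) {
        unsigned char t1 = hex_2_dec(src[i - 1]);
        t1 = 16 * t1 + hex_2_dec(src[i]);
        dest[i / 2] = t1;
    }
}
~~~

Acknowledgements

This post is reposted from the CSDN blog post [《python常用的十進制、16進制、字符串、字節串之間的轉換》](http://blog.csdn.net/crylearner/article/details/38521685).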
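One way to sanity-check the C++ helpers above is to port the same table-lookup logic to Python 3 and compare it with the standard library. This is only a sketch: the function names simply mirror the C++ ones, and the port returns new objects instead of writing into a caller-supplied buffer.

```python
import binascii

BYTE_MAP = '0123456789abcdef'  # same lookup table as ByteMap in the C++ code


def str_encode(src: bytes) -> str:
    """Port of the C++ str_encode: each byte becomes two hex digits."""
    out = []
    for b in src:
        out.append(BYTE_MAP[b // 16])  # high nibble
        out.append(BYTE_MAP[b % 16])   # low nibble
    return ''.join(out)


def str_decode(src: str) -> bytes:
    """Port of the C++ str_decode: each pair of hex digits becomes one byte."""
    out = bytearray()
    for i in range(1, len(src), 2):
        out.append(16 * int(src[i - 1], 16) + int(src[i], 16))
    return bytes(out)


sample = b'\x01\x0212'
encoded = str_encode(sample)
print(encoded)  # 01023132

# The hand-written version should agree with binascii and round-trip cleanly.
print(encoded == binascii.hexlify(sample).decode('ascii'))
print(str_decode(encoded) == sample)
```

Feeding the same inputs to the C++ functions and comparing the output with `encoded` gives a quick regression test.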