
# QRegExp Class Reference

## [[QtCore](index.htm) module]

The QRegExp class provides pattern matching using regular expressions. [More...](#details)

### Types

* `enum CaretMode { CaretAtZero, CaretAtOffset, CaretWontMatch }`
* `enum PatternSyntax { RegExp, RegExp2, Wildcard, FixedString, WildcardUnix, W3CXmlSchema11 }`

### Methods

* `__init__ (self)`
* `__init__ (self, QString pattern, Qt.CaseSensitivity cs = Qt.CaseSensitive, PatternSyntax syntax = QRegExp.RegExp)`
* `__init__ (self, QRegExp rx)`
* `QString cap (self, int nth = 0)`
* `int captureCount (self)`
* `QStringList capturedTexts (self)`
* `Qt.CaseSensitivity caseSensitivity (self)`
* `QString errorString (self)`
* `bool exactMatch (self, QString str)`
* `int indexIn (self, QString str, int offset = 0, CaretMode caretMode = QRegExp.CaretAtZero)`
* `bool isEmpty (self)`
* `bool isMinimal (self)`
* `bool isValid (self)`
* `int lastIndexIn (self, QString str, int offset = -1, CaretMode caretMode = QRegExp.CaretAtZero)`
* `int matchedLength (self)`
* `int numCaptures (self)`
* `QString pattern (self)`
* `PatternSyntax patternSyntax (self)`
* `int pos (self, int nth = 0)`
* `setCaseSensitivity (self, Qt.CaseSensitivity cs)`
* `setMinimal (self, bool minimal)`
* `setPattern (self, QString pattern)`
* `setPatternSyntax (self, PatternSyntax syntax)`
* `swap (self, QRegExp other)`

### Static Methods

* `QString escape (QString str)`

### Special Methods

* `bool __eq__ (self, QRegExp rx)`
* `bool __ne__ (self, QRegExp rx)`
* `str __repr__ (self)`

* * *

## Detailed Description

The QRegExp class provides pattern matching using regular expressions.

A regular expression, or "regexp", is a pattern for matching substrings in a text. This is useful in many contexts, e.g.,

| Use | Description |
| --- | --- |
| Validation | A regexp can test whether a substring meets some criteria, e.g. is an integer or contains no whitespace. |
| Searching | A regexp provides more powerful pattern matching than simple substring matching, e.g., match one of the words _mail_, _letter_ or _correspondence_, but none of the words _email_, _mailman_, _mailer_, _letterbox_, etc.
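| Search and Replace | A regexp can replace all occurrences of a substring with a different substring, e.g., replace all occurrences of _&_ with _&amp;_ except where the _&_ is already followed by an _amp;_. |
| String Splitting | A regexp can be used to identify where a string should be split apart, e.g. splitting tab-delimited strings. |

This page gives a brief introduction to regexps, a description of Qt's regexp language, some examples, and the function documentation itself. QRegExp is modeled on Perl's regexp language. It fully supports Unicode. QRegExp can also be used in a simpler _wildcard mode_ that is similar to the functionality found in command shells. The syntax rules used by QRegExp can be changed with [setPatternSyntax](qregexp.html#setPatternSyntax)(). In particular, the pattern syntax can be set to [QRegExp.FixedString](qregexp.html#PatternSyntax-enum), which means the pattern to be matched is interpreted as a plain string, i.e. special characters (e.g. backslash) are not escaped.

A good text on regexps is _Mastering Regular Expressions_ (Third Edition) by Jeffrey E. F. Friedl, ISBN 0-596-52812-4.

### Introduction

Regexps are built up from expressions, quantifiers, and assertions. The simplest expression is a character, e.g. **x** or **5**. An expression can also be a set of characters enclosed in square brackets. **[ABCD]** will match an **A** or a **B** or a **C** or a **D**. We can write this same expression as **[A-D]**, and an expression to match any capital letter in the English alphabet is written as **[A-Z]**.

A quantifier specifies the number of occurrences of an expression that must be matched. **x{1,1}** means match one and only one **x**. **x{1,5}** means match a sequence of **x** characters that contains at least one **x** but no more than five.

Note that in general regexps cannot be used to check for balanced brackets or tags. For example, a regexp can be written to match an opening HTML `<b>` and its closing `</b>` if the `<b>` tags are not nested, but if the `<b>` tags are nested, that same regexp will match an opening `<b>` tag with the wrong closing `</b>`. For the fragment `<b>bold <b>bolder</b></b>`, the first `<b>` would be matched with the first `</b>`, which is not correct. It is possible to write a regexp that matches nested brackets or tags correctly, but only if the number of nesting levels is fixed and known. If the number of nesting levels is not fixed and known, it is impossible to write a regexp that will not fail.

Suppose we want a regexp to match integers in the range 0 to 99. At least one digit is required, so we start with the expression **[0-9]{1,1}**, which matches a single digit exactly once. This regexp matches integers in the range 0 to 9. To match integers up to 99, increase the maximum number of occurrences to 2, so the regexp becomes **[0-9]{1,2}**. This regexp satisfies the original requirement to match integers from 0 to 99, but it will also match integers that occur in the middle of strings. If we want the matched integer to be the whole string, we must use the anchor assertions, **^** (caret) and **$** (dollar). When **^** is the first character in a regexp, it means the regexp must match from the beginning of the string. When **$** is the last character of the regexp, it means the regexp must match to the end of the string. The regexp becomes **^[0-9]{1,2}$**. Note that assertions, e.g. **^** and **$**, do not match characters but locations in the string.

If you have seen regexps described elsewhere, they may have looked different from the ones shown here. This is because some sets of characters and some quantifiers are so common that they have been given special symbols to represent them. **[0-9]** can be replaced with the symbol **\d**. The quantifier to match exactly one occurrence, **{1,1}**, can be replaced with the expression itself, i.e. **x{1,1}** is the same as **x**. So our 0 to 99 matcher could be written as **^\d{1,2}$**. It can also be written **^\d\d{0,1}$**, i.e. _From the start of the string, match a digit, followed immediately by 0 or 1 digits_. In practice, it would be written as **^\d\d?$**. The **?** is shorthand for the quantifier **{0,1}**, i.e. 0 or 1 occurrences. **?** makes an expression optional. The regexp **^\d\d?$**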
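means _From the beginning of the string, match one digit, followed immediately by 0 or 1 more digit, followed immediately by end of string_.

To write a regexp that matches one of the words 'mail' _or_ 'letter' _or_ 'correspondence' but does not match words that contain these words, e.g. 'email', 'mailman', 'mailer', and 'letterbox', start with a regexp that matches 'mail'. Expressed fully, the regexp is **m{1,1}a{1,1}i{1,1}l{1,1}**, but because a character expression is automatically quantified by **{1,1}**, we can simplify the regexp to **mail**, i.e. an 'm' followed by an 'a' followed by an 'i' followed by an 'l'. Now we can use the vertical bar **|**, which means **or**, to include the other two words, so our regexp for matching any of the three words becomes **mail|letter|correspondence**: match 'mail' **or** 'letter' **or** 'correspondence'. While this regexp will match one of the three words we want to match, it will also match words we don't want to match, e.g. 'email'. To prevent the regexp from matching unwanted words, we must tell it to begin and end the match at word boundaries. First we enclose our regexp in parentheses, **(mail|letter|correspondence)**. Parentheses group expressions together, and they identify a part of the regexp that we wish to [capture](#capturing-text). Enclosing the expression in parentheses allows us to use it as a component in more complex regexps. It also allows us to examine which of the three words was actually matched. To force the match to begin and end on word boundaries, we enclose the regexp in **\b** _word boundary_ assertions: **\b(mail|letter|correspondence)\b**. Now the regexp means: _Match a word boundary, followed by the regexp in parentheses, followed by a word boundary_. The **\b** assertion matches a _position_ in the regexp, not a _character_. A word boundary is any non-word character, e.g. a space, newline, or the beginning or ending of a string.

If we want to replace ampersand characters with the HTML entity **&amp;**, the regexp to match is simply **&**. But this regexp will also match ampersands that have already been converted to HTML entities. We want to replace only ampersands that are not already followed by **amp;**. For this, we need the negative lookahead assertion, **(?!__)**. The regexp can then be written as **&(?!amp;)**, i.e. _Match an ampersand that is_ **not** _followed by_ **amp;**.

If we want to count all the occurrences of 'Eric' and 'Eirik' in a string, two valid solutions are **\b(Eric|Eirik)\b** and **\bEi?ri[ck]\b**. The word boundary assertion '\b' is required to avoid matching words that contain either name, e.g. 'Ericsson'. Note that the second regexp matches more spellings than we want: 'Eric', 'Erik', 'Eiric' and 'Eirik'.

Some of the examples discussed above are implemented in the [code examples](#code-examples) section.

### Characters and Abbreviations for Sets of Characters

| Element | Meaning |
| --- | --- |
| **c** | A character represents itself unless it has a special regexp meaning. e.g. **c** matches the character _c_. |
| **\c** | A character that follows a backslash matches the character itself, except as specified below. e.g., To match a literal caret at the beginning of a string, write **\^**. |
| **\a** | Matches the ASCII bell (BEL, 0x07). |
| **\f** | Matches the ASCII form feed (FF, 0x0C). |
| **\n** | Matches the ASCII line feed (LF, 0x0A, Unix newline).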
| **\r** | Matches the ASCII carriage return (CR, 0x0D). |
| **\t** | Matches the ASCII horizontal tab (HT, 0x09). |
| **\v** | Matches the ASCII vertical tab (VT, 0x0B). |
| **\x_hhhh_** | Matches the Unicode character corresponding to the hexadecimal number _hhhh_ (between 0x0000 and 0xFFFF). |
| **\0_ooo_** (i.e., \zero _ooo_) | Matches the ASCII/Latin1 character for the octal number _ooo_ (between 0 and 0377). |
| **. (dot)** | Matches any character (including newline). |
| **\d** | Matches a digit ([QChar.isDigit](qchar.html#isDigit)()). |
| **\D** | Matches a non-digit. |
| **\s** | Matches a whitespace character ([QChar.isSpace](qchar.html#isSpace)()). |
| **\S** | Matches a non-whitespace character. |
| **\w** | Matches a word character ([QChar.isLetterOrNumber](qchar.html#isLetterOrNumber)(), [QChar.isMark](qchar.html#isMark)(), or '_'). |
| **\W** | Matches a non-word character. |
| **\_n_** | The _n_-th [backreference](#backreferences), e.g. \1, \2, etc. |

**Note:** The C++ compiler transforms backslashes in strings. To include a **\** in a regexp, enter it twice, i.e. `\\`. To match the backslash character itself, enter it four times, i.e. `\\\\`.

### Sets of Characters

Square brackets mean match any character contained in the square brackets. The character set abbreviations described above can appear in a character set in square brackets. Except for the character set abbreviations and the following two exceptions, characters do not have special meanings in square brackets.

| Character | Meaning |
| --- | --- |
| **^** | The caret negates the character set if it occurs as the first character (i.e. immediately after the opening square bracket). **[abc]** matches 'a' or 'b' or 'c', but **[^abc]** matches anything _but_ 'a' or 'b' or 'c'. |
| **-** | The dash indicates a range of characters. **[W-Z]** matches 'W' or 'X' or 'Y' or 'Z'. |

Using the predefined character set abbreviations is more portable than using character ranges across platforms and languages. For example, **[0-9]** matches a digit in Western alphabets but **\d** matches a digit in _any_ alphabet.

Note: In other regexp documentation, sets of characters are often called "character classes".

### Quantifiers

By default, an expression is automatically quantified by **{1,1}**, i.e. it should occur exactly once. In the following list, **_E_** stands for expression. An expression is a character, or an abbreviation for a set of characters, or a set of characters in square brackets, or an expression in parentheses.

| Quantifier | Meaning |
| --- | --- |
| **_E_?** | Matches zero or one occurrences of _E_. This quantifier means _The previous expression is optional_, because it will match whether or not the expression is found. **_E_?** is the same as **_E_{0,1}**. e.g., **dents?** matches 'dent' or 'dents'.
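| **_E_+** | Matches one or more occurrences of _E_. **_E_+** is the same as **_E_{1,}**. e.g., **0+** matches '0', '00', '000', etc. |
| **_E_*** | Matches zero or more occurrences of _E_. It is the same as **_E_{0,}**. The ***** quantifier is often used in error where **+** should be used. For example, if **\s*$** is used in an expression to match strings that end in whitespace, it will match every string because **\s*$** means _Match zero or more whitespaces followed by end of string_. The correct regexp to match strings that have at least one trailing whitespace character is **\s+$**. |
| **_E_{n}** | Matches exactly _n_ occurrences of _E_. **_E_{n}** is the same as repeating _E_ _n_ times. For example, **x{5}** is the same as **xxxxx**. It is also the same as **_E_{n,n}**, e.g. **x{5,5}**. |
| **_E_{n,}** | Matches at least _n_ occurrences of _E_. |
| **_E_{,m}** | Matches at most _m_ occurrences of _E_. **_E_{,m}** is the same as **_E_{0,m}**. |
| **_E_{n,m}** | Matches at least _n_ and at most _m_ occurrences of _E_. |

To apply a quantifier to more than just the preceding character, use parentheses to group characters together in an expression. For example, **tag+** matches a 't' followed by an 'a' followed by at least one 'g', whereas **(tag)+** matches at least one occurrence of 'tag'.

Note: Quantifiers are normally "greedy". They always match as much text as they can. For example, **0+** matches the first zero it finds and all the consecutive zeros after the first zero. Applied to '20005', it matches the '000' in the middle. Quantifiers can be made non-greedy, see [setMinimal](qregexp.html#setMinimal)().

### Capturing Text

Parentheses allow us to group elements together so that we can quantify and capture them. For example, if we have the expression **mail|letter|correspondence** and it matches a string, we know that _one_ of the words matched but not which one. Using parentheses allows us to "capture" whatever is matched within their bounds, so if we used **(mail|letter|correspondence)** and matched this regexp against the string "I sent you some email" we can use the [cap](qregexp.html#cap)() or [capturedTexts](qregexp.html#capturedTexts)() functions to extract the matched characters, in this case 'mail'.

We can use captured text within the regexp itself. To refer to the captured text we use _backreferences_, which are indexed from 1, the same as for [cap](qregexp.html#cap)(). For example, we could search for duplicate words in a string using **\b(\w+)\W+\1\b**, which means match a word boundary, followed by one or more word characters, followed by one or more non-word characters, followed by the same text as the first parenthesized expression, followed by a word boundary.

If we want to use parentheses purely for grouping and not for capturing, we can use the non-capturing syntax, e.g. **(?:green|blue)**. Non-capturing parentheses begin '(?:' and end ')'. In this example we match either 'green' or 'blue' but we do not capture the match, so we only know whether or not we matched, not which color we actually found. Using non-capturing parentheses is more efficient than using capturing parentheses since the regexp engine has to do less book-keeping.

Both capturing and non-capturing parentheses may be nested.

For historical reasons, quantifiers (e.g. *****) that apply to capturing parentheses are more "greedy" than other quantifiers. For example, **a*(a*)** will match "aaa" with cap(1) == "aaa". This behavior is different from what other regexp engines do (notably, Perl). To obtain a more intuitive capturing behavior, specify [QRegExp.RegExp2](qregexp.html#PatternSyntax-enum) to the QRegExp constructor or call setPatternSyntax([QRegExp.RegExp2](qregexp.html#PatternSyntax-enum)).

When the number of matches cannot be determined in advance, a common idiom is to use [cap](qregexp.html#cap)() in a loop. For example:

```
QRegExp rx("(\\d+)");
QString str = "Offsets: 12 14 99 231 7";
QStringList list;
int pos = 0;

while ((pos = rx.indexIn(str, pos)) != -1) {
    list << rx.cap(1);
    pos += rx.matchedLength();
}
// list: ["12", "14", "99", "231", "7"]
```

### Assertions

Assertions make some statement about the text at the point where they occur in the regexp, but they do not match any characters. In the following list **_E_** stands for any expression.

| Assertion | Meaning |
| --- | --- |
| **^** | The caret signifies the beginning of the string. If you wish to match a literal `^` you must escape it by writing `\\^`. For example, **^#include** will only match strings which _begin_ with the characters '#include'. (When the caret is the first character of a character set it has a special meaning, see [Sets of Characters](#sets-of-characters).)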
| **$** | The dollar signifies the end of the string. For example **\d\s*$** will match strings which end with a digit optionally followed by whitespace. If you wish to match a literal `$` you must escape it by writing `\\$`. |
| **\b** | A word boundary. For example the regexp **\bOK\b** means match immediately after a word boundary (e.g. start of string or whitespace) the letter 'O' then the letter 'K' immediately before another word boundary (e.g. end of string or whitespace). But note that the assertion does not actually match any whitespace so if we write **(\bOK\b)** and we have a match it will only contain 'OK' even if the string is "It's OK now". |
| **\B** | A non-word boundary. This assertion is true wherever **\b** is false. For example if we searched for **\Bon\B** in "Left on" the match would fail (space and end of string aren't non-word boundaries), but it would match in "tonne". |
| **(?=_E_)** | Positive lookahead. This assertion is true if the expression matches at this point in the regexp. For example, **const(?=\s+char)** matches only the 'const' whenever it is followed by 'char', as in 'static const char *'. (Compare with **const\s+char**, whose match in the same text is 'const char'.) |
| **(?!_E_)** | Negative lookahead. This assertion is true if the expression does not match at this point in the regexp. For example, **const(?!\s+char)** matches 'const' _except_ when it is followed by 'char'. |

### Wildcard Matching

Most command shells such as _bash_ or _cmd.exe_ support "file globbing", the ability to identify a group of files by using wildcards. The [setPatternSyntax](qregexp.html#setPatternSyntax)() function is used to switch between regexp and wildcard mode. Wildcard matching is much simpler than full regexps and has only four features:

| Element | Meaning |
| --- | --- |
| **c** | Any character represents itself apart from those mentioned below. Thus **c** matches the character _c_. |
| **?** | Matches any single character. It is the same as **.** in full regexps. |
| ***** | Matches zero or more of any characters. It is the same as **.*** in full regexps. |
| **[...]** | Sets of characters can be represented in square brackets, similar to full regexps. Within the character class, like outside, backslash has no special meaning. |

In the Wildcard mode, the wildcard characters cannot be escaped. In the [WildcardUnix](qregexp.html#PatternSyntax-enum) mode, the character '\' escapes the wildcard.

For example, if we are in wildcard mode and have strings which contain filenames, we could identify HTML files with ***.html**. This will match zero or more characters followed by a dot followed by 'h', 't', 'm' and 'l'.

To test a string against a wildcard expression, use [exactMatch](qregexp.html#exactMatch)(). For example:

```
QRegExp rx("*.txt");
rx.setPatternSyntax(QRegExp::Wildcard);
rx.exactMatch("README.txt");        // returns true
rx.exactMatch("welcome.txt.bak");   // returns false
```

### Notes for Perl Users

Most of the character class abbreviations supported by Perl are supported by QRegExp, see [characters and abbreviations for sets of characters](#characters-and-abbreviations-for-sets-of-characters).

In QRegExp, apart from within character classes, `^` always signifies the start of the string, so carets must always be escaped unless used for that purpose. In Perl the meaning of caret varies automatically depending on where it occurs, so escaping it is rarely necessary. The same applies to `$`, which in QRegExp always signifies the end of the string.

QRegExp's quantifiers are the same as Perl's greedy quantifiers (but see the [note above](#greedy-quantifiers)). Non-greedy matching cannot be applied to individual quantifiers, but can be applied to all the quantifiers in the pattern. For example, to match the Perl regexp **ro+?m** requires:

```
QRegExp rx("ro+m");
rx.setMinimal(true);
```

The equivalent of Perl's `/i` option is setCaseSensitivity([Qt.CaseInsensitive](qt.html#CaseSensitivity-enum)).

Perl's `/g` option can be emulated using a [loop](#cap-in-a-loop).

In QRegExp **.** matches any character, therefore all QRegExp regexps have the equivalent of Perl's `/s` option. QRegExp does not have an equivalent to Perl's `/m` option, but this can be emulated in various ways, for example by splitting the input into lines or by looping with a regexp that searches for newlines.

Because QRegExp is string oriented, there are no \A, \Z, or \z assertions. The \G assertion is not supported but can be emulated in a loop.

Perl's $& is cap(0) or [capturedTexts](qregexp.html#capturedTexts)()[0]. There are no QRegExp equivalents for $`, $' or $+. Perl's capturing variables, $1, $2, ... correspond to cap(1) or [capturedTexts](qregexp.html#capturedTexts)()[1], cap(2) or [capturedTexts](qregexp.html#capturedTexts)()[2], etc.

To substitute a pattern use [QString.replace](qstring.html#replace)().

Perl's extended `/x` syntax is not supported, nor are directives, e.g. (?i), or regexp comments, e.g. (?#comment). On the other hand, C++'s rules for literal strings can be used to achieve the same:

```
QRegExp mark("\\b"      // word boundary
             "[Mm]ark"  // the word we want to match
            );
```

Both zero-width positive and zero-width negative lookahead assertions, (?=pattern) and (?!pattern), are supported with the same syntax as Perl. Perl's lookbehind assertions, "independent" subexpressions and conditional expressions are not supported.

Non-capturing parentheses are also supported, with the same (?:pattern) syntax.

See [QString.split](qstring.html#split)() and [QStringList.join](qstringlist.html#join)() for equivalents to Perl's split and join functions.

Note: because C++ transforms backslashes, they must be written _twice_ in code, e.g. **\b** must be written **\\b**.

### Code Examples

```
QRegExp rx("^\\d\\d?$");    // match integers 0 to 99
rx.indexIn("123");          // returns -1 (no match)
rx.indexIn("-6");           // returns -1 (no match)
rx.indexIn("6");            // returns 0 (matched at position 0)
```

The third string matches '6'. This is a simple validation regexp for integers in the range 0 to 99.

```
QRegExp rx("^\\S+$");       // match strings without whitespace
rx.indexIn("Hello world");  // returns -1 (no match)
rx.indexIn("This_is-OK");   // returns 0 (matched at position 0)
```

The second string matches 'This_is-OK'. We've used the character set abbreviation '\S' (non-whitespace) and the anchors to match strings which contain no whitespace.

In the following example we match strings containing 'mail' or 'letter' or 'correspondence' but only match whole words, i.e. not 'email':

```
QRegExp rx("\\b(mail|letter|correspondence)\\b");
rx.indexIn("I sent you an email");     // returns -1 (no match)
rx.indexIn("Please write the letter"); // returns 17
```

The second string matches "Please write the letter". The word 'letter' is also captured (because of the parentheses). We can see what text we've captured like this:

```
QString captured = rx.cap(1); // captured == "letter"
```

This will capture the text from the first set of capturing parentheses (counting capturing left parentheses from left to right). The parentheses are counted from 1 since cap(0) is the whole matched regexp (equivalent to '&' in most regexp engines).

```
QRegExp rx("&(?!amp;)");      // match ampersands but not &amp;
QString line1 = "This & that";
line1.replace(rx, "&amp;");
// line1 == "This &amp; that"
QString line2 = "His &amp; hers & theirs";
line2.replace(rx, "&amp;");
// line2 == "His &amp; hers &amp; theirs"
```

Here we've passed the QRegExp to [QString](qstring.html)'s replace() function to replace the matched text with new text.

```
QString str = "One Eric another Eirik, and an Ericsson. "
              "How many Eiriks, Eric?";
QRegExp rx("\\b(Eric|Eirik)\\b"); // match Eric or Eirik
int pos = 0;    // where we are in the string
int count = 0;  // how many Eric and Eirik's we've counted
while (pos >= 0) {
    pos = rx.indexIn(str, pos);
    if (pos >= 0) {
        ++pos;      // move along in str
        ++count;    // count our Eric or Eirik
    }
}
```

We've used the [indexIn](qregexp.html#indexIn)() function to repeatedly match the regexp in the string. Note that instead of moving forward by one character at a time with `pos++` we could have written `pos += rx.matchedLength()` to skip over the already matched string. The count will equal 3, matching the first 'Eric', the 'Eirik' and the final 'Eric' in "One Eric another Eirik, and an Ericsson. How many Eiriks, Eric?"; it doesn't match 'Ericsson' or 'Eiriks' because in those words the name is not bounded by word boundaries on both sides.

One common use of regexps is to split lines of delimited data into their component fields.

```
str = "Nokia Corporation\tqt.nokia.com\tNorway";
QString company, web, country;
rx.setPattern("^([^\t]+)\t([^\t]+)\t([^\t]+)$");
if (rx.indexIn(str) != -1) {
    company = rx.cap(1);
    web = rx.cap(2);
    country = rx.cap(3);
}
```

In this example our input lines have the format company name, web address and country. Unfortunately the regexp is rather long and not very versatile: the code will break if we add any more fields. A simpler and better solution is to look for the separator, '\t' in this case, and take the surrounding text. The [QString.split](qstring.html#split)() function can take a separator string or regexp as an argument and split a string accordingly.

```
QStringList field = str.split("\t");
```

Here field[0] is the company, field[1] the web address and so on.

To imitate the matching of a shell we can use wildcard mode.

```
QRegExp rx("*.html");
rx.setPatternSyntax(QRegExp::Wildcard);
rx.exactMatch("index.html");                // returns true
rx.exactMatch("default.htm");               // returns false
rx.exactMatch("readme.txt");                // returns false
```

Wildcard matching can be convenient because of its simplicity, but any wildcard regexp can be defined using full regexps, e.g. **.*\.html$**. Notice that we can't match both `.html` and `.htm` files with a wildcard unless we use ***.htm***, which will also match 'test.html.bak'. A full regexp gives us the precision we need, **.*\.html?$**.

QRegExp can match case insensitively using [setCaseSensitivity](qregexp.html#setCaseSensitivity)(), and can use non-greedy matching, see [setMinimal](qregexp.html#setMinimal)(). By default QRegExp uses full regexps but this can be changed with [setWildcard](index.htm#setWildcard)(). Searching can be forward with [indexIn](qregexp.html#indexIn)() or backward with [lastIndexIn](qregexp.html#lastIndexIn)(). Captured text can be accessed using [capturedTexts](qregexp.html#capturedTexts)(), which returns a string list of all captured strings, or using [cap](qregexp.html#cap)(), which returns the captured string for the given index. The [pos](qregexp.html#pos)() function takes a match index and returns the position in the string where the match was made (or -1 if there was no match).

* * *

## Type Documentation

```
QRegExp.CaretMode
```

The CaretMode enum defines the different meanings of the caret (**^**) in a regular expression. The possible values are:

| Constant | Value | Description |
| --- | --- | --- |
| `QRegExp.CaretAtZero` | `0` | The caret corresponds to index 0 in the searched string. |
| `QRegExp.CaretAtOffset` | `1` | The caret corresponds to the start offset of the search. |
| `QRegExp.CaretWontMatch` | `2` | The caret never matches. |

```
QRegExp.PatternSyntax
```

The syntax used to interpret the meaning of the pattern.

| Constant | Value | Description |
| --- | --- | --- |
| `QRegExp.RegExp` | `0` | A rich Perl-like pattern matching syntax. This is the default. |
| `QRegExp.RegExp2` | `3` | Like RegExp, but with [greedy quantifiers](qregexp.html#greedy-quantifiers). This will be the default in Qt 5. (Introduced in Qt 4.2.) |
| `QRegExp.Wildcard` | `1` | This provides a simple pattern matching syntax similar to that used by shells (command interpreters) for "file globbing". See [Wildcard Matching](qregexp.html#wildcard-matching). |
| `QRegExp.WildcardUnix` | `4` | This is similar to Wildcard but with the behavior of a Unix shell. The wildcard characters can be escaped with the character "\". |
| `QRegExp.FixedString` | `2` | The pattern is a fixed string. This is equivalent to using the RegExp pattern on a string in which all metacharacters are escaped using [escape](qregexp.html#escape)(). |
| `QRegExp.W3CXmlSchema11` | `5` | The pattern is a regular expression as defined by the W3C XML Schema 1.1 specification. |

**See also** [setPatternSyntax](qregexp.html#setPatternSyntax)().

* * *

## Method Documentation

```
QRegExp.__init__ (self)
```

Constructs an empty regexp.

**See also** [isValid](qregexp.html#isValid)() and [errorString](qregexp.html#errorString)().

```
QRegExp.__init__ (self, QString pattern, Qt.CaseSensitivity cs = Qt.CaseSensitive, PatternSyntax syntax = QRegExp.RegExp)
```

Constructs a regular expression object for the given _pattern_ string. The pattern must be given using wildcard notation if _syntax_ is [Wildcard](qregexp.html#PatternSyntax-enum); the default is [RegExp](qregexp.html#PatternSyntax-enum). The pattern is case sensitive, unless _cs_ is [Qt.CaseInsensitive](qt.html#CaseSensitivity-enum). Matching is greedy (maximal), but can be changed by calling [setMinimal](qregexp.html#setMinimal)().

**See also** [setPattern](qregexp.html#setPattern)(), [setCaseSensitivity](qregexp.html#setCaseSensitivity)(), and [setPatternSyntax](qregexp.html#setPatternSyntax)().

```
QRegExp.__init__ (self, QRegExp rx)
```

Constructs a regular expression as a copy of _rx_.

**See also** [operator=](qregexp.html#operator-eq)().

```
QString QRegExp.cap (self, int nth = 0)
```

Returns the text captured by the _nth_ subexpression. The entire match has index 0 and the parenthesized subexpressions have indexes starting from 1 (excluding non-capturing parentheses).

```
QRegExp rxlen("(\\d+)(?:\\s*)(cm|inch)");
int pos = rxlen.indexIn("Length: 189cm");
if (pos > -1) {
    QString value = rxlen.cap(1); // "189"
    QString unit = rxlen.cap(2);  // "cm"
    // ...
}
```

The order of elements matched by cap() is as follows. The first element, cap(0), is the entire matching string. Each subsequent element corresponds to the next capturing open left parenthesis. Thus cap(1) is the text of the first capturing parentheses, cap(2) is the text of the second, and so on.

**See also** [capturedTexts](qregexp.html#capturedTexts)() and [pos](qregexp.html#pos)().

```
int QRegExp.captureCount (self)
```

Returns the number of captures contained in the regular expression.

This function was introduced in Qt 4.6.

```
QStringList QRegExp.capturedTexts (self)
```

Returns a list of the captured text strings.

The first string in the list is the entire matched string. Each subsequent list element contains a string that matched a (capturing) subexpression of the regexp.

For example:

```
QRegExp rx("(\\d+)(\\s*)(cm|inch(es)?)");
int pos = rx.indexIn("Length: 36 inches");
QStringList list = rx.capturedTexts();
// list is now ("36 inches", "36", " ", "inches", "es")
```

The above example also captures elements that may be present but which we have no interest in. This problem can be solved by using non-capturing parentheses:

```
QRegExp rx("(\\d+)(?:\\s*)(cm|inch(?:es)?)");
int pos = rx.indexIn("Length: 36 inches");
QStringList list = rx.capturedTexts();
// list is now ("36 inches", "36", "inches")
```

Note that if you want to iterate over the list, you should iterate over a copy, e.g.

```
QStringList list = rx.capturedTexts();
QStringList::iterator it = list.begin();
while (it != list.end()) {
    myProcessing(*it);
    ++it;
}
```

Some regexps can match an indeterminate number of times. For example, if the input string is "Offsets: 12 14 99 231 7" and the regexp, `rx`, is **(\d+)+**, we would hope to get a list of all the numbers matched. However, after calling `rx.indexIn(str)`, capturedTexts() will return the list ("12", "12"), i.e. the entire match was "12" and the first subexpression matched was "12". The correct approach is to use [cap](qregexp.html#cap)() in a [loop](qregexp.html#cap-in-a-loop).

The order of elements in the string list is as follows. The first element is the entire matching string. Each subsequent element corresponds to the next capturing open left parenthesis. Thus capturedTexts()[1] is the text of the first capturing parentheses, capturedTexts()[2] is the text of the second, and so on (corresponding to $1, $2, etc., in some other regexp languages).

**See also** [cap](qregexp.html#cap)() and [pos](qregexp.html#pos)().

```
Qt.CaseSensitivity QRegExp.caseSensitivity (self)
```

Returns
[Qt.CaseSensitive](qt.html#CaseSensitivity-enum) if the regexp is matched case sensitively; otherwise returns [Qt.CaseInsensitive](qt.html#CaseSensitivity-enum).

**See also** [setCaseSensitivity](qregexp.html#setCaseSensitivity)(), [patternSyntax](qregexp.html#patternSyntax)(), [pattern](qregexp.html#pattern)(), and [isMinimal](qregexp.html#isMinimal)().

```
QString QRegExp.errorString (self)
```

Returns a text string that explains why a regexp pattern is invalid, if it is invalid; otherwise returns "no error occurred".

**See also** [isValid](qregexp.html#isValid)().

```
QString QRegExp.escape (QString str)
```

Returns the string _str_ with every regexp special character escaped with a backslash. The special characters are $, (, ), *, +, ., ?, [, \, ], ^, {, | and }.

For example:

```
s1 = QRegExp::escape("bingo");   // s1 == "bingo"
s2 = QRegExp::escape("f(x)");    // s2 == "f\\(x\\)"
```

This function is useful for constructing regexp patterns dynamically:

```
QRegExp rx("(" + QRegExp::escape(name) +
           "|" + QRegExp::escape(alias) + ")");
```

**See also** [setPatternSyntax](qregexp.html#setPatternSyntax)().

```
bool QRegExp.exactMatch (self, QString str)
```

Returns True if _str_ is matched exactly by this regular expression; otherwise returns False. You can determine how much of the string was matched by calling [matchedLength](qregexp.html#matchedLength)().

For a given regexp string R, exactMatch("R") is the equivalent of indexIn("^R$"), since exactMatch() effectively encloses the regexp in the start of string and end of string anchors, except that it sets [matchedLength](qregexp.html#matchedLength)() differently.

For example, if the regular expression is **blue**, then exactMatch() returns True only for input `blue`. For inputs `bluebell`, `blutak` and `lightblue`, exactMatch() returns False and [matchedLength](qregexp.html#matchedLength)() will return 4, 3 and 0 respectively.

Although const, this function sets [matchedLength](qregexp.html#matchedLength)(), [capturedTexts](qregexp.html#capturedTexts)(), and [pos](qregexp.html#pos)().

**See also** [indexIn](qregexp.html#indexIn)() and [lastIndexIn](qregexp.html#lastIndexIn)().

```
int QRegExp.indexIn (self, QString str, int offset = 0, CaretMode caretMode = QRegExp.CaretAtZero)
```

Attempts to find a match in _str_ from position _offset_ (0 by default). If _offset_ is -1, the search starts at the last character; if -2,
at the next to last character; etc.

Returns the position of the first match, or -1 if there was no match.

The _caretMode_ parameter can be used to instruct whether **^** should match at index 0 or at _offset_.

You might prefer to use [QString.indexOf](qstring.html#indexOf)(), [QString.contains](qstring.html#contains)(), or even [QStringList.filter](qstringlist.html#filter)(). To replace matches use [QString.replace](qstring.html#replace)().

For example:

```
QString str = "offsets: 1.23 .50 71.00 6.00";
QRegExp rx("\\d*\\.\\d+");    // primitive floating point matching
int count = 0;
int pos = 0;
while ((pos = rx.indexIn(str, pos)) != -1) {
    ++count;
    pos += rx.matchedLength();
}
// pos will be 9, 14, 18 and finally 24; count will end up as 4
```

Although const, this function sets [matchedLength](qregexp.html#matchedLength)(), [capturedTexts](qregexp.html#capturedTexts)(), and [pos](qregexp.html#pos)().

If the [QRegExp](qregexp.html) is a wildcard expression (see [setPatternSyntax](qregexp.html#setPatternSyntax)()) and you want to test a string against the whole wildcard expression, use [exactMatch](qregexp.html#exactMatch)() instead of this function.

**See also** [lastIndexIn](qregexp.html#lastIndexIn)() and [exactMatch](qregexp.html#exactMatch)().

```
bool QRegExp.isEmpty (self)
```

Returns True if the pattern string is empty; otherwise returns False.

If you call [exactMatch](qregexp.html#exactMatch)() with an empty pattern on an empty string it will return True; otherwise it returns False since it operates over the whole string. If you call [indexIn](qregexp.html#indexIn)() with an empty pattern on _any_ string it will return the start offset (0 by default) because the empty pattern matches the "emptiness" at the start of the string. In this case the length of the match returned by [matchedLength](qregexp.html#matchedLength)() will be 0.

See [QString.isEmpty](qstring.html#isEmpty)().

```
bool QRegExp.isMinimal (self)
```

Returns True if minimal (non-greedy) matching is enabled; otherwise returns False.

**See also** [caseSensitivity](qregexp.html#caseSensitivity)() and [setMinimal](qregexp.html#setMinimal)().

```
bool QRegExp.isValid (self)
```

Returns True if the regular expression is valid; otherwise returns False. An invalid regular expression never matches.

The pattern **[a-z** is an example of an invalid pattern, since it lacks a closing square bracket.

Note that the validity of a regexp may also depend on the setting of the wildcard flag; for example ***.html** is a valid wildcard regexp but an invalid full regexp.

**See also** [errorString](qregexp.html#errorString)().

```
int QRegExp.lastIndexIn (self, QString str, int offset = -1, CaretMode caretMode = QRegExp.CaretAtZero)
```

Attempts to find a match backwards in _str_ from position _offset_. If _offset_ is -1 (the default), the search starts at the last character; if -2,
at the next to last character; etc.

Returns the position of the first match, or -1 if there was no match.

The _caretMode_ parameter can be used to instruct whether **^** should match at index 0 or at _offset_.

Although const, this function sets [matchedLength](qregexp.html#matchedLength)(), [capturedTexts](qregexp.html#capturedTexts)(), and [pos](qregexp.html#pos)().

**Warning:** Searching backwards is much slower than searching forwards.

**See also** [indexIn](qregexp.html#indexIn)() and [exactMatch](qregexp.html#exactMatch)().

```
int QRegExp.matchedLength (self)
```

Returns the length of the last matched string, or -1 if there was no match.

**See also** [exactMatch](qregexp.html#exactMatch)(), [indexIn](qregexp.html#indexIn)(), and [lastIndexIn](qregexp.html#lastIndexIn)().

```
int QRegExp.numCaptures (self)
```

```
QString QRegExp.pattern (self)
```

Returns the pattern string of the regular expression. The pattern has either regular expression syntax or wildcard syntax, depending on [patternSyntax](qregexp.html#patternSyntax)().

**See also** [setPattern](qregexp.html#setPattern)(), [patternSyntax](qregexp.html#patternSyntax)(), and [caseSensitivity](qregexp.html#caseSensitivity)().

```
PatternSyntax QRegExp.patternSyntax (self)
```

Returns the syntax used by the regular expression. The default is [QRegExp.RegExp](qregexp.html#PatternSyntax-enum).

**See also** [setPatternSyntax](qregexp.html#setPatternSyntax)(), [pattern](qregexp.html#pattern)(), and [caseSensitivity](qregexp.html#caseSensitivity)().

```
int QRegExp.pos (self, int nth = 0)
```

Returns the position of the _nth_ captured text in the searched string. If _nth_ is 0 (the default), pos() returns the position of the whole match.

For example:

```
QRegExp rx("/([a-z]+)/([a-z]+)");
rx.indexIn("Output /dev/null");   // returns 7 (position of /dev/null)
rx.pos(0);                        // returns 7 (position of /dev/null)
rx.pos(1);                        // returns 8 (position of dev)
rx.pos(2);                        // returns 12 (position of null)
```

For zero-length matches, pos() always returns -1. (For example, if cap(4) would return an empty string, pos(4) returns -1.) This is a feature of the implementation.

**See also** [cap](qregexp.html#cap)() and [capturedTexts](qregexp.html#capturedTexts)().

```
QRegExp.setCaseSensitivity (self, Qt.CaseSensitivity cs)
```

Sets case sensitive matching to _cs_.

If _cs_ is [Qt.CaseSensitive](qt.html#CaseSensitivity-enum), **\.txt$** matches `readme.txt` but not `README.TXT`.

**See also**
[caseSensitivity](qregexp.html#caseSensitivity)(), [setPatternSyntax](qregexp.html#setPatternSyntax)(), [setPattern](qregexp.html#setPattern)(), and [setMinimal](qregexp.html#setMinimal)().

```
QRegExp.setMinimal (self, bool minimal)
```

Enables or disables minimal matching. If _minimal_ is false, matching is greedy (maximal), which is the default.

For example, suppose we have the input string "We must be `<b>bold</b>`, very `<b>bold</b>`!" and the pattern `<b>.*</b>`. With the default greedy (maximal) matching, the match is `<b>bold</b>, very <b>bold</b>`, running from the first `<b>` to the last `</b>`. With minimal (non-greedy) matching, the first match is the first `<b>bold</b>` and the second match is the second `<b>bold</b>`. In practice we might use the pattern `<b>[^<]*</b>` instead, although this will still fail for nested tags.

**See also** [minimal](index.htm#minimal)() and [setCaseSensitivity](qregexp.html#setCaseSensitivity)().

```
QRegExp.setPattern (self, QString pattern)
```

Sets the pattern string to _pattern_. The case sensitivity, wildcard, and minimal matching options are not changed.

**See also** [pattern](qregexp.html#pattern)(), [setPatternSyntax](qregexp.html#setPatternSyntax)(), and [setCaseSensitivity](qregexp.html#setCaseSensitivity)().

```
QRegExp.setPatternSyntax (self, PatternSyntax syntax)
```

Sets the syntax mode for the regular expression. The default is [QRegExp.RegExp](qregexp.html#PatternSyntax-enum).

Setting _syntax_ to [QRegExp.Wildcard](qregexp.html#PatternSyntax-enum) enables simple shell-like [wildcard matching](qregexp.html#wildcard-matching). For example, **r*.txt** matches the string `readme.txt` in wildcard mode, but does not match `readme`.

Setting _syntax_ to [QRegExp.FixedString](qregexp.html#PatternSyntax-enum) means that the pattern is interpreted as a plain string. Special characters (e.g. backslash) then do not need to be escaped.

**See also** [patternSyntax](qregexp.html#patternSyntax)(), [setPattern](qregexp.html#setPattern)(), [setCaseSensitivity](qregexp.html#setCaseSensitivity)(), and [escape](qregexp.html#escape)().

```
QRegExp.swap (self, QRegExp other)
```

Swaps regular expression _other_ with this regular expression. This operation is very fast and never fails.

This function was introduced in Qt 4.8.

```
bool QRegExp.__eq__ (self, QRegExp rx)
```

```
bool QRegExp.__ne__ (self, QRegExp rx)
```

```
str QRegExp.__repr__ (self)
```
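
* * *

The idioms from the Code Examples section (collecting captures in a loop, replacing ampersands with a negative lookahead, and counting whole-word matches) can be tried without a Qt installation. The following is only a sketch under stated assumptions: it uses Python's standard `re` module as a stand-in, not QRegExp, so engine details differ (for instance, `.` in `re` does not match a newline by default). The helper names `capture_all`, `escape_bare_ampersands` and `count_names` are hypothetical and are not part of PyQt4.

```python
import re


def capture_all(pattern, text, group=1):
    """Collect one capture group from every match, scanning left to right.

    Mirrors the C++ idiom `while ((pos = rx.indexIn(str, pos)) != -1)`,
    but with re.search() standing in for QRegExp.indexIn().
    """
    rx = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        match = rx.search(text, pos)      # stand-in for rx.indexIn(str, pos)
        if match is None:                 # indexIn() would return -1 here
            break
        results.append(match.group(group))  # stand-in for rx.cap(1)
        # Stand-in for `pos += rx.matchedLength()`; step one extra character
        # after an empty match so the loop always makes progress.
        pos = match.end() if match.end() > match.start() else match.end() + 1
    return results


def escape_bare_ampersands(text):
    """Replace '&' with '&amp;' unless it is already followed by 'amp;'."""
    return re.sub(r"&(?!amp;)", "&amp;", text)


def count_names(text):
    """Count whole-word occurrences of 'Eric' or 'Eirik'."""
    return len(re.findall(r"\b(?:Eric|Eirik)\b", text))


if __name__ == "__main__":
    print(capture_all(r"(\d+)", "Offsets: 12 14 99 231 7"))
    print(escape_bare_ampersands("His &amp; hers & theirs"))
    print(count_names("One Eric another Eirik, and an Ericsson. "
                      "How many Eiriks, Eric?"))
```

The non-capturing group in `count_names` follows the advice in Capturing Text: the parentheses are needed only for grouping, so nothing is captured.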