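In the Facebook group 「SAS戰術應用精研社」, members occasionally report problems when applying character functions to Chinese text. General-purpose functions such as compress, scan, substr, and index may return garbled output, fail to find the specified text, or match characters that are not actually there. One example is the following question from a member surnamed 鄭: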
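/* Six sample strings; none contains either search character used below */
data a1;
  input string $20.;
  cards;
洛瓦
北極光
野生動物
黃色小鴨
窈窕曲線包
手藝有限
;
run;

/* INDEX compares bytes, not whole characters */
data a2;
  set a1;
  idx1 = index(string, '孕');
  idx2 = index(string, '嬰');
run;

proc print data=a2;
run;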
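None of the six strings visibly contains 孕 or 嬰, yet the index function reports 孕 in the first three strings and 嬰 in the last three. The cause is that Chinese text is stored in a double-byte encoding, while the traditional SAS character functions are designed for single-byte encodings and compare one byte at a time. As a result, the trailing byte of one character and the leading byte of the next can together equal the two bytes of the search character, producing a match that straddles a character boundary.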
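One way to see the byte overlap is to print the raw bytes of each string next to the bytes of the two search characters. The sketch below is illustrative only: the data set name a3 and the variables hex_string, hex_yun, and hex_ying are hypothetical, and it assumes a double-byte session encoding such as Big5, where each Chinese character occupies exactly two bytes.

```sas
/* Dump the bytes of each string and of the two search characters.   */
/* Assumption: the session uses a double-byte encoding such as Big5. */
data a3;
  set a1;
  length hex_string $40 hex_yun hex_ying $4;
  hex_string = put(string, $hex40.); /* 20 bytes -> 40 hex digits; trailing blanks print as 20 */
  hex_yun    = put('孕', $hex4.);    /* the two bytes of the first search character  */
  hex_ying   = put('嬰', $hex4.);    /* the two bytes of the second search character */
run;

proc print data=a3;
  var string hex_string hex_yun hex_ying;
run;
```

Under that assumption, the four hex digits in hex_yun or hex_ying should appear inside hex_string at an offset that spans the second byte of one character and the first byte of the next, which is exactly the position index reports.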
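Base SAS includes functions built for double-byte characters. The SAS help files refer to this family as the 『K Functions』; for full details, see the official SAS website.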
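As a minimal sketch of the fix, the step below runs index and its K counterpart, kindex, side by side on the same data. The data set name a4 and the variables kidx1 and kidx2 are hypothetical, and the example assumes a SAS session configured for double-byte (DBCS) data.

```sas
/* Compare the byte-based INDEX with the character-based KINDEX.    */
/* Assumption: the SAS session is configured for double-byte data.  */
data a4;
  set a1;
  idx1  = index(string, '孕');   /* byte-based: may match across two characters */
  idx2  = index(string, '嬰');
  kidx1 = kindex(string, '孕');  /* character-based search */
  kidx2 = kindex(string, '嬰');
run;

proc print data=a4;
run;
```

Because none of the six strings contains either character, kidx1 and kidx2 would be expected to be 0 in every row, while idx1 and idx2 reproduce the false matches described above.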
To close, here are descriptions of some of the K functions, excerpted from the help files:
Function | Description |
KCOMPARE | Returns the result of a comparison of character expressions. |
KCOMPRESS | Removes specified characters from a character expression. |
KCOUNT | Returns the number of double-byte characters in an expression. |
KINDEX | Searches a character expression for a string of characters. |
KINDEXC | Searches a character expression for specified characters. |
KLEFT | Left-aligns a character expression by removing unnecessary leading DBCS blanks and SO/SI. |
KLENGTH | Returns the length of an argument. |
KLOWCASE | Converts all letters in an argument to lowercase. |
KPROPCASE | Converts Chinese, Japanese, Korean, Taiwanese (CJKT) characters. |
KPROPCHAR | Converts special characters to normal characters. |
KPROPDATA | Removes or converts unprintable characters. |
KREVERSE | Reverses a character expression. |
KRIGHT | Right-aligns a character expression by trimming trailing DBCS blanks and SO/SI. |
KSCAN | Selects a specified word from a character expression. |
KSTRCAT | Concatenates two or more character expressions. |
KSUBSTR | Extracts a substring from an argument. |
KSUBSTRB | Extracts a substring from an argument according to the byte position of the substring in the argument. |
KTRANSLATE | Replaces specific characters in a character expression. |
KTRIM | Removes trailing DBCS blanks and SO/SI from character expressions. |
KTRUNCATE | Truncates a string to a specified length in byte unit without breaking multibyte characters. |
KUPCASE | Converts all letters in an argument to uppercase. |
KUPDATE | Inserts, deletes, and replaces character value contents. |
KUPDATEB | Inserts, deletes, and replaces the contents of the character value according to the byte position of the character value in the argument. |
KVERIFY | Returns the position of the first character that is unique to an expression. |
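The same byte-versus-character distinction applies to other functions in the table. The sketch below contrasts length with klength and substr with ksubstr on the sample data; the data set name a5 and the variable names are hypothetical, and a double-byte session encoding is again assumed.

```sas
/* Byte-based functions versus their K counterparts.               */
/* Assumption: the session uses a double-byte encoding for Chinese. */
data a5;
  set a1;
  len_bytes = length(string);        /* length counted in bytes       */
  len_chars = klength(string);       /* length counted in characters  */
  first_b   = substr(string, 1, 1);  /* first byte only, so likely half a character */
  first_k   = ksubstr(string, 1, 1); /* first whole character         */
run;

proc print data=a5;
run;
```

Under that assumption, a two-character string such as 洛瓦 would be expected to give len_bytes = 4 and len_chars = 2, and first_b would hold an incomplete character that prints as garbage.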