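In the Facebook group 「SAS戰術應用精研社」, members occasionally mention problems they run into when processing Chinese text with string functions. Ordinary string functions such as compress, scan, substr, and index may return garbled characters, fail to find the specified text, or report a match on a character that is not actually there. One example is the following question from a member surnamed Zheng (鄭):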
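/* Read six Chinese strings; none contains 孕 or 嬰. */
data a1;
  input string $20.;
  cards;
洛瓦
北極光
野生動物
黃色小鴨
窈窕曲線包
手藝有限
;

/* Search each string with the byte-oriented INDEX function. */
data a2;
  set a1;
  idx1=index(string,'孕');
  idx2=index(string,'嬰');
run;

proc print data=a2;
run;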
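None of these six strings visibly contains 「孕」 or 「嬰」, yet the index function reports 「孕」 in the first three strings and 「嬰」 in the last three. This class of problem arises because Chinese text is stored in a double-byte encoding, while SAS's traditional string functions are designed for single-byte encodings. They compare byte by byte, so the two bytes of the search character can match the second byte of one character followed by the first byte of the next.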
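Base SAS includes built-in functions for handling double-byte characters. The help files refer to this family as "K Functions"; see the official SAS website for the full details.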
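As a minimal sketch of how a K function applies to the example above, the step below repeats the search with KINDEX in place of INDEX. The data set name `a3` and the variable names `kidx1` and `kidx2` are made up for illustration, and the sketch assumes a SAS session with DBCS support whose encoding matches the data. Under those assumptions the search is done character by character, so both new variables would be expected to stay at 0 for all six rows.

```sas
/* Sketch: same search as in a2, but character-aware. */
/* a3, kidx1 and kidx2 are illustrative names only.   */
data a3;
  set a1;
  kidx1 = kindex(string, '孕');  /* character position of 孕, or 0 if absent */
  kidx2 = kindex(string, '嬰');  /* character position of 嬰, or 0 if absent */
run;

proc print data=a3;
run;
```

Keeping the original `idx1` and `idx2` columns next to the new ones makes it easy to compare the byte-based and character-based results side by side.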
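Finally, here are the descriptions of some of the K functions, excerpted from the help file:
| Function | Description |
| --- | --- |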
| KCOMPARE | Returns the result of a comparison of character expressions. |
| KCOMPRESS | Removes specified characters from a character expression. |
| KCOUNT | Returns the number of double-byte characters in an expression. |
| KINDEX | Searches a character expression for a string of characters. |
| KINDEXC | Searches a character expression for specified characters. |
| KLEFT | Left-aligns a character expression by removing unnecessary leading DBCS blanks and SO/SI. |
| KLENGTH | Returns the length of an argument. |
| KLOWCASE | Converts all letters in an argument to lowercase. |
| KPROPCASE | Converts Chinese, Japanese, Korean, Taiwanese (CJKT) characters. |
| KPROPCHAR | Converts special characters to normal characters. |
| KPROPDATA | Removes or converts unprintable characters. |
| KREVERSE | Reverses a character expression. |
| KRIGHT | Right-aligns a character expression by trimming trailing DBCS blanks and SO/SI. |
| KSCAN | Selects a specified word from a character expression. |
| KSTRCAT | Concatenates two or more character expressions. |
| KSUBSTR | Extracts a substring from an argument. |
| KSUBSTRB | Extracts a substring from an argument according to the byte position of the substring in the argument. |
| KTRANSLATE | Replaces specific characters in a character expression. |
| KTRIM | Removes trailing DBCS blanks and SO/SI from character expressions. |
| KTRUNCATE | Truncates a string to a specified length in byte unit without breaking multibyte characters. |
| KUPCASE | Converts all letters in an argument to uppercase. |
| KUPDATE | Inserts, deletes, and replaces character value contents. |
| KUPDATEB | Inserts, deletes, and replaces the contents of the character value according to the byte position of the character value in the argument. |
| KVERIFY | Returns the position of the first character that is unique to an expression. |
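
To make a couple of the table entries concrete, here is a small sketch that compares LENGTH with KLENGTH and uses KSUBSTR on the same `a1` data. The data set name `a4` and the variables `n_bytes`, `n_chars`, and `first2` are hypothetical, and the sketch again assumes a DBCS-enabled session whose encoding matches the data. The byte counts depend on the session encoding, so no specific values are claimed here.

```sas
/* Sketch: byte-based vs. character-based length and substring. */
/* a4, n_bytes, n_chars and first2 are illustrative names only. */
data a4;
  set a1;
  length first2 $8;
  n_bytes = length(string);        /* length in bytes                 */
  n_chars = klength(string);       /* length in characters            */
  first2  = ksubstr(string, 1, 2); /* first two characters, not bytes */
run;

proc print data=a4;
run;
```

Using SUBSTR with the same arguments would cut by bytes instead, which is one way a double-byte character ends up split and displayed as garbled text.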