it-swarm.com.de

Wie entferne ich verlängerte ASCII Zeichen aus einer Zeichenfolge in T-SQL?

Ich muss erweiterte ASCII Zeichen aus einer SELECT-Anweisung in T-SQL herausfiltern (entfernen).

Ich verwende dazu eine gespeicherte Prozedur.

Erwartete Eingabe:

ËËËËeeeeËËËË

Erwartete Ausgabe:

eeee

Alles was ich gefunden habe ist für MySQL .

Ich benutze :

Microsoft SQL Server Management Studio  11.0.2100.60
Microsoft .NET Framework    4.0.30319.17929
10
Codeman

OK, probieren Sie es aus. Es scheint das gleiche Problem zu sein, das sie haben. Auf jeden Fall müssen Sie es an Ihre Anforderungen anpassen.

CREATE FUNCTION RemoveNonASCII 
(
    @nstring nvarchar(255)
)
RETURNS varchar(255)
AS
BEGIN

    DECLARE @Result varchar(255)
    SET @Result = ''

    DECLARE @nchar nvarchar(1)
    DECLARE @position int

    SET @position = 1
    WHILE @position <= LEN(@nstring)
    BEGIN
        SET @nchar = SUBSTRING(@nstring, @position, 1)
        --Unicode & ASCII are the same from 1 to 255.
        --Only Unicode goes beyond 255
        --0 to 31 are non-printable characters
        IF UNICODE(@nchar) between 32 and 255
            SET @Result = @Result + @nchar
        SET @position = @position + 1
    END

    RETURN @Result

END
GO

Überprüfen Sie es unter SqlServerCentral

30
amit kohan

Die akzeptierte Antwort verwendet eine Schleife, die vermieden werden sollte ...

Meine Lösung ist vollständig inlinierbar, es ist einfach, daraus eine UDF (oder vielleicht sogar einen Inline-TVF) zu erstellen.

Die Idee: Erstellen Sie eine Reihe laufender Nummern (hier ist die Anzahl der Objekte in sys.objects begrenzt, aber es gibt eine Menge Beispiele dafür, wie Sie schnell eine Zahlenliste erstellen können). Im zweiten CTE werden die Zeichenfolgen in einzelne Zeichen aufgeteilt. Die endgültige Auswahl wird mit der gereinigten Zeichenfolge zurückgegeben.

DECLARE @tbl TABLE(ID INT IDENTITY, EvilString NVARCHAR(100));
INSERT INTO @tbl(EvilString) VALUES('ËËËËeeeeËËËË'),('ËaËËbËeeeeËËËcË');

WITH RunningNumbers AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
    FROM sys.objects
)
,SingleChars AS
(
    SELECT tbl.ID,rn.Nmbr,SUBSTRING(tbl.EvilString,rn.Nmbr,1) AS Chr
    FROM @tbl AS tbl
    CROSS APPLY (SELECT TOP(LEN(tbl.EvilString)) Nmbr FROM RunningNumbers) AS rn 
)
SELECT ID,EvilString
      ,(
        SELECT '' + Chr 
        FROM SingleChars AS sc
        WHERE sc.ID=tbl.ID AND ASCII(Chr)<128
        ORDER BY sc.Nmbr
        FOR XML PATH('')
      ) AS GoodString
FROM @tbl As tbl

Das Ergebnis

1   ËËËËeeeeËËËË    eeee
2   ËaËËbËeeeeËËËcË abeeeec

Hier ist eine andere Antwort von mir, wo dieser Ansatz verwendet wird, um alle special - Zeichen durch secure - Zeichen zu ersetzen, um plain latin zu erhalten. 

4
Shnugo

Vielen Dank, dass Sie Ihren Code freigegeben haben.

Ich brauchte so etwas, nicht nur für das Bereinigen von E-Mail-Adressen, sondern für allgemeine Zwecke, indem die Eingabe von Benutzerwebsites gefiltert wurde, bevor sie über Integrationsmodule an SAP ERP gelangt.

Beim Laufen in Brasilien muss es sprachlichen Akzenten gehorchen ...

Hier ist der resultierende Code. 

Vielleicht kann es jemandem eines Tages helfen, so wie es mir angetan hat.

     IF EXISTS
     (
         SELECT * 
           FROM sys.objects 
          WHERE object_id = OBJECT_ID(N'[dbo].[fnRemoveNonASCII]') 
            AND type IN (N'FN')
     )
         DROP FUNCTION dbo.fnRemoveNonASCII
     GO

     CREATE FUNCTION [dbo].[fnRemoveNonASCII] 
     (
         @nstring nvarchar(MAX) 
     )
     RETURNS nvarchar(MAX)
     AS
     BEGIN

         DECLARE @nchar nvarchar(1)      -- individual char in string
         DECLARE @nUnicode nvarchar(3)   -- ASCII for individual char in string
         DECLARE @position int           -- subscript to control loop in the string
         DECLARE @Result nvarchar(MAX)   -- return valus

         SET @Result = ''
         SET @position = 1

         WHILE @position <= LEN(@nstring)

            BEGIN

                --Unicode & ASCII are the same from 1 to 255.
                --Only Unicode goes beyond 255
                --0 to 31 are non-printable characters

                SET @nchar = SUBSTRING(@nstring, @position, 1)

                SET @nUnicode = UNICODE(@nChar) 

                IF @nUnicode = 10 
                   OR @nUnicode = 13
                   OR @nUnicode BETWEEN 32  AND 126
                   OR @nUnicode = 160
                   OR @nUnicode BETWEEN 192 AND 207
                   OR @nUnicode BETWEEN 210 AND 213
                   OR @nUnicode BETWEEN 217 AND 219
                   OR @nUnicode BETWEEN 224 AND 227
                   OR @nUnicode BETWEEN 231 AND 234
                   OR @nUnicode = 236
                   OR @nUnicode = 237
                   OR @nUnicode BETWEEN 242 AND 245
                   OR @nUnicode = 247
                   OR @nUnicode = 249
                   OR @nUnicode = 250        

                   SET @Result = @Result + @nchar

                ELSE IF @nUnicode = 9 -- TAB
                   SET @Result =  @Result + '  '

                ELSE
                   SET @Result =  @Result + ' '

                SET @position = @position + 1
         END

         RETURN @Result
     END

     /*
     ---------------------------------------------------------------------------------------------------------------

     -- Tabela dos caracteres Unicode/ASCII exportáveis

     &#9;   |   &#x9;      |    %9     =    TAB
     &#10;     |    &#xa;      |    %a     =    0A   Line Feed
     &#13;     |    &#xd;      |    %d     =    0D   Carriage Return
     &#32;      |   &#x20;  |   %20 =   <space>  
     &#33;     |    &#x21;  |   %21 =   !            
     &#34;     |    &#x22;  |   %22 =   "            
     &#35;     |    &#x23;  |   %23 =   #            
     &#36;     |    &#x24;  |   %24 =   $            
     &#37;     |    &#x25;  |   %25 =   %            
     &#38;     |    &#x26;  |   %26 =   &            
     &#39;     |    &#x27;  |   %27 =   '            
     &#40;     |    &#x28;  |   %28 =   (            
     &#41;     |    &#x29;  |   %29 =   )            
     &#42;     |    &#x2a;  |   %2a =   *            
     &#43;     |    &#x2b;  |   %2b =   +            
     &#44;     |    &#x2c;  |   %2c =   ,            
     &#45;     |    &#x2d;  |   %2d =   -            
     &#46;     |    &#x2e;  |   %2e =   .            
     &#47;     |    &#x2f;  |   %2f =   /            
     &#48;     |    &#x30;  |   %30 =   0            
     &#49;     |    &#x31;  |   %31 =   1            
     &#50;     |    &#x32;  |   %32 =   2            
     &#51;     |    &#x33;  |   %33 =   3            
     &#52;     |    &#x34;  |   %34 =   4            
     &#53;     |    &#x35;  |   %35 =   5            
     &#54;     |    &#x36;  |   %36 =   6            
     &#55;     |    &#x37;  |   %37 =   7            
     &#56;     |    &#x38;  |   %38 =   8            
     &#57;     |    &#x39;  |   %39 =   9            
     &#58;     |    &#x3a;  |   %3a =   :            
     &#59;     |    &#x3b;  |   %3b =   ;            
     &#60;     |    &#x3c;  |   %3c =   <            
     &#61;     |    &#x3d;  |   %3d =   =            
     &#62;     |    &#x3e;  |   %3e =   >            
     &#63;     |    &#x3f;  |   %3f =   ?            
     &#64;     |    &#x40;  |   %40 =   @   
     &#65;     |    &#x41;  |   %41 =   A
     &#66;     |    &#x42;  |   %42 =   B
     &#67;     |    &#x43;  |   %43 =   C
     &#68;     |    &#x44;  |   %44 =   D
     &#69;     |    &#x45;  |   %45 =   E
     &#70;     |    &#x46;  |   %46 =   F
     &#71;     |    &#x47;  |   %47 =   G
     &#72;     |    &#x48;  |   %48 =   H
     &#73;     |    &#x49;  |   %49 =   I
     &#74;     |    &#x4a;  |   %4a =   J
     &#75;     |    &#x4b;  |   %4b =   K
     &#76;     |    &#x4c;  |   %4c =   L
     &#77;     |    &#x4d;  |   %4d =   M
     &#78;     |    &#x4e;  |   %4e =   N
     &#79;     |    &#x4f;  |   %4f =   O 
     &#80;     |    &#x50;  |   %50 =   P
     &#81;     |    &#x51;  |   %51 =   Q
     &#82;     |    &#x52;  |   %52 =   R
     &#83;     |    &#x53;  |   %53 =   S
     &#84;     |    &#x54;  |   %54 =   T
     &#85;     |    &#x55;  |   %55 =   U
     &#86;     |    &#x56;  |   %56 =   V
     &#87;     |    &#x57;  |   %57 =   W
     &#88;     |    &#x58;  |   %58 =   X
     &#89;     |    &#x59;  |   %59 =   Y
     &#90;     |    &#x5a;  |   %5a =   Z
     &#91;     |    &#x5b;  |   %5b =   [
     &#92;     |    &#x5c;  |   %5c =   \
     &#93;     |    &#x5d;  |   %5d =   ]
     &#94;     |    &#x5e;  |   %5e =   ^
     &#95;     |    &#x5f;  |   %5f =   _
     &#96;     |    &#x60;  |   %60 =   `
     &#97;     |    &#x61;  |   %61 =   a
     &#98;     |    &#x62;  |   %62 =   b
     &#99;     |    &#x63;  |   %63 =   c
     &#100; |   &#x64;  |   %64 =   d
     &#101; |   &#x65;  |   %65 =   e
     &#102; |   &#x66;  |   %66 =   f
     &#103; |   &#x67;  |   %67 =   g
     &#104; |   &#x68;  |   %68 =   h
     &#105; |   &#x69;  |   %69 =   i
     &#106; |   &#x6a;  |   %6a =   j
     &#107; |   &#x6b;  |   %6b =   k
     &#108; |   &#x6c;  |   %6c =   l
     &#109; |   &#x6d;  |   %6d =   m
     &#110; |   &#x6e;  |   %6e =   n
     &#111; |   &#x6f;  |   %6f =   o
     &#112; |   &#x70;  |   %70 =   p
     &#113; |   &#x71;  |   %71 =   q
     &#114; |   &#x72;  |   %72 =   r
     &#115; |   &#x73;  |   %73 =   s
     &#116; |   &#x74;  |   %74 =   t
     &#117; |   &#x75;  |   %75 =   u
     &#118; |   &#x76;  |   %76 =   v
     &#119; |   &#x77;  |   %77 =   w
     &#120; |   &#x78;  |   %78 =   x
     &#121; |   &#x79;  |   %79 =   y
     &#122; |   &#x7a;  |   %7a =   z
     &#123; |   &#x7b;  |   %7b =   {
     &#124; |   &#x7c;  |   %7c =   |
     &#125; |   &#x7d;  |   %7d =   }
     &#126; |   &#x7e;  |   %7e =   ~
     &#160; |   &#xa0;  |   %a0 =   <nbsp>      
     &#192; |   &#xc0;  |   %c0 =   À
     &#193; |   &#xc1;  |   %c1 =   Á
     &#194; |   &#xc2;  |   %c2 =   Â
     &#195; |   &#xc3;  |   %c3 =   Ã
     &#196; |   &#xc4;  |   %c4 =   Ä
     &#197; |   &#xc5;  |   %c5 =   Å
     &#198; |   &#xc6;  |   %c6 =   Æ
     &#199; |   &#xc7;  |   %c7 =   Ç
     &#200; |   &#xc8;  |   %c8 =   È
     &#201; |   &#xc9;  |   %c9 =   É
     &#202; |   &#xca;  |   %ca =   Ê
     &#203; |   &#xcb;  |   %cb =   Ë
     &#204; |   &#xcc;  |   %cc =   Ì
     &#205; |   &#xcd;  |   %cd =   Í
     &#206; |   &#xce;  |   %ce =   Î
     &#207; |   &#xcf;  |   %cf =   Ï
     &#210; |   &#xd2;  |   %d2 =   Ò
     &#211; |   &#xd3;  |   %d3 =   Ó
     &#212; |   &#xd4;  |   %d4 =   Ô
     &#213; |   &#xd5;  |   %d5 =   Õ
     &#217; |   &#xd9;  |   %d9 =   Ù
     &#218; |   &#xda;  |   %da =   Ú
     &#219; |   &#xdb;  |   %db =   Û
     &#224; |   &#xe0;  |   %e0 =   à
     &#225; |   &#xe1;  |   %e1 =   á
     &#226; |   &#xe2;  |   %e2 =   â
     &#227; |   &#xe3;  |   %e3 =   ã
     &#231; |   &#xe7;  |   %e7 =   ç
     &#232; |   &#xe8;  |   %e8 =   è
     &#233; |   &#xe9;  |   %e9 =   é
     &#234; |   &#xea;  |   %ea =   ê
     &#236; |   &#xec;  |   %ec =   ì
     &#237; |   &#xed;  |   %ed =   í
     &#242; |   &#xf2;  |   %f2 =   ò
     &#243; |   &#xf3;  |   %f3 =   ó
     &#244; |   &#xf4;  |   %f4 =   ô
     &#245; |   &#xf5;  |   %f5 =   õ
     &#247; |   &#xf7;  |   %f7 =   ÷
     &#249; |   &#xf9;  |   %f9 =   ù
     &#250; |   &#xfa;  |   %fa =   ú
     */

     GO     
1
AlxSts

Nur den obigen Code korrigieren (es wurden DOTs geschnitten)

CREATE FUNCTION [dbo].[RemoveNonASCII] 
(
    @nstring nvarchar(255)
)
RETURNS nvarchar(255)
AS
BEGIN

    DECLARE @Result nvarchar(255)
    SET @Result = ''

    DECLARE @nchar nvarchar(1)
    DECLARE @position int

    SET @position = 1
    WHILE @position <= LEN(@nstring)
    BEGIN
        SET @nchar = SUBSTRING(@nstring, @position, 1)
        --Unicode & ASCII are the same from 1 to 255.
        --Only Unicode goes beyond 255
        --0 to 31 are non-printable characters
        IF (UNICODE(@nchar) between 192 and 198) or (UNICODE(@nchar) between 225 and 230) -- letter A or a with accents
            SET @nchar = 'a'
        IF (UNICODE(@nchar) between 200 and 203) or (UNICODE(@nchar) between 232 and 235) -- letter E or e with accents
            SET @nchar = 'e'
        IF (UNICODE(@nchar) between 204 and 207) or (UNICODE(@nchar) between 236 and 239) -- letter I or i with accents
            SET @nchar = 'i'
        IF (UNICODE(@nchar) between 210 and 214) or (UNICODE(@nchar) between 242 and 246) or (UNICODE(@nchar)=240) -- letter O or o with accents
            SET @nchar = 'o'
        IF (UNICODE(@nchar) between 217 and 220) or (UNICODE(@nchar) between 249 and 252)  -- letter U or u with accents
            SET @nchar = 'u'
        IF (UNICODE(@nchar)=199)  or (UNICODE(@nchar)=231)  -- letter Ç or ç 
            SET @nchar = 'c'
        IF (UNICODE(@nchar)=209)  or (UNICODE(@nchar)=241)  -- letter Ñ or ñ 
            SET @nchar = 'n'
        IF (UNICODE(@nchar) between 45 and 46) or (UNICODE(@nchar) between 48 and 57) or (UNICODE(@nchar)  between 64 and 90) or (UNICODE(@nchar) = 95) or (UNICODE(@nchar)  between 97 and 122)
            SET @Result = @Result + @nchar
        SET @position = @position + 1
    END
    set @Result = lower(@Result) -- e-mails in lower case
    RETURN @Result

END
0

Die Lösung von Amit Kohan funktioniert zwar, war aber für mich zu langsam. Bei großen Zeichenfolgendaten ist die Iteration durch jedes Zeichen nicht optimal. Ich habe die folgende Funktion gemacht, mit ist zwar nicht sehr kompakt aber schnell.

CREATE FUNCTIONdbo.F_StripLowAscii
(
   @Name nvarchar(max)
)
RETURN nvarchar(max) as
BEGIN
DECLARE @Result nvarchar(max) 

    If @Name IS NULL 
      RETURN @Name 

    DECLARE @BlankRange VARCHAR(15) 
    DECLARE @FoundAt INTEGER 

    -- ASCII CHAR #0 needs a special treatment
    SET @BlankRange = '%[' + CHAR(0) + ']%' 
    SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    WHILE @FoundAt > 0 
    BEGIN 
        SET @name = left(@name, @FoundAt-1 ) + SUBSTRING(@name, @FoundAt+1, LEN(@Name))
        SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    END 


    SET @BlankRange = '%[' + CHAR(1)+'-'+CHAR(8) + ']%' 
    SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    WHILE @FoundAt > 0 
    BEGIN 
        SET @name = Replace(@Name, SUBSTRING(@Name, @FoundAt,1),'')
        SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    END 

    SET @BlankRange = '%[' + CHAR(11)+'-'+CHAR(12) + ']%' 
    SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    WHILE @FoundAt > 0 
    BEGIN 
        SET @name = Replace(@Name, SUBSTRING(@Name, @FoundAt,1),'')
        SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    END 

    SET @BlankRange = '%[' + CHAR(14)+'-'+CHAR(31) + ']%' 
    SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    WHILE @FoundAt > 0 
    BEGIN 
        SET @name = Replace(@Name, SUBSTRING(@Name, @FoundAt,1),'')
        SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
    END 

    RETURN @Name 

END
GO
0
brenkdar