11.1.Deobfuscation Presentation


  • 8/12/2019 11.1.Deobfuscation Presentation

    1/45

    Deobfuscation: Reverse Engineering Obfuscated Code

    by Sharath K. Udupa, Saumya K. Debray and Matias Madou

    Presented by Fabrizio Steiner


  • 2/45

    Overview

    Introduction

    Obfuscation Transformations
      - Basic Control Flow Flattening
      - Enhancements: Interprocedural Data Flow; Artificial Blocks and Pointers

    Deobfuscation
      - Cloning
      - Static Path Feasibility Analysis
      - Combining Static and Dynamic Analysis

    Experiments

    Conclusion

  • 3/45

    Introduction: What Is Obfuscated Code?

    Obfuscation is used to prevent reverse engineering.

    Obfuscated code is very hard to read and to understand.

    This is especially important for managed environments such as .NET or Java, whose intermediate code is easy to decompile.

    An obfuscator takes a piece of code and produces an equivalent but unreadable version of it.

    A deobfuscator tries to reverse the obfuscation (using static and dynamic analysis).

  • 4/45

    Why obfuscate .NET code?

  • 5/45

    Introduction

    Goals of obfuscation:

    Improve software security

    Make code hard to reverse engineer

    Protect the owner's intellectual property

    Obfuscation can also be used to hide malicious software (to prevent its detection).

  • 6/45

    Introduction

    This raises two questions:

    What sorts of techniques are useful for understanding obfuscated code?

    What are the weaknesses of current code obfuscation techniques, and how can we address them?

    Please keep these two questions in mind during the presentation!

    For example, we will see the weakness of control flow flattening.

  • 7/45

    Obfuscating Transformations

    Two classes of transformations:

    1. surface obfuscation (focuses on syntax)

    2. deep obfuscation (focuses on program structure)

  • 8/45

    Surface obfuscation

    Makes the code harder for humans to understand

    Does not make it harder to determine the underlying algorithms

    Variable renaming: easily undone by applying a parser that resolves the variable references.

  • 9/45

    Deep obfuscation

    Changes the program's control flow and data references

    Reduces the efficacy of semantic tools for reverse engineering

    Working around deep obfuscation is much harder than working around surface obfuscation.

    We only look at the harder one! The deep obfuscation techniques covered here are taken from Chenxi Wang's dissertation.

  • 10/45

    Basic Control Flow Flattening

    Aims to obscure the control flow logic

    After flattening the control flow graph, all basic blocks have the same set of predecessors and successors.

    Control flow during execution is guided by a dispatcher variable: each basic block assigns it the value that selects the next block to execute.

    A switch block uses the dispatcher variable to jump to that block.

  • 11/45

    Example: Basic Control Flow Flattening

    int f(int i, int j)
    {
        int res = -1;
        if (i < j) {
            res = i;
        } else {
            if (i == j) {
                res = 0;
            } else {
                res = j;
            }
        }
        return res;
    }

    Control flow graph (Y = condition true, N = condition false):
      A: res = -1; i < j ?    Y -> B, N -> C
      B: res = i              -> F
      C: i == j ?             Y -> D, N -> E
      D: res = 0              -> F
      E: res = j              -> F
      F: return res

  • 12/45

    Example: Basic Control Flow Flattening

    Flattened control flow graph of f: initially x = 0; a switch (x) block dispatches to cases 0-5, and every block except F jumps back to the switch.
      case 0, A: res = -1; x = i < j ? 1 : 2
      case 1, B: res = i; x = 5
      case 2, C: x = i == j ? 3 : 4
      case 3, D: res = 0; x = 5
      case 4, E: res = j; x = 5
      case 5, F: return res
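    A minimal C sketch of the flattened version of f, assuming that case k of the dispatcher corresponds to the k-th block (0 = A, ..., 5 = F); the name f_flat is hypothetical. The infinite loop around the switch plays the role of the dispatcher block, and each case ends by storing the number of the next block in the dispatcher variable x.

```c
/* Original function f from the example (comparison written as ==). */
int f(int i, int j)
{
    int res = -1;
    if (i < j) {
        res = i;
    } else if (i == j) {
        res = 0;
    } else {
        res = j;
    }
    return res;
}

/* Flattened f (sketch): a loop around a switch acts as the dispatcher.
   Assumed numbering: case 0 = A, 1 = B, 2 = C, 3 = D, 4 = E, 5 = F. */
int f_flat(int i, int j)
{
    int res = 0;
    int x = 0;                                          /* dispatcher variable */
    for (;;) {
        switch (x) {
        case 0: res = -1; x = (i < j) ? 1 : 2; break;   /* A */
        case 1: res = i; x = 5; break;                  /* B */
        case 2: x = (i == j) ? 3 : 4; break;            /* C */
        case 3: res = 0; x = 5; break;                  /* D */
        case 4: res = j; x = 5; break;                  /* E */
        case 5: return res;                             /* F */
        }
    }
}
```

    Both functions compute the same result; for example, f_flat(1, 2) returns 1 and f_flat(2, 2) returns 0, exactly like f. Note that the compiled control flow graph of f_flat no longer shows which block follows which: that information now lives only in the constants assigned to x.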

  • 13/45

    Enhancements

    We will now discuss two enhancements of basic control flow flattening.

    They make it more difficult to deobfuscate the code.

  • 14/45

    Enhancement 1: Interprocedural Data Flow

    In basic control flow flattening, the values assigned to the dispatcher variable are all set within the function itself.

    The control flow is not obvious from the code.

    However, it can be reconstructed by examining the values assigned to the dispatcher variable.

    This requires only intraprocedural analysis.
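    Because every dispatch value in basic flattening is a constant written inside the function, recovering the control flow graph amounts to collecting those constants per block. The following C sketch assumes a hypothetical table-driven representation of the flattened f (block k runs when x == k), not real compiler IR; struct flat_block and recover_cfg are illustrative names.

```c
#define NBLOCKS 6
#define MAXSUCC 2

/* Hypothetical summary of a flattened function: for each block, the
   constant values it may assign to the dispatcher variable x. */
struct flat_block {
    char name;
    int nvals;
    int vals[MAXSUCC];
};

/* Blocks of the flattened f; block k is selected when x == k. */
static const struct flat_block blocks[NBLOCKS] = {
    {'A', 2, {1, 2}}, {'B', 1, {5}}, {'C', 2, {3, 4}},
    {'D', 1, {5}},    {'E', 1, {5}}, {'F', 0, {0}},
};

/* Rebuild the original CFG: edge b -> s exists if block b can assign the
   constant s to x. Returns the number of recovered edges. */
int recover_cfg(int succ[NBLOCKS][NBLOCKS])
{
    int edges = 0;
    for (int b = 0; b < NBLOCKS; b++)
        for (int s = 0; s < NBLOCKS; s++)
            succ[b][s] = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        for (int k = 0; k < blocks[b].nvals; k++) {
            succ[b][blocks[b].vals[k]] = 1;
            edges++;
        }
    }
    return edges;
}
```

    For f this yields the seven edges of the original graph (A -> B, A -> C, B -> F, C -> D, C -> E, D -> F, E -> F). A real deobfuscator would obtain the per-block constants with an intraprocedural constant propagation pass rather than from a hand-written table.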

  • 15/45

    Enhancement 1: Interprocedural Data Flow

    Improvement: use interprocedural information.

    Idea: use a global array to pass the dispatch values.

    Each call site writes the values into the array starting at a random offset; different call sites use different random offsets.

    The obfuscated code assigns values from the array to the dispatch variable.

    The locations accessed, and the contents of these locations, are not evident from examining the callee code.

  • 16/45

    Example: Interprocedural Data Flow

    Starting point: the flattened control flow graph of f from the basic control flow flattening example (cases 0-5, with the dispatch values assigned as constants).

  • 17/45

    Example: Interprocedural Data Flow

    The callers now pass the dispatch values through a global array:

      int A[...];  // global array of indices
      int w;       // offset into array A

      caller 1:
        w = random1; A[w] = 3; A[w+1] = 1; A[w+2] = 2; A[w+3] = 4; A[w+4] = 5;
        call f(i,j)

      caller 2:
        w = random2; A[w] = 3; A[w+1] = 1; A[w+2] = 2; A[w+3] = 4; A[w+4] = 5;
        call f(i,j)

  • 18/45

    Example: Interprocedural Data Flow

    f now reads its dispatch values from A (x = 0 initially; switch (x) dispatches to cases 0-5):
      case 0, A: res = -1; x = i < j ? A[w+1] : A[w+2]
      case 1, B: res = i; x = A[w+4]
      case 2, C: x = i == j ? A[w] : A[w+3]
      case 3, D: res = 0; x = A[w+4]
      case 4, E: res = j; x = A[w+4]
      case 5, F: return res
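    A minimal C sketch of this example, assuming the array layout written by the callers (A[w] = 3, A[w+1] = 1, A[w+2] = 2, A[w+3] = 4, A[w+4] = 5). TABLE_SIZE and the wrapper call_f are hypothetical; call_f stands in for "a call site", and the random offset is drawn with the standard rand() function.

```c
#include <stdlib.h>

#define TABLE_SIZE 64        /* hypothetical size of the global array */

int A[TABLE_SIZE];           /* global array of indices */
int w;                       /* offset into array A */

/* Flattened f whose dispatch values are loaded from A instead of being
   constants; the callee alone does not reveal which values x receives. */
int f(int i, int j)
{
    int res = 0;
    int x = 0;                                                   /* dispatcher */
    for (;;) {
        switch (x) {
        case 0: res = -1; x = (i < j) ? A[w + 1] : A[w + 2]; break; /* A */
        case 1: res = i; x = A[w + 4]; break;                       /* B */
        case 2: x = (i == j) ? A[w] : A[w + 3]; break;              /* C */
        case 3: res = 0; x = A[w + 4]; break;                       /* D */
        case 4: res = j; x = A[w + 4]; break;                       /* E */
        case 5: return res;                                         /* F */
        }
    }
}

/* What each call site does: pick a random offset, store the dispatch
   values there, then call f. */
int call_f(int i, int j)
{
    w = rand() % (TABLE_SIZE - 5);   /* keeps w + 4 inside the array */
    A[w] = 3; A[w + 1] = 1; A[w + 2] = 2; A[w + 3] = 4; A[w + 4] = 5;
    return f(i, j);
}
```

    The results are unchanged (call_f(1, 2) returns 1, call_f(2, 2) returns 0), but recovering the edges of f now requires knowing the contents of A at every call site, i.e. an interprocedural analysis.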

  • 19/45

    Enhancement 2: Artificial Blocks and Pointers

    Extends Enhancement 1 by adding artificial blocks.

    Artificial blocks:

    Some of them are never executed.

    This is difficult to determine with static analysis (because the branch targets are computed dynamically and reached indirectly).

    Indirect loads and stores (through pointers) are added to these unreachable blocks, confusing static analysis about which dispatch values are taken.

  • 20/45

    Enhancement 2: Artificial Blocks and Pointers

    How it works:

    Add two artificial blocks, B′ and B″, for every basic block B.

    B′ is executed: indirect assignments through pointers set the dispatch variable.

    B″ also contains indirect assignments but is never executed; it only serves to confuse the static analyzer.
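    A simplified C sketch of the idea, applied to the function f from the flattening example rather than to the function in the figures. For brevity, the executed artificial code (the B′ part) is folded into each real block as pointer stores, and a single extra case 6 plays the role of a never-executed B″ block; the name f_ptr and the variables b, c, p, q are illustrative assumptions.

```c
/* f with dispatch values passed through memory cells b and c via pointers. */
int f_ptr(int i, int j)
{
    int b = 0, c = 0;            /* memory cells holding dispatch values */
    int *p = &b, *q = &c;        /* pointers used for indirect stores */
    int res = 0;
    int x = 0;                   /* dispatcher variable */
    for (;;) {
        switch (x) {
        case 0:                  /* A: stores set b = 1, c = 2 */
            res = -1; *p = 1; *q = 2;
            x = (i < j) ? b : c;
            break;
        case 1:                  /* B */
            res = i; *p = 5; x = b;
            break;
        case 2:                  /* C: stores set b = 3, c = 4 */
            *p = 3; *q = 4;
            x = (i == j) ? b : c;
            break;
        case 3:                  /* D */
            res = 0; *q = 5; x = c;
            break;
        case 4:                  /* E */
            res = j; *p = 5; x = b;
            break;
        case 5:                  /* F */
            return res;
        case 6:                  /* artificial block: x is never set to 6 */
            p = &c; q = &b;      /* bogus re-targeting of the pointers */
            *p = 3; *q = 4;
            x = b;
            break;
        }
    }
}
```

    At run time case 6 is never reached, so p always points to b and q to c. A static analyzer that cannot prove this must assume the pointers may be swapped, and therefore cannot determine which values b and c hold at each dispatch.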

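  • 21/45

    Example: Artificial Blocks and Pointers

    Figure: the flattened function f before the enhancement. Init sets x = 0; the dispatcher block S (switch (x)) selects among blocks A-D (cases 0-3). The blocks contain a = 1, a = j, a loop body that updates i and computes a = a*i, and return a; the dispatch values are plain constants, e.g. x = i < j ? 1 : 2, x = i > 0 ? 2 : 3 and x = 3.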

  • 22/45

    Example: Artificial Blocks and Pointers

    Figure: the same function after the enhancement (declarations int a, b, c, *p, *q; dispatcher cases 0-9). Dispatch values now flow through memory: blocks store them indirectly (e.g. p = &b; *p = 3; q = &c; *q = 4) and load them back (e.g. x = i < j ? b : c, x = i > 0 ? b : c, x = b). Other blocks contain further pointer assignments such as *p = 8, *q = 9 and p = &a.

  • 23/45

    Deobfuscation

    We now consider some methods for reverse engineering obfuscated code.

    Obfuscation inserts spurious execution paths into programs.

    Their purpose is to introduce bogus information into program analysis.

    Path 1 is the original control flow path and path 2 is the spurious control flow path.

    Path 2 introduces imprecision into program analysis where execution paths join.

    Figure: blocks A and B, connected by the original path 1 and the spurious path 2.

  • 24/45

    Deobfuscation

    Forward analysis results (e.g., reaching definitions) are tainted at the entry of B.

    Backward analysis results (e.g., liveness) are affected at the exit of A.

    To address this problem, one could clone portions of the program.

  • 25/45

    Cloning

    Clone some parts of the program in such a way that spurious paths no longer join original paths.

    Applying cloning creates a new block B′.

    This improves forward dataflow information.

    Backward dataflow information is not improved: at the exit of A, we still have the possibility of taking the spurious path.

    Figure: block A with the original path 1 and the spurious path 2; after cloning, the two paths end in separate blocks, B and its copy B′.

  • 26/45

    Cloning (2)

    The goal of deobfuscation is to identify spurious paths.

    But where should we apply cloning? We don't know which paths are spurious.

    => Simple approach: clone every block where multiple paths join.

    If the obfuscator is known, the cloning can be targeted more precisely.
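    A minimal C sketch of the simple approach, assuming the control flow graph is stored as an adjacency matrix (adj[u][v] != 0 means an edge u -> v). The hypothetical helper clone_join splits one join block: the first predecessor keeps the original block, and every other predecessor is redirected to a fresh copy with the same successors, so paths no longer merge at that block.

```c
#define MAXB 16   /* hypothetical maximum number of blocks */

/* Clone join block b once per extra predecessor.
   Returns the new number of blocks (n is unchanged if b is not a join). */
int clone_join(int n, int adj[MAXB][MAXB], int b)
{
    int preds[MAXB], np = 0;
    for (int u = 0; u < n; u++)          /* collect predecessors of b */
        if (adj[u][b])
            preds[np++] = u;

    for (int k = 1; k < np && n < MAXB; k++) {
        int copy = n++;
        for (int v = 0; v < MAXB; v++) { /* start the copy with no edges */
            adj[copy][v] = 0;
            adj[v][copy] = 0;
        }
        for (int v = 0; v < copy; v++)   /* same successors as b */
            adj[copy][v] = adj[b][v];
        adj[preds[k]][b] = 0;            /* redirect this predecessor */
        adj[preds[k]][copy] = 1;
    }
    return n;
}
```

    For a diamond 0 -> {1, 2} -> 3 -> 4, cloning block 3 leaves 1 -> 3 and redirects 2 to a new block 5 with its own edge 5 -> 4, so facts from the two paths are no longer merged at the entry of block 3. Applying the helper to every join block is the "clone everything" strategy; it can grow the graph considerably, which is why knowing the obfuscator helps.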

  • 27/45

    Example: Cloning

    Figure: a control flow graph with block S and blocks A, B and C.

  • 28/45

    Example: Cloning

    Figure: the same graph before and after cloning; after cloning, S is split into the copies S1, S2 and S3, each shown with the blocks A, B and C below it.

  • 29/45

    Static Path Feasibility Analysis

    A constraint-based static analysis that determines whether an (acyclic) execution path $\pi$ is feasible.

    Given an execution path $\pi$ with a set of live variables $\bar{x}$ at its entry:

    Construct a constraint $C_\pi$ such that, if $(\exists \bar{x})\, C_\pi$ is unsatisfiable, then $\pi$ is not executed in any execution of the program.

    In that case, $\pi$ is infeasible.

  • 30/45

    Static Path Feasibility Analysis (2)

    There are many ways to construct the constraint.

    We take arithmetic operations into account.

    Information is propagated along a single path, not along all execution paths.

    Each instruction on the path is named $I_k$; $x_k$ denotes the value of variable $x$ after instruction $I_k$.

    Values that cannot be determined are treated as unknown.

  • 31/45

    Static Path Feasibility Analysis Rules

    Assignment: $I_k \equiv x = y \;\Rightarrow\; C_k \equiv (x_k = y_j)$, where $I_j$ is the most recent instruction that defined $y$.

    Arithmetic: $I_k \equiv x = y \oplus z \;\Rightarrow\; C_k \equiv (x_k = f_\oplus(y_i, z_j))$, where $f_\oplus$ expresses the semantics of the operation $\oplus$.

    If the semantics of $\oplus$ is not known, or either $y_i$ or $z_j$ is unknown, then $x_k$ is treated as unknown.

    Indirection: pointers can be modeled at different levels of precision.

  • 32/45

    Static Path Feasibility Analysis Rules (2)

    Branches: for some boolean expression $e$ and $I_k \equiv \texttt{if } e \texttt{ goto } L$:
      $C_k \equiv e$ if $I_k$ is a taken branch in $\pi$;
      $C_k \equiv \neg e$ if $I_k$ is not taken in $\pi$.

    Unconditional branches are treated as $C_k \equiv \mathit{true}$.

    Other instructions: the analysis is aborted and the path is conservatively assumed to be feasible.

    The constraint is constructed as the conjunction of the constraints of all instructions on the path: $C_\pi = \bigwedge_k C_k$.

    A constraint solver can then determine whether the path is feasible.

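  • 33/45

    Example: Static Path Feasibility Analysis Rules

      B0: (1) x = 1
          (2) y = 2
          (3) if (u > 0) goto B1
      B1: (4) z = x + y
      B2: (5) z = x - y
      B3: (6) if (z > 0) goto B5
      B4, B5: successors of B3

    B0 branches to B1 if u > 0 and otherwise falls through to B2; B1 and B2 both continue to B3; B3 branches to B5 if z > 0 and otherwise falls through to B4.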

  • 34/45

    Example: Static Path Feasibility Analysis Rules

    Path under consideration: $\pi = B_0\,B_2\,B_3\,B_5$

  • 35/45

    Example: Static Path Feasibility Analysis Rules

    Path: $\pi = B_0\,B_2\,B_3\,B_5$

    Constraint: $(\exists u_0)\,[\,x_1 = 1 \wedge y_2 = 2 \wedge \neg(u_0 > 0) \wedge z_5 = x_1 - y_2 \wedge z_5 > 0\,]$
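    Spelling out why this constraint cannot be satisfied: the conjuncts that do not mention $u_0$ already fix $z_5$, and that value contradicts the condition of the taken branch (6), so no choice of $u_0$ helps.

```latex
\begin{aligned}
x_1 = 1 \;\wedge\; y_2 = 2 \;\wedge\; z_5 = x_1 - y_2
  &\;\Rightarrow\; z_5 = 1 - 2 = -1 \\
z_5 = -1
  &\;\Rightarrow\; \neg(z_5 > 0) \\
\text{hence, for every } u_0:\quad C_\pi
  &\;\equiv\; \mathit{false}
\end{aligned}
```

    Since $(\exists u_0)\,C_\pi$ is unsatisfiable, the path $B_0 B_2 B_3 B_5$ cannot execute for any input value of u.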

  • 36/45

    Example: Static Path Feasibility Analysis Rules

    Result: the constraint for $\pi = B_0\,B_2\,B_3\,B_5$ is unsatisfiable, so $\pi$ is infeasible.
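    A deliberately simplified C sketch of the feasibility check, not a real constraint solver: it propagates constants along a single path following the assignment, arithmetic and branch rules, treats variables never defined on the path (such as the input u) as unknown, and reports a path infeasible only when a branch on a known value contradicts the direction taken on the path. The instruction encoding (enum op, struct instr) and path_feasible are hypothetical.

```c
#include <stdbool.h>

#define NVARS 8

/* Hypothetical encoding of the instructions on one acyclic path. */
enum op { OP_CONST, OP_ADD, OP_SUB, OP_BR_GT0 };

struct instr {
    enum op op;
    int dst, a, b;   /* variable indices; for OP_CONST, a is the constant;
                        for OP_BR_GT0, a is the tested variable */
    bool taken;      /* OP_BR_GT0 only: is the branch taken on this path? */
};

/* Returns false only if the path's constraint is certainly unsatisfiable. */
bool path_feasible(const struct instr *path, int len)
{
    int val[NVARS] = {0};
    bool known[NVARS] = {false};
    for (int k = 0; k < len; k++) {
        const struct instr *in = &path[k];
        switch (in->op) {
        case OP_CONST:                       /* x_k = constant */
            val[in->dst] = in->a;
            known[in->dst] = true;
            break;
        case OP_ADD:                         /* x_k = f(y_i, z_j) */
        case OP_SUB:
            known[in->dst] = known[in->a] && known[in->b];
            if (known[in->dst])
                val[in->dst] = (in->op == OP_ADD) ? val[in->a] + val[in->b]
                                                  : val[in->a] - val[in->b];
            break;
        case OP_BR_GT0:                      /* C_k = e or not e */
            if (known[in->a] && (val[in->a] > 0) != in->taken)
                return false;                /* contradiction found */
            break;
        }
    }
    return true;   /* no contradiction: conservatively feasible */
}
```

    Encoding x, y, z and u as variables 0 to 3, the path B0 B2 B3 B5 computes z = -1 and then requires the branch (6) to be taken, so path_feasible returns false; for B0 B1 B3 B5, z = 3 and it returns true. Conditions on unknown values are simply assumed satisfiable, so this sketch misses contradictions such as u > 0 together with u <= 0 that a real solver would catch; it errs on the safe side.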

  • 37/45

    Combining Static and Dynamic Analysis

    Static analysis is inherently conservative.

    The set of paths obtained from static deobfuscation is a superset of the actual paths.

    Dynamic analysis, on the other hand, cannot consider all possible input values.

    What about combining them?

    Start with the underapproximated set of control flow paths obtained from dynamic analysis.

  • 38/45

    Combining Static and Dynamic Analysis (2)

    Use static analysis to add paths that could be taken.

    It is also possible to apply static analysis first and then dynamic analysis.

    But the resulting set could still contain more or fewer paths than are actually feasible.

    The authors apply dynamic analysis first, then static analysis:
    - Suppose we know a way to determine all paths taken during execution.
    - Mark these paths and propagate information only along these paths.

  • 39/45

    Combining Static and Dynamic Analysis (3)

    Conventional static analysis is the degenerate case in which all paths are marked.

    Improvement for the analyzer: mark the paths taken during dynamic analysis.

    Propagate dataflow information along these paths.

    If a branch is reached where only one outgoing edge is marked and the outcome cannot be uniquely determined, add the untaken edge to the marked (taken) set.
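    A minimal C sketch of the marking step, assuming a hypothetical CFG representation in which each block has at most two successors and a flag says whether static analysis can decide the block's branch outcome. Starting from the edges observed in dynamic traces, it repeatedly adds the untaken edge of any reachable, undecided two-way branch with exactly one marked edge, until nothing changes. It only computes the marked edge set; the dataflow analysis that would then run over those edges is not shown.

```c
#include <stdbool.h>

#define NB 8   /* hypothetical maximum number of blocks */

struct cfg {
    int n;                 /* number of blocks */
    int succ[NB][2];       /* successors, -1 if absent */
    bool marked[NB][2];    /* is edge b -> succ[b][k] in the marked set? */
    bool determined[NB];   /* can static analysis decide b's branch outcome? */
};

/* Extend the dynamically observed edge set to a fixpoint.
   Returns the number of branches whose untaken edge was added. */
int extend_marked(struct cfg *g, int entry)
{
    int added = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        /* blocks reachable from entry along marked edges (DFS) */
        bool reach[NB] = {false};
        int stack[NB], sp = 0;
        reach[entry] = true;
        stack[sp++] = entry;
        while (sp > 0) {
            int b = stack[--sp];
            for (int k = 0; k < 2; k++) {
                int s = g->succ[b][k];
                if (s >= 0 && g->marked[b][k] && !reach[s]) {
                    reach[s] = true;
                    stack[sp++] = s;
                }
            }
        }
        /* undecided two-way branch with exactly one marked edge */
        for (int b = 0; b < g->n; b++) {
            if (!reach[b] || g->succ[b][1] < 0 || g->determined[b])
                continue;
            if (g->marked[b][0] != g->marked[b][1]) {
                g->marked[b][0] = g->marked[b][1] = true;
                added++;
                changed = true;
            }
        }
    }
    return added;
}
```

    For example, if a trace covers 0 -> 1 -> 3 -> 4, block 0 branches to 1 or 2 with an undecided outcome, and block 3's branch is statically decided, the sketch adds only the edge 0 -> 2; the other outgoing edge of block 3 stays unmarked.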


  • 44/45

    Conclusions

    Code obfuscation has been proposed by a number of researchers.

    Obfuscators rely on the theoretical difficulty of statically reasoning about certain kinds of program properties.

    The authors have shown that a combination of static and dynamic analysis bypasses much of the effect of obfuscators.

    Control flow flattening, which is used in commercial obfuscators, can be removed in a relatively straightforward way.

  • 45/45

    Questions?

    Thank you for your attention