45
Malware-Analyse und Reverse Engineering 8: Statische und dynamische Analysen zur Malware-Erkennung 11.5.2017 Prof. Dr. Michael Engel

Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Malware-AnalyseundReverseEngineering

8: StatischeunddynamischeAnalysenzurMalware-Erkennung

11.5.2017Prof.Dr.MichaelEngel

Page 2: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Überblick

Themen:• Statische Analysen zur Malware-Erkennung• Dynamische Analysen zur Malware-Erkennung

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

2

Page 3: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Malware-Erkennung

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

StatischeAnalysen• Signaturen• AbstractInterpretation

Dynamische Analysen• Anomalieerkennung• Kontrollflussverfolgung

3

Page 4: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

StatischeAnalysen

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

VorigeVorlesung• StatischeAnalysevonProgramm-QuellcodezumFinden

möglicherSicherheitslücken

Jetzt• StatischeAnalysevonBinärprogrammen,umfestzustellen,ob

einProgrammMalwareenthält• StatischeAnalyse:• InformationenüberdasVerhalteneinesProgramms

ermitteln,ohnedasProgrammselbstauszuführen

4

Page 5: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

EinfacheStatischeAnalyse:Signaturen

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

Idee• Malwarecodebestehtausbestimmter(Maschinen-)Befehlsfolge• WenndieseinProgrammcodevorkommt⇒Malwareentdeckt• GrundlagevielereinfacherVirenscanner

Signatur• FolgevonBytesimProgrammcode• Nichtnotwendigerweiseaufeinanderfolgend• AuchkomplexereMustermöglich

• SyntaktischeMethode– keinVerständnisdesVerhaltens derMalware

5

Page 6: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Signaturen:Beispiel

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

„Stoned“-Virusvon1987• AngeblichvoneinemStudenteninNeuseelandentwickelt• 1989:inNeuseelandundAustralienverbreitet• 1990:VariantenaufderganzenWeltverbreitet

• Bootsektor-Virus:befälltBootsektorenvonDiskettenundFestplatten(unterMS-DOS!)• KopiertsichbeimBootenresidentindenHauptspeicher• InfiziertneueingelegteDisketten(undneuangeschlossene

Festplatten– diewarenaberdamalsnichtimBetriebwechselbar...)• Malware-Funktionalität:• Beijedem8.BootvorgangdesPC(vonbefallenerDiskette

oderFestplatte)wirdderText„YourPCisnowStoned!”ausgegeben

6

Page 7: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Stoned“inthewild”

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

Ein Virusvon1987macht 20Jahre später noch Ärger…

7

Page 8: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Dumpdes“Stoned”-Virus

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

8

Page 9: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Analysedes“Stoned”-Virus

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

Bootsektor=„MasterBootRecord“(MBR)• ErsterSektoreinerDiskette/FestplattebeiPCs,diemit

klassischemBIOS(nichtUEFI)arbeiten:512Bytes• WirdvonBIOS

beimStartdesPCvonDiskette/Plattegeladen

• EnthältCode,dermitHilfevonBIOS-Funktionen(SW-Interrupts)RestdesOSlädt

9

Page 10: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Der“Stoned”-Bootsektor

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

0x1FE/1FF:SignaturdesBootsektors(wird später eingefügt)

0x000-004:Sprungbefehl“JMP07C0:0005”(wird oftzur Erkennungeines gültigen Bootsektorsverwendet)

0x1BE-0x1FD:Partitionstabelle(wird freigelassenundspäter ergänzt)

10

Page 11: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Analyse:“Stoned”-Bootsektor

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

00000000 EA0500C007 jmp 07C0:0005 ; jump to set correct Code SegmentSet_Code_Segment:00000005 E99900 jmp Stoned_Start; initial address of stoned boot virus

BIOSlädt Bootsektor immer anAdresse 0x07C0:0000Virusnutzt absoluteAdresse aus,umDaten usw.zu referenzieren

11

Page 12: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Erkennungvon“Stoned”(1)

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

ErkennenvonBytefolge,dieeindeutigVirusidentifiziertKein geeigneter Kandidat:• jeder Bootsektor enthält diese

Bytefolge(oder eine sehr ähnliche)

http://www.stoned-vienna.com/analysis-of-stoned.html

12

Page 13: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Erkennungvon“Stoned”(2)

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

ErkennenvonBytefolge,dieeindeutigVirusidentifiziert

8

Figure 3: Search Pattern for Stoned virus [18]

3.2 Anomaly based virus detection

Anomaly based detection systems monitor the processes on a host machine for any

abnormal activity. If any abnormal activity is identified, the system raises an alarm signaling the

possible presence of malware [15]. In this detection technique, the system uses the collected

heuristics to categorize an activity as normal or malicious. Even though chances of false alarm

are relatively higher in this method, it is more reliable because it is also capable of detecting new

viruses. The important thing to note is that raising a false alarm is not as potential harmful as

allowing a new virus. However, these systems can be trained gradually by intruders to consider

abnormal behavior as normal. Thus, system will fail to detect the abnormal activity in such cases

[15].

3.3 Emulation based detection

The emulation based detection is an effective method where a virus is executed in a virtual

environment by emulating the instructions in the virus code. This type of detection is used to

detect polymorphic, as well as metamorphic, viruses. The virus instance can be executed in the

Bessere Signatur:• eigentlicher Code des Virus• länger, damit höhere

Wahrscheinlichkeit, dassBytefolge eindeutig ist

13

Page 14: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Signaturen:Probleme

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

False Positives(falscherAlarm)• FolgevonBytes„legaler“BestandteileinesProgramms• Wiezwischengutundböseunterscheiden?

False Negatives• Virusistvorhanden,wirdabernichterkannt• Ursache:kleineÄnderungdesVirus-Codes• AndereAdressen,VerwendungandererRegister,...

• PolymorpheMalware• Viren,diesichbeijederInfektionselbstverändern,um

Signaturprüfungzuentgehen

14

Page 15: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Signaturen:Beispielfürfalsepositives

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

„Stoned“-VirusundBitcoins• Virus-Signaturvon„Stoned“wurdeimMai2014indie

Blockchain vonBitcoineingeführt• Blockchain =Daten,keinausführbares Programm• VirenscannerscanntDateienaufFestplatte(+evtl.Inhaltdes

Hauptspeichers),kannnichtzwischenCodeundDatenunterscheiden

• Folge:MicrosoftSecurityEssentialserkannteBlockchain alsVirusundwollteDateilöschen• Bitcoin-SWlädtBlockchain neu• Virenscannererkennt„Virus“erneut...

http://thehackernews.com/2014/05/microsoft-security-essential-found.html

15

Page 16: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

WeitereProblemevonSignaturen

MitjedemneuenVirussteigtGrößederSignatur-Datenbank• AufwendigererScanvorgang– immermehrSignaturenmüssen

überprüftwerden

Signaturdatenbankmussaktuellgehaltenwerden• NeuauftretendeMalwarewirdnichterkannt,daHerstellervon

Virenscannerndieseerstanalysierenmüssen• DanachwirdSignaturdatenbankaktualisiert• ...unddann(irgendwann)verteilt...

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

16

Page 17: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

PolymorpheMalware

Ziel:UmgehenderErkennungdurchSignaturen• ÄnderungvonBytesinCodedesProgramms

• Absichtlich:MutationoderObfuskation• Einfügenvon„NOP“:0x90 =XCHGEAX,EAXoderanderen

Instruktionen,diedieSemantiknichtändern• ÄndernderCodereihenfolgedurchSprungbefehle• Ersetzenvon(Folgenvon)Maschineninstruktionendurch

semantischidentischeInstruktionen

• ...oderaucheherunabsichtlichdurchAdressänderungenimCode,Compileroptimierungenbei„Update“einesVirususw.

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

17

Page 18: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Malware-Obfuskation:BeispielfürEinfügenvonNOPs(1)Einfügenvon„NOP“-Operationen• BeibehaltungderSemantik• aber:andereSignatur

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

Original code Obfuscated codeE8 00000000 call 0h E8 00000000 call 0h5B pop ebx 5B pop ebx8D 4B 42 lea ecx, [ebx + 42h] 8D 4B 42 lea ecx, [ebx + 45h]51 push ecx 90 nop50 push eax 51 push ecx50 push eax 50 push eax0F01 4C 24 FE sidt [esp - 02h] 50 push eax5B pop ebx 90 nop83 C3 1C add ebx, 1Ch 0F01 4C 24 FE sidt [esp - 02h]FA cli 5B pop ebx8B 2B mov ebp, [ebx] 83 C3 1C add ebx, 1Ch

90 nopFA cli8B 2B mov ebp, [ebx]

Signature New signatureE800 0000 005B 8D4B 4251 5050 E800 0000 005B 8D4B 4290 51500F01 4C24 FE5B 83C3 1CFA 8B2B 5090 0F01 4C24 FE5B 83C3 1C90

FA8B 2B

Figure 1: Original code and obfuscated code from Chernobyl/CIH, and their corresponding signatures. Newly addedinstructions are highlighted.

E800 0000 00(90)* 5B(90)*8D4B 42(90)* 51(90)* 50(90)*50(90)* 0F01 4C24 FE(90)*5B(90)* 83C3 1C(90)* FA(90)*8B2B

Figure 2: Extended signature to catch nop-insertion.

HareFinally, the Hare virus infects the bootloader sectorsof floppy disks and hard drives, as well as executableprograms. When the payload is triggered, the virusoverwrites random sectors on the hard disk, making thedata inaccessible. The virus spreads by polymorphicallychanging its decryption routine and encrypting its mainbody.The Hare and Chernobyl/CIH viruses are well known

in the antivirus community, with their presence in thewild peaking in 1996 and 1998, respectively. In spiteof this, we discovered that current commercial virusscanners could not detect slightly obfuscated versionsof these viruses.

4 Obfuscation Attacks on CommercialVirus Scanners

We tested three commercial virus scanners againstseveral common obfuscation transformations. To test theresilience of commercial virus scanners to common ob-fuscation transformations, we have developed an obfus-cator for binaries. Our obfuscator supports four com-mon obfuscation transformations: dead-code insertion,code transposition, register reassignment, and instruc-tion substitution. While there are other generic obfus-

cation techniques [11, 12], those described here seemto be preferred by malicious code writers, possibly be-cause implementing them is easy and they add little tothe memory footprint.

4.1 Common Obfuscation Transformations4.1.1 Dead-Code InsertionAlso known as trash insertion, dead-code insertion

adds code to a program without modifying its behav-ior. Inserting a sequence of nop instructions is the sim-plest example. More interesting obfuscations involveconstructing challenging code sequences that modify theprogram state, only to restore it immediately.Some code sequences are designed to fool antivirus

software that solely rely on signature matching as theirdetection mechanism. Other code sequences are com-plicated enough to make automatic analysis very time-consuming, if not impossible. For example, passing val-ues through memory rather than registers or the stackrequires accurate pointer analysis to recover values. Theexample shown in Figure 3 should clarify this. The codemarked by (*) can be easily eliminated by automatedanalysis. On the other hand, the second and third inser-tions, marked by (**), do cancel out but the analysis ismore complex. Our obfuscator supports dead-code in-sertion.Not all dead-code sequence can be detected and elim-

inated, as this problem reduces to program equivalence(i.e., is this code sequence equivalent to an empty pro-gram?), which is undecidable. We believe that manycommon dead-code sequences can be detected and elim-inated with acceptable performance. To quote the docu-

Codeaus Chernobyl/CIH-Virus:MihaiChristodorescu andSomesh Jha,“Staticanalysisofexecutablestodetectmaliciouspatterns”,USENIXSecuritySymposium- Volume12,2003

18

Page 19: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Malware-Obfuskation:BeispielfürEinfügenvonNOPs(2)AndereSignaturbenötigt• Einfügenvon

NOPsistaberan beliebigenStellenmöglich...

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

Original code Obfuscated codeE8 00000000 call 0h E8 00000000 call 0h5B pop ebx 5B pop ebx8D 4B 42 lea ecx, [ebx + 42h] 8D 4B 42 lea ecx, [ebx + 45h]51 push ecx 90 nop50 push eax 51 push ecx50 push eax 50 push eax0F01 4C 24 FE sidt [esp - 02h] 50 push eax5B pop ebx 90 nop83 C3 1C add ebx, 1Ch 0F01 4C 24 FE sidt [esp - 02h]FA cli 5B pop ebx8B 2B mov ebp, [ebx] 83 C3 1C add ebx, 1Ch

90 nopFA cli8B 2B mov ebp, [ebx]

Signature New signatureE800 0000 005B 8D4B 4251 5050 E800 0000 005B 8D4B 4290 51500F01 4C24 FE5B 83C3 1CFA 8B2B 5090 0F01 4C24 FE5B 83C3 1C90

FA8B 2B

Figure 1: Original code and obfuscated code from Chernobyl/CIH, and their corresponding signatures. Newly addedinstructions are highlighted.

E800 0000 00(90)* 5B(90)*8D4B 42(90)* 51(90)* 50(90)*50(90)* 0F01 4C24 FE(90)*5B(90)* 83C3 1C(90)* FA(90)*8B2B

Figure 2: Extended signature to catch nop-insertion.

HareFinally, the Hare virus infects the bootloader sectorsof floppy disks and hard drives, as well as executableprograms. When the payload is triggered, the virusoverwrites random sectors on the hard disk, making thedata inaccessible. The virus spreads by polymorphicallychanging its decryption routine and encrypting its mainbody.The Hare and Chernobyl/CIH viruses are well known

in the antivirus community, with their presence in thewild peaking in 1996 and 1998, respectively. In spiteof this, we discovered that current commercial virusscanners could not detect slightly obfuscated versionsof these viruses.

4 Obfuscation Attacks on CommercialVirus Scanners

We tested three commercial virus scanners againstseveral common obfuscation transformations. To test theresilience of commercial virus scanners to common ob-fuscation transformations, we have developed an obfus-cator for binaries. Our obfuscator supports four com-mon obfuscation transformations: dead-code insertion,code transposition, register reassignment, and instruc-tion substitution. While there are other generic obfus-

cation techniques [11, 12], those described here seemto be preferred by malicious code writers, possibly be-cause implementing them is easy and they add little tothe memory footprint.

4.1 Common Obfuscation Transformations4.1.1 Dead-Code InsertionAlso known as trash insertion, dead-code insertion

adds code to a program without modifying its behav-ior. Inserting a sequence of nop instructions is the sim-plest example. More interesting obfuscations involveconstructing challenging code sequences that modify theprogram state, only to restore it immediately.Some code sequences are designed to fool antivirus

software that solely rely on signature matching as theirdetection mechanism. Other code sequences are com-plicated enough to make automatic analysis very time-consuming, if not impossible. For example, passing val-ues through memory rather than registers or the stackrequires accurate pointer analysis to recover values. Theexample shown in Figure 3 should clarify this. The codemarked by (*) can be easily eliminated by automatedanalysis. On the other hand, the second and third inser-tions, marked by (**), do cancel out but the analysis ismore complex. Our obfuscator supports dead-code in-sertion.Not all dead-code sequence can be detected and elim-

inated, as this problem reduces to program equivalence(i.e., is this code sequence equivalent to an empty pro-gram?), which is undecidable. We believe that manycommon dead-code sequences can be detected and elim-inated with acceptable performance. To quote the docu-

Original code Obfuscated codeE8 00000000 call 0h E8 00000000 call 0h5B pop ebx 5B pop ebx8D 4B 42 lea ecx, [ebx + 42h] 8D 4B 42 lea ecx, [ebx + 45h]51 push ecx 90 nop50 push eax 51 push ecx50 push eax 50 push eax0F01 4C 24 FE sidt [esp - 02h] 50 push eax5B pop ebx 90 nop83 C3 1C add ebx, 1Ch 0F01 4C 24 FE sidt [esp - 02h]FA cli 5B pop ebx8B 2B mov ebp, [ebx] 83 C3 1C add ebx, 1Ch

90 nopFA cli8B 2B mov ebp, [ebx]

Signature New signatureE800 0000 005B 8D4B 4251 5050 E800 0000 005B 8D4B 4290 51500F01 4C24 FE5B 83C3 1CFA 8B2B 5090 0F01 4C24 FE5B 83C3 1C90

FA8B 2B

Figure 1: Original code and obfuscated code from Chernobyl/CIH, and their corresponding signatures. Newly addedinstructions are highlighted.

E800 0000 00(90)* 5B(90)*8D4B 42(90)* 51(90)* 50(90)*50(90)* 0F01 4C24 FE(90)*5B(90)* 83C3 1C(90)* FA(90)*8B2B

Figure 2: Extended signature to catch nop-insertion.

HareFinally, the Hare virus infects the bootloader sectorsof floppy disks and hard drives, as well as executableprograms. When the payload is triggered, the virusoverwrites random sectors on the hard disk, making thedata inaccessible. The virus spreads by polymorphicallychanging its decryption routine and encrypting its mainbody.The Hare and Chernobyl/CIH viruses are well known

in the antivirus community, with their presence in thewild peaking in 1996 and 1998, respectively. In spiteof this, we discovered that current commercial virusscanners could not detect slightly obfuscated versionsof these viruses.

4 Obfuscation Attacks on CommercialVirus Scanners

We tested three commercial virus scanners againstseveral common obfuscation transformations. To test theresilience of commercial virus scanners to common ob-fuscation transformations, we have developed an obfus-cator for binaries. Our obfuscator supports four com-mon obfuscation transformations: dead-code insertion,code transposition, register reassignment, and instruc-tion substitution. While there are other generic obfus-

cation techniques [11, 12], those described here seemto be preferred by malicious code writers, possibly be-cause implementing them is easy and they add little tothe memory footprint.

4.1 Common Obfuscation Transformations4.1.1 Dead-Code InsertionAlso known as trash insertion, dead-code insertion

adds code to a program without modifying its behav-ior. Inserting a sequence of nop instructions is the sim-plest example. More interesting obfuscations involveconstructing challenging code sequences that modify theprogram state, only to restore it immediately.Some code sequences are designed to fool antivirus

software that solely rely on signature matching as theirdetection mechanism. Other code sequences are com-plicated enough to make automatic analysis very time-consuming, if not impossible. For example, passing val-ues through memory rather than registers or the stackrequires accurate pointer analysis to recover values. Theexample shown in Figure 3 should clarify this. The codemarked by (*) can be easily eliminated by automatedanalysis. On the other hand, the second and third inser-tions, marked by (**), do cancel out but the analysis ismore complex. Our obfuscator supports dead-code in-sertion.Not all dead-code sequence can be detected and elim-

inated, as this problem reduces to program equivalence(i.e., is this code sequence equivalent to an empty pro-gram?), which is undecidable. We believe that manycommon dead-code sequences can be detected and elim-inated with acceptable performance. To quote the docu-

19

Page 20: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

FlexibleErkennungvondurch„NOP“mutiertenVariantenSignatur wirdzureguläremAusdruck• MöglichkeitderBeschreibungoptionalerundwiederholter

BytesundBytefolgennotwendig• Aufwendigerzuanalysieren

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

Original code Obfuscated codeE8 00000000 call 0h E8 00000000 call 0h5B pop ebx 5B pop ebx8D 4B 42 lea ecx, [ebx + 42h] 8D 4B 42 lea ecx, [ebx + 45h]51 push ecx 90 nop50 push eax 51 push ecx50 push eax 50 push eax0F01 4C 24 FE sidt [esp - 02h] 50 push eax5B pop ebx 90 nop83 C3 1C add ebx, 1Ch 0F01 4C 24 FE sidt [esp - 02h]FA cli 5B pop ebx8B 2B mov ebp, [ebx] 83 C3 1C add ebx, 1Ch

90 nopFA cli8B 2B mov ebp, [ebx]

Signature New signatureE800 0000 005B 8D4B 4251 5050 E800 0000 005B 8D4B 4290 51500F01 4C24 FE5B 83C3 1CFA 8B2B 5090 0F01 4C24 FE5B 83C3 1C90

FA8B 2B

Figure 1: Original code and obfuscated code from Chernobyl/CIH, and their corresponding signatures. Newly addedinstructions are highlighted.

E800 0000 00(90)* 5B(90)*8D4B 42(90)* 51(90)* 50(90)*50(90)* 0F01 4C24 FE(90)*5B(90)* 83C3 1C(90)* FA(90)*8B2B

Figure 2: Extended signature to catch nop-insertion.

HareFinally, the Hare virus infects the bootloader sectorsof floppy disks and hard drives, as well as executableprograms. When the payload is triggered, the virusoverwrites random sectors on the hard disk, making thedata inaccessible. The virus spreads by polymorphicallychanging its decryption routine and encrypting its mainbody.The Hare and Chernobyl/CIH viruses are well known

in the antivirus community, with their presence in thewild peaking in 1996 and 1998, respectively. In spiteof this, we discovered that current commercial virusscanners could not detect slightly obfuscated versionsof these viruses.

4 Obfuscation Attacks on CommercialVirus Scanners

We tested three commercial virus scanners againstseveral common obfuscation transformations. To test theresilience of commercial virus scanners to common ob-fuscation transformations, we have developed an obfus-cator for binaries. Our obfuscator supports four com-mon obfuscation transformations: dead-code insertion,code transposition, register reassignment, and instruc-tion substitution. While there are other generic obfus-

cation techniques [11, 12], those described here seemto be preferred by malicious code writers, possibly be-cause implementing them is easy and they add little tothe memory footprint.

4.1 Common Obfuscation Transformations4.1.1 Dead-Code InsertionAlso known as trash insertion, dead-code insertion

adds code to a program without modifying its behav-ior. Inserting a sequence of nop instructions is the sim-plest example. More interesting obfuscations involveconstructing challenging code sequences that modify theprogram state, only to restore it immediately.Some code sequences are designed to fool antivirus

software that solely rely on signature matching as theirdetection mechanism. Other code sequences are com-plicated enough to make automatic analysis very time-consuming, if not impossible. For example, passing val-ues through memory rather than registers or the stackrequires accurate pointer analysis to recover values. Theexample shown in Figure 3 should clarify this. The codemarked by (*) can be easily eliminated by automatedanalysis. On the other hand, the second and third inser-tions, marked by (**), do cancel out but the analysis ismore complex. Our obfuscator supports dead-code in-sertion.Not all dead-code sequence can be detected and elim-

inated, as this problem reduces to program equivalence(i.e., is this code sequence equivalent to an empty pro-gram?), which is undecidable. We believe that manycommon dead-code sequences can be detected and elim-inated with acceptable performance. To quote the docu-

Original code Obfuscated codeE8 00000000 call 0h E8 00000000 call 0h5B pop ebx 5B pop ebx8D 4B 42 lea ecx, [ebx + 42h] 8D 4B 42 lea ecx, [ebx + 45h]51 push ecx 90 nop50 push eax 51 push ecx50 push eax 50 push eax0F01 4C 24 FE sidt [esp - 02h] 50 push eax5B pop ebx 90 nop83 C3 1C add ebx, 1Ch 0F01 4C 24 FE sidt [esp - 02h]FA cli 5B pop ebx8B 2B mov ebp, [ebx] 83 C3 1C add ebx, 1Ch

90 nopFA cli8B 2B mov ebp, [ebx]

Signature New signatureE800 0000 005B 8D4B 4251 5050 E800 0000 005B 8D4B 4290 51500F01 4C24 FE5B 83C3 1CFA 8B2B 5090 0F01 4C24 FE5B 83C3 1C90

FA8B 2B

Figure 1: Original code and obfuscated code from Chernobyl/CIH, and their corresponding signatures. Newly addedinstructions are highlighted.

E800 0000 00(90)* 5B(90)*8D4B 42(90)* 51(90)* 50(90)*50(90)* 0F01 4C24 FE(90)*5B(90)* 83C3 1C(90)* FA(90)*8B2B

Figure 2: Extended signature to catch nop-insertion.

HareFinally, the Hare virus infects the bootloader sectorsof floppy disks and hard drives, as well as executableprograms. When the payload is triggered, the virusoverwrites random sectors on the hard disk, making thedata inaccessible. The virus spreads by polymorphicallychanging its decryption routine and encrypting its mainbody.The Hare and Chernobyl/CIH viruses are well known

in the antivirus community, with their presence in thewild peaking in 1996 and 1998, respectively. In spiteof this, we discovered that current commercial virusscanners could not detect slightly obfuscated versionsof these viruses.

4 Obfuscation Attacks on CommercialVirus Scanners

We tested three commercial virus scanners againstseveral common obfuscation transformations. To test theresilience of commercial virus scanners to common ob-fuscation transformations, we have developed an obfus-cator for binaries. Our obfuscator supports four com-mon obfuscation transformations: dead-code insertion,code transposition, register reassignment, and instruc-tion substitution. While there are other generic obfus-

cation techniques [11, 12], those described here seemto be preferred by malicious code writers, possibly be-cause implementing them is easy and they add little tothe memory footprint.

4.1 Common Obfuscation Transformations4.1.1 Dead-Code InsertionAlso known as trash insertion, dead-code insertion

adds code to a program without modifying its behav-ior. Inserting a sequence of nop instructions is the sim-plest example. More interesting obfuscations involveconstructing challenging code sequences that modify theprogram state, only to restore it immediately.Some code sequences are designed to fool antivirus

software that solely rely on signature matching as theirdetection mechanism. Other code sequences are com-plicated enough to make automatic analysis very time-consuming, if not impossible. For example, passing val-ues through memory rather than registers or the stackrequires accurate pointer analysis to recover values. Theexample shown in Figure 3 should clarify this. The codemarked by (*) can be easily eliminated by automatedanalysis. On the other hand, the second and third inser-tions, marked by (**), do cancel out but the analysis ismore complex. Our obfuscator supports dead-code in-sertion.Not all dead-code sequence can be detected and elim-

inated, as this problem reduces to program equivalence(i.e., is this code sequence equivalent to an empty pro-gram?), which is undecidable. We believe that manycommon dead-code sequences can be detected and elim-inated with acceptable performance. To quote the docu-

Andenmit (90)*markiertenStellen könnten möglicherweiseeine oder mehrere NOP-Instruktionen zusätzlich auftauchen

Flexible Signatur

20

Page 21: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Malware-Obfuskation:BeispielfürCode-UmsortierungCodewirdimSpeicherumgeordnet• VerwendenvonSprungbefehlen,umKontrollflusszuverändern

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

21

Code obfuscated through Code obfuscated through Code obfuscated throughOriginal code dead-code insertion code transposition instruction substitutioncall 0h call 0h call 0h call 0hpop ebx pop ebx pop ebx pop ebxlea ecx, [ebx+42h] lea ecx, [ebx+42h] jmp S2 lea ecx, [ebx+42h]push ecx nop (*) S3: push eax sub esp, 03hpush eax nop (*) push eax sidt [esp - 02h]push eax push ecx sidt [esp - 02h] add [esp], 1Chsidt [esp - 02h] push eax jmp S4 mov ebx, [esp]pop ebx inc eax (**) add ebx, 1Ch inc espadd ebx, 1Ch push eax jmp S6 clicli dec [esp - 0h] (**) S2: lea ecx, [ebx+42h ] mov ebp, [ebx]mov ebp, [ebx] dec eax (**) push ecx

sidt [esp - 02h] jmp S3pop ebx S4: pop ebxadd ebx, 1Ch clicli jmp S5mov ebp, [ebx] S5: mov ebp, [ebx]

Figure 3: Examples of obfuscation through dead-code insertion, code transposition, and instruction substitution. Newly addedinstructions are highlighted.

Norton® McAfee® Command®SAFEAntivirus VirusScan Antivirus

7.0 6.01 4.61.2

Chernobyl original ✓ ✓ ✓ ✓

obfuscated ✕[1] ✕[1,2] ✕[1,2] ✓

z0mbie-6.boriginal ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

f0sf0r0original ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

Hareoriginal ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

Obfuscations considered: [1] = nop-insertion (a form of dead-code insertion)[2] = code transposition

Table 1: Results of testing various virus scanners on obfuscated viruses.

in control flow and register reassignments. Similarly,one must construct an abstract representation of the ex-ecutable in which we are trying to find a malicious pat-tern. Once the generalization of the malicious code andthe abstract representation of the executable are created,we can then detect the malicious code in the executable.We now describe each component of SAFE.Generalizing the malicious code: Building the mali-cious code automatonThe malicious code is generalized into an automatonwith uninterpreted symbols. Uninterpreted symbols(Section 6.2) provide a generic way of representing datadependencies between variables without specifically re-ferring to the storage location of each variable.Pattern-definition loaderThis component takes a library of abstraction patternsand creates an internal representation. These abstractionpatterns are used as alphabet symbols by the maliciouscode automaton.The executable loaderThis component transforms the executable into an in-ternal representation, here the collection of controlflow graphs (CFGs), one for each program procedure.

The executable loader (Figure 5) uses two off-the-shelfcomponents, IDA Pro and CodeSurfer. IDA Pro (byDataRescue [42]) is a commercial interactive disassem-bler. CodeSurfer (by GrammaTech, Inc. [24]) is aprogram-understanding tool that performs a variety ofstatic analyses. CodeSurfer provides an API for ac-cess to various structures, such as the CFGs and the callgraph, and to results of a variety of static analyses, suchas points-to analysis. In collaboration with GrammaT-ech, we have developed a connector that transforms IDAPro internal structures into an intermediate form thatCodeSurfer can parse.The annotatorThis component inputs a CFG from the executable andthe set of abstraction patterns and produces an anno-tated CFG, the abstract representation of a program pro-cedure. The annotated CFG includes information thatindicates where a specific abstraction pattern was foundin the executable. The annotator runs for each proce-dure in the program, transforming each CFG. Section 6describes the annotator in detail.

Code obfuscated through Code obfuscated through Code obfuscated throughOriginal code dead-code insertion code transposition instruction substitutioncall 0h call 0h call 0h call 0hpop ebx pop ebx pop ebx pop ebxlea ecx, [ebx+42h] lea ecx, [ebx+42h] jmp S2 lea ecx, [ebx+42h]push ecx nop (*) S3: push eax sub esp, 03hpush eax nop (*) push eax sidt [esp - 02h]push eax push ecx sidt [esp - 02h] add [esp], 1Chsidt [esp - 02h] push eax jmp S4 mov ebx, [esp]pop ebx inc eax (**) add ebx, 1Ch inc espadd ebx, 1Ch push eax jmp S6 clicli dec [esp - 0h] (**) S2: lea ecx, [ebx+42h ] mov ebp, [ebx]mov ebp, [ebx] dec eax (**) push ecx

sidt [esp - 02h] jmp S3pop ebx S4: pop ebxadd ebx, 1Ch clicli jmp S5mov ebp, [ebx] S5: mov ebp, [ebx]

Figure 3: Examples of obfuscation through dead-code insertion, code transposition, and instruction substitution. Newly addedinstructions are highlighted.

Norton® McAfee® Command®SAFEAntivirus VirusScan Antivirus

7.0 6.01 4.61.2

Chernobyl original ✓ ✓ ✓ ✓

obfuscated ✕[1] ✕[1,2] ✕[1,2] ✓

z0mbie-6.boriginal ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

f0sf0r0original ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

Hareoriginal ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

Obfuscations considered: [1] = nop-insertion (a form of dead-code insertion)[2] = code transposition

Table 1: Results of testing various virus scanners on obfuscated viruses.

in control flow and register reassignments. Similarly,one must construct an abstract representation of the ex-ecutable in which we are trying to find a malicious pat-tern. Once the generalization of the malicious code andthe abstract representation of the executable are created,we can then detect the malicious code in the executable.We now describe each component of SAFE.Generalizing the malicious code: Building the mali-cious code automatonThe malicious code is generalized into an automatonwith uninterpreted symbols. Uninterpreted symbols(Section 6.2) provide a generic way of representing datadependencies between variables without specifically re-ferring to the storage location of each variable.Pattern-definition loaderThis component takes a library of abstraction patternsand creates an internal representation. These abstractionpatterns are used as alphabet symbols by the maliciouscode automaton.The executable loaderThis component transforms the executable into an in-ternal representation, here the collection of controlflow graphs (CFGs), one for each program procedure.

The executable loader (Figure 5) uses two off-the-shelfcomponents, IDA Pro and CodeSurfer. IDA Pro (byDataRescue [42]) is a commercial interactive disassem-bler. CodeSurfer (by GrammaTech, Inc. [24]) is aprogram-understanding tool that performs a variety ofstatic analyses. CodeSurfer provides an API for ac-cess to various structures, such as the CFGs and the callgraph, and to results of a variety of static analyses, suchas points-to analysis. In collaboration with GrammaT-ech, we have developed a connector that transforms IDAPro internal structures into an intermediate form thatCodeSurfer can parse.The annotatorThis component inputs a CFG from the executable andthe set of abstraction patterns and produces an anno-tated CFG, the abstract representation of a program pro-cedure. The annotated CFG includes information thatindicates where a specific abstraction pattern was foundin the executable. The annotator runs for each proce-dure in the program, transforming each CFG. Section 6describes the annotator in detail.

Page 22: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Malware-Obfuskation:VerwendungsemantischidentischerInstr.Instruktionenwerdenersetzt• Semantikwirdbeibehalten• z.B.pushdurchexpliziteSP-Operation+Storeersetzen

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

22

Code obfuscated through Code obfuscated through Code obfuscated throughOriginal code dead-code insertion code transposition instruction substitutioncall 0h call 0h call 0h call 0hpop ebx pop ebx pop ebx pop ebxlea ecx, [ebx+42h] lea ecx, [ebx+42h] jmp S2 lea ecx, [ebx+42h]push ecx nop (*) S3: push eax sub esp, 03hpush eax nop (*) push eax sidt [esp - 02h]push eax push ecx sidt [esp - 02h] add [esp], 1Chsidt [esp - 02h] push eax jmp S4 mov ebx, [esp]pop ebx inc eax (**) add ebx, 1Ch inc espadd ebx, 1Ch push eax jmp S6 clicli dec [esp - 0h] (**) S2: lea ecx, [ebx+42h ] mov ebp, [ebx]mov ebp, [ebx] dec eax (**) push ecx

sidt [esp - 02h] jmp S3pop ebx S4: pop ebxadd ebx, 1Ch clicli jmp S5mov ebp, [ebx] S5: mov ebp, [ebx]

Figure 3: Examples of obfuscation through dead-code insertion, code transposition, and instruction substitution. Newly addedinstructions are highlighted.

Norton® McAfee® Command®SAFEAntivirus VirusScan Antivirus

7.0 6.01 4.61.2

Chernobyl original ✓ ✓ ✓ ✓

obfuscated ✕[1] ✕[1,2] ✕[1,2] ✓

z0mbie-6.boriginal ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

f0sf0r0original ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

Hareoriginal ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

Obfuscations considered: [1] = nop-insertion (a form of dead-code insertion)[2] = code transposition

Table 1: Results of testing various virus scanners on obfuscated viruses.

in control flow and register reassignments. Similarly,one must construct an abstract representation of the ex-ecutable in which we are trying to find a malicious pat-tern. Once the generalization of the malicious code andthe abstract representation of the executable are created,we can then detect the malicious code in the executable.We now describe each component of SAFE.Generalizing the malicious code: Building the mali-cious code automatonThe malicious code is generalized into an automatonwith uninterpreted symbols. Uninterpreted symbols(Section 6.2) provide a generic way of representing datadependencies between variables without specifically re-ferring to the storage location of each variable.Pattern-definition loaderThis component takes a library of abstraction patternsand creates an internal representation. These abstractionpatterns are used as alphabet symbols by the maliciouscode automaton.The executable loaderThis component transforms the executable into an in-ternal representation, here the collection of controlflow graphs (CFGs), one for each program procedure.

The executable loader (Figure 5) uses two off-the-shelfcomponents, IDA Pro and CodeSurfer. IDA Pro (byDataRescue [42]) is a commercial interactive disassem-bler. CodeSurfer (by GrammaTech, Inc. [24]) is aprogram-understanding tool that performs a variety ofstatic analyses. CodeSurfer provides an API for ac-cess to various structures, such as the CFGs and the callgraph, and to results of a variety of static analyses, suchas points-to analysis. In collaboration with GrammaT-ech, we have developed a connector that transforms IDAPro internal structures into an intermediate form thatCodeSurfer can parse.The annotatorThis component inputs a CFG from the executable andthe set of abstraction patterns and produces an anno-tated CFG, the abstract representation of a program pro-cedure. The annotated CFG includes information thatindicates where a specific abstraction pattern was foundin the executable. The annotator runs for each proce-dure in the program, transforming each CFG. Section 6describes the annotator in detail.

Code obfuscated through Code obfuscated through Code obfuscated throughOriginal code dead-code insertion code transposition instruction substitutioncall 0h call 0h call 0h call 0hpop ebx pop ebx pop ebx pop ebxlea ecx, [ebx+42h] lea ecx, [ebx+42h] jmp S2 lea ecx, [ebx+42h]push ecx nop (*) S3: push eax sub esp, 03hpush eax nop (*) push eax sidt [esp - 02h]push eax push ecx sidt [esp - 02h] add [esp], 1Chsidt [esp - 02h] push eax jmp S4 mov ebx, [esp]pop ebx inc eax (**) add ebx, 1Ch inc espadd ebx, 1Ch push eax jmp S6 clicli dec [esp - 0h] (**) S2: lea ecx, [ebx+42h ] mov ebp, [ebx]mov ebp, [ebx] dec eax (**) push ecx

sidt [esp - 02h] jmp S3pop ebx S4: pop ebxadd ebx, 1Ch clicli jmp S5mov ebp, [ebx] S5: mov ebp, [ebx]

Figure 3: Examples of obfuscation through dead-code insertion, code transposition, and instruction substitution. Newly addedinstructions are highlighted.

Norton® McAfee® Command®SAFEAntivirus VirusScan Antivirus

7.0 6.01 4.61.2

Chernobyl original ✓ ✓ ✓ ✓

obfuscated ✕[1] ✕[1,2] ✕[1,2] ✓

z0mbie-6.boriginal ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

f0sf0r0original ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

Hareoriginal ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

Obfuscations considered: [1] = nop-insertion (a form of dead-code insertion)[2] = code transposition

Table 1: Results of testing various virus scanners on obfuscated viruses.

in control flow and register reassignments. Similarly,one must construct an abstract representation of the ex-ecutable in which we are trying to find a malicious pat-tern. Once the generalization of the malicious code andthe abstract representation of the executable are created,we can then detect the malicious code in the executable.We now describe each component of SAFE.Generalizing the malicious code: Building the mali-cious code automatonThe malicious code is generalized into an automatonwith uninterpreted symbols. Uninterpreted symbols(Section 6.2) provide a generic way of representing datadependencies between variables without specifically re-ferring to the storage location of each variable.Pattern-definition loaderThis component takes a library of abstraction patternsand creates an internal representation. These abstractionpatterns are used as alphabet symbols by the maliciouscode automaton.The executable loaderThis component transforms the executable into an in-ternal representation, here the collection of controlflow graphs (CFGs), one for each program procedure.

The executable loader (Figure 5) uses two off-the-shelfcomponents, IDA Pro and CodeSurfer. IDA Pro (byDataRescue [42]) is a commercial interactive disassem-bler. CodeSurfer (by GrammaTech, Inc. [24]) is aprogram-understanding tool that performs a variety ofstatic analyses. CodeSurfer provides an API for ac-cess to various structures, such as the CFGs and the callgraph, and to results of a variety of static analyses, suchas points-to analysis. In collaboration with GrammaT-ech, we have developed a connector that transforms IDAPro internal structures into an intermediate form thatCodeSurfer can parse.The annotatorThis component inputs a CFG from the executable andthe set of abstraction patterns and produces an anno-tated CFG, the abstract representation of a program pro-cedure. The annotated CFG includes information thatindicates where a specific abstraction pattern was foundin the executable. The annotator runs for each proce-dure in the program, transforming each CFG. Section 6describes the annotator in detail.

Page 23: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

ErkennungvonObfuskationen

Vorgehensweise:• ErstellendesKontrollflussgraphen

desMaschinencodes• Markieren„unnötiger“Operationen

(NOP,JMP,etc.)• Vergleichmitbekanntem

Kontrollflussgraphen(Graph-Isomorphismus)

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

23

mov eax, dr1

jmp n_11

mov ebx, [eax+10h]

mov edi, [eax]

Loop: pop ecx

jecxz n_18

nop

(F)

pop ebx

(T)

mov esi, ecx

nop

nop

mov eax, 0d601h

jmp n_13

pop edx

jmp n_02

pop ecx

nop

call edi

jmp Loop

pop eax

push eax

pop eax

stc

pushf

mov eax, dr1

jmp n_11

Assign(eax,dr1)

mov ebx, [eax+10h]

IrrelevantJump

jmp n_02

Assign(ebx,[eax+10h])

IrrelevantJump

mov edi, [eax]Assign(edi,[eax])

Loop: pop ecxLoop: Pop(ecx)

jecxz n_18If(ecx==0)

nop

(F)

pop ebx

(T)

IrrelevantInstr

mov esi, ecxAssign(esi,ecx)

nopIrrelevantInstr

nop

mov eax, 0d601hAssign(eax,0d601h)

jmp n_13IrrelevantJump

pop edxPop(edx)

pop ecxPop(ecx)

nopIrrelevantInstr

call ediIndirectCall(edi)

jmp LoopGoTo(Loop)

Pop(ebx)

pop eaxPop(eax)

push eaxIrrelevantInstr

pop eax

stcAssign(Carry,1)

pushfPush(flags)

Figure 8: Control flow graph of obfuscated code fragment, and annotations.

Page 24: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

ÜberblickObfuskationstechnikenKombinationderverschiedenenTechnikenmöglich• ExtremhoherErkennungsaufwand

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

24

Code obfuscated through Code obfuscated through Code obfuscated throughOriginal code dead-code insertion code transposition instruction substitutioncall 0h call 0h call 0h call 0hpop ebx pop ebx pop ebx pop ebxlea ecx, [ebx+42h] lea ecx, [ebx+42h] jmp S2 lea ecx, [ebx+42h]push ecx nop (*) S3: push eax sub esp, 03hpush eax nop (*) push eax sidt [esp - 02h]push eax push ecx sidt [esp - 02h] add [esp], 1Chsidt [esp - 02h] push eax jmp S4 mov ebx, [esp]pop ebx inc eax (**) add ebx, 1Ch inc espadd ebx, 1Ch push eax jmp S6 clicli dec [esp - 0h] (**) S2: lea ecx, [ebx+42h ] mov ebp, [ebx]mov ebp, [ebx] dec eax (**) push ecx

sidt [esp - 02h] jmp S3pop ebx S4: pop ebxadd ebx, 1Ch clicli jmp S5mov ebp, [ebx] S5: mov ebp, [ebx]

Figure 3: Examples of obfuscation through dead-code insertion, code transposition, and instruction substitution. Newly addedinstructions are highlighted.

Norton® McAfee® Command®SAFEAntivirus VirusScan Antivirus

7.0 6.01 4.61.2

Chernobyl original ✓ ✓ ✓ ✓

obfuscated ✕[1] ✕[1,2] ✕[1,2] ✓

z0mbie-6.boriginal ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

f0sf0r0original ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

Hareoriginal ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

Obfuscations considered: [1] = nop-insertion (a form of dead-code insertion)[2] = code transposition

Table 1: Results of testing various virus scanners on obfuscated viruses.

in control flow and register reassignments. Similarly,one must construct an abstract representation of the ex-ecutable in which we are trying to find a malicious pat-tern. Once the generalization of the malicious code andthe abstract representation of the executable are created,we can then detect the malicious code in the executable.We now describe each component of SAFE.Generalizing the malicious code: Building the mali-cious code automatonThe malicious code is generalized into an automatonwith uninterpreted symbols. Uninterpreted symbols(Section 6.2) provide a generic way of representing datadependencies between variables without specifically re-ferring to the storage location of each variable.Pattern-definition loaderThis component takes a library of abstraction patternsand creates an internal representation. These abstractionpatterns are used as alphabet symbols by the maliciouscode automaton.The executable loaderThis component transforms the executable into an in-ternal representation, here the collection of controlflow graphs (CFGs), one for each program procedure.

The executable loader (Figure 5) uses two off-the-shelfcomponents, IDA Pro and CodeSurfer. IDA Pro (byDataRescue [42]) is a commercial interactive disassem-bler. CodeSurfer (by GrammaTech, Inc. [24]) is aprogram-understanding tool that performs a variety ofstatic analyses. CodeSurfer provides an API for ac-cess to various structures, such as the CFGs and the callgraph, and to results of a variety of static analyses, suchas points-to analysis. In collaboration with GrammaT-ech, we have developed a connector that transforms IDAPro internal structures into an intermediate form thatCodeSurfer can parse.The annotatorThis component inputs a CFG from the executable andthe set of abstraction patterns and produces an anno-tated CFG, the abstract representation of a program pro-cedure. The annotated CFG includes information thatindicates where a specific abstraction pattern was foundin the executable. The annotator runs for each proce-dure in the program, transforming each CFG. Section 6describes the annotator in detail.

Code obfuscated through Code obfuscated through Code obfuscated throughOriginal code dead-code insertion code transposition instruction substitutioncall 0h call 0h call 0h call 0hpop ebx pop ebx pop ebx pop ebxlea ecx, [ebx+42h] lea ecx, [ebx+42h] jmp S2 lea ecx, [ebx+42h]push ecx nop (*) S3: push eax sub esp, 03hpush eax nop (*) push eax sidt [esp - 02h]push eax push ecx sidt [esp - 02h] add [esp], 1Chsidt [esp - 02h] push eax jmp S4 mov ebx, [esp]pop ebx inc eax (**) add ebx, 1Ch inc espadd ebx, 1Ch push eax jmp S6 clicli dec [esp - 0h] (**) S2: lea ecx, [ebx+42h ] mov ebp, [ebx]mov ebp, [ebx] dec eax (**) push ecx

sidt [esp - 02h] jmp S3pop ebx S4: pop ebxadd ebx, 1Ch clicli jmp S5mov ebp, [ebx] S5: mov ebp, [ebx]

Figure 3: Examples of obfuscation through dead-code insertion, code transposition, and instruction substitution. Newly addedinstructions are highlighted.

Norton® McAfee® Command®SAFEAntivirus VirusScan Antivirus

7.0 6.01 4.61.2

Chernobyl original ✓ ✓ ✓ ✓

obfuscated ✕[1] ✕[1,2] ✕[1,2] ✓

z0mbie-6.boriginal ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

f0sf0r0original ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

Hareoriginal ✓ ✓ ✓ ✓

obfuscated ✕[1,2] ✕[1,2] ✕[1,2] ✓

Obfuscations considered: [1] = nop-insertion (a form of dead-code insertion)[2] = code transposition

Table 1: Results of testing various virus scanners on obfuscated viruses.

in control flow and register reassignments. Similarly,one must construct an abstract representation of the ex-ecutable in which we are trying to find a malicious pat-tern. Once the generalization of the malicious code andthe abstract representation of the executable are created,we can then detect the malicious code in the executable.We now describe each component of SAFE.Generalizing the malicious code: Building the mali-cious code automatonThe malicious code is generalized into an automatonwith uninterpreted symbols. Uninterpreted symbols(Section 6.2) provide a generic way of representing datadependencies between variables without specifically re-ferring to the storage location of each variable.Pattern-definition loaderThis component takes a library of abstraction patternsand creates an internal representation. These abstractionpatterns are used as alphabet symbols by the maliciouscode automaton.The executable loaderThis component transforms the executable into an in-ternal representation, here the collection of controlflow graphs (CFGs), one for each program procedure.

The executable loader (Figure 5) uses two off-the-shelfcomponents, IDA Pro and CodeSurfer. IDA Pro (byDataRescue [42]) is a commercial interactive disassem-bler. CodeSurfer (by GrammaTech, Inc. [24]) is aprogram-understanding tool that performs a variety ofstatic analyses. CodeSurfer provides an API for ac-cess to various structures, such as the CFGs and the callgraph, and to results of a variety of static analyses, suchas points-to analysis. In collaboration with GrammaT-ech, we have developed a connector that transforms IDAPro internal structures into an intermediate form thatCodeSurfer can parse.The annotatorThis component inputs a CFG from the executable andthe set of abstraction patterns and produces an anno-tated CFG, the abstract representation of a program pro-cedure. The annotated CFG includes information thatindicates where a specific abstraction pattern was foundin the executable. The annotator runs for each proce-dure in the program, transforming each CFG. Section 6describes the annotator in detail.

Kommerzielle VirenscannererkanntenmutierteVarianten nicht!

Page 25: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

PolymorpheMalware:BeispielfürAdreßänderungenVerwendungeinerstatischgelinktenLibrary• Identischer

Assembler-Quelltext

• UnterschiedlicheabsoluteAdressendurchanderesSpeicherlayout

• KomplexereSignaturerkennungnotwendig

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

FLIRT Signatures: Good Scenario

● Library statically-linked, not recompiled

25

Page 26: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

PolymorpheMalware:BeispielfürCompileroptimierungenCompilereliminiertnichtbenötigteMaschineninstruktionen• Oftidentischer

(z.B.C-)Quelltext• Optimierungs-

phase desCompilerserkennt,dasseinigeerzeugteMaschinen-instruktionen inKontextnichtbenötigtwerden

• EntferntdieseausProgramm

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

FLIRT Signatures: Bad Scenario

● Library was recompiled

26

Page 27: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

KomplexereStatischeAnalyse:AbstrakteInterpretation

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

Idee• ErmittelnderSemantikeinesProgramms,ohneesauszuführen• ErkennungvonAnomalien

Methode:AbstrakteInterpretation• Informationen über dasVerhalten vonProgrammen (Analyse

derSemantik)erhalten,indemmanvonTeilen desProgrammsabstrahiert unddieAnweisungen Schritt für Schrittnachvollzieht (Interpretation)

27

PatrickCousot,Radhia Cousot:“Abstractinterpretation:aunified latticemodelforstaticanalysisofprogramsbyconstructionorapproximationoffixpoints”, SixthAnnualACMSIGPLAN-SIGACTSymposium onPrinciplesofProgramming Languages,1977

Page 28: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

AbstrakteInterpretation

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

Vorgehensweise• KonzentrationaufTeilaspektederAusführungderAnweisungen• GezieltesWeglassenvonInformation(Abstraktion)• Ergebnis:NäherungandieProgrammsemantik• KannfürgewünschtenZweckvollkommenausreichendsein

• VieleEigenschaftenvonProgrammensindnichtberechenbar• MankannkeinProgrammangeben,welchessieinendlicherZeitmit

endlichenSpeicherresourcen fürbeliebigeProgrammeberechnet(→Halteproblem)

• Approximation(WeglasseneinigerInformationen)ermöglichteinigeweitereAnalysen

28

Page 29: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

AbstrakteInterpretation:Beispiel

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

AbstrakteInterpretationistkomplexermathematischerAnsatz• DieGrundprinzipiensindabereinfach• Beispiel:BestimmendesVorzeichensvonVariablen

29

Abstract Interpretation: Signs Analysis

● AI is complicated, but the basic ideas are not

● Ex: determine each variable's sign at each point

● Replaced the

● concrete state with an abstract state

● concrete semantics with an abstract semantics

Ersetzen vonkonkretem Zustand durch abstrakten Zustandundkonkreter Semantik durch abstrakte Semantik

Page 30: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Concept: Abstract the State

● Different abstract interpretations use different abstract states.

● For the signs analysis, each variable could be

● Unknown: either positive or negative (+/-)

● Positive: x >= 0 (0+)

● Negative: x <= 0 (0-)

● Zero (0)

● Uninitialized (?)

● Ignore all other information, e.g., the actual values of variables.

AbstrakteInterpretation:Zustandsabstraktion

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

VerschiedeneMethodenderabstr. InterpretationverwendenunterschiedlicheAbstraktionen• FürVorzeichenanalyseistjedeVariable:• unbekannt:positivodernegativ(+/-)• positiv:x>=0(0+)• negativ:x<=0(0-)• null(0)• uninitialisiert (?)

• AlleweiterenInformationenwerdenignoriert• Hierz.B.derkonkreteWertderVariablen

30

Page 31: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

AbstrakteInterpretation:Semantikabstraktion(1)

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

AbstrakteInterpretationbefolgtVorzeichenregelnfürdieMultiplikation:• „+“*„+“=„+“• „–“*„–“=„+“• „–“*„+“=„+“*„–“=„–“

• Hinweis:dieseRegelnbeziehensichaufmathematischeganzeZahlen,imComputerrepräsentierteZahlenkönnenüberlaufenundunabsichtlichihrVorzeichenändern!

31

Concept: Abstract the Semantics (*)● Abstract multiplication follows the well-known

“rule of signs” from grade school

● A positive times a positive is positive

● A negative times a negative is positive

● A negative times a positive is negative

● Note: these remarks refer to mathematical integers; machine integers are subject to overflow

* ? 0 0+ 0- +/-

? +/- 0 +/- +/- +/-

0 0 0 0 0 0

0+ +/- 0 0+ 0- +/-

0- +/- 0 0- 0+ +/-

+/- +/- 0 +/- +/- +/-

Page 32: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

AbstrakteInterpretation:Semantikabstraktion(2)

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

AbstrakteInterpretationbenötigtVorzeichenregelnfürdieAddition:• „+“+„+“=„+“• „–“+„–“=„–“• „–“+„+“=„+“+„–“=„?“

• Beispiel:• –5+5⇒ konkretesErgebnis0• –6+5⇒ konkretesErgebnis –1• –5+6⇒ konkretesErgebnis 1

32

Concept: Abstract the Semantics (+)

● Positive + positive = positive.

● Negative + negative = negative.

● Negative + positive = unknown:

● -5 + 5. Concretely, the result is 0.

● -6 + 5. Concretely, the result is -1.

● -5 + 6. Concretely, the result is 1.

+ ? 0 0+ 0- +/-

? +/- +/- +/- +/- +/-

0 +/- 0 0+ 0- +/-

0+ +/- 0+ 0+ +/- +/-

0- +/- 0 0- 0+ +/-

+/- +/- 0 +/- +/- +/-

Page 33: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

KonkreteAnwendungderabstraktenInterpretation:SprunganalyseHerausfindenderStruktureinercompilierten „switch“-Anweisung• CompilerverwendenSprungtabellenfüreffizientenCode

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

33

Switch Tables: Contiguous, Indexed

Page 34: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Sprunganalyse(2)

SwitchmitweitverteiltenWerten• Sprungtabellewärenurdünnbesetzt(vieleNullen)• ErsetzungdurchFolgenvon„if“-Abfagen istineffizient• VieleAbfragenundbedingteSprüngenotwendig:O(N)• stattdessenverwendenCompilerEntscheidungsbäume,

Aufwand:O(log(n))

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

34

Switch Tables: Sparsely-Populated

Switch cases are sparsely-distributed.

Cannot implement efficiently with a table.

One option is to replace the construct with a series of if-statements.

This works, but takes O(N) time.

Instead, compilers generate decision trees that take O(log(N)) time, as shown on the next slide.

Page 35: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Decision Trees for Sparse SwitchesSprunganalyse(3)

Entscheidungsbaum

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

35

Page 36: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Sprunganalyse:Assemblercode(1)

ImplementierungdesEntscheidungsbaums

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

36

Assembly Language Reification

Additional, slight complication: red instructions modify EAX throughout the decision tree.

Vergleich mit Entscheidungswert

Größer?Springen!

Gleich?Springen!

Sonst:weiter im Code,nächsten Vergleich ausführen

Kleines Problem:eax wird vonmarkierten Instruktionenverändert

Page 37: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Sprunganalyse:Assemblercode(2)

Kontrollflussdarstellung:

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

37

Assembly Language Reification, Graphical

Page 38: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

AbstraktionderEntscheidungen imswitch-Statement• Erkenntnis:• wichtig:welcheWertebereiche führen

zurAusführungwelchescase-Falls?

• Datenabstraktion:• Intervalle[l,u]mitl<=u

• switch implementiertmitsub,dec undcmp-Instruktionen– allesSubtraktionen– undbedingtenSprüngen

• Semantikabstraktion:• BeibehaltenderSubtraktionen• Verzweigungbeibranch-Befehlen

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

38

Switch Tables: Sparsely-Populated

Switch cases are sparsely-distributed.

Cannot implement efficiently with a table.

One option is to replace the construct with a series of if-statements.

This works, but takes O(N) time.

Instead, compilers generate decision trees that take O(log(N)) time, as shown on the next slide.

Assembly Language Reification

Additional, slight complication: red instructions modify EAX throughout the decision tree.

Page 39: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Analysis Results

Beginning with no information about arg_0, each path through the decision tree induces a constraint upon its range of possible values, with single values or simple

ranges at case labels.

Analyseergebnisseswitch-Statement• KeineinitialeInformationüberarg_0vorhanden• JederPfaddurchCodeschränktmöglicheWertefürarg_0ein• WertanBlättern(case-Labels):Wert/einfacherWertebereich

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

39

Page 40: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

DynamischeAnalysen

ÜberwachungdesProgrammverhaltenszurLaufzeit• StatischeAnalysenbenötigenSignaturen• Könnenneueundevtl.mutierteMalwarenichterkennen

• IdeederdynamischenAnalyse:• BetrachtendesVerhaltenseinesProgrammszurLaufzeit

• SemantischeMethode– AnalysedesVerhaltens einesProgramms

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

40

Page 41: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

DynamischeAnalyse:KontrollflussverfolgungIdee:Überprüfen,obtatsächlichstattgefundenerKontrollflussmitSemantikeinesProgrammsübereinstimmt• z.B.aufFunktionsebene:

ÜberprüfenvonRücksprungadressenbeiEintrittinFunktion

• VergleichmitFunktionsaufruf-graph

• Entdecktz.B.manipulierteRück-sprungadressen

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

41

Page 42: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

KomplexerFunktions-aufrufgraphKomplexereProgrammemachenAnalyseaufwendig...

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

42

Page 43: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

ImplementierungderKontrollflussverfolgungIdee:ErstelleneinerRepräsentationdesAufrufgraphen• Automatischdurch

CompilererstellbarBeiEintrittinFunktion:• Überprüfen,obKantevom

Aufrufer(Returnadresse istauf demStack!)inGraphenthaltenist

• z.B.darf(kann?)„io_png_write_raw“nurvon„io_png_write_f32“und„io_png_write_u8“aufgerufenwerden

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

43

Page 44: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

ProblemederKontrollflussverfolgungDynamischerCode• VerwendungvonFunktionszeigern/Sprungtabellen:

CompilerkannzurÜbersetzungszeitkeinevollständigeListederFunktionenerstellen,diezumAufrufeinerFunktionfführen• AlsoistderAufrufgraphunvollständig

• Lösung:• Zeigeranalysewäreerforderlich...✘

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

44

Page 45: Malware-Analyse und Reverse Engineering€¦ · Analyse des “Stoned”-Virus MARE 08 – Statische und dynamische Analysen zur Malware-Erkennung Bootsektor = „Master Boot Record“

Fazit

StatischeMethodenzurErkennungvonMalware• Signaturcheckssindproblematisch• MutierendeMalware,mangelndeAktualität

• AufwendigerestatischeMethodenexistieren• AbstrakteInterpretation

DynamischeAnalysenzurErkennungvonMalware• KontrollflussanalysedurchÜberwachungvonFunktionsaufrufen• mehr:nächsteWoche...

MARE08– StatischeunddynamischeAnalysenzurMalware-Erkennung

45