Zhuo Zhang  张倬


I will be joining the Department of Computer Science at Columbia University as an Assistant Professor in Spring 2026. I completed my Ph.D. at Purdue University under the supervision of Samuel Conte Professor Xiangyu Zhang. Prior to that, I earned my B.Sc. with Zhiyuan Honors from Shanghai Jiao Tong University (SJTU).

I study the security of modern software systems, combining formal reasoning with data-driven methods to encode human expert insight in scalable, precise ways. My research aims to push the boundaries of how we rigorously understand software behavior, by improving either (a) the precision of analyses in a sound manner or (b) the scalability of analyses in an unsound manner, while explicitly qualifying the degree of confidence in their unsoundness.

Beyond academia, I have been an avid capture-the-flag (CTF) player. I enjoy hardcore hacking, bug hunting, and contributing to open-source projects. You can learn more about my independent work on GitHub.

🎯 I'm actively looking for self-motivated students to join my research group at Columbia CS. Strong software engineering skills, experience with low-level systems, or a background in CTFs would be highly preferred.

📬 If you're interested in working with me, please drop me an email with (1) your CV and (2) a brief overview of your research interests and relevant background.

May, 2025 I will join the Department of Computer Science at Columbia University as an Assistant Professor!
Oct, 2024 I am deeply honored to have received the prestigious ACM SIGSAC Doctoral Dissertation Award!
May, 2023 We have released our Web3 bug dataset on GitHub!

The best way to reach me is via email. If you're specifically interested in topics related to binary analysis or Web3, feel free to reach out to [email protected] or [email protected], respectively. For other general inquiries, please contact [email protected].

Special Notes. Sometimes I might miss your email, so don't hesitate to follow up if you don't hear back for a while.

That said, I do have a small trick to help you craft your email in a way that will easily catch my attention. I've encrypted this tip using a customized script, mycipher.py. Below is the encrypted message (in hex format):

040fc29c860b901336eae5caaf6942d79591253e75716770e794a31b0643382099ab594d5a4a106348b91f1ea98a4286b664f899a1ad632503253d4c3c87e89b315f8180fe00765afb5c0f286c573b6dc36d408f809cc9a368d20793a6c8a6f65541bc5a5b04eb4c1bbcd8742f94f9a7616ce1f347775badeaf5ac4b80be8e4a7979ac88f170ba7f86a7ad379e4381ead39a7bc33a05ebe9b492fabbeb12f318aa1aef96a518f59d16da6daafd9e1f5c930e241894a558b5609fcfb5b4094adf3344188d58f68a9c4b5266816f547313163b62cfb314811ad7a30bad24cfaafc62a31abfba4f45b28bf8c57585a8e8fa30acf74d87804afa3e213c4ad1925570ae221dc5f440a3aa96d48d02c743de13fc878836571bfcb57cbab43316995c248bb649e78561c149545ababf35b0082fff118fb4b9addd888bb27b201395ee6839dfafe9601d948bafdb669087f0ea3b9a071f22e50b76795741fb163f13e4ca4c465838a29deb24db89fdb56729987a98df495375d813e3901a6ee2d19dcd5d179d7fd184938b37e5963a1c1889da013a1af9c5f6df86dbea83ffb85b85b861

Unfortunately, the secret encryption key in mycipher.py has been lost 😕, and all I have are a few known plaintext-ciphertext pairs in data. If you do manage to decrypt the instructions above 🕵️‍♂️, just follow what they say 🧩.

  • Revamping Binary Analysis with Sampling and Probabilistic Inference
    Zhuo Zhang, Ph.D. Dissertation, Purdue University, August 2023
    ACM SIGSAC Doctoral Dissertation Award
    Excellent Score in Code Delivery and Evaluation of Office of Naval Research (ONR)
    Keywords: Decompilation, Probabilistic Analysis
    Abstract:
         Binary analysis, a cornerstone technique in cybersecurity, enables the examination of binary executables, irrespective of source code availability. It plays a critical role in understanding program behaviors, detecting software bugs, and mitigating potential vulnerabilities, especially in situations where the source code remains out of reach. However, aligning the efficacy of binary analysis with that of source-level analysis remains a significant challenge, primarily due to the uncertainty caused by the loss of semantic information during the compilation process.
         This dissertation presents an innovative probabilistic approach, termed probabilistic binary analysis, designed to combat the intrinsic uncertainty in binary analysis. It builds on the fundamental principles of program sampling and probabilistic inference, enhanced further by an iterative refinement architecture. The dissertation suggests that a thorough and practical method of sampling program behaviors can yield a substantial quantity of hints which could be instrumental in recovering lost information, despite the potential inclusion of some inaccuracies. Consequently, a probabilistic inference technique is applied to systematically incorporate and process the collected hints, suppressing the incorrect ones, thereby enabling the interpretation of high-level semantics. Furthermore, an iterative refinement mechanism is deployed to augment the efficiency of the probabilistic analysis in subsequent applications, facilitating the progressive enhancement of analysis outcomes through an automated or human-guided feedback loop.
         This work offers an in-depth understanding of the challenges and solutions related to assessing low-level program representations and systematically handling the inherent uncertainty in binary analysis. It aims to contribute to the field by advancing the development of precise, reliable, and interpretable binary analysis solutions, thereby setting the groundwork for future exploration in this domain.

  • Unleashing the Power of Generative Model in Recovering Variable Names from Stripped Binary
    Xiangzhe Xu, Zhuo Zhang, Zian Su, Ziyang Huang, Shiwei Feng, Yapeng Ye, Nan Jiang, Danning Xie, Siyuan Cheng, Lin Tan, Xiangyu Zhang
    Proceedings of the 32nd Network and Distributed System Security Symposium (NDSS 2025)
    San Diego, CA, February 2025
    Keywords: Binary Analysis, Variable Name Recovery, Large Language Model
    Abstract:
         Decompilation aims to recover the source code form of a binary executable. It has many applications in security and software engineering such as malware analysis, vulnerability detection and code reuse. A prominent challenge in decompilation is to recover variable names. We propose a novel method that leverages the synergy of large language models (LLMs) and program analysis. Language models encode rich multi-modal knowledge, but their limited input size prevents providing sufficient global context for name recovery. We propose to divide the task into many LLM queries and use program analysis to correlate and propagate the query results, which in turn improves the performance of the LLM by providing additional contextual information. Our results show that 75% of the recovered names are considered good by users and our technique outperforms the state-of-the-art technique by 16.5% and 20.23% in precision and recall, respectively.

  • Consolidating Smart Contracts with Behavioral Contracts
    Guannan Wei, Danning Xie, Wuqi Zhang, Yongwei Yuan, Zhuo Zhang
    Proceedings of the ACM on Programming Languages Volume 8 Issue PLDI (PLDI 2024)
    Copenhagen, Denmark, June, 2024
    Keywords: Behavioral Contract, Smart Contract, Runtime Monitoring
    Abstract:
         Ensuring the reliability of smart contracts is of vital importance due to the wide adoption of smart contract programs in decentralized financial applications. However, statically checking many rich properties of smart contract programs can be challenging. On the other hand, dynamic validation approaches have shown promise for widespread adoption in practice. Nevertheless, as part of the programming environment for smart contracts, existing dynamic validation approaches have not provided programmers with a notion to clearly articulate the interface between components, especially for addresses representing opaque contract instances. We argue that the "design-by-contract" approach should complement the development of smart contract programs. Unfortunately, there is limited linguistic support for it in existing smart contract languages.
         In this paper, we design a Solidity language extension ConSol that supports behavioral contracts. ConSol provides programmers with a modular specification and monitoring system for both functional and latent address behaviors. The key capability of ConSol is to attach specifications to first-class addresses and monitor violations when invoking these addresses. We evaluate ConSol using 20 real-world cases, demonstrating its effectiveness in expressing critical conditions and preventing attacks. Additionally, we assess ConSol's efficiency and compare gas consumption with manually inserted assertions, showing that our approach introduces only marginal gas overhead. By separating specifications and implementations using behavioral contracts, ConSol assists programmers in writing more robust and readable smart contracts.

  • Nyx: Detecting Exploitable Front-Running Vulnerabilities in Smart Contracts
    Wuqi Zhang, Zhuo Zhang, Qingkai Shi, Lu Liu, Lili Wei, Yepang Liu, Xiangyu Zhang, Shing-Chi Cheung
    Proceedings of the 45th IEEE Symposium on Security and Privacy (Oakland 2024)
    San Francisco, CA, May, 2024   [code]
    Keywords: Frontrunning Attack, Web3 Security, Blockchain
    Abstract:
         Smart contracts are susceptible to front-running attacks, in which malicious users leverage prior knowledge of upcoming transactions to execute attack transactions in advance and benefit their own portfolios. Existing contract analysis techniques raise a number of false positives and false negatives in that they simplistically treat data races in a contract as front-running vulnerabilities and can only analyze contracts in isolation. In this work, we formalize the definition of exploitable front-running vulnerabilities based on previous empirical studies on historical attacks, and present Nyx, a novel static analyzer to detect them. Nyx features a Datalog-based preprocessing procedure that efficiently and soundly prunes a large part of the search space, followed by a symbolic validation engine that precisely locates vulnerabilities with an SMT solver. We evaluate Nyx using a large dataset that comprises 513 real-world front-running attacks in smart contracts. Compared to six state-of-the-art techniques, Nyx surpasses them by 32.64%-90.19% in terms of recall and 2.89%-70.89% in terms of precision. Nyx has also identified four zero-days in real-world smart contracts.

  • On Large Language Models' Resilience to Coercive Interrogation
    Zhuo Zhang, Guangyu Shen, Guanhong Tao, Siyuan Cheng, Xiangyu Zhang
    Proceedings of the 45th IEEE Symposium on Security and Privacy (Oakland 2024)
    San Francisco, CA, May, 2024   [code]   [website]
    Keywords: Jailbreaking, Model Alignment, Large Language Model
    Abstract:
         Large Language Models (LLMs) are increasingly employed in numerous applications. It is hence important to ensure that their ethical standard aligns with humans’. However, existing jail-breaking efforts show that such alignment could be compromised by well-crafted prompts. In this paper, we disclose a new threat to LLM alignment when a malicious actor has access to the top-k token predictions at each output position of the model, such as in all open-source LLMs and many commercial LLMs that provide the needed APIs (e.g., some GPT versions). It does not require crafting any prompt. Instead, it leverages the observation that even when an LLM declines a toxic query, the harmful response is concealed deep within the output logits. We can coerce the model to disclose it by forcefully using low-ranked output tokens during autoregressive output generation, and such forcing is only needed in a very small number of selected output positions. We call it model interrogation. Since our method operates differently from jail-breaking, it has better effectiveness than state-of-the-art jail-breaking techniques (92% versus 62%) and is 10 to 20 times faster. The toxic content elicited by our method is also of better quality. More importantly, it is complementary to jail-breaking, and a synergetic integration of the two exhibits superior performance over individual methods. We also find that with interrogation, harmful content can even be extracted from models customized for coding tasks.

  • Pelican: Exploiting Backdoors of Naturally Trained Deep Learning Models In Binary Code Analysis
    Zhuo Zhang, Guanhong Tao, Guangyu Shen, Shengwei An, Qiuling Xu, Yingqi Liu, Yapeng Ye, Yaoxuan Wu, Xiangyu Zhang
    Proceedings of the 32nd USENIX Security Symposium (Security 2023)
    Anaheim, CA, August, 2023   [bibtex]
    Keywords: Binary Analysis, Deep Learning Security, Probabilistic Analysis
    Abstract:
         Deep Learning (DL) models are increasingly used in many cyber-security applications and achieve superior performance compared to traditional solutions. In this paper, we study backdoor vulnerabilities in naturally trained models used in binary analysis. These backdoors are not injected by attackers but rather products of defects in datasets and/or training processes. The attacker can exploit these vulnerabilities by injecting some small fixed input pattern (e.g., an instruction) called backdoor trigger to their input (e.g., a binary code snippet for a malware detection DL model) such that misclassification can be induced (e.g., the malware evades the detection). We focus on transformer models used in binary analysis. Given a model, we leverage a trigger inversion technique particularly designed for these models to derive trigger instructions that can induce misclassification. During attack, we utilize a novel trigger injection technique to insert the trigger instruction(s) to the input binary code snippet. The injection makes sure that the code snippets' original program semantics are preserved and the trigger becomes an integral part of such semantics and hence cannot be easily eliminated. We evaluate our prototype PELICAN on 5 binary analysis tasks and 15 models. The results show that PELICAN can effectively induce misclassification on all the evaluated models in both white-box and black-box scenarios. Our case studies demonstrate that PELICAN can exploit the backdoor vulnerabilities of two closed-source commercial tools.

  • Your Exploit is Mine: Instantly Synthesizing Counterattack Smart Contract
    Zhuo Zhang, Zhiqiang Lin, Marcelo Morales, Xiangyu Zhang, Kaiyuan Zhang
    Proceedings of the 32nd USENIX Security Symposium (Security 2023)
    Anaheim, CA, August, 2023   [bibtex]
    Keywords: Maximal Extractable Value, Web3 Security, Blockchain
    Abstract:
         Smart contracts are susceptible to exploitation due to their unique nature. Despite efforts to identify vulnerabilities using fuzzing, symbolic execution, formal verification, and manual auditing, exploitable vulnerabilities still exist and have led to billions of dollars in monetary losses. To address this issue, it is critical that runtime defenses are in place to minimize exploitation risk. In this paper, we present STING, a novel runtime defense mechanism against smart contract exploits. The key idea is to instantly synthesize counterattack smart contracts from attacking transactions and leverage the power of Maximal Extractable Value (MEV) to front run attackers. Our evaluation with 62 real-world recent exploits demonstrates its effectiveness, successfully countering 54 of the exploits (i.e., intercepting all the funds stolen by the attacker). In comparison, a general front-runner defense could only handle 12 exploits. Our results provide a clear proof-of-concept that STING is a viable defense mechanism against smart contract exploits and has the potential to significantly reduce the risk of exploitation in the smart contract ecosystem.

  • Demystifying Exploitable Bugs in Smart Contracts
    Zhuo Zhang, Brian Zhang, Wen Xu, Zhiqiang Lin
    Proceedings of the 45th ACM/IEEE International Conference on Software Engineering (ICSE 2023)
    Melbourne, Australia, May 2023   [dataset]
    Keywords: Smart Contract, Web3 Security, Blockchain
    Abstract:
         Exploitable bugs in smart contracts have caused significant monetary loss. Despite the substantial advances in smart contract bug finding, exploitable bugs and real-world attacks are still trending. In this paper we systematically investigate 516 unique real-world smart contract vulnerabilities in years 2021-2022, and study how many can be exploited by malicious users and cannot be detected by existing analysis tools. We further categorize the bugs that cannot be detected by existing tools into seven types and study their root causes, distributions, difficulties to audit, consequences, and repair strategies. For each type, we abstract them to a bug model (if possible), facilitating finding similar bugs in other contracts and future automation. We leverage the findings in auditing real-world smart contracts, and so far we have been rewarded with $102,660 in bug bounties for identifying 15 critical zero-day exploitable bugs, which could have caused up to $22.52 million in monetary loss if exploited.

  • OSPREY: Recovery of Variable and Data Structure via Probabilistic Analysis for Stripped Binary
    Zhuo Zhang, Yapeng Ye, Wei You, Guanhong Tao, Wen-chuan Lee, Yonghwi Kwon, Yousra Aafer, Xiangyu Zhang
    Proceedings of the 42nd IEEE Symposium on Security and Privacy (Oakland 2021)
    Virtually, May 2021   [bibtex]   [evaluation data]
    Keywords: Binary Analysis, Variable Recovery, Probabilistic Analysis, Reverse Engineering
    Abstract:
         Recovering variables and data structure information from stripped binary is a prominent challenge in binary program analysis. While various state-of-the-art techniques are effective in specific settings, such effectiveness may not generalize. This is mainly because the problem is inherently uncertain due to the information loss in compilation. Most existing techniques are deterministic and lack a systematic way of handling such uncertainty. We propose a novel probabilistic technique for variable and structure recovery. Random variables are introduced to denote the likelihood of an abstract memory location having various types and structural properties such as being a field of some data structure. These random variables are connected through probabilistic constraints derived through program analysis. Solving these constraints produces the posterior probabilities of the random variables, which essentially denote the recovery results. Our experiments show that our technique substantially outperforms a number of state-of-the-art systems, including IDA, Ghidra, Angr, and Howard. Our case studies demonstrate the recovered information improves binary code hardening and binary decompilation.

  • StochFuzz: Sound and Cost-effective Fuzzing of Stripped Binaries by Incremental and Stochastic Rewriting
    Zhuo Zhang, Wei You, Guanhong Tao, Yousra Aafer, Xuwei Liu, Xiangyu Zhang
    Proceedings of the 42nd IEEE Symposium on Security and Privacy (Oakland 2021)
    Virtually, May 2021   [benchmarks]   [bibtex]   [code]   [poster]
    CSAW 2021 Best Applied Security Paper Award TOP-10 Finalists
    Keywords: Fuzz, Binary Rewriting, Probabilistic Analysis
    Abstract:
         Fuzzing stripped binaries poses many hard challenges as fuzzers require instrumenting binaries to collect runtime feedback for guiding input mutation. However, due to the lack of symbol information, correct instrumentation is difficult on stripped binaries. Existing techniques either rely on hardware and expensive dynamic binary translation engines such as QEMU, or make impractical assumptions such as binaries do not have inlined data. We observe that fuzzing is a highly repetitive procedure providing a large number of trial-and-error opportunities. As such, we propose a novel incremental and stochastic rewriting technique STOCHFUZZ that piggy-backs on the fuzzing procedure. It generates many different versions of rewritten binaries whose validity can be approved/disapproved by numerous fuzzing runs. Probabilistic analysis is used to aggregate evidence collected through the sample runs and improve rewriting. The process eventually converges on a correctly rewritten binary. We evaluate STOCHFUZZ on two sets of real-world programs and compare with five other baselines. The results show that STOCHFUZZ outperforms state-of-the-art binary-only fuzzers (e.g., e9patch, ddisasm, and RetroWrite) in terms of soundness and cost-effectiveness and achieves performance comparable to source-based fuzzers. STOCHFUZZ is publicly available.

  • BDA: Practical Dependence Analysis for Binary Executables by Unbiased Whole-Program Path Sampling and Per-Path Abstract Interpretation
    Zhuo Zhang, Wei You, Guanhong Tao, Guannan Wei, Yonghwi Kwon, Xiangyu Zhang
    Proceedings of the ACM on Programming Languages Volume 3 Issue OOPSLA (OOPSLA 2019)
    Athens, Greece, October 2019   [artifact]   [bibtex]
    ACM SIGPLAN Distinguished Paper Award
    Keywords: Path Sampling, Abstract Interpretation, Binary Analysis, Data Dependence
    Abstract:
         Binary program dependence analysis determines dependence between instructions and hence is important for many applications that have to deal with executables without any symbol information. A key challenge is to identify if multiple memory read/write instructions access the same memory location. The state-of-the-art solution is the value set analysis (VSA) that uses abstract interpretation to determine the set of addresses that are possibly accessed by memory instructions. However, VSA is conservative and hence leads to a large number of bogus dependences and then substantial false positives in downstream analyses such as malware behavior analysis. Furthermore, existing public VSA implementations have difficulty scaling to complex binaries.
         In this paper, we propose a new binary dependence analysis called BDA enabled by a randomized abstract interpretation technique. It features a novel whole program path sampling algorithm that is not biased by path length, and a per-path abstract interpretation avoiding precision loss caused by merging paths in traditional analyses. It also provides probabilistic guarantees. Our evaluation on SPECINT2000 programs shows that it can handle complex binaries such as gcc whereas VSA implementations from state-of-the-art platforms have difficulty producing results for many SPEC binaries. In addition, the dependences reported by BDA are 75 and 6 times smaller than Alto, a scalable binary dependence analysis tool, and VSA, respectively, with only 0.19% of true dependences observed during dynamic execution missed (by BDA). Applying BDA to call graph generation and malware analysis shows that BDA substantially supersedes the commercial tool IDA in recovering indirect call targets and outperforms a state-of-the-art malware analysis tool Cuckoo by disclosing 3 times more hidden payloads.

Academic Awards

Selected Capture-The-Flag (CTF)

  • 1st place at Paradigm CTF 2023 (w/ Offside Labs)
  • 1st place at DEFCON CTF 2020 (w/ A*0*E)
  • 1st place at the 40th IEEE S&P Celebration Scavenger Hunt (solo)
  • 4th place at DEFCON CTF 2018 (w/ A*0*E)
  • 3rd place at DEFCON CTF 2017 (w/ A*0*E)
  • "Advancing Security Red-Teaming through Probabilistic Binary Analysis" @ RIT, UT Arlington, TAMU, UH, Rice, WPI, Columbia, Duke, CityU HK, ASU, UNC Chapel Hill, HKUST, Cornell, Cornell Tech, CUHK, Georgia Tech, UT Austin, February - April 2025
  • "On Large Language Models' Resilience to Coercive Interrogation" @ Oakland'24, May 2024
  • "Your Exploit is Mine: Instantly Synthesizing Counterattack Smart Contract" @ USENIX Security'23, August 2023
  • "Pelican: Exploiting Backdoors of Naturally Trained Deep Learning Models In Binary Code Analysis" @ USENIX Security'23, August 2023
  • Program Committee Member
    USENIX Security Symposium, 2025
    The ACM Conference on Computer and Communications Security (CCS), 2024, 2025
    International Conference on Software Engineering (ICSE), 2025, 2026
    International Conference on Automated Software Engineering (ASE), 2024
    International Symposium on Software Testing and Analysis (ISSTA), 2024, 2025
    International Symposium on Research in Attacks, Intrusions and Defenses (RAID), 2024, 2025
    The ACM ASIA Conference on Computer and Communications Security (ASIACCS), 2024
    Workshop on Binary Analysis Research (BAR@NDSS), 2022
  • Reviewer
    IEEE Transactions on Software Engineering
    IEEE Transactions on Information Forensics and Security
    IEEE/ACM Transactions on Networking
    The Association for Computational Linguistics (ACL) Rolling Review

August 2017 - June 2019

Project: radeco
Radare2-based binary analysis framework