FileBuzz: Software Download
Find shareware, freeware downloads from thousands of software titles

Program Name: Parsing binary files with regular expressions

Published By: code.activestate.com

License Type: Freeware

Date Released: January 08, 2012



Parsing binary files with regular expressions Description:





This script allows you to use the regular expression engine to parse binary files, especially those for which the struct module alone is inadequate.

The typical way to parse binary data in Python is to use the unpack method of the struct module. This works well for fixed-width fields, but becomes more complicated when you need to parse variable-width fields. Perl's implementation of unpack accepts "*" as the field length, and even allows grouping with parentheses, which mitigates this problem. Python does not currently offer these features. Although you can dynamically generate a format string for unpack with a lot of slicing and calls to calcsize, the resulting code will likely be hard to read and error-prone.
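To make the difficulty concrete, here is a minimal sketch of the "dynamic format string" approach for a single record. It assumes Python 3 and a hypothetical record layout (a little-endian short, a null-terminated string, then two more shorts); the function name parse_one is made up for illustration.

```python
import struct

def parse_one(buf, offset=0):
    """Parse one record by building a format string on the fly.

    Assumed layout: <H, null-terminated string, <HH.
    Returns the unpacked fields and the offset of the next record.
    """
    # The string starts right after the leading short.
    start = offset + struct.calcsize("<H")
    # Its length is only known once we locate the terminating null byte.
    length = buf.index(b"\0", start) - start
    # "%ds" consumes the string bytes, "x" skips the null terminator.
    fmt = "<H%dsxHH" % length
    fields = struct.unpack_from(fmt, buf, offset)
    return fields, offset + struct.calcsize(fmt)

sample = struct.pack("<H", 7) + b"abc\0" + struct.pack("<HH", 1, 2)
fields, next_offset = parse_one(sample)
```

Even for one variable-width field, the offset bookkeeping is easy to get wrong, and it grows with every additional field.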

Fortunately, in some cases there is a simpler way to do it: use the regular expression engine to grab each field, and use struct.unpack on the results.

First, you construct a regular expression (RE) describing the entire record structure, grouping each field you'd like to extract with parentheses, and compile it.

To create the regular expression, you just have to remember that one character in the RE equals one byte in the record. So, the expression ".." would match any short (2 bytes). To match a variable-width field, the RE engine will have to be able to recognize where the field ends. In a null-terminated string, for example, the field ends with a zero byte. You'd therefore look for any number of characters followed by a null byte: "(.*?)\0". Note the use of the non-greedy qualifier "?": this way, we only match up to the first null, rather than the last null in the buffer.
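A small sketch of the difference between the non-greedy and greedy forms, assuming Python 3, where the pattern must be a bytes literal to match a bytes buffer. The sample buffer is invented for illustration.

```python
import re

# Two arbitrary bytes (a short), then a null-terminated string.
data = b"\x01\x00hello\0world\0"

# Non-greedy: the string group stops at the first null byte.
lazy = re.compile(rb"(..)(.*?)\0", re.DOTALL).match(data)

# Greedy: the string group runs on to the last null byte in the buffer.
greedy = re.compile(rb"(..)(.*)\0", re.DOTALL).match(data)

lazy_string = lazy.group(2)
greedy_string = greedy.group(2)
```

Here lazy_string holds only the first string, while greedy_string swallows the embedded null and the second string as well.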

When compiling, make sure to pass the re.DOTALL flag to the compiler; otherwise "." will not match any byte that happens to equal ASCII '\n' (0x0A), because the engine treats it as a newline. Then, you use the findall method of the compiled expression object on your buffer. findall finds all non-overlapping matches, one match for each record. It returns a list of tuples, one for each match; each tuple will contain one element for each field you grouped in the RE.
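The effect of the flag can be seen with a buffer whose first short happens to contain the byte 0x0A. This is a sketch under the assumption of Python 3 bytes patterns; the buffer contents are invented.

```python
import re

# Two records of (short, null-terminated string).
# The first short is 10, so its low byte is 0x0A, the same as b"\n".
buf = b"\x0a\x00ab\0" + b"\x02\x00cd\0"

with_dotall = re.compile(rb"(..)(.*?)\0", re.DOTALL).findall(buf)

# Without DOTALL, "." refuses to match the 0x0A byte, so the first
# match starts one byte late and the fields come out misaligned.
without_dotall = re.compile(rb"(..)(.*?)\0").findall(buf)
```

With the flag, each tuple lines up with one record. Without it, the engine silently skips the newline byte and returns plausible-looking but wrong fields, which is why the omission is easy to miss.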

You still need to unpack the fields in the tuples before using them, since they're still strings rather than usable values. Generally, you'll call unpack once for each field, with only one format character. (You can also group multiple consecutive fixed fields in one set of parentheses in the RE, and then unpack them in one call. But that may get confusing.)
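As a sketch of both styles, the following assumes a Python 3 buffer holding one record (a little-endian short, a null-terminated string, and two more shorts). The two trailing shorts are grouped in a single set of parentheses and unpacked with one call; all names are illustrative.

```python
import re
import struct

# Group 1: one short. Group 2: the string. Group 3: two shorts together.
record_re = re.compile(rb"(..)(.*?)\0(....)", re.DOTALL)

buf = struct.pack("<H", 7) + b"abc\0" + struct.pack("<HH", 1, 2)
raw_id, raw_name, raw_pair = record_re.findall(buf)[0]

# One unpack call per field, with a single format character...
(ident,) = struct.unpack("<H", raw_id)
# ...or one call for a group that covers several fixed-width fields.
x, y = struct.unpack("<HH", raw_pair)
```

The string field needs no unpacking; it is already the bytes between the short and the null terminator.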

The recipe's code demonstrates how to unpack a binary file that has an indeterminate number of variable-width records, each consisting of a little-endian short, a null-terminated string, and two more shorts. It drops the resulting values into a list and also into a dictionary.
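The following is not the original recipe code but a minimal Python 3 sketch of the same idea for that record layout. The names RECORD_RE and parse_records, the ASCII decoding of the string, and the choice to key the dictionary by the string are all assumptions made for illustration.

```python
import re
import struct

# short, null-terminated string, short, short (all shorts little-endian)
RECORD_RE = re.compile(rb"(..)(.*?)\0(..)(..)", re.DOTALL)

def parse_records(buf):
    """Return (list of records, dict keyed by the string field)."""
    records = []
    by_name = {}
    for raw_id, raw_name, raw_a, raw_b in RECORD_RE.findall(buf):
        ident = struct.unpack("<h", raw_id)[0]
        a = struct.unpack("<h", raw_a)[0]
        b = struct.unpack("<h", raw_b)[0]
        name = raw_name.decode("ascii")  # assumption: strings are ASCII
        records.append((ident, name, a, b))
        by_name[name] = (ident, a, b)
    return records, by_name

# Build a small sample buffer in place of reading a real file.
sample = b"".join(
    struct.pack("<h", ident) + name + b"\0" + struct.pack("<hh", a, b)
    for ident, name, a, b in [(1, b"alpha", 10, 20), (2, b"", -1, 300)]
)
records, by_name = parse_records(sample)
```

For a real file, the buffer would typically come from reading the file opened in binary mode ("rb").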

This technique is useful when your variable-width fields are terminated with a sentinel, such as the zero-terminated strings described above. If your field length is embedded in the data, and you can't use the "p" (Pascal string) modifier, you'll probably have to resort to slicing the buffer up manually.
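For comparison, here is a sketch of both length-prefixed cases in Python 3. The first uses the struct "p" format, which only works when the total field size is fixed; the second slices the buffer manually, assuming a hypothetical layout in which each field is one length byte followed by that many payload bytes.

```python
import struct

# "5p": a 5-byte field whose first byte holds the string length.
pascal = struct.unpack("5p", b"\x03abc\x00")[0]

def read_length_prefixed(buf):
    """Split a buffer of [1-byte length][payload] fields by slicing."""
    fields = []
    offset = 0
    while offset < len(buf):
        length = buf[offset]           # indexing bytes yields an int
        start = offset + 1
        fields.append(buf[start:start + length])
        offset = start + length
    return fields

fields = read_length_prefixed(b"\x03abc\x00\x02hi")
```

A regular expression cannot express "read N bytes, where N is the value of the previous byte", which is why this case falls back to explicit offsets.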

This technique also applies when all of your fields are fixed-width. The findall method will operate on the entire buffer at once with a single regular expression, which saves you from having to dynamically create a long format string encapsulating all your data, or alternatively iterating over slices of the buffer.
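A minimal sketch of the fixed-width case, assuming Python 3 and an invented record of one little-endian unsigned short followed by one unsigned 4-byte integer:

```python
import re
import struct

# Each record: 2-byte short, then 4-byte integer (6 bytes in total).
PAIR_RE = re.compile(rb"(..)(....)", re.DOTALL)

buf = struct.pack("<HI", 1, 100) + struct.pack("<HI", 2, 200)

pairs = [
    (struct.unpack("<H", raw_short)[0], struct.unpack("<I", raw_int)[0])
    for raw_short, raw_int in PAIR_RE.findall(buf)
]
```

In Python 3.4 and later, struct.iter_unpack handles the all-fixed-width case directly, so the regular expression is mainly worthwhile when at least one field is variable-width.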




License: Freeware | Downloads (144)

Platform: All

Language: Python

