White Paper: Memorializing Online Transactions with PDF Documents
This white paper was written by Gerald Holmann, founder and president of Qoppa Software.
Summary
Today, more and more transactions are handled online across a broad range of categories. Transactions can be Business-to-Consumer (B2C) or Business-to-Business (B2B), local or international, for goods or services, and can be settled using credit cards, bank transfers or peer-to-peer payment networks.
While different approaches have been deployed to ensure the safety of data transmission and to verify the identity of the parties, very little has been done to ensure that the transactions are memorialized in a secure and reliable manner.
Memorializing the transaction in a reliable way becomes critical whenever a transaction needs to be revisited, which can happen for many reasons, such as audits and disputes. In these situations, it is imperative to be able to retrieve the transaction in a human readable form for close inspection.
Most eCommerce systems in common use do not provide this guarantee, with the potential to cause legal and financial problems when transactions have to be examined.
Current Approach
By far the most common method to memorialize transactions is to store the transaction data in an RDBMS. The transaction is broken up into its components (e.g. item numbers, quantities, cost per item, line items, etc.) and then stored as database records that can be spread across multiple tables.
The human readable / visual representation of the transaction is usually not stored at all.
If the transaction needs to be examined at any point in the future, the data is retrieved from the database and a human readable visual representation of the transaction is recreated either in printed form or for display in a browser.
Even when storing simple transactions, and more so for complex transactions, a single transaction will be stored as multiple records in multiple tables in a database. For instance, a simple invoice might be composed of a main record in an invoice table and multiple records in a line items table.
The transaction data may therefore have a complex structure, so recreating the human readable version of the transaction is not a trivial matter.
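The invoice example above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not a description of any particular eCommerce system: the table names, column names and the `render_invoice` function are all assumptions made for the example. It shows that even a simple invoice only becomes human readable after code has gathered and formatted records from two tables.

```python
import sqlite3

# Hypothetical two-table schema; all table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        issued_on TEXT NOT NULL
    );
    CREATE TABLE line_item (
        id INTEGER PRIMARY KEY,
        invoice_id INTEGER NOT NULL REFERENCES invoice(id),
        description TEXT NOT NULL,
        quantity INTEGER NOT NULL,
        unit_price_cents INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO invoice VALUES (1, 'Acme Corp', '2020-01-15')")
conn.executemany(
    "INSERT INTO line_item (invoice_id, description, quantity, unit_price_cents) "
    "VALUES (?, ?, ?, ?)",
    [(1, "Widget", 3, 1250), (1, "Gadget", 1, 4999)],
)


def render_invoice(conn, invoice_id):
    """Rebuild a human readable invoice from its scattered records."""
    customer, issued_on = conn.execute(
        "SELECT customer, issued_on FROM invoice WHERE id = ?", (invoice_id,)
    ).fetchone()
    lines = [f"Invoice {invoice_id} for {customer}, issued {issued_on}"]
    total_cents = 0
    rows = conn.execute(
        "SELECT description, quantity, unit_price_cents FROM line_item "
        "WHERE invoice_id = ? ORDER BY id",
        (invoice_id,),
    )
    for description, quantity, unit_price_cents in rows:
        amount_cents = quantity * unit_price_cents
        total_cents += amount_cents
        lines.append(
            f"  {quantity} x {description} @ {unit_price_cents / 100:.2f}"
            f" = {amount_cents / 100:.2f}"
        )
    lines.append(f"Total: {total_cents / 100:.2f}")
    return "\n".join(lines)


invoice_text = render_invoice(conn, 1)
print(invoice_text)
```

The printed text exists only as long as both the records and `render_invoice` stay exactly as they were when the transaction was made.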
This approach generally works because there is a level of trust in the party that is storing the transaction, that the data in the database records will remain unaltered and that the method used to convert the data to a human readable form will remain the same as when the transaction was performed.
Either of these conditions may be false, or may cease to hold over time. When either condition fails, it is no longer possible to reconstruct the human readable form of the transaction accurately. There are two main areas where these assumptions can fail:
Data Integrity
Once stored in a database, transaction data is supposed to remain unchanged. However, there are many reasons why this may not be the case, including data migration, defects in the eCommerce system and even manual intervention by IT staff or hackers.
Any time that transaction data is modified, it creates the potential that the transaction might be changed in meaningful ways.
To compound the problem, database systems generally do not keep a full audit record of all transaction data, so in most cases where the data is modified, there is no way to tell that the data has been modified at all, much less to verify that the data matches the original transaction data.
Some scenarios where transaction data can be modified:
- Upgrades to the eCommerce system: Modern eCommerce systems and their components are constantly being upgraded to enhance features, address defects and improve security. The upgrades might include changes to the user interface, the transaction workflows and, more importantly, the data stored for each transaction, for instance to add fields for new options in a transaction. Any time the data structures are modified, the current data in the database system must be migrated to match the new structure. This can introduce errors in the data during the migration process, which in turn modifies the historical transactions.
- Complete change of the eCommerce system: Transaction handlers will occasionally change their eCommerce system in its entirety. When this happens, transaction data has to be migrated from the old system to the new system. The database structures of the two systems will normally be very different, with many cases where there is no one-to-one correspondence between some of the data. This means that the data is not just copied over, but has to be transformed to match the new system. Keeping in mind that modern eCommerce systems may have hundreds of database tables, the probability of mapping data incorrectly or losing data in the porting process is quite high. Additionally, the new system will probably have a different set of features related to transactions, so there might not be an adequate mapping of all fields and records in the old system to the new system. This would make it impossible to create an exact reproduction of the human readable version from the new data structures.
- Defects in the eCommerce system: There is no software system that is completely free of defects. Of concern in this context is the code that interfaces with the database systems, especially the code that forms SQL queries to get data in and out of databases. SQL code is particularly fragile, and mistakes can have extreme consequences. For instance, the omission of a single condition clause in a SQL statement can result in the corruption of all records in a table. This means that there is a possibility of old transaction data getting modified inadvertently by faulty SQL queries working on new transactions. These types of defects can go undetected for long periods of time, so when they occur, they can be particularly troublesome in that large amounts of historical transaction data might be invalid. By the time the corruption is detected, it might be too late to recover any of the original data.
- Finally, transaction data might be corrupted intentionally, with malicious intent. All data in a database system can be modified given sufficient access, and this access has to be granted at least to the IT staff managing database systems. Additionally, third parties might gain access to these database systems by hacking (yes, it does happen). Once a user with malicious intent has access to the database system that stores the transaction data, all the data is accessible and modifiable.
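The defect scenario in the list above, where a single missing condition clause corrupts every record in a table, can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module; the `orders` table and its columns are hypothetical and stand in for whatever tables a real system uses.

```python
import sqlite3

# Hypothetical orders table: two historical orders and one new order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "paid", 1000), (2, "paid", 2500), (3, "pending", 700)],
)

# Intended statement: adjust the total of the new order only.
#   UPDATE orders SET total_cents = 600 WHERE id = 3
# Defective statement: the WHERE clause was left out, so every row matches.
cursor = conn.execute("UPDATE orders SET total_cents = 600")
rows_changed = cursor.rowcount

totals = [row[0] for row in conn.execute("SELECT total_cents FROM orders ORDER BY id")]
print(rows_changed, totals)
```

After the defective statement runs, the two historical orders carry the new total as well, and nothing in the table records what their original values were.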
Conversion of Transaction Data to Human Readable Representation
Transaction data is stored in databases as a group of records in multiple tables, connected through references. Even though the raw records can be viewed using a query tool, the data is not human readable for most purposes. When a person needs to view a transaction, the data must be transformed into a visual representation that a human can understand. This human readable form can then be printed, displayed in the browser or stored in a PDF document.
Because transaction data is complex, the conversion to a human readable form is not a trivial process. The code that converts the data must gather all the different parts of the transaction, organize them and then create a visual representation that makes sense to a human.
Over time, as eCommerce systems are upgraded, the data that is stored with transactions will evolve, and this conversion process needs to evolve with it. As the data and the conversion process are modified, the converted results will also change, to a point where it might not be possible to reconstruct the same visual representation of the transaction at some point in the future.
Even when the visual representation might be equivalent to the original transaction, and look similar, accumulated changes over time might introduce subtle changes in the interpretation of the transaction until at one point they might make a material difference.
This problem is compounded when a company replaces their eCommerce system with a different system. Not only does the data have to be transformed to fit the schema of the new system, but the conversion methods in the two systems will be very different, resulting in differences in the human readable forms as well.
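As a small illustration of how a change in the conversion code can make a material difference, consider two versions of a rendering function applied to the same stored data. This is a hypothetical sketch: the tax rate, the prices and the function names `render_total_v1` and `render_total_v2` are assumptions made for the example. The only difference between the versions is whether tax is rounded per line item or once on the subtotal.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
TAX_RATE = Decimal("0.075")  # hypothetical tax rate
prices = [Decimal("0.99"), Decimal("0.99"), Decimal("0.99")]  # stored line items


def render_total_v1(prices):
    """Original renderer: tax is rounded on each line item, then summed."""
    tax = sum((p * TAX_RATE).quantize(CENT, rounding=ROUND_HALF_UP) for p in prices)
    return f"Total due: {sum(prices) + tax}"


def render_total_v2(prices):
    """Later renderer: tax is rounded once, on the subtotal."""
    subtotal = sum(prices)
    tax = (subtotal * TAX_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    return f"Total due: {subtotal + tax}"


print(render_total_v1(prices))  # Total due: 3.18
print(render_total_v2(prices))  # Total due: 3.19
```

The stored data is identical in both cases, yet the amount shown to the reader differs by one cent depending on which version of the renderer is used.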
Proposed Method
We propose resolving these issues by saving a visual representation of the transaction, created at the time that the transaction is made. The natural format to store this representation is PDF, for a number of reasons: PDF is the de facto universal electronic document format, it is used and accepted by anyone who uses electronic documents, and it provides features for long term archiving and document integrity.
By capturing the visual representation at the time that the transaction is processed, it is guaranteed that the data used in creating the document is current and valid and the visual representation of the transaction matches the expectations of all the parties involved in the transaction.
Once captured, the PDF document should be stored separately from the transaction data records, preferably in a system designed to store documents, such as a document or content management system. The stored document can carry a transaction id or similar reference that connects it to the transaction data records.
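A minimal sketch of this storage step is shown below, assuming a plain directory stands in for the document management system. The function names, the file naming scheme and the idea of recording a SHA-256 digest alongside the transaction record are illustrative assumptions, not part of any specific product.

```python
import hashlib
import tempfile
from pathlib import Path


def store_document(store_dir: Path, transaction_id: str, pdf_bytes: bytes) -> str:
    """Write the document under its transaction id and return its SHA-256 digest."""
    path = store_dir / f"{transaction_id}.pdf"
    # Mode "xb" refuses to overwrite an existing file, so a stored document
    # cannot be silently replaced through this function.
    with open(path, "xb") as f:
        f.write(pdf_bytes)
    return hashlib.sha256(pdf_bytes).hexdigest()


def load_document(store_dir: Path, transaction_id: str, expected_digest: str) -> bytes:
    """Read the document back and check it against the recorded digest."""
    data = (store_dir / f"{transaction_id}.pdf").read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError("stored document does not match the recorded digest")
    return data


# Placeholder bytes standing in for a real PDF produced at transaction time.
store_dir = Path(tempfile.mkdtemp())
pdf_bytes = b"%PDF-1.7 placeholder content for transaction TX-0001"
digest = store_document(store_dir, "TX-0001", pdf_bytes)
```

In this sketch the transaction id is the link between the two systems: the database record would keep `"TX-0001"` and the digest, while the document itself lives elsewhere.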
Some factors to consider when using this approach, and specifically when using the PDF format:
Background vs Foreground transactions
On foreground transactions where there are one or more humans actively involved, there will always be a human readable version of the transaction that is used through the transaction process. For instance, when a shopper is purchasing products online, they will see their cart with the items before checkout and they will see a confirmation screen after the transaction is committed. In such transactions, the confirmation screen (or equivalent) should be saved as the human readable version of the transaction.
On background (automated) transactions, the parties involved should agree on a specific visual representation when the automated processes are put in place, and then produce that view for every transaction at the time that each transaction is committed. Whenever the view changes, the parties involved need to approve the new view before it is put into place.
PDF/A for Long Term Archiving
PDF/A is an ISO-standardized subset of the PDF format (ISO 19005) that is specifically intended for long term archival of PDF documents.
PDF/A compliant PDF files are still valid PDF files but have additional requirements to make sure that the content can be rendered correctly at any time in the future, even on different systems. These requirements include embedding all fonts in the PDF document, strict definition of the colors used in the document and others to remove all dependencies on the environment that the document may be opened in.
Storing transactions using the PDF/A format would ensure exact reproducibility at any time in the foreseeable future.
Digital Signatures in PDFs
In addition to using the PDF/A sub-format to store the transaction view, the PDF documents should also include a digital signature that includes a timestamp from a certified timestamp server.
The purpose of a digital signature in this context is not so much to positively identify the signer of the document, but rather to ensure that there are no modifications done to the document after its creation. Digital signatures in PDF documents can include a timestamp that will certify the time and date that the digital signature was applied.
By applying a digital signature at the time that the document is created, any changes made to the document henceforth would invalidate the signature, thereby protecting the original document against any modifications.
The embedded digital signature can and should also include a timestamp from a certified timestamp server. The timestamp serves two purposes:
- It certifies the time and date of the transaction.
- It protects the document against modification: Without a timestamp, the digital signature in a document can be removed, the document can then be modified, and then a new digital signature can be added to make the document appear legitimate. Including a timestamp as part of the signature would prevent this because the new signature would have a different time and date.
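The tamper-evidence property described above can be illustrated with a simplified sketch. Note the assumptions: real PDF digital signatures use public-key certificates, and the timestamp is issued by a trusted third-party timestamp server rather than chosen by the signer. The code below substitutes a keyed hash (HMAC from Python's standard library) with a hypothetical shared key, only to show that a seal computed over both the document and its timestamp stops verifying if either one is changed.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real PDF signature would use a
# private key and certificate instead of a shared secret.
SECRET_KEY = b"demo-key-not-for-production"


def seal(document: bytes, timestamp: str) -> str:
    """Compute a keyed hash covering both the timestamp and the document."""
    message = timestamp.encode("utf-8") + b"\n" + document
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify(document: bytes, timestamp: str, tag: str) -> bool:
    """Return True only if the document and timestamp match the seal."""
    return hmac.compare_digest(seal(document, timestamp), tag)


original = b"Invoice 1: Total 87.49"
timestamp = "2020-01-15T10:30:00Z"
tag = seal(original, timestamp)

print(verify(original, timestamp, tag))                    # unchanged document
print(verify(b"Invoice 1: Total 8.49", timestamp, tag))    # altered document
print(verify(original, "2021-06-01T09:00:00Z", tag))       # altered timestamp
```

Only the first check succeeds. Because the timestamp is part of what is sealed, re-sealing a modified document at a later date produces a seal that carries a different time, which is the property the second bullet relies on.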
Conclusion
Today, PDF documents are widely used for statements and legal contracts. We suggest expanding the use of PDF documents to keep electronic receipts of all important transactions. Simply storing scattered data in a database is an unreliable solution for long term archiving. Visual documents that are locked and approved by all parties can provide safe, immutable records.