If you have ever used File Transfer Protocol (FTP), Common Internet File System (CIFS), Network File System (NFS) or Virtual Private Network (VPN) connections to server shares to move large files between separate locations, perhaps over a wide area network (WAN), then, like me, you have probably found yourself thinking, “There must be a better way to do this!”
Anyone who routinely transfers media files, image files, large data sets, log files or software installers to and from client sites, or who must deliver project results to a customer’s servers, knows that while the Internet has eliminated the cost of using courier services to transfer material, and in many cases lowered the transfer time, it has introduced concerns of its own. Unpredictable transfer speeds, a lack of accountability, server maintenance, security issues and the difficulty of optimising the use of a network connection across the organisation have all contributed to the frustration of using traditional protocols to move significant quantities of data from A to B with any frequency.
File Transfer Protocol (or FTP) has historically been the tool of choice for many Media Managers and System Administrators moving files between servers and sites. However, as data sets continue to grow in size and scope, media files become larger, distribution widens and project turnaround times decrease, the increases in available network speed do little to reduce the time it takes to move data. This is especially true as the distance between source and destination increases. Over longer distances, network latency (a product of network distance and error rates along the path taken by data from source to destination) becomes a more significant factor than overall network bandwidth in determining how long a transfer will take. This, along with security concerns and the lack of file tracking information, makes FTP of limited use in complex data distribution workflows.
Some very helpful Internet “cloud” services such as Google Drive, Dropbox and YouSendIt have made the transfer of items too large to be emailed very straightforward. These services, however, use the standard Internet protocols to move data from a computer desktop up to their servers, and from their servers down to the destination on request. This is a good and economical model for periodically transferring moderately sized data packages, but it may not be optimal for organisations that are, for example, creating multiple multi-gigabyte media files which must be transferred between set and production house, sites, bureaus or affiliate offices.
One company that has tackled the conundrum of how best to transfer data packages over Internet links efficiently and securely is Aspera, with its FASP technology. This American organisation’s products are the result of ground-breaking research into the behaviour of network protocols when used to transfer large units of data. That research showed that the standard transport protocols perform poorly when moving large quantities of data over distance, and that specially designed mechanisms are required to make such transfers predictable and secure.
How does FASP technology work? Before answering that, it is worth explaining some network terminology.
The varied and complex communication protocols used for computer networking are commonly divided into “layers” when describing how network interaction works. One model frequently used is the Open Systems Interconnection (or OSI) Reference Model, developed by the International Organization for Standardization (ISO). This model breaks network communication protocols into seven distinct sections, or “layers”. Software running on one platform, interacting over a network with software running on a second platform, can be shown to use each of these seven layers to establish a connection with, communicate with, and exchange data with the other platform through the network medium.
It is the OSI model that is the source of terms you may have heard used for computer networking hardware. For example, Ethernet switching hardware is commonly referred to as “Layer 2” hardware because it is the Data Link layer (layer 2 of the OSI model) that defines the protocols used to switch data on a network. Similarly, “Layer 3” is commonly mentioned when discussing network routing, because the Network layer (layer 3 of the OSI model) defines the protocols that route data on a network.
The 7 layers of the OSI model are as follows:

Layer 7 – Application
Layer 6 – Presentation
Layer 5 – Session
Layer 4 – Transport
Layer 3 – Network
Layer 2 – Data Link
Layer 1 – Physical
Each of the seven OSI model layers defines the requirements of the network functions within it. The seven layers can be further divided into two groups. The lower layers, layers 1 to 4, are described as the “data transport” layers and are handled by a combination of software and hardware. Layer 1, the Physical layer, is closest to the actual network medium itself, defining such elements as the Ethernet standard. The upper layers, layers 5 to 7, are described as the “application” layers and are typically implemented in software running on a network client device. The top layer, layer 7, the Application layer, is closest to the computer user. An example of a protocol defined at layer 7 is the Hypertext Transfer Protocol (or HTTP) used by web browser applications. In this example the web browser uses HTTP in the Application layer to communicate with computing devices outside the host computer running the browser.
Each layer in the OSI model provides services for the operation of the layer above it. To continue the example above, the Application layer (layer 7) protocol HTTP interacts directly with services provided by protocols defined in the Presentation layer (layer 6). Each layer interacts with the layers adjacent to it on the same device, with data moving up and down the stack.
In order for software running on one platform to communicate over a network with software running on a second platform, data and messages must pass down through the layers on the first platform, out onto the network, and back up through the layers on the second. For a communication session to be established, however, each layer on one platform must interact with its peer layer on the other: the Physical layer (layer 1) of the first platform interacts with the Physical layer of platform 2, and the Transport layer (layer 4) of platform 1 interacts with the Transport layer of platform 2.
Transmission Control Protocol, or TCP, is defined in layer 4 of the OSI model, the Transport layer. This protocol uses “hand-shaking” to establish a network session between two network clients, which is then used to control the exchange of data between them in a regulated and measured way. This in turn ensures that any loss of data during a transaction is detected and that the lost data is resent. This method of first establishing a connection between network clients and then exchanging data is known as “connection-oriented”, and the mechanism for recovering lost data makes the protocol “reliable”. This comes at the price, however, of increased data overhead for a given exchange. Several interactions must occur before data can be exchanged, consuming time and adding overhead, and any lost data is resent until everything has arrived safely, further increasing the amount of data placed on the network. In most cases this overhead is acceptable because many networks need assured data exchange.
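The connection-oriented behaviour described above can be seen in a few lines of code. The following is a minimal sketch using Python's standard socket API on the loopback interface; the three-way handshake happens inside connect() and accept(), and the operating system handles retransmission of any lost segments.

```python
import socket
import threading

# A minimal sketch of TCP's connection-oriented, reliable exchange on the
# loopback interface. The handshake and any retransmission are handled by
# the operating system's TCP implementation, not by this code.

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0: let the OS choose a free port
server.listen(1)
port = server.getsockname()[1]

received = {}

def serve():
    conn, _ = server.accept()           # completes the SYN / SYN-ACK / ACK handshake
    received["data"] = conn.recv(1024)  # data arrives intact and in order
    conn.close()

t = threading.Thread(target=serve)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))  # initiates the handshake with the server
client.sendall(b"hello over TCP")    # lost segments would be resent automatically
client.close()

t.join()
server.close()
print(received["data"])              # b'hello over TCP'
```

Note that several round trips occur before the first byte of payload is exchanged; this is the connection-establishment overhead the paragraph above refers to.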
User Datagram Protocol, or UDP, is also located in the Transport layer of the OSI model, the same layer as TCP. UDP is a lightweight, “connectionless” protocol that uses data sockets to deliver data from a source port to a destination port. “Connectionless” refers to the fact that UDP does not spend any time establishing and acknowledging connections through handshaking. Further, UDP is known as an “unreliable” protocol because it lacks any built-in mechanism to resend a datagram that is sent but not received. Its advantage over more sophisticated protocols (which do include the ability to resend lost data) is its lightweight structure. This is significant for applications such as voice-over-IP and streaming media, where small amounts of data loss are preferable to the delays caused by the resending and re-ordering of data used by other protocols, which would interfere with the real-time nature of the mechanism.
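The contrast with TCP is visible in code: a minimal sketch of UDP on the loopback interface needs no connect() or accept() at all. The sender simply fires a datagram at an address and port; on a real network it could be silently lost, and nothing here would detect or resend it.

```python
import socket

# A minimal sketch of UDP's connectionless operation on the loopback
# interface. There is no handshake and no acknowledgement: each datagram
# is sent fire-and-forget, and on a real network could be silently lost.

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))      # port 0: let the OS choose a free port
port = receiver.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"a single datagram", ("127.0.0.1", port))  # no connection, no ACK

data, _ = receiver.recvfrom(1024)    # a datagram arrives whole, or not at all
print(data)                          # b'a single datagram'

sender.close()
receiver.close()
```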
The principal suite of protocols underpinning the Internet is the TCP/IP suite. This group of technologies is defined across the lower four OSI layers, providing the data standards and mechanisms for error correction, data ordering, data resending, data routing, network media and many other functions. The TCP/IP suite is a deceptively simple group of letters naming an incredible ecosystem of bewilderingly complex interaction. Each transaction on these networks undergoes hundreds if not thousands of logical decisions in order to get from source to destination. Most network users have no notion of this and take for granted that what is now the de facto networking standard can be used for an incredibly varied range of purposes at any time of day or night.
When it comes to moving large units of data, the availability and flexibility of the TCP/IP suite is perhaps its “Achilles heel”. The self-healing and self-regulating technology underpinning networks that serve larger workgroups, such as Wide Area Networks (WANs), Metropolitan Area Networks (MANs) and of course the Internet, ensures that there will always be a congestion-free route from network A to network B, but it actually limits the performance of those networks when transferring larger “bulk” quantities of data.
The Internet is a complex matrix of interconnected networks. This matrix ensures that there are always multiple routes that data transactions between separated hosts can take, allowing for failures in network paths and for alternative paths to be selected when others become congested with heavy traffic. The process of selecting alternate paths and determining which is optimal is handled by protocols defined in the TCP/IP suite. But since these mechanisms are concerned only with ensuring that a data path exists and is congestion free, they have no way of guaranteeing that your data will be routed over the path with the least delay, or indeed over the least constricted path. An unfortunate by-product of the mechanisms commonly used to resend lost or delayed data is that they add to the amount of regulation the transport protocols apply, which ultimately reduces the potential overall performance of any data path through interconnected networks if the path relies purely on TCP/IP.
To overcome the limitations of using the TCP/IP suite to transport data, and to make optimal use of a data pipeline between two network-connected clients, Aspera uses a combination of UDP and custom-written application software.
“Hold on a minute – it seems that UDP would not deliver the data integrity that I would need, so why does Aspera use UDP?” you might say. Well, that’s a good question!
UDP includes the capability to embed checksum information in its data so that the receiving device can determine the integrity of any one datagram. This alone, however, is unlikely to be sufficient to guarantee data integrity, since it provides no mechanism for determining whether other datagrams have been lost on the network. The answer to the question lies in the way that Aspera uses UDP, for its low-overhead agility in sending data to a destination, in combination with unique Application-layer software that determines whether data has reached its destination while constantly analysing the traversal of the network.
This software provides the necessary interaction between server and client to establish the connection, monitor data integrity, and measure latency and delay. It circumvents the less optimal mechanisms in other protocols and ensures that the data rate between server and client is carefully throttled to optimise the data pipeline between the two platforms; only lost datagrams are resent, and at the same data rate as other datagrams, making the best use of the available network bandwidth without unduly stressing the network and its constituent hardware.
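Aspera's FASP implementation is proprietary, so the following is purely an illustrative sketch of the general idea of layering reliability over an unreliable datagram channel in application software, not Aspera's actual algorithm: every block carries a sequence number, the receiver reports gaps, and only the missing blocks are ever resent.

```python
import random

# Illustrative sketch only -- NOT Aspera's actual algorithm. Reliability is
# added in application code above an unreliable, UDP-like channel: blocks
# carry sequence numbers, the receiver reports gaps, and only the missing
# blocks are retransmitted, rather than stalling or resending everything.

random.seed(7)
blocks = {seq: ("block %d" % seq).encode() for seq in range(10)}

def lossy_channel(datagrams, loss_rate=0.3):
    """Simulate a UDP-like channel that silently drops some datagrams."""
    return [d for d in datagrams if random.random() > loss_rate]

# First pass: send everything; some blocks are silently lost in transit.
received = dict(lossy_channel(list(blocks.items())))

# Receiver reports the missing sequence numbers; sender resends only those.
while len(received) < len(blocks):
    missing = [seq for seq in blocks if seq not in received]
    received.update(dict(lossy_channel([(seq, blocks[seq]) for seq in missing])))

print(sorted(received) == sorted(blocks))   # True: every block eventually arrived
```

Because the retransmissions go out at the same rate as fresh data rather than triggering the back-off behaviour built into TCP, this style of scheme can keep a long, lossy link full, which is the essence of what the paragraph above describes.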
Aspera applications with fasp technology can be purchased to provide a variety of file transfer solutions, including file synchronization, and are designed to be hosted on commodity hardware. The products are flexible enough that customer workflows can be designed and implemented around the number of users required and the bandwidth needed to transfer data in the required time. This flexibility extends to the platforms that Aspera supports: products are available for Windows, Linux and Mac operating systems as well as for mobile platforms.
If you are looking for a solution that meets the needs of a complex, secure Digital Cinema Package delivery system, or a set-to-production-house transfer mechanism, talk to root6 about how Aspera might meet your needs. We can assist with workflow and help identify the licensing model to match your implementation. If you would like to learn more about how Aspera’s fasp technology works, a detailed white paper comparing common data transport methods with fasp technology can be downloaded here: fasp – a Critical Technology Comparison.pdf. A white paper on Aspera’s File Synchronization technology based on fasp can be downloaded here: Aspera Sync White Paper