Performance Kings, Storage Glory: StorSwift distributed storage powers large-scale genetic computing

The rapid development of genetic computing

Variation in gene sequences underlies the genetic variability of human traits, and the study of the human genome is fundamental to the life sciences. Some scientists see the genome map as a road map, or as the periodic table of elements in chemistry; others compare it to a dictionary. Whichever way one interprets it, deciphering humanity’s own genetic code holds great promise for promoting health, preventing disease and prolonging life.

As the cost of genetic sequencing falls and genetic testing becomes more common, researchers are using sequencing to personalise treatment for individual patients based on their disease manifestations, drug tolerance and other factors. At the same time, DNA samples and medical records are being used to build comprehensive medical databases, from which big data analysis can identify the specific genes involved in various genetic diseases and so improve clinical diagnosis, treatment and prevention.

The need for high-capacity, high-performance storage for genetic computing

The challenge for researchers is to manage and analyse this vast amount of unstructured genomic data. A single human genome ‘run’ can generate 500 GB of raw data files, which are complex, scattered and unstructured, and therefore difficult to manage and analyse. To sort through this data, researchers need computing systems that can process large volumes of data at high speed and that remain flexible, but traditional computing systems have not kept pace with these demands.
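To make the scale concrete, here is a rough sizing sketch in Python. Only the 500 GB per run figure comes from the text above; the cohort sizes, the function names and the use of decimal units (1 PB = 1,000,000 GB) are assumptions made for illustration.

```python
# Back-of-the-envelope capacity sizing for raw sequencing data.
# Only the 500 GB per run figure comes from the article; the rest is illustrative.

RAW_GB_PER_RUN = 500  # article: one human genome 'run' can generate 500 GB of raw data


def raw_capacity_pb(num_runs: int, gb_per_run: float = RAW_GB_PER_RUN) -> float:
    """Raw capacity in petabytes, using decimal units (1 PB = 1,000,000 GB)."""
    return num_runs * gb_per_run / 1_000_000


def runs_per_pb(gb_per_run: float = RAW_GB_PER_RUN) -> int:
    """How many whole runs fit into one petabyte of raw capacity."""
    return int(1_000_000 // gb_per_run)


if __name__ == "__main__":
    # Hypothetical cohort sizes, chosen only to show how quickly the volume grows.
    for runs in (100, 2_000, 10_000):
        print(f"{runs:>6} runs -> {raw_capacity_pb(runs):.2f} PB of raw data")
    print(f"runs per PB: {runs_per_pb()}")
```

This counts raw files only. Intermediate and result files produced during analysis, and any redundant copies kept by the storage layer, would add to the total.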

In genomics research, datasets often need to be stored, analysed and then stored again. The volume is staggering: all of the data has to be kept on external storage devices, transferred over the network to compute servers, analysed, and then written back to external storage. This cycle places an enormous burden on traditional IT infrastructure. Most storage systems cannot withstand these workloads because they lack the scalability, durability and longevity that today’s biomedical applications require.

Research organisations therefore need a distributed file storage solution that not only stores data easily but also lets other researchers access it again, so that the petabytes of data already stored can be reused for further computation and analysis. StorageX’s distributed file storage system, CX-CLOUD-FS, is recommended as a storage platform for genetic computing and high-performance computing because of its high performance, high scalability, high reliability, and ease of management and maintenance, which together provide strong data storage and security mechanisms for genetic computing platforms.

CX-CLOUD-FS in a genetic computing company in Beijing

To meet the needs of a Beijing-based genetic computing company for large-scale centralised storage, Shanghai Storage Xun Information Technology Co., Ltd. designed and deployed a CX-CLOUD-FS distributed file storage system that provides high-speed reads and writes for complex, unstructured data. With almost unlimited scalability in storage capacity, it allows application and compute servers and big data analysis software to query, read and write the same storage pool directly, which makes managing massive amounts of data simple.

The first batch of 10 CX-CLOUD-FS storage nodes is online, with client read and write bandwidth exceeding 16 GB/s. All data is stored with a redundant copy, so the cluster can tolerate up to five nodes dropping offline, losing power or becoming corrupted, ensuring maximum data reliability. The nodes are connected over InfiniBand, which provides higher bandwidth and lower access latency. The system supports both a high-speed TCP/IP read/write mode and an RDMA mode, meeting the needs of high-performance computing workloads such as genetic computing.
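The figures above lend themselves to a quick sanity check. The sketch below assumes the 16 GB/s figure is aggregate client bandwidth spread evenly across the 10 nodes, which the article does not state explicitly, and it reuses the 500 GB run size quoted earlier. The 10 Gbit/s Ethernet comparison is a hypothetical baseline and not part of the deployment.

```python
# Rough throughput arithmetic for the deployment described above.
# Assumption: 16 GB/s is aggregate client bandwidth, evenly spread over all nodes.

NODES = 10                    # article: first batch of 10 storage nodes
CLIENT_BANDWIDTH_GB_S = 16.0  # article: bandwidth exceeding 16 GB/s (treated as a lower bound)
RUN_SIZE_GB = 500.0           # article: raw data from one human genome run


def per_node_bandwidth(total_gb_s: float, nodes: int) -> float:
    """Average bandwidth each node contributes, in GB/s."""
    return total_gb_s / nodes


def transfer_seconds(size_gb: float, bandwidth_gb_s: float) -> float:
    """Ideal time to move size_gb at a sustained bandwidth, ignoring overheads."""
    return size_gb / bandwidth_gb_s


if __name__ == "__main__":
    print(f"per node: {per_node_bandwidth(CLIENT_BANDWIDTH_GB_S, NODES):.2f} GB/s")
    print(f"one run at cluster speed: {transfer_seconds(RUN_SIZE_GB, CLIENT_BANDWIDTH_GB_S):.2f} s")
    # Hypothetical baseline: a single 10 Gbit/s link moves at most 1.25 GB/s.
    print(f"one run over 10 Gbit/s: {transfer_seconds(RUN_SIZE_GB, 1.25):.0f} s")
```

These are idealised numbers. Real transfers depend on file sizes, client count, protocol overhead and whether the TCP/IP or RDMA mode is in use.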

       


CX-CLOUD-FS functional features

Unified namespace: Multiple storage nodes are integrated into a single namespace, so the user sees a single file path. Users simply store files in it and do not need to care where each file is physically located.

Scale Out: When capacity or performance is insufficient, simply add storage nodes, without affecting existing business operations and without changing the application architecture.

Enterprise-class interface compatibility: Users can access files through multiple interfaces: a bundled client supporting POSIX access, the standard CIFS and NFS protocols, FTP, and object storage interfaces.

High Performance: The system improves performance at multiple levels, including cache optimisation, internal data synchronisation and node bandwidth aggregation, to meet the I/O requirements of highly concurrent read/write workloads. It also supports an RDMA high-performance access mode and client-specific engine tuning for Windows and Linux.

High Availability: Multiple technologies keep the system available, so that when any storage node suffers a hardware failure (e.g. power loss, a bad hard disk, or even the whole node becoming inaccessible) or a software problem (e.g. file corruption on a node), users can still read and write data normally. The system can be configured so that data remains readable and writable even when multiple nodes fail completely.

High Stability: A variety of hardware and software techniques safeguard stable operation, reducing the rate of disk failures and allowing the whole system to run trouble-free for long periods.

Huge Capacity: The system expands seamlessly and its total capacity can reach 100 PB, which meets the needs of current large-scale enterprise applications.

Access Control: Supports an unlimited number of local permission settings, as well as Windows Active Directory/LDAP authentication.

Easy Maintenance: Features such as seamless expansion, an Easy UI-based web management system, centralised management and very fast hard disk rebuilds let users manage and maintain the product at minimal cost once it is up and running.
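The unified namespace and scale-out features can be illustrated with a toy placement sketch. This is not how CX-CLOUD-FS places data, which the article does not describe. It uses rendezvous (highest-random-weight) hashing as a stand-in technique, and the node names, file paths and the place function are all hypothetical. The point is only to show how a client can address one path while the system decides which nodes hold the file and its copy, and how adding a node disturbs few existing placements.

```python
# Toy illustration of a single namespace over many nodes, using rendezvous hashing.
# NOT the product's actual algorithm; node names and paths are hypothetical.
import hashlib


def _score(node: str, path: str) -> int:
    """Deterministic pseudo-random score for a (node, path) pair."""
    digest = hashlib.sha256(f"{node}:{path}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(path: str, nodes: list[str], copies: int = 2) -> list[str]:
    """Return the nodes that would hold `path`, highest-scoring node first."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")
    ranked = sorted(nodes, key=lambda node: _score(node, path), reverse=True)
    return ranked[:copies]


if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(1, 11)]  # 10 nodes, as in the deployment
    path = "/genomics/sample001/run1.fastq"          # hypothetical file path
    print(path, "->", place(path, nodes))

    # Scale out by one node and count how many primary placements change.
    grown = nodes + ["node11"]
    paths = [f"/genomics/sample{i:03d}/run1.fastq" for i in range(1000)]
    moved = sum(place(p, nodes)[0] != place(p, grown)[0] for p in paths)
    print(f"{moved} of {len(paths)} primary placements changed after adding a node")
```

With rendezvous hashing, a file's primary node changes after scale-out only if the new node scores highest for that path, so most existing data stays where it is. That is one way a system could add nodes without disrupting running workloads.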