Successful application of StorageXon distributed storage in big data analytics

Since its inception, Shanghai StorageXon Information Technology Co., Ltd. has been committed to developing highly available, high-performance, and easy-to-manage private cloud storage, and has launched CX-Cloud, a new generation of distributed storage system. Its leading-edge architecture, large capacity, horizontal scalability, multi-node fault tolerance, high performance, security and controllability, and ease of deployment and management have been well received by many customers.

Just recently, CX-Cloud was successfully applied in an Internet finance company's big data analysis project, providing the customer with a solid, reliable underlying storage guarantee for big data processing under highly concurrent I/O conditions.


What is distributed storage?

Our public account introduced distributed storage back when CX-Cloud was released, but that write-up was rather technical, and I suspect it left many readers' heads in the clouds. So what, in the end, is the difference between distributed storage and traditional storage?

A simple explanation, in plain human language: traditional storage tackles the performance problem (a single hard disk is too slow) and the reliability problem (if a single disk fails, its data is lost) by combining a bunch of hard disks, then using an algorithm (which algorithm? That's too deep to explain in one sentence, so I won't ramble on) to break the data up and write it to all the drives at the same time (so fast!). If one drive fails, it doesn't matter; the algorithm can reconstruct the lost data, so both problems are solved ^_^
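For the curious, the classic version of that "break up and reconstruct" trick is striping with XOR parity (RAID-5 style). This is a toy sketch of the general idea, not CX-Cloud's actual (undisclosed) algorithm:

```python
# Toy RAID-5-style striping with XOR parity; illustrative only,
# not the algorithm any particular product actually uses.

def stripe_with_parity(data: bytes, n_data_disks: int) -> list[bytearray]:
    """Split data round-robin across n_data_disks, plus one parity disk."""
    disks = [bytearray() for _ in range(n_data_disks)]
    for i, byte in enumerate(data):
        disks[i % n_data_disks].append(byte)
    # Pad stripes to equal length so parity bytes line up.
    stripe_len = max(len(d) for d in disks)
    for d in disks:
        d.extend(b"\x00" * (stripe_len - len(d)))
    parity = bytearray(stripe_len)
    for d in disks:
        for j in range(stripe_len):
            parity[j] ^= d[j]
    return disks + [parity]

def recover_disk(surviving: list[bytearray]) -> bytearray:
    """XOR of all surviving stripes reconstructs the single lost stripe."""
    out = bytearray(len(surviving[0]))
    for d in surviving:
        for j in range(len(out)):
            out[j] ^= d[j]
    return out

disks = stripe_with_parity(b"hello world!", 3)
lost = disks.pop(1)                 # one drive "fails"...
assert recover_disk(disks) == lost  # ...and its data comes back
```

Writes go to all drives in parallel (the speed win), and any single failed stripe is just the XOR of the survivors (the reliability win).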

But… but… what if the whole machine breaks down? And there is a limit to how many drives one machine can hold :( This is where distributed storage comes in, and yes, more algorithms -_- Treat an entire machine as if it were a hard disk and group many machines together, so that it doesn't matter if one machine breaks down. It's as simple as that.
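The "treat a machine as a disk" idea can be sketched in a few lines: place each object on more than one machine, and reads simply fall through to a surviving replica. The hash-based placement rule and node names below are stand-ins for illustration, not CX-Cloud's real placement logic:

```python
import hashlib

# Toy sketch: replicate each object onto `replicas` machines so that
# losing a whole machine loses nothing. Placement here is a simple
# deterministic hash; real systems use far more sophisticated schemes.

NODES = ["node-a", "node-b", "node-c", "node-d"]

def place(key: str, replicas: int = 2) -> list[str]:
    """Pick `replicas` distinct nodes for a key, deterministically."""
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replicas)]

store = {n: {} for n in NODES}  # each machine's local "disk"

def put(key: str, value: bytes) -> None:
    for n in place(key):
        store[n][key] = value

def get(key: str, down=()) -> bytes:
    for n in place(key):
        if n not in down and key in store[n]:
            return store[n][key]
    raise IOError("all replicas unavailable")

put("report.csv", b"q3 numbers")
failed = place("report.csv")[0]     # take one replica's machine offline
assert get("report.csv", down={failed}) == b"q3 numbers"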



Why customers choose CX-Cloud

What does this Internet finance company do? From the editor's exchange with the client, we learned that they are a comprehensive platform providing New Third Board listing, financing, and M&A advisory services, while also integrating fundraising, investment, incubation, operation, and promotion functions.

Where should so much important data be stored? In the fast and reliable CX-Cloud, of course!

Merely storing so much data would be a waste, so the client wanted to refine the raw data through big data analysis and make the data truly worth something.

To achieve this, the client built a big data analytics platform on CloudStack. The platform adds all the servers to a virtualization cluster and then builds virtual machines on top of it, giving each virtual machine as many CPUs and as much memory as it needs (of course, the total cannot exceed the combined resources of all the physical machines).

CPUs are covered and memory is covered, but what about hard disks? Where do you find a hard drive that large? The hard drive is our CX-Cloud, of course! We put all the high-speed SAS drives of the CX-Cloud distributed storage system into one huge storage pool, which CloudStack uses as its own oversized hard drive; all of the system's data, such as virtual machine files and data files, goes into it. Yes! It's that simple!


CX-Cloud functional features

Finally, I’ve taken the trouble to post our CX-Cloud functional features again -_-!



Unified namespace: multiple storage nodes running the system are integrated into a single namespace. Users see a single file path; they simply store files into it and need not care about where each file is physically located.

Scale Out: When capacity and performance are not sufficient, storage nodes can simply be added without affecting existing business operations and without changing the application architecture.

Standard access interfaces: users can access files through multiple interfaces, including a native client with POSIX access, the standard CIFS and NFS protocols, FTP, and an OpenStack-compatible object storage interface.
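Because CIFS and NFS are standard protocols, the pool mounts like any network file system. These are generic mount commands as an illustration; the hostname `cx-node1`, export path `/cxpool`, and username are placeholders, not details from the deployment described here:

```shell
# NFS: the pool appears as an ordinary POSIX file system once mounted.
mount -t nfs cx-node1:/cxpool /mnt/cx

# CIFS/SMB: the same data accessed as a Windows-style share.
mount -t cifs //cx-node1/cxpool /mnt/cx -o username=alice
```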

High Performance: The system improves performance at multiple levels, including cache optimisation, internal node data synchronisation and node bandwidth aggregation, to meet the IO requirements of highly concurrent read and write systems.

High Availability: Multiple technologies provide high availability, so that when any storage node suffers a hardware failure (a power failure, a bad hard disk, or even a node that cannot be reached at all) or a software problem (such as file corruption on a node), users of the storage system can still read and write data normally. The system can even be configured so that data remains readable and writable when multiple nodes fail completely.
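How many simultaneous node failures a system like this survives is usually set by either a replica count or an erasure-coding profile. These are the generic rules of thumb, not CX-Cloud's published settings:

```python
# Generic fault-tolerance and overhead formulas for the two common
# redundancy schemes; illustrative, not product-specific numbers.

def failures_tolerated_replication(copies: int) -> int:
    """R full copies survive the loss of R-1 of them."""
    return copies - 1

def failures_tolerated_erasure(data_chunks: int, parity_chunks: int) -> int:
    """k+m erasure coding survives the loss of any m chunks."""
    return parity_chunks

def storage_overhead_replication(copies: int) -> float:
    """3 copies consume 3x the raw capacity of the data."""
    return float(copies)

def storage_overhead_erasure(k: int, m: int) -> float:
    """A 4+2 profile consumes only 1.5x the raw capacity."""
    return (k + m) / k

assert failures_tolerated_replication(3) == 2
assert failures_tolerated_erasure(4, 2) == 2
assert storage_overhead_erasure(4, 2) == 1.5
```

The trade-off in a nutshell: replication is simpler and faster to rebuild, while erasure coding tolerates the same number of failures at a much lower capacity cost.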

Huge Capacity: The system can be expanded seamlessly and the total capacity can easily reach 100PB, which can fully meet the needs of current large-scale enterprise applications.

Easy Maintenance: A variety of supporting technologies (such as seamless capacity expansion, an easy web-based UI with centralised management, and hard disk rebuild technology) allow users to manage and maintain the product at minimal cost once it is up and running.