January 8, 2018

New scale-out NAS generation with Qumulo

Qumulo, the well-known scale-out NAS vendor, offered the IT Press Tour crew a superb interactive session last month where we discovered what really makes the company pretty unique on the market.

Founded in March 2012 in Seattle but really launched in March 2015, Qumulo has raised $130M from top investors and has around 200 employees today. The firm was created by several former Isilon leaders and probably has one of the best file system teams in the storage industry, having participated in the scale-out revolution of the early 2000s with Isilon, still the reference in scale-out NAS.

Qumulo, along with a few other strong technology players in file storage, helps push object storage into the corner where that approach should stay, I mean capacity and long-term data preservation. Some players have tried to offer a file system mode, in fact file storage, on top of their object store, but it only works on paper: in practice it fails with data integrity problems and high latency issues. Just think about 2 core functions of a file system, rename() and link(), and you get an idea of the challenges to solve, not to mention the need for a strict consistency model. These points confirm that building a file system on top of an object store is still a dream or a utopia; the reverse, offering an object storage API on top of a file system, is easier. It explains why object storage had real market difficulties in 2017.
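To make the rename() point concrete, here is a minimal sketch, assuming a hypothetical in-memory dictionary standing in for a flat object store: POSIX rename() is one atomic metadata operation even for a huge directory, while "renaming" a prefix of objects means a non-atomic copy-and-delete loop over every key, where a crash mid-loop leaves both trees half-populated.

```python
import os
import tempfile

# POSIX rename(): a single atomic metadata operation, regardless of
# how many entries the directory contains.
d = tempfile.mkdtemp()
os.makedirs(os.path.join(d, "projects/alpha/src"))
os.rename(os.path.join(d, "projects/alpha"), os.path.join(d, "projects/beta"))

# A flat object store has no directories, only keys. "Renaming" a
# prefix means rewriting every object under it, one operation per
# object, with no atomicity across the loop. (Hypothetical in-memory
# store standing in for a real object API.)
store = {"projects/alpha/src/main.c": b"...", "projects/alpha/README": b"..."}
for key in [k for k in store if k.startswith("projects/alpha/")]:
    store["projects/beta/" + key[len("projects/alpha/"):]] = store.pop(key)

print(sorted(store))  # → ['projects/beta/README', 'projects/beta/src/main.c']
```

The loop above is exactly what file-over-object gateways must hide, and why strict consistency is so hard to guarantee on top of an eventually consistent store.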

Known for its innovative approach to distributed file systems, Qumulo has recently repositioned its message around Qumulo File Fabric aka QF2. Funny to see that many companies use the term Fabric to finally replace and extend what we named a few years ago FAN, i.e. File Area Network.

And Qumulo has forgotten a key fundamental player in file systems; I'm pretty surprised they didn't list Veritas. People who know, build and play with file systems know the role of Veritas in that space with VxFS, both a technology and a market presence. And if you consider the companion volume manager aka VxVM and some file system accelerator options, you get the whole picture; in other words, Veritas invented everything in that space. Just realize that snapshots existed in 1992, as did dynamically resizing a file system in both directions – shrink or grow – again in 1992, among many capabilities. And some interesting flavors like SGI XFS, introduced with IRIX 5.3 in 1993, could also be listed. I probably need to build the same map I created for object storage and CAS; I'm sure you remember that famous article.

Back to Qumulo, the philosophy is to provide a highly scalable file system cluster deployed in various flavors: on-prem with Qumulo appliances, on-prem as a software model on commodity servers such as HPE Apollo dense servers, and finally in the cloud on AWS running on EC2. It perfectly illustrates the SDS approach and its advantages, giving users the flexibility to deploy on their preferred model and evolve with it.

Qumulo has chosen to build independent clusters glued together with a data propagation method. Imagine a global environment with a local cluster of Qumulo appliances, a second cluster deployed on Apollo servers and a third running on AWS. The company has developed an Asynchronous Automatic Continuous Replication (AACR) method to distribute data across clusters. AACR is a file-based model that today copies the entire file, without deduplication, and is not yet block-based. With such data copy techniques, Qumulo is able to run jobs in various places on demand, a pretty clever approach, especially for some vertical use cases.

This design invites the next remark about data consistency: Qumulo QF2 is strongly consistent within a cluster and eventually consistent across data centers or the WAN. Qumulo is also an adept of Paxos as a consensus protocol.

In terms of data protection, the company started with replication and has offered erasure coding for a few quarters now, relying on the Intel ISA-L library, a pretty good choice finally. With potentially billions of files stored on the platform, modern file systems had to design a new additional element to satisfy operations on metadata. Qumulo built QumuloDB, distributed across all cluster nodes; a similar side metadata database model is also used by RozoFS, but there it is one central database per file system. Imagine a recurrent backup task that wishes to select and protect only the files modified since a certain date, i.e. in an incremental manner. With small volumes, walking the tree is acceptable, but with huge volumes and tons of files this step is suicide, as the task will almost never finish, and you have to run this protection pretty often. Worst case, similar tasks pile up on the system as each task takes longer than the interval between backups. Now imagine that all these create/update operations on files are recorded in a fast side database you can query to find the list of files you need to back up: it is almost magic and super fast, you get back the file paths and submit them to the backup job. The same remark applies to archiving, tiering or migration, which you need to integrate for a viable solution. This is just one illustration of the kind of service QumuloDB offers: historical and current metadata tracking and storage, with the ability to freeze a version of the database to provide a snapshot mechanism. In other words, a version of the DB is a version of the file system.
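As an illustration of the idea, not of Qumulo's actual schema or API, here is a minimal sketch with SQLite: once every create/update is recorded in an indexed side table, the incremental backup candidate list becomes a single query instead of a full tree walk over billions of inodes.

```python
import sqlite3

# Hypothetical side metadata DB (not Qumulo's real schema): every
# create/update lands here, so finding files changed since the last
# backup is one indexed query rather than a recursive directory scan.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, mtime INTEGER)")
db.execute("CREATE INDEX idx_mtime ON files (mtime)")
db.executemany("INSERT INTO files VALUES (?, ?)", [
    ("/vol/a.dat", 1514764800),      # untouched since the last backup
    ("/vol/b.dat", 1515283200),      # modified after the cutoff
    ("/vol/logs/c.log", 1515369600), # also modified after the cutoff
])

last_backup = 1515000000  # epoch timestamp of the previous run
changed = [row[0] for row in db.execute(
    "SELECT path FROM files WHERE mtime > ? ORDER BY path", (last_backup,))]
print(changed)  # → ['/vol/b.dat', '/vol/logs/c.log']
```

The same indexed query serves tiering or archiving policies; only the WHERE clause changes.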

And Qumulo recently appeared for the first time in the bizarre Gartner Magic Quadrant for Distributed File Systems and Object Storage; I wrote a long analysis of it recently in StorageNewsletter, you can read it here and here. Funny that analysts associate object storage and distributed file systems, why not with a secondary platform…

Clearly Qumulo is one of the few gems in the file storage business, along with Avere Systems, now a Microsoft company, Elastifile, Panasas, Quantum with its StorNext-based offering, now in its latest Xcellis scale-out NAS iteration, Rozo Systems and WekaIO. They all demonstrate the superiority of their native file-based approach, sometimes with a parallel mode, over things like object storage, which is finally good for capacity and long-term retention but not for highly demanding file environments. Some dream about it but the market invites them to consider the business reality.

We expect Qumulo to introduce a geo-dispersed approach, even with restrictions, and we hope for a tiering feature across clusters and clouds, plus a new iteration of the AACR capability explained above. The company is preparing to land in Europe in Q1 2018 and I anticipate pretty rapid growth there. Honestly, the product is strong, so no doubt the Seattle company will recruit top guns to rapidly gain market share and penetrate the old continent. We hope to meet them again next year during a future tour to measure progress and confirm development directions.

December 21, 2017

Panasas is back

Panasas, the historic leader in high-performance file storage, has started a new era following several years of redesigning and re-architecting its solution.

In fact, the original motivation behind this period was to go beyond traditional HPC and apply scalable file storage to other market segments. In other words, there are market categories with similar needs where the Panasas solution would be a very good fit.

Recently, with the IT Press Tour crew, we had the privilege to spend a few hours at the Panasas HQ in Sunnyvale. It was a very interesting and interactive session, and the executive team was very transparent with our group.

Back to the roots of the company: Panasas was founded in 1999 in Pittsburgh, PA by Garth Gibson, the famous researcher associated with RAID patents. Garth Gibson and his past colleagues had an approach summarized later in the SCSI T10 standard as Object-based Storage Devices or OSDs. For readers who discover Panasas, the name means Pittsburgh Advanced Network Attached Storage Application Software. So far the company has raised $171 million - the last round was in 2013 - and has delivered its product to more than 500 customers in 50 countries. If we do the simple math of 500/18, we obtain around 28 customers per year, meaning on average more than 2 per month over 216 months. Many players in such markets would dream about these numbers. The mission was, and still is, to deliver a high-performance scale-out NAS solution. The company went through several executive changes over the years, but Faye Pairman (left on the photo below) has now been CEO for about 7 years. A few members of the current team have Adaptec in common, a key storage player many years ago.

Initiated and supported by famous US research labs, the company has so far developed a pretty unique solution to address and solve file storage performance challenges in very demanding IT environments. As already mentioned, this story doesn't end with HPC; it's also a very good fit for several use cases in M&E, manufacturing, life sciences, education/university, government and of course energy. We still don't understand why Gartner decided to remove Panasas from its "bizarre" Magic Quadrant for Distributed File Systems and Object Storage. Read my comments in the article I published on StorageNewsletter almost 2 months ago.

We also have some remarks about the following picture, as Panasas has omitted to list Primary Data, Rozo Systems, Quantum Xcellis scale-out NAS or WekaIO for asymmetric distributed parallel file systems, very similar to Panasas PanFS, or Avere Systems, Elastifile or Qumulo for the "classic" NAS play. To be clear, Panasas sells ActiveStor appliances powered by PanFS.

Back to the product, it's fundamental to understand what makes a parallel file system different, and especially a design philosophy such as PanFS. A consumer, i.e. a client, of the file system is able to send a file to multiple storage targets at the same time, splitting the content across these multiple units. Thus the time to write and read is dramatically reduced. This is very different from sending a file via SMB or NFS, where the entire file goes through only one NAS head. If you wish to do it with NFS, you have to consider pNFS, introduced with NFS v4.1; otherwise you need a special piece of software embedded in the client machine that understands the interaction between the metadata server(s) and the data servers and processes I/O operations. A parallel file system can be asymmetric or symmetric; this is just related to how the metadata server role is operated, and again, PanFS uses an asymmetric model. By the way, Panasas was a key contributor to pNFS, a standardized proposal to extend NFS with this asymmetric mode. I invite you to refer to pNFS.org for more details.
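A minimal sketch of the client-side striping idea, purely illustrative and not the DirectFlow protocol or a real pNFS layout: the client cuts the file into fixed-size stripe units and scatters them round-robin across the storage targets, so all targets absorb I/O concurrently.

```python
# Illustrative client-side striping: fixed-size units placed
# round-robin across N storage targets, then reassembled in order.
# Real parallel file systems add layouts, RPCs and failure handling.

STRIPE_UNIT = 4  # bytes per stripe unit (tiny, for readability)

def stripe(data: bytes, n_targets: int):
    """Scatter `data` round-robin; returns one list of units per target."""
    targets = [[] for _ in range(n_targets)]
    for i in range(0, len(data), STRIPE_UNIT):
        targets[(i // STRIPE_UNIT) % n_targets].append(data[i:i + STRIPE_UNIT])
    return targets

def gather(targets):
    """Reassemble the original byte stream from the per-target units."""
    out, col, exhausted = b"", 0, False
    while not exhausted:
        exhausted = True
        for t in targets:
            if col < len(t):
                out += t[col]
                exhausted = False
        col += 1
    return out

data = b"parallel file systems stripe data"
targets = stripe(data, 3)
assert gather(targets) == data
```

In a real deployment each per-target list would be a write issued in parallel to a distinct data server, which is where the elapsed-time gain comes from.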

To detail the definition of an asymmetric distributed parallel file system, we need to mention that:
  • asymmetric refers to the use of side machines acting as metadata servers (a role that can also be added to data servers),
  • distributed means that the file system spans and relies on multiple machines, and
  • parallel, as explained above, relates to the concurrent use of multiple storage targets.
With the current market terminology, we use control plane for the metadata servers and data plane for the data servers.

In two words, one of the benefits resides in the elapsed time of I/O operations. If you need T seconds to write a file, you will need only approximately T/10 seconds if you stripe the same file across 10 back-end servers. And it clearly makes sense when applications consume large files, as most of the time is dominated by data I/O and not metadata I/O; we very often see 5-10% metadata operations versus 90-95% data operations.
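The T/10 figure is the ideal case; since the metadata share of an operation stays serial, the achievable speedup follows an Amdahl-style curve. A small sketch, assuming the 5-10% metadata fraction quoted above:

```python
# Amdahl-style view of the T/10 claim: only the data portion of an
# operation parallelizes across the stripe; the metadata portion
# stays serial and caps the overall gain.

def speedup(stripe_width: int, metadata_fraction: float) -> float:
    """Overall speedup when the data share splits over `stripe_width` targets."""
    return 1.0 / (metadata_fraction + (1.0 - metadata_fraction) / stripe_width)

for n in (1, 10, 100):
    print(n, round(speedup(n, 0.05), 1))
```

With a 5% serial metadata share, 10 targets deliver roughly a 6.9x gain and even 100 targets top out below 17x, which is why parallel file systems work so hard to keep metadata operations off the data path.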

Panasas PanFS supports both modes: parallel, with the DirectFlow agent, fully POSIX compliant, and NAS, with the NFS and SMB protocols.

With such performance in critical environments, this kind of platform must provide advanced data protection mechanisms. Panasas offers file-based erasure coding in an N+2 fashion, thus tolerating 2 simultaneous drive failures. RAID 6 and other disk-oriented approaches fail to protect data within reasonable rebuild times, especially with large drives and large capacities. For small files and small data volumes, file replication across nodes is still a pretty good method.
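To show the principle behind N+2, here is a toy erasure code over the prime field GF(257), a sketch only; production stacks like Intel ISA-L use optimized GF(2^8) arithmetic. The k data symbols define a degree-(k-1) polynomial; storing its value at k+2 points means any k surviving shards rebuild everything, so any 2 simultaneous failures are survivable.

```python
# Toy N+2 erasure code over GF(257): k data symbols define a unique
# degree-(k-1) polynomial; shards are its values at k+2 points, and
# ANY k surviving shards recover every symbol via Lagrange
# interpolation. Not production code, just the underlying math.

P = 257  # field prime; one byte fits per symbol

def interpolate(shards, x):
    """Evaluate at `x` the unique polynomial through `shards` [(xi, yi)]."""
    total = 0
    for xj, yj in shards:
        num = den = 1
        for xm, _ in shards:
            if xm != xj:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P
    return total

def encode(data, parity=2):
    """Systematic encode: data at x=1..k, parity at x=k+1..k+parity."""
    k = len(data)
    shards = list(enumerate(data, start=1))
    for x in range(k + 1, k + parity + 1):
        shards.append((x, interpolate(shards[:k], x)))
    return shards

data = [ord(c) for c in "QF2!"]
shards = encode(data)                          # 4 data + 2 parity shards
survivors = [shards[i] for i in (1, 2, 4, 5)]  # shards 0 and 3 lost
recovered = [interpolate(survivors, x) for x in range(1, len(data) + 1)]
assert recovered == data
```

File-based erasure coding applies this per file rather than per disk, which is what keeps rebuild work proportional to live data instead of raw capacity.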

I/O performance and protection improve with scale, as stripes can be larger, reducing the elapsed operation time.

For PanFS, the team has made great efforts to facilitate the management of the platform, with an intuitive GUI and console and of course a powerful CLI.

Panasas recently made a few announcements:
  • An even more disaggregated architecture with a 2U director blade – you know, the famous metadata servers – with 4 nodes in the chassis, named ActiveStor Director 100 or ASD-100, pretty well aligned with metadata-intensive operations. This ASD-100 node has 8GB of NVDIMM for transaction logs on top of 96GB of DDR4 RAM and a 2x40/4x10 GbE Chelsio NIC.
  • A new storage data blade – the ActiveStor Hybrid 100 aka ASH-100 – hybrid this time, with a choice of HDD and SSD sizes.
  • A new DirectFlow software release with 15%+ more bandwidth and availability on macOS in addition to Linux.
  • A new SMB stack derived from Samba with a PanFS ACL translation module,
  • And an updated foundation with FreeBSD.
A very good meeting that invites us to anticipate some more good news from Panasas in 2018.

December 13, 2017

Quantum unveils Xcellis Scale-out NAS

Quantum (NYSE:QTM), famous leader in secondary storage, continues to extend and promote primary storage with part of its portfolio. The company just announced Xcellis Scale-Out NAS, a new iteration of the union of StorNext and Xcellis, delivering a high-performance, highly scalable file storage solution. I invite you to read the full announcement on StorageNewsletter with the associated long comment.

The new solution uses a scale-out approach for both capacity and access, as the two layers can scale independently. NAS means industry file sharing protocols, NFS and SMB, and in that case the file system is exposed through multiple NAS heads, but each file is fully written via one head. To leverage the parallelism of the platform, users must use the client software or agent, which is able to split and stripe data across multiple storage targets.

The solution also introduces a good set of data services such as automated tiering, encryption, point-in-time copies, WORM, load balancing and data protection with replication, RAID and erasure coding, to list a few. Multiple configurations are possible, from all-flash to hybrid, entry-level and finally an archive model, illustrating a wide range of flexible configurations to fit various environments.

With this announcement, Quantum maintains contact with the club of top commercial file storage players such as Avere Systems, Elastifile, Panasas, Qumulo, Rozo Systems and WekaIO.

We understand that the company must react to revenue erosion over several quarters, even years, and to the recent departure of its long-time CEO Jon Gacek and CTO Bassam Tabbara. FY 2018 will be interesting to watch.


December 12, 2017

New Edge filer from Avere

Avere Systems, one of the few file storage gems, continues to release products at an interesting pace. The company just announced the FXT 5850 with double the DRAM and SSD capacity, 2.5 times the network bandwidth and finally 2x the performance compared with the previous model.

You can configure a cluster of up to 24 nodes; I let you imagine the capacity and performance you can achieve in that case. Configurations are fully redundant to avoid any impact on production, with failover capability and mirrored writes.

Recognized for its performance and flexibility, Avere marks a new milestone with a Formula 1 product perfectly aligned with the demanding characteristics of vertical segments that need high data capacity and high speed at the same time.

The Avere FXT 5850 starts at $211,500 and is available now. A huge achievement that keeps Avere in the top file storage club.

November 27, 2017

SuperComputing 17 was a good conference

Super Computing is always a very interesting conference, and the 2017 edition confirmed it: you very often see there technologies that will arrive in more classic IT a few years later.

I invite you to read the long summary I wrote for StorageNewsletter available here. And in a nutshell a few points below.

Topics were about GPU, of course, burst buffers and fast I/O, file systems/storage, NVMe and composable infrastructure.

The organization also unveiled the new Top500 ranking and introduced the new IO500.

A lot of Lustre- and Spectrum Scale-based file storage solutions of course, Quantum with StorNext, but also Rook, a multi-protocol SDS product - file, block and object - for massive volumes of data, based on Ceph. We saw Panasas as well; the company announced the major PanFS 7.0 release and a disaggregated model. Vexata demonstrated a file-based solution running Spectrum Scale, and of course companies such as DDN, NetApp, HPE, Cray, IBM, Dell...

Among the new file storage vendors, or vendors with an innovative distributed file system, we noticed the presence of Avere Systems, Elastifile, Panasas and Qumulo, but Rozo Systems and Weka IO didn't have a booth.

On the object storage side, the presence was limited, as pure players such as Cloudian were pretty much absent, except Caringo.

It confirms two things again: object storage is a capacity tier and file access is king.

Next year the event will take place in Dallas, TX, from November 12 to 15, 2018.


November 16, 2017

IT Press Tour #25 in California with a huge program

The IT Press Tour (www.itpresstour.com), the leading press event for the IT press, just announced an amazing list of participating companies for the 25th edition, taking place early December in California.

A few surprises as well during this edition, but I can't reveal any of them.

Topics will revolve around Software-Defined Infrastructure with, of course, Big Data, Storage, Networking, Data Management and Containers, with a flavor of open source and edge computing.

Here is the list:
  1. AetherWorks, inventor of AetherStore and more recently FogCoin and ActiveAether,
  2. Datos IO, newcomer in data protection for distributed data,
  3. DriveScale, pioneer in composable infrastructure for demanding applications,
  4. Hedvig, leader in multi-protocol SDS,
  5. Igneous, recent player in secondary storage,
  6. iXsystems, reference in open source storage,
  7. Minio, the promising fast growing object storage platform,
  8. Panasas, leader in high performance file system,
  9. Quantum, top vendor for secondary storage solutions,
  10. Qumulo, key actor in new scale-out NAS generation,
  11. Rubrik, the fast growing data protection platform,
  12. Spanning, major player in Cloud-to-Cloud and SaaS data protection,
  13. Sysdig, a model for others in Container monitoring,
  14. and Vexata, young flash storage vendor dedicated to speed.
This edition will again be huge, with a dense program and top innovators. I invite you to follow us via @ITPressTour, #ITPT, various publications and reporters' Twitter handles.


November 13, 2017

Bizarre Gartner Hype Cycle for Storage Technologies 2017

The Gartner Hype Cycle is a tough exercise, delivered every year for 20 years. And for storage technologies it's probably one of the toughest in the industry, with so many innovations, convergences and merges of technologies.

The first remark is about the list of technologies itself, and we wonder why Storage Cluster File Systems, IDA, Storage Multi-tenancy or Online Data Compression, to name a few, are positioned the way they are.

Surprised also to see Cloud Data Backup listed as emerging, with players such as Spanning, Backblaze, Backupify (acquired by Datto back in 2014) or Code42. For instance, Backupify was founded in 2008. Same remark for Integrated Backup Appliances, with for sure new vendors such as Rubrik and Cohesity, but also established ones like Dell EMC with the Data Domain line, NEC, Veritas or ExaGrid.

At the same time, we don't see topics such as Persistent Storage for Containers, represented by several players such as Portworx, StorageOS, Virtuozzo or Blockbridge. And what about subjects like Cloud Tiering or Cloud Archiving, since we see Cloud Data Backup?

Where is P2P storage? Only BitTorrent and Storj are listed, with really different companions. We should have here Aerofs, AetherStore, Blockade, Cloudplan, Ugloo or Sia.tech.

What about VTL, is it obsolete in the Gartner grammar? What about multi-protocol SDS?

We’re also surprised to read that object storage is living a revival due to S3. In fact, there are more and more players offering an S3 interface, and it doesn't mean they belong to the object storage category. The question is: does an interface or an access method define a storage category? What is sure is that the ubiquitous presence of S3 reduces differentiators between vendors and kills the API battle.

Gartner also takes a strong position on Cloud Storage Gateways (CSG), as the analyst firm declares this technology "obsolete before plateau" at the end of the Hype Cycle paragraph. What is sure is that the cloud gateway capability is now offered by various products, but at the same time CSGs provide more than just a path to the cloud. Some time ago, ESG introduced a term that would be a better fit: Cloud Integrated Storage. The report displays different categories and we need to correct a few things:

  • Shared Accelerated Storage: where are Apeiron or Pavilion? Why is Weka IO listed here?
  • Management SDS: the missing actor is ProphetStor with Federator.
  • Cloud Data Backup: where is Veritas? If you list Datos IO, you should list Imanis Data (formerly Talena).
  • Infrastructure SDS: what a mix. If you consider object storage, why are vendors such as DDN not listed? Globally this is a bizarre listing, as we find vendors such as Maxta or StorMagic and at the same time SwiftStack. Why is StorPool not listed here? We should see StorONE here next year.
  • Hyperconvergence: Atlantis Computing is still listed, ok you have an excuse, but the company sold its assets to Hive-IO. Where is GridStore?
  • Integrated Backup Appliances: as said above where are ExaGrid and NEC?
  • Storage Cluster File Systems: really a bad name, especially as Gartner defines it, with the distributed file systems term, as a single parallel file system. Wow. We should see Rozo Systems and Weka IO here.
  • IDA: very bizarre again, with too diverse players. Why is SimpliVity listed here? Where are the P2P players mentioned in a previous paragraph? We should see Datomia here if the category is maintained.
  • Object Storage: Where are Exablox, Hedvig, Huawei, Igneous, NEC, Noobaa?
  • Emerging Data Storage Protection Schemes: what is this stuff? If Erasure Coding is listed, and it is, Rozo Systems, developer of the Mojette Transform, must be added. Same for MemoScale, and even Weka IO with its own N+M model. And one question: is Reed-Solomon really an emerging technology? Isilon introduced it in 2001!
  • Cloud Storage Gateways: where is BridgeStor? This category sees a convergence of technologies as many products add an S3 extension. It could be a tiering product like Komprise, StrongBox or Versity, a backup, archiving or migration product, or a distributed file system such as Elastifile or Weka IO. This is a large category, and for sure not an obsolete one, as so many flavors exist.
  • Virtual Machine Backup and Recovery: Where is Nakivo? And of course Veritas?
  • Online Data Compression, Storage Multi-tenancy and Automated Storage Tiering are bizarre as well.
I stop here; the report represents a huge amount of work, but the technology list is quite strange.

Also we wonder why the following vendors are not listed: Acronis, Apeiron, BridgeStor, Datomia, Datrium, Exablox, GridStore, Igneous, Imanis Data, Komprise, Nakivo, Noobaa, Pavilion Data Systems, Portworx, ProphetStor, Rozo Systems, StorageOS, StorPool, StrongBox, Vexata.

November 11, 2017

AetherStore passes 30,000 users across 150 countries

AetherStore (www.aetherstore.com), a product developed by the AetherWorks team, continues to penetrate the market with its brilliant P2P storage solution. Great, I'm a true believer in this approach, having run my own project KerStor several years ago.

Visited in New York in September 2012 during IT Press Tour #9, AetherStore has made great progress since that meeting. Adoption is pretty rapid, with now more than 30,000 users in 150 countries globally. It marks a special milestone, as the market has changed a bit since the first projects in different world cities.

It reminds me of some pretty similar P2P, cloud, decentralized or dispersed storage approaches on private or public clouds, with players like Aerofs, Blockade, Cloudplan, Kerstor, Ubistorage, Ugloo, Sia.tech, Space Monkey, Storj, Symform, Transporter, Tudzu or Wuala. Some of them still exist; others disappeared, got acquired or changed their model to offer P2P backup.

Like others, AetherStore now targets a backup use case, which narrows the original promise made by all the players in that category. Initially these solutions brought several key advantages to the table:
  1. Stop buying and over-provisioning new hardware,
  2. Increase the utilization ratio of storage entities already deployed,
  3. And thus improve storage efficiency in favor of a really better TB/$.
The difficulty came initially from the drop in storage costs, but above all from the lack of partnerships, as nobody in the sales chain wished to reduce storage sales. I'm sure you remember the Swiss company named Wuala, which approached the market with a trade model. The company was acquired by LaCie in 2009, and LaCie was acquired three years later by Seagate in 2012. Guess what, Seagate shut down the Wuala service. It's difficult to promote a service that could participate in the erosion of hard drive sales.

Pooling free storage space on distributed, potentially geo-dispersed, computers is a great concept. It would be good if the AetherStore team thought about applying the same approach to compute, building a gigantic virtual computer to address CPU-bound demanding applications. And if you link that to bitcoin… Wow, you could have a huge impact on the planet. Hey, FogCoin and ActiveAether are exactly that.

And it will be fantastic, as the team has decided to join The IT Press Tour to pitch us FogCoin and ActiveAether.

November 10, 2017

New file manager from Google for Android-based phones

Current phones offer high storage capacity, and it's pretty common to buy 64 or 128GB. This is huge, and at the same time some computer vendors sell machines with also 128GB of flash. Funny, right? So if you have a strong file manager on your computer, why not on your phone? And a last point I forgot to mention: your phone is a computer, able to connect to phone networks in addition to Wifi, Bluetooth and GPS.

There are plenty of file managers you can download from Google Play, but the giant from Mountain View never really released a strong product in that space. The current version is today in public beta, available from this link; you need Android 5.0 minimum. After you download the app, you need to accept a "Trusted Tester Agreement" in order to use it.

The official launch is scheduled for December. With 2 views - Storage and Files - it offers the capability to free space, empty the cache, check the space occupied by your files – video, music, images, large files, downloaded files, documents… – and to track duplicates, unused applications… In other words, 3 functions: system cleanup, file management and file transfer. It also offers the capability to transfer files via a P2P Wifi connection.

With the recent announcement of the Pixel 2, Google promotes unlimited cloud storage, so we expect a transparent tiering capability between Files Go and Google Drive, with easy-to-set-up, simple file policies based on age, size… like a traditional HSM finally.
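Such an HSM-style policy could look like the following sketch; the thresholds and the local folder standing in for Google Drive are all assumptions, since no tiering API has been announced, and the real thing would of course call the Drive service rather than move files locally.

```python
import os
import shutil
import time

# Hypothetical HSM-style tiering policy, a sketch of the Files Go /
# Google Drive integration we hope for: anything older than
# MAX_AGE_DAYS or larger than MAX_SIZE bytes moves to the "cloud"
# tier (here a local folder standing in for Drive).

MAX_AGE_DAYS = 90
MAX_SIZE = 50 * 1024 * 1024  # 50MB

def tier_out(local_dir: str, cloud_dir: str):
    """Move cold or oversized files from the device tier to the cloud tier."""
    moved = []
    cutoff = time.time() - MAX_AGE_DAYS * 86400
    for name in os.listdir(local_dir):
        path = os.path.join(local_dir, name)
        if not os.path.isfile(path):
            continue
        st = os.stat(path)
        if st.st_mtime < cutoff or st.st_size > MAX_SIZE:
            shutil.move(path, os.path.join(cloud_dir, name))
            moved.append(name)
    return moved
```

Running such a policy periodically, keeping a stub or a Drive link in place of each moved file, is essentially what traditional HSM products have done for decades.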


October 31, 2017

StorONE, a new SDS gem

Founded in 2011 by Gal Naor, who founded Storwize in 2004 and sold it in 2010 to IBM for $140 million, StorONE has made huge efforts to disrupt the market, and the result is quite significant in terms of approach and price, delivering the true value of the devices users buy every day. We had the privilege to meet the team and visit StorONE HQ in Tel Aviv a few days ago with The IT Press Tour crew. And if you follow me, I wrote one of the very first articles about StorONE in August 2016, when the company made a surprising appearance at VMworld 2016 in San Francisco.

Six years of development funded by more than $20 million from VCs - Seagate, Giza and JGV - and private investors. Another remarkable element resides in the number of patents - 50 - granted or almost granted before the first version of the software. Wow, there is some IP behind this. The company invited to its board Edward Zander, former CEO of Motorola and COO and president at Sun Microsystems, and John W. Thompson, chairman of Microsoft and former CEO of Symantec, more recently at Virtual Instruments, and both have made private investments.

"Over-provisioning" is a terrible word, especially today with the gigantic volume of data people have to manage. If a user could finally deploy only the infrastructure necessary to sustain the business, based on the real capabilities of the deployed hardware, it would be perfect.

So the idea is pretty simple on paper but a really tough mission to accomplish. Gal explained to us that his software, named TRU for True Resource Utilization, collapses all the layered complexity of the entire storage software stack into one single seamless layer that removes the chokepoints. The beauty of this comes from the performance delivered and the price: less than $0.01/GB, and globally less than $0.002/GB for multi-petabyte installations.

All features, all protocols and support for any drive - HDD, SSD or NVMe - are included, and there are no options that would finally reduce the value of the software.

Many vendors remove options from the proposal instead of discounting the full software and data services stack, showing a limited value proposition to users. For the launch period, the company has decided to invite potential users to its Early Access Program with a compelling proposal: offering the hardware for 1PB fueled by TRU. The strategy is simple: once you try the solution, you won't give it up. TRU is available as a physical server or a virtual appliance.

I offer to call this approach multi-protocol SDS, as TRU offers block, file and object access methods, with unlimited snapshots and mixed drives in the same servers.