<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Sysconfig&#039;s Blog &#187; Storage</title>
	<atom:link href="http://sysconfig.org.uk/category/hardware/storage/feed/" rel="self" type="application/rss+xml" />
	<link>http://sysconfig.org.uk</link>
	<description>FreeBSD, Linux, Virtualisation, Resilience, Scalability, Storage, and other (random) things</description>
	<lastBuildDate>Thu, 25 Aug 2011 10:41:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>GlusterFS, a workhorse that needs to be tamed</title>
		<link>http://sysconfig.org.uk/2011/07/glusterfs-a-workhorse-that-needs-to-be-tamed/</link>
		<comments>http://sysconfig.org.uk/2011/07/glusterfs-a-workhorse-that-needs-to-be-tamed/#comments</comments>
		<pubDate>Sun, 31 Jul 2011 19:51:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Virtualisation]]></category>

		<guid isPermaLink="false">http://sysconfig.ossafe.org/?p=671</guid>
		<description><![CDATA[I&#8217;m sure by now most of you will have heard of GlusterFS, which allows you to store data on a very large scale, replicated, striped, or both &#8211; across multiple physical boxes. On the face of it, and if you believe the marketing, it is THE most reliable and fastest solution. And yes indeed, it [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m sure by now most of you will have heard of <a title="GlusterFS Community Website" href="http://gluster.com/community/documentation/index.php/Main_Page" target="_blank">GlusterFS</a>, which allows you to store data on a very large scale, replicated, striped, or both &#8211; across multiple physical boxes. On the face of it, and if you believe the marketing, it is THE most reliable and fastest solution. And yes indeed, it has massive potential, and it has matured a lot over the years since I last wrote about it. However, it still has a few nasty pitfalls that you need to be aware of before deploying it into a production environment. You should really test thoroughly how it copes with your workload, and how your applications and infrastructure behave in case of failure.</p>
<h2><span id="more-671"></span>What is GlusterFS, and what is it not?</h2>
<p>You can think of GlusterFS as a RAID device that works across the boundaries of a single physical disk array. Take RAID-1, for example, which mirrors data between two identical disks. In GlusterFS&#8217;s jargon, you run two <em>bricks</em> in replicate mode, where a brick is storage in general terms: an array of disks (which could use RAID), a single disk, a partition, or a directory. Anything that can be mounted into your filesystem hierarchy qualifies as a brick. The key feature of GlusterFS is that it treats bricks on different physical machines as one volume, which can be accessed by any number of clients. A volume can be mounted via the FUSE-based GlusterFS client, or via NFS or CIFS/Samba. You can use RAID-0 style striping for read speed, RAID-1 style mirroring for real-time replication, RAID-10 for both, or you can go beyond any of those and spread the stripes or mirrors across any number of bricks. 4-node replication? No problem at all. GlusterFS gives you truly enormous flexibility and performance when it comes to making large amounts of data available across multiple nodes.<br />
Since version 3.2 (if I&#8217;m not mistaken), they have even added GeoReplication, which allows a Master/Slave setup, where the slave can be a local or remote site. Be it for backups or to have a standby version of your application in a different geographical location&#8230; it&#8217;s possible. Due to the fact that GeoReplication does not require locking or synchronous replication, the network speed to your remote site isn&#8217;t that important either. It copes well with it.</p>
<p>This sounds very different from, for example, a DRBD/GFS2 or DRBD/OCFS2 setup, doesn&#8217;t it? And indeed it is! GlusterFS, unlike DRBD, does not provide a block device. It works at the file level: it compares hashes of files, and if files on nodes differ (for example after a failure), it will copy entire files across, not only the changed blocks. In normal day-to-day operation that&#8217;s not a big problem, in particular as you get a lot of flexibility, which is unmatched by other solutions. Where it does make a difference is during recovery. More on that in the Caveats section.</p>
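<p>To make the file-level idea more concrete, here is a minimal Python sketch that compares two directory trees by content hash and lists the files that would have to be copied in full. It only illustrates the principle described above; it is not how GlusterFS implements self-healing internally, and the helper names (<em>file_digest</em>, <em>files_needing_copy</em>, <em>demo</em>) are made up for this example.</p>
<pre>
```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def files_needing_copy(source_dir, target_dir):
    """List relative paths that are missing or different in target_dir.

    Hypothetical helper, not part of GlusterFS: a file counts as stale
    as soon as its hash differs, so the whole file would be copied.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    stale = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = target_dir / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            stale.append(rel.as_posix())
    return stale


def demo():
    """Build two throwaway directories and report which files differ."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        (Path(a) / "same.txt").write_text("identical")
        (Path(b) / "same.txt").write_text("identical")
        (Path(a) / "changed.txt").write_text("new contents")
        (Path(b) / "changed.txt").write_text("old contents")
        (Path(a) / "missing.txt").write_text("only on the healthy side")
        return files_needing_copy(a, b)


if __name__ == "__main__":
    print(demo())
```
</pre>
<p>Note that a single changed byte marks the whole file as stale here, which is exactly why recovery of large files is expensive with a file-based approach.</p>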
<h2>A variety of different connectors</h2>
<p>I mentioned earlier that there are a couple of different ways to connect to your GlusterFS volumes. First, there&#8217;s their own GlusterFS client, which uses the kernel&#8217;s FUSE layer. This client is Gluster&#8217;s recommendation if your workload involves a large number of fast write operations. If your workload is more about reading small files quickly, they recommend NFS. (The NFS server is part of the glusterd daemon, which serves the volumes to the clients.) Samba/CIFS is probably mainly targeting Windows clients.</p>
<p>All these connectors have their advantages and disadvantages, so test them thoroughly against your particular workload. Also, in SELinux environments you will need to tweak your policies if you use the GlusterFS client, whereas NFS is a lot more straightforward (don&#8217;t forget that Apache needs to be allowed to access NFS directly if that&#8217;s your intention; <em>setsebool -P httpd_use_nfs=on</em> is your friend). I know most people find it easier to switch off SELinux altogether, but for me personally that is <em>never</em> an option. I&#8217;d rather spend hours tweaking the SELinux policies, if necessary. Anyhow, the bottom line is that both NFS and CIFS make GlusterFS very attractive for platforms beyond Linux, FreeBSD for example. I&#8217;m not sure whether the native client has reached a production-ready state there yet; I shall give that a spin soon, and in the meantime NFS will do.</p>
<h2>Performance</h2>
<p>As a rule of thumb, high availability, robustness, scalability and so on always come with a downside: write performance. During write operations all nodes need to be kept in sync, which means that the weakest &#8220;link&#8221; (or slowest disk, for that matter), together with some locking and network/protocol overhead, determines the actual write speed. That is normal. (Note: pure throughput must not be confused with the time it takes until a file can actually be accessed on a different node than the one it was written to.)</p>
<p>For that reason you can never expect a high-availability file system to solve all your problems. There&#8217;s no such thing as &#8220;one size fits all&#8221;. Your application needs to be cluster/HA aware. In practice that means you will have to select carefully which type of information you store where. This is of course true for GlusterFS, too. However, when it comes to read performance, GlusterFS is actually very fast. Not as fast as a local block device, obviously, but personally I wasn&#8217;t able to tell the difference between native NFS and Gluster&#8217;s NFS implementation. The GlusterFS client (fuse/glusterfs, not NFS), however, seems to be a little slower at reading data, while being faster at writing. It really depends on your workload. Bottom line: GlusterFS is fast and flexible, which alone is a big plus over many other solutions. For maximum read performance you can of course use stripes (data scattered across multiple nodes), which the glusterfs client connects to simultaneously. Big files in particular benefit from such a setup.</p>
<h2>Caveats</h2>
<p>If you intend to deploy GlusterFS, you had better plan a serious amount of time for the first tests and for integration into your setup, including benchmarks and failover. GlusterFS is powerful and not too difficult to get started with, but you&#8217;ll soon run into various rather specific questions which aren&#8217;t documented well (or not at all). Quite frankly, the online documentation is poor, or rudimentary. Obviously Gluster, a business, wants to sell their expertise, and there&#8217;s nothing wrong with that. So be prepared to browse mailing list archives or hang out in #gluster on irc.freenode.net.</p>
<p>GlusterFS has matured a lot over the last years, and you certainly don&#8217;t need to be worried about losing data (after all it&#8217;s filesystem based and you can copy anything out of the bricks&#8217; directories directly, if you wish). However, some major issues and pitfalls still exist.</p>
<ul>
<li>If you reintroduce or replace a node which was either faulty or offline for a while, the self-healing will transfer entire files back from up-to-date nodes onto the reintroduced one. This consumes a lot of network bandwidth and, even worse, CPU time (possibly due to the hash comparison). If a GlusterFS brick lives on a box together with other services, you will experience a significant performance hit.</li>
<li>Large files are locked while being replicated. In practice that means that you really can&#8217;t use GlusterFS as a backend for VMs at the moment, unless recovery always happens in a controlled manner at times where you can afford to shut down running VMs for the entire duration of the healing. That somehow defeats the purpose of a high-availability storage cluster.<br />
However, a GlusterFS engineer has told me earlier today on irc.freenode.net that this issue will be tackled in GlusterFS 3.3, if not earlier. Only a question of months, I suppose.</li>
<li>You absolutely must synchronise the system time of all bricks. If you&#8217;re not doing that already anyway, do it before deploying GlusterFS. (use NTP for your own sanity)</li>
<li>Make sure that the bricks of one volume are of identical size and that you don&#8217;t fill the disk space by other means by mistake. I had a situation the other day where I wanted to replace a brick; what I didn&#8217;t realise at first was that someone had set a disk quota on the new brick. Consequently it stopped writing long before all data could be copied. However, GlusterFS did not warn me, nor did it report an error; it actually confirmed successful migration, although only 1/3 of the files had been transferred!<br />
Clearly the lack of accessible disk space wasn&#8217;t GlusterFS&#8217;s fault, and is probably not a common scenario either, but it should spit out at least an error message. Imagine what would have happened if I had taken the other node offline after allegedly successful migration! Total mess.</li>
</ul>
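<p>Given the silent partial migration described in the last point, it may be worth checking the result yourself before taking the old node offline. The following Python sketch compares file count and total size of two directory trees and checks the free space on the target. It is only an assumption about what a useful sanity check could look like; the function names are hypothetical, and a per-user disk quota may not be visible through the free-space figure.</p>
<pre>
```python
import shutil
import tempfile
from pathlib import Path


def tree_summary(root):
    """Return (file_count, total_bytes) for all regular files below root."""
    count = 0
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file():
            count += 1
            total += path.stat().st_size
    return count, total


def check_migration(old_brick, new_brick):
    """Return a list of problems; an empty list means the trees match.

    Hypothetical helper: it only compares counts and sizes, not contents.
    """
    problems = []
    old_count, old_bytes = tree_summary(old_brick)
    new_count, new_bytes = tree_summary(new_brick)
    if new_count != old_count:
        problems.append(f"file count differs: {new_count} of {old_count} files present")
    if new_bytes != old_bytes:
        problems.append(f"size differs: {new_bytes} of {old_bytes} bytes present")
    return problems


def has_room_for(source_dir, target_dir):
    """True if the filesystem holding target_dir has enough free bytes."""
    _, needed = tree_summary(source_dir)
    return shutil.disk_usage(target_dir).free >= needed


def demo():
    """Simulate a migration in which only one of three files arrived."""
    with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
        for name in ("a.txt", "b.txt", "c.txt"):
            (Path(old) / name).write_text("payload")
        (Path(new) / "a.txt").write_text("payload")
        return check_migration(old, new)


if __name__ == "__main__":
    for problem in demo():
        print(problem)
```
</pre>
<p>Running <em>has_room_for</em> before the migration and <em>check_migration</em> afterwards would at least have flagged the situation above, whatever the tool itself reported.</p>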
<p>Presumably none of these things would have happened, if I had taken their commercial offerings. <img src='http://sysconfig.org.uk/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />   Those of you who prefer D.I.Y., better be prepared to spend a serious amount of time to fit it into your use-case and more importantly&#8230; monitor it closely!</p>
<h2>Summary</h2>
<p>GlusterFS has made a lot of positive progress over the last 2-3 years. It&#8217;s very easy to get started, especially on RHEL/CentOS, and it offers enormous flexibility and opportunities. The new CLI makes basic configuration much easier than it used to be. With a few simple commands you can create your volumes (on multiple servers, aka &#8220;peers&#8221;, simultaneously). You could say that it&#8217;s actually fun to use GlusterFS!</p>
<p>However, if you (like me) are looking at GlusterFS as a backend for Xen or VMware VMs in order to facilitate live-migration and resilience, you will probably need to wait for version 3.3, unless controlled recovery with planned downtime is an option for you. Might be worth keeping an eye on their <a title="GlusterFS Git Repository" href="https://github.com/gluster/glusterfs" target="_blank">Git repository</a> (I certainly will). While using it to serve files for all sorts of things already, I&#8217;m really looking forward to using it as a backend for Xen soon! <img src='http://sysconfig.org.uk/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Version 3.3 brings some other new promising features, too&#8230; Unified storage, object storage&#8230; I see memcached on the list of dependencies&#8230; looks promising. Beta 1 is out, by the way.</p>
]]></content:encoded>
			<wfw:commentRss>http://sysconfig.org.uk/2011/07/glusterfs-a-workhorse-that-needs-to-be-tamed/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Filemate SolidGo 48GB EC/34</title>
		<link>http://sysconfig.org.uk/2009/10/filemate-solidgo-48gb-ec34/</link>
		<comments>http://sysconfig.org.uk/2009/10/filemate-solidgo-48gb-ec34/#comments</comments>
		<pubDate>Sun, 11 Oct 2009 11:08:37 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Expresscard/34]]></category>
		<category><![CDATA[MacBook Pro]]></category>
		<category><![CDATA[SSD]]></category>

		<guid isPermaLink="false">http://sysconfig.ossafe.org/?p=202</guid>
		<description><![CDATA[To cut a long story short: massive speed improvement excellent for work with many small files (compile times for big projects reduced significantly, to give one example) good value for money (~ £125) And now the down-sides: getting very hot I have been using it for 4 months, until it died today. I&#8217;m almost sure [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-medium wp-image-204" title="photo" src="http://sysconfig.ossafe.org/wp-content/uploads/2009/10/photo-300x225.jpg" alt="photo" width="300" height="225" /></p>
<p>To cut a long story short:</p>
<ul>
<li>massive speed improvement</li>
<li>excellent for work with many small files (compile times for big projects reduced significantly, to give one example)</li>
<li>good value for money (~ £125)</li>
</ul>
<p>And now the down-sides:</p>
<ul>
<li>getting <strong>very</strong> hot</li>
</ul>
<div>I had been using it for 4 months when it <strong>died</strong> today. I&#8217;m almost sure that it was the temperature together with the very slim design of my MacBook Pro. So I can definitely <strong>not</strong> recommend this solid state drive for 2008 MacBook Pros, but it <em>may</em> work for others.</div>
<div>It&#8217;s a shame&#8230;</div>
]]></content:encoded>
			<wfw:commentRss>http://sysconfig.org.uk/2009/10/filemate-solidgo-48gb-ec34/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ZFS on FreeBSD 7 &#8212; experimental for a reason!</title>
		<link>http://sysconfig.org.uk/2008/05/zfs-on-freebsd-7-experimental-for-a-reason/</link>
		<comments>http://sysconfig.org.uk/2008/05/zfs-on-freebsd-7-experimental-for-a-reason/#comments</comments>
		<pubDate>Wed, 28 May 2008 18:04:58 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[BSD]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://blog.admin-at-once.co.uk/?p=12</guid>
		<description><![CDATA[Yes, there is a reason that ZFS is not yet marked stable on FreeBSD! I had to learn that lesson the hard way today. Under very high load and many concurrent read requests (I set up the company&#8217;s mail server with ZFS and root from ZFS), the two disks in the RAID array repeatedly lost sync, forcing [...]]]></description>
			<content:encoded><![CDATA[<p>Yes, there is a reason that ZFS is not yet marked stable on FreeBSD! I had to learn that lesson the hard way today.</p>
<p>Under very high load and many concurrent read requests (I set up the company&#8217;s mail server with ZFS and root from ZFS), the two disks in the RAID array repeatedly lost sync, forcing an automatic re-silvering (auto healing) process to be started, which blocked the system as everything (except /boot) was running from that ZFS array. As far as I could figure out, the system halted entirely because another inconsistency occurred while the re-silvering was still in progress.</p>
<p>I would have investigated further if it hadn&#8217;t been a crucial production machine. And that kind of traffic is very difficult to simulate under laboratory conditions (maybe I can do that when I have more time). So I had to revert to UFS, as the downtime had to be minimized. It&#8217;s a shame, really, because I love the features ZFS offers. On my private server it runs very smoothly, but traffic, load and I/O are not comparable to the mail server in question.</p>
]]></content:encoded>
			<wfw:commentRss>http://sysconfig.org.uk/2008/05/zfs-on-freebsd-7-experimental-for-a-reason/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Two neat storage machines</title>
		<link>http://sysconfig.org.uk/2008/05/two-neat-storage-machines/</link>
		<comments>http://sysconfig.org.uk/2008/05/two-neat-storage-machines/#comments</comments>
		<pubDate>Sat, 10 May 2008 13:44:16 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[BSD]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://blog.admin-at-once.co.uk/?p=10</guid>
		<description><![CDATA[Yesterday I set up one of our new storage machines for testing: Dell 2950, Quad Xeon, 8GB, 6&#215;750 GB HDD. I installed FreeBSD 7 with ZFS (following up on this article). At first it seemed a bit tricky, because the PERC/6i controller configuration is, sorry, crap from a usability point of view. It [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday I set up one of our new storage machines for testing: Dell 2950, Quad Xeon, 8GB, 6&#215;750 GB HDD. I installed FreeBSD 7 with ZFS (following up on <a href="http://blog.admin-at-once.co.uk/2008/04/zfs-on-freebsd/" target="_self">this article</a>). At first it seemed a bit tricky, because the PERC/6i controller configuration is, sorry, crap from a usability point of view. It seemed not to support non-RAID configurations, but on closer inspection that turned out to be a wrong assumption: six RAID-0 arrays with only one drive each are in effect the same as no RAID at all. (Hardware RAID does not make sense here because ZFS will do this job, and its auto-healing is much better than any hardware controller&#8217;s.)</p>
<p>After setting up a minimal FreeBSD and doing some tuning (such as creating the ZFS volumes), I ran some tests. You won&#8217;t believe me, but writing a 10GB file (/dev/random to the ZFS volume) resulted in a transfer rate of about 160MB/sec, and reading (cp testfile /dev/null) ran at more than 270MB/sec!!</p>
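<p>For anyone who wants to repeat this kind of quick test, here is a small Python sketch that writes a file of random data, reads it back, and reports MB/sec for both directions. It is a rough approximation of the test above, not the commands I actually ran; the function name, file name and sizes are placeholders, and the read figure will be flattered by the operating system&#8217;s cache unless the file is much larger than RAM.</p>
<pre>
```python
import os
import tempfile
import time


def measure_throughput(path, size_mb=16, block_mb=1):
    """Write and re-read a test file; return (write_mb_s, read_mb_s)."""
    block_bytes = block_mb * 1024 * 1024
    block = os.urandom(block_bytes)  # random data, reused for every block
    blocks = size_mb // block_mb

    start = time.perf_counter()
    with open(path, "wb") as handle:
        for _ in range(blocks):
            handle.write(block)
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push the data to disk
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, "rb") as handle:
        while handle.read(block_bytes):
            pass  # discard the data, like copying to /dev/null
    read_seconds = time.perf_counter() - start

    written_mb = blocks * block_mb
    return written_mb / write_seconds, written_mb / read_seconds


def demo():
    """Run a tiny measurement in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        return measure_throughput(os.path.join(tmp, "testfile"), size_mb=4)


if __name__ == "__main__":
    write_rate, read_rate = demo()
    print(f"write: {write_rate:.1f} MB/sec, read: {read_rate:.1f} MB/sec")
```
</pre>
<p>To test a particular volume, pass a path on that volume to <em>measure_throughput</em> and choose a size well beyond the machine&#8217;s memory.</p>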
<p>To be continued&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://sysconfig.org.uk/2008/05/two-neat-storage-machines/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>(C&#124;G)lustered Storage</title>
		<link>http://sysconfig.org.uk/2008/04/clustered-storage/</link>
		<comments>http://sysconfig.org.uk/2008/04/clustered-storage/#comments</comments>
		<pubDate>Fri, 25 Apr 2008 14:12:25 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[BSD]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://blog.admin-at-once.co.uk/?p=7</guid>
		<description><![CDATA[As the company I work with has to store many media files, backups, rapidly changing documents and so on, they used to run a NetworkAppliance FAS2020 storage machine, which is quite neat. Unfortunately, the current setup does not allow us to scale the volumes any more. So we needed to find an alternative. Consequently we asked [...]]]></description>
			<content:encoded><![CDATA[<p>As the company I work with has to store many media files, backups, rapidly changing documents and so on, they used to run a NetworkAppliance FAS2020 storage machine, which is quite neat. Unfortunately, the current setup does not allow us to scale the volumes any more. So we needed to find an alternative.</p>
<p><span id="more-7"></span></p>
<p>Consequently we asked for quotes on bigger NetApp devices. Unfortunately, they cost more than a good car. The investigation on other solutions began&#8230; <img src='http://sysconfig.org.uk/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>After doing some research and testing both performance and failure behaviour, the decision was made: I will set up a storage cluster based on i386 hardware and <a href="http://www.gluster.org/glusterfs.php" target="_blank">GlusterFS</a>. Its speed was quite impressive. The fact that mirroring (for data security) and striping (for performance) can be combined is also very convincing.</p>
<p>We will start with two huge servers which run partly in mirrored and partly in striped mode. The only thing which needs to be tested beforehand is whether FreeBSD&#8217;s UFS in combination with its snapshot feature makes sense here (this could be a bit tricky). If it works, this solution will be as good as the proprietary offer. But it costs less than 1/5!</p>
<p>I will keep you posted.</p>
]]></content:encoded>
			<wfw:commentRss>http://sysconfig.org.uk/2008/04/clustered-storage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

