Published Date: 2021/11/14 by: DaVieS
ZFS is a transactional filesystem originally developed by Sun Microsystems (now Oracle).
Transactional means: data corruption should never happen.
This is achieved through checksumming, redundant copies, and the various RAID levels built into ZFS. Yes, ZFS is a volume manager too.
ZFS gained popularity after FreeBSD introduced it in its releases a few years ago.
ZFS offers functionality beyond other common filesystems, and a well-tuned ZFS filesystem can usually achieve better speed.
I have been using ZFS since it became available, and here is my review along with some tips.
Because ZFS is a volume manager too, you can use the disks directly as "software-raid" arrays.
Example: zpool create storage raidz1 /dev/vtbd0 /dev/vtbd1
On single-parity (RAID5, RAIDZ1) systems, you can continue to use your array in "DEGRADED" mode with 1 failed disk in the array.
On double-parity (RAID6, RAIDZ2) systems, you can continue to use your array in "DEGRADED" mode with up to 2 failed disks in the array.
On triple-parity (RAIDZ3) systems, you can continue to use your array in "DEGRADED" mode with up to 3 failed disks in the array.
DEGRADED means fully functional, but in danger: one more failure beyond the parity level loses the pool.
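The parity levels above map directly to the vdev type given at pool creation. A sketch (pool and device names are illustrative):

```shell
# Create a double-parity (RAIDZ2) pool from four disks
zpool create tank raidz2 /dev/vtbd0 /dev/vtbd1 /dev/vtbd2 /dev/vtbd3

# Replace a failed disk; the pool stays usable (DEGRADED) while it resilvers
zpool replace tank /dev/vtbd2 /dev/vtbd4

# Watch the resilver progress and the pool state
zpool status tank
```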
root@storage:~ # zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

	NAME              STATE     READ WRITE CKSUM
	zroot             ONLINE       0     0     0
	  raidz3-0        ONLINE       0     0     0
	    mfisyspd0p3   ONLINE       0     0     0
	    mfisyspd1p3   ONLINE       0     0     0
	    mfisyspd10p3  ONLINE       0     0     0
	    mfisyspd11p3  ONLINE       0     0     0
	    mfisyspd2p3   ONLINE       0     0     0
	    mfisyspd3p3   ONLINE       0     0     0
	    mfisyspd4p3   ONLINE       0     0     0
	    mfisyspd5p3   ONLINE       0     0     0
	    mfisyspd6p3   ONLINE       0     0     0
	    mfisyspd7p3   ONLINE       0     0     0
	    mfisyspd8p3   ONLINE       0     0     0
	    mfisyspd9p3   ONLINE       0     0     0

errors: No known data errors

root@storage:~ # zpool iostat
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zroot       57.8T  72.2T      0    484  3.66K  43.7M
In ZFS, each file is stored either as a single block of varying size (up to the recordsize) or as multiple recordsize-sized blocks.
ZFS computes checksums and metadata per block (with the size set by recordsize). If you have a lot of random reads/writes, like torrents, you may want to set recordsize close to the physical block size, e.g. 4K or 16K (torrents usually read/write 16K chunks).
If you have large files with sequential reads/writes, you can set recordsize above 128K.
Rules of thumb: 16K/32K for databases, 128K for images, 4K for virtual disk images (VDI, VMDK).
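Since recordsize is a per-dataset property, each workload can get its own tuned dataset. A sketch (pool and dataset names are illustrative):

```shell
# Dedicated dataset for a database, tuned for small random I/O
zfs create tank/db
zfs set recordsize=16K tank/db

# Dedicated dataset for large media files, tuned for sequential I/O
zfs create tank/media
zfs set recordsize=1M tank/media

# Verify the settings
zfs get recordsize tank/db tank/media
```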
There are two userspace utilities: zpool and zfs.
'zpool' handles the "LVM"-like (pool/volume) level,
while 'zfs' handles the filesystem level.
ZFS has more advantages than you could ever imagine. There are partitions (datasets), but they grow dynamically: you can set a maximum size per dataset (quota), yet you can have an almost unlimited number of them, because they do not reserve space up front.
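Creating such dynamically growing datasets with quotas might look like this (names are illustrative):

```shell
# Datasets cost no space until data is written
zfs create tank/home
zfs create tank/home/alice

# A quota only caps growth; nothing is reserved up front
zfs set quota=100G tank/home/alice

# Inspect the cap versus actual usage
zfs get quota,used,available tank/home/alice
```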
ZFS is powerful here: you can change filesystem properties in real time, even recordsize. But note that changes to compression, recordsize, etc. only take effect on newly written files; existing content only inherits the new settings when you copy it — renaming (mv) does not work.
ZFS can compress your files with zfs set compression=[on|off|<other_algo>]. I recommend lz4 because it's CPU friendly.
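Enabling lz4 and checking the resulting ratio might look like this (dataset and file names are illustrative); note the re-copy step, since only newly written data is compressed:

```shell
zfs set compression=lz4 tank/data

# Existing files keep their old on-disk form; rewrite them to compress
cp /tank/data/old.log /tank/data/old.log.tmp
mv /tank/data/old.log.tmp /tank/data/old.log

# Check how well the stored data compresses
zfs get compression,compressratio tank/data
```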
Turning off sync is not as dangerous as we usually think: with sync disabled you can lose the last few seconds of written data on a crash, but it won't affect filesystem integrity.
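Since sync is also a per-dataset property, you can disable it only where the trade-off is acceptable (dataset name is illustrative):

```shell
# Trade the last few seconds of writes for speed on a scratch dataset;
# pool integrity is never at risk, only recently written data
zfs set sync=disabled tank/scratch
zfs get sync tank/scratch
```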
Dedup is a nice feature on paper, but useless unless you have plenty of RAM: roughly 7 GB of RAM for every TB of storage.
ZFS has its own cache layers: the ARC and L2ARC.
The (level 1) ARC lives in RAM, while L2ARC requires a dedicated cache (SSD) drive in the array.
Neither is meant as a write cache, so they bring no significant change in write performance.
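Attaching an SSD as an L2ARC device is a single command (pool and device names are illustrative):

```shell
# Add an SSD as an L2ARC (read cache) device to an existing pool
zpool add tank cache /dev/nvd0

# The cache device appears as its own section in the pool layout
zpool status tank
```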
Some sysctl variables are also available for tuning.
On Linux, these parameters can be found in /sys/module/zfs/parameters.
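On Linux these are plain module-parameter files; for example, inspecting and capping the ARC size might look like this (the 8 GiB value is illustrative, zfs_arc_max is a real OpenZFS tunable):

```shell
# List all tunables exposed by the ZFS kernel module
ls /sys/module/zfs/parameters

# Inspect the current ARC size cap (0 means auto-sized)
cat /sys/module/zfs/parameters/zfs_arc_max

# Cap the ARC at 8 GiB (8 * 1024^3 bytes); needs root
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
```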
root@storage:~ # zfs get all zroot
NAME PROPERTY VALUE SOURCE
zroot type filesystem -
zroot creation Tue Mar 13 18:06 2018 -
zroot used 42.0T -
zroot available 49.5T -
zroot referenced 256K -
zroot compressratio 1.00x -
zroot mounted yes -
zroot quota none default
zroot reservation none default
zroot recordsize 128K default
zroot mountpoint /zroot local
zroot sharenfs off default
zroot checksum on default
zroot compression lz4 local
zroot atime off local
zroot devices on default
zroot exec on default
zroot setuid on default
zroot readonly off default
zroot jailed off default
zroot snapdir hidden default
zroot aclmode discard default
zroot aclinherit restricted default
zroot canmount on default
zroot xattr off temporary
zroot copies 1 default
zroot version 5 -
zroot utf8only off -
zroot normalization none -
zroot casesensitivity sensitive -
zroot vscan off default
zroot nbmand off default
zroot sharesmb off default
zroot refquota none default
zroot refreservation none default
zroot primarycache all default
zroot secondarycache all default
zroot usedbysnapshots 0 -
zroot usedbydataset 256K -
zroot usedbychildren 42.0T -
zroot usedbyrefreservation 0 -
zroot logbias latency default
zroot dedup off default
zroot mlslabel -
zroot sync standard default
zroot refcompressratio 1.00x -
zroot written 256K -
zroot logicalused 42.0T -
zroot logicalreferenced 31K -
zroot volmode default default
zroot filesystem_limit none default
zroot snapshot_limit none default
zroot filesystem_count none default
zroot snapshot_count none default
zroot redundant_metadata all default
FreeBSD supports ZFS natively, and ZFS is more powerful under BSD.
See this pretty 'top' output showing ZFS (ARC) statistics natively:
last pid: 35802;  load averages: 0.71, 0.59, 0.52    up 12+20:47:44  15:17:05
73 processes: 1 running, 72 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 7436K Active, 54M Inact, 31M Laundry, 61G Wired, 1296M Free
ARC: 56G Total, 325M MFU, 53G MRU, 36M Anon, 149M Header, 2121M Other
     53G Compressed, 68G Uncompressed, 1.29:1 Ratio
Swap: 24G Total, 11M Used, 24G Free
We have this gem, use it! This article comes from 5+ years of heavy ZFS usage: no data loss, acceptable speed, and no more worrying since replication started :)
If you like the article then don't forget to share!