
Rclone


Robust software for distributing files between A and B. The available operations are move, copy, and sync, and A and B can each be any of roughly 80 different protocols.

It is high-quality software with a huge, well-deserved user base, written and still maintained by Nick Craig-Wood. I can easily recommend it as a component wherever files need to move in or out between systems or zones.

What Rclone is not (as of 2022-07-05)

  • It does not contain an implementation of the rsync algorithm (https://rsync.samba.org/). Instead, each protocol's own features are used to implement similar functionality such as checksums and resume. This is transparent to the user, but check the Rclone documentation for your specific protocol to see whether checksums etc. are implemented.
  • It doesn't emit an event for each file. Instead, its metrics or log file can be used for this, or you can easily add your own custom event with one or two lines of Go; see below and my post "use Rclone programmatically".
  • It doesn't do "pure" streaming. It can read from a pipe, but the whole file is read into memory behind the scenes.
  • It doesn't (or at least didn't) contain a directory watch function. Instead, a cron job can be used to repeatedly call an operation such as copy or move. To handle the case where the previous operation has not finished before the next cron run starts, a parameter that allocates a specific port can be used: if the port is already taken by an earlier Rclone job, the new command fails immediately.
  • Strangely enough, it is missing the WebHDFS protocol. It is the only protocol I found missing.
  • It is not totally safe against misuse, i.e. a failed file transfer will be cleaned up nicely, BUT the partial file will be visible on the destination side for a brief, flickering moment.
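The port trick from the directory-watch bullet can be illustrated in plain Go. This is only a sketch of the underlying idea (one process can listen on a given TCP port at a time), not Rclone's own implementation; the function name acquirePortLock and the port number are hypothetical choices for illustration.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// acquirePortLock tries to listen on addr. Only one process can hold the
// port at a time, so a second caller gets an error straight away.
// (Hypothetical helper, not part of Rclone.)
func acquirePortLock(addr string) (net.Listener, error) {
	return net.Listen("tcp", addr)
}

func main() {
	// The port number is an arbitrary assumption for this example.
	lock, err := acquirePortLock("127.0.0.1:47231")
	if err != nil {
		fmt.Fprintln(os.Stderr, "previous job still running:", err)
		os.Exit(1)
	}
	defer lock.Close()

	// ... run the copy/move operation here ...
	fmt.Println("lock acquired, running transfer")
}
```

The lock is released automatically when the process exits, even after a crash, which is the main advantage over a lock file that may be left behind.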

This video catches the moment Rclone finds out that the source file is still growing (perhaps from an ongoing FTP put operation into your server's filesystem), correctly aborts the transfer, and cleans the destination file away. Note that Rclone has multiple ways of mitigating this default behavior. One option is to wait a minimum time before transferring: the option "--min-age 10m" gives a 10-minute margin before Rclone starts to read the file. But let's pretend we didn't know this!


Programmatic usage

The minimal example below uses Rclone as a library and lists a directory programmatically.

package main

import (
	"context"
	"fmt"
	"log"

	_ "github.com/rclone/rclone/backend/all" // registers all backends
	"github.com/rclone/rclone/fs"
	"github.com/rclone/rclone/fs/config/configfile"
)

func main() {
	ctx := context.Background()

	// Load the default rclone.conf so the "hadoop:" remote can be resolved.
	configfile.Install()

	// Open the remote named "hadoop" from the config file.
	fsource, err := fs.NewFs(ctx, "hadoop:")
	if err != nil {
		log.Fatal(err)
	}

	// List the entries directly under /data.
	entries, err := fsource.List(ctx, "/data")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(entries)
}

My config file is located at /home/rickard/.config/rclone/rclone.conf and contains:

[hadoop]
type=hdfs
namenode = 10.1.1.190:8020
username = rickard

It is easy to see that the above example could be extended to mitigate the "flickering file" problem by placing a small marker/checksum file next to each successfully transferred and ready file. The destination side should monitor these marker files, verify the checksum, and only then grab the file. See my upcoming post "use Rclone programmatically".