Designing your own manager
If you'd like to extend the functionality of the package, please feel free to make a pull request on the project's GitHub.
To support another storage medium, inherit from the Manager abstract base class and implement the abstract methods it declares. You can then make the new manager available to stow by exposing it through the Python entry point system.
Important
stow uses the stow_managers entry point group to discover managers.
Add your managers to this entry point group to integrate seamlessly with the stow stateless interface and connect utilities.
Base classes
Managers should be implemented as a subclass of either LocalManager or RemoteManager:
from stow.manager import LocalManager, RemoteManager
The main methods on Manager use a method, localise, to get an absolute path to the artefacts they want to interact with. localise is responsible for ensuring each artefact's availability for the other methods, and it is the key difference between LocalManager and RemoteManager.
A LocalManager can access its artefacts directly, whereas a RemoteManager must retrieve its artefacts before it can work with them.
Each of these base classes implements localise for its respective situation. The RemoteManager implementation is considerably more involved, so that it avoids pulling and pushing data any more than it needs to.
localise makes use of the abstract methods described below to uphold the Manager interface, so it does not need to be re-implemented.
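To make the local/remote distinction concrete, here is a toy sketch. It assumes localise behaves like a context manager that yields an absolute local path; ToyLocalManager and ToyRemoteManager are hypothetical classes, and the remote one uses a dict as its "remote" store. Unlike stow's RemoteManager, it naively pulls and pushes on every call.

```python
import contextlib
import os
import tempfile


class ToyLocalManager:
    """Hypothetical manager whose artefacts already live on the local disk."""

    def __init__(self, root: str):
        self.root = root

    @contextlib.contextmanager
    def localise(self, managerPath: str):
        # Local artefacts are worked on in place: just yield their absolute path
        yield os.path.join(self.root, managerPath.lstrip("/"))


class ToyRemoteManager:
    """Hypothetical manager backed by a dict standing in for a remote store."""

    def __init__(self):
        self.store = {}  # manager path -> file bytes

    @contextlib.contextmanager
    def localise(self, managerPath: str):
        with tempfile.TemporaryDirectory() as tmp:
            local = os.path.join(tmp, os.path.basename(managerPath) or "artefact")
            # Pull: download the artefact into the temporary directory if it exists
            if managerPath in self.store:
                with open(local, "wb") as handle:
                    handle.write(self.store[managerPath])
            yield local
            # Push: upload whatever the caller left at the local path
            # (skipped if the with-block raised, as the exception propagates from the yield)
            if os.path.isfile(local):
                with open(local, "rb") as handle:
                    self.store[managerPath] = handle.read()
```

Callers write the same code against either class, e.g. `with manager.localise("/hello.txt") as path: ...`, which is the point of hiding the retrieval inside localise.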
Note
You may inherit from the Manager base class directly if you wish, but you will then have to implement the localise method in addition to the other abstract methods. I'd only suggest doing this if you have very special behaviour you want to express.
If you do find yourself in this situation, please consider contributing this special behaviour back to the original project as its own abstract base class to help others.
Abstract methods
_abspath(managerPath)
Return the absolute path on the backend provider from the standardised manager path.
Parameters:
managerPath (str) – The manager relative path which is to be converted to an absolute path
Returns: The absolute path on the backend
For the filesystem, this will be the full absolute path to the object. For S3, this is the key of the object.
>>> stow.connect(manager='FS', path='/home/ubuntu')._abspath('/hello/there')
'/home/ubuntu/hello/there'
>>> stow.connect(manager='s3', bucket='bucket-example')._abspath('/hello/there')
'hello/there'
def _abspath(self, managerPath: str) -> str:
    # Join the manager's root path with the manager path
    path = self.join(self._path, managerPath, joinAbsolutes=True)
    # Use the native separator on Windows
    if os.name == 'nt':
        path = path.replace('/', '\\')
    return path
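For a key-value backend such as S3, where the example above shows `_abspath('/hello/there')` giving `'hello/there'`, the conversion mostly means dropping the leading slash. A minimal sketch, assuming keys are '/'-separated; key_abspath and its prefix parameter are hypothetical, not part of stow.

```python
import posixpath


def key_abspath(managerPath: str, prefix: str = "") -> str:
    """Hypothetical _abspath body for a key-value store: manager paths become keys.

    Manager paths are '/'-separated and start at the manager root, whereas
    object keys conventionally have no leading '/'.
    """
    # Normalise '.', '..' and repeated separators, then drop the leading slash(es)
    key = posixpath.normpath("/" + managerPath).lstrip("/")
    # An optional prefix lets one bucket host several manager roots
    return posixpath.join(prefix, key) if prefix else key
```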
_identifyPath(managerPath)
For the path given, create an Artefact for the object at that location on the manager, but do not add it to the manager. If no object exists, return None.
Parameters:
managerPath (str) – The manager path of the artefact
Returns: The artefact object that represents the item on disk, or None if nothing exists
def _identifyPath(self, managerPath: str):
    abspath = self._abspath(managerPath)
    if os.path.exists(abspath):
        stats = os.stat(abspath)
        # Created time
        createdTime = datetime.datetime.utcfromtimestamp(stats.st_ctime)
        createdTime = pytz.UTC.localize(createdTime)
        # Modified time
        modifiedTime = datetime.datetime.utcfromtimestamp(stats.st_mtime)
        modifiedTime = pytz.UTC.localize(modifiedTime)
        # Access time
        accessedTime = datetime.datetime.utcfromtimestamp(stats.st_atime)
        accessedTime = pytz.UTC.localize(accessedTime)
        if os.path.isfile(abspath):
            return File(
                self,
                managerPath,
                stats.st_size,
                modifiedTime,
                createdTime,
                accessedTime,
            )
        elif os.path.isdir(abspath):
            return Directory(
                self,
                managerPath,
                createdTime=createdTime,
                modifiedTime=modifiedTime,
                accessedTime=accessedTime,
            )
    return None
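If you would rather avoid the pytz dependency, the standard library can produce the same timezone-aware UTC timestamps directly (datetime.utcfromtimestamp is also deprecated as of Python 3.12). A minimal sketch; stat_times is a hypothetical helper, and the dictionary keys simply mirror the keyword names used above.

```python
import datetime
import os


def stat_times(path: str) -> dict:
    """Return timezone-aware UTC created/modified/accessed times for a path."""
    stats = os.stat(path)
    utc = datetime.timezone.utc
    return {
        # st_ctime is the creation time on Windows but the last metadata change on most Unix systems
        "createdTime": datetime.datetime.fromtimestamp(stats.st_ctime, tz=utc),
        "modifiedTime": datetime.datetime.fromtimestamp(stats.st_mtime, tz=utc),
        "accessedTime": datetime.datetime.fromtimestamp(stats.st_atime, tz=utc),
    }
```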
_get(source, destination)
Fetch the artefact and download its data to the local destination path provided.
The existence of the artefact to collect has already been checked, so this function can assume it exists.
Parameters:
source (Artefact) – The source object and context that is to be downloaded
destination (str) – The local path to which the source is to be written
def _get(self, source: Artefact, destination: str):
    # Convert source path
    sourceAbspath = self._abspath(source.path)
    # Identify download method
    method = shutil.copytree if os.path.isdir(sourceAbspath) else shutil.copy
    # Download
    method(sourceAbspath, destination)
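On a backend without real directories, a directory "download" has to rebuild the tree from key prefixes. A minimal sketch, assuming a hypothetical flat dict of key to bytes standing in for an object store; get_from_key_store is illustrative only and, like _get, assumes the source exists.

```python
import os


def get_from_key_store(store: dict, key: str, destination: str) -> None:
    """Hypothetical _get body for a flat key -> bytes store such as an object store."""
    if key in store:
        # A single object: write it straight to the destination file
        parent = os.path.dirname(os.path.abspath(destination))
        os.makedirs(parent, exist_ok=True)
        with open(destination, "wb") as handle:
            handle.write(store[key])
        return
    # Otherwise treat the key as a directory prefix and rebuild its tree locally
    prefix = key.rstrip("/") + "/"
    os.makedirs(destination, exist_ok=True)
    for name, data in store.items():
        # Skip keys outside the prefix and directory marker keys ending in '/'
        if not name.startswith(prefix) or name.endswith("/"):
            continue
        local = os.path.join(destination, *name[len(prefix):].split("/"))
        os.makedirs(os.path.dirname(local), exist_ok=True)
        with open(local, "wb") as handle:
            handle.write(data)
```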
_getBytes(source)
Fetch the contents of a file artefact directly. This avoids having to write file contents to disk for some of the other operations.
The existence of the file to collect has already been checked, so this function can assume it exists.
Parameters:
source (Artefact) – The source object and context that is to be downloaded
Returns: The bytes content of the file
def _getBytes(self, source: Artefact) -> bytes:
    with open(self._abspath(source.path), "rb") as handle:
        return handle.read()
_put(source, destination)
Put the local filesystem object onto the underlying manager implementation using the absolute paths given.
To avoid user error, an artefact cannot be placed onto a Directory unless an overwrite toggle (False by default) has been passed. This should protect users from accidentally deleting a directory.
If they do want to overwrite it, the directory is deleted before this function is called, so there is no need to check or protect against it here (famous last words).
Parameters:
source (str) – A local absolute path to an artefact (File or Directory)
destination (str) – A manager absolute path for the artefact
def _put(self, source: str, destination: str):
    # Convert destination path
    destinationAbspath = self._abspath(destination)
    # Ensure the destination
    os.makedirs(os.path.dirname(destinationAbspath), exist_ok=True)
    # Select the put method
    method = shutil.copytree if os.path.isdir(source) else shutil.copy
    # Perform the putting
    method(source, destinationAbspath)
_putBytes(fileBytes, destination)
Put the bytes of a file object onto the underlying manager implementation using the absolute path given.
This allows processes to avoid writing files to disk, for speedier transfers.
If it's not possible to transmit bytes, I'd recommend writing the bytes to a temporary file and then calling the _put method.
Parameters:
fileBytes (bytes) – The file's bytes
destination (str) – A manager absolute path for the file
def _putBytes(self, fileBytes: bytes, destination: str):
    # Convert destination path
    destinationAbspath = self._abspath(destination)
    # Make sure the destination directory exists
    os.makedirs(os.path.dirname(destinationAbspath), exist_ok=True)
    # Write the byte file
    with open(destinationAbspath, "wb") as handle:
        handle.write(fileBytes)
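The temporary-file fallback recommended above can be sketched generically. put_bytes_via_tempfile is a hypothetical helper; its put argument stands in for a path-based upload such as self._put. mkstemp is used so the file can be closed before its path is handed on, since a still-open NamedTemporaryFile generally cannot be reopened by name on Windows.

```python
import os
import tempfile
from typing import Callable


def put_bytes_via_tempfile(
    put: Callable[[str, str], None], fileBytes: bytes, destination: str
) -> None:
    """Fallback _putBytes: spill the bytes to a temporary file, then reuse a path-based put."""
    fd, tmpPath = tempfile.mkstemp()
    try:
        # Write and close the temporary file before uploading it
        with os.fdopen(fd, "wb") as handle:
            handle.write(fileBytes)
        # Delegate to the manager's path-based upload, e.g. self._put
        put(tmpPath, destination)
    finally:
        # Always clean up the temporary copy
        os.remove(tmpPath)
```

Inside a manager this would be called as `put_bytes_via_tempfile(self._put, fileBytes, destination)`.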
_cp(source, destination)
Copy an artefact local to the manager to another location on the same manager. Implementing this avoids having to download data from the manager only to re-upload it.
If there isn't a way of duplicating the data on the manager, you can call self._put(self._abspath(source.path), destination), which means the behaviour defaults to the put action.
Parameters:
source (Artefact) – The manager local source artefact
destination (str) – A manager absolute path for the destination
def _cp(self, source: Artefact, destination: str):
    # No native copy is needed on the filesystem: fall back to the put action
    self._put(self._abspath(source.path), destination)
_mv(source, destination)
Move an artefact local to the manager to another location on the same manager. Implementing this avoids having to download data from the manager only to re-upload it.
If there isn't a way of moving the data on the manager, you can call self._put(self._abspath(source.path), destination) followed by self._rm(source), so the behaviour defaults to the put action followed by deleting the original artefact, achieving the same goal.
Parameters:
source (Artefact) – The manager local source artefact
destination (str) – A manager absolute path for the destination
def _mv(self, source: Artefact, destination: str):
    # Convert the source and destination
    source, destination = self._abspath(source.path), self._abspath(destination)
    # Ensure the destination location
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    # Move the source artefact
    os.rename(source, destination)
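Object stores typically have no rename, so a move becomes copy-then-delete. A minimal sketch of that fallback on a hypothetical flat dict of key to bytes; mv_in_key_store is illustrative only. It snapshots the source data first so overlapping source and destination prefixes cannot clobber unread data, and copies everything before deleting anything.

```python
def mv_in_key_store(store: dict, source: str, destination: str) -> None:
    """Hypothetical _mv body for a flat key -> bytes store that has no native rename."""
    if source in store:
        # A single object
        moves = {source: destination}
    else:
        # A directory: every key under the source prefix moves under the destination prefix
        src = source.rstrip("/") + "/"
        dst = destination.rstrip("/") + "/"
        moves = {key: dst + key[len(src):] for key in store if key.startswith(src)}
    # Snapshot the data so overlapping prefixes cannot overwrite an unread source
    data = {old: store[old] for old in moves}
    # Copy everything first so a failure part-way never loses data
    for old, new in moves.items():
        store[new] = data[old]
    # Then delete originals that were not themselves a copy target
    targets = set(moves.values())
    for old in moves:
        if old not in targets:
            del store[old]
```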
_ls(directory)
List all artefacts present at the directory's location and add them to the manager.
Parameters:
directory (str) – The manager path of the directory whose contents are to be indexed
def _ls(self, directory: str):
    # Get a path to the folder
    abspath = self._abspath(directory)
    # Iterate over the folder, identify every object and add it to the manager
    for art in os.listdir(abspath):
        self._addArtefact(
            self._identifyPath(
                self.join(directory, art, separator='/')
            )
        )
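On a flat key store there is no listdir, so the immediate children of a directory have to be derived from key prefixes (services like S3 can also do this grouping server-side via a delimiter). A minimal sketch, assuming keys without a leading '/'; ls_key_store is hypothetical. Each returned name would then be joined onto the directory and passed through _identifyPath and _addArtefact, as in the filesystem example.

```python
def ls_key_store(keys, directory: str) -> set:
    """Hypothetical _ls helper for a flat key store: the immediate children of a directory."""
    prefix = directory.strip("/")
    prefix = prefix + "/" if prefix else ""
    children = set()
    for key in keys:
        if key.startswith(prefix) and key != prefix:
            # Keep only the first segment after the prefix; deeper keys imply a sub-directory
            children.add(key[len(prefix):].split("/", 1)[0])
    return children
```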
_rm(artefact)
Delete the underlying artefact data on the manager.
To avoid possible user error when deleting directories, the user must already have indicated that they want to delete everything.
Parameters:
artefact (Artefact) – The artefact on the manager to be deleted
def _rm(self, artefact: Artefact):
    # Convert the artefact
    artefact = self._abspath(artefact.path)
    # Select method for deleting
    method = shutil.rmtree if os.path.isdir(artefact) else os.remove
    # Remove the artefact
    method(artefact)
_signatureFromURL(url)
Create the signature that can be passed to the manager's __init__ to create a new instance, using the information in the url ParseResult object that the stateless interface will have created.
Parameters:
url (ParseResult) – The result of passing the stateless path through urllib.parse.urlparse
Returns: The init keyword arguments for a manager of this type, loaded with information from the url, and Relpath: the manager relative path of the artefact that may have been referenced
Raises: Error – Errors due to missing information and so on
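@classmethod
def _signatureFromURL(cls, url: urllib.parse.ParseResult):
    # Root the filesystem manager at '/' and return the artefact's absolute path as the relpath
    return {"path": "/"}, os.path.abspath(os.path.expanduser(url.path))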
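For a bucket-based backend, the bucket name arrives in the URL's netloc and the artefact path in its path. A minimal sketch for URLs shaped like s3://bucket-example/hello/there; bucket_signature_from_url is hypothetical, the bucket keyword mirrors the connect example earlier, and raising ValueError is an assumption about what "missing information" should produce.

```python
import urllib.parse


def bucket_signature_from_url(url: urllib.parse.ParseResult):
    """Hypothetical _signatureFromURL for a bucket-based manager such as s3://bucket/key."""
    if not url.netloc:
        raise ValueError("No bucket name found in {}".format(url.geturl()))
    # The netloc carries the bucket; the path is the manager relative path of the artefact
    return {"bucket": url.netloc}, url.path or "/"
```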
toConfig()
Generate a config which can be unpacked into the connect interface to initialise this manager. To be used to serialise and deserialise a manager object.
NOTE: Defaulted values or environment variables are not guaranteed to be saved.
Returns: A dictionary of the kwargs of the manager's __init__
def toConfig(self):
    return {'manager': 'FS', 'path': self._path}
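The point of toConfig is a round trip: the config survives serialisation and, unpacked into connect, rebuilds an equivalent manager. A toy sketch of that round trip through JSON; ToyFSManager and toy_connect are hypothetical stand-ins, not stow's classes or its connect function.

```python
import json


class ToyFSManager:
    """Hypothetical stand-in for a manager with the toConfig shown above."""

    def __init__(self, path: str):
        self._path = path

    def toConfig(self) -> dict:
        return {"manager": "FS", "path": self._path}


def toy_connect(manager: str, **kwargs):
    """Hypothetical stand-in for a connect call: choose a class by name, pass the remaining kwargs."""
    registry = {"FS": ToyFSManager}
    return registry[manager](**kwargs)


# Serialise the config, then rebuild the manager from it
restored = toy_connect(**json.loads(json.dumps(ToyFSManager("/data").toConfig())))
```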
Special cases
Depending on the storage medium, it may be more efficient to load (read the metadata of) multiple artefacts simultaneously. S3, for example, returns the metadata for all files at a level when asked, so it is more efficient to instantiate all of those artefacts at that point rather than requesting each one individually.
This can be achieved by overriding the _loadArtefact method on the Manager, which is the method used internally to create or ensure an artefact object.
def _loadArtefact(self, managerPath: str) -> Artefact:
    if managerPath in self._paths:
        # Artefact was previously loaded and can be returned normally
        return super()._loadArtefact(managerPath)
    try:
        # Ensure the owning directory and fetch the directory object
        directory = self._ensureDirectory(self.dirname(managerPath))
    except (exceptions.ArtefactNotFound, exceptions.ArtefactTypeError) as e:
        raise exceptions.ArtefactNotFound("Cannot locate artefact {}".format(managerPath)) from e
    # Add all artefacts of the directory into the manager at this level
    self._ls(directory.path)
    directory._collected = True
    # Return the now instantiated artefact
    if managerPath in self._paths:
        return self._paths[managerPath]
    else:
        raise exceptions.ArtefactNotFound("Cannot locate artefact {}".format(managerPath))
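The effect of this batching can be seen in a self-contained toy that counts backend listing calls. BatchLoadingIndex is hypothetical and uses a dict as its "backend"; it also remembers which directories it has already listed (loosely mirroring the _collected flag above), so a lookup miss does not trigger another listing.

```python
import posixpath


class BatchLoadingIndex:
    """Hypothetical toy: loading one path lists its whole parent directory once and caches every sibling."""

    def __init__(self, listing: dict):
        # listing maps a directory path to {child name: metadata}, standing in for one backend call
        self._listing = listing
        self._paths = {}
        self._listed = set()
        self.calls = 0

    def _ls(self, directory: str):
        # One backend call instantiates every artefact at this level
        self.calls += 1
        for name, meta in self._listing.get(directory, {}).items():
            self._paths[posixpath.join(directory, name)] = meta
        self._listed.add(directory)

    def load(self, managerPath: str):
        if managerPath in self._paths:
            return self._paths[managerPath]
        directory = posixpath.dirname(managerPath)
        if directory not in self._listed:
            self._ls(directory)
        if managerPath in self._paths:
            return self._paths[managerPath]
        raise FileNotFoundError("Cannot locate artefact {}".format(managerPath))
```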