
macOS Package-Based Documents and Zip Archiving


What is a Document

Document Icons

This article introduces techniques for implementing package-style documents in a macOS Document-Based App. It assumes AppKit as the UI framework and does not cover SwiftUI's Document App.

A "Document" is an abstract representation for containing content generated by the user within an application. During persistence, it is written to a "file" in an arbitrary format, and when loaded into the application, it is restored as a document. In AppKit, this is handled by subclasses of NSDocument.

When a native macOS application handles documents (files), you have to choose a structure to represent them. Besides plain text formats and raw binary data such as images, you can also implement your own file format as a directory-based package. The options are roughly:

  • Text
  • Binary Data
  • Package
  • Package + Archiving

The package format is sometimes called a "bundle," a name macOS developers may find more familiar. Data can also be persisted with Core Data (NSPersistentDocument[1]), but detailed technical documentation on that approach is scarce, so I won't go into it here.

Whichever method you choose, the implementation generally follows the Cocoa Document Architecture. Document file I/O can be implemented without it, but I prefer to follow the architecture to get as much "macOS-like behavior" as possible.

In this article, I will briefly introduce the package format and the techniques for archiving it to make it even easier to handle.


The layout of Cocoa Document Architecture: https://developer.apple.com/documentation/appkit/documents_data_and_pasteboard/developing_a_document-based_app

Intent of Adopting the Package Format for Documents

First, a brief review of the package format. On the filesystem, a package is an ordinary folder, but Finder may present some packages and bundles as a single file instead of a folder[2]. Typical examples include .app, .bundle, .plugin, and .menu. Being a package is not tied to particular extensions, though: developers can choose any extension they like for their own package formats.

In Cocoa, you manipulate packages using FileWrapper[3]. FileWrapper is a class for manipulating filesystem node representations from code. Since you can handle the constituent data of a document file as if you were manipulating a directory, it is one of the quickest ways to persist a unique document format while conforming to the Cocoa Document Architecture.

In short, adopting a package format makes it easier to design unique documents and allows you to represent the contents of a document file directly through a directory hierarchy.
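To make "a document as a directory hierarchy" concrete, here is a minimal sketch of a package built as a FileWrapper tree. The layout (a content.json file plus an images folder) and the function name `makeSamplePackage` are hypothetical; with `FileWrapper(directoryWithFileWrappers:)`, the dictionary keys become the file names inside the folder.

```swift
import Foundation

/// A minimal sketch of a package as a FileWrapper tree.
/// The layout ("content.json" plus an "images" folder) is a hypothetical example.
func makeSamplePackage(contentJSON: Data, images: [String: Data]) -> FileWrapper {
    // Each image becomes a regular-file node; the dictionary keys become file names
    let imageFiles = images.mapValues { FileWrapper(regularFileWithContents: $0) }
    let imagesFolder = FileWrapper(directoryWithFileWrappers: imageFiles)

    // The package root is just a directory node whose children mirror the hierarchy
    return FileWrapper(directoryWithFileWrappers: [
        "content.json": FileWrapper(regularFileWithContents: contentJSON),
        "images": imagesFolder,
    ])
}
```

Written to disk, this tree becomes a folder containing content.json and images/, which is exactly what Finder shows via "Show Package Contents".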

Even with a package format, basic features like document auto-save and version restoration can be used almost as-is, so the biggest advantage is the ease of creating a macOS-like Document-Based App. It is also easy to debug because you can peek inside directly with Finder. If the content being handled is not general data like plain text or images, and if the document structure is complex and designed with a unique architecture, it is best to first consider the package format.

The "Archiving Package Format" Approach

The problem with a pure package format is that it is unsuitable for cross-platform use. Outside of Finder, for example on Windows or in many cloud storage services, a package is just a folder hierarchy, so exchanging it through those channels risks corrupting the document. Being a directory (folder), it is also awkward to share with people who do not use macOS. One countermeasure is to archive the package and write it out as a single binary file: concretely, you insert a step that archives the package as a Zip when saving.

Documents for the iWork series (Keynote, Pages, Numbers), such as .key, .pages, and .numbers, and Sketch documents .sketch take the approach of representing the document in macOS package format and then archiving it with Zip. If you try appending .zip to the end of a document file created with these applications and then unzip it, you should be able to extract it.
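The Zip nature of such files can also be checked from code. A Zip archive normally begins with the local file header signature `PK\x03\x04` (bytes 0x50 0x4B 0x03 0x04), and an empty archive begins with the end-of-central-directory signature `PK\x05\x06`. The helper below is a hypothetical heuristic, not a validator: it only looks at the first four bytes.

```swift
import Foundation

/// Heuristic check: does `data` start like a Zip archive?
/// A matching signature does not prove the archive is intact, and archives
/// with prepended data (e.g. self-extracting ones) are not recognized.
func looksLikeZipArchive(_ data: Data) -> Bool {
    let localFileHeader: [UInt8] = [0x50, 0x4B, 0x03, 0x04]        // first entry of a non-empty archive
    let endOfCentralDirectory: [UInt8] = [0x50, 0x4B, 0x05, 0x06]  // an empty archive
    return data.starts(with: localFileHeader) || data.starts(with: endOfCentralDirectory)
}
```

Passing it the result of `try Data(contentsOf:)` on a document file gives a quick sanity check before attempting to unzip.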


Contents of a .sketch file after unzipping. You can see that a Sketch document is just a directory hierarchy containing JSON files and other items.

Zip was probably chosen as the archive format simply because it is easy to handle. It is an openly documented format that has been widely used for a long time, so it is highly portable. For example, a Zip-archived document uploaded to the web can be extracted with ordinary web technologies, which opens up product expansions such as a macOS editor paired with a web-based viewer.

Furthermore, using Zip compression can potentially reduce the document's data size. For a macOS developer, relying on the Zip mechanism without writing custom compression logic makes implementation very easy. Of course, it is also possible to make it an uncompressed Zip.

By the way, iWork series documents have multiple formats: the "Single File" format is a Zip-archived package, and the "Package" format is a regular directory-style package. The "Single File" format is more suitable for exchange via cloud storage other than iCloud.

Selecting document format

Recommended for macOS / Swift Development Environment: ZIP Foundation

ZIP Foundation is a library that allows you to easily create and expand Zip archives from Swift code. While there are several Zip libraries available for Apple platforms, ZIP Foundation seems to be the most famous and user-friendly.

Incidentally, it relies on Apple's Compression framework for compression instead of depending on external libraries, so dependencies stay simple and it can be added quickly via Swift Package Manager. It also supports creating uncompressed Zips, so if you are concerned about the cost of compression, you can simply archive without compressing[4].

Implementation Policy for NSDocument I/O Using ZIP Foundation

A rough sequence diagram of the process looks like this. On the write side, whatever Data NSDocument's data(ofType: String) returns is saved as the document file. Before returning, however, the package is first written to a temporary directory to materialize it, that directory is zipped, and the resulting Zip data is what gets written as the document.

Reading involves two NSDocument methods, called one after the other.
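Both flows need throwaway locations on disk for the intermediate package and Zip file. A minimal sketch of one way to manage them, assuming a hypothetical `withScratchDirectory` helper: it creates a uniquely named folder under the temporary directory and removes it when the closure returns, so intermediate files do not pile up. (FileManager's `url(for: .itemReplacementDirectory, in: .userDomainMask, appropriateFor:create:)` is another way to obtain such a folder.)

```swift
import Foundation

/// Hypothetical helper: runs `body` with a freshly created scratch directory
/// and deletes that directory afterwards, even if `body` throws.
func withScratchDirectory<T>(_ body: (URL) throws -> T) throws -> T {
    let fm = FileManager.default
    // The "ZipDocScratch" prefix is arbitrary; the UUID keeps concurrent calls apart
    let dir = fm.temporaryDirectory
        .appendingPathComponent("ZipDocScratch-\(UUID().uuidString)", isDirectory: true)
    try fm.createDirectory(at: dir, withIntermediateDirectories: true)
    // Best-effort cleanup: failing to delete scratch files should not fail a save or open
    defer { try? fm.removeItem(at: dir) }
    return try body(dir)
}
```

In data(ofType:), for example, the package could be written to `dir.appendingPathComponent("Package")`, zipped to `dir.appendingPathComponent("Document.zip")`, and loaded with `try Data(contentsOf:)` inside the closure, before cleanup runs. These file names are illustrative.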

We will implement this by creating a subclass of NSDocument. There are multiple methods for document I/O processing, and which one to use is determined by the purpose. We will focus on three main methods in this process.

First, for reading, we use two methods: one for reading the Zip archive and another for reading the package. The one NSDocument actually calls is read(from: URL, ofType: String), which receives the URL of the Zip file. We unzip it using that URL, create a FileWrapper, and then manually call read(from: FileWrapper, ofType: String) to expand the contents of the FileWrapper into memory.

Methods for reading into NSDocument
// Reading a Zip archive
nonisolated func read(
    from url: URL,
    ofType typeName: String
) throws

// Reading a package
nonisolated func read(
    from fileWrapper: FileWrapper,
    ofType typeName: String
) throws


For writing, on the other hand, we use a single method, which returns the Zip archive data to be written as the document file. The intermediate writes that lead up to it (the package and the Zip file) are performed separately, by calling FileManager, FileWrapper, and ZIP Foundation ourselves.

Method for writing NSDocument
func data(ofType typeName: String) throws -> Data


Defining Document Types and Exported Type Identifiers

In the design of a Document-Based App, it is necessary to define Document Types and Type Identifiers in the Info.plist.

Document Types (CFBundleDocumentTypes) is a required item. Imported Type Identifiers (UTImportedTypeDeclarations) is for handling existing file formats, while Exported Type Identifiers (UTExportedTypeDeclarations) is for defining and handling unique file formats. In this case, since we are using a unique format (even though the underlying entity is a Zip), we will record the information in the Exported section.

The screenshot below is an example; please rewrite Identifier, Description, Extensions, and Icon Text as appropriate for your own application design. Specify the name of your NSDocument subclass in the Class field.

The Conforms To field for the Exported Type specifies the UTI that this format inherits from. For a pure package format, you would use com.apple.package, but since we are representing it as a Zip archive, let's try public.data[5].

Correction Addendum
As a reference, looking at the specification for TextPack (.textpack), which is the Zip archive format for TextBundle, it specifies the inherited UTI as com.pkware.zip-archive. Following that example, it is best to enter com.pkware.zip-archive.
http://textbundle.org/spec/


Example of Document Types and Exported Type Identifiers definitions.

Differences Between CFBundleDocumentTypes / UTImportedTypeDeclarations / UTExportedTypeDeclarations

CFBundleDocumentTypes

CFBundleDocumentTypes declares the attributes of the document formats that the app can open. A type can be described by UTI, file extension, MIME type, or OS Type (a four-character code used in Classic Mac OS / Carbon); these multiple options exist for historical reasons. Because some of them are deprecated, you should generally declare types by UTI. A Role definition is mandatory.

When describing with UTI, it is also recommended to define UTImportedTypeDeclarations and UTExportedTypeDeclarations at the same time.

UTImportedTypeDeclarations

UTImportedTypeDeclarations is used to describe related UTIs and attributes when you want to support a known data format that your app does not own. For example, public.json or public.xml would fall into this category. It is very similar to UTExportedTypeDeclarations, but note that it is used in scenarios where you are supporting a format you do not own.

UTExportedTypeDeclarations

UTExportedTypeDeclarations is used when you want to declare a data format that your own app defines uniquely. For example, Keynote describes its unique type as com.apple.iwork.keynote.key. Use this definition when the owner of that data format is your own app.

For more detailed information, please refer to the official documentation.
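On the code side, the exported type can be mirrored as a UTType constant with the UniformTypeIdentifiers framework (macOS 11 or later). This is a sketch assuming a hypothetical identifier, com.example.zip-archived-document, which would also have to be declared under UTExportedTypeDeclarations. `UTType.zip` is the system type for com.pkware.zip-archive, and `UTType.package` is com.apple.package.

```swift
import UniformTypeIdentifiers

extension UTType {
    // Hypothetical identifier: it must match the UTTypeIdentifier declared under
    // UTExportedTypeDeclarations in Info.plist, which conforms to com.pkware.zip-archive.
    static let zipArchivedDocument = UTType(
        exportedAs: "com.example.zip-archived-document",
        conformingTo: .zip
    )
}
```

For the UTI check in read(from:ofType:), `UTType(typeName)?.conforms(to: .zipArchivedDocument)` is one possible way to compare the incoming typeName against the declared type.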

Processing for Writing Documents

The FileWrapper object is used to organize the data required for writing a document to a file. In essence, it is an abstract representation of the file. Before the writing (saving) process, the necessary data is collected into the FileWrapper.

To write the FileWrapper directly to the filesystem, use write(to:options:originalContentsURL:). This allows you to temporarily write the document's content to the filesystem in a package format, and then use that URL for the Zip process.

Writing a FileWrapper directly to the filesystem
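let tempDirURL: URL = ... // Temporary location for the package
let documentFileWrapper = FileWrapper(directoryWithFileWrappers: [:])

// Write the package to disk; `try` (rather than `try?`) lets a failure surface as an error
try documentFileWrapper.write(to: tempDirURL, options: [], originalContentsURL: nil)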
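While developing, it can help to write the package and read it straight back to confirm the layout before zipping. A minimal Foundation-only sketch; `writeAndReload(_:at:)` is a hypothetical helper, and deleting any existing item first is a choice made here so repeated runs do not collide.

```swift
import Foundation

/// Writes `wrapper` as a package directory at `packageURL`, replacing anything
/// already there, then reads it back as a new FileWrapper.
func writeAndReload(_ wrapper: FileWrapper, at packageURL: URL) throws -> FileWrapper {
    let fm = FileManager.default
    if fm.fileExists(atPath: packageURL.path) {
        try fm.removeItem(at: packageURL)
    }
    // .atomic writes to a temporary location first and then moves the result into place
    try wrapper.write(to: packageURL, options: .atomic, originalContentsURL: nil)
    // .immediate loads file contents eagerly instead of lazily
    return try FileWrapper(url: packageURL, options: .immediate)
}
```

The returned wrapper is what a later `FileWrapper(url:options:)` in the read path would see, so comparing it with the original is a cheap way to catch layout mistakes.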

To create a Zip archive, use the FileManager method extended by ZIP Foundation. The Zip file is written directly to the filesystem.

Creating a Zip archive
import Foundation
import ZIPFoundation

let tempDirURL: URL = ... // Destination where documentFileWrapper was written
let tempZipURL: URL = ... // Location to place the Zip file

// `.none` for uncompressed, `.deflate` for compressed
let compression: CompressionMethod = .none
// Zip the package; shouldKeepParent: false stores the package's contents at the
// archive root instead of inside a top-level folder
try FileManager.default.zipItem(at: tempDirURL,
				 to: tempZipURL,
				 shouldKeepParent: false,
				 compressionMethod: compression)

Formally writing the zipped document data
import AppKit

class Document: NSDocument {

	// Called at arbitrary times, such as during save execution
	override func data(ofType typeName: String) throws -> Data {
		let tempZipURL: URL = ... // URL of the Zip file produced by the zipping step

		// Load the Zip file into a Data object (this initializer throws)
		let archivedData = try Data(contentsOf: tempZipURL)
		// Returning this writes it formally as a document file
		return archivedData
	}
}

Processing for Reading Documents

In the reading process, read(from: URL, ofType: String) is called first, so you unzip the Zip archive indicated by that URL. Since its content is effectively just a directory (the package), you manually expand it into a FileWrapper. At this point, you manually call NSDocument's read(from: FileWrapper, ofType: String) method to load the data.

Creating a FileWrapper object after unzipping the document
import AppKit
import ZIPFoundation

class Document: NSDocument {

	// Called at arbitrary times, such as when opening a document file
	override func read(from url: URL, ofType typeName: String) throws {
		let tempDirURL: URL = ...
		// (Omitted)

		// Unzip; errors propagate to the caller instead of being swallowed by `try?`
		try FileManager.default.unzipItem(at: url, to: tempDirURL)
		// Convert to a FileWrapper object (this initializer throws)
		let documentFileWrapper = try FileWrapper(url: tempDirURL, options: [])
		// Formally read the contents of the FileWrapper
		try read(from: documentFileWrapper, ofType: typeName)
	}
}
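If an archive was created with a parent folder (ZIP Foundation's default shouldKeepParent: true, or Finder's Compress command), unzipping it yields an extra top-level folder, and FileWrapper(url:) would then see the wrong root. A hedged sketch that tolerates both layouts by looking for a file every valid package is assumed to contain; `packageRoot(in:markerFile:)` and the marker name are hypothetical.

```swift
import Foundation

/// Hypothetical helper: finds the package root inside an extraction directory.
/// `markerFile` is a file assumed to exist at the root of every valid package.
func packageRoot(in extractedDir: URL, markerFile: String) -> URL? {
    let fm = FileManager.default
    // Case 1: zipped with shouldKeepParent: false, so the marker sits at the top level
    if fm.fileExists(atPath: extractedDir.appendingPathComponent(markerFile).path) {
        return extractedDir
    }
    // Case 2: zipped with a parent folder, so look one level down
    let children = (try? fm.contentsOfDirectory(at: extractedDir,
                                                includingPropertiesForKeys: nil,
                                                options: [.skipsHiddenFiles])) ?? []
    // Skip the "__MACOSX" metadata folder that Finder-created archives may contain
    for child in children where child.lastPathComponent != "__MACOSX" {
        if fm.fileExists(atPath: child.appendingPathComponent(markerFile).path) {
            return child
        }
    }
    return nil
}
```

Inside read(from:ofType:), the result would replace tempDirURL when creating the FileWrapper, with a nil result reported as an error such as `CocoaError(.fileReadCorruptFile)`.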

Extracting data from FileWrapper
import AppKit

class Document: NSDocument {

	override func read(from fileWrapper: FileWrapper, ofType typeName: String) throws {
		// Perform a UTI check using typeName and extract data from fileWrapper
		// Or throw if an error occurs
	}
}
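One way to fill in this stub is to keep the model as JSON inside the package, much like the Sketch example above. A sketch assuming a hypothetical `DocumentContent` model stored as content.json; the matching write-side helper is included so the round trip can be checked.

```swift
import Foundation

// Hypothetical document model; the field names are illustrative assumptions.
struct DocumentContent: Codable, Equatable {
    var title: String
    var body: String
}

// Hypothetical file name for the model inside the package.
let contentFileName = "content.json"

/// Builds the package FileWrapper from the model (the write side, before zipping).
func makePackage(from content: DocumentContent) throws -> FileWrapper {
    let json = try JSONEncoder().encode(content)
    return FileWrapper(directoryWithFileWrappers: [
        contentFileName: FileWrapper(regularFileWithContents: json)
    ])
}

/// Extracts the model from a package FileWrapper (what read(from:ofType:) could call).
func decodeContent(from package: FileWrapper) throws -> DocumentContent {
    guard package.isDirectory,
          let data = package.fileWrappers?[contentFileName]?.regularFileContents else {
        // A package without the expected file is reported as a corrupt document
        throw CocoaError(.fileReadCorruptFile)
    }
    return try JSONDecoder().decode(DocumentContent.self, from: data)
}
```

Inside the NSDocument subclass, read(from fileWrapper:ofType:) could then assign `content = try decodeContent(from: fileWrapper)`, where `content` is a hypothetical stored property.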

That's roughly the core part.

It's possible to achieve more advanced operations by directly using ZIP Foundation's Archive or Entry. For example, it seems you can perform random access to internal data without unzipping[6]. If you want to implement I/O processing quickly, you can write it simply using the method introduced here.

To make it function as an App, some other minimal implementations are required, but I will omit them in this article. If you start a project with the Xcode Document App template and follow the definitions in Info.plist and the design flow introduced, it should basically work.

Please refer to the sample on my GitHub for the complete code.
https://github.com/usagimaru/ZipArchivedDocument


Footnotes
  1. Japanese resources referring to Document-Based Apps using NSPersistentDocument.
    https://banjun.hatenadiary.org/entry/20070409/1176107618
    https://hylom.net/2017/04/09/macos-app-develop-with-storyboard-and-coredata/ ↩︎

  2. For application bundles and some plugin bundles, you can browse the contents in Finder by choosing "Show Package Contents" from the context menu. From the command line, they appear as ordinary directories. ↩︎

  3. A class that was formerly called NSFileWrapper. ↩︎

  4. I haven't confirmed exactly how much processing time increases when compression is applied. I can't say for sure if uncompressed processing is significantly lighter. ↩︎

  5. If it's Zip, com.pkware.zip-archive might be appropriate (UTI Reference).
    Update: Initially, I suggested setting the parent UTI for Conforms To to public.data, but the specification for TextPack (.textpack), the Zip archive format for TextBundle, defines the inherited UTI as com.pkware.zip-archive. Therefore, it's better to enter com.pkware.zip-archive following that.
    http://textbundle.org/spec/ ↩︎

  6. Reference article: https://kean.blog/post/pulse-store ↩︎
