Chapter 14. Using TensorFlow Lite in iOS Apps

Chapter 12 introduced you to TensorFlow Lite and how you can use it to convert your TensorFlow models into a power-efficient, compact format that can be used on mobile devices. In Chapter 13 you then explored creating Android apps that use TensorFlow Lite models. In this chapter you’ll do the same thing but with iOS, creating a couple of simple apps, and seeing how you can do inference on a TensorFlow Lite model using the Swift programming language.

You’ll need a Mac if you want to follow along with the examples in this chapter, as the development tool you’ll use is Xcode, which is only available on macOS. If you don’t have it already, you can install it from the App Store. It gives you everything you need, including an iOS Simulator on which you can run iPhone and iPad apps without a physical device.

Creating Your First TensorFlow Lite App with Xcode

Once you have Xcode up and running, you can follow the steps outlined in this section to create a simple iOS app that incorporates the Y = 2X – 1 model from Chapter 12. While it’s an extremely simple scenario, and definite overkill for a machine learning app, the skeleton structure is the same as that used for more complex apps, and I’ve found it a useful way of demonstrating how to use models in an app.

Step 1. Create a Basic iOS App

Open Xcode and select File → New → Project. You’ll be asked to pick the template for your new project. Choose Single View App, which is the simplest template (Figure 14-1), and click Next.

After that you’ll be asked to choose options for your new project, including a name for the app. Call it firstlite, and make sure that the language is Swift and the user interface is Storyboard (Figure 14-2).

Figure 14-1. Creating a new iOS application in Xcode
Figure 14-2. Choosing options for your new project

Click Next to create a basic iOS app that will run on an iPhone or iPad simulator. The next step is to add TensorFlow Lite to it.

Step 2. Add TensorFlow Lite to Your Project

To add dependencies to an iOS project, you can use a technology called CocoaPods, a dependency management project with many thousands of libraries that can be easily integrated into your app. To do so, you create a specification called a Podfile, which contains details about your project and the dependencies you want to use. This is a simple text file called Podfile (no extension), and you should put it in the same directory as the firstlite.xcodeproj file that was created for you by Xcode. Its contents should be as follows:

# Uncomment the next line to define a global platform for your project
platform :ios, '12.0'

target 'firstlite' do
  # Comment the next line if you're not using Swift and don't want to 
  # use dynamic frameworks
  use_frameworks!

  # Pods for firstlite
  pod 'TensorFlowLiteSwift'
end

The important part is the line that reads pod 'TensorFlowLiteSwift', which indicates that the TensorFlow Lite Swift libraries need to be added to the project.

Next, using Terminal, change to the directory containing the Podfile and issue the following command:

pod install

The dependencies will be downloaded and added to your project, stored in a new folder called Pods. You’ll also have an .xcworkspace file added, as shown in Figure 14-3. Use this one in the future to open your project, and not the .xcodeproj file.

Figure 14-3. Your file structure after running pod install

You now have a basic iOS app, and you have added the TensorFlow Lite dependencies. The next step is to create your user interface.

Step 3. Create the User Interface

The Xcode storyboard editor is a visual tool that allows you to create a user interface. After opening your workspace, you’ll see a list of source files on the left. Select Main.storyboard and, using the controls palette, drag and drop controls onto the view for an iPhone screen (Figure 14-4).

Figure 14-4. Adding controls to the storyboard

If you can’t find the controls palette, you can access it by clicking the + at the top right of the screen (highlighted in Figure 14-4). Using it, add a Label, and change the text to “Enter a Number.” Then add another one with the text “Result goes here.” Add a Button and change its caption to “Go,” and finally add a Text Field. Arrange them similarly to what you can see in Figure 14-4. It doesn’t have to be pretty!

Now that the controls are laid out, you want to be able to refer to them in code. In storyboard parlance you do this using either outlets (when you want to address the control to read or set its contents) or actions (when you want to execute some code when the user interacts with the control).

The easiest way to wire this up is to have a split screen, with the storyboard on one side and the ViewController.swift code that underlies it on the other. You can achieve this by selecting the split screen control (highlighted in Figure 14-5), clicking on one side and selecting the storyboard, and then clicking on the other side and selecting ViewController.swift.

Figure 14-5. Splitting the screen

Once you’ve done this, you can start creating your outlets and actions by dragging and dropping. With this app, the user types a number into the text field, presses Go, and then runs inference on the value they typed. The result will be rendered in the label that says “Result goes here.”

This means you’ll need to read from or write to two controls: you’ll read the contents of the text field to get what the user typed in, and write the result to the “Result goes here” label. Thus, you’ll need two outlets. To create them, hold down the Ctrl key and drag the control on the storyboard onto the ViewController.swift file, dropping it just below the class definition. A pop-up will appear asking you to define it (Figure 14-6).

Figure 14-6. Creating an outlet

Ensure the connection type is Outlet, and create an outlet for the text field called txtUserData and one for the label called txtResult.

Next, drag the button over to the ViewController.swift file. In the pop-up, ensure that the connection type is Action and the event type is Touch Up Inside. Use this to define an action called btnGo (Figure 14-7).

Figure 14-7. Adding an action

At this point your ViewController.swift file should look like this—note the IBOutlet and IBAction code:

import UIKit

class ViewController: UIViewController {
    @IBOutlet weak var txtUserData: UITextField!
    
    @IBOutlet weak var txtResult: UILabel!
    @IBAction func btnGo(_ sender: Any) {
    }
    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
    }
}

Now that the UI is squared away, the next step will be to create the code that will handle the inference. Instead of having this in the same Swift file as the ViewController logic, you’ll place it in a separate code file.

Step 4. Add and Initialize the Model Inference Class

To keep the UI separate from the underlying model inference, you’ll create a new Swift file containing a ModelParser class. This is where all the work of getting the data into the model, running the inference, and then parsing the results will happen. In Xcode, choose File → New File and select Swift File as the template type (Figure 14-8).

Figure 14-8. Adding a new Swift file

Call this ModelParser, and ensure that the checkbox targeting it to the firstlite project is checked (Figure 14-9).

Figure 14-9. Adding ModelParser.swift to your project

This will add a ModelParser.swift file to your project that you can edit to add the inference logic. First, ensure that the imports at the top of the file include TensorFlowLite:

import Foundation
import TensorFlowLite

You’ll pass a reference to the model file, model.tflite, to this class—you haven’t added it yet, but you will soon:

typealias FileInfo = (name: String, extension: String)

enum ModelFile {
  static let modelInfo: FileInfo = (name: "model", extension: "tflite")
}

This typealias and enum make the code a little more compact. You’ll see them in use in a moment. Next you’ll need to load the model into an interpreter, so first declare the interpreter as a private variable of the class:

private var interpreter: Interpreter

Swift requires variables to be initialized, which you can do within an init function. The following function takes two input parameters. The first, modelFileInfo, is the FileInfo type you just declared. The second, threadCount, is the number of threads the interpreter should use, which defaults to 1. Within this function you’ll create a reference to the model file that you described earlier (model.tflite):

init?(modelFileInfo: FileInfo, threadCount: Int = 1) {
  let modelFilename = modelFileInfo.name

  guard let modelPath = Bundle.main.path(
    forResource: modelFilename,
    ofType: modelFileInfo.extension
  ) else {
    print("Failed to load the model file")
    return nil
  }

Once you have the path to the model file in the bundle, you can load it:

  do {
    interpreter = try Interpreter(modelPath: modelPath)
  } catch {
    print("Failed to create the interpreter: \(error)")
    return nil
  }
}
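
Note that as written, threadCount isn’t actually passed anywhere. If you want the interpreter to honor it, you can pass it through the interpreter’s options. Here’s a minimal sketch, assuming the TensorFlowLiteSwift Interpreter.Options type with a threadCount property and the Interpreter(modelPath:options:) initializer (check the API of the library version you have installed):

// Sketch: creating the interpreter with an explicit thread count.
// Assumes Interpreter.Options and Interpreter(modelPath:options:) from
// the TensorFlowLiteSwift pod; verify against your installed version.
var options = Interpreter.Options()
options.threadCount = threadCount
do {
  interpreter = try Interpreter(modelPath: modelPath, options: options)
} catch {
  print("Failed to create the interpreter: \(error)")
  return nil
}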

Step 5. Perform the Inference

Within the ModelParser class, you can then do the inference. The user will type a string value in the text field, which will be converted to a float, so you’ll need a function that takes a float, passes it to the model, runs the inference, and parses the return value.

Start by creating a function called runModel. Your code will need to catch errors, so start it with a do{:

func runModel(withInput input: Float) -> Float? {
    do {

Next, you’ll need to allocate tensors on the interpreter. This initializes it and readies it for inference:

    try interpreter.allocateTensors()

Then you’ll create the input tensor. As Swift doesn’t have a Tensor data type, you’ll need to get at the input’s raw memory using an UnsafeMutableBufferPointer. You specify its element type, Float, and a count of one (as you only have one float), starting from the address of the variable called data. The float’s bytes can then be copied out of this buffer:

    var data: Float = input
      let buffer: UnsafeMutableBufferPointer<Float> = 
          UnsafeMutableBufferPointer(start: &data, count: 1)

With the data in the buffer, you can then copy it to the interpreter’s input at index 0. You only have one input tensor, so you write the buffer’s contents straight to it:

    try interpreter.copy(Data(buffer: buffer), toInputAt: 0)

To execute the inference, you invoke the interpreter:

    try interpreter.invoke()

There’s only one output tensor, so you can read it by taking the output at 0:

    let outputTensor = try interpreter.output(at: 0)

As with the input values, you’re dealing with low-level memory, which is unsafe data. It comes back as an array of Float32 values (it has only one element here, but it still needs to be treated as an array), which can be read like this:

    let results: [Float32] = 
          [Float32](unsafeData: outputTensor.data) ?? []

If you’re not familiar with the ?? syntax, this says to make the results an array of Float32 by copying the output tensor into it, and if that fails, to make it an empty array. For this code to work, you’ll need to implement an Array extension; the full code for that will be shown in a moment.

Once you have the results in an array, the first element will be your result. If this fails, just return nil:

    guard let result = results.first else {
        return nil
      }
      return result
    }

The function began with a do{, so you’ll need to catch any errors, print them, and return nil in that event:

  catch {
      print(error)
      return nil
    }
  }
}
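
For convenience, here’s the complete runModel function assembled from the snippets above:

func runModel(withInput input: Float) -> Float? {
  do {
    try interpreter.allocateTensors()

    var data: Float = input
    let buffer: UnsafeMutableBufferPointer<Float> =
        UnsafeMutableBufferPointer(start: &data, count: 1)
    try interpreter.copy(Data(buffer: buffer), toInputAt: 0)

    try interpreter.invoke()

    let outputTensor = try interpreter.output(at: 0)
    let results: [Float32] =
        [Float32](unsafeData: outputTensor.data) ?? []

    guard let result = results.first else {
      return nil
    }
    return result
  } catch {
    print(error)
    return nil
  }
}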

Finally, still in ModelParser.swift, you can add the Array extension that handles the unsafe data and loads it into an array:

extension Array {
  init?(unsafeData: Data) {
    guard unsafeData.count % MemoryLayout<Element>.stride == 0
        else { return nil }
    #if swift(>=5.0)
    self = unsafeData.withUnsafeBytes {
      .init($0.bindMemory(to: Element.self))
    }
    #else
    self = unsafeData.withUnsafeBytes {
      .init(UnsafeBufferPointer<Element>(
        start: $0,
        count: unsafeData.count / MemoryLayout<Element>.stride
      ))
    }
    #endif  // swift(>=5.0)
  }
}

This is a handy helper that you can use if you want to parse floats directly out of a TensorFlow Lite model.
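As a quick illustration (this snippet isn’t part of the app), handing the helper the four little-endian bytes of the Float32 value 1.0 gives back a one-element array, while a byte count that isn’t a multiple of the element size makes the initializer fail and fall back to an empty array:

let bytes = Data([0x00, 0x00, 0x80, 0x3F])                  // the Float32 value 1.0
let floats = [Float32](unsafeData: bytes) ?? []             // [1.0]
let fallback = [Float32](unsafeData: Data([0x01])) ?? []    // count isn't a multiple of 4, so []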

Now that the class for parsing the model is done, the next step is to add the model to your app.

Step 6. Add the Model to Your App

To add the model to your app, you’ll need a models directory within the app. In Xcode, right-click on the firstlite folder and select New Group (Figure 14-10). Call the new group models.

Figure 14-10. Adding a new group to your app

You can get the model by training the simple Y = 2X – 1 sample from Chapter 12. If you don’t have it already, you can use the Colab in the book’s GitHub repository.

Once you have the converted model file (called model.tflite), you can drag and drop it into Xcode on the models group you just added. Select “Copy items if needed” and ensure you add it to the target firstlite by checking the box beside it (Figure 14-11).

Figure 14-11. Adding the model to your project

The model will now be in your project and available for inference. The final step is to complete the user interface logic—then you’ll be ready to go!

Step 7. Add the UI Logic

Earlier, you created the storyboard containing the UI description and began editing the ViewController.swift file containing the UI logic. As most of the work of inference has now been offloaded to the ModelParser class, the UI logic should be very light.

Start by adding a private variable declaring an instance of the ModelParser class:

private var modelParser: ModelParser? =
    ModelParser(modelFileInfo: ModelFile.modelInfo)

Previously, you created an action on the button called btnGo. This will be called when the user touches the button. Update that to execute a function called doInference when the user takes that action:

@IBAction func btnGo(_ sender: Any) {
  doInference()
}

Next you’ll construct the doInference function:

private func doInference() {

The text field that the user will enter data into is called txtUserData. Read this value, and if it’s empty just set the result to 0.00 and don’t bother with any inference:

guard let text = txtUserData.text, text.count > 0 else {
    txtResult.text = "0.00"
    return
  }

Otherwise, convert it to a float. If this fails, exit the function:

guard let value = Float(text) else {
    return
  }

If the code has reached this point, you can now run the model, passing it that input. The ModelParser will do the rest, returning you either a result or nil. If the return value is nil, then you’ll exit the function:

guard let result = self.modelParser?.runModel(withInput: value) else {
    return
  }

Finally, if you’ve reached this point, you have a result, so you can load it into the label (called txtResult) by formatting the float as a string:

txtResult.text = String(format: "%.2f", result)

That’s it! The complexity of the model loading and inference has been handled by the ModelParser class, keeping your ViewController very light. For convenience, here’s the complete listing:

import UIKit

class ViewController: UIViewController {
  private var modelParser: ModelParser? =
      ModelParser(modelFileInfo: ModelFile.modelInfo)
  @IBOutlet weak var txtUserData: UITextField!
    
  @IBOutlet weak var txtResult: UILabel!
  @IBAction func btnGo(_ sender: Any) {
    doInference()
  }
  override func viewDidLoad() {
    super.viewDidLoad()
    // Do any additional setup after loading the view.
  }
  private func doInference() {
      
    guard let text = txtUserData.text, text.count > 0 else {
      txtResult.text = "0.00"
      return
    }
    guard let value = Float(text) else {
      return
    }
    guard let result = self.modelParser?.runModel(withInput: value) else {
      return
    }
    txtResult.text = String(format: "%.2f", result)
  }

}

You’ve now done everything you need to get the app working. Run it, and you should see it in the simulator. Type a number in the text field, press the button, and you should see a result in the results field, as shown in Figure 14-12.

Figure 14-12. Running the app in the iPhone Simulator

While this was a long journey for a very simple app, it should provide a good template to help you understand how TensorFlow Lite works. In this walkthrough you saw how to:

  • Use pods to add the TensorFlow Lite dependencies.

  • Add a TensorFlow Lite model to your app.

  • Load the model into an interpreter.

  • Access the input tensors, and write directly to their memory.

  • Read the memory from the output tensors and copy that to high-level data structures like float arrays.

  • Wire it all up to a user interface with a storyboard and view controller.

In the next section, you’ll move beyond this simple scenario and look at handling more complex data.

Moving Beyond “Hello World”—Processing Images

In the previous example you saw how to create a full app that uses TensorFlow Lite to do very simple inference. However, despite the simplicity of the app, the process of getting data into the model and parsing data out of the model can be a little unintuitive because you’re handling low-level bits and bytes. As you get into more complex scenarios, such as managing images, the good news is that the process isn’t that much more complicated.

Consider the Dogs vs. Cats model you created in Chapter 12. In this section you’ll see how to create an iOS app in Swift with a trained model that, given an image of a cat or a dog, will be able to infer what is in the picture. The full app code is available in the GitHub repo for this book.

First, recall that the tensor for an image has three dimensions: width, height, and color depth. So, for example, when using the MobileNet architecture that the Dogs vs. Cats mobile sample is based on, the dimensions are 224 × 224 × 3: each image is 224 × 224 pixels, with 3 values per pixel for the red, green, and blue channels. For this model, each channel value is normalized to a number between 0 and 1 indicating the intensity of that pixel on that channel.

In iOS, images are typically represented as instances of the UIImage class. UIKit doesn’t expose an image’s raw pixels directly, so the sample code for this chapter provides a pixelBuffer helper on UIImage (it isn’t a built-in property) that returns a buffer of all the pixels in the image.
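
The sample’s implementation is the one to use, but if you’re curious what such a helper involves, here’s a rough sketch that renders a UIImage into a newly created 32BGRA CVPixelBuffer. The pixelBufferSketch name, the choice of BGRA, and the omission of the usual CGImage-compatibility attributes are simplifications for illustration, not the sample’s exact code:

import UIKit
import CoreVideo

extension UIImage {
  func pixelBufferSketch() -> CVPixelBuffer? {
    guard let cgImage = self.cgImage else { return nil }
    let width = cgImage.width
    let height = cgImage.height

    // Create an empty 32-bit BGRA pixel buffer the same size as the image.
    var pixelBuffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                                     kCVPixelFormatType_32BGRA, nil, &pixelBuffer)
    guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return nil }

    // Draw the image directly into the buffer's memory via a CGContext.
    CVPixelBufferLockBaseAddress(buffer, [])
    defer { CVPixelBufferUnlockBaseAddress(buffer, []) }
    guard let context = CGContext(
      data: CVPixelBufferGetBaseAddress(buffer),
      width: width,
      height: height,
      bitsPerComponent: 8,
      bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
      space: CGColorSpaceCreateDeviceRGB(),
      bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue |
                  CGBitmapInfo.byteOrder32Little.rawValue
    ) else { return nil }

    context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
    return buffer
  }
}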

Within the Core Video framework, there’s a CVPixelBufferGetPixelFormatType API that will return the type of the pixel buffer:

let sourcePixelFormat = CVPixelBufferGetPixelFormatType(pixelBuffer)

This will typically be a 32-bit image with channels for alpha (aka transparency), red, green, and blue. However, there are multiple variants, generally with these channels in different orders. You’ll want to ensure that it’s one of these formats, as the rest of the code won’t work if the image is stored in a different format:

assert(sourcePixelFormat == kCVPixelFormatType_32ARGB ||
  sourcePixelFormat == kCVPixelFormatType_32BGRA ||
  sourcePixelFormat == kCVPixelFormatType_32RGBA)

As the desired format is 224 × 224, which is square, the best thing to do next is crop the image to the largest square at its center and scale it down to 224 × 224, using the centerThumbnail(ofSize:) helper (another extension provided with the sample code):

let scaledSize = CGSize(width: 224, height: 224)
guard let thumbnailPixelBuffer = 
    pixelBuffer.centerThumbnail(ofSize: scaledSize) 
else {
  return nil
}

Now that you have the image resized to 224 × 224, the next step is to remove the alpha channel and extract the raw RGB data. Remember that the model was trained on 224 × 224 × 3 inputs, where the 3 is the RGB channels, so there is no alpha. The following helper function does this for you by finding the alpha bytes in the pixel buffer and slicing them out:

private func rgbDataFromBuffer(_ buffer: CVPixelBuffer,
                                byteCount: Int) -> Data? {

  CVPixelBufferLockBaseAddress(buffer, .readOnly)
  defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }
  guard let mutableRawPointer = 
      CVPixelBufferGetBaseAddress(buffer) 
  else {
    return nil
  }
    
  let count = CVPixelBufferGetDataSize(buffer)
  let bufferData = Data(bytesNoCopy: mutableRawPointer,
                          count: count, deallocator: .none)

  // alphaComponent describes where the alpha byte falls within each 4-byte
  // pixel. (baseOffset: 4, moduloRemainder: 3) assumes a BGRA- or RGBA-style
  // layout with alpha as the fourth byte; the full sample derives these
  // values from the buffer's pixel format.
  let alphaComponent = (baseOffset: 4, moduloRemainder: 3)

  var rgbBytes = [Float](repeating: 0, count: byteCount)
  var index = 0

  for component in bufferData.enumerated() {
    let offset = component.offset
    let isAlphaComponent = (offset % alphaComponent.baseOffset) ==
        alphaComponent.moduloRemainder

    guard !isAlphaComponent else { continue }

    rgbBytes[index] = Float(component.element) / 255.0
    index += 1
  }
    
  return rgbBytes.withUnsafeBufferPointer(Data.init)
  
}

That final return uses the same pattern as this extension on Data, which copies the raw bytes of an array into a Data instance:

extension Data {
  init<T>(copyingBufferOf array: [T]) {
    self = array.withUnsafeBufferPointer(Data.init)
  }
}
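
With this extension in place, the final line of rgbDataFromBuffer could equivalently be written as return Data(copyingBufferOf: rgbBytes), which makes the intent a little more obvious than the withUnsafeBufferPointer form.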

Now you can pass the thumbnail pixel buffer you just created to rgbDataFromBuffer:

guard let rgbData = rgbDataFromBuffer(
    thumbnailPixelBuffer,
    byteCount: 224 * 224 * 3
    ) 
else {
  print("Failed to convert the image buffer to RGB data.")
  return nil
}

At this point you have the raw RGB data that is in the format the model expects, and you can copy it directly to the input tensor:

try interpreter.allocateTensors()
try interpreter.copy(rgbData, toInputAt: 0)

You can then invoke the interpreter and read the output tensor:

try interpreter.invoke()
outputTensor = try interpreter.output(at: 0)

In the case of Dogs vs. Cats, the output is a float array with two values: the first is the probability that the image is a cat, and the second the probability that it’s a dog. This is the same results-parsing code you saw earlier, and it uses the same Array extension from the previous example:

let results = [Float32](unsafeData: outputTensor.data) ?? []
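
If you want to turn those two values into something displayable, a small helper like the following sketch works; it assumes, as described above, that index 0 is the cat probability and index 1 is the dog probability:

func labelFor(results: [Float32]) -> String {
  // Expect exactly two scores: [cat, dog].
  guard results.count == 2 else { return "unknown" }
  let catScore = results[0]
  let dogScore = results[1]
  return catScore > dogScore ? "cat (\(catScore))" : "dog (\(dogScore))"
}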

As you can see, although this is a more complex example, the same design pattern holds. You must understand your model’s architecture and its raw input and output formats. You must then structure your input data the way the model expects, which often means getting down to raw bytes, either writing them directly into a buffer or at least building them up in an array. You then have to read the raw stream of bytes coming out of the model and create a data structure to hold them. From the output perspective this will almost always be something like what you’ve seen in this chapter: an array of floats. With the helper code you’ve implemented, you’re most of the way there!

TensorFlow Lite Sample Apps

The TensorFlow team has built a large set of sample apps and is constantly adding to it. Armed with what you’ve learned in this chapter, you’ll be able to explore these and understand their input/output logic. At the time of writing, for iOS there are sample apps for:

Image classification
Read the device’s camera and classify up to a thousand different items.
Object detection
Read the device’s camera and give bounding boxes to objects that are detected.
Pose estimation
Take a look at the figures in the camera and infer their poses.
Speech recognition
Recognize common verbal commands.
Gesture recognition
Train a model for hand gestures and recognize them in the camera.
Image segmentation
Similar to object detection, but predict which class each pixel in an image belongs to.
Digit classifier
Recognize handwritten digits.

Summary

In this chapter you learned how to incorporate TensorFlow Lite into iOS apps by taking a comprehensive walkthrough of building a simple app that used the interpreter to invoke a model to perform inference. In particular, you saw how when dealing with models you have to get low-level with the data, ensuring that your input matches what the model expects. You also saw how to parse the raw data that comes out of the model. This is just the beginning of a long and fun journey toward putting machine learning into the hands of iOS users. In the next chapter we’ll move away from native mobile development to look at how TensorFlow.js can be used to train and run inference on models in the browser.