{"id":66642,"date":"2024-09-26T15:55:19","date_gmt":"2024-09-26T10:25:19","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=66642"},"modified":"2024-09-26T16:23:29","modified_gmt":"2024-09-26T10:53:29","slug":"building-a-text-recognition-app-using-camerax-and-ml-kit-in-jetpack-compose","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/building-a-text-recognition-app-using-camerax-and-ml-kit-in-jetpack-compose\/","title":{"rendered":"Building a Text Recognition App Using CameraX and ML Kit in Android"},"content":{"rendered":"<p>With the increasing demand for intelligent apps that can process and understand visual data, <a href=\"https:\/\/www.tothenew.com\/blog\/text-extraction-from-pdf-using-ocr-optical-character-recognition-in-python\/\">text recognition<\/a> is becoming a key feature in many applications. This blog will walk you through building a powerful text recognition app using Google\u2019s MLKit, CameraX APIs, and Jetpack Compose. MLKit offers a robust <a href=\"https:\/\/www.tothenew.com\/data-analytics\/data-science\">Machine Learning solution<\/a> for on-device text recognition, while CameraX provides an easy way to integrate camera functionality. Combining these with Jetpack Compose\u2019s modern <a href=\"https:\/\/www.tothenew.com\/cx\/experience-design\">UI<\/a> toolkit, we&#8217;ll create a seamless and responsive app. Before we start diving into the implementation, let&#8217;s first understand the Key components used for the implementation.<\/p>\n<h3>Why Use ML Kit for Text Recognition?<\/h3>\n<p><strong>ML Kit<\/strong> is a machine learning framework provided by Google, designed to bring powerful machine learning capabilities to mobile apps without needing in-depth knowledge of ML algorithms. One of its key features is text recognition, which allows developers to extract text from images with high accuracy. 
It&#8217;s a cloud-independent solution, meaning it works even offline, making it highly suitable for mobile apps that need robust and quick text recognition.<\/p>\n<h4>Read More: <a href=\"https:\/\/www.tothenew.com\/blog\/react-speech-recognition-hook\/\">Web Speech API<\/a><\/h4>\n<h3>Using CameraX for Capturing Images<\/h3>\n<p><strong>CameraX<\/strong> is an Android Jetpack library that simplifies camera implementation for developers. It supports various use cases such as preview, image capture, and video recording. In our app, we use CameraX for <strong>image capture<\/strong>, but it could also be adapted for <strong>continuous recognition<\/strong>.<\/p>\n<h4>Single Image Capture<\/h4>\n<p>CameraX can be used to capture a single image for processing. This is the approach used in our app, where the user manually captures an image by pressing a button. This method is better suited when you&#8217;re capturing static documents or screenshots for text recognition. Alternatively, if you don&#8217;t need single image capture, you can consider continuous recognition.<\/p>\n<h4>Continuous Text Recognition with CameraX<\/h4>\n<p>For continuous recognition, CameraX&#8217;s ImageAnalysis use case can be used. Instead of capturing a single image and processing it, ImageAnalysis continuously analyzes frames from the camera and sends them to ML Kit for text recognition. This approach is useful when you want to scan text continuously, as in barcode or document-scanning apps.<\/p>\n<p><strong>Now, let&#8217;s begin with the project<\/strong> and see how to achieve text recognition, starting with the project setup.<\/p>\n<h3>Project Setup<\/h3>\n<p>Before we begin, ensure you&#8217;ve added the necessary dependencies in your Gradle build files:<\/p>\n<pre># Version entries in libs.versions.toml\r\n\r\n#camera\r\ncameraX = \"1.3.4\"\r\n\r\n#MLKit\r\nplayServicesMlkitTextRecognitionCommon = \"19.1.0\"\r\ntextRecognition = \"16.0.1\"\r\n\r\n\/\/ Dependencies in build.gradle (app module)\r\n\r\n\/\/ Dependencies for CameraX\r\nimplementation(libs.camera2)\r\nimplementation(libs.cameraView)\r\nimplementation(libs.cameraLifecycle)\r\n\r\n\/\/ Dependencies for Google ML Kit\r\nimplementation(libs.play.services.mlkit.text.recognition.common)\r\nimplementation(libs.text.recognition)<\/pre>\n<h3>Handling Camera Permissions<\/h3>\n<p>First, we add the camera permission and feature declaration to the AndroidManifest file:<\/p>\n<pre>&lt;!-- Permission for using camera --&gt;\r\n&lt;uses-feature android:name=\"android.hardware.camera.any\" \/&gt;\r\n&lt;uses-permission android:name=\"android.permission.CAMERA\" \/&gt;<\/pre>\n<p>Next, we need to request camera permission from the user at runtime. 
Here&#8217;s how we handle permissions in the CameraPermissionHandler composable:<\/p>\n<pre>\/\/ Request camera permission from the user.\r\n\r\n@Composable\r\nfun CameraPermissionHandler(onPermissionGranted: () -&gt; Unit) {\r\n  val cameraPermission = Manifest.permission.CAMERA\r\n  val context = LocalContext.current\r\n\r\n  val permissionLauncher = rememberLauncherForActivityResult(\r\n    contract = ActivityResultContracts.RequestPermission(),\r\n    onResult = { isGranted -&gt;\r\n      if (isGranted) {\r\n        onPermissionGranted()\r\n      } else {\r\n        Toast.makeText(context, \"Camera permission denied\", Toast.LENGTH_SHORT).show()\r\n      }\r\n    }\r\n  )\r\n\r\n  \/\/ Show the permission dialog if permission has not been granted yet.\r\n  LaunchedEffect(key1 = true) {\r\n    when {\r\n      ContextCompat.checkSelfPermission(\r\n        context,\r\n        cameraPermission\r\n      ) == PackageManager.PERMISSION_GRANTED -&gt; {\r\n        onPermissionGranted()\r\n      }\r\n\r\n      else -&gt; {\r\n        permissionLauncher.launch(cameraPermission)\r\n      }\r\n    }\r\n  }\r\n}\r\n\r\n\/\/ Once the user grants permission, we start the camera.\r\n\r\n@Composable\r\nfun CameraPermissionScreen() {\r\n  var permissionGranted by remember { mutableStateOf(false) }\r\n\r\n  \/\/ Handle the permission request\r\n  CameraPermissionHandler(\r\n    onPermissionGranted = {\r\n      permissionGranted = true\r\n    }\r\n  )\r\n\r\n  \/\/ Show the TextRecognitionScreen only if permission is granted\r\n  if (permissionGranted) {\r\n    TextRecognitionScreen()\r\n  }\r\n}<\/pre>\n<ul>\n<li><strong>CameraPermissionHandler:<\/strong> This composable is responsible for requesting camera permission from the user.<\/li>\n<li><strong>State Handling:<\/strong> Compose&#8217;s remember and mutableStateOf are used to track whether the permission has been granted.<\/li>\n<\/ul>\n<h3>Capturing the Image<\/h3>\n<p>Once the permission is granted, 
we can proceed to display the camera preview and capture images. This is handled by the CameraPreview composable:<\/p>\n<pre>@Composable\r\nfun CameraPreview(modifier: Modifier, onCapture: (ImageProxy) -&gt; Unit) {\r\n  val context = LocalContext.current\r\n  val lifecycleOwner = LocalLifecycleOwner.current\r\n  val previewView = remember { PreviewView(context) }\r\n\r\n  var imageCapture: ImageCapture? by remember { mutableStateOf(null) }\r\n\r\n\r\n  Box(modifier = Modifier.padding(bottom = 50.dp)) {\r\n    AndroidView({ previewView }, modifier = modifier) { view -&gt;\r\n      val cameraProviderFuture = ProcessCameraProvider.getInstance(context)\r\n      cameraProviderFuture.addListener({\r\n        val cameraProvider = cameraProviderFuture.get()\r\n        val preview = androidx.camera.core.Preview.Builder().build()\r\n        val cameraSelector = CameraSelector.DEFAULT_BACK_CAMERA\r\n\r\n        imageCapture = ImageCapture.Builder().build()\r\n\r\n        preview.setSurfaceProvider(view.surfaceProvider)\r\n\r\n        try {\r\n      \/\/ We are here binding the cameraSelector, preview and image capture with lifecycle. 
This ensures the camera behaves properly across activity lifecycle events.\r\n          cameraProvider.unbindAll()\r\n          cameraProvider.bindToLifecycle(\r\n            lifecycleOwner,\r\n            cameraSelector,\r\n            preview,\r\n            imageCapture\r\n          )\r\n        } catch (e: Exception) {\r\n          Log.e(\"CameraPreview\", \"Use case binding failed\", e)\r\n        }\r\n      }, ContextCompat.getMainExecutor(context))\r\n    }\r\n\r\n    \/\/ This button is used to capture the image.\r\n    FloatingActionButton(\r\n      onClick = {\r\n        imageCapture?.takePicture(\r\n          ContextCompat.getMainExecutor(context),\r\n          object : ImageCapture.OnImageCapturedCallback() {\r\n            override fun onCaptureSuccess(imageProxy: ImageProxy) {\r\n              onCapture(imageProxy)\r\n              imageProxy.close()\r\n            }\r\n\r\n            override fun onError(exception: ImageCaptureException) {\r\n              Log.e(\"CameraCapture\", \"Capture failed: ${exception.message}\")\r\n            }\r\n          }\r\n        )\r\n      },\r\n      modifier = Modifier\r\n        .padding(32.dp)\r\n        .align(Alignment.BottomCenter)\r\n    ) {\r\n      Text(\"Capture Image\")\r\n    }\r\n  }\r\n}<\/pre>\n<p>We create a camera preview and a button to capture images. 
This composable wires the camera functionality into the UI presented to the user.<\/p>\n<ul>\n<li><strong>Camera Preview:<\/strong> Displays the camera feed using PreviewView from CameraX, embedded in a Compose UI via AndroidView.<\/li>\n<li><strong>Capture Button:<\/strong> A floating action button captures an image when clicked.<\/li>\n<li><strong>Image Capture:<\/strong> CameraX captures the image and passes it to the callback (onCapture), where further processing can occur (e.g., text recognition).<\/li>\n<li><strong>Lifecycle Management:<\/strong> Camera use cases are bound to the lifecycle of the composable, ensuring the camera behaves properly during activity lifecycle events (e.g., backgrounding or closing the app).<\/li>\n<\/ul>\n<div id=\"attachment_66735\" style=\"width: 471px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-66735\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-66735 size-large\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot_20240919_190529-461x1024.png\" alt=\"Camera Preview With Button\" width=\"461\" height=\"1024\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot_20240919_190529-461x1024.png 461w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot_20240919_190529-135x300.png 135w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot_20240919_190529-691x1536.png 691w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot_20240919_190529-624x1387.png 624w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot_20240919_190529.png 720w\" sizes=\"(max-width: 461px) 100vw, 461px\" \/><p id=\"caption-attachment-66735\" class=\"wp-caption-text\">Camera Preview With Button<\/p><\/div>\n<h3>Processing the Image for Text Recognition<\/h3>\n<p>Once an image is captured, it\u2019s passed to the ML Kit text recognizer in the TextRecognitionViewModel. 
This is where the core functionality of the app lies.<\/p>\n<pre>class TextRecognitionViewModel : ViewModel() {\r\n  private val _recognizedText = mutableStateOf&lt;String?&gt;(null)\r\n  val recognizedText: State&lt;String?&gt; = _recognizedText\r\n\r\n  fun recognizeText(bitmap: Bitmap) {\r\n    val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)\r\n    val image = InputImage.fromBitmap(bitmap, 0)\r\n\r\n    recognizer.process(image)\r\n      .addOnSuccessListener { visionText -&gt;\r\n        _recognizedText.value = visionText.text\r\n      }\r\n      .addOnFailureListener { e -&gt;\r\n        _recognizedText.value = \"Error: ${e.message}\"\r\n      }\r\n  }\r\n}<\/pre>\n<p>The recognizeText function takes a Bitmap as input and uses the TextRecognition.getClient() method to recognize text from the image. The recognized text is then stored in the _recognizedText state.<\/p>\n<h3>Displaying the Recognized Text<\/h3>\n<p>The recognized text is displayed in a bottom sheet. 
The user can copy the recognized text by tapping on it.<\/p>\n<p>Now we&#8217;ll implement TextRecognitionScreen, which uses the camera preview.<\/p>\n<pre>@OptIn(ExperimentalMaterial3Api::class)\r\n@Composable\r\nfun TextRecognitionScreen(viewModel: TextRecognitionViewModel = viewModel()) {\r\n  val recognizedText by viewModel.recognizedText\r\n  val context = LocalContext.current\r\n\r\n  \/\/ Bottom sheet state\r\n  val sheetState = rememberBottomSheetScaffoldState()\r\n  val coroutineScope = rememberCoroutineScope()\r\n  val clipboard: ClipboardManager = LocalClipboardManager.current\r\n\r\n  \/\/ BottomSheetScaffold displays the recognized text in a bottom sheet.\r\n  BottomSheetScaffold(\r\n    sheetContent = {\r\n      recognizedText?.let {\r\n        \/\/ Content of the bottom sheet\r\n        LazyColumn(\r\n          modifier = Modifier\r\n            .fillMaxWidth()\r\n            .padding(16.dp)\r\n            .padding(bottom = 60.dp)\r\n            .heightIn(max = 500.dp) \/\/ Limit the height of the bottom sheet\r\n        ) {\r\n          item {\r\n            \/\/ If the extracted text is not empty, let the user copy it with a tap.\r\n            if (it.isNotEmpty()) {\r\n              Text(\r\n                text = it,\r\n                modifier = Modifier\r\n                  .fillMaxWidth()\r\n                  .padding(16.dp)\r\n                  .clickable {\r\n                    clipboard.setText(AnnotatedString(it))\r\n                    Toast.makeText(context, \"Text Copied!\", Toast.LENGTH_SHORT).show()\r\n                  }\r\n              )\r\n            } else {\r\n              Text(\r\n                text = \"No text recognized yet\",\r\n                modifier = Modifier\r\n                  .fillMaxWidth()\r\n                  .padding(16.dp)\r\n                  .padding(bottom = 100.dp)\r\n              )\r\n            }\r\n          }\r\n        }\r\n      } ?: Text(\r\n        text = \"No text recognized 
yet\",\r\n        modifier = Modifier\r\n          .fillMaxWidth()\r\n          .padding(16.dp)\r\n          .padding(bottom = 100.dp)\r\n       )\r\n  },\r\n  scaffoldState = sheetState,\r\n  sheetPeekHeight = 0.dp,\r\n  modifier = Modifier.fillMaxSize()\r\n  ) {\r\n    Box(\r\n      modifier = Modifier.fillMaxSize()\r\n    ) {\r\n      CameraPreview(modifier = Modifier.fillMaxSize()) { imageProxy -&gt;\r\n      val bitmap = imageProxy.toBitmapImage()\r\n      if (bitmap != null) {\r\n        viewModel.recognizeText(bitmap)\r\n\r\n        coroutineScope.launch {\r\n          sheetState.bottomSheetState.expand()\r\n        }\r\n      }\r\n     }\r\n    }\r\n  }\r\n}<\/pre>\n<p>The camera preview is displayed using PreviewView, and a floating action button (FAB) is provided to capture the image. The captured image is passed as an <strong>ImageProxy<\/strong> object to the <strong>onCapture<\/strong> callback.<\/p>\n<h3>Converting ImageProxy to Bitmap<\/h3>\n<p>The ImageProxy object needs to be converted to a Bitmap before being passed to MLKit. Here&#8217;s how it&#8217;s done:<\/p>\n<pre>private fun ImageProxy.toBitmapImage(): Bitmap? 
{\r\n  \/\/ ImageCapture produces a single-plane JPEG image, so plane 0 holds the encoded bytes.\r\n  val buffer: ByteBuffer = planes[0].buffer\r\n  val bytes = ByteArray(buffer.remaining())\r\n  buffer.get(bytes)\r\n  return BitmapFactory.decodeByteArray(bytes, 0, bytes.size)\r\n}<\/pre>\n<p>Finally, add the entry point wherever you want to use it in your project:<\/p>\n<pre>\/\/ In this blog, this is called from onCreate.\r\nsetContent {\r\n  TextVisionTheme {\r\n    Surface(\r\n      modifier = Modifier.fillMaxSize(),\r\n      color = MaterialTheme.colorScheme.background\r\n    ) {\r\n      CameraPermissionScreen()\r\n    }\r\n  }\r\n}<\/pre>\n<div id=\"attachment_66733\" style=\"width: 471px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-66733\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-66733 size-large\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot_20240919_190922-461x1024.png\" alt=\"Extracted Text\" width=\"461\" height=\"1024\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot_20240919_190922-461x1024.png 461w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot_20240919_190922-135x300.png 135w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot_20240919_190922-691x1536.png 691w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot_20240919_190922-624x1387.png 624w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot_20240919_190922.png 720w\" sizes=\"(max-width: 461px) 100vw, 461px\" \/><p id=\"caption-attachment-66733\" class=\"wp-caption-text\">Screen After Getting Text - Final Output.<\/p><\/div>\n<h3>Conclusion<\/h3>\n<p>This app demonstrates how to integrate CameraX for capturing images and ML Kit for recognizing text from images in real time. The use of Jetpack Compose makes UI development modern and efficient. With these tools, building a powerful text recognition app is straightforward and seamless. 
TO THE NEW&#8217;s\u00a0 <a href=\"https:\/\/www.tothenew.com\/data-analytics\/data-science\">Advanced Analytics<\/a> offering enables your business to mitigate risk by letting you make decisions instantly to help your business grow.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>With the increasing demand for intelligent apps that can process and understand visual data, text recognition is becoming a key feature in many applications. This blog will walk you through building a powerful text recognition app using Google\u2019s MLKit, CameraX APIs, and Jetpack Compose. MLKit offers a robust Machine Learning solution for on-device text recognition, [&hellip;]<\/p>\n","protected":false},"author":1968,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":408},"categories":[518],"tags":[4845,6585,6584,6586],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/66642"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1968"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=66642"}],"version-history":[{"count":14,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/66642\/revisions"}],"predecessor-version":[{"id":67905,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/66642\/revisions\/67905"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=66642"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=66642"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=66642"}]
,"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}