How to split a string in JVM languages like Java, Scala & Kotlin

Published On: 2019/06/04

In this post, we look at several ways of splitting a string on a specified delimiter. This is mostly useful when processing CSV or TSV files, which is a common scenario in file processing.

When we talk about JVM languages, the three most popular ones that come to mind are Java, Scala and Kotlin. This article goes through sample code for each of them, with a short explanation.

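Let us split the string “HP|DL360p|Xenon|E5-2603|1.80 GHz|16 GB DDR3”. The split function in Java and Scala interprets its argument as a regular expression, so the pipe character we use as the delimiter has to be escaped. This constraint applies only to characters that have a special meaning in regular expressions, such as “|” and “.”. Kotlin's split function, when given a plain string, takes the delimiter literally and needs no escaping.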
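
The escaping requirement comes from the regular-expression argument of String.split. As a minimal sketch, the class below uses the standard Pattern.quote method to escape an arbitrary delimiter instead of writing the backslashes by hand. The class name DelimiterEscapeDemo and the helper splitLiteral are illustrative names, not part of the sample that follows.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DelimiterEscapeDemo {

    // Hypothetical helper: splits on the delimiter taken literally,
    // whatever regex metacharacters it contains.
    public static String[] splitLiteral(String value, String delimiter) {
        return value.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String toSplit = "HP|DL360p|Xenon";

        // An unescaped "|" is a regex alternation of two empty patterns.
        // It matches the empty string, so the input is cut between characters
        // rather than at the pipes.
        System.out.println(Arrays.toString(toSplit.split("|")));

        // Quoted delimiter: the pipe is matched literally.
        System.out.println(Arrays.toString(splitLiteral(toSplit, "|")));

        // The same helper works for "." without any manual escaping.
        System.out.println(Arrays.toString(splitLiteral("1.80", ".")));
    }
}
```

Quoting the delimiter is handy when it comes from configuration or user input and you cannot know in advance whether it contains a special character.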

Java

In this sample I have implemented the split with the split function of the String class and with a StringTokenizer. The tokenizer skips the empty values between consecutive delimiters, whereas the split function preserves them as empty strings.

package com.asyncstream.examples;

import java.util.StringTokenizer;

public class StringSplitMain {

    public static void main(String[] args) {
        String toSplit1 = "HP|DL360p|Xenon|E5-2603|1.80 GHz|16 GB DDR3";
        String toSplit2 = "HP|DL360p|Xenon|E5-2603||16 GB DDR3";
        StringSplitMain main = new StringSplitMain();
        System.out.println("** Split using string.split function **");
        System.out.println("== Scenario 1 ==");
        main.splitWithEscapedCharacter(toSplit1,"\\|");
        System.out.println("== Scenario 2 ==");
        main.splitWithEscapedCharacter(toSplit2,"\\|");
        System.out.println("** Split using a tokenizer **");
        System.out.println("== Scenario 1 ==");
        main.splitWithTokenizer(toSplit1,"|");
        System.out.println("== Scenario 2 ==");
        main.splitWithTokenizer(toSplit2,"|");
    }

    // split() treats its argument as a regular expression, hence the escaped pipe
    private void splitWithEscapedCharacter(String value,String regex){
        String[] wordArray = value.split(regex);
        for (String word:wordArray) {
            System.out.println(word);
        }
    }

    private void splitWithTokenizer(String value,String delimiter){
        // StringTokenizer takes the delimiter literally and skips empty tokens
        StringTokenizer tokenizer = new StringTokenizer(value,delimiter);
        while (tokenizer.hasMoreTokens()){
            System.out.println(tokenizer.nextToken());
        }

    }
}

Output is:

** Split using string.split function **
== Scenario 1 ==
HP
DL360p
Xenon
E5-2603
1.80 GHz
16 GB DDR3
== Scenario 2 ==
HP
DL360p
Xenon
E5-2603

16 GB DDR3
** Split using a tokenizer **
== Scenario 1 ==
HP
DL360p
Xenon
E5-2603
1.80 GHz
16 GB DDR3
== Scenario 2 ==
HP
DL360p
Xenon
E5-2603
16 GB DDR3
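
One detail the output above does not show: String.split keeps empty strings in the middle of the input but, by default, drops trailing ones. The following minimal sketch assumes a record whose last two fields are empty (the sample string and the class name TrailingEmptyDemo are made up for illustration) and shows how a negative limit keeps them.

```java
public class TrailingEmptyDemo {

    // Default behaviour: trailing empty strings are removed from the result.
    public static String[] splitDefault(String value) {
        return value.split("\\|");
    }

    // A negative limit tells split to keep trailing empty strings.
    public static String[] splitKeepTrailing(String value) {
        return value.split("\\|", -1);
    }

    public static void main(String[] args) {
        String record = "HP|DL360p|Xenon||";
        System.out.println(splitDefault(record).length);      // 3
        System.out.println(splitKeepTrailing(record).length); // 5
    }
}
```

This matters when every line of a CSV or TSV file is expected to yield the same number of columns.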

Scala

Scala strings are java.lang.String instances, so split with a string argument behaves as in Java and takes a regular expression. The comments in the worksheet below show the output of each expression.

object Split {
  val toSplit1 = "HP|DL360p|Xenon|E5-2603|1.80 GHz|16 GB DDR3";
                                                  //> toSplit1  : String = HP|DL360p|Xenon|E5-2603|1.80 GHz|16 GB DDR3
  val toSplit2 = "HP|DL360p|Xenon|E5-2603||16 GB DDR3";
                                                  //> toSplit2  : String = HP|DL360p|Xenon|E5-2603||16 GB DDR3
  toSplit1.split("\\|").foreach(e=>println(e));   //> HP
                                                  //| DL360p
                                                  //| Xenon
                                                  //| E5-2603
                                                  //| 1.80 GHz
                                                  //| 16 GB DDR3

  toSplit2.split("\\|").foreach(e=>println(e))
                                                  //> HP
                                                  //| DL360p
                                                  //| Xenon
                                                  //| E5-2603
                                                  //| 
                                                  //| 16 GB DDR3
}

Kotlin

fun main(args: Array<String>) {
    val toSplit1 = "HP|DL360p|Xenon|E5-2603|1.80 GHz|16 GB DDR3"
    val toSplit2 = "HP|DL360p|Xenon|E5-2603||16 GB DDR3"
    // Kotlin's split(String) treats the delimiter literally, so "|" needs no escaping
    toSplit1.split("|").forEach { e -> println(e) }
    println("== Scenario two with empty values ==")
    toSplit2.split("|").forEach { e -> println(e) }
}

Conclusion

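As we have seen above, the split function is very similar in these three JVM languages. The main difference is that Java and Scala treat the delimiter as a regular expression, while Kotlin's split takes a plain string delimiter literally.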
