How to split a string in JVM languages like Java, Scala & Kotlin
Published On: 2019/06/04
In this post, we look at several ways of splitting a given string on a specified delimiter. This is mostly useful when processing CSV or TSV files, which is a common scenario in file processing.
When we talk about JVM languages, the three most popular ones that come to mind are Java, Scala and Kotlin. This article walks through sample code for each of them, together with an explanation.
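Let us split the string “HP|DL360p|Xenon|E5-2603|1.80 GHz|16 GB DDR3”. In Java and Scala, String.split treats its argument as a regular expression, so the pipe character used as the delimiter has to be escaped. This constraint applies only to characters that have a special meaning in a regular expression, such as “|” and “.”. Kotlin's split takes a plain string delimiter, so no escaping is needed there.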
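The escaping rule above can be illustrated with a small Java sketch. The class name PipeEscapeSketch and the helper splitLiteral are made up for this illustration; Pattern.quote is the standard library call that turns any delimiter into a literal pattern, which avoids escaping by hand.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to illustrate the escaping rule
class PipeEscapeSketch {

    // Splits on the delimiter taken literally, whatever regex meaning it may have
    static String[] splitLiteral(String value, String delimiter) {
        return value.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String line = "HP|DL360p|Xenon";

        // Unescaped: "|" is a regex alternation of two empty patterns, so it
        // matches between characters and the string falls apart into single characters
        System.out.println(Arrays.toString(line.split("|")));

        // Escaped by hand: the pipe is matched literally
        System.out.println(Arrays.toString(line.split("\\|")));

        // Quoted by the library: same result, no manual escaping
        System.out.println(Arrays.toString(splitLiteral(line, "|")));
    }
}
```

The same helper would also work for a “.” delimiter, since Pattern.quote neutralises every special character.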
Java
In this sample I have implemented the split in two ways: with the split function of the String class and with a StringTokenizer. The tokenizer skips the empty values between consecutive delimiters, whereas the split function preserves them as empty strings.
package com.asyncstream.examples;

import java.util.StringTokenizer;

public class StringSplitMain {

    public static void main(String[] args) {
        String toSplit1 = "HP|DL360p|Xenon|E5-2603|1.80 GHz|16 GB DDR3";
        String toSplit2 = "HP|DL360p|Xenon|E5-2603||16 GB DDR3";
        StringSplitMain main = new StringSplitMain();
        System.out.println("** Split using string.split function **");
        System.out.println("== Scenario 1 ==");
        main.splitWithEscapedCharacter(toSplit1, "\\|");
        System.out.println("== Scenario 2 ==");
        main.splitWithEscapedCharacter(toSplit2, "\\|");
        System.out.println("** Split using a tokenizer **");
        System.out.println("== Scenario 1 ==");
        main.splitWithTokenizer(toSplit1, "|");
        System.out.println("== Scenario 2 ==");
        main.splitWithTokenizer(toSplit2, "|");
    }

    // String.split expects a regular expression, so the caller passes the escaped pipe
    private void splitWithEscapedCharacter(String value, String regex) {
        String[] wordArray = value.split(regex);
        for (String word : wordArray) {
            System.out.println(word);
        }
    }

    // StringTokenizer takes the delimiter literally and skips empty tokens
    private void splitWithTokenizer(String value, String delimiter) {
        StringTokenizer tokenizer = new StringTokenizer(value, delimiter);
        while (tokenizer.hasMoreTokens()) {
            System.out.println(tokenizer.nextToken());
        }
    }
}
Output is:
** Split using string.split function **
== Scenario 1 ==
HP
DL360p
Xenon
E5-2603
1.80 GHz
16 GB DDR3
== Scenario 2 ==
HP
DL360p
Xenon
E5-2603

16 GB DDR3
** Split using a tokenizer **
== Scenario 1 ==
HP
DL360p
Xenon
E5-2603
1.80 GHz
16 GB DDR3
== Scenario 2 ==
HP
DL360p
Xenon
E5-2603
16 GB DDR3
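As described above, split keeps an empty field in the middle of the string while the tokenizer skips it. One related detail is worth a quick sketch: by default String.split drops empty strings at the end of the result, and a negative limit argument keeps them. The class EmptyFieldSketch, its helper methods, and the sample row with two trailing pipes are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class comparing how empty fields are handled
class EmptyFieldSketch {

    // A negative limit tells split to keep trailing empty strings
    static String[] splitKeepAll(String value) {
        return value.split("\\|", -1);
    }

    // Collects the tokens produced by a StringTokenizer; empty fields never appear
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(value, "|");
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample row (assumed for this sketch): one empty field in the middle, two at the end
        String row = "HP|DL360p||16 GB DDR3||";

        System.out.println(row.split("\\|").length);   // 4: trailing empty fields dropped
        System.out.println(splitKeepAll(row).length);  // 6: every field kept
        System.out.println(tokenize(row).size());      // 3: all empty fields skipped
    }
}
```

When the number of columns matters, as it usually does for CSV or TSV rows, the variant with the negative limit is probably the safer choice.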
Scala
object Split {
  val toSplit1 = "HP|DL360p|Xenon|E5-2603|1.80 GHz|16 GB DDR3"
  //> toSplit1 : String = HP|DL360p|Xenon|E5-2603|1.80 GHz|16 GB DDR3
  val toSplit2 = "HP|DL360p|Xenon|E5-2603||16 GB DDR3"
  //> toSplit2 : String = HP|DL360p|Xenon|E5-2603||16 GB DDR3
  // Scala strings use java.lang.String.split, so the pipe is escaped as a regex
  toSplit1.split("\\|").foreach(e => println(e))
  //> HP
  //| DL360p
  //| Xenon
  //| E5-2603
  //| 1.80 GHz
  //| 16 GB DDR3
  toSplit2.split("\\|").foreach(e => println(e))
  //> HP
  //| DL360p
  //| Xenon
  //| E5-2603
  //|
  //| 16 GB DDR3
}
Kotlin
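fun main(args: Array<String>) {
    val toSplit1 = "HP|DL360p|Xenon|E5-2603|1.80 GHz|16 GB DDR3"
    val toSplit2 = "HP|DL360p|Xenon|E5-2603||16 GB DDR3"
    // Kotlin's split takes the delimiter as a plain string, so "|" needs no escaping
    toSplit1.split("|").forEach { e -> println(e) }
    println("== Scenario two with empty values ==")
    // The empty field between the two consecutive pipes is kept as an empty string
    toSplit2.split("|").forEach { e -> println(e) }
}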
Conclusion
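As we have seen above, the split function is almost alike in these three JVM languages. Java and Scala share the regex-based String.split, where the pipe has to be escaped, while Kotlin's split accepts the delimiter as a plain string.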